Adding a new column to a production database looks simple. It is not. The operation can lock tables, spike query latency, or break downstream services if defaults and nullability are not handled carefully. In distributed systems the risk compounds, because several services and replicas may read the same schema while running different versions of the code. Rolling out schema changes in live systems demands precision, version control, and rollback strategies.
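To add a new column safely, start with a migration that adds the column as nullable. Avoid setting a default that forces the database to rewrite the entire table, especially for large datasets; whether a default triggers a rewrite depends on the database engine and version. After the deploy, backfill the column in controlled batches so that each transaction holds its locks only briefly. Once the backfill is complete, run a second migration to make the column non-null or add constraints. Splitting the change into two migrations with a batched backfill in between keeps every lock short, which avoids downtime and keeps performance predictable.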
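The first two stages can be sketched in a few lines of Python. This is a minimal illustration, not a production migration tool: it uses the standard-library sqlite3 module as a stand-in for the real database, and the `users` table, the `name` and `display_name` columns, and the batch size are all hypothetical names chosen for the example.

```python
import sqlite3


def add_nullable_column(conn: sqlite3.Connection) -> None:
    """Migration 1: add the column as nullable, with no default."""
    # Hypothetical schema: a users table gaining a display_name column.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.commit()


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill display_name from name, committing after every batch.

    Returns the total number of rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = COALESCE(name, '') "
            "WHERE id IN ("
            "  SELECT id FROM users WHERE display_name IS NULL LIMIT ?"
            ")",
            (batch_size,),
        )
        conn.commit()  # end the transaction so locks are released between batches
        if cur.rowcount == 0:
            break  # no rows left with a NULL display_name
        total += cur.rowcount
    return total


def count_missing(conn: sqlite3.Connection) -> int:
    """Rows that still lack a value; should be 0 before adding NOT NULL."""
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [(f"user{i}",) for i in range(2500)],
    )
    conn.commit()

    add_nullable_column(conn)
    updated = backfill_in_batches(conn, batch_size=1000)
    print(updated, count_missing(conn))
```

The second migration is left out because SQLite cannot add a NOT NULL constraint to an existing column. On PostgreSQL, for instance, that step would be an `ALTER TABLE users ALTER COLUMN display_name SET NOT NULL` statement, run only after `count_missing` reports zero. On a busy system a short pause between batches may also be worth adding, so the backfill does not compete with regular traffic.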
In environments with multiple services, update code to read from and write to both the old and new structures during the transition. Use feature flags to roll out the new column gradually and to switch it off quickly if something goes wrong. Test migrations against production snapshots to catch type mismatches and index-related performance hits before release.
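One way to picture the dual-read, dual-write transition is the sketch below. It is only an illustration under stated assumptions: rows are modeled as plain dictionaries rather than real database records, and the `Flags` class, its two flag names, and the `name` and `display_name` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flags:
    """Hypothetical feature flags controlling the transition."""
    write_new_column: bool = False
    read_new_column: bool = False


def save_user(row: dict, full_name: str, flags: Flags) -> None:
    """Always write the old field; also write the new one when enabled."""
    row["name"] = full_name
    if flags.write_new_column:
        row["display_name"] = full_name


def load_display_name(row: dict, flags: Flags) -> str:
    """Prefer the new field when enabled, falling back to the old field
    for rows that have not been backfilled yet."""
    if flags.read_new_column and row.get("display_name") is not None:
        return row["display_name"]
    return row["name"]
```

Enabling `write_new_column` before `read_new_column` means new rows carry both values before any reader depends on the new field, and turning either flag off again returns the service to the old behavior without a schema change.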