The migration was almost done when the schema broke. We needed a new column, but the deployment window was seconds away from closing.
Adding a new column sounds simple. In practice, it can take down production if done without care. Depending on the database and the kind of change, a schema migration can lock the table, block writes, and force cascading changes in dependent services. The wrong approach can lead to downtime and data loss.
A new column belongs in a controlled, tested migration path. First, define the column as nullable with no default other than NULL, which on most databases avoids rewriting the existing rows. Keep the migration additive: add the column without modifying existing columns or constraints. That keeps the schema backward-compatible, so older code still works until the full rollout completes.
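As a minimal sketch of that additive step, the snippet below uses Python's built-in sqlite3 module against an in-memory database. The `users` table, the `display_name` column, and the `add_column_if_missing` helper are all hypothetical names chosen for illustration, and the locking behavior of a real production database will differ from SQLite's.

```python
import sqlite3


def add_column_if_missing(conn, table, column, col_type):
    """Add a nullable column with no default; do nothing if it already exists.

    Table and column names are interpolated into the SQL, so they must come
    from trusted migration code, never from user input.
    """
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
    conn.commit()
    return True


# Hypothetical schema standing in for the pre-migration state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

added = add_column_if_missing(conn, "users", "display_name", "TEXT")

# Older code that does not know about the new column keeps working.
conn.execute("INSERT INTO users (email) VALUES ('b@example.com')")
conn.commit()
```

Because the helper checks for the column first, the migration can be re-run safely if a deploy is retried.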
Use feature flags to guard reads and writes to the new column. Deploy the schema change first, let it propagate, then ship the feature code. For large datasets, run background jobs to backfill in small batches so that no single transaction holds locks for long. Monitor errors, deadlocks, and replication lag during the backfill and before considering the change complete.
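The backfill and the flag-guarded read could look roughly like the following sketch, again using sqlite3 for the sake of a runnable example. The `FLAGS` dictionary, the `use_display_name` flag, the `users` schema, and the rule that derives `display_name` from the email address are assumptions made for illustration, not part of any particular flag service or schema.

```python
import sqlite3

# Hypothetical in-process flag store; a real system would query a flag service.
FLAGS = {"use_display_name": False}


def backfill_in_batches(conn, batch_size=1000):
    """Fill display_name in small batches, committing after each one."""
    last_id = 0
    total = 0
    while True:
        # Walk the table in primary-key order so each row is visited once.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM users WHERE id > ? AND display_name IS NULL "
            "ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            return total
        cur = conn.execute(
            "UPDATE users SET display_name = substr(email, 1, instr(email, '@') - 1) "
            "WHERE id BETWEEN ? AND ? AND display_name IS NULL",
            (ids[0], ids[-1]),
        )
        conn.commit()  # short transactions keep lock time per batch small
        total += cur.rowcount
        last_id = ids[-1]


def get_display_name(conn, user_id):
    """Read the new column only when the flag is on; otherwise use the old path."""
    email, display_name = conn.execute(
        "SELECT email, display_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if FLAGS["use_display_name"] and display_name is not None:
        return display_name
    return email


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, display_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(1, 6)],
)
conn.commit()

updated = backfill_in_batches(conn, batch_size=2)
```

In a real deployment the loop would typically pause between batches and check replication lag before continuing; that is left out here to keep the sketch short.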