The migration broke at 02:13. The logs said nothing useful. The culprit: a missing new column in production.
Adding a new column should be simple. In practice, it can take down systems. Schema changes are dangerous when they are not planned with scale and zero downtime in mind. A new column alters data shape, query plans, and cache behavior. Every downstream service consuming the table must be able to tolerate the new schema before it appears in production.
The safest path is explicit. Add the column in a backward‑compatible way. First deploy code that tolerates the column and ignores it. Then populate it with defaults or backfill it in controlled batches. Add indexes after the backfill rather than before, because maintaining an index during bulk writes slows every batch; the exception is when reads on the new column must be fast immediately. Avoid heavy locks by using online schema change tools or database‑native background processes.
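The ordering above can be sketched in a few lines. This is a minimal illustration, not a production migration: it uses Python's built-in `sqlite3` as a stand-in for a real database, and the `users` table, the `plan` column, and the `old_reader`/`old_writer` functions are hypothetical names chosen for the example. The point it tries to show is that code written before the migration keeps working once the column exists, provided the column is nullable and the code names its columns explicitly.

```python
import sqlite3

# Assumption: SQLite stands in for the production database; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(5)],
)


def old_reader(conn):
    # Pre-migration code: selects named columns, so an extra column is invisible to it.
    return conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()


def old_writer(conn, email):
    # Pre-migration code: omits the new column, which is why it must accept NULL.
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))


before = old_reader(conn)

# Step 1: add the column as nullable, with no default.
conn.execute("ALTER TABLE users ADD COLUMN plan TEXT")

# Old code still reads and writes without changes.
after = old_reader(conn)
old_writer(conn, "late@example.com")

# Step 2: backfill. A single statement is used here only for brevity;
# on a large table this would be split into small batches.
conn.execute("UPDATE users SET plan = 'free' WHERE plan IS NULL")

# Step 3: create the index once the data is in place.
conn.execute("CREATE INDEX idx_users_plan ON users (plan)")
conn.commit()
```

Code that relies on `SELECT *` or on positional inserts would not be this forgiving, which is one reason to deploy the tolerant code before the column appears.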
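In relational databases like PostgreSQL or MySQL, adding a nullable column with no default is usually a metadata‑only change and effectively instant. Adding a column with a default and NOT NULL can rewrite the whole table on older versions (PostgreSQL before 11, MySQL before instant ADD COLUMN arrived in 8.0) and, in PostgreSQL, whenever the default is a volatile expression. If you are unsure how your version behaves, the conservative route for large datasets is to run ALTER TABLE without a default, then update existing rows in chunks. While the backfill runs, watch replication lag, I/O spikes, and lock waits, and slow down or pause if any of them climbs.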
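A chunked backfill might look like the following. It is a sketch under stated assumptions: `sqlite3` again stands in for the real database, the `users` table and `plan` column are hypothetical, and `backfill_in_batches`, `batch_size`, and `pause_seconds` are names invented for this example. The idea is to walk the primary key in fixed-size ranges and commit after each range, so no single transaction holds locks for long.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.0):
    """Set plan = 'free' on rows where it is NULL, one primary-key range at a time."""
    total = 0
    last_id = 0
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM users").fetchone()[0]
    while last_id < max_id:
        upper = last_id + batch_size
        cur = conn.execute(
            "UPDATE users SET plan = 'free' "
            "WHERE id > ? AND id <= ? AND plan IS NULL",
            (last_id, upper),
        )
        conn.commit()  # one short transaction per batch
        total += cur.rowcount
        last_id = upper
        if pause_seconds:
            # Optional throttle to give replicas and other writers room.
            time.sleep(pause_seconds)
    return total


# Demo on an in-memory table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(2500)],
)
conn.execute("ALTER TABLE users ADD COLUMN plan TEXT")  # nullable, no default
conn.commit()

updated = backfill_in_batches(conn, batch_size=1000)
remaining = conn.execute("SELECT COUNT(*) FROM users WHERE plan IS NULL").fetchone()[0]
```

Because each batch filters on `plan IS NULL`, the function can be rerun safely after an interruption. In a real deployment the pause between batches would likely be driven by the replication lag and lock-wait metrics rather than a fixed sleep.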