The migration script failed at 2:14 a.m., and the issue was a single missing new column.
Adding a new column sounds simple. It is not. Done wrong, it can lock your tables, block writes, and take down production. At scale, the wrong statement is an outage. The right approach is careful, deliberate, and tested.
In relational databases like PostgreSQL or MySQL, ALTER TABLE ... ADD COLUMN is the basic command. But the options, defaults, and execution path matter. Adding a column with a default value in older versions can rewrite the entire table. On billions of rows, that means downtime. Modern database engines now support metadata-only changes in some cases, but behavior varies by version. Always check the database documentation before running a migration.
When you add a new column, define the type, constraints, and indexes with purpose. Avoid heavy indexes until the data is populated. If you need to backfill, run batched updates, not a single huge transaction. Test the sequence end-to-end in a staging environment with realistic data sizes. Measure locking times, replication lag, and recovery steps.