The migration broke at 2:13 a.m., and the logs pointed to a single cause: a missing new column.
Adding a new column should be simple. In SQL, it starts with ALTER TABLE. But once a production table holds millions of rows, the risk rises: depending on the engine and the column definition, that one statement can rewrite the entire table or hold a lock while it runs. Every change must be planned. Downtime, locks, replication lag, and schema drift can turn one small change into hours of cleanup.
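For reference, the statement itself is a one-liner. The sketch below uses Python's built-in sqlite3 module with an in-memory database purely for illustration. The `users` table and the `nickname` and `plan` columns are hypothetical. It shows what existing rows look like after the ALTER: a nullable column leaves them NULL, and a column with a default reports that default.

```python
import sqlite3


def add_column_demo() -> list[tuple]:
    # In-memory database standing in for a production table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [("a@example.com",), ("b@example.com",)],
    )

    # Nullable column, no default: existing rows get NULL.
    conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")
    # NOT NULL column with a default: existing rows report the default.
    conn.execute("ALTER TABLE users ADD COLUMN plan TEXT NOT NULL DEFAULT 'free'")

    rows = conn.execute("SELECT id, nickname, plan FROM users ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(add_column_demo())  # [(1, None, 'free'), (2, None, 'free')]
```

On a small table, either form finishes instantly. The cost only shows up at production scale, and how large it gets depends on the engine.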
Before adding a new column, confirm its impact on indexes and constraints. Decide whether the column should allow NULLs, have a default value, or require a backfill. For large datasets, backfill in small batches to avoid locking the table for long periods. Use your database's online DDL options when possible, such as ALGORITHM=INSTANT, or ALGORITHM=INPLACE with LOCK=NONE, in MySQL, or CREATE INDEX CONCURRENTLY in Postgres when the new column needs an index.
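A minimal sketch of the batched backfill, using Python's sqlite3 module as a stand-in for a production database. The table `users`, the new column `email_domain`, and the helpers `make_demo_db` and `backfill_in_batches` are hypothetical. The column is added as nullable first. It is then filled one primary-key range at a time, with a commit after each batch, so no single transaction holds locks for long. The batching pattern carries over to other engines, but the locking behavior shown here is SQLite's, not MySQL's or Postgres's.

```python
import sqlite3


def make_demo_db(rows: int) -> sqlite3.Connection:
    """Build a hypothetical `users` table and add the new column as nullable."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [(f"user{i}@example.com",) for i in range(rows)],
    )
    # Nullable, no default: existing rows stay NULL until the backfill runs.
    conn.execute("ALTER TABLE users ADD COLUMN email_domain TEXT")
    conn.commit()
    return conn


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill users.email_domain one primary-key range at a time.

    Commits after every batch so each transaction stays short.
    Returns the number of batches executed.
    """
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM users").fetchone()
    if lo is None:  # empty table: nothing to backfill
        return 0
    batches = 0
    start = lo
    while start <= hi:
        end = start + batch_size - 1
        # The IS NULL filter skips rows that are already filled,
        # so the job can be stopped and rerun safely.
        conn.execute(
            """
            UPDATE users
               SET email_domain = substr(email, instr(email, '@') + 1)
             WHERE id BETWEEN ? AND ?
               AND email_domain IS NULL
            """,
            (start, end),
        )
        conn.commit()  # release locks before starting the next range
        batches += 1
        start = end + 1
    return batches


if __name__ == "__main__":
    conn = make_demo_db(2500)
    print(backfill_in_batches(conn, batch_size=1000))  # 3
```

Batching by primary-key range rather than by LIMIT/OFFSET keeps each UPDATE on an indexed range scan. In practice, tune `batch_size` against the lock times and replication lag you observe.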
Test the migration in a staging environment with production-size data. Measure execution time. Monitor replication delay if you're using read replicas. Plan for rollback: dropping a column that already holds data is faster than rolling back a failed add.
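One way to rehearse and time the change is sketched below with Python's sqlite3 module. `rehearse` and the step list are hypothetical names, and an in-memory SQLite copy stands in for a staging server. `Connection.backup` copies the whole source database, so the steps run against full-size data while the source stays untouched. SQLite timings will not match MySQL or Postgres. The point is to measure each step separately.

```python
import sqlite3
import time
from typing import Callable

# A named migration step that operates on a connection (hypothetical shape).
Step = tuple[str, Callable[[sqlite3.Connection], object]]


def rehearse(source: sqlite3.Connection, steps: list[Step]) -> dict[str, float]:
    """Copy `source` into a scratch database, run each step, return seconds per step."""
    staging = sqlite3.connect(":memory:")
    source.backup(staging)  # full copy, so the rehearsal sees production-size data
    timings: dict[str, float] = {}
    try:
        for name, step in steps:
            t0 = time.perf_counter()
            step(staging)
            staging.commit()
            timings[name] = time.perf_counter() - t0
    finally:
        staging.close()
    return timings


if __name__ == "__main__":
    prod = sqlite3.connect(":memory:")
    prod.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    prod.executemany(
        "INSERT INTO users (email) VALUES (?)",
        [(f"user{i}@example.com",) for i in range(100_000)],
    )
    prod.commit()

    steps: list[Step] = [
        ("add column", lambda c: c.execute("ALTER TABLE users ADD COLUMN email_domain TEXT")),
        ("backfill", lambda c: c.execute(
            "UPDATE users SET email_domain = substr(email, instr(email, '@') + 1)")),
    ]
    for name, seconds in rehearse(prod, steps).items():
        print(f"{name}: {seconds:.3f}s")
```

The two steps are placeholders. In a real rehearsal, you would plug in the exact statements, or the batched backfill, that you plan to run in production.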