The migration stalled. Everyone stared at the schema. The problem was simple: we needed a new column. The solution was not.
Adding a new column in production is never just one change. It touches code, migrations, indexes, data backfills, deployments, and monitoring. It changes what queries return and what errors the logs show. If you skip a step, you’ll break something.
The safe path starts with a migration script that adds the column with the correct type. In high-traffic systems, set default values in application code rather than in the migration: on some database engines and versions, adding a column with a default rewrites every row and holds a lock for the duration. On huge tables, add the column as nullable with no default, then backfill it in batches. This keeps each write short and avoids downtime.
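As a rough illustration of the add-then-backfill pattern, here is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. The `users` table, the `status` column, the `'active'` value, and the batch size are all hypothetical names chosen for the example, and a production system would use its own database driver and migration tooling rather than SQLite.

```python
import sqlite3


def add_column_and_backfill(conn, batch_size=1000):
    """Add a nullable column, then backfill it in small batches.

    Table and column names here are illustrative assumptions.
    """
    # Step 1: add the column with no default, so existing rows are untouched.
    conn.execute("ALTER TABLE users ADD COLUMN status TEXT")
    conn.commit()

    total = 0
    while True:
        # Step 2: update a bounded slice of rows that still lack a value.
        cur = conn.execute(
            "UPDATE users SET status = 'active' "
            "WHERE id IN (SELECT id FROM users WHERE status IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # commit per batch so each transaction stays short
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def demo():
    """Build a small table, run the migration, and report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [(f"user{i}",) for i in range(2500)],
    )
    conn.commit()
    updated = add_column_and_backfill(conn, batch_size=1000)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM users WHERE status IS NULL"
    ).fetchone()[0]
    conn.close()
    return updated, remaining
```

Calling `demo()` returns the number of rows backfilled and the number still missing a value. Committing after every batch is the design choice that matters: no single transaction covers the whole table, so concurrent writers wait only for one batch at a time.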
Test the migration in a staging environment with production-like data. Check query plans before and after. If the new column will be indexed, create the index after the backfill is complete. Do not build the index as part of the initial migration; the locks it takes can cascade and block critical writes.
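To make the plan check concrete, the sketch below compares the query plan before and after the index is created, again using Python's sqlite3 module with an in-memory database. The table, the query, and the index name `idx_users_status` are hypothetical. SQLite's `EXPLAIN QUERY PLAN` stands in for whatever plan inspection your database offers, and the exact plan text varies by engine and version. On PostgreSQL, `CREATE INDEX CONCURRENTLY` is the usual way to build an index without blocking writes.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the planner's human-readable detail strings for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the detail text


def compare_plans():
    """Capture the plan for one query before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
    )
    # Rows arrive already backfilled, mimicking the state after the backfill.
    conn.executemany(
        "INSERT INTO users (name, status) VALUES (?, ?)",
        [(f"user{i}", "active" if i % 2 else "inactive") for i in range(1000)],
    )
    conn.commit()

    query = "SELECT id FROM users WHERE status = ?"
    before = plan_for(conn, query, ("active",))

    # Create the index only after the data is in place.
    conn.execute("CREATE INDEX idx_users_status ON users (status)")
    conn.commit()
    after = plan_for(conn, query, ("active",))
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

In staging, the same comparison is worth running against the application's real hot queries, since an index the planner never picks adds write cost without any read benefit.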