The migration hit production an hour before sunrise. Logs lit up with errors. The missing piece: a new column that never made it into the schema.
Adding a new column should be simple. In reality, it can be the step that slows a release or breaks live traffic. Schema changes are code changes, and they need the same care: a staged rollout, backfills, and a zero-downtime strategy. Whether you use Postgres, MySQL, or distributed SQL, the principles are the same: plan ahead, deploy safely, and keep the application in sync.
Start by defining the column in your migration scripts. Give it an explicit default to avoid null-related issues in downstream services. For large tables, add the column in one migration and backfill in a separate step, not in the same transaction. Run the backfill in small batches, committing after each one, so that no single long-running transaction holds locks that block other queries. If you must alter indexes, separate those changes into their own migrations.
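The add-then-backfill pattern above can be sketched in a few lines. This is a minimal illustration, not a production migration: it uses Python's built-in `sqlite3` as a stand-in for your real database, and the `users` table, the `status` column, the `"active"` value, and the batch size are all hypothetical. The column is added as nullable here only so the backfill has work to do. Locking behavior differs across Postgres, MySQL, and distributed SQL, so treat this as the shape of the approach.

```python
import sqlite3


def add_status_column(conn):
    """Step 1: schema change only. No existing rows are updated here."""
    # Hypothetical table and column names, chosen for illustration.
    conn.execute("ALTER TABLE users ADD COLUMN status TEXT")
    conn.commit()


def backfill_status(conn, batch_size=1000, value="active"):
    """Step 2: fill NULL values in small batches, committing after each one.

    Returns the total number of rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET status = ? "
            "WHERE id IN (SELECT id FROM users WHERE status IS NULL LIMIT ?)",
            (value, batch_size),
        )
        # Commit per batch so each transaction stays short.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

In a real rollout the two steps would likely live in separate migrations, with a pause or throttle between batches if replication lag starts to climb.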
Coordinate application code to handle both old and new schemas during the rollout. Feature flags or conditional logic can bridge the gap while both states exist. Monitor query performance after the column is live. Even a small column change can alter query plans or affect replication lag.
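One way to bridge the old and new schemas in application code is sketched below, assuming rows arrive as plain dictionaries and a boolean feature flag controls writes. The names `read_status`, `build_insert`, `DEFAULT_STATUS`, and the `users`/`status` identifiers are hypothetical, picked only for this example.

```python
DEFAULT_STATUS = "active"  # assumed fallback for rows without the new column


def read_status(row):
    """Read the new column, tolerating rows that predate it or are not yet backfilled."""
    value = row.get("status")
    return DEFAULT_STATUS if value is None else value


def build_insert(name, write_new_column):
    """Build an INSERT that references the new column only when the flag is on.

    Returns a (sql, params) pair suitable for a parameterized query.
    """
    if write_new_column:
        return (
            "INSERT INTO users (name, status) VALUES (?, ?)",
            (name, DEFAULT_STATUS),
        )
    return ("INSERT INTO users (name) VALUES (?)", (name,))
```

With this shape, the reader code can ship before the migration runs, and the flag is flipped on only after the column exists everywhere. Once the backfill completes, the fallback branches can be removed.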