The query returned. The migration failed. The team stared at the log output. The problem was simple: the schema needed a new column, and adding it the wrong way would take the service down.
Adding a new column in production is never just a line in a migration file. It touches performance, locking, replication lag, and deployment timing. A single ALTER TABLE without preparation can lock an entire dataset. The right approach keeps the system online, avoids blocking writes, and makes rollbacks possible.
First, define the new column with defaults applied at the application layer, not the database layer. This prevents the database from rewriting every row on creation. Create the column as nullable to avoid full-table locks. Once deployed, backfill the data in small, controlled batches. Monitor replication lag and query performance during the process.
Second, enforce constraints and set final defaults only after the backfill completes. This two-step process reduces risk. It also allows safe rollbacks if you detect a bad value or an unexpected query plan. Use feature flags in your application to control when the new column becomes active.