The deployment froze. Everyone stared at the log. The query was fine—until it wasn’t. The problem was a new column.
Adding a new column to a production database sounds simple. It isn't. At scale, a schema change can block writes, lock tables, and trigger cascading failures. Most environments demand zero-downtime operations, which means planning, testing, and executing the migration without halting service.
The first consideration is nullability. On a populated table, adding a NOT NULL column with no default simply fails; adding one with a default has historically forced a full table rewrite (PostgreSQL before version 11, and many MySQL configurations), which can lock the table for minutes or hours. Always start with a nullable column, backfill it in controlled batches, then enforce the constraint afterward.
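As a sketch of that sequence in PostgreSQL, assuming a hypothetical `orders` table gaining a `region` column (the table and column names are illustrative, not from any real schema):

```sql
-- Step 1: add the column as nullable. This is a metadata-only change
-- and completes quickly regardless of table size.
ALTER TABLE orders ADD COLUMN region text;

-- Step 2: backfill the column in batches (see the chunked UPDATE pattern
-- below), each batch in its own short transaction.

-- Step 3: once every row is populated, enforce the constraint.
-- Adding the constraint as NOT VALID first avoids scanning the whole
-- table under an exclusive lock; VALIDATE then scans with only a
-- lightweight lock.
ALTER TABLE orders
    ADD CONSTRAINT orders_region_not_null CHECK (region IS NOT NULL) NOT VALID;
ALTER TABLE orders VALIDATE CONSTRAINT orders_region_not_null;
```

On PostgreSQL 12 and later, a subsequent `SET NOT NULL` can use the validated check constraint to skip its own full-table scan.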
Next is default values. Writing an implicit default across a large dataset in a single statement is dangerous. Use an explicit migration that writes the default in chunks to avoid long-running transactions and excessive I/O, and monitor for replication lag during the backfill if your setup includes read replicas.
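A minimal sketch of a chunked backfill, again assuming the hypothetical `orders.region` column with an integer primary key `id` (the batch size and the `'unknown'` placeholder are illustrative choices):

```sql
-- Backfill in fixed-size batches keyed on the primary key, so no single
-- transaction holds locks or generates WAL for long. Run each statement
-- in its own transaction, pause between batches, and check replica lag
-- before continuing.
UPDATE orders
SET    region = 'unknown'
WHERE  id IN (
    SELECT id
    FROM   orders
    WHERE  region IS NULL
    ORDER  BY id
    LIMIT  10000
);
-- Repeat until the UPDATE reports 0 rows affected.
```

The driver loop (repeat-until-zero-rows) usually lives in a migration script rather than in SQL itself, which also gives you a natural place to add throttling and lag checks.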
Indexing must follow data population, not precede it. An index created before the backfill makes every backfill write pay index-maintenance cost for no benefit. Populate first, then add the index online if your database supports it. In PostgreSQL, CREATE INDEX CONCURRENTLY avoids blocking writes; in MySQL (InnoDB), use ALGORITHM=INPLACE with LOCK=NONE where possible.
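The online-index step for both engines, sketched against the same hypothetical `orders.region` column (index names are illustrative):

```sql
-- PostgreSQL: build the index without blocking concurrent writes.
-- Caveats: CONCURRENTLY cannot run inside a transaction block, and a
-- failed build leaves behind an INVALID index that must be dropped
-- and retried.
CREATE INDEX CONCURRENTLY idx_orders_region ON orders (region);

-- MySQL (InnoDB): request an online, in-place build. Specifying the
-- algorithm and lock level explicitly makes the statement fail fast
-- if the server would otherwise fall back to a locking table copy.
ALTER TABLE orders
    ADD INDEX idx_orders_region (region),
    ALGORITHM=INPLACE, LOCK=NONE;
```

Stating `ALGORITHM` and `LOCK` explicitly is the safer default: a silent fallback to a copying build is exactly the kind of surprise that froze the deployment in the first place.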