The migration broke at 2:14 a.m., and the logs made the cause plain: the new column the application expected was missing, and everything stopped cold.
Adding a new column to a database table should be simple, but scale, concurrency, and uptime requirements can turn it into a live-wire operation. The wrong approach risks locking tables, blocking writes, and corrupting data under load. The right approach ensures zero downtime, clean rollouts, and easy rollbacks.
Start by defining the new column with safe defaults. Avoid populating or altering large amounts of data in the same transaction as the schema change. In PostgreSQL, adding a nullable column without a default is a fast, metadata-only operation. Before version 11, adding a column with a default rewrote the entire table, a cost that can stall production; on modern versions a constant default is also metadata-only, but a volatile default (such as random()) still forces a full rewrite. In MySQL, use ALGORITHM=INSTANT or ALGORITHM=INPLACE where supported so the DDL does not take heavy locks.
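The two-phase pattern can be sketched with Python's built-in sqlite3 module standing in for a production database; the `orders` table and `currency` column are hypothetical names chosen for illustration:

```python
import sqlite3

# In-memory SQLite stands in for a production database; table and
# column names here are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO orders (total) VALUES (?)",
    [(t,) for t in (9.99, 24.50, 3.25)],
)

# Phase 1: add the column nullable, with no default. In most engines
# this is a metadata-only change, so it completes without rewriting
# any rows or holding long locks.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")

# Existing rows simply read NULL until a separate backfill fills them in.
rows = conn.execute("SELECT id, currency FROM orders").fetchall()
print(rows)  # every currency is None
```

Keeping the backfill out of this DDL statement is the point: the schema change commits in milliseconds, and populating the data becomes a separate, throttleable job.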
Next, backfill the column in small, batched updates. Use background workers or scheduled jobs to avoid spikes in CPU and I/O. Monitor replication lag closely if you have read replicas. For distributed systems, ensure that schema changes are backward compatible with old application versions during deployment.