The migration script had failed before dawn. The logs were full of warnings about a column that did not exist: someone had pushed code to production without updating the schema.
Adding a column is simple. Doing it right, without downtime, corrupted data, or blocked queries, is harder. In a large system a schema change can choke throughput if it locks a big table or triggers a full-table rewrite, and the cost grows with table size, replication lag, and the number of replicas.
First, define the new column in a way that avoids a full-table lock. In PostgreSQL, adding a nullable column without a default is a metadata-only change (and since PostgreSQL 11, so is adding one with a constant default). In MySQL, online DDL can keep reads and writes flowing, but behavior depends on the algorithm used (INSTANT, INPLACE, or COPY) and the storage engine. Test in staging against production-scale data to measure the impact.
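A minimal sketch of the "nullable column, no default" step, using an in-memory SQLite database purely for illustration (the table and column names `orders` and `region` are hypothetical, and PostgreSQL/MySQL syntax differs). The point it demonstrates carries over: when the new column is nullable and has no rewrite-forcing default, existing rows are untouched and simply read back NULL.

```python
import sqlite3

# Hypothetical schema: an "orders" table gaining a "region" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)", [(1.0,), (2.5,)])

# Adding a nullable column with no default does not rewrite existing rows;
# they simply report NULL for the new column.
conn.execute("ALTER TABLE orders ADD COLUMN region TEXT")

rows = conn.execute("SELECT id, region FROM orders").fetchall()
print(rows)  # → [(1, None), (2, None)]
```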
Backfill in batches. Never run a single massive UPDATE on a billion-row table: one long transaction holds locks for its whole duration, bloats the WAL or undo log, and stalls replicas. Update a bounded number of rows at a time, commit between batches, and monitor load. For high-traffic systems, schedule the backfill during low-usage hours or run it from a background worker with throttling.
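The batching pattern can be sketched as follows, again with SQLite standing in for the production database and hypothetical names (`orders`, `region`). Each iteration updates at most `BATCH_SIZE` rows and commits before the next batch starts, so no single transaction holds locks on the whole table; the loop exits when an UPDATE touches zero rows.

```python
import sqlite3

BATCH_SIZE = 100  # tune against observed load; kept small here for the demo

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 251)]
)
conn.commit()

batches = 0
while True:
    cur = conn.execute(
        """
        UPDATE orders SET region = 'unknown'
        WHERE id IN (
            SELECT id FROM orders WHERE region IS NULL
            ORDER BY id LIMIT ?
        )
        """,
        (BATCH_SIZE,),
    )
    conn.commit()  # release locks between batches
    if cur.rowcount == 0:
        break  # nothing left to backfill
    batches += 1
    # In production: sleep or check replication lag here to throttle.

remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE region IS NULL"
).fetchone()[0]
print(batches, remaining)  # → 3 0  (batches of 100, 100, 50)
```

In a real backfill the inner subquery would use keyset pagination on the primary key rather than re-scanning for NULLs, but the commit-per-batch structure is the part that keeps the table available.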