Adding a new column sounds simple, but in production, it’s a knife’s edge. You balance schema changes, query performance, and live traffic. The wrong approach locks rows, kills latency, and burns your SLA.
The first step is knowing the database engine. In MySQL, ALTER TABLE can block writes. In PostgreSQL, some column additions are instant, others trigger rewrites. In distributed systems like CockroachDB, schema changes are asynchronous but still affect load. The execution plan depends on the type, default value, and constraints you choose.
If the column is nullable and without a default, many engines can append metadata without rewriting the table. This is fast but comes with a cost — your code must handle NULL safely. If you need a default value, consider a two-step migration. Add the empty column first, deploy the application changes, and backfill in small batches while monitoring CPU, I/O, and replication lag. Only set constraints or defaults after the data is ready.
In high-traffic environments, rolling schema changes are best. Apply them live but in small, controlled iterations. Keep migrations reversible. Test on production snapshots to measure exact timing, disk usage, and index impact.