The migration hit production at midnight. Five terabytes of live data. One urgent requirement: add a new column without breaking anything.
A new column can be simple in theory. But in a high-traffic system, schema changes carry risk. Locking a table for seconds can cascade into lost orders, stalled writes, or dropped connections. The solution starts with understanding how your database handles schema changes.
In MySQL, ALTER TABLE can be blocking, depending on storage engine and version. Postgres may require a full table rewrite for certain column types, but adding a column with a default value that is NULL is instantaneous. In distributed SQL systems, the same operation may propagate asynchronously across nodes. Timing, locking, and replication behavior must be clear before any migration starts.
The safest pattern is to add the new column in a non-blocking way, backfill in batches, then set constraints or defaults. Split the migration into multiple deploys. Use feature flags to decouple schema from code rollout. Measure query plans before and after.