The migration failed at 2:13 a.m., and every query started returning errors. The cause was simple: a missing new column.
Adding a new column sounds trivial, but in production systems, it is one of the quickest ways to introduce downtime, break schema contracts, or cause performance hits. The difference between a controlled rollout and a disaster comes down to precision.
When creating a new column in SQL, define its type, nullability, and default values explicitly. Never assume the database will handle missing defaults in the way you expect. If the table handles high write volume, add the column without an immediate backfill, then populate it in controlled batches to avoid locks and timeouts.
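The add-then-backfill pattern above can be sketched as follows. This is a minimal illustration, not a production migration: it uses Python's built-in sqlite3 module as a stand-in for a real server, and the table name `orders`, the column `status`, the fill value `'pending'`, and the function name are all hypothetical. The column is added as nullable with an explicit default of NULL, then filled in small committed batches so no single statement holds locks for long.

```python
import sqlite3


def add_column_and_backfill(conn, batch_size=1000):
    """Add a nullable column, then populate it in small committed batches.

    `orders`, `status`, and 'pending' are hypothetical names for illustration.
    Returns the total number of rows backfilled.
    """
    # Step 1: add the column with an explicit type and an explicit NULL default.
    # No existing row needs a computed value at this point.
    conn.execute("ALTER TABLE orders ADD COLUMN status TEXT DEFAULT NULL")
    conn.commit()

    # Step 2: backfill in batches, committing after each one so every
    # transaction stays short.
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE orders SET status = 'pending' "
            "WHERE id IN ("
            "  SELECT id FROM orders WHERE status IS NULL ORDER BY id LIMIT ?"
            ")",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left to fill
        total += cur.rowcount
    return total
```

On a real system you would likely also pause between batches and watch replication lag; the batch size here is an arbitrary starting point, not a recommendation.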
In MySQL and PostgreSQL, a schema change can lock the table, and on older versions adding a column can rewrite every row. Use tools like pt-online-schema-change (MySQL) or pg_online_schema_change (PostgreSQL) to add a column without blocking queries. For systems with replicas, apply the new column first to secondaries, promote them, then backfill. In distributed databases like CockroachDB or Yugabyte, a new column may propagate across nodes at different times, so test consistency before shipping.
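One way to test that a new column has reached every node before shipping code that depends on it is to ask each node for its view of the schema. The sketch below is only an illustration: it uses in-process sqlite3 connections as stand-ins for nodes, and the function name and the `nodes` mapping are hypothetical. Against a real PostgreSQL-compatible cluster you would more likely query information_schema.columns on each node through your usual driver.

```python
import sqlite3


def nodes_missing_column(nodes, table, column):
    """Return the names of nodes whose copy of `table` lacks `column`.

    `nodes` maps a node name to an open sqlite3 connection (a stand-in for
    a connection to a real database node).
    """
    missing = []
    for name, conn in nodes.items():
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
        if column not in columns:
            missing.append(name)
    return missing
```

A deploy script could poll this until it returns an empty list, and only then enable the code path that reads or writes the new column.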