The migration failed at 2:17 a.m., right after the new column was deployed to production.
A new column sounds simple. It is not. Adding a column to an existing database table can lock writes, stall queries, or cause costly downtime. In high-traffic production systems, even a single ALTER TABLE statement can ripple through dependent services and queues. Done wrong, it leaves you with broken applications and panicked alerts.
The first step in adding a new column is understanding your table's size and indexing. Large tables require a strategy that avoids blocking operations. One option is an online schema-change tool such as gh-ost or pt-online-schema-change. Both let you add a column while writes continue: they create a shadow table, sync changes to it, and then perform a quick swap. You should also benchmark how the new column affects index size, disk I/O, and cache performance.
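The shadow-table mechanism those tools rely on can be illustrated in miniature. The sketch below uses SQLite and a hypothetical `users` table purely for demonstration; real tools like gh-ost and pt-online-schema-change additionally replay concurrent writes onto the shadow table (via binlog tailing or triggers) before swapping, which this toy version omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",)])

# 1. Create a shadow table that already includes the new column.
conn.execute("""CREATE TABLE _users_new (
    id INTEGER PRIMARY KEY, email TEXT, last_login TEXT)""")

# 2. Copy existing rows across. Production tools do this in chunks,
#    while triggers or binlog tailing keep the shadow table in sync
#    with writes that land during the copy.
conn.execute("INSERT INTO _users_new (id, email) SELECT id, email FROM users")

# 3. The quick swap: retire the old table, rename the shadow into place.
conn.execute("DROP TABLE users")
conn.execute("ALTER TABLE _users_new RENAME TO users")

cols = [d[0] for d in conn.execute("SELECT * FROM users").description]
print(cols)  # the live table now carries the new column
```

The swap at the end is why these tools cost only a brief metadata lock rather than a long write lock: the expensive copy happens off to the side.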
Default values matter. On many engines, notably PostgreSQL before version 11 and older MySQL releases, adding a column with a non-null default triggers a full table rewrite, consuming CPU and creating replication lag. Instead, add the nullable column first, backfill data in controlled batches, then add constraints. This three-step migration reduces downtime risk and keeps replicas in sync.
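The three-step pattern can be sketched end to end. This is a minimal illustration using SQLite and a hypothetical `orders` table; the batch size, column names, and the final NOT NULL step are assumptions, and the constraint syntax in the comment is PostgreSQL's, since SQLite cannot add NOT NULL to an existing column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(i * 1.5,) for i in range(1000)])

# Step 1: add the column as nullable, with no default.
# This is a fast metadata-only change on most engines.
conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT")

# Step 2: backfill in small batches so each transaction stays short
# and replicas never fall far behind. Batch size is an assumption;
# tune it against your own replication lag.
BATCH = 100
while True:
    cur = conn.execute(
        "UPDATE orders SET currency = 'USD' "
        "WHERE id IN (SELECT id FROM orders WHERE currency IS NULL LIMIT ?)",
        (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break

# Step 3: add the constraint once every row is populated.
# On PostgreSQL this would be:
#   ALTER TABLE orders ALTER COLUMN currency SET NOT NULL;
remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE currency IS NULL").fetchone()[0]
print(remaining)  # 0 once the backfill completes
```

Pausing between batches (or throttling on replica lag, as gh-ost does) keeps the backfill invisible to application traffic.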