The migration froze at 76%. The query was simple, but the table was massive, and every row had to carry the weight of a new column.
Adding a new column sounds easy in theory. In practice, it’s a sharp edge in production systems. Schema changes ripple through databases, ORMs, migrations, and codebases. A careless ALTER TABLE locks writes, burns CPU, and fills queues. The longer it runs, the more pressure builds.
Design it right from the start. Decide if the column needs defaults or nullability. Defaults force writes to every row; nulls avoid immediate rewrites. For high-traffic tables, use online schema change tools. They copy data in the background, keeping downtime near zero. MySQL has gh-ost and pt-online-schema-change. Postgres supports ALTER TABLE ADD COLUMN in constant time when no default is set, but defaults trigger table rewrites.
Consider indexing. A new column with an index can double the cost. Create the column first, backfill in controlled batches, then add the index. Keep transactions small to avoid lock contention. Monitor replication lag during backfills; secondary nodes often fail silently under the load.