The migration broke at 02:13. The logs were clean until they weren’t, and the failing query revealed the problem: the schema needed a new column. Adding one sounds simple, but in production, nothing is simple. A single ALTER TABLE can lock writes, trigger table rewrites, and impact latency. Done wrong, it can cascade failures across dependent services.
A new column in a database table changes storage, indexes, and sometimes application logic. In PostgreSQL, adding a nullable column with a default before version 11 rewrites the table on disk — a potential downtime hazard. In MySQL, even with ALGORITHM=INPLACE, engine version and data type choice affect speed and blocking behavior. The cost is always tied to data size, constraints, and replication setup.
Best practice is to stage the change. First, add the column as NULL without a default to avoid rewrites. Backfill data in small, incremental batches to reduce load. Only then add NOT NULL constraints or defaults in a separate migration. If code depends on the column immediately, feature flags can control its activation. In distributed systems, you must ensure consumers can handle both pre- and post-migration data shapes to maintain compatibility during rollout.