The migration failed at 02:14. A single schema change brought the system to a halt. The cause: a new column added without thought to scale, indexing, or locking.
Adding a new column seems simple. In practice, it can slow queries, lock tables, or trigger full table rewrites. When the dataset is small, these side effects hide. At billions of rows, they surface hard and fast.
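The right way to add a new column starts with understanding the underlying database engine. In MySQL, ALTER TABLE can block writes unless the change runs as an online DDL operation; with InnoDB, adding a column qualifies, and MySQL 8.0 can often apply it as a metadata-only change. In PostgreSQL, adding a nullable column with no default has always been a metadata-only change. Before PostgreSQL 11, adding a column with a default value rewrote the whole table; from PostgreSQL 11 onward, a constant default is recorded in the catalog and the change is near-instant, while a volatile default such as clock_timestamp() still forces a rewrite. In distributed systems like CockroachDB, schema changes propagate across the cluster and may require careful coordination.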
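As a rough illustration of how the statement differs per engine, here is a minimal Python sketch that builds a nullable ADD COLUMN statement. The helper name `add_column_ddl` and the table and column names are hypothetical. The `ALGORITHM=INSTANT` clause assumes MySQL 8.0.12 or later with InnoDB; check your own version before relying on it.

```python
def add_column_ddl(engine: str, table: str, column: str, sql_type: str) -> str:
    """Build a nullable ADD COLUMN statement for the given engine.

    Hypothetical helper: identifiers are interpolated directly, so they
    must come from trusted code, never from user input.
    """
    base = f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}"
    if engine == "mysql":
        # Assumes MySQL 8.0.12+ with InnoDB. Naming the algorithm makes the
        # statement fail with an error if an instant change is not possible,
        # instead of quietly falling back to a slower table copy.
        return base + ", ALGORITHM=INSTANT"
    if engine == "postgresql":
        # A nullable column with no default is a metadata-only change.
        return base
    raise ValueError(f"unsupported engine: {engine}")
```

For example, `add_column_ddl("mysql", "orders", "note", "VARCHAR(255)")` returns `ALTER TABLE orders ADD COLUMN note VARCHAR(255), ALGORITHM=INSTANT`. Requesting the algorithm explicitly is a design choice: a loud failure at 02:14 is easier to handle than an unexpected full-table copy.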
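Nullability matters. A NOT NULL column needs a value in every existing row. Without a default, PostgreSQL rejects the change on a non-empty table, and MySQL typically fills in an implicit default for the column type. On engines and versions that must rewrite every record to apply the value, the table can stay locked for long periods. Adding a nullable column, backfilling in batches, and only then adding the NOT NULL constraint is often safer.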
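The batched backfill can be sketched as follows. This is a minimal example under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for a production engine, and the `orders` table, `status` column, and `'active'` value are hypothetical. A real backfill would also need throttling and monitoring between batches.

```python
import sqlite3


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill NULL status values a batch at a time; return total rows updated."""
    total = 0
    while True:
        # Update at most batch_size rows that have not been backfilled yet.
        cur = conn.execute(
            "UPDATE orders SET status = 'active' "
            "WHERE id IN (SELECT id FROM orders WHERE status IS NULL LIMIT ?)",
            (batch_size,),
        )
        # Commit per batch so each transaction stays short.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total


def demo() -> tuple:
    """Create a toy table, add a nullable column, and backfill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders (amount) VALUES (?)", [(i,) for i in range(2500)]
    )
    # Step 1: add the column as nullable, with no default.
    conn.execute("ALTER TABLE orders ADD COLUMN status TEXT")
    conn.commit()
    # Step 2: backfill in small batches.
    updated = backfill_in_batches(conn, batch_size=1000)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status IS NULL"
    ).fetchone()[0]
    conn.close()
    return updated, remaining
```

Calling `demo()` adds the column, backfills 2,500 rows in three batches, and leaves no NULL values behind. Committing after each batch keeps every transaction short, which is the point of the technique: no single statement holds locks across the whole table.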