The migration broke at 2:14 a.m. The error message was clear: the table was missing a new column.
Adding a new column sounds simple. It is not. Done wrong, it can lock tables, stall writes, and block deploys. Done right, it is invisible to users and safe for production traffic.
A new column must define its type, default, and nullability. Each choice has a cost. A NOT NULL column needs a value for every existing row, which on some database engines means rewriting a large table while holding a lock. Creating an index in the same migration can block writes while the index builds. The safest path is often to add the column as nullable with no default, backfill in batches, and apply constraints afterward.
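The add, backfill, then constrain sequence can be sketched as follows. This is a minimal illustration under stated assumptions, not a production migration: it uses Python's built-in sqlite3 module, and the users table, the email_normalized column, and the function names are hypothetical. SQLite cannot add a NOT NULL constraint to an existing column, so the final step here only verifies that no rows remain unfilled. On an engine such as PostgreSQL, that is the point where the constraint would be applied.

```python
import sqlite3


def add_nullable_column(conn):
    # Nullable with no default, so existing rows need no value up front.
    conn.execute("ALTER TABLE users ADD COLUMN email_normalized TEXT")
    conn.commit()


def backfill_in_batches(conn, batch_size=1000):
    """Fill email_normalized in primary-key order, one short transaction per batch."""
    last_id = 0
    updated = 0
    while True:
        # Keyset pagination: fetch the next slice of ids after the last one seen.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            break
        cur = conn.execute(
            "UPDATE users SET email_normalized = lower(email) "
            "WHERE id BETWEEN ? AND ? AND email_normalized IS NULL",
            (ids[0], ids[-1]),
        )
        conn.commit()  # commit per batch so no single transaction grows large
        updated += cur.rowcount
        last_id = ids[-1]
    return updated


def count_unfilled(conn):
    # Validation step to run before any NOT NULL constraint is applied.
    query = "SELECT COUNT(*) FROM users WHERE email_normalized IS NULL"
    return conn.execute(query).fetchone()[0]


# Demo on an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"User{i}@Example.com",) for i in range(2500)],
)
conn.commit()

add_nullable_column(conn)
updated = backfill_in_batches(conn, batch_size=1000)
remaining = count_unfilled(conn)
print(updated, remaining)
```

Walking the table by primary key rather than repeatedly selecting rows where the new column is NULL keeps the loop from spinning forever on rows whose computed value is itself NULL. The batch size is a tuning knob: smaller batches hold locks for less time but take longer overall.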
In systems with strict uptime demands, schema changes need feature-flagged rollouts. Create the new column first. Populate it incrementally. Switch reads and writes when complete. Drop fallback code only after full validation.
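One way to picture the flagged rollout is two independent switches, one for writes and one for reads, with the old code path kept as a fallback until validation passes. The sketch below is an assumption-laden illustration: the Flags class, the flag names, and the dictionary standing in for a database row are all hypothetical, and a real system would read flags from its own feature-flag service.

```python
from dataclasses import dataclass


@dataclass
class Flags:
    write_new_column: bool = False  # phase 1: start populating the new column
    read_new_column: bool = False   # phase 2: start trusting the new column


def normalize(email):
    return email.strip().lower()


def save_user(row, email, flags):
    """Always write the old field; also write the new column once the flag is on."""
    row["email"] = email
    if flags.write_new_column:
        row["email_normalized"] = normalize(email)


def read_email_key(row, flags):
    """Read from the new column when enabled and populated, else fall back."""
    if flags.read_new_column and row.get("email_normalized") is not None:
        return row["email_normalized"]
    # Fallback path: to be removed only after validation passes.
    return normalize(row["email"])


def validate(rows):
    """True when every row's new column matches what the fallback would compute."""
    return all(r.get("email_normalized") == normalize(r["email"]) for r in rows)


# Demo: walk one row through the rollout phases.
row = {}
flags = Flags()

save_user(row, " Ada@Example.com ", flags)   # phase 0: column exists, nothing writes it
before = read_email_key(row, flags)
valid_before = validate([row])

flags.write_new_column = True                # phase 1: dual writes
save_user(row, " Ada@Example.com ", flags)
valid_after = validate([row])

flags.read_new_column = True                 # phase 2: reads switch over
after = read_email_key(row, flags)
print(before, after, valid_before, valid_after)
```

Keeping the two flags separate means reads can be switched back to the fallback without stopping writes to the new column, which makes a bad rollout cheap to reverse.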