The deployment halted at 2:13 a.m. because the schema was wrong. A single new column was missing, and the service could not read the data it needed.
Adding a new column in production is routine, but it is also one of the most dangerous schema changes you can make. Depending on the database engine and the column definition, it can lock tables, block queries, or cause cascading failures in applications that assume a fixed structure. The safe approach is methodical: plan the migration, apply it in stages, and verify data integrity before routing traffic to code that depends on the new column.
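First, define the new column with an explicit data type. Avoid defaults that seem harmless but carry hidden costs: on some engines, adding a column with a default rewrites every existing row, and a default can also inflate storage or index size. If the column must be backfilled, split the update into small batches so that no single transaction holds locks for long or floods replication. Monitor query latency, lock wait times, and replication lag throughout the process.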
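A minimal sketch of the add-then-backfill step, using Python's built-in `sqlite3` as a stand-in for a production database. The table `orders`, the legacy column `total`, the new column `total_cents`, and the batch size are all hypothetical. The column is added as nullable with no default, then filled in keyset-paginated batches, one transaction per batch, with each batch's duration recorded so it can be monitored. A real migration would also watch replication lag rather than rely on a fixed pause.

```python
import sqlite3
import time

# Hypothetical schema: orders(id INTEGER PRIMARY KEY, total REAL).
BATCH_SIZE = 1000  # assumption: tune to your workload


def add_column(conn: sqlite3.Connection) -> None:
    # Explicit type, nullable, no default: existing rows are not rewritten here.
    conn.execute("ALTER TABLE orders ADD COLUMN total_cents INTEGER")
    conn.commit()


def backfill_in_batches(
    conn: sqlite3.Connection, batch_size: int = BATCH_SIZE, pause_s: float = 0.0
) -> list[float]:
    """Fill total_cents from total, one short transaction per batch.

    Returns the duration (seconds) of each batch so a caller can log or alert on it.
    """
    durations: list[float] = []
    last_id = 0
    while True:
        # Keyset pagination: fetch the next batch of ids after the last one processed.
        rows = conn.execute(
            "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        first, last = rows[0][0], rows[-1][0]
        start = time.perf_counter()
        conn.execute(
            "UPDATE orders SET total_cents = CAST(ROUND(total * 100) AS INTEGER) "
            "WHERE id BETWEEN ? AND ? AND total_cents IS NULL",
            (first, last),
        )
        conn.commit()  # release locks before starting the next batch
        durations.append(time.perf_counter() - start)
        last_id = last
        time.sleep(pause_s)  # crude stand-in for waiting on replication lag
    return durations


def count_missing(conn: sqlite3.Connection) -> int:
    # Integrity check before switching traffic: rows the backfill has not reached.
    return conn.execute(
        "SELECT COUNT(*) FROM orders WHERE total_cents IS NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO orders (total) VALUES (?)", [(i * 0.5,) for i in range(2500)])
    conn.commit()
    add_column(conn)
    timings = backfill_in_batches(conn, batch_size=1000)
    print(f"{len(timings)} batches, {count_missing(conn)} rows still missing")
```

The `total_cents IS NULL` guard makes the backfill safe to re-run after an interruption, and committing per batch keeps each lock window short.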
Next, update the application code to read and write the new column only after confirming that the schema change has reached the primary and every replica. This prevents null reads or write errors against lagging replicas. Ship these code changes as a separate deployment so you can roll them back without touching the database again.
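One way to express the "only after it has propagated" rule is a gate the application evaluates before switching to the new column. The sketch below is an assumption-laden illustration: separate `sqlite3` in-memory connections stand in for a primary and a lagging replica, and `has_column`, `schema_ready_everywhere`, and `read_order_total` are hypothetical helpers, not part of any migration tool. Production systems more often check a migration-version table than introspect columns.

```python
import sqlite3


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info returns one row per column; field 1 is the column name.
    # `table` must be a trusted identifier: PRAGMA arguments cannot be bound.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return any(row[1] == column for row in rows)


def schema_ready_everywhere(conns: list[sqlite3.Connection], table: str, column: str) -> bool:
    """True only if every database (primary and each replica) has the column."""
    return all(has_column(c, table, column) for c in conns)


def read_order_total(conn: sqlite3.Connection, order_id: int, use_new_column: bool) -> int:
    # Hypothetical read path: fall back to the legacy column until the gate opens.
    if use_new_column:
        row = conn.execute("SELECT total_cents FROM orders WHERE id = ?", (order_id,)).fetchone()
        return row[0]
    row = conn.execute("SELECT total FROM orders WHERE id = ?", (order_id,)).fetchone()
    return int(round(row[0] * 100))


if __name__ == "__main__":
    primary = sqlite3.connect(":memory:")
    replica = sqlite3.connect(":memory:")  # stands in for a replica that has not caught up
    for c in (primary, replica):
        c.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        c.execute("INSERT INTO orders (total) VALUES (12.34)")
        c.commit()
    primary.execute("ALTER TABLE orders ADD COLUMN total_cents INTEGER")
    primary.commit()
    use_new = schema_ready_everywhere([primary, replica], "orders", "total_cents")
    print(use_new, read_order_total(replica, 1, use_new))
```

Because the gate stays closed while any replica lacks the column, reads keep using the legacy path instead of failing, and rolling back the code deployment simply returns to that path.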