The migration broke at 2:17 a.m. The root cause: a missing new column in production.
Adding a new column should be simple. In practice, it can break an entire release if done carelessly. Schema changes must handle live traffic, active queries, and existing data without downtime. The process starts with a clear migration plan that works both forward and backward. Always add a new column in a way that does not block writes to the table for longer than a brief metadata change.
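Here is a minimal sketch of a forward and backward migration pair. It uses Python's built-in `sqlite3` module as a stand-in for a production database, and the `users` table and `email_lower` column are hypothetical. Both directions check the current schema first, so rerunning either one is harmless. The backward step drops the column. On SQLite builds too old for `DROP COLUMN` (before 3.35.0), it leaves the column in place for the old code to ignore. Locking behavior differs between engines, so treat this as the shape of the plan, not production-safe DDL.

```python
import sqlite3

# Hypothetical names for illustration. They are trusted constants,
# so interpolating them into SQL is safe here.
TABLE = "users"
COLUMN = "email_lower"


def has_column(conn: sqlite3.Connection, table: str, column: str) -> bool:
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return any(row[1] == column for row in conn.execute(f"PRAGMA table_info({table})"))


def migrate_forward(conn: sqlite3.Connection) -> None:
    # Nullable, no default, no NOT NULL: existing rows are not rewritten.
    if not has_column(conn, TABLE, COLUMN):
        conn.execute(f"ALTER TABLE {TABLE} ADD COLUMN {COLUMN} TEXT")


def migrate_backward(conn: sqlite3.Connection) -> bool:
    """Undo migrate_forward. Returns True if this call dropped the column."""
    if not has_column(conn, TABLE, COLUMN):
        return False
    if sqlite3.sqlite_version_info < (3, 35, 0):
        # DROP COLUMN is unavailable; the old code simply ignores the column.
        return False
    conn.execute(f"ALTER TABLE {TABLE} DROP COLUMN {COLUMN}")
    return True
```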
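In most relational databases, adding a nullable column with no default is a fast, metadata-only change. A default can be cheap too (PostgreSQL 11 and later store a constant default without rewriting the table), but on older versions, or with a volatile default, the change may rewrite every row while blocking writes. Hold off on NOT NULL constraints until the column is fully populated. Populate it in batches to avoid long-running transactions. For large tables, use background jobs or write-through updates from the application layer. Validate correctness incrementally, checking each batch before moving on to the next.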
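One way to batch the backfill and validate it as you go is sketched below with Python's `sqlite3` module. It assumes a hypothetical `users` table with an `email` column and a new `email_lower` column derived as `lower(email)`. The function walks the table by primary key and commits each batch in its own short transaction. It checks each batch before committing, so a failed check rolls back only the current batch. The batch size and derivation rule are assumptions. A real backfill would tune the batch size, might pause between batches to limit load, and would use a stronger check than this cheap consistency test.

```python
import sqlite3


def backfill_email_lower(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Set users.email_lower = lower(email) in id order, one short transaction
    per batch. Returns the number of rows processed."""
    last_id = 0
    processed = 0
    while True:
        with conn:  # commits the batch on success, rolls it back on error
            ids = conn.execute(
                "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            ).fetchall()
            if not ids:
                return processed
            high = ids[-1][0]
            conn.execute(
                "UPDATE users SET email_lower = lower(email) WHERE id > ? AND id <= ?",
                (last_id, high),
            )
            # Cheap per-batch check before committing; IS NOT is NULL-safe.
            bad = conn.execute(
                "SELECT COUNT(*) FROM users WHERE id > ? AND id <= ? "
                "AND email_lower IS NOT lower(email)",
                (last_id, high),
            ).fetchone()[0]
            if bad:
                raise RuntimeError(f"{bad} rows failed validation in ids ({last_id}, {high}]")
        processed += len(ids)
        last_id = high
```

Paging by primary key rather than by `OFFSET` keeps each batch query cheap as the backfill progresses.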
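Deploy schema changes in separate steps from the application code that depends on them. First, create the new column. Second, backfill existing rows. Make sure rows written during the backfill are covered too, either because the application already writes the new column or because the backfill is rerun until no empty rows remain. Third, deploy the updated code that reads from the new column. Fourth, add constraints such as NOT NULL if needed. Reversibility matters: be ready to drop the column, or simply ignore it, if the rollout fails.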
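The phased rollout, with each phase kept reversible, can be sketched as a small plan runner. Everything here is hypothetical: the `Phase` type, `run_rollout`, the `users`/`email_lower` schema, and a `flags` dict standing in for a feature flag that switches reads to the new column. The backfill is a single statement for brevity. SQLite cannot add a NOT NULL constraint with `ALTER TABLE`, so the last phase only verifies that no rows are still NULL, as a stand-in for adding the constraint. If any phase fails, the runner reverts the completed phases newest-first. Reverting the first phase just leaves the column in place for the old code to ignore.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Phase:
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def run_rollout(phases: List[Phase]) -> List[str]:
    """Apply phases in order; on failure, revert completed phases newest-first
    and re-raise. Returns the names of the applied phases."""
    completed: List[Phase] = []
    for phase in phases:
        try:
            phase.apply()
        except Exception:
            for done in reversed(completed):
                done.revert()
            raise
        completed.append(phase)
    return [p.name for p in completed]


def email_lower_phases(conn: sqlite3.Connection, flags: Dict[str, bool]) -> List[Phase]:
    def add_column() -> None:
        conn.execute("ALTER TABLE users ADD COLUMN email_lower TEXT")

    def backfill() -> None:
        with conn:
            conn.execute("UPDATE users SET email_lower = lower(email) WHERE email_lower IS NULL")

    def check_not_null() -> None:
        # Stand-in for adding NOT NULL, which SQLite's ALTER TABLE cannot do.
        missing = conn.execute(
            "SELECT COUNT(*) FROM users WHERE email_lower IS NULL"
        ).fetchone()[0]
        if missing:
            raise RuntimeError(f"{missing} rows still have NULL email_lower")

    def set_read_flag(value: bool) -> Callable[[], None]:
        def apply() -> None:
            flags["read_email_lower"] = value
        return apply

    def no_op() -> None:
        pass

    return [
        Phase("add column", add_column, no_op),  # rollback: ignore the column
        Phase("backfill", backfill, no_op),      # populated data is harmless
        Phase("read new column", set_read_flag(True), set_read_flag(False)),
        Phase("add constraint", check_not_null, no_op),
    ]
```

Keeping the read switch as its own phase means the riskiest change, new code reading the column, is also the cheapest to undo.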