The migration script failed at 2:14 a.m. because someone forgot a single new column. Everything after that was a cascade of broken queries and angry alerts. Adding a missing column looks simple, but in production it never is.
A new column changes the shape of your data. It affects indexes, constraints, triggers, default values, and application logic. Adding it means more than running ALTER TABLE. You must account for null handling, type safety, and backfilling existing rows. In high-traffic systems, even a short lock can cause latency spikes.
The safest way to introduce a new column is to roll it out in phases. First, create the column as nullable, with no default value. Ensure the migration is metadata-only when possible, especially on large tables. Second, backfill the column in controlled batches, avoiding full table scans during peak hours. Third, add constraints or make the column NOT NULL once the data is consistent. Each step must be idempotent, so you can rerun scripts without side effects.
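The three phases can be sketched in a few lines. This is a minimal illustration, not a production migration: it uses Python's built-in sqlite3 module so it runs anywhere, and the table name `users`, the column name `status`, the backfill value `'active'`, and the batch size are all hypothetical. Your database will have its own locking behavior and its own syntax for the final constraint step.

```python
import sqlite3


def column_exists(conn, table, column):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_nullable_column(conn):
    # Phase 1: add the column as nullable. The existence check makes the
    # step idempotent, so rerunning the script is a no-op.
    if not column_exists(conn, "users", "status"):
        conn.execute("ALTER TABLE users ADD COLUMN status TEXT")
        conn.commit()


def backfill_in_batches(conn, batch_size=1000):
    # Phase 2: fill existing rows a batch at a time, committing after each
    # batch so no single transaction touches the whole table.
    # 'active' is a hypothetical backfill value.
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET status = 'active' WHERE id IN ("
            "SELECT id FROM users WHERE status IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left to backfill; reruns stop here immediately
        total += cur.rowcount
    return total


def remaining_nulls(conn):
    # Phase 3 gate: only tighten the constraint once this returns 0.
    # (In PostgreSQL the final step would be
    #  ALTER TABLE users ALTER COLUMN status SET NOT NULL;
    #  SQLite cannot add NOT NULL to an existing column in place.)
    return conn.execute(
        "SELECT COUNT(*) FROM users WHERE status IS NULL"
    ).fetchone()[0]
```

Because each function checks the current state before acting, running the whole sequence a second time changes nothing, which is the property you want when a migration is interrupted halfway.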
Application code must tolerate both the old and the new schema during the transition. Use feature flags or conditional logic until every service reading or writing the table is aware of the new column. Integrate migration checks into CI pipelines so a schema mismatch never reaches production again.
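Both ideas can be sketched together: a check that a CI job could run against a freshly migrated database, and a read path that works whether or not the column exists yet. As before, this is an illustration under assumptions, using sqlite3 for portability; `EXPECTED_COLUMNS`, `schema_mismatches`, and `fetch_user` are hypothetical names, and the `users` table with a `status` column is an invented example.

```python
import sqlite3

# Hypothetical declaration of the columns the application expects.
EXPECTED_COLUMNS = {"users": {"id", "email", "status"}}


def actual_columns(conn, table):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def schema_mismatches(conn, expected=EXPECTED_COLUMNS):
    # Returns {table: missing_columns}; an empty dict means the schema
    # matches. A CI step could fail the build when this is non-empty.
    problems = {}
    for table, columns in expected.items():
        missing = columns - actual_columns(conn, table)
        if missing:
            problems[table] = missing
    return problems


def fetch_user(conn, user_id):
    # Tolerant read: select the new column only if it exists, and fall
    # back to None so callers see the same shape on either schema.
    has_status = "status" in actual_columns(conn, "users")
    columns = "id, email, status" if has_status else "id, email"
    row = conn.execute(
        f"SELECT {columns} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return None
    return {
        "id": row[0],
        "email": row[1],
        "status": row[2] if has_status else None,
    }
```

Inspecting the schema on every read is wasteful in a real service; a feature flag or a value cached at startup would normally replace the per-call check once the rollout is under way.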