The migration script had failed at 2:14 a.m. The logs told the story: missing column, broken queries, tests in red. All because a new column needed to exist—and didn’t.
Adding a new column sounds simple. It rarely is. Schema changes can break APIs, crash services, and corrupt data if handled carelessly. In fast-moving systems, adding a column to a large table is risky. It can lock the table for seconds or minutes. It can trigger deadlocks. It can set off cascades of timeout errors.
The first step is a precise definition. Decide on the column's name, type, nullability, and default value. Be explicit; avoid silent assumptions. For existing rows, plan how the field will be populated: use a default value or backfill carefully, in small batches rather than a single massive write.
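The batched backfill can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database; the `users` table, the `status` column, and the batch size are all hypothetical, and a production backfill would use far larger batches against the real database driver.

```python
import sqlite3

# Hypothetical example: backfill a new "status" column on a "users" table
# in small batches instead of one massive UPDATE.
BATCH_SIZE = 2  # tiny for illustration; thousands of rows is more typical

conn = sqlite3.connect(":memory:")  # stands in for the real database
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("a",), ("b",), ("c",), ("d",), ("e",)])

# Add the column as NULL-able with no default: a cheap metadata change.
conn.execute("ALTER TABLE users ADD COLUMN status TEXT")

# Backfill in batches, committing between batches so each write
# transaction stays short and locks are released quickly.
while True:
    rows = conn.execute(
        "SELECT id FROM users WHERE status IS NULL LIMIT ?", (BATCH_SIZE,)
    ).fetchall()
    if not rows:
        break  # nothing left to backfill
    conn.executemany("UPDATE users SET status = 'active' WHERE id = ?",
                     [(r[0],) for r in rows])
    conn.commit()

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE status IS NULL").fetchone()[0]
print(remaining)  # 0 once the backfill completes
```

Committing between batches is the key design choice: each transaction touches only a few rows, so ordinary traffic is never blocked for long.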
The second step is safe deployment. In PostgreSQL before version 11 and in MySQL before 8.0, adding a column with a default typically rewrites the entire table; newer versions handle simple cases as a metadata-only change, but volatile defaults can still force a rewrite. Avoid the rewrite on large datasets. Instead, add the column as NULL-able, then populate and constrain it in later steps. Test the migration on a copy of production data. Profile the runtime. Monitor locks.
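A rehearsal harness for that test-and-profile step might look like this. It is a sketch only: the `orders` table and `currency` column are invented, and the in-memory SQLite database stands in for a staging copy of production data. The point is the shape of the workflow, running each migration phase separately and timing it before it ever touches production.

```python
import sqlite3
import time

# Stand-in for a staging copy restored from production.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(i * 1.5,) for i in range(1000)])

def timed(conn, sql):
    """Run one migration statement and report its wall-clock runtime."""
    start = time.perf_counter()
    conn.execute(sql)
    conn.commit()
    elapsed = time.perf_counter() - start
    print(f"{elapsed:.4f}s  {sql}")
    return elapsed

# Phase 1: cheap metadata change -- add the column NULL-able, no default.
timed(conn, "ALTER TABLE orders ADD COLUMN currency TEXT")

# Phase 2 (a later deploy): backfill, then tighten constraints once
# every row has a value. Done here in one statement for brevity; on a
# large table this would be the batched backfill described earlier.
timed(conn, "UPDATE orders SET currency = 'USD' WHERE currency IS NULL")
```

If phase 1 takes more than a few milliseconds on the staging copy, that is the signal to investigate locks before scheduling the production run.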