The migration script failed at 2 a.m., and the logs showed only one hint: add a new column.
A new column in a database schema is simple in theory—just an ALTER TABLE statement. But in production, with millions of rows and live traffic, it becomes a point of failure. Schema changes can lock tables, spike CPU load, block writes, and cascade into outages. The difference between a seamless update and a disaster is in how you design, stage, and deploy.
Always start by defining the new column and its constraints. Know if it will store nullable values, have a default, or require an index. Defaults on large tables can backfill data, so avoid them when uptime matters—apply them in a second step after creation. If you're adding NOT NULL constraints, first create the column as nullable, populate it through background jobs, then enforce the constraint when the table is ready.
Use transactional DDL only when the underlying database supports it for your case size. In PostgreSQL, adding a nullable column without a default is instantaneous. Adding with a default rewrites the table and blocks access. In MySQL, even "instant"algorithms can silently fall back to table-copy operations if other constraints are in place. Always test against a copy of data with a realistic row count.