Adding a new column should be simple. Define it, set its type, run the migration, and push. But real systems aren’t clean. Tables have millions of rows. Queries are hardwired into services you forgot existed. A new column can change query plans, trigger a table rewrite that rebuilds every index, or crash an API that expected a fixed schema.
The safest way to add a new column is to treat it as a multi-step operation. First, create the column as nullable with no default. On many engines this is a metadata-only change: existing rows are not rewritten and the lock is held only briefly, so the migration stays fast even under heavy load. Then backfill the data in small batches, monitoring write amplification and replication lag. Only after every row is backfilled do you enforce NOT NULL or add constraints.
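A minimal sketch of the first two steps, plus a gate for the third, using Python's built-in sqlite3 module so it runs anywhere. The `users` table, the new `display_name` column, the choice to backfill it from `username`, the batch size, and the pause between batches are all hypothetical. The backfill walks the primary key in ranges rather than paging with OFFSET, so each batch is a short transaction over a bounded set of rows.

```python
import sqlite3
import time


def add_nullable_column(conn: sqlite3.Connection) -> None:
    # Step 1: nullable, no default. SQLite only updates the schema here;
    # existing rows are not rewritten.
    conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")
    conn.commit()


def backfill_in_batches(conn: sqlite3.Connection, batch_size: int = 1000,
                        pause_s: float = 0.0) -> int:
    # Step 2: walk the primary key in ranges so each UPDATE is a short,
    # bounded transaction. Returns the number of rows updated.
    last_id = 0
    updated = 0
    while True:
        ids = conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not ids:
            break
        first, last = ids[0][0], ids[-1][0]
        cur = conn.execute(
            "UPDATE users SET display_name = username "
            "WHERE id BETWEEN ? AND ? AND display_name IS NULL",
            (first, last),
        )
        conn.commit()  # commit per batch to keep locks short
        updated += cur.rowcount
        last_id = last
        # Hypothetical throttle: a real job would check replication lag here
        # and back off before the next batch.
        time.sleep(pause_s)
    return updated


def ready_for_not_null(conn: sqlite3.Connection) -> bool:
    # Step 3 gate: only enforce NOT NULL once no NULLs remain.
    (remaining,) = conn.execute(
        "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
    ).fetchone()
    return remaining == 0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)",
        [(f"user{i}",) for i in range(2500)],
    )
    conn.commit()
    add_nullable_column(conn)
    print(backfill_in_batches(conn, batch_size=1000))  # 2500
    print(ready_for_not_null(conn))  # True
```

SQLite cannot add NOT NULL to an existing column with ALTER TABLE, which is why the sketch stops at checking that no NULLs remain. On PostgreSQL the final step would be `ALTER TABLE users ALTER COLUMN display_name SET NOT NULL`, which still has to validate the existing rows.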
Always check compatibility across every version of the code that touches the table. If multiple services read or write the same table, deploy schema changes in a backward-compatible way. Queries and code that use the new column must tolerate old data, such as rows where the column is still NULL, until the transition is complete. In practice, this means feature flags, staged rollouts, and no breaking changes to shared APIs.
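The read side of that transition can be sketched as a fallback: while the backfill runs, or while some services still select the old column list, code that reads the new column treats a missing or NULL value as "not migrated yet" and falls back to the old column. The `USE_DISPLAY_NAME` environment variable stands in for a real feature-flag service, and `UserRow`, `from_record`, and `label_for` are hypothetical names for a `users` table that gained a `display_name` column.

```python
import os
from dataclasses import dataclass
from typing import Optional


def new_column_enabled() -> bool:
    # Hypothetical flag: a real deployment would ask its feature-flag service.
    return os.environ.get("USE_DISPLAY_NAME", "0") == "1"


@dataclass
class UserRow:
    id: int
    username: str
    # None until the row is backfilled, or when the query predates the column.
    display_name: Optional[str] = None


def from_record(record: dict) -> UserRow:
    # .get() tolerates result sets that do not include the new column yet.
    return UserRow(
        id=record["id"],
        username=record["username"],
        display_name=record.get("display_name"),
    )


def label_for(row: UserRow, use_new_column: bool) -> str:
    # Prefer the new column only when the flag is on AND the value exists;
    # otherwise fall back to the old column.
    if use_new_column and row.display_name is not None:
        return row.display_name
    return row.username


if __name__ == "__main__":
    rows = [
        from_record({"id": 1, "username": "alice"}),
        from_record({"id": 2, "username": "bob", "display_name": "Bob B."}),
        from_record({"id": 3, "username": "carol", "display_name": None}),
    ]
    flag = new_column_enabled()
    print([label_for(r, flag) for r in rows])
```

Turning the flag on per service, or for a percentage of traffic, is what keeps a staged rollout reversible: switching it off restores the old behavior without another migration.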