The schema broke at midnight. A missing field. A failed build. A team scattered across time zones staring at logs that all agreed on one thing: the database needed a new column.
Adding a new column should be simple. In production systems handling millions of requests, it is not. Schema migrations can block writes, slow queries, and cascade failures through dependent services. The wrong approach can lock tables, spike CPU, or take down critical endpoints.
The first step is to define the new column precisely. Choose the correct data type. Avoid making the column nullable unless it is absolutely required. Consider a default value so existing rows get a usable value without a separate backfill. Always version your schema changes alongside application code so every migration is tracked and released with the code that depends on it.
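As a rough illustration of a non-nullable column with a default, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a production database. The `orders` table, the `status` column, and the `'pending'` default are hypothetical names chosen for the example, and the exact behavior of `ALTER TABLE` on a large table differs between database engines.

```python
import sqlite3

def add_status_column(conn):
    # Hypothetical column: NOT NULL with a constant default, so rows that
    # already exist read back 'pending' without a separate backfill step.
    conn.execute(
        "ALTER TABLE orders ADD COLUMN status TEXT NOT NULL DEFAULT 'pending'"
    )
    conn.commit()

def demo():
    # In-memory database standing in for a real one (assumption for the sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL)"
    )
    conn.execute("INSERT INTO orders (total) VALUES (19.99)")
    add_status_column(conn)
    # The row inserted before the migration now carries the default value.
    return conn.execute("SELECT id, total, status FROM orders").fetchall()

if __name__ == "__main__":
    print(demo())
```

The row written before the migration comes back with the default filled in, which is the property that lets application code rely on the column as soon as it exists.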
Use an additive migration strategy. First, deploy code that can handle both the old and new schema. Then run an online migration tool, such as pt-online-schema-change or gh-ost (both built for MySQL), to add the column without holding a long lock on the table. Test the migration on staging data at production scale. Monitor I/O, replication lag, and query performance during the rollout.
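One way to picture "code that can handle both the old and new schema" is a read path that checks whether the column exists before using it. The sketch below again uses sqlite3 as a stand-in. The helpers `has_column` and `fetch_order_status`, the `orders` table, and the `'pending'` fallback are all hypothetical, and a real service would more likely cache the schema check or use a feature flag than inspect the table on every call.

```python
import sqlite3

def has_column(conn, table, column):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    # The table name is interpolated directly, so pass only trusted constants.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

def fetch_order_status(conn, order_id):
    # New schema: read the real column.
    if has_column(conn, "orders", "status"):
        row = conn.execute(
            "SELECT status FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None
    # Old schema: the column does not exist yet, so fall back to the value
    # the application assumed before the migration (hypothetical default).
    row = conn.execute(
        "SELECT id FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    return "pending" if row else None
```

Because the function returns a sensible answer before and after the column appears, the code can ship first and the migration can run later without a coordinated cutover.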