The build had been green for weeks, but the schema was wrong. A single missing column slowed every deploy, every feature, every test. You know the fix. Add a new column. Push. Ship. Done—except it’s never that clean.
A new column changes more than the table. It changes code paths, query plans, indexes, and integrations you forgot existed. The schema migration has to be safe, fast, and reversible. You need zero downtime. You need to guarantee that production reads and writes stay live while the database shifts under them.
Start with the migration script. Define the new column with a constant default (or explicitly as nullable), so existing rows hold a predictable value and application logic never trips over an unexpected NULL. Avoid ALTER statements that rewrite or lock the whole table, and never run them at peak traffic. Use ADD COLUMN with care: test the DDL in a staging environment that mirrors production load, and monitor its execution time and its impact on indexes. If you need an index, create it in a separate step to isolate the risk.
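Here is a rough sketch of that two-step migration. It uses Python's standard `sqlite3` module, with SQLite standing in for the production database. The `users` table, the `display_name` column, the index name, and the `migrate` helper are all hypothetical. Each step runs in its own transaction and is timed, so a staging run produces the execution times worth watching.

```python
import sqlite3
import time

# Hypothetical example: `users` gains a `display_name` column that will
# eventually replace `name`. SQLite stands in for the production database.
STEPS = [
    # Step 1: add the column with a constant default, so existing rows read
    # '' instead of NULL and application code sees a predictable value.
    ("add_column",
     "ALTER TABLE users ADD COLUMN display_name TEXT NOT NULL DEFAULT ''"),
    # Step 2: build the index on its own, so a slow or failing build can be
    # measured and retried without touching the column. On PostgreSQL this
    # would typically be CREATE INDEX CONCURRENTLY instead.
    ("create_index",
     "CREATE INDEX IF NOT EXISTS idx_users_display_name ON users (display_name)"),
]


def migrate(conn: sqlite3.Connection) -> dict:
    """Run each step in its own transaction and record how long it took."""
    timings = {}
    for name, sql in STEPS:
        start = time.perf_counter()
        with conn:  # commits on success, rolls back if the step raises
            conn.execute(sql)
        timings[name] = time.perf_counter() - start
    return timings
```

Logging the returned timings from a staging run gives a baseline before the same script touches production, and keeping the index in its own step means it can be dropped or rebuilt without reverting the column.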
Update the application in phases. First, write to both the old and new columns. Then backfill existing rows in small batches, keeping each batch short enough to stay under lock timeouts, and watch replication lag so it doesn't spike. Once the backfill completes and the two columns agree, switch reads to the new column. Keep the old column until you're certain no consumer depends on it. Only then stop writing to it and drop it.
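The phased rollout might look roughly like this, again with SQLite through Python's `sqlite3` module standing in for the real database. It assumes a hypothetical `users` table whose old `name` column is being replaced by `display_name`. The names `create_user`, `save_user`, `backfill`, and `load_display_name` are illustrative, and the `read_from_new` parameter stands in for whatever feature flag the application uses. SQLite has no replicas, so the pause between batches only marks where a real job would check replication lag.

```python
import sqlite3
import time

# Hypothetical schema: `users.name` (old) is being replaced by
# `users.display_name` (new), added earlier with DEFAULT ''.


def create_user(conn: sqlite3.Connection, name: str) -> int:
    # Phase 1, dual write: new rows get both columns from the start.
    with conn:
        cur = conn.execute(
            "INSERT INTO users (name, display_name) VALUES (?, ?)", (name, name)
        )
    return cur.lastrowid


def save_user(conn: sqlite3.Connection, user_id: int, name: str) -> None:
    # Phase 1, dual write: updates touch both columns in one transaction,
    # so rows written after the deploy never need backfilling.
    with conn:
        conn.execute(
            "UPDATE users SET name = ?, display_name = ? WHERE id = ?",
            (name, name, user_id),
        )


def backfill(conn: sqlite3.Connection, batch_size: int = 1000,
             pause_s: float = 0.0) -> int:
    # Phase 2: copy old -> new in small primary-key ranges. Each batch is its
    # own short transaction, so locks are held only briefly.
    updated = 0
    last_id = 0
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM users").fetchone()[0]
    while last_id < max_id:
        upper = last_id + batch_size
        with conn:
            cur = conn.execute(
                "UPDATE users SET display_name = name "
                "WHERE id > ? AND id <= ? AND display_name <> name",
                (last_id, upper),
            )
        updated += cur.rowcount
        last_id = upper
        # A real job would check replica lag here before continuing.
        time.sleep(pause_s)
    return updated


def load_display_name(conn: sqlite3.Connection, user_id: int,
                      read_from_new: bool):
    # Phase 3: flip the (hypothetical) flag once the backfill is verified.
    # The old column stays in place until nothing reads it anymore.
    column = "display_name" if read_from_new else "name"
    row = conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None
```

Filtering on `display_name <> name` makes the backfill safe to rerun: rows already dual-written, or copied by an earlier interrupted run, are skipped. Batching by primary-key range rather than `OFFSET` keeps the cost of each batch roughly constant as the job moves through the table.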