The migration went live at midnight. By 12:01, every query depending on the old schema began to fail. The fix required adding a new column: fast, clean, and without downtime.
Adding a new column in production is not hard. Doing it right is. Schema changes can break contracts between services, trigger data mismatches, or create race conditions. The goal is to add the column, backfill data if needed, and deploy code updates with zero disruption.
First, model the column. Set its name, type, defaults, and constraints. Consider whether it should allow NULL values. For non-null columns on existing tables, deploy in three steps: add the column as nullable, backfill existing rows, then set it to NOT NULL. This keeps any write-blocking lock short, because the backfill runs as ordinary updates rather than inside the schema change itself.
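The nullable-first sequence can be sketched as a small Python helper that returns the statements in the order they would be run. This is only an illustration: the function name `not_null_column_plan`, the `orders` table, and the `currency` column are hypothetical, and the final statement assumes PostgreSQL-style `ALTER COLUMN ... SET NOT NULL` syntax (other databases spell it differently).

```python
def not_null_column_plan(table: str, column: str, sql_type: str, backfill_expr: str) -> list[str]:
    """Return, in order, the statements for adding a NOT NULL column safely."""
    # Hypothetical guard: names are interpolated into DDL, so accept only
    # plain identifiers.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    return [
        # Step 1: add the column as nullable so existing rows stay valid.
        f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}",
        # Step 2: backfill rows that have no value yet
        # (shown as one statement; a large table would do this in batches).
        f"UPDATE {table} SET {column} = {backfill_expr} WHERE {column} IS NULL",
        # Step 3: enforce the constraint once no NULLs remain
        # (PostgreSQL-style syntax, an assumption of this sketch).
        f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL",
    ]


plan = not_null_column_plan("orders", "currency", "text", "'USD'")
for statement in plan:
    print(statement + ";")
```

Keeping the three statements separate means each can ship as its own migration, so the application can start writing the new column between steps 1 and 3.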
Second, choose the right migration strategy. For small tables, an immediate ALTER TABLE works. For large tables on MySQL, use an online schema change tool such as pt-online-schema-change or gh-ost. These tools copy rows into a shadow table that already has the new column, keep it in sync with ongoing writes, then swap the tables with minimal blocking.
Third, backfill safely. Run batched updates to prevent load spikes. Each batch should be small enough not to impact query latency. Verify results between batches.
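A batched backfill might look like the following minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely so the example is self-contained; the `orders` table, the `currency` column, the `'USD'` value, and the `backfill_in_batches` helper are all assumptions for illustration, and a production database would need its own driver and tuning.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.0):
    """Fill orders.currency in small batches, committing after each one."""
    total = 0
    while True:
        # Update only the next slice of rows that still lack a value.
        cur = conn.execute(
            "UPDATE orders SET currency = 'USD' "
            "WHERE id IN (SELECT id FROM orders WHERE currency IS NULL "
            "ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # short transactions keep locks brief
        if cur.rowcount == 0:
            break  # nothing left to backfill
        # Cheap check between batches: the statement must not have
        # touched more rows than the intended slice.
        if cur.rowcount > batch_size:
            raise RuntimeError("batch updated more rows than expected")
        total += cur.rowcount
        time.sleep(pause_seconds)  # give other queries room between batches
    return total


# Demo data: 2500 rows with the new column still NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, currency TEXT)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 2501)])
conn.commit()

updated = backfill_in_batches(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE currency IS NULL"
).fetchone()[0]
print(updated, remaining)  # prints "2500 0"
```

Selecting on `currency IS NULL` makes the loop safe to stop and resume, since a rerun simply picks up the rows that are still empty. The batch size and pause are the two knobs to adjust if query latency starts to climb.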