The deployment was live, but the table was wrong. A missing new column was breaking everything.
Adding a new column in production is simple if done right and dangerous if done carelessly. It is more than an ALTER TABLE followed by a deploy. Schema changes can stall queries, lock writes, and trigger costly downtime. The key is to add the column without breaking reads or writes, and without blocking the application.
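Start by defining the column in a migration. If the column has a default, avoid setting it in the initial ALTER TABLE on large tables when your engine or version rewrites the table to apply it (PostgreSQL before version 11, for example, or any default that is computed per row). Add the column as nullable first. This keeps the migration fast, since it changes only metadata and does not rewrite the entire table. Then backfill the data in small, controlled batches so each transaction holds its locks only briefly. Use application-level feature flags to start writing to the new column before you read from it, so that rows created during the backfill are already populated.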
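The nullable-then-backfill sequence can be sketched as follows. This is a minimal illustration, not a production migration: it uses Python's built-in sqlite3 module as a stand-in for a real database, and the users table, the email_lower column, and the backfill_in_batches helper are hypothetical names chosen for the example.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Fill the hypothetical users.email_lower column one id range at a time.

    Each batch runs in its own short transaction, so no single statement
    touches the whole table. Returns the number of batches executed.
    """
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM users").fetchone()[0]
    last_id = 0
    batches = 0
    while last_id < max_id:
        upper = last_id + batch_size
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE users SET email_lower = LOWER(email) "
                "WHERE id > ? AND id <= ? AND email_lower IS NULL",
                (last_id, upper),
            )
        last_id = upper
        batches += 1
    return batches


# Demo on an in-memory database (assumption: 25 rows, batch size 10).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"User{i}@Example.com",) for i in range(1, 26)],
)
conn.commit()

# Step 1: add the column as nullable, with no default.
conn.execute("ALTER TABLE users ADD COLUMN email_lower TEXT")

# Step 2: backfill in small batches.
batches = backfill_in_batches(conn, batch_size=10)

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email_lower IS NULL"
).fetchone()[0]
first_value = conn.execute(
    "SELECT email_lower FROM users WHERE id = 1"
).fetchone()[0]
print(batches, remaining, first_value)
```

The email_lower IS NULL filter makes the backfill safe to re-run and leaves alone any rows the application has already written through the new code path. On a real system you would likely also pause between batches and watch replication lag.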
For PostgreSQL, use ADD COLUMN ... NULL as the first step. For MySQL, the syntax is similar, but test the migration time on a replica before running in production, because the cost depends on the server version and the ALTER algorithm it selects. In systems with strict SLAs, use online schema change tools such as pg_online_schema_change for PostgreSQL or gh-ost for MySQL.
Once data is fully backfilled, enforce constraints or set non-null defaults in a separate, fast migration. Keep check constraints light; heavy expressions are evaluated on every insert and update and slow down writes. Monitor query plans after introducing the new column, because new indexes can change which plans the optimizer chooses. Build indexes concurrently so that index creation does not block writes.
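One way to lay out that final migration on PostgreSQL is sketched below. It is an assumption-laden outline rather than a definitive recipe: the finalize_column_statements helper and the table, column, and index names are hypothetical, and the statements are returned as strings so they can be reviewed before being passed to whatever driver or migration tool you use. It relies on PostgreSQL 12 or later, where SET NOT NULL can skip its full-table scan when a validated CHECK constraint already proves the column has no nulls.

```python
def finalize_column_statements(table, column, index_name):
    """Return PostgreSQL statements, in order, to enforce NOT NULL and add an index.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    check = f"{table}_{column}_not_null"
    return [
        # NOT VALID adds the constraint without scanning existing rows.
        f"ALTER TABLE {table} ADD CONSTRAINT {check} "
        f"CHECK ({column} IS NOT NULL) NOT VALID",
        # Validation scans the table but does not block normal reads and writes.
        f"ALTER TABLE {table} VALIDATE CONSTRAINT {check}",
        # With the validated check in place, this step needs no full scan (12+).
        f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL",
        # The helper constraint is now redundant.
        f"ALTER TABLE {table} DROP CONSTRAINT {check}",
        # Must run outside a transaction block.
        f"CREATE INDEX CONCURRENTLY {index_name} ON {table} ({column})",
    ]


statements = finalize_column_statements(
    "users", "email_lower", "idx_users_email_lower"
)
for statement in statements:
    print(statement + ";")
```

Splitting the work this way keeps each step short. The one statement that scans the table, VALIDATE CONSTRAINT, runs under a lock that still allows ordinary traffic.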
Treat schema evolution as part of your deployment strategy, not an afterthought. A new column should never be a surprise; it should be a planned, observable, and reversible change.
If you want to see this kind of change deployed without the pain, watch it happen live in minutes at hoop.dev.