The database migration was already live when the alert hit. A missing field had broken half the queries. The fix was simple: add a new column. The hard part was doing it fast, without blocking writes or corrupting data.
Adding a new column in production is never just an ALTER TABLE. On small datasets, it’s instant. On large, high-traffic tables, it can take a table-level lock that blocks reads and writes, cause replication lag, or trigger downtime. Choosing the right strategy is the difference between a clean deploy and an outage that lasts hours.
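For relational databases like PostgreSQL, MySQL, and MariaDB, adding a new column safely often means using operations that run concurrently with normal traffic or are performed in small batches. In PostgreSQL, ALTER TABLE ADD COLUMN is metadata-only when the column is nullable with no default and, since PostgreSQL 11, also when the default is a constant. Before version 11, any default rewrote the whole table, and a volatile default (such as clock_timestamp()) still does. Even a metadata-only change needs a brief ACCESS EXCLUSIVE lock, so it can queue behind long-running transactions and block everything that arrives after it. The safest general pattern is to create the column as nullable, backfill in controlled batches, and add constraints only once the data is in place.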
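The nullable-then-backfill pattern can be sketched as follows. This is a minimal illustration, not a production migration: it uses Python's built-in sqlite3 module as a stand-in for a real server, and the table name orders, the column status, the placeholder value 'unknown', and the helper backfill_in_batches are all hypothetical. The point is the shape of the loop: each batch is small and committed on its own, so no single transaction holds locks for long.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Fill NULL values of the new column in small, separately committed batches.

    Hypothetical helper: table, column, and placeholder value are assumptions.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE orders SET status = 'unknown' "
            "WHERE id IN (SELECT id FROM orders WHERE status IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # end the transaction after every batch
        if cur.rowcount == 0:
            break  # nothing left to backfill
        total += cur.rowcount
    return total


# Demo on an in-memory database (stand-in for a production table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany("INSERT INTO orders (amount) VALUES (?)", [(i,) for i in range(2500)])
conn.commit()

# Step 1: add the column as nullable, with no default.
conn.execute("ALTER TABLE orders ADD COLUMN status TEXT")

# Step 2: backfill in controlled batches.
updated = backfill_in_batches(conn, batch_size=1000)

# Step 3 (not shown): add the constraint once no NULLs remain.
remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status IS NULL"
).fetchone()[0]
```

On a real system you would likely also pause between batches and watch replication lag before continuing; those details depend on the database and workload and are left out here.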
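In MySQL, especially older versions, adding a column can force a full table rebuild. From MySQL 8.0.12, InnoDB can add a column with ALGORITHM=INSTANT, which changes only metadata, though this does not cover every table or column definition. Where a rebuild is unavoidable, online schema change tools like gh-ost or pt-online-schema-change handle it by creating a shadow table with the new schema, copying rows in batches while keeping it in sync with ongoing writes, and switching over with an atomic rename. These tools minimize locking but add write load, so replication lag and server load need careful monitoring while they run.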
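The shadow-table idea can be illustrated with a small sketch. It is not how gh-ost or pt-online-schema-change are implemented: it uses Python's built-in sqlite3 module instead of MySQL, the names users, users_shadow, users_old, copy_in_batches, and swap_tables are hypothetical, and it deliberately omits the hardest part, which is capturing writes that arrive during the copy (the real tools do this via the binary log or triggers).

```python
import sqlite3


def copy_in_batches(conn, batch_size=500):
    """Copy rows from users to users_shadow in primary-key order, one batch at a time."""
    last_id = 0
    copied = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        conn.executemany("INSERT INTO users_shadow (id, name) VALUES (?, ?)", rows)
        conn.commit()  # keep each batch in its own short transaction
        last_id = rows[-1][0]
        copied += len(rows)
    return copied


def swap_tables(conn):
    """Swap the shadow table into place with two renames in one transaction."""
    conn.execute("BEGIN")
    conn.execute("ALTER TABLE users RENAME TO users_old")
    conn.execute("ALTER TABLE users_shadow RENAME TO users")
    conn.commit()


# Demo on an in-memory database (stand-in for a production table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [(f"user{i}",) for i in range(1200)])
conn.commit()

# The shadow table carries the new schema, including the added column.
conn.execute("CREATE TABLE users_shadow (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

copied = copy_in_batches(conn, batch_size=500)
swap_tables(conn)

columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Keeping the old table around as users_old after the swap gives a quick rollback path until the change has been verified.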