The root cause was simple: a new column had been added, but the system wasn’t prepared for it.
Adding a new column to a live database is never just a schema change. It directly affects performance, index planning, and deployment workflows. Done poorly, it locks tables, bloats storage, and creates long-running queries that cascade into failures. Done well, it's fast, safe, and doesn't wake anyone at 3 a.m.
A new column seems small. It isn't. Before running ALTER TABLE in production, you need a migration strategy that matches your database engine and workload. On PostgreSQL, adding a nullable column without a default is near instant; adding a column with a constant default is also a metadata-only change on version 11 and later, but on older versions it rewrites the entire table. On MySQL or MariaDB, even trivial changes can trigger a full table copy depending on the storage engine and the ALTER algorithm in use. In cloud-managed systems, downtime may also mean lost revenue and SLA penalties.
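One way to stay safe on PostgreSQL regardless of version is the expand pattern: add the column as nullable, backfill, then tighten constraints. A minimal sketch, assuming a hypothetical `orders` table gaining a `region` column (the table and column names are illustrative, not from the original):

```sql
-- 1. Metadata-only on modern PostgreSQL: brief lock, no table rewrite.
ALTER TABLE orders ADD COLUMN region text;

-- 2. Backfill existing rows separately, in batches, outside the DDL step.
UPDATE orders SET region = 'unknown' WHERE region IS NULL;

-- 3. Once every row is populated, enforce the default and the constraint.
ALTER TABLE orders ALTER COLUMN region SET DEFAULT 'unknown';
ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
```

Note that `SET NOT NULL` still scans the table to validate existing rows; on very large tables, one common variant is to add a `CHECK (region IS NOT NULL) NOT VALID` constraint and run `VALIDATE CONSTRAINT` separately, which holds a much weaker lock during the scan.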
Plan the new column with your index strategy in mind. Create indexes after the column exists, and use non-blocking options where available, such as CREATE INDEX CONCURRENTLY on PostgreSQL. For heavy tables, use online schema change tools like gh-ost or pt-online-schema-change to avoid blocking writes. If you need to backfill historical data, batch the operation with small commits to control I/O and replication lag. Always test migrations against a realistically sized dataset before running them in production.
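The batched backfill described above is usually driven by a script that repeats one bounded statement until no rows remain, committing after each pass so locks stay short and replicas can keep up. Continuing the hypothetical `orders`/`region` example, one batch might look like:

```sql
-- Hypothetical batch: fill at most 10,000 rows, then commit.
-- The driving script repeats this until zero rows are affected.
UPDATE orders
SET    region = 'unknown'
WHERE  id IN (
    SELECT id
    FROM   orders
    WHERE  region IS NULL
    LIMIT  10000
);
COMMIT;
```

The subquery form works on PostgreSQL, which does not support `UPDATE ... LIMIT` directly; on MySQL you can attach `LIMIT 10000` to the UPDATE itself. Between batches, a short sleep or a check of replication lag gives the rest of the workload room to breathe.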