The cause was simple: a new column.
Adding a new column to a database table sounds straightforward, but the execution can decide the uptime of your entire system. Schema changes are high-risk operations. A poorly planned addition can lock a table, block writes, and cascade failures through dependent services. Understanding how to add a new column without downtime is not just best practice. It’s survival.
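The first step is to analyze the table size. On large tables, ALTER TABLE ... ADD COLUMN can trigger a full table rewrite, depending on the engine, its version, and the column definition. In MySQL 8.0.12 and later, InnoDB can add a column as a metadata-only change (ALGORITHM=INSTANT); earlier versions with online DDL rebuild the table, which allows concurrent writes but can run for a long time on large tables. Postgres adds a nullable column without a default instantly, and since version 11 a column with a constant default is also a metadata-only change. Older Postgres versions rewrite the table when a default is supplied, and a volatile default still forces a rewrite in current versions.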
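As a rough illustration of this first step, the sketch below holds a size query against MySQL's information_schema.tables and builds an ALTER TABLE statement that requests ALGORITHM=INSTANT. MySQL 8.0.12 and later accept that clause, and with it the statement fails outright rather than silently falling back to a table rebuild. The helper names, the example table, and the `%s` placeholder style (common to Python MySQL drivers) are assumptions for illustration, not part of any tool mentioned here.

```python
import re

# Size check for MySQL. table_rows is only an estimate for InnoDB tables.
# The %s placeholders assume a driver such as PyMySQL or mysqlclient.
MYSQL_TABLE_SIZE_SQL = (
    "SELECT table_rows, data_length + index_length AS total_bytes "
    "FROM information_schema.tables "
    "WHERE table_schema = %s AND table_name = %s"
)

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _checked(name: str) -> str:
    """Allow only plain identifiers so they can be interpolated safely."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def mysql_add_column_instant(table: str, column: str, column_type: str) -> str:
    """Build an ADD COLUMN statement that insists on a metadata-only change.

    column_type is inserted as-is and must come from trusted code,
    never from user input.
    """
    return (
        f"ALTER TABLE `{_checked(table)}` "
        f"ADD COLUMN `{_checked(column)}` {column_type} NULL, "
        "ALGORITHM=INSTANT"
    )


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(mysql_add_column_instant("orders", "notes", "VARCHAR(255)"))
```

If the server rejects the statement, that is the signal to plan for a rebuild or an online schema change tool instead of running the plain ALTER on a large table.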
For production systems, the safest approach is an online schema change. In MySQL, tools like gh-ost or pt-online-schema-change create a shadow table with the new schema, copy rows into it while keeping it in sync with ongoing writes, and then atomically swap it in. In Postgres, you can often add the column instantly as nullable, then backfill it in controlled batches. Both approaches keep locks short and preserve service availability.
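The add-then-backfill pattern can be sketched in a few lines. The example below uses Python's built-in sqlite3 module purely as a stand-in so it runs anywhere; against Postgres the same loop would go through a driver such as psycopg. The users table, the status column, the 'active' value, and the batch size are all hypothetical.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.0):
    """Fill users.status in primary-key ranges, one short transaction per batch."""
    total_updated = 0
    last_id = 0
    while True:
        # Find the upper bound of the next batch by walking the primary key.
        upper = conn.execute(
            "SELECT MAX(id) FROM "
            "(SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?)",
            (last_id, batch_size),
        ).fetchone()[0]
        if upper is None:
            break  # no rows left beyond last_id
        cur = conn.execute(
            "UPDATE users SET status = 'active' "
            "WHERE id > ? AND id <= ? AND status IS NULL",
            (last_id, upper),
        )
        conn.commit()  # commit per batch so locks are held only briefly
        total_updated += cur.rowcount
        last_id = upper
        time.sleep(pause_seconds)  # optional throttle between batches
    return total_updated


def demo():
    """Create a small table, add the column, backfill it, report the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)",
        [(f"user{i}",) for i in range(2500)],
    )
    conn.commit()
    # Step 1: add the column as nullable, with no default.
    conn.execute("ALTER TABLE users ADD COLUMN status TEXT")
    # Step 2: backfill in batches.
    updated = backfill_in_batches(conn, batch_size=1000)
    remaining = conn.execute(
        "SELECT COUNT(*) FROM users WHERE status IS NULL"
    ).fetchone()[0]
    conn.close()
    return updated, remaining


if __name__ == "__main__":
    print(demo())
```

Walking the primary key in ranges keeps each UPDATE small and lets the job resume from the last processed id if it is interrupted. The `status IS NULL` filter avoids overwriting rows that the application has already written since the column was added.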