The bug surfaced when a single row refused to align. Hours of tracing logs led to one cause: the need for a new column.
Adding a new column to a production database is simple in theory and dangerous in practice. Schema changes can lock tables, block writes, and slow queries. In high-traffic systems, a careless migration can take an entire service down. The right approach minimizes risk while keeping deployments fast.
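First, define the new column so that adding it does not rewrite existing rows. Adding a nullable column with no default is usually a metadata-only change. Adding a column with a default can force a full table rewrite, depending on the engine and version: PostgreSQL before version 11 rewrote the table for any default, while PostgreSQL 11 and later store a constant default in the catalog and rewrite only when the default is volatile. The conservative path in PostgreSQL is ADD COLUMN with NULL allowed, then a backfill in controlled batches. In MySQL, check version-specific behavior, as online DDL features vary; for example, InnoDB supports ALGORITHM=INSTANT for ADD COLUMN as of MySQL 8.0.12.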
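The batched backfill can be sketched as follows. This is a minimal illustration, not a production migration: it uses Python's built-in sqlite3 module as a stand-in so the example runs anywhere, and the table name users, the column status, the value 'active', and the function backfill_in_batches are all hypothetical. Against PostgreSQL or MySQL the same loop would go through that database's driver, and the batch size would need tuning against real load.

```python
import sqlite3


def backfill_in_batches(conn, batch_size=1000):
    """Fill users.status where it is still NULL, committing once per batch.

    Short transactions keep locks brief, so concurrent writers are not
    blocked for the duration of the whole backfill.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET status = 'active' "
            "WHERE id IN ("
            "  SELECT id FROM users WHERE status IS NULL ORDER BY id LIMIT ?"
            ")",
            (batch_size,),
        )
        conn.commit()  # end the transaction before starting the next batch
        if cur.rowcount == 0:
            break  # no NULL rows left
        total += cur.rowcount
    return total


# Demo on an in-memory database (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user{i}",) for i in range(2500)],
)
# Step 1: add the column as nullable with no default.
conn.execute("ALTER TABLE users ADD COLUMN status TEXT")
conn.commit()

# Step 2: backfill in controlled batches.
updated = backfill_in_batches(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE status IS NULL"
).fetchone()[0]
print(updated, remaining)
```

Because each batch selects only rows that are still NULL, the loop can be stopped and restarted without redoing finished work. On a busy system a short pause between batches is a common addition.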
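Second, plan for application-level rollout. Deploy code that can read and write the new column before populating it; until the backfill finishes, that code must treat NULL as a valid value. Old readers should ignore unknown columns to avoid parsing failures, which in practice means selecting columns by name rather than relying on SELECT * and a fixed column count. This staged release lets you migrate live data without breaking active queries.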
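One way to picture the reader side of that rollout is a row mapper that keeps only the fields it knows about. The sketch below is illustrative only: the classes OldUser and NewUser, the helper from_row, and the column status are hypothetical names, and rows are assumed to arrive as dictionaries keyed by column name.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OldUser:
    """Shape known to code deployed before the migration."""
    id: int
    name: str


@dataclass
class NewUser:
    """Shape known to code deployed for the migration.

    status defaults to None because rows may not be backfilled yet.
    """
    id: int
    name: str
    status: Optional[str] = None


def from_row(cls, row):
    """Build cls from a row dict, ignoring columns cls does not declare."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in row.items() if k in known})


row_before = {"id": 1, "name": "ada"}                      # column not added yet
row_after = {"id": 1, "name": "ada", "status": "active"}   # column added and filled

print(from_row(OldUser, row_after))   # old reader drops the unknown column
print(from_row(NewUser, row_before))  # new reader tolerates the missing value
print(from_row(NewUser, row_after))
```

With both properties in place, old and new application versions can run side by side while the schema change and the backfill proceed.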