The query finished in under a second, but the results were wrong. The root cause was simple: the table needed a new column.
Adding a new column sounds like a small change. In production, it can trigger downtime, data loss, or broken APIs if done carelessly. The right approach depends on the database, the size of the table, and your uptime requirements.
In relational databases such as PostgreSQL or MySQL, the safest pattern is an additive change followed by a backfill. First, add the column as nullable, or with a constant default (recent versions of PostgreSQL and MySQL can apply a constant default without rewriting the table). Avoid blocking schema changes on large tables during high-traffic hours. Next, migrate existing data in small batches using background jobs, so no single transaction holds locks for long. Once the backfill completes, update application code to read from the new column. Finally, enforce constraints such as NOT NULL and drop deprecated fields.
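The additive-change-plus-backfill pattern can be sketched in a few lines. This uses SQLite as a stand-in for PostgreSQL or MySQL, and the table, column, and batch size are all hypothetical choices for illustration:

```python
import sqlite3

# Stand-in database: a users table with 10 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(10)])

# Step 1: additive change. The new column is nullable, so the ALTER
# is cheap and existing rows are left untouched.
conn.execute("ALTER TABLE users ADD COLUMN email_domain TEXT")

# Step 2: backfill in small batches so no single statement locks the
# whole table (a batch size of 3 keeps the example visible).
BATCH = 3
while True:
    rows = conn.execute(
        "SELECT id, email FROM users "
        "WHERE email_domain IS NULL LIMIT ?", (BATCH,)
    ).fetchall()
    if not rows:
        break
    conn.executemany(
        "UPDATE users SET email_domain = ? WHERE id = ?",
        [(email.split("@")[1], rid) for rid, email in rows],
    )
    conn.commit()  # short transactions; replicas can keep up

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email_domain IS NULL"
).fetchone()[0]
print(remaining)  # 0 once the backfill is complete
```

Only after `remaining` reaches zero would application reads switch over and a NOT NULL constraint be enforced; in a real system the batch loop would run as a background job, not inline.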
In distributed systems, adding a new column also means considering replication lag, index builds, and query planners. Uncoordinated schema changes can lock nodes or leave replicas in inconsistent states. Tools such as pt-online-schema-change, or the database's native online DDL, reduce the risk, but always test on replicas before running the change in production.
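One way to keep replication lag in check is to gate each backfill batch on a lag threshold. The sketch below assumes a hypothetical `get_lag` helper that reports lag in seconds; on MySQL you might back it with the `Seconds_Behind_Source` field of `SHOW REPLICA STATUS`, and on PostgreSQL with `pg_stat_replication`:

```python
import time

MAX_LAG_SECONDS = 5  # illustrative threshold, tune for your workload

def wait_for_replicas(get_lag, max_lag=MAX_LAG_SECONDS, poll=0.01):
    """Block until reported replication lag drops below max_lag."""
    while get_lag() > max_lag:
        time.sleep(poll)

# Simulated lag readings: replicas catch up on the fourth poll.
readings = iter([12, 9, 6, 2])
wait_for_replicas(lambda: next(readings))
print("replicas caught up; safe to run next batch")
```

Calling `wait_for_replicas` before each batch turns the backfill into a self-throttling job: it speeds up when replicas are healthy and backs off when they fall behind.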