A database query stalled. The cause: a missing column that should have been there from the start. You push a fix, but now the schema needs to evolve without downtime. The next deploy demands a new column.
Adding a new column is simple in theory. In production, what matters is how the database executes the change and what it blocks while doing so. Schema changes alter storage, indexes, defaults, and constraints. A careless migration can lock tables, block writes, or cause replication lag. Even a single new column can turn into hours of degraded performance if applied without care.
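The safest way to add a new column is to plan for how the database will handle it under load. In PostgreSQL, adding a nullable column without a default is a fast, metadata-only change. Before PostgreSQL 11, adding a column with a default rewrote every row, which is expensive on large tables. Since version 11, a constant default is also metadata-only, while a volatile default such as random() still forces a full rewrite. MySQL behaves differently: MySQL 8.0 with InnoDB can add a column as an instant operation, while older versions may rebuild the table and can lock it for the duration. In distributed systems, you must add the column in a way that allows both old and new code to run side by side.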
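To make the "old and new code side by side" point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a production database, and the `users` table, the `display_name` column, and the helper functions are all hypothetical names chosen for illustration. The idea it shows: old code that names its columns explicitly keeps working after a nullable column appears, and new readers fall back gracefully while the column is still empty.

```python
import sqlite3

def old_writer(conn, email):
    # Old code lists its columns explicitly, so the new column does not affect it.
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))

def new_writer(conn, email, display_name):
    # New code also populates the new column.
    conn.execute(
        "INSERT INTO users (email, display_name) VALUES (?, ?)",
        (email, display_name),
    )

def read_display_names(conn):
    # New readers must tolerate NULL until every row has been backfilled.
    rows = conn.execute(
        "SELECT email, display_name FROM users ORDER BY id"
    ).fetchall()
    return [
        (email, name if name is not None else email.split("@")[0])
        for email, name in rows
    ]

# SQLite in memory stands in for the real database (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
old_writer(conn, "ada@example.com")

# The migration: a nullable column with no default and no constraints.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# During a rolling deploy, both code versions write to the same table.
old_writer(conn, "bob@example.com")
new_writer(conn, "cy@example.com", "Cy")

result = read_display_names(conn)
print(result)
```

The same reasoning is why a NOT NULL constraint is risky at this stage: the old writer would start failing the moment the constraint landed.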
Best practice is to split the deployment into phases. First, add the new column with no constraints and no default. Then deploy code that writes to both the old and new structures. Backfill data in small batches to avoid overwhelming the database. Only after that should you add constraints, indexes, or defaults. This keeps rolling deployments safe, especially during peak hours.
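The phases above can be sketched roughly as follows. This is an illustration under stated assumptions, not a production migration tool: it uses Python's sqlite3 module in place of a real server, the `users` table and its `full_name` and `display_name` columns are hypothetical, and the dual-write phase is left out because it lives in application code. The batch size of 2 is deliberately tiny so the batching is visible.

```python
import sqlite3

def backfill_in_batches(conn, batch_size=2):
    """Copy full_name into display_name a few rows at a time.

    Returns the number of batches that updated at least one row.
    Assumes full_name is NOT NULL, so every row eventually gets a value.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET display_name = full_name "
            "WHERE id IN (SELECT id FROM users "
            "WHERE display_name IS NULL ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        # Commit per batch so each transaction stays short.
        conn.commit()
        if cur.rowcount == 0:
            return batches
        batches += 1
        # On a real system you would likely pause here to let replicas catch up.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (full_name) VALUES (?)",
    [("Ada",), ("Bob",), ("Cy",), ("Dee",), ("Eli",)],
)
conn.commit()

# Phase 1: add the column with no default and no constraints.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Phase 2 (dual writes from application code) is not shown in this sketch.

# Phase 3: backfill existing rows in small batches.
batches = backfill_in_batches(conn, batch_size=2)

# Phase 4: only now add the index.
conn.execute("CREATE INDEX idx_users_display_name ON users (display_name)")

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
).fetchone()[0]
print(batches, remaining)
```

Committing after each batch is the design choice that matters here: it bounds how long any single transaction holds locks, at the cost of the backfill no longer being atomic.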