Adding a new column is simple in theory. In practice, it can lock tables, block writes, and bring down production if done carelessly. The right approach avoids downtime, preserves data integrity, and scales with traffic.
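First, define the column's type and constraints. Align them with your data models and any downstream APIs that consume the data; mismatches here create bugs that only surface later. Use explicit defaults where possible, and decide deliberately whether the column may hold NULL rather than accepting whatever the database does by default.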
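A minimal sketch using Python's built-in sqlite3 module. The users table, the status column, and the last_login_at column are hypothetical names chosen for illustration. status gets an explicit type, a NOT NULL constraint, and a default, so existing rows never receive an accidental NULL. last_login_at is nullable on purpose. SQLite is only a stand-in here; DDL syntax and locking behavior differ on MySQL or PostgreSQL.

```python
import sqlite3

# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)

# Explicit type, explicit NOT NULL, explicit default:
# existing rows are populated with 'active' instead of NULL.
conn.execute("ALTER TABLE users ADD COLUMN status TEXT NOT NULL DEFAULT 'active'")

# Deliberately nullable: NULL means "not known yet", and the
# application is expected to handle that case.
conn.execute("ALTER TABLE users ADD COLUMN last_login_at TEXT")

rows = conn.execute(
    "SELECT email, status, last_login_at FROM users ORDER BY id"
).fetchall()
print(rows)
```

Because the NOT NULL constraint is part of the schema, a later write that tries to store NULL in status fails loudly instead of silently creating a row the rest of the system cannot interpret.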
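Next, choose the migration path. On small tables, a direct ALTER TABLE usually finishes quickly enough that the brief lock goes unnoticed. Large tables call for an online migration: tools like pt-online-schema-change, or the database's native online DDL options (such as MySQL's ALGORITHM and LOCK clauses or MariaDB's ALTER ONLINE TABLE), reduce lock time. Gating the application code that reads or writes the new column behind a feature flag keeps incomplete features hidden until the migration is done.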
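One way to encode this decision, sketched under stated assumptions. The SMALL_TABLE_ROWS cutoff, the plan_add_column and order_columns helpers, the orders table, and the orders_note_enabled flag are all hypothetical. The generated statement targets MySQL/InnoDB: when ALGORITHM or LOCK is specified and the server cannot honor it, the statement fails with an error instead of quietly falling back to a blocking copy.

```python
# Hypothetical cutoff: the right value depends on hardware, engine,
# and how long a brief table lock is acceptable for your traffic.
SMALL_TABLE_ROWS = 100_000


def plan_add_column(table: str, column_ddl: str, row_count: int) -> str:
    """Build a MySQL ALTER TABLE statement for adding one column.

    Identifiers are interpolated directly, so only pass trusted,
    hard-coded names here, never user input.
    """
    base = f"ALTER TABLE {table} ADD COLUMN {column_ddl}"
    if row_count <= SMALL_TABLE_ROWS:
        # Small table: a plain ALTER is acceptable.
        return base + ";"
    # Large table: request an in-place change with concurrent DML allowed.
    return base + ", ALGORITHM=INPLACE, LOCK=NONE;"


# Hypothetical feature-flag store: application code only touches the
# new column once the migration (and any backfill) has finished.
FLAGS = {"orders_note_enabled": False}


def order_columns(flags: dict) -> list:
    """Columns the application selects from the hypothetical orders table."""
    cols = ["id", "total"]
    if flags.get("orders_note_enabled", False):
        cols.append("note")
    return cols


print(plan_add_column("orders", "note VARCHAR(255) NULL", 5_000_000))
print(order_columns(FLAGS))
```

On MySQL 8.0, ALGORITHM=INSTANT may also be available for adding a column, which avoids rebuilding the table entirely; check the online DDL documentation for your server version before relying on either clause.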
On high-availability systems, backfill existing rows in small batches so the new values are populated without overwhelming I/O. Throttle the writes, monitor replication lag and pause when it grows, and validate the whole change in staging under production-like load.
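A minimal sketch of a throttled, batched backfill, again using sqlite3 as a stand-in. The orders table, the region column, the 'unknown' fill value, and the replica_lag_s probe are hypothetical; a real system would query its replicas for lag. Each batch covers one primary-key range and commits on its own, so transactions stay short and locks are released quickly.

```python
import sqlite3
import time
from typing import Callable


def backfill_in_batches(
    conn: sqlite3.Connection,
    batch_size: int = 1000,
    pause_s: float = 0.05,
    replica_lag_s: Callable[[], float] = lambda: 0.0,
    max_lag_s: float = 5.0,
    lag_backoff_s: float = 1.0,
) -> int:
    """Fill orders.region for existing rows, one primary-key range at a time.

    Returns the number of rows updated. Rows inserted after the run starts
    are assumed to be written with region already set by application code.
    """
    updated = 0
    last_id = 0
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM orders").fetchone()[0]
    while last_id < max_id:
        # Back off while replicas are behind (hypothetical lag probe).
        while replica_lag_s() > max_lag_s:
            time.sleep(lag_backoff_s)
        upper = last_id + batch_size
        cur = conn.execute(
            "UPDATE orders SET region = 'unknown' "
            "WHERE id > ? AND id <= ? AND region IS NULL",
            (last_id, upper),
        )
        conn.commit()  # one short transaction per batch
        updated += cur.rowcount
        last_id = upper
        time.sleep(pause_s)  # throttle: leave I/O headroom for live traffic
    return updated


# Usage on a small in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)", [(float(i),) for i in range(2500)])
conn.execute("ALTER TABLE orders ADD COLUMN region TEXT")
conn.commit()
n = backfill_in_batches(conn, batch_size=1000, pause_s=0.0)
print(n)
```

The region IS NULL filter makes the job safe to rerun after an interruption: rows that were already filled are skipped, so a second pass updates nothing.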