The migration broke at midnight. Logs lit up with red. The error was simple: the new column didn’t exist where the code expected it.
Adding a new column sounds trivial. It’s not. Schema changes can block deploys, trigger downtime, and cause silent data corruption. The way you create, backfill, and roll out that new column determines whether your system stays online or burns out under load.
First, define the new column with precision. Use explicit types, constraints, and defaults. Don’t leave the column implicitly nullable unless you’ve confirmed the application can handle NULL values. Keep the schema migration idempotent, so that re-running it is a no-op and won’t break production.
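A minimal sketch of an idempotent migration, using Python's built-in sqlite3 module so it runs anywhere. The `users` table, the `status` column, and the helper `add_column_if_missing` are hypothetical names chosen for illustration. Other databases expose the column list differently (for example through `information_schema`), so treat the existence check as an assumption specific to SQLite.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl):
    """Add `column` to `table` unless it already exists. Returns True if added.

    `table`, `column`, and `ddl` are interpolated into SQL, so they must come
    from trusted migration code, never from user input.
    """
    # PRAGMA table_info yields one row per column; index 1 holds the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False  # Already migrated: re-running is a no-op.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    conn.commit()
    return True


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")

# Explicit type, constraint, and default, as the text recommends.
column_ddl = "TEXT NOT NULL DEFAULT 'active'"
first_run = add_column_if_missing(conn, "users", "status", column_ddl)
second_run = add_column_if_missing(conn, "users", "status", column_ddl)
```

Here the first call adds the column and the second call detects it and does nothing, which is the property that makes a retried or re-deployed migration safe.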
Second, decouple schema creation from data population. Adding a column and filling it in one step can lock large tables, stall queries, and block writes. Instead, deploy the new column empty. Then backfill in controlled batches to avoid spikes in I/O and replication lag.
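The batching idea can be sketched as follows, again with Python's sqlite3 module. The `users` table, the nullable `status` column, the value `'active'`, and the function `backfill_in_batches` are all hypothetical. The batch size and pause are placeholders to tune against your own I/O and replication-lag measurements, not recommended values.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.0):
    """Fill users.status where it is NULL, one bounded batch per transaction.

    Returns the total number of rows updated.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET status = 'active' "
            "WHERE id IN ("
            "  SELECT id FROM users WHERE status IS NULL ORDER BY id LIMIT ?"
            ")",
            (batch_size,),
        )
        conn.commit()  # Short transactions keep locks brief.
        if cur.rowcount == 0:
            break  # Nothing left to backfill.
        total += cur.rowcount
        time.sleep(pause_seconds)  # Give other writers and replicas room.
    return total


# Hypothetical table whose new column was deployed empty (nullable, no backfill).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, status TEXT)"
)
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(2500)],
)
conn.commit()

updated = backfill_in_batches(conn, batch_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE status IS NULL"
).fetchone()[0]
```

Because each batch selects only rows that are still NULL, the job can be stopped and restarted at any point without redoing or skipping work.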