The migration ran clean, but the schema was wrong. You needed a new column, and every second without it burned through deployments and trust.
Adding a new column is simple until scale turns it complex. At the surface, it’s one SQL statement. Underneath, it can lock tables, spike I/O, stall writes, and cascade failures downstream. In high-traffic systems, an unmanaged schema change can take hours—or take you offline.
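Start with the basics: define the column type precisely. Use the smallest type that fits your data to reduce storage overhead. Set a default to avoid null-related bugs, but check how your engine applies it: on some versions (PostgreSQL before 11, for example), adding a column with a default rewrites the entire table. Where your database supports transactional DDL, as PostgreSQL and SQLite do, run the change inside a transaction so a failed migration leaves the schema untouched. MySQL does not: DDL statements commit implicitly, so plan your rollback path separately.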
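A minimal sketch of a defaulted column added inside a transaction, using Python's built-in sqlite3 module as a stand-in because SQLite, like PostgreSQL, supports transactional DDL. The table `orders`, the column `status_code`, and the helper `apply_migration` are hypothetical. SQLite uses type affinity rather than fixed-width types, so this does not demonstrate the storage savings of a narrow type; it shows the default filling existing rows and a failed multi-step migration rolling back cleanly.

```python
import sqlite3


def apply_migration(conn: sqlite3.Connection, statements: list[str]) -> None:
    """Run every DDL statement in one transaction: all succeed or none do.

    Assumes conn was opened with isolation_level=None, so the BEGIN/COMMIT
    issued here are the only transaction control in play.
    """
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
    except sqlite3.Error:
        conn.execute("ROLLBACK")  # undo any partial schema change
        raise
    conn.execute("COMMIT")


def column_names(conn: sqlite3.Connection, table: str) -> list[str]:
    # PRAGMA table_info yields one row per column; index 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)")
conn.executemany("INSERT INTO orders (amount_cents) VALUES (?)", [(1200,), (350,)])

# NOT NULL plus DEFAULT: existing rows read back as 0 instead of NULL.
apply_migration(conn, [
    "ALTER TABLE orders ADD COLUMN status_code INTEGER NOT NULL DEFAULT 0",
])

# The second step fails, so the first step (adding "note") is rolled back too.
try:
    apply_migration(conn, [
        "ALTER TABLE orders ADD COLUMN note TEXT",
        "CREATE INDEX idx_bad ON orders (no_such_column)",
    ])
except sqlite3.OperationalError as exc:
    print("rolled back:", exc)

print(column_names(conn, "orders"))
print(conn.execute("SELECT status_code FROM orders").fetchall())
```

On MySQL the same two-statement migration would leave the first column in place after the failure, because each DDL statement commits on its own.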
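For large tables, use an online migration strategy. On MySQL, tools like pt-online-schema-change or gh-ost build a shadow copy of the table that includes the new column, copy rows into it in small chunks while keeping it in sync with live writes (pt-online-schema-change through triggers, gh-ost by reading the binary log), and then swap the two tables. Writes continue during the copy, though the final swap still needs a brief lock. Test on a realistic dataset before touching production, and during the operation watch replication lag, the slow query log, and overall load.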
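To make the shadow-table idea concrete, here is a simplified sketch of the trigger-based approach, again using SQLite as a stand-in. It is not how either tool is implemented and omits their safety checks; it only shows the shape of the technique: create the shadow table, mirror live writes with triggers, copy in primary-key chunks, then swap in one short transaction. The names `_orders_new`, `osc_ins`/`osc_upd`/`osc_del`, `online_add_column`, `replica_lag`, and `on_chunk` are all hypothetical. SQLite has no replicas, so `replica_lag` is a placeholder probe, and `on_chunk` exists only so the demo can simulate application writes arriving mid-copy.

```python
import sqlite3
import time
from typing import Callable

SHADOW = "_orders_new"

# Triggers mirror writes that hit the live table while the copy runs.
TRIGGERS = {
    "osc_ins": f"""CREATE TRIGGER osc_ins AFTER INSERT ON orders BEGIN
        INSERT OR REPLACE INTO {SHADOW} (id, amount_cents)
        VALUES (NEW.id, NEW.amount_cents); END""",
    "osc_upd": f"""CREATE TRIGGER osc_upd AFTER UPDATE ON orders BEGIN
        DELETE FROM {SHADOW} WHERE id = OLD.id;
        INSERT OR REPLACE INTO {SHADOW} (id, amount_cents)
        VALUES (NEW.id, NEW.amount_cents); END""",
    "osc_del": f"""CREATE TRIGGER osc_del AFTER DELETE ON orders BEGIN
        DELETE FROM {SHADOW} WHERE id = OLD.id; END""",
}


def online_add_column(
    conn: sqlite3.Connection,
    chunk_size: int = 2,
    replica_lag: Callable[[], float] = lambda: 0.0,  # hypothetical lag probe
    max_lag: float = 1.0,
    on_chunk: Callable[[int], None] = lambda last_id: None,  # demo hook
) -> None:
    # 1. Shadow table with the target schema, including the new column.
    conn.execute(
        f"CREATE TABLE {SHADOW} (id INTEGER PRIMARY KEY, "
        "amount_cents INTEGER NOT NULL, status_code INTEGER NOT NULL DEFAULT 0)"
    )
    for ddl in TRIGGERS.values():
        conn.execute(ddl)

    # 2. Copy rows in primary-key chunks, one short autocommit statement each.
    last_id = 0  # assumes positive integer ids (SQLite rowids start at 1)
    while True:
        while replica_lag() > max_lag:  # throttle while replicas fall behind
            time.sleep(0.5)
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size))]
        if not ids:
            break
        # OR IGNORE: rows the triggers already mirrored keep their newer values.
        conn.execute(
            f"INSERT OR IGNORE INTO {SHADOW} (id, amount_cents) "
            "SELECT id, amount_cents FROM orders WHERE id > ? AND id <= ?",
            (last_id, ids[-1]))
        last_id = ids[-1]
        on_chunk(last_id)

    # 3. Cut-over: drop triggers and swap tables in one exclusive transaction.
    conn.execute("BEGIN IMMEDIATE")
    try:
        for name in TRIGGERS:
            conn.execute(f"DROP TRIGGER {name}")
        conn.execute("ALTER TABLE orders RENAME TO _orders_old")
        conn.execute(f"ALTER TABLE {SHADOW} RENAME TO orders")
        conn.execute("DROP TABLE _orders_old")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)")
conn.executemany("INSERT INTO orders (amount_cents) VALUES (?)",
                 [(100,), (250,), (75,), (900,), (40,)])


def concurrent_writes(last_id: int) -> None:
    # Simulated application traffic arriving after the first chunk is copied.
    if last_id == 2:
        conn.execute("UPDATE orders SET amount_cents = 111 WHERE id = 1")  # already copied
        conn.execute("INSERT INTO orders (amount_cents) VALUES (60)")      # new row
        conn.execute("DELETE FROM orders WHERE id = 5")                    # not yet copied


online_add_column(conn, on_chunk=concurrent_writes)
print(conn.execute("SELECT id, amount_cents, status_code FROM orders ORDER BY id").fetchall())
```

The chunked copy keeps each write transaction short, which is what limits lock time and replication pressure; the triggers are what make it safe to copy while the application keeps writing.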