High Availability Lean is a discipline for building services that stay online under stress, while avoiding the heavy overhead of traditional high availability architectures. The goal is simple: keep critical paths running with minimal downtime, without burning resources on unnecessary redundancy.
It starts with ruthless prioritization. Identify the smallest set of components that must be available at all times, and optimize them for resilience. Remove critical-path dependencies that pull your uptime down: a service can be no more available than the dependencies it cannot run without. Build health checks that are cheap to run and precise in what they detect. Design restart loops and failover mechanisms that recover within seconds.
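A restart loop driven by a cheap health check can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: `supervise`, `is_healthy`, and `restart` are hypothetical names, and the retry budget and backoff values are placeholders to tune for your own service.

```python
import time
from typing import Callable


def supervise(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    max_restarts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart a component until it reports healthy or the budget runs out.

    All names and defaults here are illustrative assumptions.
    Returns True if the component is healthy when the function exits.
    """
    delay = base_delay
    for _ in range(max_restarts):
        if is_healthy():
            return True
        restart()
        # Back off exponentially so a crash-looping component does not
        # consume resources, but cap the delay to keep recovery fast.
        sleep(delay)
        delay = min(delay * 2, max_delay)
    # Budget exhausted: report the final state so the caller can fail over.
    return is_healthy()
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting. When `supervise` returns `False`, the caller is expected to escalate, for example by failing over to a standby instance.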
High Availability Lean means choosing stateless designs when possible. It means caching where it takes load off a critical dependency or provides a usable fallback, not everywhere. It means load balancing that takes unhealthy instances out of rotation as soon as health checks fail. Services should degrade gracefully instead of breaking. When one subsystem collapses, the rest must continue without waiting on it.
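One common way to keep the rest of a system running when a subsystem collapses is a circuit breaker that fails fast and serves a degraded fallback. The sketch below is an assumption about how that could look, not a prescribed design: the class name, thresholds, and the `primary`/`fallback` callables are all hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Fail fast to a fallback after repeated failures (illustrative sketch)."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Circuit is open: skip the failing subsystem entirely.
                return fallback()
            # Cool-down elapsed: allow one trial call. A single failure
            # during this trial reopens the circuit immediately.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        return result
```

While the circuit is open, callers get the fallback immediately instead of waiting on a dependency that is known to be down. In practice the primary call should also carry its own timeout, which this sketch leaves out.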
Monitoring and alerting are essential, but they must be lean too. Each alert should point to a likely root cause, so that whoever is on call can act within minutes. Noise destroys focus and delays recovery. Logging should be structured, indexable, and partitioned so you can scan it under pressure.
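Structured logging can be approximated with Python's standard `logging` module and a small JSON formatter. This is a sketch under assumptions: the `JsonFormatter` class and the `fields` convention are hypothetical, and keys such as `service` and `region` are example partition keys rather than a required schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line (illustrative sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Assumed convention: callers pass extra={"fields": {...}} to attach
        # indexable keys (for example service or region) to the record.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as `build_logger("checkout").error("payment timeout", extra={"fields": {"service": "checkout", "region": "eu-1"}})` would then produce a single JSON line that a log store can index and partition by those keys.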