One moment requests flowed clean. The next, the error logs caught fire. “Unavailable,” “Deadline Exceeded,” “Internal Error.” The culprit wasn’t a broken service. It was the load balancer.
gRPC load balancer errors can appear without warning. You may see timeouts, dropped connections, or flaky health checks. Sometimes the issue hides in the transport layer. Sometimes in DNS resolution. Sometimes in the routing policy your balancer applies. If your load balancer is HTTP-aware but not HTTP/2-native, say it downgrades backend connections to HTTP/1.1, gRPC will choke. The protocol relies on long-lived HTTP/2 streams. A misconfigured balancer that cuts them short is poison.
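Before blaming the balancer, it helps to separate transport-level failures from application errors in your client logs. Here is a minimal triage sketch in Go; the fabricated error in main is a stand-in for whatever your RPCs actually return.

```go
package main

import (
	"log"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// classify separates balancer/transport symptoms from application errors so
// the logs point at the right layer when an RPC fails.
func classify(err error) {
	switch status.Code(err) {
	case codes.OK:
		// success, nothing to report
	case codes.Unavailable:
		log.Println("transport failure: reset connection or no reachable backend")
	case codes.DeadlineExceeded:
		log.Println("deadline exceeded: slow backend or a stream cut short in transit")
	default:
		log.Printf("application-level error: %v", err)
	}
}

func main() {
	// Simulate the kind of error a misbehaving balancer surfaces to clients.
	classify(status.Error(codes.Unavailable, "connection reset by peer"))
}
```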
First, confirm your load balancer supports HTTP/2 end to end, without downgrading. Look at settings like connection draining, max concurrent streams, and idle timeouts. Keepalive pings are essential. Without them, idle connections may get killed by the balancer or by proxies on either side. Match the gRPC client’s keepalive configuration to what your balancer and servers allow, or you’ll hit connection resets mid-stream.
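Here is a minimal client-side sketch in Go; the target address and the specific intervals are assumptions, chosen only to show where the knobs live. Set Time above the minimum ping interval your balancer and servers tolerate, and below their idle timeouts.

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Keepalive pings keep long-lived HTTP/2 connections from looking idle.
	// These intervals are illustrative; tune them to what your balancer permits.
	kacp := keepalive.ClientParameters{
		Time:                30 * time.Second, // ping after 30s with no activity
		Timeout:             10 * time.Second, // drop the connection if no ack within 10s
		PermitWithoutStream: true,             // ping even when no RPC is in flight
	}

	// "dns:///my-service:50051" is a placeholder target; plaintext credentials
	// are used here only to keep the example self-contained.
	conn, err := grpc.NewClient(
		"dns:///my-service:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(kacp),
	)
	if err != nil {
		log.Fatalf("failed to create client: %v", err)
	}
	defer conn.Close()
	// ... create service stubs from conn and issue RPCs ...
}
```

On the server side, keepalive.EnforcementPolicy sets how frequently clients may ping; if clients ping faster than the server permits, the server closes the connection, which looks exactly like the resets you are trying to eliminate.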
Next, check naming and service discovery. DNS caching in clients can point traffic at dead endpoints when TTLs are long. Use gRPC’s built-in name resolver plugins or integrate with a service registry that propagates updates quickly. If you run gRPC on Kubernetes, make sure readiness probes reflect whether the service can actually accept traffic.
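On the Kubernetes side, the standard gRPC health service gives readiness probes something honest to check. Here is a sketch of wiring it up in Go, assuming a hypothetical server on port 50051; the empty service name covers the whole server, and you would flip the status only once your real dependencies are ready.

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	lis, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatalf("listen failed: %v", err)
	}

	srv := grpc.NewServer()

	// Standard gRPC health service; a Kubernetes gRPC readiness probe (or the
	// grpc_health_probe binary) queries this to decide when the pod may
	// receive traffic.
	healthSrv := health.NewServer()
	healthpb.RegisterHealthServer(srv, healthSrv)

	// Report SERVING only once dependencies are reachable, so the balancer
	// never routes to a pod that cannot actually serve.
	healthSrv.SetServingStatus("", healthpb.HealthCheckResponse_SERVING)

	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve failed: %v", err)
	}
}
```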