You know the type of bug. Silent until it isn’t. Everything downstream breaks. Metrics spike. No deploy caused it, no config changed, and your pager is going off. The root cause hides in the dark spaces between services — in the transport layer you rarely need to think about until it breaks.
## What “outbound-only connectivity” means
In gRPC, this error means the client can reach the server, but traffic from the server back to the client is being dropped. It typically shows up when network rules, firewalls, NATs, or proxies allow outbound traffic but block the reverse path. TLS handshakes stall silently, keepalive pings never get a reply, and the channel drops into a failure state.
## Common causes that trigger gRPC outbound-only connectivity
- Load balancers closing connections that gRPC expects to keep alive
- Asymmetric network routes between pods or data centers
- Cloud security group rules that allow egress but not ingress
- Sidecar proxies or service meshes dropping return packets
- Misconfigured health checks causing early connection resets
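The load-balancer case above is the most common: the LB silently closes connections it considers idle, and gRPC keeps trying to use them. A standard mitigation is to configure client keepalives at an interval comfortably below the LB's idle timeout. Here is a minimal sketch using grpc-go; the address is hypothetical, and the intervals are examples to adjust against your own LB settings:

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// Keepalive pings every 30s keep the connection from ever looking
	// idle to a load balancer with, say, a 60s idle timeout.
	conn, err := grpc.NewClient("server.internal:50051", // hypothetical address
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                30 * time.Second, // ping after 30s of inactivity
			Timeout:             10 * time.Second, // close if no ack within 10s
			PermitWithoutStream: true,             // ping even with no active RPCs
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	_ = conn // use the connection for RPCs as usual
}
```

Note that the server must tolerate pings this frequent: grpc-go's default `keepalive.EnforcementPolicy` rejects pings more often than every five minutes and will answer aggressive clients with a GOAWAY, so tune both sides together.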
## How to trace it fast
- Run a `grpcurl` request from both directions of the connection
- Inspect firewall and security group egress/ingress rules; they must allow both directions
- Check the load balancer idle timeout setting against `grpc.keepalive_time`
- Enable TCP packet capture to confirm the SYN, SYN-ACK, ACK handshake completes both ways
- Review proxy or service mesh connection policies for bidirectional streaming
## When this happens in production
Outbound-only connectivity errors can masquerade as server crashes, bad data, or slow endpoints. But chasing application logs without looking at the network wastes hours. The key is to confirm bidirectional reachability early. If one side can only send, not receive, gRPC calls will keep failing no matter how clean your code is.