The gRPC call hung, then died with an unreadable error. Logs pointed nowhere. The culprit was hiding deep inside Kubernetes Network Policies.
Kubernetes Network Policies control which pods can talk to each other. They are enforced at the network layer by a CNI plugin such as Calico or Cilium. When a rule blocks traffic, gRPC fails in confusing ways: timeouts, EOF errors, connection resets, or streams closed mid-call. The failure often surfaces only in certain environments, like staging or production, where restrictive policies are applied.
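A common trigger is a namespace-wide default-deny ingress policy. A minimal sketch (the namespace name is an assumption):

```yaml
# Default-deny ingress for every pod in the namespace.
# Any gRPC server here becomes unreachable until an explicit
# allow rule is added: clients see timeouts or connection resets.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: staging     # assumed namespace
spec:
  podSelector: {}        # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress            # Ingress listed with no rules means all ingress is denied
```

Because the policy matches every pod but lists no ingress rules, all inbound traffic is dropped, and the drop happens silently at the CNI layer rather than as a visible RST from the application.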
gRPC relies on persistent HTTP/2 connections. If a network policy denies ingress to the server from the client, or egress from the client to the server, the TCP handshake never completes. Even a partial restriction, such as allowing some ports but not others, can break gRPC streams without producing clear firewall logs. Misconfigured CIDRs, namespace selectors, and pod labels are common sources of failure.
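The fix is an allow rule whose selectors and port exactly match the real workload. A hedged sketch, assuming the server listens on port 50051 and the labels `app: grpc-server` and `app: grpc-client` (all three are illustrative, not from the original incident):

```yaml
# Allow gRPC traffic (assumed port 50051) into pods labeled
# app: grpc-server, only from client pods in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-grpc-ingress
  namespace: staging              # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: grpc-server            # assumed server label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: grpc-client    # assumed client label
      ports:
        - protocol: TCP
          port: 50051             # must match the server's listening port
```

A single typo in a label value, or a `port` that does not match what the server actually binds, yields exactly the symptoms described above: the policy looks correct at a glance, yet the traffic is still dropped with no error on either side.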