What GDPR expects from vector database access
A GDPR‑compliant vector database delivers provable control over personal data, with every query logged, sensitive fields masked, and access granted only when justified. The regulation’s core accountability principle requires organizations to demonstrate that they process personal data lawfully, transparently, and securely. For a vector store, that means:
- Lawful basis documentation: each embedding request must be tied to a legitimate purpose.
- Data minimisation: only the necessary vectors are returned, and any personally identifiable information (PII) that appears in results must be redacted.
- Record‑keeping: a log of who queried what, when, and with which credentials.
- Right‑to‑access and erasure support: the ability to locate and delete an individual’s vectors on demand.
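The erasure requirement is easier to reason about with a concrete shape in mind. The following is a minimal sketch, not any particular vector database's API: it assumes each vector carries a `subject_id` metadata field (a hypothetical name) that links it to a data subject, and it uses an in-memory store as a stand-in for the real engine.

```python
from dataclasses import dataclass


@dataclass
class VectorRecord:
    vector_id: str
    subject_id: str  # hypothetical metadata key linking a vector to a data subject
    embedding: list[float]


class InMemoryVectorStore:
    """Toy stand-in for a vector database, used only for illustration."""

    def __init__(self) -> None:
        self._records: dict[str, VectorRecord] = {}

    def upsert(self, record: VectorRecord) -> None:
        self._records[record.vector_id] = record

    def find_by_subject(self, subject_id: str) -> list[str]:
        # Right of access: locate every vector tied to one individual.
        return sorted(
            r.vector_id for r in self._records.values() if r.subject_id == subject_id
        )

    def erase_subject(self, subject_id: str) -> list[str]:
        # Right to erasure: delete those vectors and return their IDs as evidence.
        erased = self.find_by_subject(subject_id)
        for vector_id in erased:
            del self._records[vector_id]
        return erased
```

Returning the erased IDs gives the caller something to write into the audit record, so the deletion itself becomes part of the evidence trail.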
Most teams treat a vector database like any other cache: they hand out a static credential, let services connect directly, and rely on the application layer to enforce policy. That approach leaves three gaps.
Why a dedicated gateway is required
Without an intervening control point, the following risks surface:
- Untracked queries: engineers or automated jobs can issue similarity searches that surface PII without any audit trail.
- Over‑exposure of embeddings: a single mis‑configured client can retrieve entire collections, violating the data‑minimisation principle.
- No inline protection: the database returns raw vectors; downstream services must implement masking, which is error‑prone and often omitted.
GDPR does not prescribe a specific technology, but it does require that the enforcement point be under the organization’s control and that it can produce evidence for auditors. A Layer 7 gateway that sits between identities and the vector store satisfies both requirements: it is the only place where traffic can be inspected, altered, or rejected, and it can emit logs that map each request to a verified identity.
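To make the idea of a single control point concrete, here is a rough sketch of what such a gateway does for one search request. It is illustrative only and does not reflect hoop.dev's implementation: `handle_search`, `MAX_TOP_K`, and the request shape are assumptions, and the identity is assumed to have been verified upstream.

```python
import json
from datetime import datetime, timezone
from typing import Callable

MAX_TOP_K = 50  # assumed cap on results per query, chosen for illustration


def handle_search(
    identity: str,
    request: dict,
    backend: Callable[[dict], list],
    audit_log: list,
) -> list:
    """Inspect one search request, log it against an identity, then forward or reject it."""
    top_k = request.get("top_k", 10)
    allowed = top_k <= MAX_TOP_K

    # Write the audit entry before acting, so rejected requests are also recorded.
    audit_log.append(
        json.dumps(
            {
                "ts": datetime.now(timezone.utc).isoformat(),
                "identity": identity,
                "request": request,
                "decision": "allow" if allowed else "reject",
            }
        )
    )

    if not allowed:
        raise PermissionError(f"top_k={top_k} exceeds the cap of {MAX_TOP_K}")
    return backend(request)
```

Because the log entry is written before the backend is called, an over-broad query leaves a trace even though it never reaches the vector store.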
How hoop.dev fulfills the evidence requirements
hoop.dev is built exactly for this role. It acts as an identity‑aware proxy that forwards standard client protocols, such as the search endpoint of a Pinecone‑compatible API, to the underlying vector engine. Because every request passes through the gateway, hoop.dev can provide the enforcement outcomes that GDPR demands:
- Session recording: hoop.dev records each query and its response payload in an audit log that can be reviewed by auditors.
- Inline masking: before the response reaches the caller, hoop.dev can redact fields that match PII patterns, ensuring that downstream services never see raw personal data.
- Just‑in‑time (JIT) approval: high‑risk queries, such as bulk vector exports, can be routed to a human approver. The approval decision is logged alongside the request.
- Command‑level blocking: prohibited operations (for example, delete‑all) are intercepted and denied, preventing accidental mass erasure that would breach the accountability record.
All of these actions are tied to the authenticated identity that initiated the request. hoop.dev verifies OIDC or SAML tokens, extracts group membership, and enforces least‑privilege policies before any traffic reaches the vector store. Because enforcement happens in the data path, these outcomes depend on the gateway sitting between every client and the vector store.
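Inline masking can be pictured as a pattern-based rewrite of the response before it leaves the gateway. The sketch below is an assumption-laden illustration, not hoop.dev's masking engine: the two regular expressions are deliberately simple, the `matches` and `metadata` field names are hypothetical, and real PII detection needs far broader coverage.

```python
import re

# Deliberately simple patterns for illustration; production detection needs more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_text(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    return PHONE.sub("[REDACTED_PHONE]", text)


def mask_matches(matches: list[dict]) -> list[dict]:
    """Return copies of search matches with string metadata values masked."""
    masked = []
    for match in matches:
        metadata = {
            key: mask_text(value) if isinstance(value, str) else value
            for key, value in match.get("metadata", {}).items()
        }
        # Build a new dict so the original response is left unmodified.
        masked.append({**match, "metadata": metadata})
    return masked
```

Masking copies rather than mutating in place means the unmasked payload can still be handled separately under whatever retention policy applies, while the caller only ever receives the redacted version.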
Implementation overview
To generate GDPR‑ready evidence, organizations typically follow three steps:
- Configure identity: integrate an OIDC provider (Okta, Azure AD, Google Workspace, etc.). The provider issues short‑lived tokens that identify the user or service account.
- Deploy the gateway: run hoop.dev via Docker Compose or Kubernetes near the vector database. The gateway stores the database credentials, so clients never see them.
- Define policies: specify which queries require JIT approval, which fields must be masked, and which roles are allowed to perform bulk exports. Policies are expressed once and applied to every request.
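The policy step can be sketched as a small data structure plus one evaluation function. This is not hoop.dev's policy syntax: the `Policy` fields, the operation names, the role name, and the threshold are all hypothetical, and the rule that bulk exports by an allowed role still need approval is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Policy:
    # All values below are illustrative assumptions, not product defaults.
    blocked_operations: frozenset = frozenset({"delete_all"})
    bulk_export_roles: frozenset = frozenset({"data-protection-officer"})
    bulk_top_k_threshold: int = 1000


def evaluate(policy: Policy, operation: str, top_k: int, groups: set) -> Decision:
    """Decide what happens to one request, given the caller's group membership."""
    if operation in policy.blocked_operations:
        return Decision.DENY

    is_bulk = operation == "export" or top_k >= policy.bulk_top_k_threshold
    if is_bulk:
        # Only designated roles may export in bulk, and even then a human approves.
        if groups & policy.bulk_export_roles:
            return Decision.REQUIRE_APPROVAL
        return Decision.DENY

    return Decision.ALLOW
```

Keeping the policy as plain data makes it possible to define it once and evaluate it on every request, which is the property the third step relies on.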
Once deployed, every interaction with the vector store is automatically recorded, masked, and, where needed, approved. The resulting logs can be exported to a SIEM or retained in a storage location that satisfies your organization’s retention policies and GDPR requirements. For detailed step‑by‑step guidance, see the getting‑started guide and the broader learn section.
FAQ
Does hoop.dev replace the vector database’s own authentication?
No. hoop.dev authenticates the caller via OIDC and then uses its own stored credential to talk to the database. The database’s native auth remains in place, providing defense‑in‑depth.
Can hoop.dev handle high‑throughput similarity searches?
Yes. Because it operates at the protocol layer, it can stream data without adding noticeable latency, while still applying masking and logging.
What evidence does hoop.dev provide for a GDPR audit?
It supplies session logs that tie each query to a verified identity, timestamps, the exact request parameters, and the masked response. Approvals and blocking decisions are also recorded, giving auditors a complete picture of lawful processing.
By placing an identity‑aware gateway in front of your vector store, you obtain the audit trail, data‑minimisation, and access‑control guarantees that GDPR requires, without rewriting your application code. hoop.dev makes the gateway easy to deploy and operate, turning compliance from a manual checklist into an automated, continuously verified process.
Explore the source code and contribute on GitHub.