
Segregation of Duties for Vector Databases

Segregation of duties matters because a vector database accessed with a single shared API key lets any team member read, write, or delete embeddings without oversight. One mistake can corrupt an entire knowledge base, leading to a costly rebuild of the embeddings and lost trust.

In many organizations the key lives in a configuration file or environment variable that is copied across development, staging, and production environments. Data scientists, engineers, and automated pipelines all use the same credential, so there is no way to prove who added a particular vector, who altered a collection, or which automated job performed a bulk import. Without an audit trail, privilege creep goes unnoticed until a downstream service returns irrelevant results, forcing expensive re‑indexing and potentially exposing sensitive customer data.

Why segregation of duties matters for vector databases

Segregation of duties (SoD) is a control that separates the people who create or modify data from those who approve or audit those changes. In the context of vector stores, SoD prevents a single actor from both inserting malicious embeddings and later querying them to manipulate model outputs. The goal is to make every high‑impact operation visible, auditable, and, when appropriate, subject to a human review before it reaches the database.
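The separation can be checked directly against group membership. The sketch below flags any account that belongs both to a group that may modify vectors and to a group that may approve those modifications. The group names Vector-Write and Vector-Approve and the membership dictionary are hypothetical stand-ins for whatever your IdP exports. This sketch illustrates the rule only and is not part of hoop.dev.

```python
# Hypothetical group names; replace them with the groups your IdP actually uses.
CONFLICTING_PAIRS = [
    ("Vector-Write", "Vector-Approve"),  # modifying and approving must stay separate
]


def find_sod_conflicts(memberships: dict[str, set[str]]) -> dict[str, list[tuple[str, str]]]:
    """Return each account that holds both groups of a conflicting pair."""
    conflicts: dict[str, list[tuple[str, str]]] = {}
    for account, groups in memberships.items():
        hits = [(a, b) for a, b in CONFLICTING_PAIRS if a in groups and b in groups]
        if hits:
            conflicts[account] = hits
    return conflicts


memberships = {
    "alice": {"Vector-Read"},
    "bob": {"Vector-Write", "Vector-Approve"},  # violates SoD
    "svc-etl": {"Vector-Read", "Vector-Write"},
}
print(find_sod_conflicts(memberships))  # {'bob': [('Vector-Write', 'Vector-Approve')]}
```

A check like this can run on a schedule, so privilege creep shows up before it ever reaches the data path.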

Implementing SoD requires three distinct layers:

  • Identity setup: An identity provider (IdP), integrated over OIDC or SAML, issues tokens that identify a user or service account. Roles and groups are defined in the IdP, but these tokens alone cannot enforce fine‑grained policies on the database connection.
  • The data path: A gateway placed between the client and the vector database is the only place where requests can be inspected, approved, or altered.
  • Enforcement outcomes: Session recording, just‑in‑time approval, inline masking, and audit logs are produced only when the gateway mediates the traffic.

hoop.dev fulfills the data‑path role. It sits on the network edge, receives an authenticated token, and then proxies the request to the target vector store. Because the gateway is the sole conduit, it can enforce SoD policies without relying on the database’s native permissions.

How hoop.dev enables segregation of duties

When a request arrives, hoop.dev first validates the OIDC token against the configured IdP. The token tells hoop.dev who is making the call, but the gateway decides what that identity may do. Policies are expressed as combinations of roles, actions, and resource patterns. For a vector database, a typical policy set might include:

  • Data scientists may search and retrieve vectors but cannot upsert or delete without an explicit approval step.
  • Ops engineers can upsert and delete but only after a peer‑reviewed change ticket is attached to the request.
  • Automated pipelines receive short‑lived tokens scoped to read operations. Any write a pipeline attempts is held until a human reviewer releases it.
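The policy set above can be sketched as a single decision function. This is an illustration under assumptions and does not use hoop.dev's policy syntax. The following are all hypothetical:

- the role names
- the action names (search, fetch, upsert, delete)
- the change_ticket field
- the choice to deny an ops write that arrives without a ticket

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


# Hypothetical role and action names used only for this sketch.
KNOWN_ROLES = {"data-scientist", "ops-engineer", "pipeline"}
READ_ACTIONS = {"search", "fetch"}
WRITE_ACTIONS = {"upsert", "delete"}


@dataclass(frozen=True)
class Request:
    identity: str                        # subject from the validated token
    role: str                            # derived from an IdP group claim
    action: str
    change_ticket: Optional[str] = None  # hypothetical ticket reference


def evaluate(req: Request) -> Decision:
    """Decide what the gateway should do with a request, failing closed."""
    if req.role not in KNOWN_ROLES:
        return Decision.DENY
    if req.action in READ_ACTIONS:
        return Decision.ALLOW  # every known role may read
    if req.action not in WRITE_ACTIONS:
        return Decision.DENY   # unknown actions are refused
    if req.role == "ops-engineer":
        # Ops writes proceed only with a reviewed change ticket attached.
        return Decision.ALLOW if req.change_ticket else Decision.DENY
    # Data scientists and pipelines attempting writes are held for a human.
    return Decision.REQUIRE_APPROVAL


print(evaluate(Request("alice", "data-scientist", "upsert")))  # Decision.REQUIRE_APPROVAL
```

The function fails closed on unknown roles and actions. As a result, a new, unreviewed operation is refused by default rather than allowed through.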

Because hoop.dev sits in the data path, it can:

  • Record each session: hoop.dev logs every query, the identity that issued it, and the full response payload. It stores each session so the session can be replayed for forensic analysis.
  • Mask sensitive fields: If a vector store returns personally identifiable information alongside embeddings, hoop.dev can strip or redact those fields before they reach the client.
  • Require just‑in‑time approval: For write operations that cross a duty boundary, hoop.dev pauses the request and routes it to an approver defined in the policy. The approver can grant or deny the action from a web console.
  • Enforce least‑privilege connections: The gateway holds the database credentials; users never see them, eliminating credential sprawl.
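Inline masking can be illustrated with a small filter applied to a similarity-search response before it is returned to the client. Two things are assumptions for this sketch: the response shape (a list of matches with id, score, and a metadata dictionary) and the PII key names. The sketch shows the idea only and does not reflect how hoop.dev implements masking.

```python
import copy
from typing import Any

# Hypothetical metadata keys treated as PII in this sketch.
PII_KEYS = {"email", "phone", "full_name"}
REDACTED = "***"


def mask_matches(matches: list[dict[str, Any]], pii_keys: set[str] = PII_KEYS) -> list[dict[str, Any]]:
    """Return a redacted copy of search results; the input is left unchanged.

    Ids, scores, and non-sensitive metadata pass through, so the client can
    still rank and use the results.
    """
    masked = copy.deepcopy(matches)
    for match in masked:
        metadata = match.get("metadata") or {}
        for key in metadata.keys() & pii_keys:
            metadata[key] = REDACTED
    return masked


results = [{"id": "v1", "score": 0.91,
            "metadata": {"email": "a@example.com", "topic": "billing"}}]
print(mask_matches(results))
# [{'id': 'v1', 'score': 0.91, 'metadata': {'email': '***', 'topic': 'billing'}}]
```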

All of these outcomes are possible only because hoop.dev is the gateway that inspects the wire‑level protocol. A setup that only defines roles in an IdP would leave the database exposed to unchecked commands.

Practical steps to adopt segregation of duties

1. Define duty groups in your IdP. Create separate groups such as Vector‑Read, Vector‑Write, and Vector‑Approve. Assign users and service accounts to the appropriate group.

2. Deploy the hoop.dev gateway. Follow the Getting started guide to run the Docker Compose deployment in the same network segment as your vector store. The gateway will register a single service identity that holds the database credentials.

3. Configure policies. Using the feature documentation, create policies that map IdP groups to allowed actions. Include approval steps for any operation that crosses a duty boundary.

4. Enable session recording and masking. Turn on the recording flag for the vector database connection. Add field‑masking rules for any column that may contain raw user data.

5. Test the workflow. Have a data scientist attempt an upsert. The request should pause, notify an approver, and only proceed after explicit consent. Verify that the audit log captures the identity, the request payload, and the approval decision.

6. Review and iterate. Periodically review the duty groups and policy definitions to ensure they evolve with your team’s responsibilities.
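Step 5 tests for three behaviors: a held write, an approver distinct from the requester, and an audit entry for each decision. These can be modeled in a few lines. The PendingWrite and AuditLog types, the status strings, and the self-approval rule are hypothetical, chosen to illustrate separation of duties. They are not hoop.dev's data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class PendingWrite:
    request_id: str
    requester: str
    payload: dict[str, Any]
    status: str = "pending"        # pending -> approved | denied
    approver: Optional[str] = None


@dataclass
class AuditLog:
    entries: list[dict[str, Any]] = field(default_factory=list)

    def record(self, event: str, write: PendingWrite, actor: str) -> None:
        # Capture who acted, on which request, with which payload.
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "actor": actor,
            "request_id": write.request_id,
            "requester": write.requester,
            "payload": write.payload,
        })


def hold(request_id: str, requester: str, payload: dict[str, Any], log: AuditLog) -> PendingWrite:
    """Pause a write that crosses a duty boundary."""
    write = PendingWrite(request_id, requester, payload)
    log.record("held", write, actor=requester)
    return write


def decide(write: PendingWrite, approver: str, approve: bool, log: AuditLog) -> PendingWrite:
    """Resolve a held write; a requester can never approve their own change."""
    if write.status != "pending":
        raise ValueError(f"{write.request_id} is already {write.status}")
    if approver == write.requester:
        log.record("self_approval_rejected", write, actor=approver)
        raise PermissionError("requester cannot approve their own write")
    write.approver = approver
    write.status = "approved" if approve else "denied"
    log.record(write.status, write, actor=approver)
    return write


log = AuditLog()
w = hold("req-1", "alice", {"op": "upsert", "ids": ["v1"]}, log)
decide(w, "carol", approve=True, log=log)
print([e["event"] for e in log.entries])  # ['held', 'approved']
```

Rejecting self-approval in the same code path that records the decision keeps the audit trail complete. Every attempt, including a refused one, leaves an entry.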

FAQ

Q: Does hoop.dev replace the authentication mechanism of the vector store?
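A: No. Users still authenticate through the IdP, and the vector store still authenticates the connection it receives. hoop.dev validates the user's token and then connects to the database with credentials it holds itself, so users never handle the database credentials.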

Q: Can I use hoop.dev with any vector database?
A: hoop.dev proxies traffic at the wire-protocol level, so it can front a vector store that speaks a protocol the gateway supports. PostgreSQL with the pgvector extension is one example: pgvector adds vector search to PostgreSQL, so the traffic is ordinary PostgreSQL protocol.

Q: How long are audit logs retained?
A: Retention is a configuration choice of your logging backend. hoop.dev guarantees that each session is persisted exactly as it occurred; you decide how long to keep the records.

Ready to try it out? Contribute on GitHub and follow the open‑source repository for the latest releases and community guidance.
