Vector Databases and Non-Human Identities: What to Know

An offboarded contractor’s CI pipeline, acting as a non-human identity, keeps querying the company’s vector‑search service, using a service‑account key that was never rotated. The same credential is also embedded in a nightly batch job that enriches product recommendations. When the key is leaked, an attacker can retrieve embeddings, reverse‑engineer proprietary data, and even poison the index – all without any alert, because the organization never tied the request to a distinct identity.

Today most teams treat a vector database like any other data store: they create a single service account, store the secret in a vault or environment file, and hand it to every automation component. The database sees the same static principal whether the request originates from a CI runner, a model‑training job, or a monitoring script. Because nothing sits between the callers and the database to tell them apart, there is no per‑request audit, no inline data masking, and no way to require a human to approve a risky query.

This approach satisfies the immediate need to get data in and out, but it leaves three critical gaps. First, the system cannot prove which non-human identity performed a given operation, so forensic attribution stops at the shared service account. Second, the lack of a runtime enforcement point means dangerous commands, such as bulk vector export or index deletion, run unchecked. Third, compliance frameworks that demand evidence of least‑privilege access and just‑in‑time approval have no foothold in the current data path.

Why non-human identity matters for vector databases

Non-human identities – service accounts, CI tokens, and machine‑issued credentials – are the backbone of modern AI pipelines. They enable automated model training, data ingestion, and feature extraction without human interaction. However, because they are not tied to a person, they often receive broader permissions than necessary and are rarely rotated. When a non-human identity is compromised, the attacker inherits exactly the same level of access the automation originally had, and the breach can persist unnoticed for weeks.

For vector databases, the risk is amplified. Embeddings can contain sensitive user information, and bulk retrieval can expose entire knowledge graphs. Without a mechanism that distinguishes one automation job from another, security teams cannot enforce the principle of least privilege at the granularity required for responsible AI.

Where enforcement must happen: the data path

The only place to guarantee that every query is vetted, recorded, and, when needed, masked is the data path itself: the network hop that sits between the caller and the vector store. A gateway placed there can inspect the wire protocol, apply policy checks, and emit immutable audit records before the request ever reaches the database. Because the enforcement point is external to both the client and the database, it cannot be bypassed by re‑configuring the service account or by running a rogue container, provided the database accepts connections only from the gateway.

When the gateway is present, three enforcement outcomes become possible:

Session recording: hoop.dev captures each query and response, storing a replayable log that ties the operation to a specific non-human identity.
Inline masking: Sensitive fields in query results, such as embeddings or metadata that contain personally identifiable information, are redacted in real time, so downstream systems never see the raw data.
Just‑in‑time approval: High‑risk commands, like bulk export or index deletion, are routed to an approver before execution, preventing accidental or malicious data loss.

All of these outcomes rely on hoop.dev being the active component in the data path. If the gateway were removed, the database would again see only a static service account and none of the above controls would exist.
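
The three outcomes can be pictured as one decision made per request. The Python sketch below is illustrative only: the `Request` type, the operation names, and the `SENSITIVE_FIELDS` set are assumptions made for this example and do not reflect hoop.dev's actual policy format or API.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    MASK = "mask"
    REQUIRE_APPROVAL = "require_approval"


# Hypothetical operation and field names, chosen only for illustration.
HIGH_RISK_OPERATIONS = {"bulk_export", "delete_index"}
SENSITIVE_FIELDS = {"email", "user_id"}


@dataclass(frozen=True)
class Request:
    identity: str   # the non-human identity making the call
    operation: str  # e.g. "query", "upsert", "bulk_export"


def decide(request: Request) -> Decision:
    """Pick an enforcement outcome for a single request."""
    if request.operation in HIGH_RISK_OPERATIONS:
        # High-risk commands wait for an approver before execution.
        return Decision.REQUIRE_APPROVAL
    if request.operation == "query":
        # Read results pass through masking before reaching the caller.
        return Decision.MASK
    return Decision.ALLOW


def mask_result(row: dict) -> dict:
    """Return a copy of a result row with sensitive fields redacted."""
    return {k: ("***" if k in SENSITIVE_FIELDS else v) for k, v in row.items()}
```

In this sketch a request such as `Request("nightly-batch", "bulk_export")` is held for approval, while an ordinary query is allowed through with its result rows passed to `mask_result`.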

How hoop.dev implements the required controls

hoop.dev is an open‑source Layer 7 gateway that sits between non-human identities and the vector database. It authenticates callers via OIDC or SAML, extracts group membership, and maps that information to fine‑grained policies. The gateway holds the database credential, so the caller never sees it. When a request arrives, hoop.dev evaluates the policy, decides whether to allow, mask, or pause for approval, and then forwards the traffic to the target.
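
The flow described above (map group membership to a policy, then forward with a credential the caller never handles) might look roughly like the following. This is a minimal sketch under stated assumptions: the group names, the `GROUP_POLICIES` table, and the placeholder credential are hypothetical, and the real gateway's configuration will differ.

```python
from typing import Optional

# Hypothetical mapping from identity-provider groups to permitted operations.
GROUP_POLICIES = {
    "ci-runners": {"query"},
    "training-jobs": {"query", "upsert"},
}

# Placeholder for the secret that only the gateway holds; callers never receive it.
GATEWAY_HELD_CREDENTIAL = "example-secret"


def allowed_operations(groups: list[str]) -> set[str]:
    """Union of operations granted by the caller's groups; unknown groups grant nothing."""
    allowed: set[str] = set()
    for group in groups:
        allowed |= GROUP_POLICIES.get(group, set())
    return allowed


def forward(groups: list[str], operation: str) -> Optional[dict]:
    """Return the upstream request if permitted, otherwise None (deny by default)."""
    if operation not in allowed_operations(groups):
        return None
    # The credential is attached here, on the gateway side of the connection.
    return {"operation": operation, "credential": GATEWAY_HELD_CREDENTIAL}
```

The deny-by-default choice means a caller whose groups match no policy gets nothing, which is the least-privilege behavior the article argues for.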

Because the gateway operates at the protocol layer, it works with any client that speaks the database’s wire protocol: psql‑style clients for PostgreSQL‑backed vector stores, custom SDKs for proprietary stores, or generic HTTP‑based vector APIs. The same enforcement model applies regardless of the underlying storage engine.

For teams that already use CI/CD pipelines, the integration is simple: configure the pipeline to use the hoop.dev endpoint instead of the raw database host, and let the gateway handle credential rotation and policy enforcement. The result is a unified audit trail that ties every vector operation to a distinct non-human identity, satisfying both security and compliance needs.

Getting started with hoop.dev

To try the approach, follow the getting‑started guide. It walks you through deploying the gateway, registering a vector database connection, and defining a policy that requires approval for bulk export commands. The learn section contains deeper discussions of masking strategies and just‑in‑time workflows.

FAQ

Do I need to change my existing vector database credentials?

No. hoop.dev stores the credential internally and presents a stable endpoint to your applications. Your existing secret can remain where it is, but it is no longer exposed to the callers.

Can hoop.dev differentiate between two CI jobs that use the same service account?

Yes. By configuring each job with a distinct OIDC token or by attaching unique group claims, hoop.dev can apply separate policies and produce separate audit records for each non-human identity.
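
As a rough illustration of how distinct token claims can yield distinct identities, the sketch below derives a per-job label from already-verified claims. The `sub` claim is standard in OIDC; the `groups` claim, the example values, and the `identity_key` helper are assumptions for this example, and signature validation is deliberately left out.

```python
def identity_key(claims: dict) -> str:
    """Build a per-job identity label from decoded, already-verified token claims."""
    subject = claims["sub"]
    # Sort groups so the label does not depend on claim ordering.
    groups = sorted(claims.get("groups", []))
    return f"{subject}|{','.join(groups)}"


# Two CI jobs that would otherwise share one service account (hypothetical values).
build_job = {"sub": "ci:build", "groups": ["ci-runners"]}
nightly_job = {"sub": "ci:nightly-enrich", "groups": ["batch-jobs"]}

# Each job gets its own audit trail, keyed by its derived identity.
audit_records = {
    identity_key(build_job): ["query"],
    identity_key(nightly_job): ["query", "upsert"],
}
```

Keying policies and audit records on a label like this is what lets two jobs be told apart even though they reach the same database behind the same stored credential.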

Is the audit data stored securely?

hoop.dev writes immutable session logs to a storage backend of your choice. The logs can be retained for the period required by your compliance framework.
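
The article does not say how immutability is achieved. One common way to make a log tamper-evident is hash chaining, sketched below with Python's standard library. This illustrates the general technique only and is not a description of hoop.dev's storage; the entry fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(identity: str, operation: str, prev: str) -> str:
    """Hash an entry body together with the previous entry's hash."""
    body = {"identity": identity, "operation": operation, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, identity: str, operation: str) -> dict:
    """Append an entry whose hash covers its content and its predecessor."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {
        "identity": identity,
        "operation": operation,
        "prev": prev,
        "hash": _digest(identity, operation, prev),
    }
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        expected = _digest(entry["identity"], entry["operation"], entry["prev"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

With a chain like this, changing any recorded operation after the fact causes `verify` to fail, which is the property auditors usually mean when they ask for immutable logs.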

Explore the source code and contribute on GitHub.