PHI for Vector Databases: A Compliance Guide

Many assume that encrypting a vector database is enough to satisfy PHI regulations, but encryption alone does not prove who accessed what, when, or whether the data was altered. True compliance requires continuous evidence that every query, insertion, or deletion involving PHI is recorded, reviewed, and, when necessary, masked.

In most organizations, engineers connect to vector stores using a shared service account or a long‑lived API key. The credential lives in a configuration file, a CI secret store, or even a developer’s laptop. Because every engineer presents the same credential, the database sees only a generic service account, and its native audit logs record just that account name. The result is a compliance gap: regulators ask for evidence of who accessed PHI, yet the system cannot answer.

HIPAA defines PHI as individually identifiable health information held or transmitted by a covered entity or its business associate. Storing such data in a vector database for similarity search changes the risk profile. A query that appears innocuous, such as a search for similar patient records, may inadvertently return identifiers or clinical notes. Regulators expect organizations to demonstrate that they authorize every retrieval, redact sensitive fields, and maintain an immutable trail for each operation.

To bridge the gap, the environment must enforce three core requirements:

  • Identity‑aware access that maps each request to a real user or service principal.
  • Real‑time enforcement at the protocol layer, including inline masking and command‑level approval.
  • Continuous evidence collection that survives the lifetime of the database and can be supplied to auditors on demand.

Setting up the identity foundation

The first step is to replace static credentials with federated identities. By integrating an OIDC or SAML provider, each engineer receives a short‑lived token that conveys group membership and risk level. The gateway verifies the token before allowing any connection to proceed. This setup determines *who* may start a session, but it does not itself enforce *what* the session can do.

Because the token validation occurs outside the vector engine, the engine still sees only the gateway’s service identity. Without an additional enforcement point, the system cannot block a risky similarity search or redact a protected attribute.
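The session-admission step described above can be sketched as a check over token claims. This is a minimal illustration, not hoop.dev's implementation: it assumes a JWT library has already verified the token's signature, issuer, and audience, and that the identity provider emits a `groups` claim (common, but not part of the OIDC core). The group names and the one-hour lifetime ceiling are hypothetical policy values.

```python
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy values for illustration; not taken from hoop.dev.
ALLOWED_GROUPS = {"clinical-data-eng", "ml-platform"}
MAX_TOKEN_LIFETIME_S = 3600  # anything longer is not treated as short-lived


@dataclass(frozen=True)
class Session:
    subject: str
    groups: frozenset


def authorize_session(claims: dict, now: Optional[float] = None) -> Session:
    """Decide whether already-verified token claims may open a session.

    Assumes signature, issuer, and audience checks happened upstream.
    """
    now = time.time() if now is None else now
    subject = claims.get("sub")
    issued_at, expires_at = claims.get("iat"), claims.get("exp")
    if not subject or issued_at is None or expires_at is None:
        raise PermissionError("token must carry sub, iat, and exp")
    if now >= expires_at:
        raise PermissionError("token expired")
    if expires_at - issued_at > MAX_TOKEN_LIFETIME_S:
        raise PermissionError("token lifetime exceeds policy")
    groups = frozenset(claims.get("groups", ()))
    if not groups & ALLOWED_GROUPS:
        raise PermissionError("requester is not in a permitted group")
    return Session(subject=subject, groups=groups)
```

A check like this only answers who may connect. Deciding what each request may do still requires inspecting the traffic itself.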

Placing the enforcement point in the data path

hoop.dev sits in the data path as a Layer 7 gateway. It intercepts every wire‑protocol message between the client and the vector store, applies policy, and then forwards the request. Because all traffic must pass through the gateway, hoop.dev can enforce every required control in one place.

hoop.dev records each session, captures the full request and response payload, and writes the logs to a durable store. It masks fields that match PHI patterns before they reach the client, ensuring that even authorized users only see sanitized results unless an explicit approval is granted. For high‑risk operations, such as bulk export or schema changes, hoop.dev triggers a just‑in‑time approval workflow, pausing the request until a designated reviewer signs off.
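To make the inline masking step concrete, here is a minimal sketch of redacting PHI-like strings from a decoded search response before it is returned to the client. The patterns, labels, and function names are hypothetical and deliberately simple. They are not hoop.dev's masking rules, and a real deployment would need a vetted PHI detector rather than three regular expressions.

```python
import re

# Hypothetical patterns for illustration only.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[-: ]?\d{6,10}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def mask_text(text: str) -> str:
    """Replace every PHI-like match with a labelled placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text


def mask_payload(value):
    """Recursively mask strings in a decoded response, returning a new object."""
    if isinstance(value, str):
        return mask_text(value)
    if isinstance(value, dict):
        return {key: mask_payload(item) for key, item in value.items()}
    if isinstance(value, list):
        return [mask_payload(item) for item in value]
    return value  # numbers such as scores or embedding floats pass through


hit = {
    "id": 7,
    "score": 0.91,
    "metadata": {"note": "Pt MRN 12345678, SSN 123-45-6789"},
}
masked_hit = mask_payload(hit)
```

The sketch walks the whole payload because a similarity search returns free-text metadata alongside the vectors, so identifiers can appear in fields that no schema marks as sensitive.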

hoop.dev delivers all of these outcomes because it sits in the data path. Removing the gateway would cause the vector database to revert to the insecure baseline described earlier.
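The just‑in‑time approval flow for high‑risk operations can be sketched as a guard around the forwarding step. Everything named here is hypothetical: the operation names, the `request_review` callback (which stands in for whatever channel notifies a reviewer and blocks until they decide), and the exception type. It shows the shape of the control, not hoop.dev's actual API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical operation names; real vector stores label these differently.
HIGH_RISK_OPERATIONS = {"bulk_export", "drop_collection", "alter_schema"}


@dataclass(frozen=True)
class Decision:
    approved: bool
    reviewer: str
    reason: str = ""


class ApprovalDenied(Exception):
    """Raised when a reviewer rejects a high-risk operation."""


def guarded_forward(
    operation: str,
    requester: str,
    forward: Callable[[str], object],
    request_review: Callable[[str, str], Decision],
):
    """Forward low-risk operations directly; hold high-risk ones for review."""
    if operation in HIGH_RISK_OPERATIONS:
        # Blocks until a designated reviewer returns a decision.
        decision = request_review(operation, requester)
        if not decision.approved:
            raise ApprovalDenied(
                f"{operation} denied by {decision.reviewer}: {decision.reason}"
            )
    return forward(operation)
```

Because the guard runs before `forward` is called, a denied request never reaches the vector store at all.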

How continuous evidence supports PHI audits

Regulators require proof that organizations authorize and monitor access to PHI. hoop.dev generates evidence continuously, without waiting for a periodic export. Each log entry includes:

  • The identity of the requester, which the OIDC token provides.
  • A timestamp and the exact query issued.
  • The masked response that hoop.dev delivers.
  • Any approval workflow metadata, such as reviewer name and decision time.

This granular audit trail supplies the evidence that the “access log” and “audit trail” sections of most health‑information standards call for. Because the gateway, not the vector engine, produces the logs, a compromised engine cannot alter them, and the logs survive even if the underlying database is rebuilt.
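One way to make a log with the fields listed above tamper‑evident is to chain each entry to the hash of the previous one. The sketch below is an assumption added for illustration: the source does not say that hoop.dev uses hash chaining, and the field names are hypothetical. It only shows how an auditor could detect that a stored record was edited after the fact.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, *, identity, timestamp, query, masked_response, approval=None):
    """Append an audit record that commits to the record before it."""
    body = {
        "identity": identity,
        "timestamp": timestamp,
        "query": query,
        "masked_response": masked_response,
        "approval": approval,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify_chain(log) -> bool:
    """Return True only if no entry was altered, removed, or reordered."""
    previous = GENESIS_HASH
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if body["prev_hash"] != previous or _digest(body) != entry["hash"]:
            return False
        previous = entry["hash"]
    return True
```

Editing any field of a stored entry changes its digest, so `verify_chain` fails from that entry onward. Truncating the tail of the log is not detected by this scheme alone and would need an externally stored head hash.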

Getting started with hoop.dev

To adopt this approach, begin with the getting‑started guide. It walks you through deploying the gateway, configuring OIDC authentication, and registering a vector database connection. The learn section provides deeper coverage of masking policies, approval workflows, and session replay features.

You express all configuration in declarative YAML files, and the gateway runs as a container that can be placed on the same network segment as the vector store. Because hoop.dev is open source, you can inspect the code, contribute improvements, or fork the repository to meet internal policy requirements.

FAQ

Does hoop.dev store PHI itself?

No. The gateway only buffers traffic long enough to apply policies and write audit records. The records contain masked data and metadata, never the raw PHI.

Can I use hoop.dev with any vector database?

hoop.dev supports any database that speaks a standard wire protocol. For proprietary protocols you can still place the gateway in front of a proxy that translates the traffic.

What happens if an approval is denied?

hoop.dev aborts the request and returns a clear error to the client. The denial is logged with the reviewer’s identity and the reason provided.

Explore the source code, submit issues, or contribute enhancements on GitHub.
