June 22, 2026 · 4 min read

Embeddings and PHI Compliance

Coleman Nye

How can you prove that an AI embeddings pipeline respects PHI regulations when the data never leaves your network?

Most organizations start by granting a service account direct access to the vector store or database that holds patient information. The account often uses a static secret that is shared across multiple jobs, CI pipelines, and even ad‑hoc scripts. Engineers push new model code, data scientists run experiments, and monitoring agents write logs, all through the same credential. No central policy checks whether a particular request should be allowed, and no record exists that shows which user or job generated a specific embedding. When an auditor asks for evidence, the answer is usually “the logs are on the host” or “the database has an audit table”, neither of which ties the operation to an identity or demonstrates that sensitive fields were protected.

The first step toward compliance is to introduce a non‑human identity that is scoped to the minimum set of actions required for the embedding job. The job still needs to reach the vector store, but the connection is now made with a short‑lived token that carries only the permissions needed to write an embedding. This change stops the spread of a perpetual secret, yet it leaves two gaps: the request still travels directly to the database without a checkpoint that can verify intent, and there is no immutable record that shows who asked for the embedding, what data was returned, or whether PHI was masked.
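To make the idea of a short‑lived, narrowly scoped token concrete, here is a minimal Python sketch using only the standard library. The token format, the `embeddings:write` scope name, and the signing key are all hypothetical; in a real deployment the token would be issued and signed by your identity provider rather than by hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key for illustration only; a real deployment relies on
# the identity provider's keys, never a hard-coded value.
SIGNING_KEY = b"demo-key-not-for-production"


def issue_token(subject: str, scopes: list, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Return a signed token that expires ttl_seconds after `now`."""
    issued_at = time.time() if now is None else now
    claims = {"sub": subject, "scopes": scopes, "exp": issued_at + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"


def check_token(token: str, required_scope: str,
                now: Optional[float] = None) -> dict:
    """Verify signature, expiry, and scope; raise PermissionError on failure."""
    body, _, sig = token.rpartition(".")
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if current >= claims["exp"]:
        raise PermissionError("token expired")
    if required_scope not in claims["scopes"]:
        raise PermissionError(f"missing scope: {required_scope}")
    return claims
```

A token issued with `ttl_seconds=300` and only the write scope is rejected once five minutes have passed, and is rejected immediately for any other scope. That is the property that stops a leaked credential from becoming a perpetual one.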

That is where hoop.dev enters the architecture. hoop.dev sits in the data path as a Layer 7 gateway that proxies every embedding request. The gateway validates the short‑lived token, applies policy checks, can require human approval for high‑risk vectors, masks any PHI that appears in responses, and records the full session for replay. Because hoop.dev is the only point where traffic is inspected, every enforcement outcome (audit logs, inline masking, just‑in‑time approval, and session recording) exists solely because hoop.dev is present in the path.

Why PHI compliance matters for embeddings

PHI (Protected Health Information) is subject to strict handling rules under HIPAA. Regulators expect to see:

Evidence that every access to PHI is tied to an authenticated identity.
Proof that only the minimum necessary data was exposed.
Records of any manual or automated approvals that allowed the access.
Immutable logs that can be reproduced during an audit.

When embeddings are generated, the model often receives raw patient notes or clinical text. If the pipeline does not mask identifiers before they are stored, the vector database itself becomes a repository of PHI. Auditors will ask for proof that the system never persisted raw identifiers and that each write operation was authorized.

How hoop.dev provides audit evidence

hoop.dev produces immutable logs that include the caller identity, request payload, masking actions, any approval decisions, and a replayable session record.

Because the gateway sits in the data path, it enforces inline masking of PHI fields before the data reaches the vector store. The mask is defined once in the policy and applied consistently, ensuring that no raw identifiers ever touch the downstream system.
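The "defined once, applied consistently" idea can be sketched in a few lines of Python. This is an illustration of the general pattern, not hoop.dev's masking engine or configuration format. The `MASKING_POLICY` table, its labels, and its regular expressions are hypothetical, and real PHI detection needs far more than three patterns.

```python
import re

# Hypothetical masking policy: label -> pattern. Defined once, applied to
# every payload before it is forwarded to the vector store.
MASKING_POLICY = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def mask_phi(text):
    """Return (masked_text, actions), where actions counts redactions per label."""
    actions = {}
    for label, pattern in MASKING_POLICY.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            actions[label] = count
    return text, actions


masked, actions = mask_phi("Patient MRN: 12345678, SSN 123-45-6789, call 555-867-5309.")
print(masked)   # Patient [MRN], SSN [SSN], call [PHONE].
print(actions)  # {'SSN': 1, 'MRN': 1, 'PHONE': 1}
```

Returning the `actions` dictionary alongside the masked text matters for audit purposes: it lets the log state what was redacted without repeating the redacted values.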

When a request exceeds a risk threshold – for example, an embedding that includes a full clinical note instead of a snippet – hoop.dev can pause the flow and route the request to a designated approver. hoop.dev logs the approval decision alongside the request, giving auditors a clear trail of who authorized the operation and why.
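A risk threshold with an approval step might look like the following Python sketch. The word-count heuristic, the `MAX_SNIPPET_WORDS` value, and the rule that requesters cannot approve their own requests are assumptions made for illustration; they are not taken from hoop.dev's documentation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: inputs longer than this are treated as full notes.
MAX_SNIPPET_WORDS = 200


@dataclass
class EmbeddingRequest:
    subject: str                       # verified caller identity
    text: str                          # text to embed
    approved_by: Optional[str] = None  # who authorized a high-risk request
    reason: Optional[str] = None       # why it was authorized


def decide(request):
    """Return 'allow', or 'pending_approval' for high-risk unapproved requests."""
    high_risk = len(request.text.split()) > MAX_SNIPPET_WORDS
    if high_risk and request.approved_by is None:
        return "pending_approval"
    return "allow"


def approve(request, approver, reason):
    """Record who approved the request and why, then re-evaluate it."""
    if approver == request.subject:
        raise PermissionError("requester cannot approve their own request")
    request.approved_by = approver
    request.reason = reason
    return decide(request)
```

Keeping the approver and the reason on the request itself means the eventual audit entry can answer both questions an auditor asks: who authorized the operation, and why.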

hoop.dev automatically generates all of these artifacts (identity‑bound logs, masking actions, and approval records). You can export them to a SIEM or a compliance reporting tool, or retain them in the built‑in audit store for the period your organization requires. Because the gateway produces them, you do not need to instrument each downstream service individually; the evidence comes from a single, authoritative source.
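One common way to make an audit log tamper-evident is hash chaining, where each entry's hash covers the previous entry's hash. The Python sketch below illustrates that general technique together with a JSON Lines export; it does not describe how hoop.dev stores or exports its logs, and the field names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def append_entry(log, subject, action, outcome):
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"subject": subject, "action": action, "outcome": outcome, "prev": prev_hash}
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True if every entry's hash and back-link still match."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        payload = json.dumps(body, sort_keys=True).encode()
        if entry["prev"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


def export_jsonl(log):
    """Serialize the log as JSON Lines, one entry per line."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in log)
```

Editing any field of an earlier entry changes its hash, which breaks verification. Note that a chain like this detects modification and reordering, but truncation of the most recent entries has to be caught another way, for example by periodically anchoring the latest hash in a separate system.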

Setup: identity and least‑privilege tokens

The first layer of protection is the authentication flow. Engineers and automated jobs obtain short‑lived OIDC tokens from your identity provider (Okta, Azure AD, Google Workspace, etc.). hoop.dev acts as the relying party, validates the token, and extracts group membership to decide which embedding policies apply. The token itself carries only the scopes needed for the job: write‑only access to the vector store and no read access to raw patient records. This setup makes each request attributable to a specific user, service account, or CI job, but on its own it does not block unauthorized data from being written.
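Once a token has been validated, mapping its claims to policies is a simple lookup. The Python sketch below assumes validation (signature, issuer, audience, expiry) has already happened and works only on the resulting claims. The `groups` claim name, the group names, and the policy fields are hypothetical; identity providers differ in how they expose group membership.

```python
# Hypothetical mapping from identity-provider groups to embedding policies.
GROUP_POLICIES = {
    "ml-pipelines": {"scopes": {"embeddings:write"}, "masking": "strict"},
    "clinical-research": {
        "scopes": {"embeddings:write"},
        "masking": "strict",
        "requires_approval": True,
    },
}


def policies_for(claims):
    """Return the policies that apply to already-validated token claims."""
    groups = claims.get("groups", [])
    return [GROUP_POLICIES[g] for g in groups if g in GROUP_POLICIES]


def authorize(claims, requested_scope):
    """Allow only if some applicable policy grants the requested scope."""
    if not claims.get("sub"):
        raise PermissionError("token has no subject; request is not attributable")
    for policy in policies_for(claims):
        if requested_scope in policy["scopes"]:
            return policy
    raise PermissionError(f"{claims['sub']} may not use scope {requested_scope}")
```

Unknown groups are ignored rather than treated as errors, so the default for a caller with no recognized group is denial.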

The data path: hoop.dev as the enforcement boundary

All traffic from the embedding service to the vector store passes through hoop.dev. Because the gateway operates at the protocol layer, it can inspect the payload, apply masking rules, and enforce approval workflows before any data reaches the target. No other component in the stack can alter these decisions without changing the gateway configuration, making the data path the only place where enforcement can happen.
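The ordering of those steps is the important part, and it can be sketched as a single function. In this Python illustration the four callables (`authenticate`, `mask`, `forward`, `record`) are hypothetical stand-ins for gateway components, not hoop.dev APIs. The point is that the downstream store is only ever called with masked text, and that both allowed and denied requests leave a record.

```python
def handle_request(request, *, authenticate, mask, forward, record):
    """Run one request through the enforcement steps in a fixed order."""
    try:
        subject = authenticate(request["token"])   # 1. establish identity
    except PermissionError as err:
        record({"subject": None, "outcome": f"denied: {err}"})
        raise
    masked_text, actions = mask(request["text"])   # 2. redact before forwarding
    response = forward(masked_text)                # 3. only masked data goes downstream
    record({"subject": subject, "masking": actions, "outcome": "allowed"})  # 4. audit
    return response


# Usage with trivial stand-ins for each component.
def demo_authenticate(token):
    if token != "good":
        raise PermissionError("invalid token")
    return "ci-job"


stored, audit = [], []
handle_request(
    {"token": "good", "text": "SSN 123-45-6789"},
    authenticate=demo_authenticate,
    mask=lambda t: (t.replace("123-45-6789", "[SSN]"), {"SSN": 1}),
    forward=lambda t: stored.append(t) or "stored",
    record=audit.append,
)
print(stored)  # ['SSN [SSN]']
```

Because the embedding service never calls `forward` directly, there is no code path that writes unmasked text, which is the property the data-path design is meant to guarantee.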

Enforcement outcomes you can demonstrate

Session recording: hoop.dev captures the full request and response stream, providing a replayable record for auditors.
Inline masking: hoop.dev redacts PHI fields in real time and logs the masking action.
Just‑in‑time approval: High‑risk embeddings trigger an approval step that hoop.dev logs with the request.
Identity‑bound audit logs: Every entry includes the verified OIDC subject, the policy applied, and the outcome.

These outcomes exist only because hoop.dev sits in the data path; removing it would eliminate the masking, approval, and recording capabilities.

Getting started

To add hoop.dev to your embeddings pipeline, start with the quick‑start guide that walks you through deploying the gateway, configuring OIDC authentication, and defining a masking policy for PHI. The documentation explains how to register your vector store as a connection and how to grant the gateway the minimal credentials it needs to write embeddings.

For detailed steps, see the getting‑started guide and the broader learn section. The source code and example configurations are available in the public repository.

FAQ

Does hoop.dev store PHI itself?

No. hoop.dev only proxies traffic and records metadata about the request.

Can I use hoop.dev with existing vector store credentials?

Yes. The gateway holds the credential securely, and the embedding job never sees it. You simply point the connection definition to the existing store and let hoop.dev enforce the policies.

What evidence does hoop.dev generate for an audit?

hoop.dev produces immutable logs that include the caller identity, request payload, masking actions, any approval decisions, and a replayable session record. These artifacts map to the evidence regulators expect for PHI handling: authenticated identity, minimum necessary exposure, recorded approvals, and reproducible logs.

Ready to see the code in action? Explore the hoop.dev repository on GitHub and start building PHI‑compliant embeddings today.
