Data Classification for Code Execution

Running untrusted code against production data is a recipe for data leakage.

Code execution platforms, whether they host serverless functions, CI pipelines, or interactive notebooks, often grant programs direct read/write access to databases, caches, and internal APIs. When developers or automated agents execute queries without clear boundaries, they can inadvertently expose personally identifiable information, trade secrets, or regulated data. Simply labeling a column as "PII" or tagging a table as "confidential" does not stop a script from dumping the contents to an external bucket, printing them in logs, or sending them over an insecure channel.

Data classification is the process of assigning a sensitivity level to each data element, such as public, internal, confidential, or restricted. The classification informs who should see the data, how it may be transmitted, and what safeguards are required. However, without a runtime enforcement point, classification remains a documentation exercise. The real challenge is bridging the gap between static labels and dynamic execution.

Why data classification matters for code execution

When a function reads a "restricted" field, the system must decide in real time whether the caller is authorized to view that value. If the decision is made only at the authentication layer, the function can still retrieve the raw value and decide later to mask or drop it; by then, the data may already have been logged or exfiltrated. Embedding classification checks directly in the code base is error‑prone: developers must remember to add guards around every query, and any new library or third‑party script can bypass those checks.

Effective protection therefore requires three ingredients:

Clear taxonomy. Define classification levels and map them to concrete policies (e.g., "restricted" data must be masked in responses, and any write operation requires multi‑person approval).
Identity‑aware enforcement. The enforcement point must know who is invoking the code and what classification level the data carries.
Audit and replay. Every access should be recorded so that compliance teams can prove who saw what and when.
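
The three ingredients above can be sketched together in a few lines. This is a minimal illustration, not any product's API: the `Level` enum, the `AuditLog` class, and the `authorize` function are hypothetical names, and the rule "clearance must be at least the data's level" is an assumed policy chosen for simplicity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Level(IntEnum):
    """Sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass
class AuditLog:
    """In-memory stand-in for a durable audit store (assumption)."""
    entries: list = field(default_factory=list)

    def record(self, caller: str, resource: str, level: Level, allowed: bool) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "caller": caller,
            "resource": resource,
            "level": level.name,
            "allowed": allowed,
        })


def authorize(caller: str, clearance: Level, resource: str,
              level: Level, log: AuditLog) -> bool:
    """Allow access only when the caller's clearance covers the data's level."""
    allowed = clearance >= level
    log.record(caller, resource, level, allowed)  # denials are recorded too
    return allowed
```

Recording denied attempts as well as granted ones is a deliberate choice in this sketch, since a compliance review usually needs to see both.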

Enforcing classification with a gateway

Placing a Layer 7 gateway between the executor and the target resource creates a single place where all three ingredients can be applied. The gateway sits in the data path, intercepting protocol traffic (SQL, HTTP, SSH, etc.) before it reaches the backend. Because the gateway controls the flow, it can read the classification metadata attached to tables or fields and apply the appropriate policy in real time.

hoop.dev implements exactly this pattern. It authenticates users and service accounts via OIDC or SAML, then uses group membership and token claims to decide which classification levels a caller may access. When a request for a "restricted" column arrives, hoop.dev can:

Mask the sensitive column in the response, ensuring the executor never sees the raw value.
Block a dangerous command such as a DROP TABLE that would affect confidential data before it reaches the database.
Route the request to a human approver when the operation exceeds a predefined risk threshold.
Record the entire session, including the masked output, for later audit and replay.

All of these actions happen in the gateway, not in the code that initiates the request. This separation means that even a compromised executor cannot bypass the controls, because the executor never holds the underlying credentials and never sees unmasked data.
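
To make the idea of gateway-side masking and command blocking concrete, here is a rough sketch. It does not reflect how hoop.dev is implemented or configured: `COLUMN_LABELS`, `check_statement`, and `mask_rows` are hypothetical, and the regular expression is a deliberately naive stand-in for real protocol parsing.

```python
import re

# Hypothetical classification metadata: column name -> label.
COLUMN_LABELS = {"email": "confidential", "ssn": "restricted"}
MASKED_LABELS = {"confidential", "restricted"}

# Naive pattern for illustration; a real gateway would parse the statement.
BLOCKED = re.compile(r"^\s*(drop|truncate)\s+table\b", re.IGNORECASE)


def check_statement(sql: str) -> None:
    """Reject destructive statements before they are forwarded to the backend."""
    if BLOCKED.search(sql):
        raise PermissionError(f"blocked by policy: {sql.strip()}")


def mask_rows(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with sensitive columns replaced by a placeholder."""
    return [
        {col: "****" if COLUMN_LABELS.get(col) in MASKED_LABELS else val
         for col, val in row.items()}
        for row in rows
    ]
```

Because both functions run between the caller and the database, the calling code only ever receives the output of `mask_rows` and never has a chance to skip the check.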

To adopt this approach, teams should start by cataloguing their data assets and assigning classification levels. Next, map each level to a policy profile in the gateway, e.g., "public" data passes unchanged, "internal" data is logged, "confidential" data is masked, and "restricted" data triggers approval. Finally, configure the gateway to enforce those profiles on the relevant connection types (PostgreSQL, MySQL, HTTP APIs, etc.). The getting‑started guide walks through deploying the gateway and registering a connection, while the learn section provides deeper examples of masking and approval workflows.
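
The label-to-profile mapping described above could be written down as a small lookup table. The sketch below is illustrative only: `Action`, `PROFILES`, and `action_for` are hypothetical names rather than gateway configuration keys, and treating unknown labels as the strictest case is an assumption added here, not something the mapping above specifies.

```python
from enum import Enum


class Action(Enum):
    PASS = "pass"
    LOG = "log"
    MASK = "mask"
    REQUIRE_APPROVAL = "require_approval"


# Mirrors the example mapping: public passes, internal is logged,
# confidential is masked, restricted triggers approval.
PROFILES = {
    "public": Action.PASS,
    "internal": Action.LOG,
    "confidential": Action.MASK,
    "restricted": Action.REQUIRE_APPROVAL,
}


def action_for(label: str) -> Action:
    """Look up the action for a label; unknown labels get the strictest action."""
    return PROFILES.get(label.lower(), Action.REQUIRE_APPROVAL)
```

Failing closed on unrecognized labels means a newly added table that nobody has classified yet is held for approval instead of passing through unchanged.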

Practical steps for teams

Define a classification matrix. List data stores, enumerate tables or endpoints, and assign a sensitivity label.
Translate labels to policies. Decide which actions (read, write, export) are allowed per label and whether masking or approval is required.
Deploy the gateway. Use the provided Docker Compose quickstart or a Kubernetes manifest to run the gateway alongside your resources.
Configure enforcement. In the gateway UI or configuration file, bind each classification level to the desired guardrails (masking, command blocking, JIT approval).
Validate and iterate. Run test workloads and verify that sensitive fields are masked and that audit logs capture the expected details.
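
The first and last steps lend themselves to simple automated checks. The following sketch assumes a classification matrix kept as a plain dictionary and a list of known test values seeded into a staging database; `MATRIX`, `unclassified`, and `leaked_values` are hypothetical helpers, and the store and table names are invented for illustration.

```python
VALID_LABELS = {"public", "internal", "confidential", "restricted"}

# Hypothetical matrix: (data store, table or endpoint) -> sensitivity label.
MATRIX = {
    ("orders-db", "orders"): "internal",
    ("orders-db", "customers"): "confidential",
    ("billing-api", "/v1/cards"): "restricted",
}


def unclassified(inventory, matrix):
    """Return inventory entries that have no valid label in the matrix."""
    return [item for item in inventory if matrix.get(item) not in VALID_LABELS]


def leaked_values(response_text: str, secrets: list[str]) -> list[str]:
    """Return known sensitive test values that appear verbatim in a response."""
    return [s for s in secrets if s in response_text]
```

A check like `unclassified` can run in CI whenever the inventory changes, while `leaked_values` can be applied to the output of test workloads sent through the gateway to confirm that seeded values never come back unmasked.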

Once the gateway is in place, any new code, whether written by a developer, an AI‑assisted tool, or an automated CI job, must pass through the same enforcement layer. This uniformity eliminates the need for per‑application security checks and reduces the risk of accidental data exposure.

In summary, data classification only becomes a protective control when it is enforced at the point where code talks to data. A Layer 7 gateway provides that enforcement surface, and hoop.dev supplies the open‑source implementation that can mask, block, approve, and audit in real time.

Explore the open‑source repository on GitHub: https://github.com/hoophq/hoop
