Putting access controls around GitHub Copilot: database access for AI coding agents (on BigQuery)

When GitHub Copilot writes code that talks to a production data warehouse, the AI agent often runs with a service account that has blanket database access rights. That single credential can surface in logs, be reused by a compromised CI pipeline, and let the model extract sensitive rows without any human oversight. The cost is not just a data leak; it expands the blast radius of a compromised build, forces expensive retroactive remediation, and erodes trust in AI‑assisted development. Organizations that let Copilot query BigQuery directly accept these risks because they lack a gate that can inspect each query, enforce least‑privilege, and record who asked what.

Why database access needs tighter control for AI coding agents

The core problem is a mismatch between identity and authority. The identity that launches a Copilot‑generated script is typically a CI service account, not an individual engineer. That account is granted broad database access so the pipeline can run any migration or analytics job. The setup satisfies the immediate need to keep builds fast, but it leaves three gaps:

  • There is no real‑time review of the SQL that the AI proposes.
  • Sensitive columns such as personally identifiable information or financial figures are returned to the agent unfiltered.
  • Every query runs without a durable audit record tied to a human decision.

These gaps are especially dangerous when the AI model is used to autocomplete queries based on vague prompts. An engineer might unintentionally expose a customer table, and the downstream impact is hard to trace.

What a proper control model looks like

A sound model starts with three pillars:

  1. Setup: Define who can request a database connection. This is done with OIDC or SAML tokens, group membership, and service‑account roles that encode the intent of the request. The setup establishes who the requester is and whether the request may start, but it does not enforce any guardrails on its own.
  2. The data path: Place an enforcement point on the actual traffic between the Copilot‑generated client and BigQuery. The gateway is the only place where commands can be inspected, approved, masked, or recorded.
  3. Enforcement outcomes: Require just‑in‑time approvals before write queries, mask columns that contain regulated data, and log every statement with the originating identity for replay.

Without a gateway in the data path, the setup alone cannot guarantee that a privileged service account does not run an unintended destructive command or exfiltrate a credit‑card column. The enforcement outcomes must be produced where the traffic flows.
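As a rough illustration of how the three pillars combine, the sketch below maps an identity's groups and the kind of statement to one of three outcomes. The group names, the `Request` type, and the keyword list are assumptions made for this example, not part of any real product's policy language, and the first-keyword check is deliberately crude.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical group names; a real deployment would use the groups
# carried in its own OIDC or SAML tokens.
READ_GROUPS = {"ai-code-runner", "data-engineers"}
WRITE_GROUPS = {"data-engineers"}

# Leading keywords treated as data-modifying or schema-changing.
WRITE_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "MERGE",
    "CREATE", "ALTER", "DROP", "TRUNCATE",
}


@dataclass(frozen=True)
class Request:
    identity: str
    groups: frozenset
    sql: str


def is_write(sql: str) -> bool:
    """Crude check on the first keyword; a real gateway would parse the SQL."""
    words = sql.strip().split(None, 1)
    return bool(words) and words[0].upper() in WRITE_KEYWORDS


def decide(req: Request) -> Decision:
    if is_write(req.sql):
        # Writes are never auto-approved: eligible groups still wait for a human.
        if req.groups & WRITE_GROUPS:
            return Decision.NEEDS_APPROVAL
        return Decision.DENY
    return Decision.ALLOW if req.groups & READ_GROUPS else Decision.DENY
```

The point of the sketch is the separation of concerns: the groups come from the setup, while the decision is made per statement, which is only possible at a point that sees the statement itself.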

Introducing hoop.dev as the enforcement layer

hoop.dev implements the data‑path gateway that satisfies the model above. It sits between the AI‑driven client and BigQuery, proxying the wire‑level protocol. Because hoop.dev controls the connection, it can enforce every policy you need:

  • Just‑in‑time approval: When a Copilot script attempts a data‑modifying or schema‑changing operation, hoop.dev pauses the request and routes it to an approver. The approver sees the exact SQL and can grant or deny access in real time.
  • Inline data masking: For SELECT statements that reference columns marked as sensitive, hoop.dev rewrites the response on the fly, redacting or tokenizing the values before they reach the AI agent.
  • Session recording: Every query, along with the identity token that initiated it, is stored in an audit log, so each statement can be traced to the identity that issued it and replayed later.
  • Command blocking: Policies can be defined to reject dangerous patterns, such as statements that drop an entire dataset or scan full tables, preventing accidental data loss and runaway query costs.

All of these outcomes are possible only because hoop.dev is the active component in the data path. If you removed hoop.dev and left the service account to talk directly to BigQuery, none of the approvals, masking, or recordings would happen.
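To make the masking and command-blocking ideas concrete, here is a minimal sketch of what such checks could look like in principle. It is not hoop.dev's implementation or policy syntax: the column names, masking strategies, and regular expressions are all hypothetical, and a production gateway would parse SQL rather than pattern-match it.

```python
import re

# Hypothetical masking policy: column name -> masking strategy.
MASKED_COLUMNS = {"email": "redact", "card_number": "last4"}

# Hypothetical block rules, matched case-insensitively against the statement.
BLOCKED_PATTERNS = [
    re.compile(r"^\s*DROP\s+(SCHEMA|DATABASE)\b", re.IGNORECASE),
    # SELECT * with nothing after the table name, as a rough proxy
    # for an unfiltered full-table scan.
    re.compile(r"^\s*SELECT\s+\*\s+FROM\s+\S+\s*;?\s*$", re.IGNORECASE),
]


def is_blocked(sql: str) -> bool:
    """Return True if the statement matches any block rule."""
    return any(p.search(sql) for p in BLOCKED_PATTERNS)


def mask_value(value, strategy: str):
    """Mask one value; None stays None so NULLs remain distinguishable."""
    if value is None:
        return None
    text = str(value)
    if strategy == "last4":
        return "*" * max(len(text) - 4, 0) + text[-4:]
    return "[REDACTED]"


def mask_rows(rows):
    """Return copies of the rows with sensitive columns masked."""
    return [
        {
            col: (mask_value(val, MASKED_COLUMNS[col]) if col in MASKED_COLUMNS else val)
            for col, val in row.items()
        }
        for row in rows
    ]
```

Columns that do not appear in `MASKED_COLUMNS` pass through unchanged, so in this sketch the masking policy is an explicit list of sensitive fields rather than a default-deny rule.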

How the pieces fit together for Copilot

1. Identity provisioning: Teams configure an OIDC provider (Okta, Azure AD, Google Workspace, etc.) that issues short‑lived tokens to the CI runner. The token carries the group that signals “AI‑generated code runner”.

2. Connection registration: In the hoop.dev UI or API, you register a BigQuery connection, supplying a service‑account key that hoop.dev uses to authenticate to the data warehouse. The key never leaves the gateway.

3. Policy definition: Using the hoop.dev policy language, you declare which groups may run read‑only queries versus write queries, which columns require masking, and the approval workflow for schema changes.

4. Runtime flow: When Copilot produces a snippet that calls the standard BigQuery client, the CI runner points the client at the hoop.dev endpoint. hoop.dev validates the token, checks the policy, records the request, and either forwards it to BigQuery or pauses it for approval. Any masking is applied to the results before they return to the client.

This flow keeps the AI agent from ever seeing raw credentials, ensures that every database interaction is subject to the same governance as a human‑initiated query, and provides a complete audit trail for compliance teams.
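The runtime flow described in step 4 can be sketched as a single request handler. This is an illustrative model only: `Token`, `Gateway`, and the `run_query` callable (standing in for the call to BigQuery made with the gateway-held credential) are invented for the example, the group name is an assumption, and result masking is left out to keep the sketch short.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Token:
    subject: str
    groups: tuple
    expires_at: float  # Unix timestamp


@dataclass
class Gateway:
    allowed_group: str = "ai-code-runner"  # hypothetical group name
    audit_log: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def handle(self, token: Token, sql: str, run_query, now=None):
        now = time.time() if now is None else now
        # 1. Validate the short-lived token and its group membership.
        if now >= token.expires_at or self.allowed_group not in token.groups:
            outcome, result = "rejected", None
        # 2. Anything that is not a plain read waits for approval.
        #    (Prefix matching is crude; a real gateway would parse the SQL.)
        elif not sql.lstrip().upper().startswith(("SELECT", "WITH")):
            self.pending.append((token.subject, sql))
            outcome, result = "pending_approval", None
        # 3. Reads are forwarded; the caller never sees the credential.
        else:
            outcome, result = "forwarded", run_query(sql)
        # 4. Every request is recorded, whatever the outcome.
        self.audit_log.append(
            {"who": token.subject, "sql": sql, "outcome": outcome, "at": now}
        )
        return outcome, result
```

Because the handler writes the audit entry on every branch, rejected and paused requests leave the same kind of trail as forwarded ones.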

Getting started

To try this in your environment, follow the getting‑started guide to deploy the gateway and register a BigQuery connection. The learn section walks through policy creation, approval workflow configuration, and masking rules. The gateway is open source, and you can inspect or extend the codebase on GitHub.

FAQ

Does hoop.dev store my BigQuery credentials?

No. The gateway holds the credential in memory and uses it only to forward approved traffic. The client never receives the secret.

Can I audit who approved a write operation?

Yes. Each approval event is logged with the approver’s identity, the exact SQL, and a timestamp. The session record can be replayed for forensic analysis.
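For readers who want a mental model of such a record, the sketch below shows one possible shape for an approval event carrying the three fields named above, serialized as one JSON line per event. The field names and helper functions are assumptions made for illustration and do not reflect hoop.dev's actual log format.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ApprovalEvent:
    requester: str
    approver: str
    sql: str
    approved: bool
    decided_at: str  # ISO 8601 timestamp in UTC


def record_approval(requester, approver, sql, approved, now=None):
    """Build an event; `now` can be injected to make tests deterministic."""
    now = now or datetime.now(timezone.utc)
    return ApprovalEvent(requester, approver, sql, approved, now.isoformat())


def to_log_line(event: ApprovalEvent) -> str:
    """Serialize with sorted keys so log lines are stable and diffable."""
    return json.dumps(asdict(event), sort_keys=True)


def approvals_by(approver: str, log_lines):
    """Return the decoded events decided by the given approver."""
    events = (json.loads(line) for line in log_lines)
    return [e for e in events if e["approver"] == approver]
```

Keeping the exact SQL in the event is what lets a reviewer later answer "who approved this statement" without correlating several systems.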

What happens if the AI model tries to query a column I didn’t mark for masking?

If the column is not listed in a masking rule, hoop.dev returns the value unchanged. You control which fields are sensitive by updating the masking policy.

Take the next step

Explore the source code, contribute improvements, and see how the community has implemented similar use cases on the GitHub repository.

Open source

Save the open-source gateway for agent data access

Hoop is MIT-licensed infrastructure for controlling how AI agents reach production data. Star hoophq/hoop so you can inspect it, deploy it, or share it when your team starts governing agent access.