A service account with read access to your customer database runs every night and nobody watches it, because it is supposed to run every night. That is exactly the condition data exfiltration thrives in. The hardest exfiltration to catch is the one that uses a non-human identity doing something it is allowed to do, just more of it, or to the wrong place.
Set the abstract threat modeling aside. Operationally, a non-human identity is the channel most likely to move large amounts of data without drawing anyone's attention, because no human is attached to it to watch.
Why machines are the exfiltration path
- Bulk access by design. Pipelines and reporting agents read large volumes legitimately, so a large read does not look anomalous.
- No watcher. A person notices their own account doing something odd. A service account has no one looking at its behavior in real time.
- Standing reach to sensitive data. The same credential that powers a dashboard can dump the table behind it.
The exfiltration risk of a non-human identity comes down to two things: how much sensitive data its credential can read, and how unlikely it is that anyone notices the read while it happens.
Three controls against data exfiltration
- Mask sensitive fields in transit. If the data leaving the database is redacted before it reaches the client, an exfiltration attempt pulls masked values, not raw card numbers or PII.
- Record every command per principal. A bulk read becomes visible after the fact, tied to the exact identity and statement, so detection and response have something to work from.
- Scope and time-box access. A credential that can only reach what it needs, only when it needs it, cannot be repurposed to dump everything later.
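The second and third controls can be sketched in a few lines. This is a minimal illustration, not any product's API: the names Grant, AuditRecord, rows_by_principal and flag_bulk_readers are hypothetical, and the "ten times the baseline" threshold is an assumption chosen only to make the example concrete.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """Hypothetical scoped, time-boxed grant for one principal."""
    principal: str
    tables: frozenset
    expires_at: datetime

    def allows(self, table, now):
        # Access is valid only for listed tables and only before expiry.
        return table in self.tables and now < self.expires_at


@dataclass(frozen=True)
class AuditRecord:
    """One recorded command, tied to the identity that issued it."""
    principal: str
    statement: str
    rows_returned: int
    at: datetime


def rows_by_principal(records):
    """Sum the rows returned to each principal across all records."""
    totals = defaultdict(int)
    for record in records:
        totals[record.principal] += record.rows_returned
    return dict(totals)


def flag_bulk_readers(records, baseline, factor=10):
    """Return principals whose total rows exceed factor x their baseline.

    A principal with no baseline entry is flagged on any non-zero read.
    """
    totals = rows_by_principal(records)
    return sorted(
        p for p, n in totals.items() if n > factor * baseline.get(p, 0)
    )


# Example: a one-hour grant on a single table.
now = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
grant = Grant("svc-reporting", frozenset({"orders"}), now + timedelta(hours=1))
in_scope = grant.allows("orders", now)         # inside scope and window
out_of_scope = grant.allows("customers", now)  # table not in the grant
```

Because every record carries the principal and the statement, a review after the fact can start from "which identity read far more than usual" and go straight to the exact query.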
Enforce these on the connection
The architectural requirement: masking and recording have to happen on the access path, between the identity and the data, not inside the workload, which may itself be the thing exfiltrating. A control the actor can bypass is not a control against that actor.
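To make the placement concrete, here is a toy sketch of a control that sits on the access path. It assumes an in-process Gateway class wrapping a SQLite connection; the class, the regular expressions and the placeholder tokens are all hypothetical. A real gateway would run as a separate process so the workload never holds the database connection at all, and real classification would be far more thorough than two regexes.

```python
import re
import sqlite3
from datetime import datetime, timezone

# Deliberately simple illustrative patterns, not production-grade detection.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_value(value):
    """Redact card-like and email-like substrings in string values."""
    if not isinstance(value, str):
        return value  # non-string values pass through unchanged in this sketch
    value = CARD_RE.sub("[CARD]", value)
    return EMAIL_RE.sub("[EMAIL]", value)


class Gateway:
    """Hypothetical access-path control: the workload calls query() and
    only ever receives masked rows; every statement is recorded first."""

    def __init__(self, conn):
        self._conn = conn
        self.audit_log = []

    def query(self, principal, statement, params=()):
        rows = self._conn.execute(statement, params).fetchall()
        # Record who ran what, and how much came back, before returning.
        self.audit_log.append({
            "principal": principal,
            "statement": statement,
            "rows_returned": len(rows),
            "at": datetime.now(timezone.utc),
        })
        # Masking happens here, so the caller cannot opt out of it.
        return [tuple(mask_value(v) for v in row) for row in rows]


# Example: an in-memory database with one sensitive row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT, card TEXT)")
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?)",
    ("Ada", "ada@example.com", "4111 1111 1111 1111"),
)
gateway = Gateway(conn)
masked_rows = gateway.query("svc-reporting", "SELECT * FROM customers")
```

The design point is that masking and logging live in query(), not in the caller. A workload that wants raw values has no code path that returns them, and its attempt is still recorded under its own name.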
hoop.dev is an open-source access gateway that sits between identities and infrastructure. A non-human identity reaches a database through it. Sensitive fields can be masked in the result stream before they reach the client, using a configured data classification provider, and every command is recorded under a named principal. A reporting agent reads what it needs with PII redacted, and a query that tries to pull raw sensitive data at volume is both masked and logged. Data exfiltration through that path is far harder to do quietly and far easier to catch. Masking is configured per connection; see the getting-started guide for setup.
