Prerequisites

  • hoop.dev installed through one of the options in the deployment overview
  • An Enterprise plan
  • Enough access to your infrastructure to load environment variables into your hoop.dev instance
  • Admin access to your hoop.dev instance

This page covers setting up Live Data Masking on self-hosted instances. For usage guidance, see the Live Data Masking Learn Guides.

Setup

This service currently supports Microsoft Presidio for data classification and PII detection. Google Cloud Data Loss Prevention (DLP) is still available for existing customers but is deprecated for new installations.
You must be on an Enterprise plan to have full access to the Live Data Masking feature.

Microsoft Presidio

1. Install

Install Microsoft Presidio by following its documentation for your platform:
  • Docker: see the Microsoft Presidio documentation for installing with Docker.
  • Kubernetes: see the Microsoft Presidio documentation for installing with Kubernetes.

2. Set up

Set the following environment variables in hoop.dev’s Gateway:

Environment variable key     Value
DLP_PROVIDER                 mspresidio
DLP_MODE                     best-effort or strict
MSPRESIDIO_ANALYZER_URL      <host-to-analyzer:port>
MSPRESIDIO_ANONYMIZER_URL    <host-to-anonymizer:port>

3. Run hoop.dev's Gateway with the new configs

After setting up the environment variables, hoop.dev uses Microsoft Presidio to mask sensitive data in real time in the data stream of any resource role you configure.
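
Before restarting the Gateway, it can help to sanity-check the variables from step 2. The following is a minimal sketch and not part of hoop.dev: the function name `check_dlp_env` and the rule that each URL must be non-empty are assumptions for illustration. Only the variable names and allowed values come from the setup table.

```python
import os

# Variable names and allowed values are taken from the setup table;
# the checks themselves are illustrative assumptions, not hoop.dev behavior.
REQUIRED_URLS = ("MSPRESIDIO_ANALYZER_URL", "MSPRESIDIO_ANONYMIZER_URL")
ALLOWED_MODES = ("best-effort", "strict")


def check_dlp_env(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if env.get("DLP_PROVIDER") != "mspresidio":
        problems.append("DLP_PROVIDER should be 'mspresidio'")
    # DLP_MODE may be omitted because best-effort is documented as the default.
    mode = env.get("DLP_MODE", "best-effort")
    if mode not in ALLOWED_MODES:
        problems.append(f"DLP_MODE must be one of {ALLOWED_MODES}, got {mode!r}")
    for key in REQUIRED_URLS:
        if not env.get(key):
            problems.append(f"{key} is not set")
    return problems


if __name__ == "__main__":
    # Check the environment of the current shell and list any problems.
    for problem in check_dlp_env(os.environ):
        print(problem)
```

Running the script in the same environment as the Gateway prints one line per problem and nothing when the configuration looks complete.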

Google Cloud Data Loss Prevention (DLP)

Create an account at Google Cloud Data Loss Prevention and a service account with the roles/dlp.user role. When installing hoop.dev, set the environment variable GOOGLE_APPLICATION_CREDENTIALS_JSON in hoop.dev’s Gateway to your GCP DLP credentials. hoop.dev uses Google Cloud’s DLP at its protocol layer to mask sensitive data in real time in the data stream of any resource role you configure.
Google Cloud Data Loss Prevention (DLP) is still available for existing customers but is deprecated for new installations.

Redact Modes

The gateway now supports two operational modes that control how redaction failures are handled. Configure your preferred mode by setting the environment variable DLP_MODE to either strict or best-effort.

best-effort

This is the default mode. The gateway redacts the content, but if it encounters an error during redaction, it continues to operate without disruption.
  • DLP_MODE=best-effort

strict

In this mode, the gateway returns an error if it encounters any redaction issue.
  • DLP_MODE=strict
Start with best-effort to avoid blocking legitimate queries while you tune which fields are masked, then move to strict once detection is dialed in.
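
The difference between the two modes can be modeled as below. This is a sketch of the documented behavior, not the gateway's code: `apply_redaction`, `RedactionError`, and the `redact` callable are hypothetical names, and it assumes that "continuing to operate" in best-effort means the content is passed through unmasked when redaction fails.

```python
class RedactionError(Exception):
    """Hypothetical error raised when the DLP provider cannot redact."""


def apply_redaction(text, redact, mode="best-effort"):
    """Model how DLP_MODE changes the handling of a redaction failure."""
    if mode not in ("best-effort", "strict"):
        raise ValueError(f"unknown DLP_MODE: {mode!r}")
    try:
        return redact(text)
    except RedactionError:
        if mode == "strict":
            # strict: surface the failure instead of returning content.
            raise
        # best-effort (assumed behavior): keep the session working and
        # return the content as-is, which may include unmasked data.
        return text
```

If that assumption holds, best-effort favors availability over protection while the provider is failing, which is one reason to move to strict once detection is tuned.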

Activate on your resource roles

In the Web App, open the Discover section in the main sidebar and select AI Data Masking. From there, activate masking on each resource role you want to protect, and use the Configure button to choose which fields are masked. A set of default, most-used fields is enabled automatically, so masking works as soon as you activate a resource role. You can add or remove fields at any time.

Detected data types

Once activated, Live Data Masking detects a wide range of sensitive data out of the box, grouped into the categories below.

Personal Information

Type                Example             Masked As
Person Name         John Smith          [PERSON]
Email Address       john@example.com    [EMAIL]
Phone Number        555-123-4567        [PHONE]
Physical Address    123 Main St         [ADDRESS]

Government IDs

Type                Example        Masked As
SSN (US)            123-45-6789    [SSN]
Passport Number     AB1234567      [PASSPORT]
Driver’s License    D1234567       [LICENSE]

Financial Data

Type            Example                   Masked As
Credit Card     4111-1111-1111-1111       [CREDIT_CARD]
Bank Account    123456789012              [BANK_ACCOUNT]
IBAN            GB82WEST12345698765432    [IBAN]

Credentials

Type        Example            Masked As
API Key     sk_live_abc123…    [API_KEY]
Password    password123        [PASSWORD]
AWS Key     AKIA…              [AWS_KEY]

Health Information

Type              Example      Masked As
Medical Record    MRN-12345    [MEDICAL_RECORD]
Health Plan ID    HPL-98765    [HEALTH_ID]
For the complete, provider-specific list, see Supported Fields.

Troubleshooting

Data Not Being Masked

Check:
  1. Live Data Masking is enabled on the resource role
  2. DLP provider is running and accessible
  3. Gateway environment variables are set correctly
  4. The data type is in the supported fields list
Test the DLP provider directly:
curl -X POST http://presidio-analyzer:5001/analyze \
  -H "Content-Type: application/json" \
  -d '{"text": "John Smith, SSN 123-45-6789", "language": "en"}'
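
To see which detectors fired, the analyzer response can be summarized in a few lines. The sketch below assumes the analyzer is reachable at the same `http://presidio-analyzer:5001` address used in the curl command and that each result carries `entity_type` and `score` fields, as in Presidio's REST API. `analyze` and `detected_types` are hypothetical helper names.

```python
import json
import urllib.request


def analyze(text, base_url="http://presidio-analyzer:5001"):
    """POST text to the analyzer's /analyze endpoint and return parsed JSON."""
    body = json.dumps({"text": text, "language": "en"}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def detected_types(findings, min_score=0.0):
    """Return the sorted entity types whose score is at least min_score."""
    return sorted({f["entity_type"] for f in findings if f["score"] >= min_score})
```

Calling `detected_types(analyze("John Smith, SSN 123-45-6789"))` lists the entity types the provider recognized. An empty list for text that clearly contains sensitive values suggests the problem is in the provider rather than in the Gateway configuration.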

Too Much Data Being Masked

If legitimate data is being masked incorrectly:
  1. Identify which field type is triggering the mask
  2. Disable that specific field type in the configuration
  3. Alternatively, use Guardrails for more precise control

Performance Impact

Live Data Masking adds latency to query results:
Result Size      Typical Latency
< 100 rows       50-100 ms
100-1000 rows    100-500 ms
> 1000 rows      500 ms+
To reduce latency:
  • Use LIMIT clauses in queries
  • Select only needed columns (avoid SELECT *)
  • Consider disabling masking for high-volume analytics

How it works

At the protocol layer, when communicating with a database or server, hoop.dev inspects the packets and communicates with the DLP provider to mask sensitive data in them. This happens in memory and in real time, so the masking step does not store the data in its original form. After setup, a set of default, most-used fields is enabled automatically, and you can add or remove fields at any time. To see all available fields, check the documentation page for supported fields.
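
The masking step can be pictured with a small sketch. It is not hoop.dev's implementation: `mask_spans` is a hypothetical helper, the findings list is hand-written sample data in the shape of analyzer results (`entity_type`, `start`, `end`), and the placeholder format simply reuses the entity type.

```python
def mask_spans(text, findings):
    """Replace each detected span with a [TYPE] placeholder.

    Spans are applied from right to left so earlier offsets stay valid.
    A span that overlaps one already masked is skipped.
    """
    masked = text
    boundary = len(text)  # start of the leftmost span masked so far
    for f in sorted(findings, key=lambda f: f["start"], reverse=True):
        if f["end"] > boundary:
            continue
        masked = masked[: f["start"]] + f"[{f['entity_type']}]" + masked[f["end"]:]
        boundary = f["start"]
    return masked


# Hand-written sample findings, not real provider output.
findings = [
    {"entity_type": "PERSON", "start": 0, "end": 10},
    {"entity_type": "US_SSN", "start": 16, "end": 27},
]
print(mask_spans("John Smith, SSN 123-45-6789", findings))
# prints: [PERSON], SSN [US_SSN]
```

Working from the end of the string backwards avoids recomputing offsets after each replacement, since a placeholder rarely has the same length as the text it replaces.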