Prerequisites
- hoop.dev installed through one of the options in the deployment overview
- An Enterprise plan
- Enough access to your infrastructure to load environment variables into your hoop.dev instance
- Admin access to your hoop.dev instance
This page covers setting up Live Data Masking in self-hosted instances. If you are looking for the Live Data Masking Learn Guides, see the Learn Guides section instead.
Setup
This service currently supports Microsoft Presidio for data classification and PII detection. Google Cloud Data Loss Prevention (DLP) is still available for existing customers but is deprecated for new installations. You must be on an Enterprise plan to have full access to the Live Data Masking feature.
Microsoft Presidio
Install
Check the Microsoft Presidio documentation to install it.
Microsoft Presidio Docker Installation
Visit the Microsoft Presidio documentation to install it using Docker.
Microsoft Presidio Kubernetes Installation
Visit the Microsoft Presidio documentation to install it using Kubernetes.
Set up
Set the following environment variables on the hoop.dev Gateway:
| Environment variable key | Value |
|---|---|
| DLP_PROVIDER | mspresidio |
| DLP_MODE | best-effort or strict |
| MSPRESIDIO_ANALYZER_URL | <host-to-analyzer:port> |
| MSPRESIDIO_ANONYMIZER_URL | <host-to-anonymizer:port> |
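Before restarting the Gateway, it can help to check that these variables are present and hold allowed values. The sketch below is a hypothetical pre-flight helper, not part of hoop.dev: the variable names and accepted values come from the table above, and treating an unset DLP_MODE as valid relies on best-effort being the documented default.

```python
# Hypothetical pre-flight check for the Gateway's Live Data Masking settings.
# Variable names and allowed values come from the table above; check_dlp_env()
# itself is an illustration and is not shipped with hoop.dev.
from typing import Mapping

VALID_MODES = {"best-effort", "strict"}
REQUIRED_URLS = ("MSPRESIDIO_ANALYZER_URL", "MSPRESIDIO_ANONYMIZER_URL")


def check_dlp_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems found; an empty list means the settings look complete."""
    problems = []
    if env.get("DLP_PROVIDER") != "mspresidio":
        problems.append("DLP_PROVIDER must be set to 'mspresidio'")
    # best-effort is the documented default, so an unset DLP_MODE is accepted.
    mode = env.get("DLP_MODE", "best-effort")
    if mode not in VALID_MODES:
        problems.append(f"DLP_MODE must be 'best-effort' or 'strict', got {mode!r}")
    # Only presence is checked; the exact URL format the Gateway expects is not assumed.
    for key in REQUIRED_URLS:
        if not env.get(key, "").strip():
            problems.append(f"{key} is not set")
    return problems
```

On the Gateway host you could call check_dlp_env(os.environ) and print whatever it returns; an empty list means every required variable is set.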
Google Cloud Data Loss Prevention (DLP)
Create a Google Cloud Data Loss Prevention account and a service account with the roles/dlp.user role.
When installing hoop.dev, set the GOOGLE_APPLICATION_CREDENTIALS_JSON environment variable on hoop.dev's Gateway to your GCP DLP service account credentials in JSON format. Hoop.dev uses Google Cloud DLP at its protocol layer to mask sensitive data in real time in the data stream of each resource role you configure.
Google Cloud Data Loss Prevention (DLP) is still available for existing customers but is deprecated for new installations.
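If your deployment tooling expects single-line environment variable values, a service account key file can be compacted before it is stored in GOOGLE_APPLICATION_CREDENTIALS_JSON. This is a minimal sketch under that assumption; compact_credentials() is a hypothetical helper, and the only check it performs is the "type": "service_account" field found in Google Cloud service account key files.

```python
# Hypothetical helper: turns the contents of a Google Cloud service account key
# file into compact, single-line JSON for GOOGLE_APPLICATION_CREDENTIALS_JSON.
import json


def compact_credentials(raw_json: str) -> str:
    """Validate a service account key and re-serialize it without whitespace."""
    data = json.loads(raw_json)
    # Service account key files downloaded from Google Cloud carry "type": "service_account".
    if data.get("type") != "service_account":
        raise ValueError("input does not look like a service account key")
    # Compact separators drop indentation; newlines inside string values
    # (such as private_key) are re-escaped as \n, so the result is one line.
    return json.dumps(data, separators=(",", ":"))
```

For example, compact_credentials(open("key.json").read()) returns the value to store, where key.json stands in for your downloaded key file.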
Redact Modes
The gateway supports two operational modes that control how redaction failures are handled. Choose a mode by setting the DLP_MODE environment variable to either strict or best-effort.
best-effort
This is the default mode. The gateway redacts content as usual, but if an error occurs during redaction, it keeps operating without disruption.
DLP_MODE=best-effort
strict
In this mode, the gateway returns an error whenever redaction fails.
DLP_MODE=strict
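The difference between the two modes can be pictured as a single decision on the failure path. The sketch below is conceptual only and does not reflect how the hoop.dev gateway is implemented: apply_mode() and RedactionError are hypothetical, and returning the content unchanged in best-effort mode is an assumption about what "continues operating" means, so verify the actual behavior before relying on it for sensitive data.

```python
# Conceptual sketch of DLP_MODE semantics. apply_mode() and RedactionError are
# hypothetical; the best-effort pass-through below is an ASSUMPTION, not
# confirmed hoop.dev behavior.
from typing import Callable


class RedactionError(Exception):
    """Hypothetical error raised when the DLP provider cannot redact content."""


def apply_mode(mode: str, content: str, redact: Callable[[str], str]) -> str:
    """Redact content, handling failures according to the given mode."""
    try:
        return redact(content)
    except RedactionError:
        if mode == "strict":
            # strict: surface the failure to the caller instead of returning content.
            raise
        # best-effort (default): keep the session going; here we assume the
        # content is passed through without redaction.
        return content
```

The trade-off is availability versus guaranteed masking: strict never lets content through when redaction fails, while best-effort avoids interrupting sessions.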
Activate on your resource roles
In the Web App, open the Discover section in the main sidebar and select AI Data Masking. From there, activate masking on each resource role you want to protect, and use the Configure button to choose which fields are masked. A set of default, commonly used fields is enabled automatically, so masking works as soon as you activate a resource role. You can add or remove fields at any time.
Detected data types
Once activated, Live Data Masking detects a wide range of sensitive data out of the box, grouped into the categories below.
Personal Information
| Type | Example | Masked As |
|---|---|---|
| Person Name | John Smith | [PERSON] |
| Email Address | john@example.com | [EMAIL] |
| Phone Number | 555-123-4567 | [PHONE] |
| Physical Address | 123 Main St | [ADDRESS] |
Government IDs
| Type | Example | Masked As |
|---|---|---|
| SSN (US) | 123-45-6789 | [SSN] |
| Passport Number | AB1234567 | [PASSPORT] |
| Driver’s License | D1234567 | [LICENSE] |
Financial Data
| Type | Example | Masked As |
|---|---|---|
| Credit Card | 4111-1111-1111-1111 | [CREDIT_CARD] |
| Bank Account | 123456789012 | [BANK_ACCOUNT] |
| IBAN | GB82WEST12345698765432 | [IBAN] |
Credentials
| Type | Example | Masked As |
|---|---|---|
| API Key | sk_live_abc123… | [API_KEY] |
| Password | password123 | [PASSWORD] |
| AWS Key | AKIA… | [AWS_KEY] |
Health Information
| Type | Example | Masked As |
|---|---|---|
| Medical Record | MRN-12345 | [MEDICAL_RECORD] |
| Health Plan ID | HPL-98765 | [HEALTH_ID] |
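The "Masked As" column describes placeholder substitution: each detected value is replaced by a label for its type, and field types that are disabled in the resource role's configuration are left visible. The sketch below illustrates that idea under stated assumptions: detection has already produced non-overlapping (entity_type, start, end) spans, similar in shape to what Microsoft Presidio's analyzer reports, and the LABELS mapping, Span, and mask() are hypothetical rather than hoop.dev's code.

```python
# Sketch of placeholder masking in the style of the tables above.
# Assumes non-overlapping spans from a detector; LABELS, Span, and mask()
# are illustrative and not hoop.dev's implementation.
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


# Hypothetical mapping from detector entity types to placeholders.
LABELS = {
    "PERSON": "[PERSON]",
    "EMAIL_ADDRESS": "[EMAIL]",
    "PHONE_NUMBER": "[PHONE]",
    "US_SSN": "[SSN]",
    "CREDIT_CARD": "[CREDIT_CARD]",
}


def mask(text: str, spans: Iterable[Span], enabled: set[str]) -> str:
    """Replace each span whose type is enabled with its placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if span.entity_type not in enabled:
            continue  # field type disabled in configuration: leave it visible
        label = LABELS.get(span.entity_type, f"[{span.entity_type}]")
        text = text[: span.start] + label + text[span.end :]
    return text
```

For example, masking "John Smith <john@example.com>" with both PERSON and EMAIL_ADDRESS enabled yields "[PERSON] <[EMAIL]>"; removing PERSON from the enabled set leaves the name visible.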
Troubleshooting
Data Not Being Masked
Check that:
- Live Data Masking is enabled on the resource role
- The DLP provider is running and reachable from the gateway
- The gateway environment variables are set correctly
- The data type is in the supported fields list
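To confirm the DLP provider is running and reachable, you can probe the configured Presidio URLs from the gateway's network. This is a minimal sketch built on two assumptions: each Presidio service answers GET /health (the Presidio analyzer and anonymizer REST apps expose this route), and a value without a scheme means plain http://. health_url() and is_reachable() are hypothetical helpers.

```python
# Reachability check for the Presidio services the gateway is configured to use.
# Assumptions: each service answers GET /health, and a URL without a scheme
# means http://. health_url() and is_reachable() are hypothetical helpers.
import os
import urllib.request


def health_url(base: str) -> str:
    """Return <base>/health, prefixing http:// when no scheme is present."""
    base = base.strip().rstrip("/")
    if "://" not in base:
        base = "http://" + base
    return base + "/health"


def is_reachable(base: str, timeout: float = 3.0) -> bool:
    """True if the health endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(health_url(base), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    for key in ("MSPRESIDIO_ANALYZER_URL", "MSPRESIDIO_ANONYMIZER_URL"):
        value = os.environ.get(key, "")
        print(f"{key}={value!r} reachable={is_reachable(value)}")
```

Running it from the same network as the gateway tests the path the gateway itself uses; a False result for either variable points at the service, the network, or the variable's value.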
Too Much Data Being Masked
If legitimate data is being masked incorrectly:
- Check which field type is triggering the masking
- Disable that field type in the resource role's masking configuration
- Alternatively, use Guardrails for more precise control
Performance Impact
Live Data Masking adds latency to query results:
| Result Size | Typical Added Latency |
|---|---|
| < 100 rows | 50-100 ms |
| 100-1000 rows | 100-500 ms |
| > 1000 rows | 500 ms+ |
To reduce the impact:
- Use LIMIT clauses in queries
- Select only the columns you need (avoid SELECT *)
- Consider disabling masking for high-volume analytics