All posts

BigQuery Data Masking in SQLPlus: A Simple Guide to Protect Sensitive Data

Data security is a critical aspect of any modern data system. Masking sensitive information plays a central role in protecting user data while enabling teams to work productively. For teams using BigQuery with SQLPlus, implementing data masking effectively can ensure compliance with privacy regulations and limit access to personal or sensitive data. This guide will show you how to use BigQuery to perform data masking in SQL, focusing on practical examples and steps you can apply directly. Wha

Free White Paper

Data Masking (Dynamic / In-Transit) + BigQuery IAM: The Complete Guide

Architecture patterns, implementation strategies, and security best practices. Delivered to your inbox.

Free. No spam. Unsubscribe anytime.

Data security is a critical aspect of any modern data system. Masking sensitive information plays a central role in protecting user data while enabling teams to work productively. For teams using BigQuery with SQLPlus, implementing data masking effectively can ensure compliance with privacy regulations and limit access to personal or sensitive data.

This guide will show you how to use BigQuery to perform data masking in SQL, focusing on practical examples and steps you can apply directly.


What is Data Masking in BigQuery?

Data masking is the process of hiding or obfuscating sensitive data to prevent unauthorized access. Instead of exposing sensitive information, you display de-identified or partial values that retain their usability for analytics or debugging.

For example:

  • A credit card number (4456-1234-5678-9101) could be masked like this: XXXX-XXXX-XXXX-9101.
  • A social security number (123-45-6789) could be masked as XXX-XX-6789.

BigQuery lets you perform this transformation directly within SQL logic. This is essential for maintaining privacy while using shared datasets or controlled environments.


Why Use Data Masking in SQLPlus?

SQLPlus, as a command-line interface for SQL, gives you the flexibility to handle data stored in BigQuery. Masking sensitive data directly at the query level has clear advantages:

  • Simplified Compliance: Meet GDPR, HIPAA, or other data privacy regulation requirements by obfuscating sensitive details.
  • Role-Based Access Control (RBAC): Enforce restrictions by masking data for non-privileged users or teams.
  • Secure Collaboration: Share the same BigQuery datasets across teams without revealing sensitive information.

How to Apply BigQuery Data Masking in SQL

Using SUBSTR for Partial Masking

The SUBSTR function extracts parts of a string. You can combine it with string literals to replace sensitive values with placeholders.

Continue reading? Get the full guide.

Data Masking (Dynamic / In-Transit) + BigQuery IAM: Architecture Patterns & Best Practices

Free. No spam. Unsubscribe anytime.

Example: Masking credit card numbers

SELECT 
 CONCAT("XXXX-XXXX-XXXX-", SUBSTR(credit_card_number, 16)) AS masked_credit_card 
FROM `project.dataset.table`;

What happens here?
- SUBSTR(credit_card_number, 16) extracts the last four digits.
- "XXXX-XXXX-XXXX-" replaces the sensitive part with non-identifiable text.
- CONCAT combines the two into the final masked result.


Masking with CASE Statements

CASE allows you to create conditional masking. This is useful when roles or permissions dictate whether a user sees masked or unmasked data.

Example: Role-based masking

SELECT 
 CASE 
 WHEN user_role = 'admin' THEN credit_card_number
 ELSE CONCAT("XXXX-XXXX-XXXX-", SUBSTR(credit_card_number, 16))
 END AS card_output
FROM `project.dataset.table`;

Using REGEXP_REPLACE for Pattern Masking

The REGEXP_REPLACE function can replace parts of strings based on patterns. This method is incredibly flexible for complex masking requirements.

Example: Masking phone numbers

SELECT 
 REGEXP_REPLACE(phone_number, r"(\d{3})-\d{3}-(\d{4})", "XXX-XXX-\1") AS masked_phone
FROM `project.dataset.table`;

What happens here?
- The REGEXP_REPLACE function identifies the first two groups of digits and replaces them with XXX.
- Remaining digit patterns are left intact.


Additional Considerations for BigQuery Data Masking

  • VIEW vs. TABLE: Instead of permanently masking table data, you can create a view. This ensures the data remains original, while downstream queries access only the masked values.
CREATE OR REPLACE VIEW `project.dataset.masked_view` AS
SELECT 
 column1,
 CONCAT("XXXX-XXXX-XXXX-", SUBSTR(credit_card_number, 16)) AS masked_credit_card
FROM `project.dataset.table`;
  • Dynamic Data Masking: Use BigQuery's Column-Level Security feature alongside masking queries for added control.

See It Live in Minutes

Learning advanced data-handling techniques is one thing, but testing them live brings value. At Hoop.dev, you can preview and execute SQLPlus query transformations in real-time to optimize your BigQuery queries efficiently. Start exploring—get your data masking logic live in just a few clicks!

Get started

See hoop.dev in action

One gateway for every database, container, and AI agent. Deploy in minutes.

Get a demoMore posts