Protecting sensitive information is critical when working with cloud data platforms like Google BigQuery. Ensuring that only authorized users can access sensitive data without disrupting workflows improves both compliance and security. In this guide, we’ll walk through how to combine BigQuery’s data masking features with Keycloak to address privacy concerns while maintaining seamless access control.
You’ll learn how to use Keycloak’s role-based access control (RBAC) and integrate it with BigQuery’s column-level security functions for masking sensitive data dynamically.
What is Data Masking in BigQuery?
Data masking is a process that hides real data by replacing it with fictional or altered values. In BigQuery, data masking builds on column-level security: you tag a column with a policy tag and attach a data policy that defines how that column is masked in query results, depending on the roles granted to the querying user.
Sensitive data such as personally identifiable information (PII), including emails, phone numbers, and addresses, often needs to be masked so that a dataset can be analyzed or shared without exposing private details.
For example:
- A masked email address might appear as xxxxx@gmail.com instead of john.doe@gmail.com.
- A masked credit card number could show only the last four digits.
By masking the data, you protect sensitive information from unauthorized access while allowing low-privilege users to query the dataset.
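To make the two examples concrete, here is a minimal Python sketch of the transformations. The function names `mask_email` and `mask_all_but_last_four` are hypothetical and only mimic the examples in this guide; they are not BigQuery's own implementation, which applies masking on the server side.

```python
def mask_email(email: str, placeholder: str = "xxxxx") -> str:
    """Replace the local part of an email address, keeping the domain."""
    local, sep, domain = email.partition("@")
    if not sep:
        # Not an email address: hide the whole value.
        return placeholder
    return f"{placeholder}@{domain}"


def mask_all_but_last_four(value: str, fill: str = "X") -> str:
    """Keep the last four characters and replace every other one with `fill`."""
    if len(value) <= 4:
        # Too short to hide anything meaningful; returned unchanged in this sketch.
        return value
    return fill * (len(value) - 4) + value[-4:]


print(mask_email("john.doe@gmail.com"))            # xxxxx@gmail.com
print(mask_all_but_last_four("4111111111111111"))  # XXXXXXXXXXXX1111
```

The low-privilege user still gets a value of the right shape (a string that looks like an email or a card number), which is what keeps downstream queries and joins on non-sensitive columns working.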
How Keycloak Enhances BigQuery Security
Keycloak is an open-source identity and access management tool. It simplifies authentication, authorization, and user management for modern applications. By leveraging Keycloak’s RBAC, you can control which types of data users can view or modify based on their roles.
When integrated with BigQuery, Keycloak can manage fine-grained access to datasets. Each user's role in Keycloak will determine which data they can see—and whether sensitive fields should be masked or left unaltered.
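As a rough illustration of where those roles come from: Keycloak access tokens are JWTs, and realm-level roles normally appear under the `realm_access.roles` claim. The sketch below decodes a token payload and reads that claim. The helper names and the sample token are made up for illustration, and the signature is deliberately not verified here, which a real service must do against the realm's public keys.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def realm_roles(claims: dict) -> set[str]:
    """Return the realm roles carried in a Keycloak access token's claims."""
    return set(claims.get("realm_access", {}).get("roles", []))


def _fake_token(claims: dict) -> str:
    """Build an unsigned sample token for this demo only."""
    def b64(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{b64({'alg': 'none'})}.{b64(claims)}."


token = _fake_token({"preferred_username": "jdoe",
                     "realm_access": {"roles": ["restricted_user"]}})
roles = realm_roles(decode_jwt_payload(token))
print(roles)  # {'restricted_user'}
```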
Why Combine BigQuery and Keycloak?
Together, they provide:
- Centralized Data Governance – Use Keycloak to define access rules for all your services, not just BigQuery.
- Dynamic Data Access – Mask or unmask data in real time based on the querying user’s role.
- Improved Security Compliance – Enforce data masking policies to meet various legal and organizational standards.
Setting Up BigQuery Data Masking with Keycloak
Let’s break down the setup into digestible steps to make your implementation seamless.
Step 1: Define BigQuery Column Security Policies
Start by defining BigQuery column security policies for the fields you want to protect.
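- Open your dataset in BigQuery.
- On the “Policy tags” page, create a taxonomy with a policy tag for each sensitive category, then attach the tag to the column needing masking (e.g., email or phone_number) by editing the table schema.
- Create a data policy on each policy tag and choose a masking rule, such as:
  - Default masking value: Replace the column with a fixed default value for its type.
  - Email mask, Last four characters, Hash (SHA-256), or Nullify: Obscure the original data with a predefined pattern.
The masking rule is configured on the data policy rather than through ALTER TABLE. If none of the predefined rules fits, you can define a custom masking routine in SQL and select it in the data policy. For example:
CREATE OR REPLACE FUNCTION my_dataset.mask_email(email STRING) RETURNS STRING
OPTIONS (data_governance_type = "DATA_MASKING") AS (
  -- Replace everything before the @, e.g. john.doe@gmail.com -> xxxxx@gmail.com
  SAFE.REGEXP_REPLACE(email, r'^[^@]+', 'xxxxx')
);
Then grant the group that represents restricted_user the BigQuery Masked Reader role on that data policy, so its members receive masked values.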
Step 2: Set Up Keycloak Roles
In Keycloak, define the roles that correspond to BigQuery permissions.
- Log in as the Keycloak admin.
- Create roles like restricted_user, shared_user, or full_access.
- Assign users to roles based on data access needs.
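If you prefer to script role creation instead of clicking through the admin console, Keycloak's Admin REST API accepts a `POST` to `/admin/realms/{realm}/roles` with a JSON body naming the role. The sketch below only prepares those requests without sending them. The server URL, the realm name `analytics`, and the token placeholder are assumptions; older Keycloak versions also prefix the path with `/auth`.

```python
import json
import urllib.request

KEYCLOAK_URL = "https://keycloak.example.com"  # assumption: your server's base URL
REALM = "analytics"                            # assumption: hypothetical realm name


def build_create_role_request(role: str, admin_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that creates one realm role."""
    body = json.dumps({"name": role}).encode()
    return urllib.request.Request(
        url=f"{KEYCLOAK_URL}/admin/realms/{REALM}/roles",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {admin_token}",
            "Content-Type": "application/json",
        },
    )


requests_to_send = [
    build_create_role_request(role, admin_token="<admin-access-token>")
    for role in ("restricted_user", "shared_user", "full_access")
]
for req in requests_to_send:
    print(req.get_method(), req.full_url, req.data.decode())
    # To actually create the role: urllib.request.urlopen(req)
```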
Step 3: Map Keycloak Roles to GCP IAM Permissions
Link Keycloak roles to Google Cloud Identity and Access Management (IAM) roles for BigQuery by maintaining a mapping between Keycloak groups or roles and the IAM principals that hold BigQuery permissions.
One way to do this is to emit group claims in Keycloak tokens and federate Keycloak with Google Cloud (for example, through Workforce Identity Federation over OIDC), so that each user's groups resolve to the access levels granted in IAM.
Example IAM roles per user/group:
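- BigQuery Data Viewer (roles/bigquery.dataViewer): Can query the table, but needs one of the roles below to read columns protected by a policy tag.
- BigQuery Masked Reader (roles/bigquerydatapolicy.maskedReader): Sees masked values for protected columns.
- Data Catalog Fine-Grained Reader (roles/datacatalog.categoryFineGrainedReader): Sees the original, unmasked values.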
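One way to keep this mapping explicit is a small lookup table that turns Keycloak role names into IAM-style bindings. In the sketch below, `roles/bigquerydatapolicy.maskedReader` and `roles/datacatalog.categoryFineGrainedReader` are the Google Cloud roles for reading masked and unmasked values of protected columns. The group addresses, the domain, and the decision about which Keycloak role gets which IAM role are assumptions for illustration.

```python
# Hypothetical mapping from Keycloak role names to Google Cloud IAM roles.
ROLE_TO_IAM = {
    "restricted_user": "roles/bigquerydatapolicy.maskedReader",
    "shared_user": "roles/bigquerydatapolicy.maskedReader",
    "full_access": "roles/datacatalog.categoryFineGrainedReader",
}

# Assumption: each Keycloak role is mirrored by a group under this domain.
GROUP_DOMAIN = "example.com"


def iam_bindings(keycloak_roles: list[str]) -> list[dict]:
    """Group Keycloak roles by target IAM role and emit IAM-style bindings."""
    members_by_role: dict[str, list[str]] = {}
    for kc_role in keycloak_roles:
        iam_role = ROLE_TO_IAM.get(kc_role)
        if iam_role is None:
            continue  # unknown roles get no BigQuery access
        member = f"group:bq-{kc_role.replace('_', '-')}@{GROUP_DOMAIN}"
        members_by_role.setdefault(iam_role, []).append(member)
    return [{"role": role, "members": sorted(members)}
            for role, members in sorted(members_by_role.items())]


bindings = iam_bindings(["restricted_user", "shared_user", "full_access", "guest"])
for binding in bindings:
    print(binding)
```

Note that the two roles are granted in different places: Masked Reader on the data policy, Fine-Grained Reader on the policy tag. Keeping the table in version control makes changes to who sees unmasked data reviewable.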
Step 4: Enforce Role-Based Masking in Real Time
When users query BigQuery tables through the integrated Keycloak pipeline:
- Their roles are verified.
- BigQuery applies column-level security dynamically.
- Sensitive fields follow the masking policies based on the user’s role.
Keycloak ensures authentication and authorization, while BigQuery automatically enforces masking rules defined in your dataset.
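The flow above can be modeled locally to reason about who sees what. This is only a simulation under stated assumptions: BigQuery performs the real masking on the server side, and the role sets, the column list, and the `*****` placeholder are hypothetical choices for this sketch.

```python
SENSITIVE_COLUMNS = {"email", "phone_number"}
UNMASKED_ROLES = {"full_access"}                    # assumption: sees raw values
MASKED_ROLES = {"restricted_user", "shared_user"}   # assumption: sees masked values


def view_row(row: dict, roles: set) -> dict:
    """Return the row as a user holding `roles` would see it."""
    if roles & UNMASKED_ROLES:
        return dict(row)  # original values
    if roles & MASKED_ROLES:
        return {col: ("*****" if col in SENSITIVE_COLUMNS else val)
                for col, val in row.items()}
    raise PermissionError("no role grants access to this table")


row = {"id": 7, "email": "john.doe@gmail.com", "phone_number": "555-0100"}
print(view_row(row, {"restricted_user"}))
# {'id': 7, 'email': '*****', 'phone_number': '*****'}
print(view_row(row, {"full_access"}))
# {'id': 7, 'email': 'john.doe@gmail.com', 'phone_number': '555-0100'}
```

The unmasked check runs first, so a user who holds both a masked and an unmasked role sees the original values; whether that precedence is what you want is a policy decision to make explicitly.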
Why Use This Approach?
Managing data masking and access control in separate silos can lead to errors, inefficiencies, and security risks. By combining Keycloak’s identity management capabilities with BigQuery’s native data masking, you achieve:
- Customizable Access – Scale access roles without rewriting queries or hardcoding logic.
- Context-Based Security – Enforce role-specific masking during query execution.
- Time Savings – Centralize user management while leveraging BigQuery’s built-in security features.
See Dynamic Security in Action
Masking sensitive data doesn’t need to be complicated or slow down development. With tools like Keycloak and BigQuery, you can create intelligent, dynamic systems where data access adapts to each user’s role.
To explore real-world dynamic masking in BigQuery integrated with Keycloak, check out hoop.dev and see how you can build a secure, role-based data platform in under five minutes. Start experimenting and reduce complexity while adding powerful security to your cloud applications.
Secure your data with confidence. Try it now.