The query ran, but the numbers looked wrong. Sensitive columns were still coming back in clear text when they should have been masked. That is when you find the cracks in your BigQuery data masking strategy — and you realize how much depends on your sub-processors.
BigQuery data masking is simple in theory: protect sensitive data by hiding, encrypting, or replacing it before it leaves storage. In practice, especially in large pipelines, the data may flow through a web of sub-processors — ETL tools, third-party analytics, machine learning models, reverse ETLs. Each has access to raw or masked data. Each can be a point of failure if the masking rules aren’t enforced end-to-end.
A sub-processor here is any service or vendor that processes data on your behalf. You may control the SQL in BigQuery. You may enforce masking at the query level with functions like SAFE.SUBSTR and REGEXP_REPLACE, or at the column level with policy tags. But when that data is exported or streamed to a sub-processor, the game changes. Without strict contracts, technical enforcement, and audit logs, sensitive values can leak.
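Query-level masking looks something like the sketch below. The dataset, table, and column names (`my_dataset.customers`, `phone_number`, `email`) are placeholders, not part of the original text; the point is that the masking lives in the SELECT list, so anything that bypasses this query sees the raw values.

```sql
SELECT
  user_id,
  -- Keep only the last four digits; SAFE.SUBSTR returns NULL
  -- instead of erroring on short or NULL inputs
  CONCAT('***-***-', SAFE.SUBSTR(phone_number, -4)) AS phone_masked,
  -- Redact everything before the @ in the email address
  REGEXP_REPLACE(email, r'^[^@]+', '[redacted]') AS email_masked
FROM my_dataset.customers;
```

This is exactly the fragility the paragraph describes: a sub-processor that reads `my_dataset.customers` directly, rather than through this query, gets the unmasked columns.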
The right approach starts at the schema. Use BigQuery’s column-level security and policy tags to classify and restrict fields containing PII, PCI, or PHI. Bind data masking policies directly to these tags, not to ad-hoc queries. Then, configure authorized views or row-level access policies for every consumer, including downstream processing tools. When a sub-processor connects, it must inherit these restrictions.
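One way to enforce this for downstream consumers is an authorized view that exposes only masked or derived fields, so sub-processors never need access to the source table. This is a sketch under assumed names (`my_dataset.customers`, `shared_dataset.customers_masked`, and the column names are illustrative):

```sql
-- Authorized view for downstream consumers: raw columns never leave the
-- source dataset; consumers are granted access to shared_dataset only.
CREATE OR REPLACE VIEW shared_dataset.customers_masked AS
SELECT
  user_id,
  -- Irreversible replacement: hash the email instead of exposing it
  TO_HEX(SHA256(email)) AS email_hash,
  -- Coarsen the postal code to its first three characters
  SAFE.SUBSTR(postal_code, 1, 3) AS postal_prefix
FROM my_dataset.customers;
```

The view's dataset is then authorized against the source dataset (via the dataset's access settings), so a sub-processor querying `shared_dataset` inherits the restriction by construction rather than by convention.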