A log file lands in your lap. It’s full of production data. Somewhere inside, buried, is a customer’s phone number, credit card, or national ID. You need to find it fast.
Microsoft Presidio’s sensitive data detection is built for this job. Presidio is an open-source framework that scans text for PII and other regulated identifiers. Its built-in recognizers cover names, credit card numbers, bank account numbers, IP addresses, email addresses, driver’s license numbers, and more. You can run it locally or in the cloud. The analyzer combines recognizers based on regular expressions with named-entity recognition models, and it uses the words surrounding a match as context when scoring it. You can add custom recognizers with your own patterns for domain-specific sensitive data.
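To make the idea of regex plus context concrete, here is a minimal plain-Python sketch. It is not Presidio's code or API: the pattern, the context words, the scores, and the names `Match` and `find_phone_numbers` are all assumptions chosen for illustration. It shows a regex match starting at a modest base score that rises when a related word appears shortly before it.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    entity_type: str
    start: int
    end: int
    score: float


# Hypothetical pattern, context words, and scores (illustration only).
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
CONTEXT_WORDS = {"phone", "call", "mobile", "tel"}
BASE_SCORE = 0.4
CONTEXT_BOOST = 0.35
WINDOW = 5  # how many preceding words to inspect


def find_phone_numbers(text: str) -> List[Match]:
    """Return regex matches, scored higher when a context word precedes them."""
    matches = []
    for m in PHONE_RE.finditer(text):
        # Collect the last few alphabetic words before the match.
        preceding = re.findall(r"[a-z]+", text[: m.start()].lower())[-WINDOW:]
        score = BASE_SCORE
        if CONTEXT_WORDS.intersection(preceding):
            score = min(1.0, BASE_SCORE + CONTEXT_BOOST)
        matches.append(Match("PHONE_NUMBER", m.start(), m.end(), score))
    return matches


with_context = find_phone_numbers("Please call the customer on 555-867-5309 today.")
without_context = find_phone_numbers("Build 555-867-5309 passed.")
print(with_context[0].score > without_context[0].score)
```

The same digits score higher in the first sentence because "call" appears nearby. A bare regex cannot make that distinction, which is why context matters for ambiguous formats.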
Presidio’s text pipeline rests on three main components: the Analyzer, the Anonymizer, and the Recognizer Registry. The Analyzer processes input text, detects sensitive entities, and assigns each one a confidence score. The Anonymizer masks or replaces the detected values, keeping the data useful while reducing compliance risk. The Recognizer Registry, which the Analyzer consults, holds the built-in and custom recognizers so detection logic stays in one place and is easy to maintain.