Detection Methods
Prompt Guard combines pattern matching with entropy analysis to detect sensitive data, both in known formats and in formats it has no explicit pattern for.
Pattern-Based Detection
Uses regular expressions to match known patterns like email addresses, phone numbers, credit card numbers, and API keys with known prefixes.
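As a rough illustration of pattern-based matching, the sketch below scans text for email addresses and card-like digit runs. The `PII_PATTERNS` table, the `find_pii` helper, and the exact expressions are assumptions made for this example, not Prompt Guard's actual rules.

```python
import re

# Hypothetical patterns for illustration; real detectors are usually stricter.
PII_PATTERNS = {
    # local-part@domain.tld
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def find_pii(text):
    """Return every match for each pattern, keyed by pattern name."""
    return {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}


result = find_pii("Contact jane@example.com, card 4111 1111 1111 1111.")
```

A digit-run pattern like this one will also match order numbers and other long identifiers, so a production detector would likely add a checksum or context check on top.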
Personal Information (PII)
API Keys & Tokens
| Provider | Pattern |
|---|---|
| OpenAI | sk-... or sk-proj-... |
| Anthropic | sk-ant-... |
| Google | AIza... |
| AWS | AKIA... |
| GitHub | ghp_..., gho_..., ghs_... |
| Stripe | sk_live_..., pk_live_... |
| Azure | 86-char base64 keys |
| Discord | Bot token format |
| Hugging Face | hf_... |
| JWT Tokens | eyJ... |
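A few of the prefixes in the table can be expressed as regular expressions along these lines. This is a minimal sketch: the `KEY_PATTERNS` table, the `find_keys` helper, and the assumed character sets and minimum lengths are illustrative, not the tool's real patterns.

```python
import re

# Illustrative prefix patterns; lengths and character sets are assumptions.
KEY_PATTERNS = {
    "Anthropic": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{8,}"),
    # The negative lookahead keeps sk-ant-... keys from being reported as OpenAI.
    "OpenAI": re.compile(r"\bsk-(?!ant-)(?:proj-)?[A-Za-z0-9_-]{8,}"),
    "AWS": re.compile(r"\bAKIA[A-Z0-9]{16}\b"),
    "GitHub": re.compile(r"\bgh[pos]_[A-Za-z0-9]{8,}"),
    "Hugging Face": re.compile(r"\bhf_[A-Za-z0-9]{8,}"),
}


def find_keys(text):
    """Return (provider, matched_text) pairs for every prefix pattern that hits."""
    hits = []
    for provider, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((provider, match.group(0)))
    return hits


hits = find_keys("key=sk-ant-abcdefgh12345678 and AKIAABCDEFGHIJKLMNOP")
```

Overlapping prefixes need explicit handling, which is why the OpenAI pattern excludes the `sk-ant-` prefix rather than relying on match order.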
Secrets & Credentials
- Password assignments in code
- Private keys (RSA, SSH, PGP)
- Database connection strings
- OAuth tokens and bearer tokens
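The categories above could be approximated with patterns like the following. The names `SECRET_PATTERNS` and `scan_secrets` are hypothetical, and the expressions cover only a few common shapes (quoted password assignments, PEM-style key headers, URL-style connection strings with embedded credentials, and bearer tokens).

```python
import re

# Hypothetical patterns; each covers only one common shape of its category.
SECRET_PATTERNS = {
    "password_assignment": re.compile(
        r"""\b(?:password|passwd|pwd)\s*[:=]\s*["'][^"']+["']""", re.IGNORECASE
    ),
    "private_key": re.compile(
        r"-----BEGIN (?:RSA |OPENSSH |PGP )?PRIVATE KEY(?: BLOCK)?-----"
    ),
    "connection_string": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?)://[^\s:/@]+:[^\s@]+@[^\s/]+"
    ),
    "bearer_token": re.compile(r"\bbearer\s+[A-Za-z0-9._~+/-]{16,}=*", re.IGNORECASE),
}


def scan_secrets(text):
    """Return the names of all categories with at least one match in text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
```

For example, `scan_secrets('password = "hunter2"')` reports the `password_assignment` category, while plain prose reports nothing.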
Shannon Entropy Analysis
Uses information theory to detect high-randomness strings that are likely secrets, even if they don't match known patterns.
How It Works
Shannon entropy measures the randomness of a string. Higher entropy means more randomness, which is characteristic of cryptographic secrets, API keys, and generated passwords.
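For reference, the per-character Shannon entropy of a string is H = sum over distinct characters of p * log2(1/p), where p is the fraction of the string made up of that character. A minimal sketch of that calculation follows; the function name `shannon_entropy` is chosen for this example.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    # Each distinct character with count c contributes p * log2(1/p), p = c / n.
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())
```

A string of one repeated character scores 0, and a string of four distinct characters such as `"abcd"` scores 2 bits per character. Because the estimate comes from the string's own character counts, a string of length n can never score above log2(n), so very short strings score low regardless of how they were generated.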
Context-Aware Detection
Applies a lower threshold (>3.5 bits/char) to strings that appear near keywords such as "password", "key", "token", or "secret". Standalone strings must exceed a higher threshold (>4.0 bits/char).
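The two thresholds can be combined roughly as follows. In this sketch the caller passes the surrounding text as `context`; how "near" is actually determined is not specified here, so that interface, along with the names `is_suspicious` and `CONTEXT_KEYWORDS`, is an assumption.

```python
import math
import re
from collections import Counter

# Keywords and thresholds are taken from the description above.
CONTEXT_KEYWORDS = re.compile(r"password|key|token|secret", re.IGNORECASE)
CONTEXT_THRESHOLD = 3.5     # bits/char when a keyword is nearby
STANDALONE_THRESHOLD = 4.0  # bits/char otherwise


def shannon_entropy(s):
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def is_suspicious(candidate, context=""):
    """Flag candidate if its entropy exceeds the threshold chosen from context."""
    near_keyword = CONTEXT_KEYWORDS.search(context) is not None
    threshold = CONTEXT_THRESHOLD if near_keyword else STANDALONE_THRESHOLD
    return shannon_entropy(candidate) > threshold
```

With these values, a 12-character string of distinct characters (about 3.58 bits/char) is flagged only when the context contains a keyword, while a 20-character string of distinct characters (about 4.32 bits/char) is flagged either way.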
Encoding Detection
Identifies Base64 and hexadecimal encoded secrets that might otherwise evade pattern-based detection.
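One way to recognize encoded tokens is to check the alphabet and length, then confirm that the token actually decodes. The minimum lengths (32 hex characters, 20 Base64 characters) and the `detect_encoding` helper are assumptions for this sketch.

```python
import base64
import binascii
import re

HEX_RE = re.compile(r"[0-9a-fA-F]+")
BASE64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def detect_encoding(token):
    """Return "hex", "base64", or None for a single whitespace-free token."""
    # Hex is checked first because hex digits are also valid Base64 characters.
    if HEX_RE.fullmatch(token) and len(token) >= 32 and len(token) % 2 == 0:
        return "hex"
    if BASE64_RE.fullmatch(token) and len(token) >= 20 and len(token) % 4 == 0:
        try:
            base64.b64decode(token, validate=True)
        except binascii.Error:
            return None
        return "base64"
    return None


encoded = base64.b64encode(b"super secret value 123").decode()
```

Long alphanumeric words can pass the Base64 check by accident, so a result from this kind of test is best treated as one signal to combine with entropy rather than as a verdict on its own.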
Entropy Reference
| Text Type | Entropy (bits/char) | Assessment |
|---|---|---|
| English Text | 3.0 - 4.0 | Not a secret |
| Variable Names | 3.5 - 4.2 | Unlikely secret |
| Random Passwords | 4.5 - 6.0 | Likely secret |
| API Keys (Base64) | 5.5 - 6.0 | Definitely secret |
Character Class Analysis
Checks for a mix of uppercase, lowercase, digits, and symbols, which is a common characteristic of generated secrets.
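A character class check can be as simple as counting how many of the four classes appear in a string. The helper names below are hypothetical, and treating `string.punctuation` as the symbol set is an assumption.

```python
import string


def character_classes(s):
    """Report which of the four character classes occur in s."""
    return {
        "upper": any(c in string.ascii_uppercase for c in s),
        "lower": any(c in string.ascii_lowercase for c in s),
        "digit": any(c in string.digits for c in s),
        "symbol": any(c in string.punctuation for c in s),
    }


def class_count(s):
    """Return how many distinct character classes s contains (0 to 4)."""
    return sum(character_classes(s).values())
```

A string such as `"Tr0ub4dor&3"` contains all four classes, while `"password"` contains one. The count is a weak signal by itself, so it is probably most useful as a tiebreaker alongside the entropy score.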