Privacy & Anonymization¶
Privacy is core to Egregora's design. Real names never reach the LLM - all conversations are automatically anonymized before processing.
How Anonymization Works¶
Deterministic UUIDs¶
Input adapters anonymize author names before the data enters the core pipeline. Each adapter chooses the namespace that best matches its privacy requirements and calls deterministic_author_uuid() while constructing the IR table:
| Python | |
|---|---|
Key properties:
- Deterministic: Same namespace + author → same UUID
- Scoped anonymity: Opt into tenant-specific namespaces when isolation is needed
- Local control: Namespace selection lives inside the adapter, so projects can decide how aggressively to separate identities
Process Flow¶
sequenceDiagram
participant User
participant Egregora
participant LLM
User->>Egregora: WhatsApp export (real names)
Egregora->>Egregora: Adapter derives namespace + UUIDs
Egregora->>Egregora: Core validates anonymized IR
Egregora->>LLM: Anonymized conversation
LLM->>Egregora: Blog post (with UUIDs)
Egregora->>Egregora: Reverse mapping (optional)
Egregora->>User: Published post (real names or aliases) PII Detection¶
Egregora scans for personally identifiable information:
| Python | |
|---|---|
Patterns detected:
| Type | Pattern | Example |
|---|---|---|
| Phone | +1-555-123-4567 | US, international formats |
user@example.com | All standard emails | |
| Address | 123 Main St, City | Street addresses |
| Credit Card | 4111-1111-1111-1111 | Major card formats |
| SSN | 123-45-6789 | US social security |
Actions:
- Warn: Log PII detection
- Redact: Replace with
[REDACTED] - Skip: Exclude message from processing
User Controls¶
Users can manage their privacy directly in WhatsApp by sending commands:
Set Alias¶
Use a display name instead of UUID:
| Text Only | |
|---|---|
Result: Posts show "Casey" instead of a3f2b91c.
Set Bio¶
Add profile information:
| Text Only | |
|---|---|
This bio is used when generating author profiles.
Opt-Out¶
Exclude yourself from future posts:
| Text Only | |
|---|---|
Your messages will still be visible in the chat, but won't be included in generated content.
Opt-In¶
Re-enable participation:
| Text Only | |
|---|---|
Privacy Levels¶
Configure privacy strictness:
| Bash | |
|---|---|
Data Storage¶
Local Storage¶
All data is stored locally:
| Text Only | |
|---|---|
Important: Never commit .egregora/mapping.json to public repositories!
What Gets Sent to LLMs?¶
Sent:
- Anonymized messages (UUIDs instead of names)
- Timestamps (for context)
- Enriched descriptions (if enabled)
- Previous blog posts (for RAG)
Never sent:
- Real names
- Phone numbers (if detected)
- Email addresses (if detected)
- The UUID mapping
Best Practices¶
1. Export Without Media¶
WhatsApp offers two export options:
- ✅ Without media (recommended): Text only, no images/videos
- ❌ With media: Includes photos, videos, voice messages
For privacy, always export without media. Egregora can optionally enrich URLs using LLMs instead.
2. Review Before Publishing¶
Preview generated posts before making your site public:
| Bash | |
|---|---|
Review that:
- Names are properly anonymized
- No PII leaked
- Content is appropriate
3. Configure Git Ignore¶
Ensure your .gitignore includes:
4. Secure API Keys¶
Never commit API keys:
| Bash | |
|---|---|
Or use a .env file (and add to .gitignore):
| Bash | |
|---|---|
Privacy FAQ¶
Can the LLM figure out real names from context?¶
Possible but unlikely. Egregora only sends message content, not:
- Contact lists
- Profile pictures
- Phone numbers
- Email addresses
However, if someone mentions "my name is Alice" in a message, that information is included. For maximum privacy, review exports before processing.
What about message content privacy?¶
Egregora sends message content to Google's Gemini API. If your conversations contain sensitive information:
- Use Google's enterprise offerings with data residency controls
- Or run a local LLM (requires code modifications)
- Or carefully review and redact sensitive messages before export
Is my data used to train Google's models?¶
Per Google's Gemini API Terms:
Your data is not used to train Google's models.
However, review Google's current policies to ensure compliance with your requirements.
Can I use a local LLM?¶
Yes, but requires code changes. Egregora uses Google's Gemini client, but you can swap in:
See Development Guide for details.
Next Steps¶
- RAG & Knowledge Base - How retrieval works
- Content Generation - LLM writer internals
- API Reference - Privacy Module - Code documentation