Settings¶
settings ¶
Centralized configuration for Egregora (ALPHA VERSION).
This module consolidates ALL configuration code in one place:
- Pydantic models for .egregora.toml (in root)
- Loading and saving functions
- Runtime dataclasses for function parameters
- Model configuration utilities

Why TOML?
- Native Python support in 3.11+ (tomllib)
- Unambiguous date/time parsing (unlike YAML which can be ambiguous)
- Clearer specification, avoiding "The Norway Problem"
- Standard for Python ecosystem (matches pyproject.toml)

Benefits:
- Single source of truth for all configuration
- Backend independence (works with Hugo, Astro, etc.)
- Type safety (Pydantic validation at load time)
- No backward compatibility - clean alpha design

Strategy:
- ONLY loads from .egregora.toml in root
- Creates default config if missing in root
ModelSettings ¶
Bases: BaseModel
LLM model configuration for different tasks.
- Pydantic-AI agents expect provider-prefixed IDs like google-gla:gemini-flash-latest
- Direct Google GenAI SDK calls expect models/<name> identifiers
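
The two identifier formats can be told apart with simple string checks. This is only a sketch of what such validation might look like; the function names and the exact rules are assumptions, and the real validators in settings.py may be stricter or more lenient.

```python
def check_pydantic_ai_model(name: str) -> str:
    """Require '<provider>:<model>' with both parts non-empty (assumed rule)."""
    provider, sep, model = name.partition(":")
    if not (sep and provider and model):
        raise ValueError(f"expected '<provider>:<model>', got {name!r}")
    return name


def check_google_model(name: str) -> str:
    """Require a 'models/' prefix followed by a non-empty name (assumed rule)."""
    prefix = "models/"
    if not name.startswith(prefix) or len(name) == len(prefix):
        raise ValueError(f"expected 'models/<name>', got {name!r}")
    return name


print(check_pydantic_ai_model("google-gla:gemini-flash-latest"))
print(check_google_model("models/gemini-flash-latest"))
```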
validate_pydantic_model_format classmethod ¶
Validate Pydantic-AI model name format.
Source code in src/egregora/config/settings.py
validate_google_model_format classmethod ¶
Validate Google GenAI SDK model name format.
ImageGenerationSettings ¶
Bases: BaseModel
Configuration for image generation requests.
RAGSettings ¶
Bases: BaseModel
Retrieval-Augmented Generation (RAG) configuration.
⭐ MAGICAL FEATURE: Contextual Memory
This is one of the three features that make Egregora special. Posts reference previous discussions, creating connected narratives.
Uses LanceDB for vector storage and similarity search. The embedding API uses a dual-queue router for optimal throughput.
validate_top_k classmethod ¶
Validate top_k is reasonable and warn if too high.
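
A "validate and warn" check of this kind might look roughly like the sketch below. The threshold of 20 and the function name are assumptions made for illustration; this page does not show the limit that RAGSettings actually applies.

```python
import warnings

# Hypothetical threshold; the real value used by RAGSettings is not shown here.
HIGH_TOP_K = 20


def check_top_k(top_k: int) -> int:
    """Reject non-positive values and warn (without failing) on large ones."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if top_k > HIGH_TOP_K:
        warnings.warn(
            f"top_k={top_k} is high; retrieval may be slow or noisy",
            UserWarning,
            stacklevel=2,
        )
    return top_k
```

Warning instead of raising keeps an unusual but valid configuration loadable while still making it visible.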
WriterAgentSettings ¶
Bases: BaseModel
Blog post writer configuration.
PrivacySettings ¶
Bases: BaseModel
Privacy and PII configuration.
EnrichmentSettings ¶
Bases: BaseModel
Enrichment settings for URLs and media.
PipelineSettings ¶
Bases: BaseModel
Pipeline execution settings.
PathsSettings ¶
Bases: BaseModel
Site directory paths configuration.
All paths are relative to site_root (output directory). Provides defaults that match the standard .egregora/ structure.
validate_safe_path classmethod ¶
Validate path is relative and does not contain traversal sequences.
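
One way to express "relative and free of traversal sequences" with the standard library is sketched below. The function name is hypothetical, and the real validator may apply different or additional rules.

```python
from pathlib import PurePosixPath, PureWindowsPath


def check_safe_relative_path(value: str) -> str:
    """Accept only relative paths without '..' components (assumed rules)."""
    posix = PurePosixPath(value)
    windows = PureWindowsPath(value)
    # Reject POSIX-absolute paths and anything carrying a Windows drive.
    if posix.is_absolute() or windows.is_absolute() or windows.drive:
        raise ValueError(f"path must be relative: {value!r}")
    # Reject parent-directory components under either separator convention.
    if ".." in posix.parts or ".." in windows.parts:
        raise ValueError(f"path must not contain '..': {value!r}")
    return value
```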
OutputAdapterConfig ¶
Bases: BaseModel
Configuration for a single output adapter.
Each adapter represents a target format (e.g., MkDocs, Hugo) with its own configuration file.
OutputSettings ¶
Bases: BaseModel
Output adapter registry.
Registers all output adapters used for this site. Each adapter has a type and optional config path.
SourceOverrideSettings ¶
Bases: BaseModel
Per-source overrides for pipeline execution.
Allows fine-grained control of windowing, enrichment, and date ranges without requiring repeated CLI flags.
SourceSettings ¶
Bases: BaseModel
Configuration for a single input source.
SiteSettings ¶
Bases: BaseModel
Site-level configuration including configured sources.
DatabaseSettings ¶
Bases: BaseModel
Database configuration for pipeline and observability.
All values must be valid Ibis connection URIs (e.g. DuckDB, Postgres, SQLite).
ReaderSettings ¶
Bases: BaseModel
Reader agent configuration for post evaluation and ranking.
⭐ MAGICAL FEATURE: Content Discovery
This is one of the three features that make Egregora special. It should be enabled by default for 95% of users.
TaxonomySettings ¶
Bases: BaseModel
Semantic taxonomy generation settings.
After posts are generated, clusters similar posts and assigns consistent tags using LLM analysis. Uses K-Means clustering.
FeaturesSettings ¶
Bases: BaseModel
Feature flags for experimental or optional functionality.
QuotaSettings ¶
Bases: BaseModel
Configuration for LLM usage budgets and concurrency.
ProfileSettings ¶
Bases: BaseModel
Configuration for profile generation agent.
⭐ MAGICAL FEATURE: Author Profiles
This is one of the three features that make Egregora special. Creates loving portraits of people from their messages - storytelling, not analytics. Always enabled (no opt-out flag).
EgregoraConfig ¶
Bases: BaseSettings
Root configuration for Egregora.
This model defines the complete .egregora.toml schema.
Supports environment variable overrides with the pattern: EGREGORA_SECTION__KEY (e.g., EGREGORA_MODELS__WRITER)
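
The EGREGORA_SECTION__KEY pattern maps a flat variable name onto a nested section and key. pydantic-settings offers this mapping through its env_prefix and env_nested_delimiter options; whether EgregoraConfig is configured exactly that way is an assumption. The sketch below only illustrates the naming convention with a hypothetical helper.

```python
def env_overrides(environ, prefix="EGREGORA_", delimiter="__"):
    """Turn PREFIX_SECTION__KEY variables into a nested dict of overrides."""
    overrides = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variable
        path = name[len(prefix):].lower().split(delimiter)
        node = overrides
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return overrides


sample_env = {
    "EGREGORA_MODELS__WRITER": "google-gla:gemini-flash-latest",
    "HOME": "/home/user",
}
print(env_overrides(sample_env))
```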
validate_cross_field ¶
Validate cross-field dependencies and warn about potential issues.
from_cli_overrides classmethod ¶
Create a new config instance with CLI overrides applied.
Handles nested updates for pipeline, enrichment, rag, etc. CLI arguments are expected to be flat key-value pairs or dicts matching the argument structure of CLI commands.
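
A nested update of this kind is commonly implemented as a recursive merge. The sketch below is an assumption about the general approach, not the method's actual code; the field names are hypothetical, and treating None as "flag not supplied" is also an assumption.

```python
from copy import deepcopy


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied section by section."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        elif value is not None:  # assumed: None means the flag was not given
            merged[key] = value
    return merged


# Hypothetical field names, for illustration only.
base = {"pipeline": {"step_size": 1, "step_unit": "days"}, "rag": {"enabled": True}}
cli = {"pipeline": {"step_size": 7}, "rag": {"enabled": None}}
print(deep_merge(base, cli))
```

Returning a new dict rather than mutating the input matches the description "create a new config instance".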
RuntimeContext dataclass ¶
RuntimeContext(
    output_dir: Annotated[Path, "Directory for the generated site"],
    input_file: Annotated[Path | None, "Path to the chat export file"] = None,
    model_override: Annotated[str | None, "Model override from CLI"] = None,
    debug: Annotated[bool, "Enable debug logging"] = False,
)
Runtime-only context that cannot be persisted to config file.
This is the minimal set of fields that are truly runtime-specific:
- Paths resolved at invocation time
- Debug flags
API keys are read directly from environment variables by pydantic-ai/genai. All other configuration lives in EgregoraConfig (single source of truth).
WriterRuntimeConfig dataclass ¶
WriterRuntimeConfig(
    posts_dir: Annotated[Path, "Directory to save posts"],
    profiles_dir: Annotated[Path, "Directory to save profiles"],
    rag_dir: Annotated[Path, "Directory for RAG data"],
    model_config: Annotated[object | None, "Model configuration"] = None,
    enable_rag: Annotated[bool, "Enable RAG"] = True,
)
Runtime configuration for post writing (not the Pydantic WriterAgentSettings model).
MediaEnrichmentContext dataclass ¶
MediaEnrichmentContext(
    media_type: Annotated[str, "The type of media (e.g., 'image', 'video')"],
    media_filename: Annotated[str, "The filename of the media"],
    author: Annotated[str, "The author of the message containing the media"],
    timestamp: Annotated[str, "The timestamp of the message"],
    nearby_messages: Annotated[str, "Messages sent before and after the media"],
    ocr_text: Annotated[str, "Text extracted from the media via OCR"] = "",
    detected_objects: Annotated[str, "Objects detected in the media"] = "",
)
Context for media enrichment prompts.
EnrichmentRuntimeConfig dataclass ¶
EnrichmentRuntimeConfig(
    client: Annotated[object, "The Gemini client"],
    output_dir: Annotated[Path, "The directory to save enriched data"],
    model: Annotated[
        str, "The Gemini model to use for enrichment"
    ] = ModelDefaults.ENRICHER,
)
Runtime configuration for enrichment operations.
PipelineEnrichmentConfig dataclass ¶
PipelineEnrichmentConfig(
    batch_threshold: int = 10,
    max_enrichments: int = 500,
    enable_url: bool = True,
    enable_media: bool = True,
)
Extended enrichment configuration for pipeline operations.
Extends basic enrichment config with pipeline-specific settings.
__post_init__ ¶
Validate configuration after initialization.
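
Dataclasses run __post_init__ immediately after the generated __init__, which makes it a natural place for range checks. The stand-in class below copies the field names and defaults from the signature above; the specific validation rules are assumptions, since this page does not list them.

```python
from dataclasses import dataclass


@dataclass
class PipelineEnrichmentConfigSketch:
    """Stand-in mirroring the documented fields; not the real class."""

    batch_threshold: int = 10
    max_enrichments: int = 500
    enable_url: bool = True
    enable_media: bool = True

    def __post_init__(self) -> None:
        # Assumed rules: a batch needs at least one item, caps cannot be negative.
        if self.batch_threshold < 1:
            raise ValueError("batch_threshold must be >= 1")
        if self.max_enrichments < 0:
            raise ValueError("max_enrichments must be >= 0")


config = PipelineEnrichmentConfigSketch(batch_threshold=25)
print(config)
```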
from_cli_args classmethod ¶
Create config from CLI arguments.
find_egregora_config ¶
Search upward for .egregora.toml.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| start_dir | Path | Starting directory for upward search | required |
| site | str \| None | Optional site identifier (reserved for future use) | None |
Returns:
| Type | Description |
|---|---|
| Path | Path to config file if found |
Raises:
| Type | Description |
|---|---|
| ConfigNotFoundError | If the config file cannot be found |
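
An upward search can be written with pathlib by checking the starting directory and then each of its parents. This is a sketch of the general technique, not the function's actual body; the helper name is hypothetical, the exception class is a local stand-in for the one named above, and the reserved site parameter is omitted.

```python
from pathlib import Path

CONFIG_NAME = ".egregora.toml"


class ConfigNotFoundError(FileNotFoundError):
    """Local stand-in for the exception named in this reference."""


def find_config_upward(start_dir: Path) -> Path:
    """Return the nearest config file at or above start_dir."""
    current = start_dir.resolve()
    for directory in (current, *current.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    raise ConfigNotFoundError(f"{CONFIG_NAME} not found at or above {current}")
```

Calling it from a nested directory such as site/docs/posts would return site/.egregora.toml if that file exists and no closer one does.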
load_egregora_config ¶
Load Egregora configuration from .egregora.toml.
Configuration priority (highest to lowest):
1. CLI (applied via from_cli_overrides later)
2. Environment variables (EGREGORA_SECTION__KEY)
3. Config file (.egregora.toml)
4. Defaults

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| site_root | Path \| None | Root directory of the site. If None, uses current working directory. | None |
| site | str \| None | Optional site identifier to select from the sites mapping. | None |
Returns:
| Type | Description |
|---|---|
| EgregoraConfig | Validated EgregoraConfig instance |
Raises:
| Type | Description |
|---|---|
| ConfigValidationError | If config file contains invalid data |
| ConfigNotFoundError | If the config file cannot be found and a default one is not created. |
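
The priority order can be pictured as a stack of layers in which the first layer that defines a key wins. The sketch below uses collections.ChainMap over flat dotted keys, which is a simplification for illustration; the real loader works on nested Pydantic models, and every key and value shown is hypothetical.

```python
from collections import ChainMap

# Hypothetical values; flat dotted keys stand in for nested sections.
defaults = {"models.writer": "google-gla:gemini-flash-latest", "rag.top_k": 5}
file_values = {"rag.top_k": 8}
env_values = {"models.writer": "google-gla:another-model"}
cli_values = {"rag.top_k": 3}

# Lookup order is left to right, so the highest-priority layer comes first.
effective = ChainMap(cli_values, env_values, file_values, defaults)

print(dict(effective))
```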
create_default_config ¶
Create default .egregora.toml and return it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| site_root | Path | Root directory of the site | required |
| site | str | Site identifier to write under the sites mapping | DEFAULT_SITE_NAME |
Returns:
| Type | Description |
|---|---|
| EgregoraConfig | EgregoraConfig with all defaults |
save_egregora_config ¶
save_egregora_config(
    config: EgregoraConfig, site_root: Path, *, site: str = DEFAULT_SITE_NAME
) -> Path
Save EgregoraConfig to .egregora.toml in site_root.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | EgregoraConfig | EgregoraConfig instance to save | required |
| site_root | Path | Root directory of the site | required |
| site | str | Site identifier to write under the sites mapping | DEFAULT_SITE_NAME |
Returns:
| Type | Description |
|---|---|
| Path | Path to the saved config file |
parse_date_arg ¶
Parse a date string in YYYY-MM-DD format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| date_str | str | Date string in YYYY-MM-DD format | required |
| _arg_name | str | Name of the argument (for error messages) | 'date' |
Returns:
| Type | Description |
|---|---|
| date | date object in UTC |
Raises:
| Type | Description |
|---|---|
| InvalidDateFormatError | If date_str is not in YYYY-MM-DD format |
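
Parsing a fixed YYYY-MM-DD format and converting the failure into a domain-specific error might be sketched as follows. The helper name is hypothetical and the exception class is a local stand-in; the real function may parse differently.

```python
from datetime import date, datetime


class InvalidDateFormatError(ValueError):
    """Local stand-in for the exception named in this reference."""


def parse_date(date_str: str, arg_name: str = "date") -> date:
    """Parse YYYY-MM-DD, naming the offending argument on failure."""
    try:
        return datetime.strptime(date_str, "%Y-%m-%d").date()
    except ValueError as exc:
        raise InvalidDateFormatError(
            f"{arg_name} must be in YYYY-MM-DD format, got {date_str!r}"
        ) from exc


print(parse_date("2024-01-15", "from_date"))
```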
validate_timezone ¶
Validate timezone string and return ZoneInfo object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| timezone_str | str | Timezone identifier (e.g., 'America/New_York', 'UTC') | required |
Returns:
| Type | Description |
|---|---|
| ZoneInfo | ZoneInfo object for the specified timezone |
Raises:
| Type | Description |
|---|---|
| InvalidTimezoneError | If timezone_str is not a valid timezone identifier |
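
With the standard zoneinfo module, this kind of validation amounts to attempting the lookup and translating the failure. The sketch below is an assumption about the approach; the helper name is hypothetical and the exception class is a local stand-in. Valid identifiers only resolve when the system time zone database or the tzdata package is available.

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


class InvalidTimezoneError(ValueError):
    """Local stand-in for the exception named in this reference."""


def to_zoneinfo(timezone_str: str) -> ZoneInfo:
    """Return a ZoneInfo for timezone_str or raise InvalidTimezoneError."""
    try:
        return ZoneInfo(timezone_str)
    except (ZoneInfoNotFoundError, ValueError, OSError) as exc:
        # Unknown keys, malformed keys, and unreadable entries all count as invalid.
        raise InvalidTimezoneError(f"invalid timezone: {timezone_str!r}") from exc
```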