Data Model
PVGIS Data Model Architecture: YAML to Pydantic¶
1. Overview¶
A three-layer system with a single transformation pipeline: YAML → Consolidated Definitions → Runtime Pydantic Models.
2. YAML to Python Transformation¶
- Why YAML? Domain-driven design rationale
- The complete transformation pipeline with code examples
- Step-by-step process from file loading to model generation
3. Comprehensive Pros & Cons Analysis¶
Advantages: Maintainability, reusability, type safety, flexibility, versioning
Disadvantages: Build complexity, runtime errors, IDE limitations, learning curve
4. Flexibility and Difficulty Assessment¶
Flexibility aspects:
- Compositional inheritance via the require: directive
- Conditional output structures based on verbosity
- Union types and computed dependencies
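To make the union-type aspect concrete, here is a minimal sketch of how a type string read from YAML might be resolved into a Python type. The `TYPE_MAP` table, the `resolve_type` helper, and the `|` separator are assumptions made for illustration; the actual PVGIS mapping may look different.

```python
from typing import Optional, Union

# Hypothetical lookup table from YAML type names to Python types.
TYPE_MAP = {
    "str": str,
    "int": int,
    "float": float,
    "bool": bool,
    "None": type(None),
}


def resolve_type(spec: str):
    """Turn a string such as 'float | None' into a Python typing object."""
    parts = [part.strip() for part in spec.split("|")]
    unknown = [part for part in parts if part not in TYPE_MAP]
    if unknown:
        # Fail early so a typo in YAML does not surface as a late runtime error.
        raise ValueError(f"Unknown type name(s): {unknown}")
    resolved = tuple(TYPE_MAP[part] for part in parts)
    # A single name maps to a plain type; several names become a Union.
    return resolved[0] if len(resolved) == 1 else Union[resolved]


assert resolve_type("float") is float
assert resolve_type("float | None") == Optional[float]
```

Raising on unknown names at resolution time is one way to soften the "runtime errors" disadvantage listed earlier, since a misspelled type is reported when definitions are loaded rather than when a model is first used.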
Difficulty challenges (rated 🔴 HIGH, 🟡 MEDIUM):
- Understanding require chains (HIGH)
- Debugging merge conflicts (HIGH)
- Type resolution mapping (MEDIUM)
- Circular dependency prevention (MEDIUM)
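Require chains and circular dependencies are easier to reason about with a small example. The sketch below walks a chain depth-first and raises as soon as a definition requires itself, directly or indirectly. The function shares its name with `resolve_requires()` mentioned in this document, but its signature, the list-valued `require` key, and the `solar_algorithms` entry are assumptions, not the project's actual code.

```python
def resolve_requires(name, definitions, _stack=()):
    """Return `name` and all of its ancestors, parents first, each listed once."""
    if name in _stack:
        cycle = " -> ".join(_stack + (name,))
        raise ValueError(f"Circular require chain: {cycle}")
    if name not in definitions:
        raise KeyError(f"Unknown definition: {name}")

    order = []
    for parent in definitions[name].get("require", []):
        # Resolve each parent with the current name pushed onto the stack.
        for ancestor in resolve_requires(parent, definitions, _stack + (name,)):
            if ancestor not in order:
                order.append(ancestor)
    order.append(name)
    return order


# Hypothetical definitions, shaped like parsed YAML.
definitions = {
    "data_model_template": {},
    "solar_algorithms": {"require": ["data_model_template"]},
    "SolarAltitude": {"require": ["data_model_template", "solar_algorithms"]},
}

print(resolve_requires("SolarAltitude", definitions))
# ['data_model_template', 'solar_algorithms', 'SolarAltitude']
```

Including the full chain in the error message addresses the hardest part of debugging: knowing which definition pulled in the offending parent.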
5. Three Complete Examples¶
Example 1: Atomic Attribute (ValueAttribute)¶
The simplest building block - a reusable value field
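As an illustration of how such an attribute could become a runtime class, the sketch below builds a Pydantic model with `type()`, the mechanism this document attributes to DataModelFactory. The dictionary layout, the `unit` field, and the `build_model` helper are hypothetical; only `pydantic.BaseModel` and the built-in `type()` are real APIs here.

```python
from typing import Optional

from pydantic import BaseModel

# Hypothetical parsed form of a YAML attribute definition.
value_attribute = {
    "name": "ValueAttribute",
    "fields": {
        "value": {"type": float},
        "unit": {"type": Optional[str], "default": None},
    },
}


def build_model(definition):
    """Create a Pydantic model class at runtime with type()."""
    namespace = {"__annotations__": {}, "__module__": __name__}
    for field_name, spec in definition["fields"].items():
        namespace["__annotations__"][field_name] = spec["type"]
        if "default" in spec:
            # A class attribute with the same name becomes the field default.
            namespace[field_name] = spec["default"]
    # type() defers to BaseModel's metaclass, so validation is set up as usual.
    return type(definition["name"], (BaseModel,), namespace)


ValueAttribute = build_model(value_attribute)
reading = ValueAttribute(value=42.5, unit="degrees")
print(reading.value, reading.unit)
```

Pydantic also ships `create_model()` as its documented helper for dynamic model creation, which may be a simpler choice when custom methods such as `__hash__` or `__getattr__` are not needed.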
Example 2: Complete Data Model (SolarAltitude)¶
Shows the full inheritance chain:
- Inherits from data_model_template
- Adds solar algorithms
- Includes atmospheric properties
- Defines output structure
- Final model has 20+ fields from 5+ parent sources
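How fields from several parents end up in one model can be sketched with a recursive merge that deduplicates list entries. The function below borrows the name `merge_dictionaries()` from this document, but its behaviour (later layers win on scalar conflicts, lists are unioned in order) and the sample layers are assumptions made for illustration.

```python
def merge_dictionaries(base, override):
    """Merge `override` into a copy of `base`, recursing into nested dicts."""
    merged = dict(base)  # shallow copy, so `base` itself is never mutated
    for key, value in override.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_dictionaries(current, value)
        elif isinstance(current, list) and isinstance(value, list):
            # Union of both lists, preserving order and dropping duplicates.
            combined = list(current)
            for item in value:
                if item not in combined:
                    combined.append(item)
            merged[key] = combined
        else:
            # Scalars and mismatched types: the later layer wins.
            merged[key] = value
    return merged


# Hypothetical layers, listed from the most generic parent to the model itself.
template = {"fields": {"name": {"type": "str"}}, "sections": ["metadata"]}
algorithms = {"fields": {"algorithm": {"type": "str"}}, "sections": ["metadata", "algorithms"]}
solar_altitude = {"fields": {"value": {"type": "float"}}, "sections": ["core"]}

model = {}
for layer in (template, algorithms, solar_altitude):
    model = merge_dictionaries(model, layer)

print(sorted(model["fields"]))  # ['algorithm', 'name', 'value']
print(model["sections"])        # ['metadata', 'algorithms', 'core']
```

The "later layer wins" rule is where merge conflicts become hard to debug: a parent deep in the chain can silently override a field, so logging which layer set each key is worth considering.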
Example 3: Output Structure Definition¶
SolarIrradianceCoreOutputStructureElement - defines what appears in output sections
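A rough sketch of how an output structure definition could drive conditional output follows. The element keys (`section`, `field`, `children`, `min_verbosity`), the `build_output` helper, and the sample values are all hypothetical and serve only to show the idea of sections that appear above a verbosity threshold.

```python
def build_output(structure, values, verbosity):
    """Assemble a nested output dict, skipping elements above `verbosity`."""
    output = {}
    for element in structure:
        if verbosity < element.get("min_verbosity", 0):
            continue  # this section is only shown at higher verbosity
        if "children" in element:
            output[element["section"]] = build_output(
                element["children"], values, verbosity
            )
        else:
            output[element["section"]] = values[element["field"]]
    return output


# Hypothetical structure definition, shaped like parsed YAML.
structure = [
    {"section": "core", "children": [
        {"section": "value", "field": "value"},
        {"section": "unit", "field": "unit"},
    ]},
    {"section": "metadata", "min_verbosity": 2, "children": [
        {"section": "algorithm", "field": "algorithm"},
    ]},
]
values = {"value": 41.2, "unit": "degrees", "algorithm": "example-algorithm"}

print(build_output(structure, values, verbosity=0))
# {'core': {'value': 41.2, 'unit': 'degrees'}}
print(build_output(structure, values, verbosity=2))
```

Keeping the structure as data rather than code means a domain expert can add or reorder output sections without touching the Python that renders them.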
6. The Factory Systems¶
Definition Factory (Build-Time)¶
- build_python_data_models(): aggregates all YAML files
- resolve_requires(): recursive parent resolution
- merge_dictionaries(): smart merging with deduplication
DataModelFactory (Runtime)¶
- Dynamic Pydantic class generation using type()
- Type mapping (YAML string → Python type)
- Custom __hash__, __eq__, __getattr__ methods
- Caching for performance
ContextBuilder (Output Generation)¶
- Reads output structure definitions
- Evaluates conditional sections
- Generates nested output dictionaries
7. Command Lifecycle for PVGIS¶
Complete workflow from API request → model generation → calculation → structured output
8. Additional Components¶
- File organization patterns
- Property functions (computed attributes)
- Array manipulation methods
- Validation strategies
9. What Else to Document?¶
Suggestions for future documentation:
- Testing strategy
- Performance optimization
- Migration and versioning
- Error handling patterns
- Development workflow checklists
- Advanced YAML patterns
Key Insights¶
When This Approach Excels:
- 50+ similar data models
- Cross-disciplinary teams (scientists + developers)
- Rapidly changing domain requirements
- Complex structured output needs
Trade-off: the higher complexity is justified by flexibility and domain expert empowerment.