Data Model

PVGIS Data Model Architecture: YAML to Pydantic

1. Overview

A three-layer system in which definitions pass through a transformation pipeline: YAML → Consolidated Definitions → Runtime Pydantic Models

2. YAML to Python Transformation

  • Why YAML? Domain-driven design rationale
  • The complete transformation pipeline with code examples
  • Step-by-step process from file loading to model generation

3. Comprehensive Pros & Cons Analysis

Advantages: Maintainability, reusability, type safety, flexibility, versioning
Disadvantages: Build complexity, runtime errors, IDE limitations, learning curve

4. Flexibility and Difficulty Assessment

Flexibility aspects:

  • Compositional inheritance via the require: directive
  • Conditional output structures based on verbosity
  • Union types and computed dependencies

Difficulty challenges (rated 🔴 HIGH, 🟡 MEDIUM):

  • Understanding require chains (HIGH)
  • Debugging merge conflicts (HIGH)
  • Type resolution mapping (MEDIUM)
  • Circular dependency prevention (MEDIUM)
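
Circular dependency prevention can be illustrated with a depth-first walk over the require chains. The sketch below is a minimal illustration, not the PVGIS implementation: the `REQUIRES` mapping, the function name `find_require_cycle`, and the idea that each definition lists its parents by name are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical input: each definition name maps to the parents it requires.
REQUIRES: Dict[str, List[str]] = {
    "SolarAltitude": ["data_model_template", "solar_algorithms"],
    "solar_algorithms": ["data_model_template"],
    "data_model_template": [],
}


def find_require_cycle(requires: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the first require cycle found as a list of names, or None."""
    path: List[str] = []   # names on the current depth-first path
    done = set()           # names whose whole subtree is known to be cycle-free

    def visit(name: str) -> Optional[List[str]]:
        if name in done:
            return None
        if name in path:
            # The name is already on the current path, so the chain loops back.
            return path[path.index(name):] + [name]
        path.append(name)
        for parent in requires.get(name, []):
            cycle = visit(parent)
            if cycle:
                return cycle
        path.pop()
        done.add(name)
        return None

    for name in requires:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


no_cycle = find_require_cycle(REQUIRES)
looped = find_require_cycle({"a": ["b"], "b": ["a"]})
```

Running a check like this before merging turns an infinite recursion into a readable error that names the offending chain.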

5. Three Complete Examples

Example 1: Atomic Attribute (ValueAttribute)

The simplest building block - a reusable value field

Example 2: Complete Data Model (SolarAltitude)

Shows the full inheritance chain:

  • Inherits from data_model_template
  • Adds solar algorithms
  • Includes atmospheric properties
  • Defines output structure
  • Final model has 20+ fields from 5+ parent sources

Example 3: Output Structure Definition

SolarIrradianceCoreOutputStructureElement - defines what appears in output sections

6. The Factory Systems

Definition Factory (Build-Time)

python build_definitions.py  definitions.py
  • build_python_data_models() - aggregates all YAML files
  • resolve_requires() - recursive parent resolution
  • merge_dictionaries() - smart merging with deduplication
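
The recursive resolution and merging steps listed above might look roughly like the following. This is a sketch under stated assumptions: the function names come from the list, but their signatures, the `fields` key, the shape of the parsed definitions, and the merge rules (child values override parents, lists are concatenated without duplicates) are guesses made for illustration only.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_dictionaries(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge override into a copy of base (assumed rules, see lead-in)."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested mappings are merged key by key.
            merged[key] = merge_dictionaries(merged[key], value)
        elif isinstance(value, list) and isinstance(merged.get(key), list):
            # Lists are concatenated, skipping items already present.
            merged[key] = merged[key] + [v for v in value if v not in merged[key]]
        else:
            # Scalars (or mismatched types) from the child win.
            merged[key] = deepcopy(value)
    return merged


def resolve_requires(name: str, raw: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Resolve parents listed under 'require' depth first, then apply own keys."""
    definition = raw[name]
    resolved: Dict[str, Any] = {}
    for parent in definition.get("require", []):
        resolved = merge_dictionaries(resolved, resolve_requires(parent, raw))
    own = {k: v for k, v in definition.items() if k != "require"}
    return merge_dictionaries(resolved, own)


# Hypothetical definitions, as they might look after loading the YAML files.
raw_definitions = {
    "data_model_template": {"fields": {"name": "str", "unit": "str"}},
    "ValueAttribute": {"fields": {"value": "float"}},
    "SolarAltitude": {
        "require": ["data_model_template", "ValueAttribute"],
        "fields": {"unit": "AngleUnit"},
    },
}

resolved = resolve_requires("SolarAltitude", raw_definitions)
```

Here `SolarAltitude` ends up with `name` and `value` from its parents, while its own `unit` entry replaces the one from the template. This sketch does not guard against circular require chains.
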
DataModelFactory (Runtime)

Model = DataModelFactory.get_data_model("SolarAltitude", definitions)
  • Dynamic Pydantic class generation using type()
  • Type mapping (YAML string → Python type)
  • Custom __hash__, __eq__, __getattr__ methods
  • Caching for performance
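
To make the runtime step concrete, here is a dependency-free sketch of dynamic class generation with type(), a string-to-type mapping, and a cache. It deliberately swaps Pydantic for a plain Python class so the example runs with the standard library alone; the `TYPE_MAP` contents, the `fields` key, and the validation logic are assumptions, and only the `get_data_model(name, definitions)` call shape is taken from the text above.

```python
from typing import Any, Dict

# Assumed mapping from type names used in definitions to Python types.
TYPE_MAP: Dict[str, type] = {"str": str, "int": int, "float": float, "bool": bool}


class DataModelFactory:
    """Plain-Python stand-in for the Pydantic-based factory (illustration only)."""

    _cache: Dict[str, type] = {}

    @classmethod
    def get_data_model(cls, name: str, definitions: Dict[str, Any]) -> type:
        if name in cls._cache:
            return cls._cache[name]  # reuse the class built earlier

        # Resolve type names such as "float" to real Python types.
        field_types = {
            field: TYPE_MAP[type_name]
            for field, type_name in definitions[name]["fields"].items()
        }

        def __init__(self, **values):
            for field, expected in field_types.items():
                value = values[field]
                if not isinstance(value, expected):
                    raise TypeError(f"{field} must be {expected.__name__}")
                setattr(self, field, value)

        def __eq__(self, other):
            return type(other) is type(self) and vars(self) == vars(other)

        def __hash__(self):
            return hash(tuple(sorted(vars(self).items())))

        # type(name, bases, namespace) creates the class at runtime.
        model = type(name, (object,), {
            "field_types": field_types,
            "__init__": __init__,
            "__eq__": __eq__,
            "__hash__": __hash__,
        })
        cls._cache[name] = model
        return model


definitions = {"SolarAltitude": {"fields": {"value": "float", "unit": "str"}}}
Model = DataModelFactory.get_data_model("SolarAltitude", definitions)
altitude = Model(value=35.2, unit="degrees")
```

The cache is keyed by model name only, so this sketch assumes the definitions do not change while the process is running.
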
ContextBuilder (Output Generation)

builder.populate_context(model, verbose=2)
  • Reads output structure definitions
  • Evaluates conditional sections
  • Generates nested output dictionaries
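
A verbosity-gated output builder could be sketched as follows. Only the `populate_context(model, verbose=...)` call shape comes from the text above; the constructor, the `section`, `min_verbose`, and `fields` keys, and the sample model are hypothetical and stand in for whatever the real output structure definitions contain.

```python
from types import SimpleNamespace
from typing import Any, Dict, List

# Hypothetical output structure: each element names a section, the minimum
# verbosity at which it appears, and the model fields it exposes.
OUTPUT_STRUCTURE: List[Dict[str, Any]] = [
    {"section": "Core", "min_verbose": 0, "fields": ["value", "unit"]},
    {"section": "Metadata", "min_verbose": 2, "fields": ["algorithm"]},
]


class ContextBuilder:
    """Illustrative builder that turns a model into a nested output dictionary."""

    def __init__(self, output_structure: List[Dict[str, Any]]):
        self.output_structure = output_structure

    def populate_context(self, model: Any, verbose: int = 0) -> Dict[str, Dict[str, Any]]:
        context: Dict[str, Dict[str, Any]] = {}
        for element in self.output_structure:
            # Conditional section: skip it when verbosity is too low.
            if verbose < element.get("min_verbose", 0):
                continue
            section = context.setdefault(element["section"], {})
            for field in element["fields"]:
                section[field] = getattr(model, field)
        return context


# Stand-in for a generated data model instance.
model = SimpleNamespace(value=35.2, unit="degrees", algorithm="example-algorithm")
builder = ContextBuilder(OUTPUT_STRUCTURE)
brief = builder.populate_context(model, verbose=0)
detailed = builder.populate_context(model, verbose=2)
```

With `verbose=0` only the `Core` section is produced; raising the verbosity to 2 adds the `Metadata` section without changing the model itself.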

7. Command Lifecycle for PVGIS

Complete workflow from API request → model generation → calculation → structured output

8. Additional Components

  • File organization patterns
  • Property functions (computed attributes)
  • Array manipulation methods
  • Validation strategies

9. What Else to Document?

Suggestions for future documentation:

  • Testing strategy
  • Performance optimization
  • Migration and versioning
  • Error handling patterns
  • Development workflow checklists
  • Advanced YAML patterns

Key Insights

When This Approach Excels:

  • 50+ similar data models
  • Cross-disciplinary teams (scientists + developers)
  • Rapidly changing domain requirements
  • Complex structured output needs

Trade-off: The higher complexity is justified by the flexibility gained and by letting domain experts edit definitions directly.