downloader.entsoe.build_zone_qc_frame

downloader.entsoe.build_zone_qc_frame(zones=None, *, data_home=None)

Build a bottom-up QC frame from per-zone interim CSVs.

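Reads each interim/zone_<col>.csv written by download_zone_loads, outer-joins the actual-load columns, and appends two aggregate columns: "Actual Load", the bottom-up total across zones, and "Forecasted Load", the sum of the per-zone forecasts.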

Fail-safe contract: a missing interim file raises FileNotFoundError naming the path rather than silently skipping the zone. No fallback, no substitution.
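The contract above can be pictured with a small standalone sketch. It is not the library's implementation: require_interim_files is a hypothetical helper, and the only details taken from this page are the interim/zone_<col>.csv layout and the choice of FileNotFoundError.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def require_interim_files(zones: Dict[str, str], data_home: Path) -> List[Path]:
    """Hypothetical helper: return one interim path per zone, or raise."""
    paths = []
    for col in zones:
        path = Path(data_home) / "interim" / f"zone_{col}.csv"
        if not path.is_file():
            # Fail loudly and name the path; never skip the zone silently.
            raise FileNotFoundError(f"Missing interim file: {path}")
        paths.append(path)
    return paths


zones = {"load_a": "AREA_A", "load_b": "AREA_B"}
with tempfile.TemporaryDirectory() as tmp:
    interim = Path(tmp) / "interim"
    interim.mkdir(parents=True)
    # Only load_a has an interim file; load_b is deliberately missing.
    (interim / "zone_load_a.csv").write_text("Time (UTC),load_a\n")
    try:
        require_interim_files(zones, Path(tmp))
        error_message = None
    except FileNotFoundError as exc:
        error_message = str(exc)

print(error_message)
```

The message names the missing zone_load_b.csv path, so a caller can tell exactly which zone blocked the build.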

Parameters

Name Type Description Default
zones Optional[Dict[str, str]] Mapping of column name to entsoe-py Area identifier. When None, uses GERMAN_TSO_ZONES. None
data_home Optional[Union[Path, str]] Override for the data home directory (a Path-like or str). When None, get_data_home() is used (reads the SPOTFORECAST2_DATA environment variable). None

Returns

Name Type Description
pd.DataFrame A pd.DataFrame with:
* One column per zone containing the per-zone actual load values.
* An "Actual Load" column with the bottom-up total (NaN if any zone is missing for that timestamp).
* A "Forecasted Load" column with the sum of per-zone forecasts if all zones provide a forecast column, else all-NaN.
* Index: UTC datetimes parsed from the "Time (UTC)" CSV column, sorted ascending.
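The NaN rules for the two aggregate columns can be reproduced with plain pandas. The following is a minimal sketch of those semantics on synthetic data, assuming a row-wise sum with skipna=False; it is not taken from the library's source, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=3, freq="h", tz="UTC")

# Per-zone actual loads; load_a has no value for the last timestamp.
actual = pd.DataFrame(
    {"load_a": [100.0, 101.0, np.nan], "load_b": [50.0, 51.0, 52.0]},
    index=idx,
)
# skipna=False makes a row total NaN as soon as one zone is missing.
actual_total = actual.sum(axis=1, skipna=False)

# Per-zone forecasts; here only load_a provides one.
forecasts = {"load_a": pd.Series([105.0, 106.0, 107.0], index=idx)}
if set(forecasts) == set(actual.columns):
    forecast_total = pd.concat(forecasts, axis=1).sum(axis=1, skipna=False)
else:
    # Not every zone has a forecast column, so the aggregate is all-NaN.
    forecast_total = pd.Series(np.nan, index=idx)

print(actual_total)
print(forecast_total)
```

With the default skipna=True the last row would total 52.0 and understate the bottom-up load, which is why the sketch disables it.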

Raises

Name Type Description
FileNotFoundError If a per-zone interim file does not exist.

Notes

When a collect run succeeds only partially, the files already written for the successful zones are not rolled back. To build a QC frame from such a run, pass only the succeeded zones as zones; otherwise the missing files of the failed zones trigger the FileNotFoundError guard.
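One way to derive that subset is to keep only the zones whose interim file is present. The sketch below assumes that file existence is an acceptable proxy for a successful download, which may not hold if stale files from an earlier run are still on disk; zones_with_interim_files is a hypothetical helper, not part of the package.

```python
import tempfile
from pathlib import Path
from typing import Dict


def zones_with_interim_files(
    zones: Dict[str, str], data_home: Path
) -> Dict[str, str]:
    """Hypothetical helper: keep only zones whose interim CSV exists."""
    interim = Path(data_home) / "interim"
    return {
        col: area
        for col, area in zones.items()
        if (interim / f"zone_{col}.csv").is_file()
    }


zones = {"load_a": "AREA_A", "load_b": "AREA_B"}
with tempfile.TemporaryDirectory() as tmp:
    interim = Path(tmp) / "interim"
    interim.mkdir(parents=True)
    # Simulate a partial run: only load_a was written.
    (interim / "zone_load_a.csv").write_text("Time (UTC),load_a\n")
    succeeded = zones_with_interim_files(zones, Path(tmp))

print(succeeded)
```

The filtered mapping can then be passed as zones together with the same data_home, so the guard is only exercised for zones that are expected to exist.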

Examples

import os
import tempfile
from pathlib import Path

import pandas as pd

from spotforecast2_safe.downloader.entsoe import build_zone_qc_frame

zones = {"load_a": "AREA_A", "load_b": "AREA_B"}
idx = pd.date_range("2023-01-01", periods=4, freq="h", tz="UTC")

with tempfile.TemporaryDirectory() as tmp:
    os.environ["SPOTFORECAST2_DATA"] = tmp
    interim = Path(tmp) / "interim"
    interim.mkdir(parents=True)
    # Write synthetic per-zone CSVs with actual and forecast columns.
    for col, (actual, forecast) in {
        "load_a": (100.0, 105.0),
        "load_b": (50.0, 52.0),
    }.items():
        pd.DataFrame(
            {col: actual, f"{col}_forecast": forecast}, index=idx
        ).rename_axis("Time (UTC)").to_csv(interim / f"zone_{col}.csv")

    qc = build_zone_qc_frame(zones=zones)
    print(qc.columns.tolist())
    print(f"Actual Load (first row): {qc['Actual Load'].iloc[0]}")
    print(f"Forecasted Load (first row): {qc['Forecasted Load'].iloc[0]}")
    assert qc["Actual Load"].iloc[0] == 150.0
    assert qc["Forecasted Load"].iloc[0] == 157.0
['load_a', 'load_b', 'Actual Load', 'Forecasted Load']
Actual Load (first row): 150.0
Forecasted Load (first row): 157.0