downloader.entsoe.assemble_zone_loads

downloader.entsoe.assemble_zone_loads(
    zones=None,
    output_file='energy_load_zones.csv',
    on_missing='raise',
)

Join the per-zone interim load files into one aligned, validated frame.

Reads each interim/zone_<col>.csv written by download_zone_loads, takes its actual-load column, and outer-joins the zones onto a single complete hourly UTC index, so any gap surfaces as a NaN row. The combined frame is written to interim/<output_file> and returned. Its columns are the zone keys; with the default zones, their row-wise sum is the bottom-up total German load.
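The alignment step can be pictured with plain pandas. The following is a minimal sketch of the idea, not the library's implementation: the zone names and load values are made up, and pd.concat plus reindex merely stand in for whatever assemble_zone_loads does internally.

```python
import pandas as pd

# Five hourly UTC timestamps; the zone data below is purely illustrative.
idx = pd.date_range("2023-01-01", periods=5, freq="h", tz="UTC")

# load_a lacks the 03:00 reading; load_b lacks 02:00 and 03:00.
load_a = pd.Series(100.0, index=idx.delete(3), name="load_a")
load_b = pd.Series(50.0, index=idx.delete([2, 3]), name="load_b")

# Outer join: the union of all timestamps seen in any zone becomes the index.
joined = pd.concat([load_a, load_b], axis=1, join="outer").sort_index()

# Reindex onto a complete hourly grid so that an hour missing from every
# zone (03:00 here) still appears, as an all-NaN row.
full_index = pd.date_range(joined.index.min(), joined.index.max(), freq="h")
aligned = joined.reindex(full_index).rename_axis("Time (UTC)")

print(aligned)
print(aligned.isna().sum().to_dict())
```

The outer join alone would yield only four rows, because 03:00 is absent from both zones; the reindex onto the full grid is what makes that hour visible as a gap.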

Fail-safe contract: with on_missing="raise" (the default), any missing hour in any zone raises ValueError (via spotforecast2_safe.preprocessing.checking.check_y) rather than being silently filled. Pass on_missing="passthrough" to return the frame with NaN left in place, so a downstream caller can opt into imputation explicitly (e.g. via the MultiTask impute step).
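The contract can be sketched as a small stand-alone function. This is an illustration under stated assumptions: enforce_gap_policy is a hypothetical helper written for this example, it does not call check_y, and the interpolation at the end is only one possible downstream imputation choice.

```python
import pandas as pd


def enforce_gap_policy(frame: pd.DataFrame, on_missing: str = "raise") -> pd.DataFrame:
    """Hypothetical helper mirroring the documented contract (not library code)."""
    if on_missing not in ("raise", "passthrough"):
        raise ValueError(f"unknown on_missing: {on_missing!r}")
    if on_missing == "raise" and frame.isna().any().any():
        gaps = frame.isna().sum()
        raise ValueError(f"missing hours per zone: {gaps[gaps > 0].to_dict()}")
    # "passthrough": hand the frame back untouched, NaN included.
    return frame


idx = pd.date_range("2023-01-01", periods=3, freq="h", tz="UTC")
frame = pd.DataFrame({"load_a": [100.0, None, 100.0], "load_b": 50.0}, index=idx)

# Default policy: the gap is rejected instead of being silently filled.
try:
    enforce_gap_policy(frame)
except ValueError as exc:
    print(f"rejected: {exc}")

# Opt-in policy: NaN is kept, and imputation becomes an explicit caller decision.
kept = enforce_gap_policy(frame, on_missing="passthrough")
filled = kept.interpolate()  # example imputation only
```

Keeping the strict mode as the default means a caller has to write on_missing="passthrough" in their own code before any gap can reach a model, which makes the imputation decision visible in review.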

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| zones | Optional[Dict[str, str]] | Mapping of column name to Area identifier. When None, uses GERMAN_TSO_ZONES. | None |
| output_file | str | Interim filename to write under interim/. Defaults to "energy_load_zones.csv". | 'energy_load_zones.csv' |
| on_missing | str | "raise" (default) to reject any gap; "passthrough" to keep NaN for explicit downstream imputation. | 'raise' |

Returns

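| Name | Type | Description |
|------|------|-------------|
| | pd.DataFrame | The aligned load frame with one column per zone key (four with the default zones), indexed by "Time (UTC)". |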

Raises

| Name | Type | Description |
|------|------|-------------|
| | ValueError | If on_missing is unknown, if a per-zone file lacks its zone column, or (when on_missing="raise") if any zone has a gap. |
| | FileNotFoundError | If a per-zone interim file is missing. |

Examples

import os
import tempfile
from pathlib import Path

import pandas as pd

from spotforecast2_safe.downloader.entsoe import assemble_zone_loads

# Two illustrative zones: column name -> Area identifier.
zones = {"load_a": "AREA_A", "load_b": "AREA_B"}
idx = pd.date_range("2023-01-01", periods=6, freq="h", tz="UTC")

with tempfile.TemporaryDirectory() as tmp:
    # Use the temporary folder as the data directory for this example.
    os.environ["SPOTFORECAST2_DATA"] = tmp
    interim = Path(tmp) / "interim"
    interim.mkdir(parents=True)

    # Write one gap-free interim file per zone with a constant load level.
    for col, level in {"load_a": 100.0, "load_b": 50.0}.items():
        pd.DataFrame({col: level}, index=idx).rename_axis(
            "Time (UTC)"
        ).to_csv(interim / f"zone_{col}.csv")

    frame = assemble_zone_loads(zones=zones)
    total = frame.sum(axis=1)  # row-wise sum across zones
    print(frame.columns.tolist())
    print(f"bottom-up total (first hour): {total.iloc[0]}")
    assert total.iloc[0] == 150.0
['load_a', 'load_b']
bottom-up total (first hour): 150.0