data.entsoe_loader

ENTSO-E interim-CSV data loaders.

Config-driven loaders for the merged ENTSO-E interim CSV, suitable for the data_loader / test_data_loader hooks on ConfigEntsoe. Ported from spotforecast2.tasks.task_entsoe ahead of that subpackage’s removal.

Functions

Name Description
entsoe_data_loader Read the merged interim ENTSO-E CSV that config.data_filename points at.
entsoe_test_data_loader Return the merged ENTSO-E CSV sliced to the forecast horizon.

entsoe_data_loader

data.entsoe_loader.entsoe_data_loader(config)

Read the merged interim ENTSO-E CSV that config.data_filename points at.

Parameters

Name Type Description Default
config ConfigEntsoe A ConfigEntsoe with data_filename set. Relative paths are resolved against spotforecast2_safe.data.fetch_data.get_data_home. required

Returns

Name Type Description
pd.DataFrame DataFrame indexed by the ENTSO-E timestamp column (Time (UTC)), with the load columns as data columns.

Raises

Name Type Description
FileNotFoundError If the merged CSV does not exist. Run spotforecast2-entsoe download and merge first.
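The path handling described under Parameters and Raises can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: resolve_csv_path and the data_home argument are hypothetical stand-ins for whatever the loader and get_data_home actually do internally.

```python
import tempfile
from pathlib import Path


def resolve_csv_path(data_filename: str, data_home: Path) -> Path:
    """Hypothetical helper mirroring the documented path rules."""
    path = Path(data_filename)
    # Absolute paths are used as given; relative ones are joined to data_home.
    if not path.is_absolute():
        path = data_home / path
    # A missing merged CSV surfaces as FileNotFoundError.
    if not path.exists():
        raise FileNotFoundError(f"Merged ENTSO-E CSV not found: {path}")
    return path


# Temporary directory standing in for the data home (assumption).
data_home = Path(tempfile.mkdtemp())
(data_home / "energy_load.csv").write_text("Time (UTC),Actual Load\n")

resolved = resolve_csv_path("energy_load.csv", data_home)
abs_resolved = resolve_csv_path(str(data_home / "energy_load.csv"), data_home)

try:
    resolve_csv_path("missing.csv", data_home)
    missing_raised = False
except FileNotFoundError:
    missing_raised = True
```

Passing an absolute path, as the example below does, sidesteps the data-home lookup entirely, which is convenient in tests.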

Examples

import os
import tempfile

import pandas as pd
from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.data.entsoe_loader import entsoe_data_loader

# Build a tiny synthetic interim CSV in a temp directory.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "energy_load.csv")
idx = pd.date_range(
    "2025-01-01", periods=48, freq="h", tz="UTC", name="Time (UTC)"
)
pd.DataFrame({"Actual Load": range(48)}, index=idx).to_csv(csv_path)

# Absolute path bypasses get_data_home; loader returns the full frame.
config = ConfigEntsoe()
config.data_filename = csv_path
df = entsoe_data_loader(config)

print(df.shape)
assert df.shape == (48, 1)
assert df.index.name == "Time (UTC)"
(48, 1)

entsoe_test_data_loader

data.entsoe_loader.entsoe_test_data_loader(config)

Return the merged ENTSO-E CSV sliced to the forecast horizon.

The slice spans (end_train, end_train + predict_size * 1 h], so that build_prediction_package’s test_actual = ts.reindex(future_pred.index) matches the hourly forecast row-for-row. end_train is taken from config.end_train_default and treated as the inclusive last training timestamp, the same convention the forecaster uses. The step is assumed to be 1 h, since the pipeline resamples the data to hourly resolution.

For the live ENTSO-E exemplar with end_train_default = D-2 23:00 UTC and predict_size = 24, the window is (D-2 23:00, D-1 23:00], so this returns the hourly rows for [D-1 00:00, D 00:00), i.e., y_{-1}. For backtests at an arbitrary end_train_default, it returns the post-cutoff window the model is actually predicting, rather than always “yesterday in wall-clock UTC”.
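The left-open, right-closed window described above can be reproduced with plain pandas. The sketch below is an illustration under stated assumptions, not the loader's actual code: end_train and predict_size are local stand-ins for config.end_train_default and config.predict_size, and the frame is synthetic.

```python
import pandas as pd

# Stand-ins for config.end_train_default and config.predict_size (assumptions).
end_train = pd.Timestamp("2025-12-31 00:00", tz="UTC")
predict_size = 24

# Synthetic hourly frame standing in for the merged interim CSV.
idx = pd.date_range(
    "2025-12-29 00:00", periods=120, freq="h", tz="UTC", name="Time (UTC)"
)
df = pd.DataFrame({"Actual Load": range(120)}, index=idx)

# Window (end_train, end_train + predict_size * 1 h]:
# end_train itself is excluded, the final horizon step is included.
horizon_end = end_train + pd.Timedelta(hours=predict_size)
mask = (df.index > end_train) & (df.index <= horizon_end)
window = df.loc[mask]

first_step = window.index[0]
last_step = window.index[-1]
```

Because end_train is the inclusive last training timestamp, excluding it from the window keeps training and test rows disjoint while still yielding exactly predict_size rows.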

Parameters

Name Type Description Default
config ConfigEntsoe A ConfigEntsoe with data_filename, end_train_default, and predict_size set; the merged interim CSV must already contain data covering the forecast horizon (run spotforecast2-entsoe download first). required

Returns

Name Type Description
pd.DataFrame DataFrame indexed by Time (UTC) with the rows the forecast will be scored against.

Examples

import os
import tempfile

import pandas as pd
from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.data.entsoe_loader import entsoe_test_data_loader

# Synthetic interim CSV spanning the forecast window.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "energy_load.csv")
idx = pd.date_range(
    "2025-12-29 00:00", periods=120, freq="h", tz="UTC", name="Time (UTC)"
)
pd.DataFrame({"Actual Load": range(120)}, index=idx).to_csv(csv_path)

config = ConfigEntsoe()
config.data_filename = csv_path
config.end_train_default = "2025-12-31 00:00+00:00"
config.predict_size = 24

test_df = entsoe_test_data_loader(config)

# The slice covers exactly predict_size hourly steps after end_train.
print(test_df.shape)
assert test_df.shape == (24, 1)
assert test_df.index[0] == pd.Timestamp("2025-12-31 01:00", tz="UTC")
(24, 1)