downloader.entsoe.merge_build_manual

downloader.entsoe.merge_build_manual(
    output_file='energy_load.csv',
    keep_forecast_future=False,
    raw_subdir=None,
)

Merge all raw CSV files from the ‘raw’ directory into a single interim file.

This function looks for all .csv files in get_data_home() / "raw" (or, when raw_subdir is given, in get_data_home() / "raw" / raw_subdir), merges them oldest-download-first (file mtime, filename as tiebreaker), and saves the unique combined data to get_data_home() / "interim" / output_file.
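
The ordering described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `list_raw_csvs` is a hypothetical helper, and a temporary directory stands in for get_data_home() / "raw".

```python
import os
import tempfile
from pathlib import Path


def list_raw_csvs(raw_dir: Path) -> list[Path]:
    """Hypothetical helper: .csv files directly inside raw_dir, oldest first."""
    # Sort by modification time; the filename breaks ties between equal mtimes.
    return sorted(raw_dir.glob("*.csv"), key=lambda p: (p.stat().st_mtime, p.name))


with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"  # stand-in for get_data_home() / "raw"
    raw_dir.mkdir()
    # Two files share an mtime, so the filename decides their relative order.
    for name, mtime in [("b.csv", 1_000), ("a.csv", 2_000), ("c.csv", 1_000)]:
        path = raw_dir / name
        path.write_text("timestamp,Actual Load\n")
        os.utime(path, (mtime, mtime))
    ordered_names = [p.name for p in list_raw_csvs(raw_dir)]

print(ordered_names)
```

Merging in this order means that later files in the list are the more recent downloads.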

Timestamps covered by several raw files are collapsed cell-wise: per column, the newest non-missing value wins. A newer pull therefore revises older values, while a raw file that holds NaN for hours another pull has filled in (e.g. a pull made before ENTSO-E published a day’s Actual Load) can never mask those values in the interim file.
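
One way to obtain this "newest non-missing value wins" behaviour in pandas is to stack the frames oldest first and take the last non-null entry per timestamp and column. The sketch below assumes that approach and uses made-up values; the function's actual internals may differ.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00"], utc=True)

# Older pull: both hours already have an Actual Load value.
older = pd.DataFrame(
    {"Actual Load": [100.0, 110.0], "Forecasted Load": [98.0, 108.0]}, index=idx
)
# Newer pull: revises hour 0, but holds NaN for Actual Load in hour 1.
newer = pd.DataFrame(
    {"Actual Load": [101.0, np.nan], "Forecasted Load": [99.0, 109.0]}, index=idx
)

# Stack oldest first, then keep the last non-null entry per timestamp and column.
stacked = pd.concat([older, newer])
merged = stacked.groupby(level=0).last()

print(merged)
```

Hour 0 takes the revised values from the newer pull, while hour 1 keeps the older Actual Load because the newer pull holds NaN there.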

Namespacing raw files by raw_subdir keeps the day-ahead side-tables (renewable_forecast.csv, day_ahead_price.csv) from clobbering the load schema in energy_load.csv: the default glob is non-recursive, so load files in raw/ and renewable files in raw/renewable/ never mix.
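
The isolation between raw/ and its sub-directories follows from how a non-recursive glob works. The sketch below illustrates this with a temporary directory; `resolve_raw_dir` and the file names are hypothetical and only mirror the described behaviour.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_raw_dir(raw_root: Path, raw_subdir: Optional[str] = None) -> Path:
    """Hypothetical helper: pick raw/ itself or raw/<raw_subdir>."""
    return raw_root / raw_subdir if raw_subdir else raw_root


with tempfile.TemporaryDirectory() as tmp:
    raw_root = Path(tmp) / "raw"
    (raw_root / "renewable").mkdir(parents=True)
    (raw_root / "load_2024.csv").write_text("timestamp,Actual Load\n")
    (raw_root / "renewable" / "wind_2024.csv").write_text("timestamp,Wind\n")

    # "*.csv" matches only direct children, so nested files are not picked up.
    load_files = sorted(p.name for p in resolve_raw_dir(raw_root).glob("*.csv"))
    renewable_files = sorted(
        p.name for p in resolve_raw_dir(raw_root, "renewable").glob("*.csv")
    )

print(load_files, renewable_files)
```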

Parameters

| Name | Type | Description | Default |
|------|------|-------------|---------|
| output_file | str | The name of the combined output file. Defaults to “energy_load.csv”. | 'energy_load.csv' |
| keep_forecast_future | bool | If False (the default), rows with a timestamp after the current UTC moment are dropped, so the interim file holds only data that is “actual” up to now – the correct, leakage-free input for model training. If True, those future rows are retained, which preserves ENTSO-E’s day-ahead Forecasted Load for tomorrow (future Actual Load is still NaN, so no future target leaks). Use True only for forecast-baseline / comparison workflows that need the published day-ahead forecast. | False |
| raw_subdir | Optional[str] | Optional sub-directory under raw/ to merge instead of raw/ itself. Used by the day-ahead side-table downloaders (e.g. "renewable", "price"). Defaults to None. | None |
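
The effect of keep_forecast_future can be sketched as a cutoff filter on a timezone-aware index. This is an illustration under assumptions: `drop_future_rows` is a hypothetical helper and the values are made up.

```python
import numpy as np
import pandas as pd


def drop_future_rows(df: pd.DataFrame, keep_forecast_future: bool = False) -> pd.DataFrame:
    """Hypothetical helper: drop rows stamped after the current UTC moment."""
    if keep_forecast_future:
        return df
    cutoff = pd.Timestamp.now(tz="UTC")
    return df[df.index <= cutoff]


now = pd.Timestamp.now(tz="UTC")
index = pd.DatetimeIndex(
    [now - pd.Timedelta(hours=2), now - pd.Timedelta(hours=1), now + pd.Timedelta(hours=23)]
)
df = pd.DataFrame(
    {
        # The future row has no Actual Load yet, only the day-ahead forecast.
        "Actual Load": [100.0, 110.0, np.nan],
        "Forecasted Load": [98.0, 108.0, 120.0],
    },
    index=index,
)

training_view = drop_future_rows(df)  # past rows only
baseline_view = drop_future_rows(df, keep_forecast_future=True)  # future row kept

print(len(training_view), len(baseline_view))
```

In the retained future row only Forecasted Load carries a value, which is why keeping it does not leak a future target.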

Raises

| Type | Description |
|------|-------------|
| FileNotFoundError | If the raw directory does not exist. |
| ValueError | If no valid CSV files are found for merging. |

Notes

Individual raw CSV files that fail to parse are logged at ERROR level and skipped; the merge continues with the remaining files. When zero files parse successfully the function returns early without writing an output file.
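
The skip-and-continue behaviour described above could look roughly like the following. It is a sketch under assumptions: `read_parseable_csvs` and the logger name "example.merge" are hypothetical, and an empty file stands in for a raw file that fails to parse.

```python
import logging
import tempfile
from pathlib import Path

import pandas as pd

logger = logging.getLogger("example.merge")  # hypothetical logger name


def read_parseable_csvs(paths: list[Path]) -> list[pd.DataFrame]:
    """Hypothetical helper: read each CSV, logging and skipping unreadable ones."""
    frames = []
    for path in paths:
        try:
            frames.append(pd.read_csv(path))
        except (pd.errors.ParserError, pd.errors.EmptyDataError, UnicodeDecodeError) as exc:
            # One bad file must not abort the whole merge.
            logger.error("Skipping %s: %s", path.name, exc)
    return frames


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.csv"
    good.write_text("timestamp,Actual Load\n2024-01-01T00:00:00Z,100.0\n")
    bad = Path(tmp) / "bad.csv"
    bad.write_text("")  # an empty file makes pandas raise EmptyDataError

    frames = read_parseable_csvs([good, bad])
    only_bad = read_parseable_csvs([bad])

print(len(frames), len(only_bad))
```

An empty result list corresponds to the case in which nothing parsed, so a caller can return early instead of writing an output file.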

Log verbosity is controlled by the level of the spotforecast2_safe.downloader.entsoe logger. Common levels are DEBUG, INFO, WARNING, ERROR, and CRITICAL. The snippet below sets the default level (WARNING); change it to INFO or DEBUG for more verbose output.

import logging
logging.getLogger("spotforecast2_safe.downloader.entsoe").setLevel(logging.WARNING)

Examples

from spotforecast2_safe.downloader.entsoe import merge_build_manual

# Merge with the default output filename
merge_build_manual()

# Or merge with a custom output filename
merge_build_manual(output_file="custom_energy_load.csv")

# Retain future rows so tomorrow's day-ahead `Forecasted Load`
# survives (forecast-baseline workflows only; not for training)
merge_build_manual(keep_forecast_future=True)