ENTSO-E Energy Forecasting Guide

This guide provides comprehensive examples for using spotforecast2 with ENTSO-E energy data. Examples are organized from beginner to advanced, with each code snippet backed by automated tests.

Prerequisites

Before running these examples, ensure you have:

  1. spotforecast2 installed: pip install spotforecast2
  2. An ENTSO-E API key (optional for training examples)

Tip: API Key Management

Store your ENTSO-E API key in the ENTSOE_API_KEY environment variable to avoid passing it on every command:

export ENTSOE_API_KEY="your-api-key-here"
echo $ENTSOE_API_KEY
uv run spotforecast2-entsoe download 202301010000

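Or put the export in a script with the following content and load it with source, so that the variable is set in your current shell rather than only in the script's own process: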

#!/bin/zsh
export ENTSOE_API_KEY=your_api_key
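
From Python, the same variable can be read through os.environ. The helper below is a hypothetical illustration (resolve_api_key is not part of spotforecast2), and the precedence it applies, explicit value first and environment variable second, is an assumption rather than documented behaviour.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return an API key, preferring an explicit value over the environment.

    Hypothetical helper for illustration; not part of spotforecast2.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("ENTSOE_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass one explicitly or set ENTSOE_API_KEY"
        )
    return key


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {"ENTSOE_API_KEY": "key-from-env"}
print(resolve_api_key(env=demo_env))                   # key-from-env
print(resolve_api_key("key-from-flag", env=demo_env))  # key-from-flag
```

Failing early with a clear message is usually friendlier than letting a download request fail later with an authentication error.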

Subcommands

# Download data from ENTSO-E
uv run spotforecast2-entsoe download --api-key YOUR_API_KEY 202301010000

# Train a model (lgbm or xgb)
uv run spotforecast2-entsoe train lgbm --force

# Generate predictions and plot (defaults to lgbm)
uv run spotforecast2-entsoe predict --plot

# Generate predictions with explicit model selection
uv run spotforecast2-entsoe predict lgbm --plot
uv run spotforecast2-entsoe predict xgb --plot

# Merge raw data files
uv run spotforecast2-entsoe merge

Download arguments and time format

The positional argument 202301010000 is a UTC timestamp in the format YYYYMMDDHHMM. It represents the start of the download window. You can provide either one timestamp (start only) or two timestamps (start and end).

# Start only (end defaults to now, UTC)
uv run spotforecast2-entsoe download 202301010000

# Start and end (UTC)
uv run spotforecast2-entsoe download 202301010000 202312312300
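
The YYYYMMDDHHMM format maps directly onto the strptime pattern %Y%m%d%H%M. The following sketch uses only the Python standard library to build and check such timestamps before passing them to the CLI; parse_utc_timestamp is a hypothetical helper name, not a spotforecast2 function.

```python
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y%m%d%H%M"  # YYYYMMDDHHMM


def parse_utc_timestamp(text):
    """Parse a YYYYMMDDHHMM string and mark the result as UTC."""
    return datetime.strptime(text, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


start = parse_utc_timestamp("202301010000")
end = parse_utc_timestamp("202312312300")

print(start.isoformat())  # 2023-01-01T00:00:00+00:00
print(end.isoformat())    # 2023-12-31T23:00:00+00:00
print(end - start)        # 364 days, 23:00:00

# The current UTC time in the same format, usable as an explicit end argument.
print(datetime.now(timezone.utc).strftime(TIMESTAMP_FORMAT))
```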

Hidden arguments and defaults for download:

  • --api-key or ENTSOE_API_KEY environment variable
  • --force to re-download even if files already exist
  • data home controlled by SPOTFORECAST2_DATA (default is ~/spotforecast2_data)
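
The data home rule can be pictured as a simple lookup with a fallback. This is a minimal sketch of the rule as described here, using a hypothetical resolve_data_home function; the library's own get_data_home may differ in detail (for example in whether it creates the directory).

```python
import os
from pathlib import Path


def resolve_data_home(env=None):
    """Return SPOTFORECAST2_DATA if set, otherwise ~/spotforecast2_data.

    Hypothetical stand-in for illustration only.
    """
    env = os.environ if env is None else env
    custom = env.get("SPOTFORECAST2_DATA")
    if custom:
        return Path(custom).expanduser()
    return Path.home() / "spotforecast2_data"


# Stand-in mappings are used so the example does not depend on your shell.
print(resolve_data_home(env={}))
print(resolve_data_home(env={"SPOTFORECAST2_DATA": "/tmp/entsoe"}))
```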

Configuration

The ENTSO-E task uses a configuration class that can be customized programmatically. All configuration parameters have sensible defaults but can be overridden when needed.

Using Default Configuration

from spotforecast2_safe.configurator import ConfigEntsoe

# Create default configuration instance
config = ConfigEntsoe()

# Access configuration values
print(config.country_code)  # 'DE'
print(config.predict_size)  # 24
print(config.train_size)    # Timedelta(days=1095)
DE
24
1095 days 00:00:00

Custom Configuration

from spotforecast2_safe.configurator import ConfigEntsoe
import pandas as pd

# Create custom configuration
custom_config = ConfigEntsoe(
    country_code='DE',
    predict_size=48,
    refit_size=14,
    train_size=pd.Timedelta(days=365),
    random_state=42
)

# Use in your code
print(custom_config.country_code)  # 'DE'
print(custom_config.predict_size)  # 48

Available Configuration Parameters

Parameter                 Type          Default                   Description
country_code              str           "DE"                      ISO country code for ENTSO-E API
predict_size              int           24                        Number of hours to predict ahead
refit_size                int           7                         Number of days between model refits
train_size                Timedelta     3 years (1095 days)       Training data window
end_train_default         str           "2025-12-31 00:00+00:00"  Default training end date
delta_val                 Timedelta     10 weeks                  Validation window size
random_state              int           314159                    Random seed for reproducibility
n_hyperparameters_trials  int           20                        Hyperparameter tuning trials
lags_consider             List[int]     [1..23]                   Lag values for features
periods                   List[Period]  5 periods                 Cyclical feature encodings

For more details, see the ConfigEntsoe API documentation.

Time intervals for download, training, prediction, validation, and testing

Download interval is defined by the start/end timestamps passed to the download command.

Training, prediction, validation, and testing intervals are configured via the ConfigEntsoe class. The CLI uses the default configuration values, which can be modified programmatically:

  • training end time: config.end_train_default (defaults to “2025-12-31 00:00+00:00”)
  • training window size: config.train_size (defaults to 3 years, i.e. 1095 days)
  • prediction window: config.predict_size * config.refit_size hours (24 * 7 = 168 hours with the defaults)

Validation and testing are derived from the prediction window:

  • validation metrics use the first 24 hours of the prediction window
  • testing metrics use the full prediction window
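
With the default values, these windows can be worked out using plain date arithmetic. The sketch below uses only the standard library and the defaults from the parameter table; that the prediction window starts exactly where training ends is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Defaults quoted from the configuration parameter table.
end_train = datetime(2025, 12, 31, 0, 0, tzinfo=timezone.utc)
train_size = timedelta(days=1095)
predict_size = 24  # hours
refit_size = 7     # days

train_start = end_train - train_size
prediction_hours = predict_size * refit_size

# Assumption: the prediction window begins at the end of training.
prediction_end = end_train + timedelta(hours=prediction_hours)
validation_end = end_train + timedelta(hours=24)  # first 24 hours only

print(train_start.isoformat())     # 2023-01-01T00:00:00+00:00
print(prediction_hours)            # 168
print(validation_end.isoformat())  # 2026-01-01T00:00:00+00:00
print(prediction_end.isoformat())  # 2026-01-07T00:00:00+00:00
```

Note that the default training window reaches back to 2023-01-01 00:00 UTC, which matches the start timestamp used in the download examples.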

Visualize Results

The prediction plot shows the following graphs:

  • Total system load (actual): The real-time electricity demand (consumption) within the bidding zone. This includes network losses but excludes consumption for pumped storage and generating auxiliaries.
  • Total system load (model prediction): The demand forecast generated by the spotforecast2 machine learning model (e.g., LightGBM or XGBoost) based on historical data and exogenous features.
  • Benchmark Forecast (e.g. ENTSOE): The reference forecast provided by the Transmission System Operators (TSOs) via the ENTSO-E Transparency Platform.
  • Actual (last week): The actual system load from exactly one week ago at the same time, which serves as a seasonal baseline comparison.
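
The "Actual (last week)" curve is a seasonal-naive baseline: each hour is compared with the value observed 168 hours earlier. The following is a minimal sketch of that idea on synthetic data in arbitrary units; the function names are hypothetical and the way the plot itself computes the curve is not shown in this guide.

```python
HOURS_PER_WEEK = 7 * 24


def last_week_baseline(load):
    """Return, for each hour, the value observed 168 hours earlier.

    The first week has no predecessor, so those entries are None.
    """
    return [
        load[i - HOURS_PER_WEEK] if i >= HOURS_PER_WEEK else None
        for i in range(len(load))
    ]


def mean_absolute_error(actual, predicted):
    """Mean absolute error over the hours where a prediction exists."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


# Synthetic hourly load: two identical weeks, then a week that is 10 units higher.
week = [50.0 + (hour % 24) for hour in range(HOURS_PER_WEEK)]
load = week + week + [value + 10.0 for value in week]

baseline = last_week_baseline(load)
print(baseline[:2])                         # [None, None]
print(mean_absolute_error(load, baseline))  # 5.0
```

A model forecast is only useful if it beats this kind of baseline, which is why it is worth keeping in the plot.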

The prediction plot is saved as an HTML file named index.html in the data home directory. By default this is ~/spotforecast2_data/index.html or the path defined by SPOTFORECAST2_DATA.

# Default location (macOS; on most Linux desktops use xdg-open instead of open)
open ~/spotforecast2_data/index.html

# If you use a custom data home
open "$SPOTFORECAST2_DATA/index.html"

Check the CLI logs for the exact path (look for “Plot saved to …”).
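
For a platform-independent alternative to the shell commands, the file can be opened from Python. This sketch only builds the file URI, assuming the same data home rule as described above; plot_uri is a hypothetical helper, and the actual browser call is left as a comment so the snippet has no side effects.

```python
import os
from pathlib import Path


def plot_uri(env=None):
    """Build a file:// URI for index.html in the data home directory."""
    env = os.environ if env is None else env
    data_home = Path(env.get("SPOTFORECAST2_DATA") or Path.home() / "spotforecast2_data")
    return (data_home.expanduser().resolve() / "index.html").as_uri()


uri = plot_uri()
print(uri)

# To open it in the default browser:
#   import webbrowser
#   webbrowser.open(uri)
```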


Feature Engineering

Period Dataclass

Periods define cyclical time features using radial basis functions:

from spotforecast2_safe.data import Period

daily = Period(
    name='daily',
    n_periods=12,
    column='hour',
    input_range=(1, 24)
)
print(daily.name)        # 'daily'
print(daily.n_periods)   # 12

RepeatingBasisFunction

Transform time features into smooth cyclical encodings:

from spotforecast2_safe.preprocessing import RepeatingBasisFunction
import pandas as pd

rbf = RepeatingBasisFunction(
    n_periods=12,
    column='hour',
    input_range=(1, 24)
)

df = pd.DataFrame({'hour': range(1, 25)})
features = rbf.transform(df)
print(features.shape)  # (24, 12)
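
To give an intuition for what such an encoding computes, here is a small standard-library sketch of radial basis features on a cycle. It is not the library's implementation: the Gaussian kernel, the evenly spaced centres, and the width of 1 / n_periods are all assumptions chosen for illustration, and rbf_encode is a hypothetical name.

```python
import math


def rbf_encode(value, n_periods, input_range):
    """Encode one value as n_periods Gaussian bumps placed around a circle."""
    low, high = input_range
    position = (value - low) / (high - low)  # 0.0 .. 1.0 along the cycle
    width = 1.0 / n_periods                  # assumed bump width
    features = []
    for i in range(n_periods):
        centre = i / n_periods
        distance = abs(position - centre)
        distance = min(distance, 1.0 - distance)  # wrap around the cycle
        features.append(math.exp(-((distance / width) ** 2)))
    return features


rows = [rbf_encode(hour, n_periods=12, input_range=(1, 24)) for hour in range(1, 25)]
print(len(rows), len(rows[0]))  # 24 12
print(round(rows[0][0], 3))     # 1.0
```

Each column peaks at one point of the cycle and decays smoothly on both sides, so neighbouring hours receive similar feature vectors. In this sketch the two endpoints of input_range map to the same point on the circle; whether the library treats them that way is not stated here.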

ExogBuilder

Build complete exogenous feature sets including holidays and weekends:

from spotforecast2_safe.preprocessing import ExogBuilder
from spotforecast2_safe.data import Period
import pandas as pd

periods = [
    Period(name='daily', n_periods=12, column='hour', input_range=(1, 24)),
    Period(name='weekly', n_periods=7, column='dayofweek', input_range=(0, 6)),
]

builder = ExogBuilder(periods=periods, country_code='DE')
X = builder.build(
    pd.Timestamp('2025-01-01', tz='UTC'),
    pd.Timestamp('2025-01-02', tz='UTC')
)
print(X.shape)  # (25, 21) - 12 + 7 + 2 (holiday, weekend)
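
The row count of 25 follows from including both endpoints of the hourly range, and the two extra columns are simple calendar flags. The sketch below reproduces that reasoning with the standard library only. The holiday set is hand-written for the example, and deriving the flags from the UTC date is an assumption; a real builder would consult a holiday calendar for the configured country.

```python
from datetime import date, datetime, timedelta, timezone


def hourly_range(start, end):
    """Hourly timestamps from start to end, both endpoints included."""
    steps = int((end - start) / timedelta(hours=1))
    return [start + timedelta(hours=i) for i in range(steps + 1)]


# Assumption: a hand-written holiday set used only for this example.
HOLIDAYS = {date(2025, 1, 1)}  # New Year's Day

index = hourly_range(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    datetime(2025, 1, 2, tzinfo=timezone.utc),
)
is_weekend = [int(ts.weekday() >= 5) for ts in index]   # Saturday=5, Sunday=6
is_holiday = [int(ts.date() in HOLIDAYS) for ts in index]

print(len(index))       # 25
print(sum(is_weekend))  # 0  (Wednesday and Thursday)
print(sum(is_holiday))  # 24 (every hour of 1 January)
```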

Using ConfigEntsoe with ExogBuilder

Combine configuration and feature building:

from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.preprocessing import ExogBuilder
import pandas as pd

config = ConfigEntsoe()
builder = ExogBuilder(
    periods=config.periods,
    country_code=config.country_code
)
X = builder.build(
    pd.Timestamp('2025-12-31', tz='UTC'),
    pd.Timestamp('2026-01-01', tz='UTC')
)
print(f"Generated {X.shape[1]} features for {X.shape[0]} hours")

Data Preprocessing

Linear Interpolation

Handle missing values in time series data:

from spotforecast2_safe.preprocessing import LinearlyInterpolateTS
import pandas as pd
import numpy as np

ts = pd.Series(
    [1.0, np.nan, 3.0, np.nan, 5.0],
    index=pd.date_range('2025-01-01', periods=5, freq='h')
)

interpolator = LinearlyInterpolateTS()
ts_clean = interpolator.fit_transform(ts)

print(ts_clean.values)  # [1.0, 2.0, 3.0, 4.0, 5.0]
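
The underlying arithmetic is straightforward: each gap is filled by drawing a straight line between the nearest known neighbours. A dependency-free sketch, assuming equally spaced samples and using None for missing values (interpolate_linear is a hypothetical name, not the library's transformer):

```python
def interpolate_linear(values):
    """Fill None gaps linearly between the nearest known neighbours.

    Assumes equally spaced samples; leading or trailing gaps stay None.
    """
    result = list(values)
    known = [i for i, v in enumerate(result) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            rise = result[right] - result[left]
            result[i] = result[left] + rise * (i - left) / span
    return result


print(interpolate_linear([1.0, None, 3.0, None, 5.0]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(interpolate_linear([2.0, None, None, 8.0]))       # [2.0, 4.0, 6.0, 8.0]
```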

Forecaster Models

The pipeline builds its forecasters through factory functions that take a ConfigEntsoe and return a ready-to-fit spotforecast2_safe.forecaster.recursive.ForecasterRecursive.

LightGBM Forecaster

The stock LightGBM factory lives in the safe package:

from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.multitask.factories import default_lgbm_forecaster_factory

config = ConfigEntsoe()
forecaster = default_lgbm_forecaster_factory(config)

print(type(forecaster).__name__)
print("lags:", len(forecaster.lags))
ForecasterRecursive
lags: 23
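
As the class name suggests, a recursive forecaster predicts one step at a time and feeds each prediction back in as a lag for the next step. The following is a generic sketch of that technique with a toy model in place of a fitted regressor; it is not the library's implementation, and all names in it are hypothetical.

```python
def recursive_forecast(history, predict_one, n_lags, steps):
    """Forecast `steps` values, feeding each prediction back in as a new lag."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_one(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the lag window forward
    return forecasts


def mean_of_lags(window):
    """Toy stand-in for a fitted regressor such as LightGBM."""
    return sum(window) / len(window)


history = [1.0, 2.0, 3.0]
forecasts = recursive_forecast(history, mean_of_lags, n_lags=3, steps=2)
print(forecasts)  # first value is 2.0, the mean of the three lags
```

Because later steps depend on earlier predictions, errors can accumulate over the horizon, which is one reason to compare metrics on the first 24 hours with metrics on the full prediction window.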

XGBoost Forecaster

The XGBoost variant ships with the CLI module (xgboost is not a safe-package dependency):

from spotforecast2_safe.configurator import ConfigEntsoe

from spotforecast2.entsoe_cli import entsoe_xgb_factory

config = ConfigEntsoe()
forecaster = entsoe_xgb_factory(config)

print(type(forecaster.estimator).__name__)
XGBRegressor

Both factories honour config.random_state, config.lags_consider, and config.window_size; supply your own factory through config.forecaster_factory to customise further (see the Multitask tutorial).

Using the Python API (Notebooks & Quarto)

Full Prediction Pipeline

For users working in Jupyter Notebooks or Quarto, the entire ENTSO-E pipeline can be executed through MultiTask, which is the same path the CLI’s train and predict subcommands take. This approach is recommended for research because it gives precise control over time windows and hyperparameters.

import logging
import os

from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.data.entsoe_loader import entsoe_data_loader
from spotforecast2_safe.data.fetch_data import get_cache_home
from spotforecast2_safe.downloader.entsoe import download_new_data
from spotforecast2_safe.multitask.factories import default_lgbm_forecaster_factory

from spotforecast2.multitask import MultiTask

# 1. Download data (optional, requires ENTSOE_API_KEY)
api_key = os.environ.get("ENTSOE_API_KEY")
if api_key:
    download_new_data(api_key=api_key, start="202301010000")

# 2. Wire the loader and factory into the config
config = ConfigEntsoe()
config.targets = ["Actual Load"]
config.agg_weights = [1.0]
config.bounds = [(-1e9, 1e9)]
config.data_loader = entsoe_data_loader
config.forecaster_factory = default_lgbm_forecaster_factory
config.data_frame_name = "entsoe-lgbm"

# 3. Run the five-step pipeline (task="defaults" trains; "predict" reuses
#    the saved model)
mt = MultiTask(
    config,
    task="defaults",
    cache_home=get_cache_home(config.cache_home),
    log_level=logging.ERROR,
)
mt.prepare_data()
mt.detect_outliers()
mt.impute()
mt.build_exogenous_features()
mt.run(show=True)

File Paths

Data Home Directory

Access the data storage location:

from spotforecast2_safe.data import get_data_home

data_home = get_data_home()
print(data_home)  # ~/spotforecast2_data or SPOTFORECAST2_DATA

CLI Commands

Download Data

# Download with API key
uv run spotforecast2-entsoe download --api-key YOUR_API_KEY 202301010000

# Download with date range
uv run spotforecast2-entsoe download 202301010000 202312312300

# Force re-download
uv run spotforecast2-entsoe download --force 202301010000

Train Models

# Train LightGBM model
uv run spotforecast2-entsoe train lgbm

# Train XGBoost model
uv run spotforecast2-entsoe train xgb

# Force retraining
uv run spotforecast2-entsoe train lgbm --force

Generate Predictions

# Predict with default model (lgbm)
uv run spotforecast2-entsoe predict

# Predict with specific model
uv run spotforecast2-entsoe predict lgbm
uv run spotforecast2-entsoe predict xgb

# Predict and generate plot
uv run spotforecast2-entsoe predict --plot

Merge Data Files

uv run spotforecast2-entsoe merge

Environment Variables

Variable            Description
ENTSOE_API_KEY      ENTSO-E API key for data downloads
SPOTFORECAST2_DATA  Custom data directory (default: ~/spotforecast2_data)

Testing

All examples in this guide are validated by automated tests:

# Run documentation example tests
uv run pytest tests/test_docs_entsoe_examples.py -v

# Run all ENTSO-E tests
uv run pytest tests/test_tasks_entsoe.py -v

See Also