ENTSO-E single-target forecasting

This page walks through driving the ENTSO-E single-target pipeline directly through spotforecast2.multitask.multi.MultiTask. The publication-style “Approach 1-4” variants from the bart26b_spotforecast2 paper map onto the four task values:

Publication approach  Description              task= value
Approach 1            Lazy fitting             "lazy"
Approach 2            Training without tuning  "defaults"
Approach 3            Optuna tuning            "optuna"
Approach 4            SpotOptim tuning         "spotoptim"
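
When several approaches are compared in one script, it can help to keep this mapping in one place. The sketch below is illustrative only: APPROACH_TO_TASK and task_for_approach are hypothetical names and not part of spotforecast2.

```python
# Hypothetical lookup; these names are illustrative, not part of spotforecast2.
APPROACH_TO_TASK = {
    1: "lazy",       # Approach 1: lazy fitting
    2: "defaults",   # Approach 2: training without tuning
    3: "optuna",     # Approach 3: Optuna tuning
    4: "spotoptim",  # Approach 4: SpotOptim tuning
}


def task_for_approach(approach: int) -> str:
    """Return the task= value for a publication approach number (1 to 4)."""
    try:
        return APPROACH_TO_TASK[approach]
    except KeyError:
        raise ValueError(f"unknown approach {approach!r}; expected 1 to 4") from None
```

The returned string is what would be passed as task= when constructing MultiTask.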

ConfigEntsoe plus two hook callables (config.data_loader and config.forecaster_factory) drive every variant. No per-approach wrapper classes are needed.

Setup

import tempfile
import warnings

import numpy as np
import pandas as pd

warnings.filterwarnings("ignore")

from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.multitask.factories import (
    default_lgbm_forecaster_factory as entsoe_lgbm_factory,
)

from spotforecast2.multitask.multi import MultiTask

CACHE_HOME = tempfile.mkdtemp()

Synthetic ENTSO-E load curve

For this tutorial we synthesise a 30-day hourly load series from a constant baseline of 50,000, a daily sine cycle, a weekly sine offset that shifts the level from day to day, and Gaussian noise. In production, the download_new_data and merge_build_manual helpers from spotforecast2_safe.downloader.entsoe write the merged interim CSV that the real entsoe_data_loader reads.

def synthetic_entsoe_df(n_days: int = 30) -> pd.DataFrame:
    n = n_days * 24
    idx = pd.date_range("2024-01-01", periods=n, freq="h", tz="UTC")
    idx.name = "Time (UTC)"
    rng = np.random.default_rng(0)
    hours = np.arange(n) % 24
    days = np.arange(n) // 24
    load = (
        50_000
        + 10_000 * np.sin(hours * 2 * np.pi / 24)
        + 5_000 * np.sin(days * 2 * np.pi / 7)
        + rng.normal(0, 500, n)
    )
    return pd.DataFrame({"Actual Load": load}, index=idx)


load_df = synthetic_entsoe_df()
load_df.head()
                            Actual Load
Time (UTC)
2024-01-01 00:00:00+00:00  50062.865111
2024-01-01 01:00:00+00:00  52522.138019
2024-01-01 02:00:00+00:00  55320.211325
2024-01-01 03:00:00+00:00  57123.517870
2024-01-01 04:00:00+00:00  58392.419351
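
To read these numbers, it helps to look at the noise-free part of the formula on its own. The sketch below re-implements only the deterministic terms of synthetic_entsoe_df with the standard library; deterministic_load is a hypothetical helper introduced for illustration.

```python
import math


def deterministic_load(hour_index: int) -> float:
    """Noise-free part of the synthetic load, hour_index hours after the start."""
    hour_of_day = hour_index % 24
    day = hour_index // 24
    daily = 10_000 * math.sin(hour_of_day * 2 * math.pi / 24)
    weekly = 5_000 * math.sin(day * 2 * math.pi / 7)
    return 50_000 + daily + weekly


# On day 0 the weekly term is zero, so only the daily cycle moves the level.
midnight = deterministic_load(0)         # baseline only
morning_peak = deterministic_load(6)     # daily sine peaks a quarter period after midnight
evening_trough = deterministic_load(18)  # and bottoms out three quarters in
```

The first printed row, about 50,063, is therefore the 50,000 baseline plus a small noise draw.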

Wiring the data loader

BaseTask.prepare_data calls config.data_loader(config) whenever no DataFrame is supplied. Here we wrap the synthetic DataFrame as a stub loader so the tutorial is self-contained.

def stub_loader(_config):
    return load_df.copy()
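
The fallback described above can be sketched without the library. This is a minimal stand-in, not the BaseTask implementation: prepare_data_sketch is a hypothetical function, a SimpleNamespace plays the role of the config, and a plain list stands in for the DataFrame.

```python
from types import SimpleNamespace


def prepare_data_sketch(config, df=None):
    """Hypothetical stand-in: use df when given, otherwise call the loader hook."""
    if df is None:
        df = config.data_loader(config)
    return df


shared = [50_000.0, 52_500.0]  # stands in for load_df
# The loader returns a copy, mirroring load_df.copy() in stub_loader.
config = SimpleNamespace(data_loader=lambda _config: list(shared))

loaded = prepare_data_sketch(config)
loaded.append(55_000.0)  # mutating the loaded data leaves `shared` untouched

explicit = prepare_data_sketch(config, df=[1.0])  # the hook is skipped when data is supplied
```

Returning a copy from the loader keeps one task's in-place changes from leaking into the next task that calls the same loader.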

Approach 1 — Lazy fitting

task="lazy" fits with the factory’s defaults and, if a cached tuning result is available, applies it before fitting.
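
The "apply the cache if present, otherwise use defaults" rule can be sketched as follows. The cache file name, its JSON layout, and resolve_params are assumptions made for illustration; the actual cache format used by spotforecast2 is not shown on this page.

```python
import json
import tempfile
from pathlib import Path


def resolve_params(defaults: dict, cache_file: Path) -> dict:
    """Overlay cached tuned parameters on the defaults when a cache file exists."""
    params = dict(defaults)
    if cache_file.is_file():
        params.update(json.loads(cache_file.read_text()))
    return params


defaults = {"n_estimators": 100, "learning_rate": 0.1}
cache_file = Path(tempfile.mkdtemp()) / "tuned_params.json"  # hypothetical file name

cold = resolve_params(defaults, cache_file)  # no cache yet: pure defaults
cache_file.write_text(json.dumps({"learning_rate": 0.05}))
warm = resolve_params(defaults, cache_file)  # the cached value overrides the default
```

With an empty cache, as in a fresh CACHE_HOME, this rule reduces to fitting with the defaults.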

config_lazy = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_lazy.data_loader = stub_loader
config_lazy.forecaster_factory = entsoe_lgbm_factory
config_lazy.data_frame_name = "entsoe-tutorial"

mt_lazy = MultiTask(config_lazy, task="lazy", cache_home=CACHE_HOME, log_level=40)
mt_lazy.prepare_data()
mt_lazy.detect_outliers()
mt_lazy.impute()
mt_lazy.build_exogenous_features()
result_lazy = mt_lazy.run(show=False)

forecast_lazy = result_lazy["future_pred"].to_frame("forecast")
forecast_lazy.head()
forecast
2024-01-31 00:00:00+00:00 54603.707472
2024-01-31 01:00:00+00:00 57033.472936
2024-01-31 02:00:00+00:00 59698.707103
2024-01-31 03:00:00+00:00 62247.518716
2024-01-31 04:00:00+00:00 63519.573904
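
The first forecast timestamp, 2024-01-31 00:00 UTC, is the hour after the last synthetic observation. The sketch below reproduces that arithmetic with the standard library. It assumes that predict_size counts hourly forecast steps, which the printed head() alone does not confirm.

```python
from datetime import datetime, timedelta, timezone

n_days = 30
predict_size = 12  # assumption: number of hourly steps in the forecast

start = datetime(2024, 1, 1, tzinfo=timezone.utc)
# The training series has n_days * 24 hourly points, so the last one is 719 hours in.
last_observed = start + timedelta(hours=n_days * 24 - 1)
horizon = [last_observed + timedelta(hours=step) for step in range(1, predict_size + 1)]
```

Under that assumption the horizon runs from 2024-01-31 00:00 to 2024-01-31 11:00 UTC.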

Approach 2 — Training without tuning

task="defaults" is the deterministic baseline: it always trains with the factory defaults and never reads the tuning cache.

config_defaults = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_defaults.data_loader = stub_loader
config_defaults.forecaster_factory = entsoe_lgbm_factory
config_defaults.data_frame_name = "entsoe-tutorial"

mt_defaults = MultiTask(
    config_defaults, task="defaults", cache_home=CACHE_HOME, log_level=40
)
mt_defaults.prepare_data()
mt_defaults.detect_outliers()
mt_defaults.impute()
mt_defaults.build_exogenous_features()
result_defaults = mt_defaults.run(show=False)

forecast_defaults = result_defaults["future_pred"].to_frame("forecast")
forecast_defaults.head()
forecast
2024-01-31 00:00:00+00:00 54603.707472
2024-01-31 01:00:00+00:00 57033.472936
2024-01-31 02:00:00+00:00 59698.707103
2024-01-31 03:00:00+00:00 62247.518716
2024-01-31 04:00:00+00:00 63519.573904
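
The values printed here are the same as those from the lazy run. That is what one would expect if no cached tuning result existed when the lazy cell ran, which is plausible because CACHE_HOME is a fresh temporary directory. The comparison can be made explicit with a small helper; max_abs_diff is a hypothetical name, and the numbers are copied from the two outputs on this page.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


# First five forecast values as printed for task="lazy" and task="defaults".
lazy_head = [54603.707472, 57033.472936, 59698.707103, 62247.518716, 63519.573904]
defaults_head = [54603.707472, 57033.472936, 59698.707103, 62247.518716, 63519.573904]

same_forecast = max_abs_diff(lazy_head, defaults_head) < 1e-6
```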

Approach 3 — Optuna tuning

task="optuna" performs a Bayesian hyperparameter search via Optuna and then trains with the best parameters found. n_trials_optuna sets the search budget; we use a tiny value here so the tutorial renders quickly.

config_optuna = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_optuna.data_loader = stub_loader
config_optuna.forecaster_factory = entsoe_lgbm_factory
config_optuna.data_frame_name = "entsoe-tutorial"

mt_optuna = MultiTask(
    config_optuna,
    task="optuna",
    cache_home=CACHE_HOME,
    log_level=40,
    n_trials_optuna=2,
)
mt_optuna.prepare_data()
mt_optuna.detect_outliers()
mt_optuna.impute()
mt_optuna.build_exogenous_features()
result_optuna = mt_optuna.run(show=False)

forecast_optuna = result_optuna["future_pred"].to_frame("forecast")
forecast_optuna.head()
forecast
2024-01-31 00:00:00+00:00 51632.495336
2024-01-31 01:00:00+00:00 51632.495336
2024-01-31 02:00:00+00:00 51632.495336
2024-01-31 03:00:00+00:00 51632.495336
2024-01-31 04:00:00+00:00 51632.495336
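
All five printed values are identical, so over these hours the tuned model predicts a constant. One possible reading, stated here as an assumption, is that two trials are too few to find a usable configuration on this short series. A small check such as the following can flag flat forecasts before they are used; is_flat is a hypothetical helper.

```python
def is_flat(values, tol=1e-9):
    """True when every value lies within tol of the first one."""
    values = list(values)
    return all(abs(v - values[0]) <= tol for v in values)


# Values copied from the printed outputs on this page.
optuna_head = [51632.495336, 51632.495336, 51632.495336, 51632.495336, 51632.495336]
lazy_head = [54603.707472, 57033.472936, 59698.707103, 62247.518716, 63519.573904]

optuna_is_flat = is_flat(optuna_head)
lazy_is_flat = is_flat(lazy_head)
```

Raising n_trials_optuna is the obvious first thing to try when this check fires.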

Approach 4 — SpotOptim tuning

task="spotoptim" runs surrogate-model-based tuning via spotoptim.

config_spotoptim = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_spotoptim.data_loader = stub_loader
config_spotoptim.forecaster_factory = entsoe_lgbm_factory
config_spotoptim.data_frame_name = "entsoe-tutorial"

mt_spotoptim = MultiTask(
    config_spotoptim,
    task="spotoptim",
    cache_home=CACHE_HOME,
    log_level=40,
    n_trials_spotoptim=3,
    n_initial_spotoptim=2,
)
mt_spotoptim.prepare_data()
mt_spotoptim.detect_outliers()
mt_spotoptim.impute()
mt_spotoptim.build_exogenous_features()
result_spotoptim = mt_spotoptim.run(show=False)

forecast_spotoptim = result_spotoptim["future_pred"].to_frame("forecast")
forecast_spotoptim.head()
`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [  1   2   3  23  24  25  47  48 167 168 169 336] 
  Parameters: {'estimator__num_leaves': 31, 'estimator__max_depth': 3, 'estimator__learning_rate': 0.1, 'estimator__n_estimators': 100, 'estimator__bagging_fraction': 0.75, 'estimator__feature_fraction': 0.75, 'estimator__reg_alpha': 0.01, 'estimator__reg_lambda': 0.01}
  Backtesting metric: 621.457817991302
forecast
2024-01-31 00:00:00+00:00 54867.033573
2024-01-31 01:00:00+00:00 57176.592882
2024-01-31 02:00:00+00:00 59068.230580
2024-01-31 03:00:00+00:00 61610.352471
2024-01-31 04:00:00+00:00 63364.196157
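
The parameter names in the log carry an estimator__ prefix, in the style of scikit-learn's nested-parameter naming. To reuse the values with a bare estimator, the prefix has to be removed first. The sketch below does that for the dictionary printed above; strip_prefix is a hypothetical helper.

```python
PREFIX = "estimator__"

# Copied from the "Parameters:" line of the log above.
best_params = {
    "estimator__num_leaves": 31,
    "estimator__max_depth": 3,
    "estimator__learning_rate": 0.1,
    "estimator__n_estimators": 100,
    "estimator__bagging_fraction": 0.75,
    "estimator__feature_fraction": 0.75,
    "estimator__reg_alpha": 0.01,
    "estimator__reg_lambda": 0.01,
}


def strip_prefix(params: dict, prefix: str = PREFIX) -> dict:
    """Drop the prefix from keys that carry it; leave other keys unchanged."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in params.items()
    }


estimator_kwargs = strip_prefix(best_params)
```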

Predicting from a saved model

After any training task, the fitted model is saved automatically to the project’s cache directory. task="predict" skips fitting and loads the latest saved model for each target. The cell below requires that at least one of the training cells above has run in the same render session, because each of them writes its model into the shared CACHE_HOME/"entsoe-tutorial" path that predict loads from. In this render, the forecast printed below matches the SpotOptim output, which is consistent with that run having saved the most recent model.
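
Before running predict, a notebook can check that the project directory holds anything at all. The sketch below assumes only what the paragraph above states, namely that models live somewhere under cache_home/project. The file naming inside that directory is not documented here, so the check merely looks for any file; has_saved_artifacts is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def has_saved_artifacts(cache_home: str, project: str) -> bool:
    """True if <cache_home>/<project> exists and contains at least one file."""
    project_dir = Path(cache_home) / project
    return project_dir.is_dir() and any(p.is_file() for p in project_dir.rglob("*"))


demo_home = tempfile.mkdtemp()  # stands in for CACHE_HOME
before = has_saved_artifacts(demo_home, "entsoe-tutorial")

model_dir = Path(demo_home) / "entsoe-tutorial"
model_dir.mkdir()
(model_dir / "model.bin").write_bytes(b"")  # placeholder file, not a real model
after = has_saved_artifacts(demo_home, "entsoe-tutorial")
```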

config_predict = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_predict.data_loader = stub_loader
config_predict.forecaster_factory = entsoe_lgbm_factory
config_predict.data_frame_name = "entsoe-tutorial"

mt_predict = MultiTask(
    config_predict, task="predict", cache_home=CACHE_HOME, log_level=40
)
mt_predict.prepare_data()
mt_predict.detect_outliers()
mt_predict.impute()
mt_predict.build_exogenous_features()
result_predict = mt_predict.run(show=False)

forecast_predict = result_predict["future_pred"].to_frame("forecast")
forecast_predict.head()
forecast
2024-01-31 00:00:00+00:00 54867.033573
2024-01-31 01:00:00+00:00 57176.592882
2024-01-31 02:00:00+00:00 59068.230580
2024-01-31 03:00:00+00:00 61610.352471
2024-01-31 04:00:00+00:00 63364.196157

Putting it together

The spotforecast2-entsoe CLI drives MultiTask directly:

  1. Downloads the latest ENTSO-E data (spotforecast2_safe.downloader.entsoe.download_new_data).
  2. Merges the raw CSVs into the interim file (merge_build_manual).
  3. Constructs a MultiTask(config, task="defaults", ...), runs all five pipeline steps, and saves the fitted model automatically.
  4. On a schedule, constructs a MultiTask(config, task="predict", ...), runs the five steps, and reads the saved model to produce the next forecast.
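
The five steps referred to in items 3 and 4 are the same calls used in every cell of this page, so they can be wrapped once. The sketch below is a convenience helper, not part of spotforecast2: run_all_steps and RecordingTask are hypothetical names, and RecordingTask only records calls so that the example runs without the library.

```python
PIPELINE_STEPS = (
    "prepare_data",
    "detect_outliers",
    "impute",
    "build_exogenous_features",
)


def run_all_steps(task, show=False):
    """Call the four preparation methods in order, then run(); return run()'s result."""
    for name in PIPELINE_STEPS:
        getattr(task, name)()
    return task.run(show=show)


class RecordingTask:
    """Stand-in that records method calls; swap in a real MultiTask instance."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes not set in __init__, i.e. the step methods.
        def method(**kwargs):
            self.calls.append(name)
            return {"called": list(self.calls)}

        return method


demo = RecordingTask()
result = run_all_steps(demo)
```

With a real task object, the call would be run_all_steps(mt_defaults) or run_all_steps(mt_predict).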

The same task values are available to notebook callers: pick the training task ("lazy", "defaults", "optuna", or "spotoptim") that matches the level of tuning the operational policy allows, and use "predict" to reuse a saved model.