```python
import tempfile
import warnings

import numpy as np
import pandas as pd

# Silence library warnings so the rendered tutorial stays readable.
warnings.filterwarnings("ignore")

from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.multitask.factories import (
    default_lgbm_forecaster_factory as entsoe_lgbm_factory,
)
from spotforecast2.multitask.multi import MultiTask

# Throwaway cache directory shared by every task on this page.
CACHE_HOME = tempfile.mkdtemp()
```

ENTSO-E single-target forecasting
This page shows how to drive the ENTSO-E single-target pipeline directly with spotforecast2.multitask.multi.MultiTask. The publication-style variants “Approach 1” to “Approach 4” from the bart26b_spotforecast2 paper map onto four values of the task argument:
| Publication approach | task= value |
|---|---|
| Approach 1 — Lazy fitting | "lazy" |
| Approach 2 — Training without tuning | "defaults" |
| Approach 3 — Optuna tuning | "optuna" |
| Approach 4 — SpotOptim tuning | "spotoptim" |
ConfigEntsoe plus two hook callables (config.data_loader and config.forecaster_factory) drive every variant. No per-approach wrapper classes are needed.
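To make the hook design concrete, here is a minimal sketch in plain Python. HookedConfig and build_variant are hypothetical stand-ins, not spotforecast2 classes, and the zero-argument factory signature is an assumption. The point is only that every task value consumes the same two hooks in the same way, so nothing has to be subclassed per approach.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

# The four training task values from the table above.
TASKS = ("lazy", "defaults", "optuna", "spotoptim")


@dataclass
class HookedConfig:
    """Hypothetical stand-in for ConfigEntsoe: only the two hook attributes."""

    data_loader: Optional[Callable[["HookedConfig"], Any]] = None
    forecaster_factory: Optional[Callable[[], Any]] = None  # assumed zero-arg


def build_variant(config: HookedConfig, task: str) -> Tuple[Any, Any, str]:
    """Call both hooks identically for every task; only the task label differs."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    data = config.data_loader(config)  # the loader receives the config itself
    forecaster = config.forecaster_factory()
    return data, forecaster, task


cfg = HookedConfig(
    data_loader=lambda c: [50_000.0, 52_000.0],
    forecaster_factory=lambda: {"model": "stub"},
)
variants = [build_variant(cfg, t) for t in TASKS]
```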
Setup
Synthetic ENTSO-E load curve
For this tutorial we synthesise a 30-day hourly load series from a constant baseline of 50,000, a daily sine cycle (amplitude 10,000), a weekly sine term (amplitude 5,000) and Gaussian noise (standard deviation 500). In production, the download_new_data and merge_build_manual helpers from spotforecast2_safe.downloader.entsoe write the merged interim CSV that the real entsoe_data_loader reads.
```python
def synthetic_entsoe_df(n_days: int = 30) -> pd.DataFrame:
    """Hourly synthetic load: baseline + daily sine + weekly sine + noise."""
    n = n_days * 24
    idx = pd.date_range("2024-01-01", periods=n, freq="h", tz="UTC")
    idx.name = "Time (UTC)"
    rng = np.random.default_rng(0)  # fixed seed for reproducible output
    hours = np.arange(n) % 24  # hour of day, 0..23
    days = np.arange(n) // 24  # day index, 0..n_days-1
    load = (
        50_000
        + 10_000 * np.sin(hours * 2 * np.pi / 24)  # daily cycle
        + 5_000 * np.sin(days * 2 * np.pi / 7)  # weekly cycle
        + rng.normal(0, 500, n)  # noise
    )
    return pd.DataFrame({"Actual Load": load}, index=idx)


load_df = synthetic_entsoe_df()
load_df.head()
```

| Time (UTC) | Actual Load |
|---|---|
| 2024-01-01 00:00:00+00:00 | 50062.865111 |
| 2024-01-01 01:00:00+00:00 | 52522.138019 |
| 2024-01-01 02:00:00+00:00 | 55320.211325 |
| 2024-01-01 03:00:00+00:00 | 57123.517870 |
| 2024-01-01 04:00:00+00:00 | 58392.419351 |
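A quick way to see the daily component is to average the series by hour of day. The sketch below uses a variant of synthetic_entsoe_df (named synthetic_load here, with a noise_sd parameter added purely for illustration) and switches the noise off so the checks are exact: the daily term 10,000·sin(2πh/24) peaks at hour 6 and bottoms out at hour 18, while the weekly term depends only on the day and therefore shifts every hour's average by the same amount.

```python
import numpy as np
import pandas as pd


def synthetic_load(n_days: int = 30, noise_sd: float = 500.0) -> pd.Series:
    """Same recipe as synthetic_entsoe_df, with the noise level exposed."""
    n = n_days * 24
    idx = pd.date_range("2024-01-01", periods=n, freq="h", tz="UTC")
    hours = np.arange(n) % 24
    days = np.arange(n) // 24
    rng = np.random.default_rng(0)
    load = (
        50_000
        + 10_000 * np.sin(hours * 2 * np.pi / 24)
        + 5_000 * np.sin(days * 2 * np.pi / 7)
        + rng.normal(0, noise_sd, n)
    )
    return pd.Series(load, index=idx, name="Actual Load")


clean = synthetic_load(noise_sd=0.0)
# Mean load per hour of day (0..23), averaged over all 30 days.
hourly_profile = clean.groupby(clean.index.hour).mean()
```

Applied to the notebook's noisy load_df, the same groupby on load_df.index.hour gives a slightly jittered version of this profile.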
Wiring the data loader
BaseTask.prepare_data calls config.data_loader(config) whenever no DataFrame is supplied. Here we wrap the synthetic DataFrame as a stub loader so the tutorial is self-contained.
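The fallback rule can be sketched as follows. prepare_data_sketch is a hypothetical re-creation of the documented behaviour, not BaseTask.prepare_data itself: a DataFrame passed in explicitly is used as-is, and the loader hook is only called when none is given.

```python
from types import SimpleNamespace
from typing import Optional

import pandas as pd


def prepare_data_sketch(config, df: Optional[pd.DataFrame] = None) -> pd.DataFrame:
    """Use an explicit DataFrame if given, otherwise call config.data_loader(config)."""
    if df is None:
        df = config.data_loader(config)
    return df


calls = []


def counting_loader(cfg):
    # Record each call so we can see when the hook is actually used.
    calls.append(cfg)
    return pd.DataFrame({"Actual Load": [50_000.0, 52_000.0]})


config = SimpleNamespace(data_loader=counting_loader)
from_loader = prepare_data_sketch(config)  # no DataFrame: the hook is called
explicit = prepare_data_sketch(config, pd.DataFrame({"Actual Load": [1.0]}))  # hook skipped
```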
```python
def stub_loader(_config):
    """Stand-in for config.data_loader: ignore the config, return a fresh copy."""
    return load_df.copy()
```

Approach 1 — Lazy fitting
task="lazy" fits with the factory’s defaults and, if a cached tuning result is available, applies it before fitting.
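The lazy rule amounts to "cached tuning result if present, factory defaults otherwise". A minimal sketch, assuming the tuned parameters are stored as one JSON file per target; the file layout, the resolve_lazy_params name and the parameter values are illustrative only, not the library's actual cache format.

```python
import json
import tempfile
from pathlib import Path

DEFAULTS = {"num_leaves": 31, "learning_rate": 0.1}  # illustrative defaults


def resolve_lazy_params(cache_dir: Path, target: str, defaults: dict) -> dict:
    """Overlay a cached tuning result (if any) on the factory defaults."""
    cache_file = cache_dir / f"{target}_best_params.json"  # hypothetical layout
    params = dict(defaults)  # copy so the defaults are never mutated
    if cache_file.exists():
        params.update(json.loads(cache_file.read_text()))
    return params


cache_dir = Path(tempfile.mkdtemp())
before = resolve_lazy_params(cache_dir, "Actual Load", DEFAULTS)  # nothing cached yet
(cache_dir / "Actual Load_best_params.json").write_text(json.dumps({"num_leaves": 15}))
after = resolve_lazy_params(cache_dir, "Actual Load", DEFAULTS)  # cached value wins
```

In the same sketch, a defaults-only policy would simply skip the file lookup and return a copy of DEFAULTS.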
```python
config_lazy = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
# Attach the two hooks plus the data-set name used for the cache path.
config_lazy.data_loader = stub_loader
config_lazy.forecaster_factory = entsoe_lgbm_factory
config_lazy.data_frame_name = "entsoe-tutorial"

# log_level=40 corresponds to logging.ERROR.
mt_lazy = MultiTask(config_lazy, task="lazy", cache_home=CACHE_HOME, log_level=40)

# The five pipeline steps, in order.
mt_lazy.prepare_data()
mt_lazy.detect_outliers()
mt_lazy.impute()
mt_lazy.build_exogenous_features()
result_lazy = mt_lazy.run(show=False)

forecast_lazy = result_lazy["future_pred"].to_frame("forecast")
forecast_lazy.head()
```

| | forecast |
|---|---|
| 2024-01-31 00:00:00+00:00 | 54603.707472 |
| 2024-01-31 01:00:00+00:00 | 57033.472936 |
| 2024-01-31 02:00:00+00:00 | 59698.707103 |
| 2024-01-31 03:00:00+00:00 | 62247.518716 |
| 2024-01-31 04:00:00+00:00 | 63519.573904 |
Approach 2 — Training without tuning
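task="defaults" is the deterministic baseline: it always trains with the factory defaults and never reads the tuning cache. No tuning result has been cached at this point in the page, so the lazy run above also fell back to the defaults; this is why the two forecasts are identical.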
```python
# Same configuration as above; only the task value changes.
config_defaults = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_defaults.data_loader = stub_loader
config_defaults.forecaster_factory = entsoe_lgbm_factory
config_defaults.data_frame_name = "entsoe-tutorial"

mt_defaults = MultiTask(
    config_defaults, task="defaults", cache_home=CACHE_HOME, log_level=40
)
mt_defaults.prepare_data()
mt_defaults.detect_outliers()
mt_defaults.impute()
mt_defaults.build_exogenous_features()
result_defaults = mt_defaults.run(show=False)

forecast_defaults = result_defaults["future_pred"].to_frame("forecast")
forecast_defaults.head()
```

| | forecast |
|---|---|
| 2024-01-31 00:00:00+00:00 | 54603.707472 |
| 2024-01-31 01:00:00+00:00 | 57033.472936 |
| 2024-01-31 02:00:00+00:00 | 59698.707103 |
| 2024-01-31 03:00:00+00:00 | 62247.518716 |
| 2024-01-31 04:00:00+00:00 | 63519.573904 |
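Because the defaults run is a fixed baseline, its output can be compared exactly rather than by eye. In the notebook the check would be assert_frame_equal(forecast_lazy, forecast_defaults) from pandas.testing. The self-contained sketch below rebuilds only the first five rows shown in the two tables (values copied from the rendered output) and also shows that a changed value makes the check fail.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# First five forecast rows, copied from the rendered lazy and defaults tables.
idx = pd.date_range("2024-01-31", periods=5, freq="h", tz="UTC")
values = [54603.707472, 57033.472936, 59698.707103, 62247.518716, 63519.573904]
lazy_head = pd.DataFrame({"forecast": values}, index=idx)
defaults_head = pd.DataFrame({"forecast": values}, index=idx)

# Passes silently: same index, column name, dtype and values.
assert_frame_equal(lazy_head, defaults_head)

# A clearly different value makes the comparison raise AssertionError.
shifted = defaults_head.copy()
shifted.iloc[0, 0] += 100.0
try:
    assert_frame_equal(lazy_head, shifted)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```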
Approach 3 — Optuna tuning
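task="optuna" performs a Bayesian hyperparameter search via Optuna and trains with the best parameters found. n_trials_optuna sets the search budget; we use a tiny value here so the tutorial renders quickly. With only two trials the selected configuration can be poor: in the output below, the first five forecast values are identical.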
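Why a tiny budget can hurt: a search can only return the best configuration it has actually tried. The sketch below uses plain random search as a stand-in (it is not Optuna's sampler, and the toy objective and search range are made up). With a fixed seed the first draws of a longer run are exactly the draws of a shorter run, so the best score can only stay the same or improve as n_trials grows.

```python
import random


def toy_objective(learning_rate: float) -> float:
    """Made-up validation error, smallest at learning_rate = 0.05."""
    return (learning_rate - 0.05) ** 2


def random_search(n_trials: int, seed: int = 0) -> tuple:
    """Return (best_value, best_params) after n_trials random draws."""
    rng = random.Random(seed)
    best_value, best_params = float("inf"), None
    for _ in range(n_trials):
        lr = rng.uniform(0.001, 0.3)  # one candidate per trial
        value = toy_objective(lr)
        if value < best_value:
            best_value, best_params = value, {"learning_rate": lr}
    return best_value, best_params


small = random_search(n_trials=2)
large = random_search(n_trials=50)
```

The same trade-off applies to the real search: raising n_trials_optuna gives the sampler more chances to leave a poor region, at the cost of more model fits.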
```python
config_optuna = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_optuna.data_loader = stub_loader
config_optuna.forecaster_factory = entsoe_lgbm_factory
config_optuna.data_frame_name = "entsoe-tutorial"

mt_optuna = MultiTask(
    config_optuna,
    task="optuna",
    cache_home=CACHE_HOME,
    log_level=40,
    n_trials_optuna=2,  # tiny search budget to keep rendering fast
)
mt_optuna.prepare_data()
mt_optuna.detect_outliers()
mt_optuna.impute()
mt_optuna.build_exogenous_features()
result_optuna = mt_optuna.run(show=False)

forecast_optuna = result_optuna["future_pred"].to_frame("forecast")
forecast_optuna.head()
```

| | forecast |
|---|---|
| 2024-01-31 00:00:00+00:00 | 51632.495336 |
| 2024-01-31 01:00:00+00:00 | 51632.495336 |
| 2024-01-31 02:00:00+00:00 | 51632.495336 |
| 2024-01-31 03:00:00+00:00 | 51632.495336 |
| 2024-01-31 04:00:00+00:00 | 51632.495336 |
Approach 4 — SpotOptim tuning
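task="spotoptim" runs surrogate-model-based tuning via spotoptim. As with Optuna, the budget is kept tiny here (n_trials_spotoptim=3, n_initial_spotoptim=2) so the tutorial renders quickly.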
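Surrogate-model-based tuning alternates between evaluating the real, expensive objective and fitting a cheap model of it to decide where to evaluate next. The rough numpy sketch below assumes that a budget of n_trials evaluations includes the n_initial random design points (how spotoptim counts them is an assumption here); the polynomial surrogate is a deliberately crude stand-in for the Kriging-style models such tools typically use, and the objective is a toy.

```python
import numpy as np


def toy_objective(x: float) -> float:
    """Made-up expensive objective with its minimum at x = 2."""
    return (x - 2.0) ** 2


def surrogate_search(n_trials: int, n_initial: int, lower: float = -5.0,
                     upper: float = 5.0, seed: int = 0):
    rng = np.random.default_rng(seed)
    # 1) Initial design: n_initial random points, evaluated on the real objective.
    xs = list(rng.uniform(lower, upper, n_initial))
    ys = [toy_objective(x) for x in xs]
    grid = np.linspace(lower, upper, 201)
    # 2) Remaining budget: fit a surrogate and evaluate its predicted minimiser.
    while len(xs) < n_trials:
        degree = min(2, len(xs) - 1)  # a line for 2 points, a parabola for 3+
        coeffs = np.polyfit(xs, ys, degree)
        x_next = float(grid[np.argmin(np.polyval(coeffs, grid))])
        xs.append(x_next)
        ys.append(toy_objective(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best], len(xs)


x_best, y_best, n_evals = surrogate_search(n_trials=3, n_initial=2)
x4, y4, n4 = surrogate_search(n_trials=4, n_initial=2)
```

With n_trials=3 and n_initial=2, only one point is chosen by the surrogate; one extra evaluation is enough for the parabola to match this quadratic toy objective and land on its minimum.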
```python
config_spotoptim = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_spotoptim.data_loader = stub_loader
config_spotoptim.forecaster_factory = entsoe_lgbm_factory
config_spotoptim.data_frame_name = "entsoe-tutorial"

mt_spotoptim = MultiTask(
    config_spotoptim,
    task="spotoptim",
    cache_home=CACHE_HOME,
    log_level=40,
    n_trials_spotoptim=3,  # tiny search budget
    n_initial_spotoptim=2,  # tiny initial design
)
mt_spotoptim.prepare_data()
mt_spotoptim.detect_outliers()
mt_spotoptim.impute()
mt_spotoptim.build_exogenous_features()
result_spotoptim = mt_spotoptim.run(show=False)

forecast_spotoptim = result_spotoptim["future_pred"].to_frame("forecast")
forecast_spotoptim.head()
```

```
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [ 1 2 3 23 24 25 47 48 167 168 169 336]
  Parameters: {'estimator__num_leaves': 31, 'estimator__max_depth': 3, 'estimator__learning_rate': 0.1, 'estimator__n_estimators': 100, 'estimator__bagging_fraction': 0.75, 'estimator__feature_fraction': 0.75, 'estimator__reg_alpha': 0.01, 'estimator__reg_lambda': 0.01}
  Backtesting metric: 621.457817991302
```

| | forecast |
|---|---|
| 2024-01-31 00:00:00+00:00 | 54867.033573 |
| 2024-01-31 01:00:00+00:00 | 57176.592882 |
| 2024-01-31 02:00:00+00:00 | 59068.230580 |
| 2024-01-31 03:00:00+00:00 | 61610.352471 |
| 2024-01-31 04:00:00+00:00 | 63364.196157 |
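The log reports the selected lags; for hourly data, 24 and 168 mean one day and one week back. As an illustration of what a lag set means, the sketch below turns the logged lags into lagged feature columns with pandas.Series.shift. The make_lag_features helper and its column names are hypothetical; this is not how the spotforecast2 forecaster builds its design matrix internally.

```python
import pandas as pd


def make_lag_features(y: pd.Series, lags: list) -> pd.DataFrame:
    """One column per lag k holding the value k steps earlier (y at time t-k)."""
    features = pd.DataFrame({f"lag_{k}": y.shift(k) for k in lags}, index=y.index)
    # Rows before the largest lag lack full history, so drop them.
    return features.dropna()


idx = pd.date_range("2024-01-01", periods=400, freq="h", tz="UTC")
y = pd.Series(range(400), index=idx, dtype=float)
lags = [1, 2, 3, 23, 24, 25, 47, 48, 167, 168, 169, 336]  # as logged above
X = make_lag_features(y, lags)
```

In this sketch the largest lag dominates how much data is usable: on the tutorial's 720-hour series, a 336-hour lag would leave 720 - 336 = 384 fully populated rows.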
Predicting from a saved model
After any training task, the fitted model is saved automatically to the project’s cache directory. task="predict" skips fitting and loads the latest saved model for each target. The cell below therefore requires that at least one of the training cells above has run in the same render session: each of them writes models into the shared CACHE_HOME/"entsoe-tutorial" path that predict loads from. The SpotOptim run trained last on this page, which is consistent with the forecast below matching the SpotOptim forecast.
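Loading "the latest saved model for each target" can be pictured as writing each model under a timestamped file name and picking the newest file per target. A minimal sketch with pickle; the directory layout, file naming and helper names are assumptions for illustration, not the library's actual on-disk format.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, cache_dir: Path, target: str, stamp: str) -> Path:
    """Write model to <cache_dir>/<target>/<stamp>.pkl (hypothetical layout)."""
    target_dir = cache_dir / target
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / f"{stamp}.pkl"
    path.write_bytes(pickle.dumps(model))
    return path


def load_latest(cache_dir: Path, target: str):
    """Load the newest model; fixed-width timestamps sort chronologically as text."""
    candidates = sorted((cache_dir / target).glob("*.pkl"))
    if not candidates:
        raise FileNotFoundError(f"no saved model for target {target!r}")
    return pickle.loads(candidates[-1].read_bytes())


cache = Path(tempfile.mkdtemp())
save_model({"task": "defaults"}, cache, "Actual Load", "20240131T000000")
save_model({"task": "spotoptim"}, cache, "Actual Load", "20240131T010000")
latest = load_latest(cache, "Actual Load")
```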
```python
config_predict = ConfigEntsoe(
    targets=["Actual Load"],
    agg_weights=[1.0],
    bounds=[(-1e9, 1e9)],
    index_name="Time (UTC)",
    use_exogenous_features=False,
    use_outlier_detection=False,
    predict_size=12,
)
config_predict.data_loader = stub_loader
config_predict.forecaster_factory = entsoe_lgbm_factory
# Must match the name used by the training runs so predict finds their models.
config_predict.data_frame_name = "entsoe-tutorial"

mt_predict = MultiTask(
    config_predict, task="predict", cache_home=CACHE_HOME, log_level=40
)
mt_predict.prepare_data()
mt_predict.detect_outliers()
mt_predict.impute()
mt_predict.build_exogenous_features()
result_predict = mt_predict.run(show=False)

forecast_predict = result_predict["future_pred"].to_frame("forecast")
forecast_predict.head()
```

| | forecast |
|---|---|
| 2024-01-31 00:00:00+00:00 | 54867.033573 |
| 2024-01-31 01:00:00+00:00 | 57176.592882 |
| 2024-01-31 02:00:00+00:00 | 59068.230580 |
| 2024-01-31 03:00:00+00:00 | 61610.352471 |
| 2024-01-31 04:00:00+00:00 | 63364.196157 |
Putting it together
The spotforecast2-entsoe CLI drives MultiTask directly:
- Downloads the latest ENTSO-E data (spotforecast2_safe.downloader.entsoe.download_new_data).
- Merges the raw CSVs into the interim file (merge_build_manual).
- Constructs a MultiTask(config, task="defaults", ...), runs all five pipeline steps, and saves the fitted model automatically.
- On a schedule, constructs a MultiTask(config, task="predict", ...), runs the five steps, and reads the saved model to produce the next forecast.
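The train-once, predict-on-schedule pattern in the list above can be sketched with a stand-in task object. RecordingTask and run_five_steps are hypothetical: the stand-in only records which steps ran, whereas a real deployment would construct MultiTask(config, task=..., ...) at each stage as described above.

```python
from typing import List


class RecordingTask:
    """Hypothetical stand-in for MultiTask: records step names instead of working."""

    def __init__(self, task: str, log: List[str]):
        self.task = task
        self.log = log

    def _record(self, step: str) -> None:
        self.log.append(f"{self.task}:{step}")

    def prepare_data(self):
        self._record("prepare_data")

    def detect_outliers(self):
        self._record("detect_outliers")

    def impute(self):
        self._record("impute")

    def build_exogenous_features(self):
        self._record("build_exogenous_features")

    def run(self, show: bool = False):
        self._record("run")
        return {}


def run_five_steps(mt) -> dict:
    """Call the five pipeline steps in the order used throughout this page."""
    mt.prepare_data()
    mt.detect_outliers()
    mt.impute()
    mt.build_exogenous_features()
    return mt.run(show=False)


log: List[str] = []
run_five_steps(RecordingTask("defaults", log))  # one-off training run
for _ in range(2):  # two scheduled forecast runs
    run_five_steps(RecordingTask("predict", log))
```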
The four training task values, as well as "predict", are equally available to notebook callers: pick the training variant that matches the level of tuning the operational policy allows.