task_multi: Config-Driven Multi-Target Forecasting with MultiTask

What spotforecast2_safe.multitask provides, how ConfigMulti drives it, and a complete runnable example.

spotforecast2_safe.multitask is the config-driven orchestrator for multi-target time-series forecasting. It owns the complete pipeline — data preparation, outlier handling, imputation, exogenous features, training, prediction, persistence — and is driven by a single ConfigMulti object. The unrestricted sibling package spotforecast2 inherits this pipeline and adds only what is deliberately excluded here: hyperparameter tuning (Optuna, SpotOptim) and interactive plotting.

Before version 16.0.0 this pipeline existed twice: once in the sibling package and once, in procedural form, behind the n-to-1 task’s 18 keyword arguments and a hard-coded weight list. Both paths are now one implementation in this package, and the dependency between the siblings is strictly one-way: spotforecast2 imports from spotforecast2-safe, never the reverse.

Vocabulary

Definition 1 (MultiTask) The task dispatcher of the multitask package. A MultiTask instance is constructed from a ConfigMulti and a DataFrame, prepared with the four pipeline stages (prepare_data, detect_outliers, impute, build_exogenous_features), and executed with run(task=...), where task selects one of the available task modes.

Definition 2 (N-to-1 aggregation) The reduction of per-target forecasts to a single series as a weighted sum, with weights taken from ConfigMulti.agg_weights in target order. Equal weights are used when agg_weights is None.
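Definition 2 can be sketched in plain pandas. The helper name aggregate_n_to_1 is hypothetical and not part of the package. The text does not say whether "equal weights" means 1.0 per target or 1/n per target, so the sketch assumes 1.0 and flags that in a comment.

```python
from typing import Optional, Sequence

import pandas as pd


def aggregate_n_to_1(
    forecasts: dict[str, pd.Series],
    weights: Optional[Sequence[float]] = None,
) -> pd.Series:
    """Weighted sum of per-target forecasts, with weights given in target order."""
    targets = list(forecasts)  # dict insertion order defines the target order
    if weights is None:
        # ASSUMPTION: "equal weights" is taken to mean 1.0 for every target.
        weights = [1.0] * len(targets)
    if len(weights) != len(targets):
        raise ValueError(f"expected {len(targets)} weights, got {len(weights)}")
    return sum(w * forecasts[t] for w, t in zip(weights, targets))


idx = pd.date_range("2023-01-29", periods=3, freq="h", tz="UTC")
per_target = {
    "a": pd.Series([100.0, 101.0, 102.0], index=idx),
    "b": pd.Series([220.0, 221.0, 225.0], index=idx),
}
combined = aggregate_n_to_1(per_target, weights=[1.0, -1.0])  # a - b
print(combined.tolist())  # [-120.0, -120.0, -123.0]
```

A weight list whose length differs from the number of targets is rejected rather than truncated, in keeping with the fail-safe policy described later on this page.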

Task modes

The safe package ships four task modes:

Mode	What it does
`lazy`	Fit one `ForecasterRecursive` per target with LightGBM defaults, applying cached tuning results when present.
`defaults`	Same fit, but ignoring any tuning cache — fully deterministic baseline.
`predict`	Load previously saved models and predict without retraining.
`clean`	Remove the pipeline’s cache directory (models, tuning results, logs).

The tuning modes optuna and spotoptim exist only in spotforecast2; requesting them here raises an explicit ValueError (see Fail-safe behaviour).

A complete worked example

Example 1 (Synthetic data and a minimal configuration) Two hourly target series over four weeks, a named DatetimeIndex matching ConfigMulti.index_name (default "DateTime"), and a configuration with the expensive options disabled so the example runs in seconds and offline.

import tempfile
import warnings

import numpy as np
import pandas as pd

from spotforecast2_safe.configurator.config_multi import ConfigMulti

warnings.filterwarnings("ignore")

rng = np.random.default_rng(0)
n = 24 * 28  # 4 weeks, hourly
idx = pd.date_range("2023-01-01", periods=n, freq="h", tz="UTC")
idx.name = "DateTime"
df = pd.DataFrame(
    {
        "a": 100 + 10 * np.sin(np.arange(n) * 2 * np.pi / 24) + rng.normal(0, 2, n),
        "b": 200 + 20 * np.cos(np.arange(n) * 2 * np.pi / 24) + rng.normal(0, 4, n),
    },
    index=idx,
)

cache = tempfile.mkdtemp()
cfg = ConfigMulti(
    predict_size=6,                 # forecast horizon: 6 hours
    agg_weights=[1.0, -1.0],        # n-to-1 combination: a - b
    use_exogenous_features=False,   # offline example: no weather/calendar
    use_outlier_detection=False,
    auto_save_models=True,          # persist models for the predict mode below
    number_folds=2,
    random_state=42,
    verbose=False,
)
df.tail(3)

	a	b
DateTime
2023-01-28 21:00:00+00:00	93.591621	219.025208
2023-01-28 22:00:00+00:00	94.698887	220.480717
2023-01-28 23:00:00+00:00	97.692551	224.620083
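Since the pipeline expects a named DatetimeIndex, it can be worth checking a frame before handing it over. The following is a minimal sketch of such a pre-flight check. The helper check_input_frame is hypothetical, and the specific conditions (strictly increasing, evenly spaced) are assumptions about what a forecasting pipeline typically needs, not documented requirements of this package.

```python
import numpy as np
import pandas as pd


def check_input_frame(df: pd.DataFrame, index_name: str = "DateTime") -> None:
    """Raise ValueError if the frame does not look like a regular time series."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise ValueError("index must be a DatetimeIndex")
    if df.index.name != index_name:
        raise ValueError(
            f"index must be named {index_name!r}, got {df.index.name!r}"
        )
    if not (df.index.is_monotonic_increasing and df.index.is_unique):
        raise ValueError("index must be strictly increasing")
    # A regular grid has exactly one distinct step between consecutive stamps.
    steps = df.index.to_series().diff().dropna().unique()
    if len(steps) != 1:
        raise ValueError(f"index is not evenly spaced: {len(steps)} distinct steps")


n = 24 * 28
idx = pd.date_range("2023-01-01", periods=n, freq="h", tz="UTC", name="DateTime")
frame = pd.DataFrame({"a": np.zeros(n), "b": np.ones(n)}, index=idx)
check_input_frame(frame)  # passes silently

try:
    check_input_frame(frame.rename_axis("timestamp"))
except ValueError as err:
    print(err)
```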

Example 2 (Running the pipeline) The four stage methods can be chained because each returns the task itself; run then fits one forecaster per target and predicts. The aggregated package that run returns carries the combined forecast under "future_pred"; the per-target packages are stored in mt.results, keyed first by task mode and then by target name.

from spotforecast2_safe.multitask import MultiTask

mt = MultiTask(cfg, dataframe=df, cache_home=cache)
result = (
    mt.prepare_data()
      .detect_outliers()
      .impute()
      .build_exogenous_features()
      .run(task="defaults")
)
result["future_pred"]

2023-01-29 00:00:00+00:00   -117.612059
2023-01-29 01:00:00+00:00   -120.726869
2023-01-29 02:00:00+00:00   -117.511861
2023-01-29 03:00:00+00:00   -107.689754
2023-01-29 04:00:00+00:00   -103.317560
2023-01-29 05:00:00+00:00    -95.314405
Freq: h, dtype: float64

Example 3 (The aggregation is exactly the configured weighted sum) Definition 2 can be verified directly against the per-target forecasts:

pred_a = mt.results["defaults"]["a"]["future_pred"]
pred_b = mt.results["defaults"]["b"]["future_pred"]
manual = 1.0 * pred_a + (-1.0) * pred_b

print("max |aggregated - manual| =", float((result["future_pred"] - manual).abs().max()))

max |aggregated - manual| = 0.0

Example 4 (Train once, predict many times) With auto_save_models=True the fitted forecasters were persisted under cache_home. A later predict run loads them instead of retraining — the production pattern for scheduled forecasts:

mt2 = MultiTask(cfg, dataframe=df, cache_home=cache)
mt2.prepare_data().detect_outliers().impute().build_exogenous_features()
reloaded = mt2.run(task="predict")
reloaded["future_pred"]

2023-01-29 00:00:00+00:00   -117.612059
2023-01-29 01:00:00+00:00   -120.726869
2023-01-29 02:00:00+00:00   -117.511861
2023-01-29 03:00:00+00:00   -107.689754
2023-01-29 04:00:00+00:00   -103.317560
2023-01-29 05:00:00+00:00    -95.314405
Freq: h, dtype: float64
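The train-once, predict-many split can be illustrated without the package. The sketch below is not how MultiTask stores its forecasters: the toy "model" (an hourly mean profile), the JSON file format, and the function names are all assumptions chosen to keep the example self-contained. It shows only the pattern: one job fits and persists, a later job loads and predicts without refitting.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def fit_hourly_profile(y: np.ndarray, period: int = 24) -> dict:
    """Toy 'model': the mean of each slot in the cycle, plus the series length."""
    usable = len(y) - len(y) % period  # ignore an incomplete last cycle
    profile = y[:usable].reshape(-1, period).mean(axis=0)
    return {"period": period, "n_seen": len(y), "profile": profile.tolist()}


def predict(model: dict, steps: int) -> np.ndarray:
    """Continue the cycle from where the training series ended."""
    slots = (model["n_seen"] + np.arange(steps)) % model["period"]
    return np.asarray(model["profile"])[slots]


y = 100 + 10 * np.sin(np.arange(24 * 28) * 2 * np.pi / 24)
cache_dir = Path(tempfile.mkdtemp())

# Training job: fit once and persist the fitted parameters.
model_path = cache_dir / "target_a.json"
model_path.write_text(json.dumps(fit_hourly_profile(y)))

# Scheduled job: load and predict, with no refit.
reloaded = json.loads(model_path.read_text())
fresh = predict(fit_hourly_profile(y), steps=6)
from_disk = predict(reloaded, steps=6)
print(np.array_equal(fresh, from_disk))  # True
```

The round trip is exact because Python's json module writes floats with a representation that parses back to the same value.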

Example 5 (Determinism) Same input, same configuration, bit-identical output — a hard requirement of this package, enforced by the test suite and demonstrable here with a fresh instance in a fresh cache directory:

mt3 = MultiTask(cfg, dataframe=df.copy(), cache_home=tempfile.mkdtemp())
rerun = (
    mt3.prepare_data()
       .detect_outliers()
       .impute()
       .build_exogenous_features()
       .run(task="defaults")
)
pd.testing.assert_series_equal(result["future_pred"], rerun["future_pred"], check_exact=True)
print("bit-identical:", result["future_pred"].equals(rerun["future_pred"]))

bit-identical: True
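A regression test can also pin determinism across sessions by storing a fingerprint of the forecast instead of rerunning the pipeline twice. The following is a hedged sketch of that idea. The helper forecast_fingerprint and the stand-in toy_forecast are hypothetical and say nothing about how the package's own test suite is written.

```python
import hashlib

import numpy as np
import pandas as pd


def forecast_fingerprint(series: pd.Series) -> str:
    """SHA-256 over the raw float64 values and the index timestamps as int64."""
    values = np.ascontiguousarray(series.to_numpy(dtype="float64"))
    stamps = np.ascontiguousarray(series.index.asi8)
    digest = hashlib.sha256()
    digest.update(values.tobytes())
    digest.update(stamps.tobytes())
    return digest.hexdigest()


def toy_forecast(seed: int) -> pd.Series:
    """Stand-in for a pipeline run: seeded noise on a 6-hour horizon."""
    rng = np.random.default_rng(seed)
    idx = pd.date_range("2023-01-29", periods=6, freq="h", tz="UTC")
    return pd.Series(rng.normal(size=6), index=idx)


first = forecast_fingerprint(toy_forecast(42))
second = forecast_fingerprint(toy_forecast(42))
other = forecast_fingerprint(toy_forecast(43))
print(first == second, first == other)  # True False
```

A byte-level hash is stricter than Series.equals: it distinguishes 0.0 from -0.0, for example, which value comparison does not.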

Fail-safe behaviour

Invalid requests raise immediately instead of degrading silently. Requesting a tuning mode in the safe package names the package that provides it:

try:
    mt.run(task="spotoptim")
except ValueError as err:
    print(err)

Task 'spotoptim' requires auto-tuning, which is not available in spotforecast2-safe. Use the spotforecast2 package, or task='lazy'/'defaults'.

The same policy applies to unexpected keyword arguments (TypeError instead of silent dropping) and to plotting: MultiTask.plot_with_outliers() raises NotImplementedError because no plotting library is permitted in this package.
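The fail-fast policy described above can be sketched as a small dispatcher that validates the whole request before doing any work. This is an illustration of the pattern only: run_task, its error messages, and the show_progress option are hypothetical and do not reproduce the package's implementation. The mode names are the ones listed on this page.

```python
SAFE_MODES = ("lazy", "defaults", "predict", "clean")
TUNING_MODES = ("optuna", "spotoptim")


def run_task(task: str, **options) -> str:
    """Hypothetical dispatcher: validate everything before doing any work."""
    known_options = {"show_progress"}  # hypothetical option, for illustration only
    unknown = sorted(set(options) - known_options)
    if unknown:
        # Reject instead of silently dropping misspelled or unsupported options.
        raise TypeError(f"unexpected keyword argument(s): {', '.join(unknown)}")
    if task in TUNING_MODES:
        raise ValueError(
            f"Task {task!r} requires auto-tuning, which this sketch does not provide."
        )
    if task not in SAFE_MODES:
        raise ValueError(f"Unknown task {task!r}; expected one of {SAFE_MODES}.")
    return f"running {task!r}"


requests = (
    {"task": "defaults"},
    {"task": "spotoptim"},
    {"task": "lazy", "verbose_typo": True},
)
for request in requests:
    try:
        print(run_task(**request))
    except (TypeError, ValueError) as err:
        print(f"{type(err).__name__}: {err}")
```

Checking options before the task name means a misspelled keyword is reported even when the task itself is valid, so a typo cannot quietly change behaviour.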

Scaling up from the toy example

For a real run, switch the feature machinery on instead of off. Setting use_exogenous_features=True together with include_holiday_features, include_holiday_adjacency_features (bridge days), and include_weather_windows adds calendar, holiday, day/night, weather, and polynomial-interaction covariates before training. Weather features require network access; on_weather_failure keeps its fail-safe default "raise" unless you explicitly opt into "skip".
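The options named above could be collected as keyword overrides for ConfigMulti. This is a configuration sketch only. The option names come from the paragraph above, but treating the three include_* options as boolean switches is an assumption, and the dictionary is not passed to the package here.

```python
# Keyword overrides for a production-style configuration.
# ASSUMPTION: the include_* options are boolean switches.
production_overrides = {
    "use_exogenous_features": True,
    "include_holiday_features": True,
    "include_holiday_adjacency_features": True,  # bridge days
    "include_weather_windows": True,             # needs network access
    "on_weather_failure": "raise",               # fail-safe default; "skip" is the opt-in
}

# With the package installed, these would be merged into the constructor call,
# for example ConfigMulti(predict_size=6, **production_overrides).
for name, value in production_overrides.items():
    print(f"{name} = {value!r}")
```

Keeping on_weather_failure at "raise" means a failed weather download stops the run instead of producing a model trained without the weather covariates.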

Upgrade path: the same config in spotforecast2

The unrestricted sibling subclasses this pipeline and re-adds tuning and plotting. The configuration object travels unchanged:

# spotforecast2 (not installable here — one-way dependency)
from spotforecast2.multitask import MultiTask

mt = MultiTask(cfg, dataframe=df, task="spotoptim")
mt.prepare_data().detect_outliers().impute().build_exogenous_features()
mt.run(show=True)   # hyperparameter search + interactive figures

Note

Because spotforecast2-safe never imports from spotforecast2, the cell above is a listing rather than executed code: this documentation builds in an environment where spotforecast2 is, by design, absent.

Where to go next

API reference: MultiTask, BaseTask, ConfigMulti, runner.run.