multitask.MultiTask

multitask.MultiTask(
    config=None,
    *,
    task='lazy',
    dataframe=None,
    data_test=None,
    cache_home=None,
    dry_run=False,
    show_progress=False,
    log_level=logging.INFO,
    **overrides,
)

Orchestrates a multi-target time-series forecasting pipeline.

Training data must be provided as a pandas DataFrame via dataframe (it is optional only for the "clean" task). A test dataset can optionally be provided via data_test.

The typical usage flow is:

- Instantiate with config (or omit it to auto-construct ConfigMulti()).
- Call prepare_data to load, resample, and validate data.
- Call detect_outliers to apply hard bounds and IsolationForest.
- Call impute to fill gaps.
- Call build_exogenous_features to construct weather / calendar / day-night / holiday covariates.
- Call run (or an individual run_task_* method) to train, predict, and aggregate.
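The chained call order can be illustrated with a small stand-in class. MiniPipeline is hypothetical and not part of the library; it only assumes what the method docs below state: each step returns self, and later steps raise RuntimeError when prepare_data has not run.

```python
class MiniPipeline:
    """Hypothetical stand-in that mimics the documented call order."""

    def __init__(self, data):
        self._data = data
        self.df_pipeline = None  # populated by prepare_data()
        self.steps = []

    def _require_prepared(self):
        # Later steps refuse to run before prepare_data(), as documented.
        if self.df_pipeline is None:
            raise RuntimeError("prepare_data() has not been called.")

    def prepare_data(self):
        self.df_pipeline = list(self._data)
        self.steps.append("prepare_data")
        return self  # returning self is what makes chaining possible

    def detect_outliers(self):
        self._require_prepared()
        self.steps.append("detect_outliers")
        return self

    def impute(self):
        self._require_prepared()
        self.steps.append("impute")
        return self

    def build_exogenous_features(self):
        self._require_prepared()
        self.steps.append("build_exogenous_features")
        return self


pipe = MiniPipeline([1.0, 2.0, 3.0])
pipe.prepare_data().detect_outliers().impute().build_exogenous_features()
print(pipe.steps)

try:
    MiniPipeline([1.0]).impute()  # skips prepare_data()
except RuntimeError as exc:
    print(f"RuntimeError: {exc}")
```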

Parameters

Name	Type	Description	Default
config	Optional[`PipelineConfig`]	A `PipelineConfig`-conforming object (e.g. `ConfigMulti` or `ConfigEntsoe`). When `None`, a fresh `ConfigMulti()` is constructed with default fields.	`None`
task	str	Pipeline task mode — `"lazy"`, `"defaults"`, `"optuna"`, `"spotoptim"`, `"predict"`, or `"clean"`. Defaults to `"lazy"`.	`'lazy'`
dataframe	Optional[pd.DataFrame]	Pre-loaded input DataFrame with training data. The DataFrame must contain a datetime column matching `config.index_name` plus at least one numeric target column. Optional for the `"clean"` task, required for all others.	`None`
data_test	Optional[pd.DataFrame]	Pre-loaded input DataFrame with test data. Optional.	`None`
cache_home	Optional[Path]	Cache directory override. When not `None`, replaces `config.cache_home` for this task instance.	`None`
dry_run	bool	If `True`, do not clean cache or save models.	`False`
show_progress	bool	Whether to print progress messages during pipeline execution.	`False`
log_level	int	Logging level for the pipeline logger.	`logging.INFO`
**overrides	Any	Forwarded to `config.set_params(**overrides)` — a convenience for one-line tweaks without building a fresh config. Mutates the caller’s config object.	`{}`
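A minimal sketch of the overrides behaviour, assuming a config object whose set_params assigns attributes in place. HypotheticalConfig and make_task are illustrative names, not library API. The sketch also shows why passing a deep copy protects a config that other code shares.

```python
import copy
from dataclasses import dataclass


@dataclass
class HypotheticalConfig:
    # Hypothetical stand-in for a PipelineConfig; fields are illustrative.
    predict_size: int = 24
    verbose: bool = True

    def set_params(self, **params):
        # Apply each override in place on this very object.
        for name, value in params.items():
            setattr(self, name, value)
        return self


def make_task(config=None, **overrides):
    # Mirrors the documented behaviour: overrides go to config.set_params().
    config = config if config is not None else HypotheticalConfig()
    config.set_params(**overrides)
    return config


shared = HypotheticalConfig()
make_task(shared, predict_size=6)
print(shared.predict_size)  # 6: the caller's object was mutated

pristine = HypotheticalConfig()
make_task(copy.deepcopy(pristine), predict_size=12)
print(pristine.predict_size)  # 24: the deep copy absorbed the override
```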

Examples

import pandas as pd
from spotforecast2.multitask import MultiTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv"))

mt = MultiTask(ConfigMulti(predict_size=24), dataframe=df)
print(f"DataFrame stored: {mt._dataframe is not None}")
print(f"Task: {mt.TASK}")

DataFrame stored: True
Task: lazy

Methods

Name	Description
agg_predictor	Aggregate per-target prediction packages into a weighted forecast.
build_exogenous_features	Build, combine, encode, and merge exogenous feature covariates.
create_forecaster	Create a fresh forecaster for the given target.
cv_ts	Build a `TimeSeriesFold` for cross-validation.
detect_outliers	Apply hard-bound filtering and IsolationForest outlier detection.
impute	Fill missing values using the configured imputation strategy.
load_models	Load the most recent fitted models from the cache directory.
load_tuning_results	Load the most recent tuning results for a target from cache.
log_summary	Log a summary of the current pipeline configuration.
plot_with_outliers	Visualise original vs. cleaned data with outlier markers.
prepare_data	Load, resample, validate, and configure the pipeline data.
run	Run the task specified by `task` (or `self.TASK`).
run_task_clean	Remove all cached data from the pipeline cache directory.
run_task_defaults	Defaults fitting — no tuning, no cached params.
run_task_lazy	Lazy fitting with default LightGBM parameters.
run_task_optuna	Optuna Bayesian hyperparameter tuning.
run_task_predict	Predict-only using previously saved models.
run_task_spotoptim	SpotOptim surrogate-model Bayesian tuning.
save_models	Save fitted forecaster models to the cache directory.
save_tuning_results	Save tuning results (best parameters and lags) to a JSON file.

agg_predictor

multitask.MultiTask.agg_predictor(results, targets, weights)

Aggregate per-target prediction packages into a weighted forecast.

Delegates to the module-level agg_predictor function. Available as an instance method so that subclasses can override the aggregation strategy when needed.
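The override hook can be sketched with simplified stand-ins. The module-level function below is a hypothetical weighted sum over plain lists (the real one works on pandas Series and returns a full prediction package), and NormalisedTask is an illustrative subclass that rescales the weights before delegating to the parent.

```python
from typing import Any, Dict, List


def agg_predictor(results: Dict[str, Dict[str, Any]], targets: List[str],
                  weights: List[float]) -> Dict[str, Any]:
    # Hypothetical stand-in: position-wise weighted sum of "future_pred".
    horizon = len(results[targets[0]]["future_pred"])
    future = [
        sum(w * results[t]["future_pred"][i] for t, w in zip(targets, weights))
        for i in range(horizon)
    ]
    return {"future_pred": future}


class Task:
    def agg_predictor(self, results, targets, weights):
        # Default behaviour: delegate to the module-level function.
        return agg_predictor(results, targets, weights)


class NormalisedTask(Task):
    def agg_predictor(self, results, targets, weights):
        # Override: rescale weights to sum to 1, then reuse the parent logic.
        total = sum(weights)
        return super().agg_predictor(results, targets, [w / total for w in weights])


results = {
    "wind": {"future_pred": [110.0, 110.0]},
    "solar": {"future_pred": [210.0, 210.0]},
}
plain = Task().agg_predictor(results, ["wind", "solar"], [0.4, 0.6])
norm = NormalisedTask().agg_predictor(results, ["wind", "solar"], [2.0, 3.0])
print(plain["future_pred"], norm["future_pred"])
```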

Parameters

Name	Type	Description	Default
results	Dict[str, Dict[str, Any]]	Mapping of target name to prediction package (as returned by `build_prediction_package`).	required
targets	List[str]	Ordered list of target names to include.	required
weights	List[float]	Per-target aggregation weights aligned with `targets`.	required

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package dict.

Examples

import tempfile
import numpy as np
import pandas as pd
from spotforecast2_safe.multitask import LazyTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

rng = np.random.default_rng(0)
idx_train = pd.date_range("2023-01-01", periods=48, freq="h", tz="UTC")
idx_future = pd.date_range("2023-01-03", periods=6, freq="h", tz="UTC")

def _pkg(train_val, future_val):
    return {
        "train_actual": pd.Series(np.full(48, train_val), index=idx_train),
        "train_pred": pd.Series(np.full(48, train_val * 0.99), index=idx_train),
        "future_pred": pd.Series(np.full(6, future_val), index=idx_future),
        "future_actual": pd.Series(dtype="float64"),
    }

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(cache_home=tmp, verbose=False)
    task = LazyTask(cfg)
    results = {"wind": _pkg(100.0, 110.0), "solar": _pkg(200.0, 210.0)}
    agg = task.agg_predictor(results, ["wind", "solar"], [0.4, 0.6])
    print(f"Weighted future_pred: {agg['future_pred'].iloc[0]:.1f}")

Weighted future_pred: 170.0

build_exogenous_features

multitask.MultiTask.build_exogenous_features()

Build, combine, encode, and merge exogenous feature covariates.

This method covers steps 4 to 7 of the pipeline (run after prepare_data, detect_outliers, and impute). It assembles the full exogenous-covariate matrix that the forecaster consumes, then merges it onto the target data. The steps proceed in this order:

4a — Weather, via get_weather_features (Open-Meteo). The response is parquet-cached only when config.cache_home is set. Fetch failures are handled per config.on_weather_failure: "raise" re-raises WeatherFetchError; "skip" logs a warning and continues with an empty weather frame (fail-safe).
4b — Calendar features, via get_calendar_features.
4c — Day/night (solar) features, via get_day_night_features (computed with astral from config.latitude / config.longitude).
4d — Holiday features, via get_holiday_features for config.country_code / config.state.
5 — The four frames are concatenated column-wise, and any residual gaps are back-filled and then forward-filled. Provider-based exogenous columns are then appended via build_providers_from_config (requires spotforecast2-safe >= 15.7.0); the active providers are governed by the config flags include_covid_infection_rate, include_entsoe_forecast_load, include_entsoe_renewable_forecast, include_entsoe_net_load, and include_entsoe_day_ahead_price. Next, cyclical (sine/cosine) encoding is applied via apply_cyclical_encoding, and polynomial interaction terms of degree config.poly_features_degree are added via create_interaction_features. When that degree is at least 2, the polynomial columns are ranked by mutual information with the primary target and capped at config.max_poly_features via select_top_poly_features.
6 — The training feature set is chosen via select_exogenous_features, with provider columns appended (order-preserving, de-duplicated).
7 — Targets and covariates are merged via merge_data_and_covariates into self.data_with_exog and the forecast-horizon covariates self.exo_pred.
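Cyclical encoding, as named in step 5, is commonly done by mapping a periodic value onto the unit circle. The sketch below is a generic version of that technique for hour-of-day; it assumes a period of 24 and does not reproduce the exact columns or periods that apply_cyclical_encoding produces.

```python
import math


def cyclical_encode(values, period):
    """Map a periodic integer feature onto the unit circle as (sin, cos) pairs."""
    return [
        (math.sin(2 * math.pi * v / period), math.cos(2 * math.pi * v / period))
        for v in values
    ]


hours = [0, 6, 12, 23]
encoded = dict(zip(hours, cyclical_encode(hours, period=24)))
for hour, (s, c) in encoded.items():
    print(f"hour={hour:2d}  sin={s:+.3f}  cos={c:+.3f}")

# Hour 23 lands next to hour 0, although their raw values are far apart.
print(f"distance 23h-0h: {math.dist(encoded[23], encoded[0]):.3f}")
print(f"distance 12h-0h: {math.dist(encoded[12], encoded[0]):.3f}")
```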

When config.use_exogenous_features is False the method is a no-op and returns self immediately, leaving the pipeline target-only.
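The on_weather_failure policy from step 4a amounts to a try/except with two outcomes. This sketch uses a local WeatherFetchError stand-in, a hypothetical fetch callable, and an empty list in place of an empty weather frame; raising ValueError for an unrecognised policy value is an assumption of the sketch.

```python
import logging

logger = logging.getLogger("weather_sketch")


class WeatherFetchError(RuntimeError):
    """Local stand-in for the library's WeatherFetchError."""


def fetch_weather_with_policy(fetch, on_failure="raise"):
    # `fetch` is a hypothetical zero-argument callable returning weather rows.
    try:
        return fetch()
    except WeatherFetchError:
        if on_failure == "raise":
            raise
        if on_failure == "skip":
            logger.warning("Weather fetch failed; continuing without weather features.")
            return []  # stands in for an empty weather frame
        raise ValueError(f"Unknown on_weather_failure: {on_failure!r}")


def failing_fetch():
    raise WeatherFetchError("Open-Meteo unreachable")


weather = fetch_weather_with_policy(failing_fetch, on_failure="skip")
print(weather)
```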

Attributes

Name	Type	Description
weather_aligned	pd.DataFrame	Weather frame aligned to the pipeline index, reused by the interaction and selection steps.
zone_weather_aligned	Dict[str, pd.DataFrame]	Per-zone weather frames keyed by target name, indexed over `[data_start, cov_end]` (covering the forecast horizon). Populated only when `config.per_zone_weather` is True and every zone fetch succeeded; empty otherwise (including the fail-safe “skip” degradation). Consumed at the per-target seam in `_get_target_data` to overwrite the shared weather columns.
exogenous_features	pd.DataFrame	Full combined, encoded, and capped exogenous feature matrix.
exog_feature_names	List[str]	Names of the exogenous features selected for training (including provider columns).
data_with_exog	pd.DataFrame	Target data merged with the selected exogenous covariates.
exo_pred	pd.DataFrame	Exogenous covariates spanning the forecast horizon, supplied to the forecaster at predict time.

Returns

Name	Type	Description
	`BaseTask`	`self` (for method chaining).

Raises

Name	Type	Description
	RuntimeError	If `prepare_data` has not been called.
	`WeatherFetchError`	If the Open-Meteo fetch fails and `config.on_weather_failure == "raise"`.

Examples

With exogenous features disabled the method is a no-op, so the example below runs without any network access and leaves the pipeline target-only.

import tempfile
import pandas as pd
import numpy as np
from spotforecast2_safe.multitask import MultiTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 14, freq="h", tz="UTC")
df = pd.DataFrame({"a": rng.normal(100, 10, len(idx))}, index=idx)
df.index.name = "DateTime"

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        predict_size=6,
        use_exogenous_features=False,
        use_outlier_detection=False,
        cache_home=tmp,
    )
    mt = MultiTask(cfg, dataframe=df)
    mt.prepare_data().detect_outliers().impute().build_exogenous_features()
    print(f"Exogenous features used: {mt.config.use_exogenous_features}")
    print(f"Selected exog feature names: {mt.exog_feature_names}")

Exogenous features used: False
Selected exog feature names: []

create_forecaster

multitask.MultiTask.create_forecaster(target=None)

Create a fresh forecaster for the given target.

Delegates to config.forecaster_factory when set; otherwise falls back to default_lgbm_forecaster_factory. This factory hook lets callers swap the estimator without subclassing BaseTask.
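The factory fallback can be sketched as follows. SimpleNamespace plays the role of the config, plain dicts stand in for forecasters, and default_factory and ridge_factory are hypothetical. Passing target as a keyword argument is an assumption about the factory signature.

```python
from types import SimpleNamespace


def default_factory(target=None):
    # Hypothetical default; a dict stands in for a LightGBM forecaster.
    return {"estimator": "lightgbm-default", "target": target}


def create_forecaster(config, target=None):
    # Use the configured factory when present, else fall back to the default.
    factory = getattr(config, "forecaster_factory", None) or default_factory
    return factory(target=target)


def ridge_factory(target=None):
    # A custom factory that could specialise per target.
    return {"estimator": "ridge", "target": target}


fallback = create_forecaster(SimpleNamespace(forecaster_factory=None), "load")
custom = create_forecaster(SimpleNamespace(forecaster_factory=ridge_factory), "solar")
print(fallback)
print(custom)
```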

Parameters

Name	Type	Description	Default
target	Optional[str]	Optional target column name. Forwarded to the factory so that custom factories can specialise per target.	`None`

Returns

Name	Type	Description
	Any	A new, unfitted forecaster instance.

Examples

import tempfile
from pathlib import Path
from spotforecast2_safe.multitask import LazyTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        predict_size=6,
        use_exogenous_features=False,
        cache_home=Path(tmp),
    )
    task = LazyTask(cfg)
    forecaster = task.create_forecaster()
print(f"Type: {type(forecaster).__name__}")
print(f"Lags: {forecaster.lags}")

Type: ForecasterRecursive
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

cv_ts

multitask.MultiTask.cv_ts(y_train)

Build a TimeSeriesFold for cross-validation.

Constructs the cross-validation splitter used by all tuning tasks. Internally uses sklearn.model_selection.TimeSeriesSplit to compute split boundaries that respect temporal ordering and avoid data leakage between folds.

The validation boundary is determined by run_state.end_train_ts minus config.delta_val. When config.train_size is set, the sklearn splitter uses a sliding fixed-size training window (max_train_size); otherwise an expanding window is used.
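The difference between the expanding and sliding windows can be seen without sklearn. The function below re-implements the split rule that TimeSeriesSplit uses with its default test_size and gap=0; it is a teaching aid only, and the real splitter should be used in practice. How cv_ts turns these splits into initial_train_size is not shown here.

```python
def time_series_splits(n_samples, n_splits, max_train_size=None):
    """Yield (train, test) index ranges the way TimeSeriesSplit does with gap=0."""
    test_size = n_samples // (n_splits + 1)
    for test_start in range(n_samples - n_splits * test_size, n_samples, test_size):
        if max_train_size and max_train_size < test_start:
            train = range(test_start - max_train_size, test_start)  # sliding window
        else:
            train = range(0, test_start)  # expanding window
        yield train, range(test_start, test_start + test_size)


for label, cap in [("expanding", None), ("sliding", 4)]:
    for train, test in time_series_splits(12, n_splits=3, max_train_size=cap):
        print(f"{label:9s} train={train.start}..{train.stop - 1} "
              f"test={test.start}..{test.stop - 1}")
```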

Parameters

Name	Type	Description	Default
y_train	pd.Series	Training time series for the current target. Used both to determine the validation boundary and as the sequence passed to `TimeSeriesSplit.split` to derive `initial_train_size`.	required

Returns

Name	Type	Description
	`TimeSeriesFold`	A configured `TimeSeriesFold` instance, ready to be passed to a model-selection function.

Examples

import tempfile
import numpy as np
import pandas as pd
from spotforecast2_safe.multitask import MultiTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 14, freq="h", tz="UTC")
df = pd.DataFrame({"a": rng.normal(100, 10, len(idx))}, index=idx)
df.index.name = "DateTime"

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        predict_size=6,
        use_exogenous_features=False,
        use_outlier_detection=False,
        cache_home=tmp,
        number_folds=2,
        auto_save_models=False,
        verbose=False,
    )
    mt = MultiTask(cfg, dataframe=df)
    mt.prepare_data().detect_outliers().impute().build_exogenous_features()
    y_train = mt.df_pipeline["a"]
    cv = mt.cv_ts(y_train)
    print(f"TimeSeriesFold steps: {cv.steps}")
    print(f"initial_train_size: {cv.initial_train_size}")

TimeSeriesFold steps: 6
initial_train_size: 324

detect_outliers

multitask.MultiTask.detect_outliers()

Apply hard-bound filtering and IsolationForest outlier detection.

Hard bounds from config.bounds are applied to the pipeline data (out-of-bound values are removed and later filled by impute()). IsolationForest detection (config.use_outlier_detection) is advisory: detected outliers are logged per column but not removed.
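The hard-bound step can be sketched on a plain list. The sketch assumes inclusive bounds and uses NaN as the "removed" marker that a later impute step fills; the IsolationForest pass is left out because it only logs.

```python
import math


def apply_hard_bounds(values, lower, upper):
    """Replace values outside [lower, upper] with NaN for later imputation."""
    return [v if lower <= v <= upper else math.nan for v in values]


raw = [95.0, 180.0, 101.5, 20.0, 99.0]
cleaned = apply_hard_bounds(raw, lower=50, upper=150)
flagged = sum(math.isnan(v) for v in cleaned)
print(cleaned)
print(f"{flagged} values flagged for imputation")
```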

Returns

Name	Type	Description
	`BaseTask`	`self` (for method chaining).

Raises

Name	Type	Description
	RuntimeError	If method `prepare_data` has not been called.

Examples

import tempfile
import numpy as np
import pandas as pd
from spotforecast2_safe.multitask import MultiTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 14, freq="h", tz="UTC")
df = pd.DataFrame({"a": rng.normal(100, 10, len(idx))}, index=idx)
df.index.name = "DateTime"

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        predict_size=6,
        use_exogenous_features=False,
        use_outlier_detection=False,
        cache_home=tmp,
        auto_save_models=False,
        verbose=False,
    )
    mt = MultiTask(cfg, dataframe=df)
    mt.prepare_data()
    mt.detect_outliers()
    print(f"Pipeline shape: {mt.df_pipeline.shape}")
    assert mt.df_pipeline_original is not None

Pipeline shape: (336, 1)

impute

multitask.MultiTask.impute()

Fill missing values using the configured imputation strategy.
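The imputation strategy is configurable and is not spelled out here. As one possible strategy, the sketch below fills gaps by linear interpolation between the nearest valid neighbours and copies the nearest valid value into gaps at the edges; it is not necessarily what the pipeline applies.

```python
import math


def linear_fill(values):
    """Fill NaN gaps by linear interpolation; edge gaps copy the nearest value."""
    out = list(values)
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    if not known:
        return out  # nothing to interpolate from
    for i in range(len(out)):
        if not math.isnan(out[i]):
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = out[right]
        elif right is None:
            out[i] = out[left]
        else:
            frac = (i - left) / (right - left)
            out[i] = out[left] + frac * (out[right] - out[left])
    return out


gappy = [10.0, math.nan, math.nan, 40.0, math.nan]
filled = linear_fill(gappy)
print(filled)
```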

Returns

Name	Type	Description
	`BaseTask`	`self` (for method chaining).

Raises

Name	Type	Description
	RuntimeError	If method `prepare_data` has not been called.

Examples

import tempfile
import numpy as np
import pandas as pd
from spotforecast2_safe.multitask import MultiTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 14, freq="h", tz="UTC")
values = rng.normal(100, 10, len(idx))
values[10:13] = float("nan")  # inject a few gaps
df = pd.DataFrame({"a": values}, index=idx)
df.index.name = "DateTime"

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        predict_size=6,
        use_exogenous_features=False,
        use_outlier_detection=False,
        cache_home=tmp,
        auto_save_models=False,
        verbose=False,
    )
    mt = MultiTask(cfg, dataframe=df)
    mt.prepare_data().detect_outliers().impute()
    missing = mt.df_pipeline["a"].isna().sum()
    print(f"Missing values after imputation: {missing}")
    assert missing == 0

Missing values after imputation: 0

load_models

multitask.MultiTask.load_models(task_name=None, target=None, max_age_days=None)

Load the most recent fitted models from the cache directory.

Scans <cache_home>/models/<data_frame_name>/ for .joblib files matching the current data_frame_name. Optionally filters by task_name, target, and max_age_days.
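The scan-and-filter logic can be sketched with pathlib and file modification times. The `<task>__<target>.joblib` naming scheme and newest_models are hypothetical; the library's real file names, and whether age comes from the name or the modification time, are not documented here.

```python
import os
import tempfile
import time
from pathlib import Path


def newest_models(model_dir, task_name=None, target=None, max_age_days=None):
    """Return {target: file name} of the newest matching '<task>__<target>.joblib'."""
    now = time.time()
    newest = {}
    for path in Path(model_dir).glob("*.joblib"):
        task, _, tgt = path.stem.partition("__")
        if task_name is not None and task != task_name:
            continue
        if target is not None and tgt != target:
            continue
        mtime = path.stat().st_mtime
        if max_age_days is not None and now - mtime > max_age_days * 86400:
            continue  # too old
        if tgt not in newest or mtime > newest[tgt][1]:
            newest[tgt] = (path.name, mtime)
    return {tgt: name for tgt, (name, _) in newest.items()}


with tempfile.TemporaryDirectory() as tmp:
    now = time.time()
    for stem, age_days in [("lazy__load", 1), ("optuna__load", 10), ("lazy__solar", 40)]:
        path = Path(tmp) / f"{stem}.joblib"
        path.touch()
        stamp = now - age_days * 86400
        os.utime(path, (stamp, stamp))  # backdate the modification time
    everything = newest_models(tmp)
    recent = newest_models(tmp, max_age_days=30)
    optuna_only = newest_models(tmp, task_name="optuna")

print(everything)
print(recent)
print(optuna_only)
```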

Parameters

Name	Type	Description	Default
task_name	Optional[str]	If given, only load models from this task (`"lazy"`, `"defaults"`, `"optuna"`, or `"spotoptim"`). `None` accepts any task.	`None`
target	Optional[str]	If given, only load the model for this target column. `None` loads the most recent model for every target found.	`None`
max_age_days	Optional[float]	Maximum age in days. Models older than this are ignored. `None` accepts any age.	`None`

Returns

Name	Type	Description
	Dict[str, Any]	Mapping `{target: forecaster}` of loaded model objects, or an empty dict if no matching models were found.

Examples

import tempfile
from pathlib import Path
from spotforecast2_safe.multitask import LazyTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        data_frame_name="demo",
        cache_home=Path(tmp),
        verbose=False,
    )
    task = LazyTask(cfg)
    # Save a dummy object, then load it back.
    dummy_forecaster = {"lags": [1, 2, 24]}
    task.save_models(
        task_name="lazy",
        forecasters={"load": dummy_forecaster},
    )
    loaded = task.load_models(task_name="lazy")
    print(f"Loaded targets: {list(loaded.keys())}")
    assert loaded["load"]["lags"] == [1, 2, 24]

Loaded targets: ['load']

load_tuning_results

multitask.MultiTask.load_tuning_results(
    target,
    task_name=None,
    max_age_days=None,
)

Load the most recent tuning results for a target from cache.

Scans <cache_home>/tuning_results/ for files matching the current data_frame_name and target. Optionally filters by task_name and discards results older than max_age_days.
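Age filtering on a JSON tuning record can be sketched as below. The record carries the keys listed under Returns; storing timestamp as an ISO-8601 string and measuring age from it are assumptions of the sketch, and is_fresh is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta, timezone


def is_fresh(record, max_age_days=None, now=None):
    """True if the record's timestamp is within max_age_days (None accepts any age)."""
    if max_age_days is None:
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = datetime.fromisoformat(record["timestamp"])
    return now - stamp <= timedelta(days=max_age_days)


record = {
    "best_params": {"n_estimators": 100},
    "best_lags": 24,
    "task_name": "optuna",
    "target": "target_0",
    "data_frame_name": "demo10",
    # Hypothetical: a timestamp three days in the past, stored as ISO-8601 text.
    "timestamp": (datetime.now(timezone.utc) - timedelta(days=3)).isoformat(),
}
restored = json.loads(json.dumps(record))  # round trip through JSON text
print(restored["best_params"])
print(is_fresh(restored, max_age_days=7), is_fresh(restored, max_age_days=1))
```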

Parameters

Name	Type	Description	Default
target	str	Name of the forecast target column.	required
task_name	Optional[str]	If given, only consider results from this tuning algorithm (e.g. `"optuna"` or `"spotoptim"`). `None` accepts any algorithm.	`None`
max_age_days	Optional[float]	Maximum age in days. Results older than this are ignored. `None` accepts any age.	`None`

Returns

Name	Type	Description
	Optional[Dict[str, Any]]	A dictionary with keys `best_params`, `best_lags`, `task_name`, `target`, `data_frame_name`, and `timestamp`; or `None` if no matching file was found.

Examples

import tempfile
from pathlib import Path
from spotforecast2_safe.multitask import LazyTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(data_frame_name="demo10", cache_home=Path(tmp))
    task = LazyTask(cfg)
    task.save_tuning_results(
        target="target_0",
        task_name="optuna",
        best_params={"n_estimators": 100},
        best_lags=24,
    )
    result = task.load_tuning_results(target="target_0")
    print(result["best_params"])

{'n_estimators': 100}

log_summary

multitask.MultiTask.log_summary()

Log a summary of the current pipeline configuration.

Examples

import tempfile
import numpy as np
import pandas as pd
from spotforecast2_safe.multitask import MultiTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 14, freq="h", tz="UTC")
df = pd.DataFrame({"a": rng.normal(100, 10, len(idx))}, index=idx)
df.index.name = "DateTime"

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        predict_size=6,
        use_exogenous_features=False,
        use_outlier_detection=False,
        cache_home=tmp,
        auto_save_models=False,
        verbose=False,
    )
    mt = MultiTask(cfg, dataframe=df)
    mt.prepare_data().detect_outliers().impute().build_exogenous_features()
    # log_summary writes to the pipeline logger; call it to confirm
    # it runs without error.
    mt.log_summary()
    print("log_summary completed without error")

log_summary completed without error

plot_with_outliers

multitask.MultiTask.plot_with_outliers()

Visualise original vs. cleaned data with outlier markers.

Raises

Name	Type	Description
	RuntimeError	If `detect_outliers` has not been called.

Examples

import tempfile
import numpy as np
import pandas as pd
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2.multitask import LazyTask

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 14, freq="h", tz="UTC")
df = pd.DataFrame({"load": rng.normal(100, 10, len(idx))}, index=idx)
df.index.name = "DateTime"

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        predict_size=6,
        use_exogenous_features=False,
        use_outlier_detection=False,
        bounds=[(50, 150)],
        auto_save_models=False,
        cache_home=tmp,
    )
    task = LazyTask(cfg, dataframe=df)
    task.prepare_data().detect_outliers()
    task.plot_with_outliers()

prepare_data

multitask.MultiTask.prepare_data(demo_data=None, df_test=None)

Load, resample, validate, and configure the pipeline data.

Uses the following precedence for the training data:

1. The demo_data argument, if provided.
2. self._dataframe, set via the constructor.

Test data follows a similar precedence:

1. The df_test argument, if provided.
2. self.data_test, set via the constructor.
3. self.config.test_data_loader(self.config), if set.
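The precedence rules amount to picking the first source that is not None. In this sketch strings stand in for DataFrames, first_available and resolve_sources are hypothetical helpers, and the test loader is a zero-argument callable rather than one that receives the config.

```python
def first_available(*candidates):
    """Return the first candidate that is not None, or None if all are."""
    return next((c for c in candidates if c is not None), None)


def resolve_sources(demo_data=None, constructor_df=None,
                    df_test=None, constructor_test=None, test_loader=None):
    # Training data: the demo_data argument first, then the constructor dataframe.
    train = first_available(demo_data, constructor_df)
    if train is None:
        raise ValueError("No data source: pass demo_data or a constructor dataframe.")
    # Test data: argument, then constructor value, then the loader as a last resort.
    test = first_available(df_test, constructor_test)
    if test is None and test_loader is not None:
        test = test_loader()
    return train, test


print(resolve_sources(demo_data="arg-train", constructor_df="ctor-train"))
print(resolve_sources(constructor_df="ctor-train", test_loader=lambda: "loaded-test"))
```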

Parameters

Name	Type	Description	Default
demo_data	Optional[pd.DataFrame]	Pre-loaded input DataFrame. When `None`, the constructor `dataframe` is used.	`None`
df_test	Optional[pd.DataFrame]	Pre-loaded test DataFrame. When `None`, the constructor `data_test` is used, then `config.test_data_loader`.	`None`

Returns

Name	Type	Description
	`BaseTask`	`self` (for method chaining).

Raises

Name	Type	Description
	ValueError	If no data source is available (no `demo_data`, no constructor `dataframe`).

Examples

import tempfile
import pandas as pd
import numpy as np
from spotforecast2_safe.multitask import MultiTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 14, freq="h", tz="UTC")
df = pd.DataFrame({"a": rng.normal(100, 10, len(idx))}, index=idx)
df.index.name = "DateTime"

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        predict_size=6,
        use_exogenous_features=False,
        use_outlier_detection=False,
        cache_home=tmp,
    )
    mt = MultiTask(cfg, dataframe=df)
    mt.prepare_data()
    print(f"Pipeline shape: {mt.df_pipeline.shape}")
    print(f"Targets: {mt.run_state.targets}")

Pipeline shape: (336, 1)
Targets: ['a']

run

multitask.MultiTask.run(task=None, show=True, **kwargs)

Run the task specified by task (or self.TASK).
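Dispatching by task name can be sketched with getattr. DispatchSketch is hypothetical and its run_task_* methods are stubs; only the valid task names and the ValueError for unknown names come from the documentation of this method.

```python
VALID_TASKS = ("lazy", "defaults", "optuna", "spotoptim", "predict", "clean")


class DispatchSketch:
    """Hypothetical stand-in: routes run() to the matching run_task_* method."""

    TASK = "lazy"

    def run(self, task=None, show=True):
        task = self.TASK if task is None else task
        if task not in VALID_TASKS:
            raise ValueError(f"Unknown task {task!r}; expected one of {VALID_TASKS}")
        return getattr(self, f"run_task_{task}")(show=show)

    # Stubs: the real methods train, predict, and aggregate.
    def run_task_lazy(self, show=True):
        return {"task": "lazy"}

    def run_task_defaults(self, show=True):
        return {"task": "defaults"}

    def run_task_optuna(self, show=True):
        return {"task": "optuna"}

    def run_task_spotoptim(self, show=True):
        return {"task": "spotoptim"}

    def run_task_predict(self, show=True):
        return {"task": "predict"}

    def run_task_clean(self, show=True):
        return {"task": "clean"}


sketch = DispatchSketch()
print(sketch.run())                 # falls back to TASK
print(sketch.run("clean", show=False))
```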

Parameters

Name	Type	Description	Default
task	Optional[str]	Override the task mode. `None` uses `self.TASK`.	`None`
show	bool	If `True`, display prediction figures.	`True`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results are stored on `self.results[<task_key>]`.

Raises

Name	Type	Description
	ValueError	If `task` is not one of `"lazy"`, `"defaults"`, `"optuna"`, `"spotoptim"`, `"predict"`, `"clean"`.
	RuntimeError	If method `prepare_data` has not been called (for training and prediction tasks).

Examples

import warnings
import tempfile
warnings.filterwarnings("ignore")
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2.multitask import MultiTask

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv")).iloc[:500]

config = ConfigMulti(
    predict_size=12,
    targets=["A"],
    lags_consider=[1, 2, 3],
    window_size=4,
    number_folds=2,
    use_exogenous_features=False,
    use_outlier_detection=False,
    auto_save_models=False,
    verbose=False,
)
config.cache_home = tempfile.mkdtemp()

# run() dispatches to run_task_lazy when task="lazy".
mt = MultiTask(config, task="lazy", dataframe=df, show_progress=False)
mt.prepare_data()
mt.impute()
result = mt.run(task="lazy", show=False)
print("Result keys:", list(result.keys())[:4])
assert "future_pred" in result

WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.

Result keys: ['train_actual', 'train_pred', 'future_actual', 'future_pred']

run_task_clean

multitask.MultiTask.run_task_clean(show=True, dry_run=False, cache_home=None)

Remove all cached data from the pipeline cache directory.

Does not require prepare_data() to be called first.
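A dry-run cleaner can be sketched with shutil. The returned keys match the ones documented under Returns, but listing item names (rather than paths) and the "cleaned" status for a real deletion are placeholders of this sketch; clean_cache is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def clean_cache(cache_dir, dry_run=False):
    """Remove cache_dir entirely, or only report its contents when dry_run is True."""
    cache_dir = Path(cache_dir)
    items = sorted(p.name for p in cache_dir.iterdir()) if cache_dir.exists() else []
    if dry_run:
        return {"status": "dry_run", "cache_dir": str(cache_dir), "deleted_items": items}
    if cache_dir.exists():
        try:
            shutil.rmtree(cache_dir)
        except OSError as exc:
            raise RuntimeError(f"Could not remove {cache_dir}") from exc
    # "cleaned" is a placeholder status for a real deletion.
    return {"status": "cleaned", "cache_dir": str(cache_dir), "deleted_items": items}


cache = Path(tempfile.mkdtemp())
(cache / "logging").mkdir()
(cache / "models").mkdir()

report = clean_cache(cache, dry_run=True)
print(report["status"], report["deleted_items"], cache.exists())

result = clean_cache(cache)
print(result["status"], cache.exists())
```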

Parameters

Name	Type	Description	Default
show	bool	Accepted for API consistency. Not used by the clean task.	`True`
dry_run	bool	If `True`, report what would be deleted without actually removing anything.	`False`
cache_home	Optional[Path]	Override the directory to clean. `None` uses the cache directory configured on this instance.	`None`

Returns

Name	Type	Description
	Dict[str, Any]	Dict with keys status, cache_dir, and deleted_items.

Raises

Name	Type	Description
	RuntimeError	If the cache directory cannot be removed.

Examples

import warnings
import tempfile
warnings.filterwarnings("ignore")
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2.multitask import MultiTask

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv")).iloc[:500]
cache_dir = tempfile.mkdtemp()

config = ConfigMulti(
    predict_size=12,
    targets=["A"],
    lags_consider=[1, 2, 3],
    window_size=4,
    number_folds=2,
    use_exogenous_features=False,
    use_outlier_detection=False,
    auto_save_models=False,
    verbose=False,
)
config.cache_home = cache_dir

# dry_run=True reports what would be removed without deleting.
mt = MultiTask(config, task="clean", dataframe=df, show_progress=False)
result = mt.run_task_clean(dry_run=True)
print("status:", result["status"])
assert result["status"] == "dry_run"

[clean] Dry run — would delete: /tmp/tmpun7efvks
  Would remove: logging
status: dry_run

run_task_defaults

multitask.MultiTask.run_task_defaults(show=True)

Defaults fitting — no tuning, no cached params.

Distinct from run_task_lazy only in that it never consults the tuning-result cache. Use this for deterministic baselines and for ENTSO-E “Approach 2: Training without Tuning”.
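The documented difference between the two tasks can be sketched as a parameter-selection rule. DEFAULT_PARAMS and resolve_params are hypothetical, and the cached record's shape follows the best_params key returned by load_tuning_results. Whether the lazy task uses cached parameters as-is or merges them with defaults is an assumption here.

```python
DEFAULT_PARAMS = {"n_estimators": 100, "learning_rate": 0.1}  # hypothetical defaults


def resolve_params(task, cached=None):
    """Return fitting parameters for a task.

    "lazy" reuses a cached tuning result when one exists; "defaults" never looks.
    """
    if task == "lazy" and cached is not None:
        return dict(cached["best_params"])
    return dict(DEFAULT_PARAMS)


cached = {"best_params": {"n_estimators": 400, "learning_rate": 0.05}}
print(resolve_params("lazy", cached))      # uses the cached result
print(resolve_params("lazy", None))        # nothing cached: falls back to defaults
print(resolve_params("defaults", cached))  # ignores the cache entirely
```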

Parameters

Name	Type	Description	Default
show	bool	If `True`, display prediction figures.	`True`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results in `self.results["defaults"]`.

Examples

import warnings
import tempfile
warnings.filterwarnings("ignore")
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2.multitask import MultiTask

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv")).iloc[:500]

config = ConfigMulti(
    predict_size=12,
    targets=["A"],
    lags_consider=[1, 2, 3],
    window_size=4,
    number_folds=2,
    use_exogenous_features=False,
    use_outlier_detection=False,
    auto_save_models=False,
    verbose=False,
)
config.cache_home = tempfile.mkdtemp()

mt = MultiTask(config, task="defaults", dataframe=df, show_progress=False)
mt.prepare_data()
mt.impute()
result = mt.run_task_defaults(show=False)
print("Result keys:", list(result.keys())[:4])
assert "future_pred" in result

WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.

Result keys: ['train_actual', 'train_pred', 'future_actual', 'future_pred']

run_task_lazy

multitask.MultiTask.run_task_lazy(show=True)

Lazy fitting with default LightGBM parameters. Unlike run_task_defaults, this task consults the tuning-result cache.

Parameters

Name	Type	Description	Default
show	bool	If `True`, display prediction figures.	`True`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results in `self.results["lazy"]`.

Examples

import warnings
import tempfile
warnings.filterwarnings("ignore")
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2.multitask import MultiTask

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv")).iloc[:500]

config = ConfigMulti(
    predict_size=12,
    targets=["A"],
    lags_consider=[1, 2, 3],
    window_size=4,
    number_folds=2,
    use_exogenous_features=False,
    use_outlier_detection=False,
    auto_save_models=False,
    verbose=False,
)
config.cache_home = tempfile.mkdtemp()

mt = MultiTask(config, task="lazy", dataframe=df, show_progress=False)
mt.prepare_data()
mt.impute()
result = mt.run_task_lazy(show=False)
print("Result keys:", list(result.keys())[:4])
assert "future_pred" in result

WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.

Result keys: ['train_actual', 'train_pred', 'future_actual', 'future_pred']

run_task_optuna

multitask.MultiTask.run_task_optuna(
    search_space=None,
    show=True,
    show_progress=False,
)

Optuna Bayesian hyperparameter tuning.

Parameters

Name	Type	Description	Default
search_space	Optional[Callable]	Callable `(trial) -> dict` that maps an Optuna trial to a dict of sampled hyperparameters.	`None`
show	bool	If `True`, display prediction figures.	`True`
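A search_space callable might look like the sketch below. suggest_int and suggest_float are real optuna.trial.Trial methods, but the parameter names and ranges are illustrative assumptions, and whether the pipeline also expects lag settings in the returned dict is not documented here. LowerBoundTrial is a hypothetical stub that lets the function run without Optuna installed.

```python
def search_space(trial):
    """Hypothetical search space; names and ranges are illustrative only."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 8, 128),
    }


class LowerBoundTrial:
    """Minimal stub exposing the two optuna.Trial methods used above."""

    def suggest_int(self, name, low, high):
        return low

    def suggest_float(self, name, low, high, log=False):
        return low


sampled = search_space(LowerBoundTrial())
print(sampled)
```

With Optuna available, such a callable would be passed as mt.run_task_optuna(search_space=search_space, show=False).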

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results in `self.results["optuna"]`.

Examples

import warnings
import tempfile
warnings.filterwarnings("ignore")
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2.multitask import MultiTask

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv")).iloc[:500]

config = ConfigMulti(
    predict_size=12,
    targets=["A"],
    lags_consider=[1, 2, 3],
    window_size=4,
    number_folds=2,
    use_exogenous_features=False,
    use_outlier_detection=False,
    auto_save_models=False,
    verbose=False,
    n_trials_optuna=2,
)
config.cache_home = tempfile.mkdtemp()

mt = MultiTask(config, task="optuna", dataframe=df, show_progress=False)
mt.prepare_data()
mt.impute()
result = mt.run_task_optuna(show=False)
print("Result keys:", list(result.keys())[:4])
assert "future_pred" in result

WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.
WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.
WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.
WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.

Result keys: ['train_actual', 'train_pred', 'future_actual', 'future_pred']

run_task_predict

multitask.MultiTask.run_task_predict(
    show=True,
    task_name=None,
    max_age_days=None,
)

Predict-only using previously saved models.

Loads fitted models from the cache directory and produces predictions without any training. Raises RuntimeError if no saved models are found.

Parameters

Name	Type	Description	Default
show	bool	If `True`, display prediction figures.	`True`
task_name	Optional[str]	Restrict model loading to a specific source task (`"lazy"`, `"optuna"`, or `"spotoptim"`). `None` loads the most recent model regardless of source.	`None`
max_age_days	Optional[float]	Maximum age in days for saved models. `None` accepts any age.	`None`
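The selection rules for task_name and max_age_days can be illustrated with a small, standard-library-only sketch. It assumes that saved models follow the filename format documented under save_models (`<data_frame_name>_<target>_<task_name>_<YYYYMMDD_HHMMSS>.joblib`) and that a model's age is measured from the timestamp embedded in its filename. The helper `select_model_file` is hypothetical and is not part of the MultiTask API.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import List, Optional

STAMP_FORMAT = "%Y%m%d_%H%M%S"


def select_model_file(
    files: List[Path],
    task_name: Optional[str] = None,
    max_age_days: Optional[float] = None,
    now: Optional[datetime] = None,
) -> Path:
    """Hypothetical helper: return the newest model file that passes both filters."""
    now = now or datetime.now()
    candidates = []
    for path in files:
        # Split from the right so underscores inside <data_frame_name>_<target> survive:
        # "demo10_A_lazy_20240101_120000" -> ["demo10_A", "lazy", "20240101", "120000"]
        _prefix, task, day, clock = path.stem.rsplit("_", 3)
        if task_name is not None and task != task_name:
            continue  # saved by a different source task
        saved_at = datetime.strptime(f"{day}_{clock}", STAMP_FORMAT)
        if max_age_days is not None and now - saved_at > timedelta(days=max_age_days):
            continue  # older than the allowed maximum age
        candidates.append((saved_at, path))
    if not candidates:
        # Mirrors the documented behaviour: no usable saved model -> RuntimeError.
        raise RuntimeError("No saved models found for the requested filters.")
    # Most recent timestamp wins.
    return max(candidates, key=lambda item: item[0])[1]


files = [
    Path("demo10_A_lazy_20240101_120000.joblib"),
    Path("demo10_A_optuna_20240105_080000.joblib"),
    Path("demo10_A_lazy_20240104_090000.joblib"),
]
now = datetime(2024, 1, 6)
print(select_model_file(files, now=now).name)  # demo10_A_optuna_20240105_080000.joblib
print(select_model_file(files, task_name="lazy", now=now).name)  # demo10_A_lazy_20240104_090000.joblib
```

With task_name=None the newest file from any source task is chosen, which matches the documented default of loading the most recent model regardless of source.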

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results are stored in `self.results["predict"]`.

Raises

Name	Type	Description
	RuntimeError	If no saved models are found.

Examples

import warnings
import tempfile
warnings.filterwarnings("ignore")
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2.multitask import MultiTask

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv")).iloc[:500]
cache_dir = tempfile.mkdtemp()

# First train and save a model with the lazy task.
config_train = ConfigMulti(
    predict_size=12,
    targets=["A"],
    lags_consider=[1, 2, 3],
    window_size=4,
    number_folds=2,
    use_exogenous_features=False,
    use_outlier_detection=False,
    auto_save_models=True,
    verbose=False,
)
config_train.cache_home = cache_dir
mt_train = MultiTask(config_train, task="lazy", dataframe=df, show_progress=False)
mt_train.prepare_data()
mt_train.impute()
mt_train.run_task_lazy(show=False)

# Then load and predict without re-training.
config_pred = ConfigMulti(
    predict_size=12,
    targets=["A"],
    lags_consider=[1, 2, 3],
    window_size=4,
    number_folds=2,
    use_exogenous_features=False,
    use_outlier_detection=False,
    auto_save_models=False,
    verbose=False,
)
config_pred.cache_home = cache_dir
mt_pred = MultiTask(config_pred, task="predict", dataframe=df, show_progress=False)
mt_pred.prepare_data()
mt_pred.impute()
result = mt_pred.run_task_predict(show=False, task_name="lazy")
print("Result keys:", list(result.keys())[:4])
assert "future_pred" in result

WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.

Result keys: ['train_actual', 'train_pred', 'future_actual', 'future_pred']

run_task_spotoptim

multitask.MultiTask.run_task_spotoptim(search_space=None, show=True)

Tune hyperparameters with SpotOptim surrogate-model-based Bayesian optimization, then predict with the tuned models.

Parameters

Name	Type	Description	Default
search_space	Optional[Dict[str, Any]]	Dictionary defining the SpotOptim search space.	`None`
show	bool	If `True`, display prediction figures.	`True`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results are stored in `self.results["spotoptim"]`.

Examples

import warnings
import tempfile
warnings.filterwarnings("ignore")
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home
from spotforecast2_safe.configurator.config_multi import ConfigMulti
from spotforecast2.multitask import MultiTask

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv")).iloc[:500]

config = ConfigMulti(
    predict_size=12,
    targets=["A"],
    lags_consider=[1, 2, 3],
    window_size=4,
    number_folds=2,
    use_exogenous_features=False,
    use_outlier_detection=False,
    auto_save_models=False,
    verbose=False,
    n_trials_spotoptim=2,
    n_initial_spotoptim=1,
)
config.cache_home = tempfile.mkdtemp()

mt = MultiTask(config, task="spotoptim", dataframe=df, show_progress=False)
mt.prepare_data()
mt.impute()
result = mt.run_task_spotoptim(show=False)
print("Result keys:", list(result.keys())[:4])
assert "future_pred" in result

WeightFunction: all sample weights for the requested index are zero (the window falls entirely within gap-penalty zones). Returning None so ForecasterRecursive uses uniform weighting.

`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [  1   2   3  23  24  25  47  48 167 168 169 336] 
  Parameters: {'estimator__num_leaves': 31, 'estimator__max_depth': 3, 'estimator__learning_rate': 0.1, 'estimator__n_estimators': 100, 'estimator__bagging_fraction': 0.75, 'estimator__feature_fraction': 0.75, 'estimator__reg_alpha': 0.01, 'estimator__reg_lambda': 0.01}
  Backtesting metric: 23011.6263809134
Result keys: ['train_actual', 'train_pred', 'future_actual', 'future_pred']

save_models

multitask.MultiTask.save_models(task_name, forecasters=None)

Save fitted forecaster models to the cache directory.

Each model is serialised with joblib (compress=3) into <cache_home>/models/<data_frame_name>/ using a datetime-stamped filename so that multiple snapshots can coexist.

Filename format:

<data_frame_name>_<target>_<task_name>_<YYYYMMDD_HHMMSS>.joblib
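A minimal sketch of how the documented directory layout and filename could be assembled with pathlib and datetime. The helper name `model_path` is hypothetical. Per the text above, the real method additionally serialises each model with joblib (compress=3); that step is only indicated in a comment so that the sketch runs with the standard library alone.

```python
from datetime import datetime
from pathlib import Path


def model_path(
    cache_home: Path,
    data_frame_name: str,
    target: str,
    task_name: str,
    saved_at: datetime,
) -> Path:
    """Hypothetical helper: <cache_home>/models/<df>/<df>_<target>_<task>_<stamp>.joblib."""
    stamp = saved_at.strftime("%Y%m%d_%H%M%S")
    directory = Path(cache_home) / "models" / data_frame_name
    return directory / f"{data_frame_name}_{target}_{task_name}_{stamp}.joblib"


path = model_path(Path("/tmp/cache"), "demo10", "A", "lazy", datetime(2024, 3, 1, 14, 5, 9))
print(path.as_posix())  # /tmp/cache/models/demo10/demo10_A_lazy_20240301_140509.joblib
# The documented method would then write the model, e.g. joblib.dump(model, path, compress=3).
```

Because the timestamp has second resolution, two saves of the same target and task within the same second would map to the same filename; snapshots taken further apart coexist.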

If forecasters is None, the method collects fitted models from self.results[task_name], where each prediction package is expected to contain a "forecaster" key.
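The fallback described above can be sketched as a plain function over a results dictionary. It assumes that results maps each task name to a `{target: package}` dictionary, as suggested by the per-target results stored in `self.results[...]` by the run_task_* methods. The name `collect_forecasters` and the choice to skip packages without a "forecaster" key are illustrative assumptions; the error types follow the Raises table below.

```python
from typing import Any, Dict

VALID_TASKS = ("lazy", "defaults", "optuna", "spotoptim")


def collect_forecasters(
    results: Dict[str, Dict[str, Dict[str, Any]]], task_name: str
) -> Dict[str, Any]:
    """Hypothetical helper: gather {target: forecaster} from stored prediction packages."""
    if task_name not in VALID_TASKS:
        raise ValueError(f"Unknown task_name {task_name!r}; expected one of {VALID_TASKS}.")
    packages = results.get(task_name, {})
    # Keep only packages that actually carry a fitted model under "forecaster".
    forecasters = {
        target: package["forecaster"]
        for target, package in packages.items()
        if package.get("forecaster") is not None
    }
    if not forecasters:
        raise RuntimeError(f"No fitted models available for task {task_name!r}.")
    return forecasters


results = {
    "lazy": {
        "A": {"forecaster": "fitted-model-A", "future_pred": [1.0, 2.0]},
        "B": {"future_pred": [3.0]},  # no "forecaster" key, so it is skipped
    }
}
print(collect_forecasters(results, "lazy"))  # {'A': 'fitted-model-A'}
```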

Parameters

Name	Type	Description	Default
task_name	str	Task identifier: `"lazy"`, `"defaults"`, `"optuna"`, or `"spotoptim"`. This method is shared with the `spotforecast2_safe` package, which performs no tuning itself but accepts `"optuna"` and `"spotoptim"` so that model caches produced by the `spotforecast2` tuning tasks can be saved and loaded.	required
forecasters	Optional[Dict[str, Any]]	Optional mapping `{target: fitted_forecaster}`. When `None`, models are taken from the prediction packages stored in `self.results`.	`None`

Returns

Name	Type	Description
	Dict[str, Path]	Mapping `{target: Path}` of saved model file paths.

Raises

Name	Type	Description
	ValueError	If `task_name` is not one of `"lazy"`, `"defaults"`, `"optuna"`, `"spotoptim"`.
	RuntimeError	If no fitted models are available for the requested task.

Examples

import tempfile
from pathlib import Path
from spotforecast2_safe.multitask import LazyTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(
        data_frame_name="demo",
        cache_home=Path(tmp),
        verbose=False,
    )
    task = LazyTask(cfg)
    # Supply a tiny in-memory object as a stand-in for a fitted forecaster.
    dummy_forecaster = object()
    saved = task.save_models(
        task_name="lazy",
        forecasters={"load": dummy_forecaster},
    )
    print(f"Saved targets: {list(saved.keys())}")
    assert saved["load"].suffix == ".joblib"

Saved targets: ['load']

save_tuning_results

multitask.MultiTask.save_tuning_results(
    target,
    task_name,
    best_params,
    best_lags,
)

Save tuning results (best parameters and lags) to a JSON file.

The file is stored under <cache_home>/tuning_results/ with a datetime-stamped filename so that loaders can determine freshness.

Filename format:

<data_frame_name>_<target>_<task_name>_<YYYYMMDD_HHMMSS>.json
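A standard-library-only sketch of the documented behaviour: write the best parameters and lags to a timestamped JSON file under `<cache_home>/tuning_results/`, then let a loader pick the newest file by its embedded timestamp. The helper names and the JSON keys (`best_params`, `best_lags`) are assumptions for illustration; the file contents actually written by save_tuning_results may be laid out differently.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional

STAMP_FORMAT = "%Y%m%d_%H%M%S"


def write_tuning_json(cache_home: Path, data_frame_name: str, target: str, task_name: str,
                      best_params: Dict[str, Any], best_lags: Any, saved_at: datetime) -> Path:
    """Hypothetical helper mirroring the documented directory and filename format."""
    directory = Path(cache_home) / "tuning_results"
    directory.mkdir(parents=True, exist_ok=True)
    stamp = saved_at.strftime(STAMP_FORMAT)
    path = directory / f"{data_frame_name}_{target}_{task_name}_{stamp}.json"
    # An int, a list, or a nested list of lags all serialise directly to JSON.
    path.write_text(json.dumps({"best_params": best_params, "best_lags": best_lags}))
    return path


def latest_tuning_json(cache_home: Path, data_frame_name: str, target: str,
                       task_name: str) -> Optional[Path]:
    """Hypothetical loader: newest matching file, judged by the filename timestamp."""
    prefix = f"{data_frame_name}_{target}_{task_name}_"
    matches = (Path(cache_home) / "tuning_results").glob(f"{prefix}*.json")
    # YYYYMMDD_HHMMSS sorts lexicographically in chronological order.
    return max(matches, key=lambda p: p.stem[len(prefix):], default=None)


with tempfile.TemporaryDirectory() as tmp:
    for day in (1, 3, 2):  # written out of order on purpose
        write_tuning_json(Path(tmp), "demo10", "target_0", "optuna",
                          {"n_estimators": 100 * day}, [1, 2, 24],
                          datetime(2024, 5, day, 12, 0, 0))
    newest = latest_tuning_json(Path(tmp), "demo10", "target_0", "optuna")
    newest_name = newest.name
    newest_params = json.loads(newest.read_text())["best_params"]

print(newest_name)    # demo10_target_0_optuna_20240503_120000.json
print(newest_params)  # {'n_estimators': 300}
```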

Parameters

Name	Type	Description	Default
target	str	Name of the forecast target column.	required
task_name	str	Tuning algorithm identifier (e.g. `"optuna"`, `"spotoptim"`).	required
best_params	Dict[str, Any]	Best hyperparameters discovered during tuning.	required
best_lags	Any	Best lag configuration (int, list, or nested list).	required

Returns

Name	Type	Description
	Path	Path to the saved JSON file.

Examples

import tempfile
from pathlib import Path
from spotforecast2_safe.multitask import LazyTask
from spotforecast2_safe.configurator.config_multi import ConfigMulti

with tempfile.TemporaryDirectory() as tmp:
    cfg = ConfigMulti(data_frame_name="demo10", cache_home=Path(tmp))
    task = LazyTask(cfg)
    path = task.save_tuning_results(
        target="target_0",
        task_name="optuna",
        best_params={"n_estimators": 100, "learning_rate": 0.05},
        best_lags=[1, 2, 24],
    )
    print(path.name[:10])

demo10_tar