manager.multitask.MultiTask

manager.multitask.MultiTask(
    task='lazy',
    dataframe=None,
    data_test=None,
    data_frame_name='default',
    cache_home=None,
    agg_weights=None,
    index_name='DateTime',
    number_folds=10,
    predict_size=24,
    bounds=None,
    contamination=0.03,
    imputation_method='weighted',
    use_exogenous_features=True,
    n_trials_optuna=15,
    n_trials_spotoptim=10,
    n_initial_spotoptim=5,
    auto_save_models=True,
    train_days=365 * 2,
    val_days=7 * 2,
    log_level=logging.INFO,
    verbose=False,
    dry_run=False,
    show_progress=False,
    **config_overrides,
)

Orchestrates a multi-target time-series forecasting pipeline.

Training data is provided as a pandas DataFrame via dataframe (required for every task except "clean"). A test dataset can optionally be provided via data_test.

The typical usage flow is:

  1. Instantiate with configuration arguments.
  2. Call method prepare_data to load, resample, and validate data.
  3. Call method detect_outliers to apply hard bounds and IsolationForest.
  4. Call method impute to fill gaps.
  5. Call method build_exogenous_features to construct weather / calendar / day-night / holiday covariates.
  6. Call method run (or individual run_task_* methods) to train, predict, and aggregate.
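The flow above can be sketched as a small helper. This is a minimal sketch, assuming each preparation method can be called without arguments, which this page does not confirm. `run_pipeline` is a hypothetical helper name and is not part of the library.

```python
def run_pipeline(mt, show=False):
    """Call the documented pipeline steps on a MultiTask-like object, in order.

    Assumption: the preparation methods take no required arguments.
    """
    mt.prepare_data()              # load, resample, and validate data
    mt.detect_outliers()           # hard bounds and IsolationForest
    mt.impute()                    # fill gaps
    mt.build_exogenous_features()  # weather / calendar / day-night / holiday covariates
    return mt.run(show=show)       # train, predict, and aggregate
```

Passing an already constructed instance, for example `run_pipeline(MultiTask(dataframe=df))`, keeps the step order in one place.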

Parameters

Name Type Description Default
task str Pipeline task mode — "lazy", "optuna", "spotoptim", "predict", or "clean". Defaults to "lazy". 'lazy'
dataframe Optional[pd.DataFrame] Pre-loaded input DataFrame with training data. The DataFrame must contain a datetime column matching index_name plus at least one numeric target column. Optional for the "clean" task, but required for all other tasks. None
data_test Optional[pd.DataFrame] Pre-loaded input DataFrame with test data. The DataFrame must contain a datetime column matching index_name plus at least one numeric target column. Optional. None
cache_home Optional[Path] Cache directory path. None
agg_weights Optional[List[float]] Per-target aggregation weights. None
index_name str Datetime column name in the raw CSV / DataFrame. 'DateTime'
number_folds int Number of validation folds. 10
predict_size int Forecast horizon in hours. 24
bounds Optional[List[tuple]] Per-column hard outlier bounds (lower, upper). None
contamination float IsolationForest contamination fraction. 0.03
imputation_method str Gap-filling strategy. 'weighted'
use_exogenous_features bool Whether to build exogenous features. True
n_trials_optuna int Number of Optuna Bayesian-search trials. 15
n_trials_spotoptim int Number of SpotOptim surrogate-search trials. 10
n_initial_spotoptim int Initial random evaluations for SpotOptim. 5
auto_save_models bool Whether to automatically save fitted models to disk after each training run. Defaults to True so that saved models are immediately available for the predict task without any manual call to save_models. True
train_days int Length of the training window in days. Controls TRAIN_SIZE and config.train_size. Defaults to 365 * 2 (two years). 365 * 2
val_days int Length of each validation fold in days. The total validation span is val_days * number_folds. Controls DELTA_VAL and config.delta_val. Defaults to 7 * 2 (two weeks). 7 * 2
log_level int Logging level for the pipeline logger. logging.INFO
dry_run bool If True, do not clean cache or save models. Useful for testing and debugging. False
config_overrides Any Extra keyword arguments forwarded to ConfigMulti. {}
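The window parameters interact through simple arithmetic. The sketch below works it out for the defaults in the constructor signature; `window_summary` is a hypothetical helper written for illustration and is not part of the library.

```python
def window_summary(train_days=365 * 2, val_days=7 * 2, number_folds=10, predict_size=24):
    """Derive the window sizes implied by the constructor arguments."""
    return {
        "train_days": train_days,
        # total validation span = val_days * number_folds (as documented above)
        "total_val_days": val_days * number_folds,
        # predict_size is given in hours
        "horizon_days": predict_size / 24,
    }

summary = window_summary()
print(f"training window:  {summary['train_days']} days")
print(f"validation span:  {summary['total_val_days']} days")
print(f"forecast horizon: {summary['horizon_days']:g} day(s)")
```

With the defaults this gives a 730-day training window, a 140-day total validation span, and a one-day forecast horizon.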

Examples

import pandas as pd
from spotforecast2.manager.multitask import MultiTask
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv"))

mt = MultiTask(dataframe=df, predict_size=24)
print(f"DataFrame stored: {mt._dataframe is not None}")
print(f"Task: {mt.TASK}")
DataFrame stored: True
Task: lazy

Methods

Name Description
run Run the task specified by task (or self.TASK).
run_task_clean Remove all cached data from the pipeline cache directory.
run_task_lazy Lazy fitting with default LightGBM parameters.
run_task_optuna Optuna Bayesian hyperparameter tuning.
run_task_predict Predict-only using previously saved models.
run_task_spotoptim SpotOptim surrogate-model Bayesian tuning.

run

manager.multitask.MultiTask.run(task=None, show=True, **kwargs)

Run the task specified by task (or self.TASK).

Parameters

Name Type Description Default
task Optional[str] Override the task mode. None uses self.TASK. None
show bool If True, display prediction figures. True

Returns

Name Type Description
Dict[str, Any] Aggregated prediction package. Per-target results are stored on self.results[<task_key>].

Raises

Name Type Description
ValueError If task is not one of "lazy", "optuna", "spotoptim", "predict", "clean".
RuntimeError If method prepare_data has not been called (for training and prediction tasks).
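The task resolution and validation described above can be illustrated as follows. This is a sketch of the documented behavior only, not the library's actual code; `VALID_TASKS` and `resolve_task` are hypothetical names.

```python
VALID_TASKS = ("lazy", "optuna", "spotoptim", "predict", "clean")

def resolve_task(task, default):
    """Pick the task to run: None falls back to the instance default (self.TASK)."""
    chosen = default if task is None else task
    if chosen not in VALID_TASKS:
        raise ValueError(f"Unknown task {chosen!r}; expected one of {VALID_TASKS}")
    return chosen
```

A caller that accepts the task name from user input may want to catch ValueError from run and report the valid options.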

run_task_clean

manager.multitask.MultiTask.run_task_clean(
    show=True,
    dry_run=False,
    cache_home=None,
)

Remove all cached data from the pipeline cache directory.

Does not require prepare_data() to be called first.

Parameters

Name Type Description Default
show bool Accepted for API consistency. Not used by the clean task. True
dry_run bool If True, report what would be deleted without actually removing anything. False
cache_home Optional[Path] Override the directory to clean. None uses the cache directory configured on this instance. None

Returns

Name Type Description
Dict[str, Any] Dict with keys status, cache_dir, and deleted_items.

Raises

Name Type Description
RuntimeError If the cache directory cannot be removed.
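The dry_run semantics can be illustrated with a standard-library sketch. It is not the library's implementation: `clean_cache` is a hypothetical function, the status strings are assumed values, and removing the directory itself (rather than only its contents) is also an assumption.

```python
import shutil
from pathlib import Path

def clean_cache(cache_dir, dry_run=False):
    """List the cache entries and, unless dry_run is set, remove the directory."""
    cache_dir = Path(cache_dir)
    exists = cache_dir.is_dir()
    items = sorted(str(p) for p in cache_dir.iterdir()) if exists else []
    if exists and not dry_run:
        try:
            shutil.rmtree(cache_dir)
        except OSError as exc:
            raise RuntimeError(f"Could not remove {cache_dir}") from exc
    return {
        "status": "dry_run" if dry_run else "cleaned",  # assumed status values
        "cache_dir": str(cache_dir),
        "deleted_items": items,
    }
```

Running with dry_run=True first shows what would be deleted before anything is removed.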

run_task_lazy

manager.multitask.MultiTask.run_task_lazy(show=True)

Lazy fitting with default LightGBM parameters.

Parameters

Name Type Description Default
show bool If True, display prediction figures. True

Returns

Name Type Description
Dict[str, Any] Aggregated prediction package. Per-target results in self.results["lazy"].

run_task_optuna

manager.multitask.MultiTask.run_task_optuna(
    search_space=None,
    show=True,
    show_progress=False,
)

Optuna Bayesian hyperparameter tuning.

Parameters

Name Type Description Default
search_space Optional[Callable] Callable (trial) -> dict. None
show bool If True, display prediction figures. True

Returns

Name Type Description
Dict[str, Any] Aggregated prediction package. Per-target results in self.results["optuna"].
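A search_space callable of the form (trial) -> dict might look like the sketch below. `trial.suggest_int` and `trial.suggest_float` are standard Optuna trial methods; the hyperparameter names and ranges are assumptions borrowed from common LightGBM settings, and this page does not confirm which keys the pipeline expects.

```python
def lightgbm_search_space(trial):
    """Map an Optuna trial to a dict of candidate hyperparameters.

    Assumption: the keys are LightGBM-style names chosen for illustration.
    """
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 8, 256),
    }
```

It would then be passed as `mt.run_task_optuna(search_space=lightgbm_search_space)`.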

run_task_predict

manager.multitask.MultiTask.run_task_predict(
    show=True,
    task_name=None,
    max_age_days=None,
)

Predict-only using previously saved models.

Loads fitted models from the cache directory and produces predictions without any training. Raises RuntimeError if no saved models are found.

Parameters

Name Type Description Default
show bool If True, display prediction figures. True
task_name Optional[str] Restrict model loading to a specific source task ("lazy", "optuna", or "spotoptim"). None loads the most recent model regardless of source. None
max_age_days Optional[float] Maximum age in days for saved models. None accepts any age. None

Returns

Name Type Description
Dict[str, Any] Aggregated prediction package. Per-target results in self.results["predict"].

Raises

Name Type Description
RuntimeError If no saved models are found.
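The effect of max_age_days can be illustrated with a standard-library sketch that picks the newest file within an age limit. How the library actually stores models and measures their age is not stated on this page; `newest_fresh_file` and the use of file modification times are assumptions.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400

def newest_fresh_file(directory, max_age_days=None, now=None):
    """Return the most recently modified file not older than max_age_days."""
    now = time.time() if now is None else now
    candidates = []
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        age_days = (now - mtime) / SECONDS_PER_DAY
        # None accepts any age, mirroring the documented default
        if max_age_days is None or age_days <= max_age_days:
            candidates.append((mtime, path))
    if not candidates:
        raise RuntimeError(f"No saved models found in {directory}")
    return max(candidates, key=lambda item: item[0])[1]
```

A small max_age_days guards against predicting with stale models, at the cost of a RuntimeError when no recent training run exists.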

run_task_spotoptim

manager.multitask.MultiTask.run_task_spotoptim(search_space=None, show=True)

SpotOptim surrogate-model Bayesian tuning.

Parameters

Name Type Description Default
search_space Optional[Dict[str, Any]] Dictionary defining the SpotOptim search space. None
show bool If True, display prediction figures. True

Returns

Name Type Description
Dict[str, Any] Aggregated prediction package. Per-target results in self.results["spotoptim"].