manager.multitask.MultiTask

manager.multitask.MultiTask(
    task='lazy',
    dataframe=None,
    data_test=None,
    data_frame_name='default',
    cache_home=None,
    agg_weights=None,
    index_name='DateTime',
    number_folds=10,
    predict_size=24,
    bounds=None,
    contamination=0.03,
    imputation_method='weighted',
    use_exogenous_features=True,
    n_trials_optuna=15,
    n_trials_spotoptim=10,
    n_initial_spotoptim=5,
    auto_save_models=True,
    train_days=365 * 2,
    val_days=7 * 2,
    log_level=logging.INFO,
    verbose=False,
    dry_run=False,
    show_progress=False,
    **config_overrides,
)

Orchestrates a multi-target time-series forecasting pipeline.

Training data must be provided as a pandas DataFrame via dataframe; only the "clean" task can run without it. A test dataset can optionally be provided via data_test.

The typical usage flow is:

1. Instantiate with configuration arguments.
2. Call method prepare_data to load, resample, and validate data.
3. Call method detect_outliers to apply hard bounds and IsolationForest.
4. Call method impute to fill gaps.
5. Call method build_exogenous_features to construct weather / calendar / day-night / holiday covariates.
6. Call method run (or individual run_task_* methods) to train, predict, and aggregate.
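
The flow above can be sketched as a small helper that calls the documented methods in order. This is only an illustration: `run_full_pipeline` is a hypothetical name, and the sketch assumes the four preparation methods can be called without arguments, which this page does not confirm.

```python
from typing import Any, Optional


def run_full_pipeline(mt: Any, task: Optional[str] = None, show: bool = False) -> Any:
    """Call the documented pipeline steps in order and return run()'s result.

    Assumption: the preparation methods take no required arguments.
    """
    mt.prepare_data()              # load, resample, and validate data
    mt.detect_outliers()           # hard bounds and IsolationForest
    mt.impute()                    # fill gaps
    mt.build_exogenous_features()  # weather / calendar / day-night / holiday covariates
    return mt.run(task=task, show=show)
```

With a real instance this would be used as `package = run_full_pipeline(mt, task="lazy")`, where `mt` is a `MultiTask` object created as in the Examples section.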

Parameters

Name	Type	Description	Default
task	str	Pipeline task mode: `"lazy"`, `"optuna"`, `"spotoptim"`, `"predict"`, or `"clean"`.	`'lazy'`
dataframe	Optional[pd.DataFrame]	Pre-loaded DataFrame with training data. It must contain a datetime column matching `index_name` plus at least one numeric target column. Optional for the `"clean"` task, required for all other tasks.	`None`
data_test	Optional[pd.DataFrame]	Pre-loaded DataFrame with test data. It must contain a datetime column matching `index_name` plus at least one numeric target column. Optional.	`None`
cache_home	Optional[Path]	Cache directory path.	`None`
agg_weights	Optional[List[float]]	Per-target aggregation weights.	`None`
index_name	str	Datetime column name in the raw CSV / DataFrame.	`'DateTime'`
number_folds	int	Number of validation folds.	`10`
predict_size	int	Forecast horizon in hours.	`24`
bounds	Optional[List[tuple]]	Per-column hard outlier bounds `(lower, upper)`.	`None`
contamination	float	IsolationForest contamination fraction.	`0.03`
imputation_method	str	Gap-filling strategy.	`'weighted'`
use_exogenous_features	bool	Whether to build exogenous features.	`True`
n_trials_optuna	int	Number of Optuna Bayesian-search trials.	`15`
n_trials_spotoptim	int	Number of SpotOptim surrogate-search trials.	`10`
n_initial_spotoptim	int	Initial random evaluations for SpotOptim.	`5`
auto_save_models	bool	Whether to automatically save fitted models to disk after each training run. Defaults to `True` so that saved models are immediately available for the predict task without any manual call to `save_models`.	`True`
train_days	int	Length of the training window in days. Controls `TRAIN_SIZE` and `config.train_size`. Defaults to `365 * 2` (two years).	`365 * 2`
val_days	int	Length of each validation fold in days. The total validation span is `val_days * number_folds`. Controls `DELTA_VAL` and `config.delta_val`. Defaults to `7 * 2` (two weeks).	`7 * 2`
log_level	int	Logging level for the pipeline logger.	`logging.INFO`
dry_run	bool	If `True`, do not clean cache or save models. Useful for testing and debugging.	`False`
config_overrides	Any	Extra keyword arguments forwarded to ConfigMulti.	`{}`
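
To make the window parameters concrete, the following sketch works out the spans implied by the default values in the signature. It uses only the standard library and does not call the pipeline; the variable names `train_window`, `fold_window`, `total_validation`, and `horizon` are illustrative.

```python
from datetime import timedelta

# Default values taken from the MultiTask signature.
train_days = 365 * 2   # training window in days
val_days = 7 * 2       # length of one validation fold in days
number_folds = 10      # number of validation folds
predict_size = 24      # forecast horizon in hours

train_window = timedelta(days=train_days)
fold_window = timedelta(days=val_days)
total_validation = fold_window * number_folds  # val_days * number_folds
horizon = timedelta(hours=predict_size)

print(f"training window: {train_window.days} days")
print(f"validation span: {total_validation.days} days")
print(f"forecast horizon: {horizon}")
```

With the defaults, the training window is 730 days and the total validation span is 140 days (ten folds of two weeks each).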

Examples

import pandas as pd
from spotforecast2.manager.multitask import MultiTask
from spotforecast2_safe.data.fetch_data import fetch_data, get_package_data_home

data_home = get_package_data_home()
df = fetch_data(filename=str(data_home / "demo10.csv"))

mt = MultiTask(dataframe=df, predict_size=24)
print(f"DataFrame stored: {mt._dataframe is not None}")
print(f"Task: {mt.TASK}")

DataFrame stored: True
Task: lazy

Methods

Name	Description
run	Run the task specified by `task` (or `self.TASK`).
run_task_clean	Remove all cached data from the pipeline cache directory.
run_task_lazy	Lazy fitting with default LightGBM parameters.
run_task_optuna	Optuna Bayesian hyperparameter tuning.
run_task_predict	Predict-only using previously saved models.
run_task_spotoptim	SpotOptim surrogate-model Bayesian tuning.

run

manager.multitask.MultiTask.run(task=None, show=True, **kwargs)

Run the task specified by task (or self.TASK).

Parameters

Name	Type	Description	Default
task	Optional[str]	Override the task mode. `None` uses `self.TASK`.	`None`
show	bool	If `True`, display prediction figures.	`True`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results are stored on `self.results[<task_key>]`.

Raises

Name	Type	Description
	ValueError	If `task` is not one of `"lazy"`, `"optuna"`, `"spotoptim"`, `"predict"`, `"clean"`.
	RuntimeError	If method `prepare_data` has not been called (for training and prediction tasks).
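
The task resolution and the documented `ValueError` can be pictured with a small dispatcher. This is a sketch of the behavior described above, not the library's implementation; `dispatch` and `VALID_TASKS` are hypothetical names, and mapping each task to its `run_task_*` method by name is an assumption.

```python
from typing import Any, Optional

# The five task modes listed in the documentation.
VALID_TASKS = ("lazy", "optuna", "spotoptim", "predict", "clean")


def dispatch(mt: Any, task: Optional[str] = None, **kwargs: Any) -> Any:
    """Resolve the task name and call the matching run_task_* method."""
    # None falls back to the task configured on the instance.
    name = task if task is not None else mt.TASK
    if name not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {name!r}")
    # Assumption: each task has a method named run_task_<task>.
    return getattr(mt, f"run_task_{name}")(**kwargs)
```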

run_task_clean

manager.multitask.MultiTask.run_task_clean(
    show=True,
    dry_run=False,
    cache_home=None,
)

Remove all cached data from the pipeline cache directory.

Does not require prepare_data() to be called first.

Parameters

Name	Type	Description	Default
show	bool	Accepted for API consistency. Not used by the clean task.	`True`
dry_run	bool	If `True`, report what would be deleted without actually removing anything.	`False`
cache_home	Optional[Path]	Override the directory to clean. `None` uses the cache directory configured on this instance.	`None`

Returns

Name	Type	Description
	Dict[str, Any]	Dict with keys status, cache_dir, and deleted_items.

Raises

Name	Type	Description
	RuntimeError	If the cache directory cannot be removed.
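
The effect of `dry_run` can be illustrated with a standalone sketch that uses only the standard library. It is not the library's code: `clean_cache` is a hypothetical function, the `status` strings are invented for the example, and the sketch empties the directory rather than removing it. Only the three result keys come from the documentation above.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Any, Dict


def clean_cache(cache_dir: Path, dry_run: bool = False) -> Dict[str, Any]:
    """Delete everything inside cache_dir, or only list it when dry_run is True."""
    items = sorted(cache_dir.iterdir()) if cache_dir.exists() else []
    if not dry_run:
        for item in items:
            try:
                if item.is_dir():
                    shutil.rmtree(item)
                else:
                    item.unlink()
            except OSError as exc:
                raise RuntimeError(f"could not remove {item}") from exc
    return {
        "status": "dry_run" if dry_run else "cleaned",  # illustrative values
        "cache_dir": str(cache_dir),
        "deleted_items": [p.name for p in items],
    }


# Demonstrate on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "model_a.bin").write_bytes(b"x")
    (cache / "features").mkdir()
    preview = clean_cache(cache, dry_run=True)  # reports, deletes nothing
    still_there = len(list(cache.iterdir()))
    result = clean_cache(cache)                 # actually deletes
    remaining = len(list(cache.iterdir()))
```

The dry run reports both entries while leaving them on disk; the second call removes them.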

run_task_lazy

manager.multitask.MultiTask.run_task_lazy(show=True)

Lazy fitting with default LightGBM parameters.

Parameters

Name	Type	Description	Default
show	bool	If `True`, display prediction figures.	`True`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results are stored in `self.results["lazy"]`.

run_task_optuna

manager.multitask.MultiTask.run_task_optuna(
    search_space=None,
    show=True,
    show_progress=False,
)

Optuna Bayesian hyperparameter tuning.

Parameters

Name	Type	Description	Default
search_space	Optional[Callable]	Callable `(trial) -> dict`.	`None`
show	bool	If `True`, display prediction figures.	`True`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results are stored in `self.results["optuna"]`.
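
A `search_space` callable of the form `(trial) -> dict` might look like the sketch below. The `suggest_int` and `suggest_float` calls follow Optuna's trial API, but the parameter names and ranges are assumptions: this page does not state which keys the pipeline accepts. `LowerBoundTrial` is a hypothetical stand-in so the sketch runs without Optuna installed.

```python
from typing import Any, Dict


def lightgbm_search_space(trial: Any) -> Dict[str, Any]:
    """Map a trial to a dict of candidate LightGBM hyperparameters (assumed keys)."""
    return {
        "num_leaves": trial.suggest_int("num_leaves", 8, 128),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000, step=100),
    }


class LowerBoundTrial:
    """Stand-in for an Optuna trial that always returns the lower bound."""

    def suggest_int(self, name: str, low: int, high: int,
                    step: int = 1, log: bool = False) -> int:
        return low

    def suggest_float(self, name: str, low: float, high: float,
                      step: Any = None, log: bool = False) -> float:
        return low


# Check the callable's output shape without running a study.
params = lightgbm_search_space(LowerBoundTrial())
```

The callable would then be passed as `mt.run_task_optuna(search_space=lightgbm_search_space, show=False)`.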

run_task_predict

manager.multitask.MultiTask.run_task_predict(
    show=True,
    task_name=None,
    max_age_days=None,
)

Predict-only using previously saved models.

Loads fitted models from the cache directory and produces predictions without any training. Raises RuntimeError if no saved models are found.

Parameters

Name	Type	Description	Default
show	bool	If `True`, display prediction figures.	`True`
task_name	Optional[str]	Restrict model loading to a specific source task (`"lazy"`, `"optuna"`, or `"spotoptim"`). `None` loads the most recent model regardless of source.	`None`
max_age_days	Optional[float]	Maximum age in days for saved models. `None` accepts any age.	`None`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results are stored in `self.results["predict"]`.

Raises

Name	Type	Description
	RuntimeError	If no saved models are found.
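
How `task_name` and `max_age_days` narrow the choice of model can be sketched as a filter over saved-model records. This is a conceptual illustration only: `select_model` and the `(source task, saved time)` record layout are hypothetical, and the library's actual storage format and lookup are not described on this page.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

SavedModel = Tuple[str, datetime]  # (source task, time the model was saved)


def select_model(
    models: List[SavedModel],
    now: datetime,
    task_name: Optional[str] = None,
    max_age_days: Optional[float] = None,
) -> SavedModel:
    """Return the most recent saved model that passes both filters."""
    # task_name=None accepts models from any source task.
    candidates = [m for m in models if task_name is None or m[0] == task_name]
    # max_age_days=None accepts models of any age.
    if max_age_days is not None:
        cutoff = now - timedelta(days=max_age_days)
        candidates = [m for m in candidates if m[1] >= cutoff]
    if not candidates:
        raise RuntimeError("no saved models found")
    return max(candidates, key=lambda m: m[1])
```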

run_task_spotoptim

manager.multitask.MultiTask.run_task_spotoptim(search_space=None, show=True)

SpotOptim surrogate-model Bayesian tuning.

Parameters

Name	Type	Description	Default
search_space	Optional[Dict[str, Any]]	Dictionary defining the SpotOptim search space.	`None`
show	bool	If `True`, display prediction figures.	`True`

Returns

Name	Type	Description
	Dict[str, Any]	Aggregated prediction package. Per-target results are stored in `self.results["spotoptim"]`.