manager.trainer_full

Module for managing full model training.

Functions

Name	Description
handle_training	Check if a new model needs to be trained and trigger training if necessary.
search_space_lgbm	Optuna search space for LightGBM hyperparameters.
search_space_xgb	Optuna search space for XGBoost hyperparameters.
train_new_model	Train a new forecaster model and optionally save it to disk.

handle_training

manager.trainer_full.handle_training(
    model_class,
    model_name=None,
    model_dir=None,
    force=False,
    train_size=None,
    end_dev=None,
    data_filename=None,
    hours_until_retrain=168,
    **kwargs,
)

Check if a new model needs to be trained and trigger training if necessary.

Inspects the most recently saved model, if any, and trains a new one when any of the following holds: the model cache is empty, the existing model’s end_dev is more than hours_until_retrain hours in the past, or force=True is passed. All training parameters are forwarded unchanged to train_new_model.
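The retraining rule described above can be sketched as a small pure function. This is a minimal illustration under stated assumptions, not the library's implementation: `needs_retraining` and its arguments are hypothetical names, the age is assumed to be measured from the existing model's `end_dev` to a caller-supplied current time, and "older than" is assumed to mean strictly greater than the threshold.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_retraining(
    last_end_dev: Optional[datetime],
    now: datetime,
    hours_until_retrain: int = 168,
    force: bool = False,
) -> bool:
    """Hypothetical helper mirroring the three documented triggers."""
    if force:
        return True  # force=True retrains unconditionally
    if last_end_dev is None:
        return True  # empty model cache: nothing has been trained yet
    age = now - last_end_dev
    # Assumption: the model is stale only when strictly older than the threshold
    return age > timedelta(hours=hours_until_retrain)


now = datetime(2025, 1, 15, tzinfo=timezone.utc)
print(needs_retraining(None, now))                                   # True
print(needs_retraining(now - timedelta(hours=24), now))              # False
print(needs_retraining(now - timedelta(days=10), now))               # True
print(needs_retraining(now - timedelta(hours=24), now, force=True))  # True
```

Passing `now` explicitly keeps the sketch deterministic, which makes the rule easy to test in isolation.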

Parameters

Name	Type	Description	Default
model_class	type	The class of the forecaster model to train, for example `spotforecast2_safe.forecaster.ForecasterLGBM`. The class must accept `iteration`, `end_dev`, and `train_size` in its constructor and expose a `tune()` method.	required
model_name	Optional[str]	Short identifier for the model (e.g. `'lgbm'`). Used to locate existing model files and to name the new one. If `None`, the lower-cased class name is used.	`None`
model_dir	Optional[Union[str, Path]]	Directory where model files are stored. Forwarded to :func:`~spotforecast2_safe.manager.trainer.get_last_model` and :func:`train_new_model`. Defaults to :func:`~spotforecast2_safe.data.fetch_data.get_cache_home`.	`None`
force	bool	If `True`, retrain unconditionally, regardless of the existing model’s age.	`False`
train_size	Optional[pd.Timedelta]	Length of the training window forwarded to the model constructor. `None` means all available data up to `end_dev`.	`None`
end_dev	Optional[Union[str, pd.Timestamp]]	Hard cutoff timestamp passed to the model constructor. When `None`, :func:`train_new_model` calculates it automatically as one day before the latest index in the dataset.	`None`
data_filename	Optional[str]	Path to the CSV training file forwarded to :func:`train_new_model`. When `None`, the library default is used.	`None`
hours_until_retrain	int	Number of hours after which the existing model is considered stale and retraining is triggered. Default is 168 hours (7 days).	`168`
**kwargs	Any	Extra keyword arguments forwarded to the model constructor.	`{}`

Returns

Name	Type	Description
	None	None

Examples

import tempfile
import pandas as pd
from unittest.mock import patch
from spotforecast2.manager.trainer_full import handle_training

# Minimal model stub — no real ML libraries required
class StubForecaster:
    name = "stub"
    def __init__(self, iteration, end_dev, train_size=None, **kw):
        self.iteration = iteration
        self.end_dev = end_dev
    def tune(self): pass
    def get_params(self): return {}

# Scenario 1: empty cache → trains at iteration 0
with tempfile.TemporaryDirectory() as tmpdir:
    with patch("spotforecast2.manager.trainer_full.get_last_model",
               return_value=(-1, None)):
        with patch("spotforecast2.manager.trainer_full.train_new_model") as m:
            handle_training(StubForecaster, model_name="stub", model_dir=tmpdir)
            print(f"Scenario 1 — first training at iteration {m.call_args[0][1]}")

# Scenario 2: recent model (24 h old) → skipped
recent = StubForecaster(0, pd.Timestamp.now("UTC") - pd.Timedelta(hours=24))
with tempfile.TemporaryDirectory() as tmpdir:
    with patch("spotforecast2.manager.trainer_full.get_last_model",
               return_value=(0, recent)):
        with patch("spotforecast2.manager.trainer_full.train_new_model") as m:
            handle_training(StubForecaster, model_name="stub", model_dir=tmpdir)
            print(f"Scenario 2 — recent model, retraining called: {m.called}")

# Scenario 3: stale model (10 days old) → retrains at iteration n+1
stale = StubForecaster(2, pd.Timestamp.now("UTC") - pd.Timedelta(days=10))
with tempfile.TemporaryDirectory() as tmpdir:
    with patch("spotforecast2.manager.trainer_full.get_last_model",
               return_value=(2, stale)):
        with patch("spotforecast2.manager.trainer_full.train_new_model") as m:
            handle_training(StubForecaster, model_name="stub", model_dir=tmpdir)
            print(f"Scenario 3 — stale model retrained at iteration {m.call_args[0][1]}")

# Scenario 4: force=True with recent model → retrains unconditionally
with tempfile.TemporaryDirectory() as tmpdir:
    with patch("spotforecast2.manager.trainer_full.get_last_model",
               return_value=(0, recent)):
        with patch("spotforecast2.manager.trainer_full.train_new_model") as m:
            handle_training(
                StubForecaster, model_name="stub", model_dir=tmpdir, force=True
            )
            print(f"Scenario 4 — forced retraining called: {m.called}")

Scenario 1 — first training at iteration 0
Scenario 2 — recent model, retraining called: False
Scenario 3 — stale model retrained at iteration 3
Scenario 4 — forced retraining called: True

search_space_lgbm

manager.trainer_full.search_space_lgbm(trial)

Optuna search space for LightGBM hyperparameters.

Parameters

Name	Type	Description	Default
trial	Any	An :class:`optuna.trial.Trial` instance.	required

Returns

Name	Type	Description
dict	dict	Suggested hyperparameters for the current trial.

Examples

>>> from spotforecast2.manager.trainer_full import search_space_lgbm
>>> # Without Optuna, verify the function signature exists
>>> callable(search_space_lgbm)
True
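For orientation, an Optuna search-space function generally takes a trial and returns a dict of suggested values. The sketch below shows that shape only: the parameter names and ranges are hypothetical and are not the ones used by `search_space_lgbm`. To keep it runnable without Optuna installed, a small stub class, `LowerBoundTrial` (also hypothetical), stands in for `optuna.trial.Trial` and always returns the lower bound of each range.

```python
class LowerBoundTrial:
    """Stub standing in for optuna.trial.Trial; always suggests the lower bound."""

    def suggest_int(self, name, low, high, **kwargs):
        return low

    def suggest_float(self, name, low, high, **kwargs):
        return low


def example_search_space(trial):
    """Hypothetical search space; names and ranges are illustrative only."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000, step=100),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
    }


params = example_search_space(LowerBoundTrial())
print(params)
# {'n_estimators': 100, 'learning_rate': 0.01, 'max_depth': 3}
```

With a real study, the same function would receive an `optuna.trial.Trial` and the sampler would choose the values.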

search_space_xgb

manager.trainer_full.search_space_xgb(trial)

Optuna search space for XGBoost hyperparameters.

Parameters

Name	Type	Description	Default
trial	Any	An :class:`optuna.trial.Trial` instance.	required

Returns

Name	Type	Description
dict	dict	Suggested hyperparameters for the current trial.

Examples

>>> from spotforecast2.manager.trainer_full import search_space_xgb
>>> callable(search_space_xgb)
True

train_new_model

manager.trainer_full.train_new_model(
    model_class,
    n_iteration,
    model_name=None,
    train_size=None,
    save_to_file=True,
    model_dir=None,
    end_dev=None,
    data_filename=None,
    **kwargs,
)

Train a new forecaster model and optionally save it to disk.

This function fetches the latest data, calculates the training cutoff, initializes a model of the given class, triggers the tuning process, and saves the model following the naming convention: {model_name}_forecaster_{n_iteration}.joblib.
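The naming convention above can be illustrated with a short sketch. `model_path` is a hypothetical helper written for this example, not part of the library, and the directory name is a placeholder.

```python
from pathlib import Path


def model_path(model_dir, model_name, n_iteration):
    """Hypothetical helper: build a path from the documented naming convention."""
    return Path(model_dir) / f"{model_name}_forecaster_{n_iteration}.joblib"


print(model_path("models", "lgbm", 0).name)  # lgbm_forecaster_0.joblib
print(model_path("models", "lgbm", 1).name)  # lgbm_forecaster_1.joblib
```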

Parameters

Name	Type	Description	Default
model_class	type	The class of the forecaster model to train. The class should accept `iteration`, `end_dev`, and `train_size` in its constructor and provide a `tune()` method.	required
n_iteration	int	The iteration number for this training run. This acts as an incrementing version number for the model. When using `handle_training`, the first model starts at iteration 0. Upon subsequent forced or scheduled retrainings, it is incremented by 1 (`get_last_model_iteration + 1`). It is primarily used to determine the filename when saving the model to disk (e.g., `lgbm_forecaster_0.joblib`, `lgbm_forecaster_1.joblib`).	required
model_name	Optional[str]	Name of the model to train. If None, the name is inferred from the model class.	`None`
train_size	Optional[pd.Timedelta]	Optional size of the training set as a pandas Timedelta. Determines the lookback window length from `end_dev`. If provided, the training data will start at `end_dev - train_size`. If None, all available data up to `end_dev` is used. Defaults to None.	`None`
save_to_file	bool	If True, saves the model to disk after training. Defaults to True.	`True`
model_dir	Optional[Union[str, Path]]	Directory where the model should be saved. If None, defaults to the library’s cache home.	`None`
end_dev	Optional[Union[str, pd.Timestamp]]	Optional cutoff date for training. This represents the absolute point in time separating training/development data from unseen future data. If None, it is calculated automatically to be one day before the latest available index in the data.	`None`
data_filename	Optional[str]	Path to the CSV file used for training (e.g., `str(get_data_home() / 'interim/energy_load.csv')`). Relative paths are resolved against :func:`~spotforecast2_safe.data.fetch_data.get_data_home`. If None, :func:`~spotforecast2_safe.data.fetch_data.fetch_data` raises a `ValueError`.	`None`
**kwargs	Any	Additional keyword arguments to be passed to the model constructor.	`{}`

Notes

Relationship between train_size and end_dev: The actual training data spans from max(dataset_start, end_dev - train_size) to end_dev.

- If train_size is larger than the available history before end_dev, the framework gracefully clips the start date to the beginning of the dataset without throwing an error.
- If end_dev is set to a time before the start of the dataset, the training subset will be empty and the forecaster will fail to fit.
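The clipping rule in this note can be sketched with plain datetime arithmetic. This is an illustration only: `training_window` is a hypothetical helper, not the library's code, and standard-library `datetime`/`timedelta` are used here in place of `pd.Timestamp`/`pd.Timedelta` to keep the example dependency-free.

```python
from datetime import datetime, timedelta, timezone


def training_window(dataset_start, end_dev, train_size=None):
    """Hypothetical helper returning the (start, end) of the training span."""
    if train_size is None:
        return dataset_start, end_dev  # all available data up to end_dev
    # Clip the start to the beginning of the dataset, as described in the note
    start = max(dataset_start, end_dev - train_size)
    return start, end_dev


dataset_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end_dev = datetime(2025, 12, 31, tzinfo=timezone.utc)

# Window fits inside the available history: starts 365 days before end_dev
fits = training_window(dataset_start, end_dev, timedelta(days=365))
print(fits[0].date())  # 2024-12-31

# Window longer than the history: start is clipped to dataset_start
clipped = training_window(dataset_start, end_dev, timedelta(days=3 * 365))
print(clipped[0].date())  # 2024-01-01

# end_dev before the dataset begins: start > end, so the subset is empty
early = training_window(dataset_start, datetime(2023, 6, 1, tzinfo=timezone.utc),
                        timedelta(days=30))
print(early[0] > early[1])  # True
```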

Examples

import pandas as pd
from spotforecast2.manager.trainer_full import train_new_model

# Define a mock model class for demonstration
class MyModel:
    def __init__(self, iteration, end_dev, train_size, **kwargs):
        self.iteration = iteration
        self.end_dev = end_dev
        self.train_size = train_size
    def tune(self): print(f"Tuning model {self.iteration} up to {self.end_dev}!")
    def get_params(self): return {}
    @property
    def name(self): return "mymodel"

# Request a 3 * 365-day training window (about 3 years) ending on 2025-12-31:
# Note: In a real scenario, this fetches data and saves a joblib file.
# We pass save_to_file=False to avoid writing disk artifacts in the doc example.
from spotforecast2_safe.data.fetch_data import get_package_data_home
demo_file = get_package_data_home() / "demo01.csv"

model = train_new_model(
    model_class=MyModel,
    n_iteration=0,
    train_size=pd.Timedelta(days=3*365),
    end_dev="2025-12-31 00:00+00:00",
    save_to_file=False,
    data_filename=str(demo_file)
)

Tuning model 0 up to 2025-12-31 00:00:00+00:00!

Returns

Name	Type	Description
	Any	The trained model instance.