manager.trainer_full

Module for managing full model training.

Functions

Name Description
handle_training Check if a new model needs to be trained and trigger training if necessary.
search_space_lgbm Optuna search space for LightGBM hyperparameters.
search_space_xgb Optuna search space for XGBoost hyperparameters.
train_new_model Train a new forecaster model and optionally save it to disk.

handle_training

manager.trainer_full.handle_training(
    model_class,
    model_name=None,
    model_dir=None,
    force=False,
    train_size=None,
    end_dev=None,
    data_filename=None,
    hours_until_retrain=168,
    **kwargs,
)

Check if a new model needs to be trained and trigger training if necessary.

Inspects the most recently saved model (if any) and trains a new one when the model cache is empty, the existing model’s end_dev is older than hours_until_retrain hours, or force=True is passed. All training parameters are forwarded verbatim to train_new_model.
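The three retraining conditions can be summarized in a few lines. The sketch below is an assumption about the decision logic, not the library's code: the helper name needs_retraining is hypothetical, it uses the standard-library datetime types (pandas Timestamp and Timedelta objects subclass them), and it treats "older than" as a strict comparison.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_retraining(
    last_end_dev: Optional[datetime],
    now: datetime,
    hours_until_retrain: int = 168,
    force: bool = False,
) -> bool:
    """Hypothetical helper mirroring the documented retraining rules."""
    if force:
        # force=True retrains regardless of the existing model's age
        return True
    if last_end_dev is None:
        # Empty model cache: nothing has been trained yet
        return True
    # Stale model: end_dev lies more than hours_until_retrain hours in the past
    return now - last_end_dev > timedelta(hours=hours_until_retrain)


now = datetime(2026, 1, 15, tzinfo=timezone.utc)
empty_cache = needs_retraining(None, now)
recent_model = needs_retraining(now - timedelta(hours=24), now)
stale_model = needs_retraining(now - timedelta(days=10), now)
forced = needs_retraining(now - timedelta(hours=24), now, force=True)
```

These four calls correspond to the four scenarios in the Examples section: an empty cache, a 24-hour-old model, a 10-day-old model, and a forced retrain.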

Parameters

Name Type Description Default
model_class type The class of the forecaster model to train, for example spotforecast2_safe.forecaster.ForecasterLGBM. The class must accept iteration, end_dev, and train_size in its constructor and expose a tune() method. required
model_name Optional[str] Short identifier for the model (e.g. 'lgbm'). Used to locate existing model files and to name the new one. If None, the lower-cased class name is used. None
model_dir Optional[Union[str, Path]] Directory where model files are stored. Forwarded to spotforecast2_safe.manager.trainer.get_last_model and train_new_model. Defaults to the directory returned by spotforecast2_safe.data.fetch_data.get_cache_home. None
force bool If True, retrain unconditionally regardless of the existing model’s age. Default is False. False
train_size Optional[pd.Timedelta] Length of the training window forwarded to the model constructor. None means all available data up to end_dev. None
end_dev Optional[Union[str, pd.Timestamp]] Hard cutoff timestamp passed to the model constructor. When None, train_new_model calculates it automatically as one day before the latest index in the dataset. None
data_filename Optional[str] Path to the CSV training file forwarded to train_new_model. When None, the library default is used. None
hours_until_retrain int Number of hours after which the existing model is considered stale and retraining is triggered. Default is 168 hours (7 days). 168
**kwargs Any Extra keyword arguments forwarded to the model constructor. {}

Returns

Name Type Description
None None

Examples

import tempfile
import pandas as pd
from unittest.mock import patch
from spotforecast2.manager.trainer_full import handle_training

# Minimal model stub — no real ML libraries required
class StubForecaster:
    name = "stub"
    def __init__(self, iteration, end_dev, train_size=None, **kw):
        self.iteration = iteration
        self.end_dev = end_dev
    def tune(self): pass
    def get_params(self): return {}

# Scenario 1: empty cache → trains at iteration 0
with tempfile.TemporaryDirectory() as tmpdir:
    with patch("spotforecast2.manager.trainer_full.get_last_model",
               return_value=(-1, None)):
        with patch("spotforecast2.manager.trainer_full.train_new_model") as m:
            handle_training(StubForecaster, model_name="stub", model_dir=tmpdir)
            print(f"Scenario 1 — first training at iteration {m.call_args[0][1]}")

# Scenario 2: recent model (24 h old) → skipped
recent = StubForecaster(0, pd.Timestamp.now("UTC") - pd.Timedelta(hours=24))
with tempfile.TemporaryDirectory() as tmpdir:
    with patch("spotforecast2.manager.trainer_full.get_last_model",
               return_value=(0, recent)):
        with patch("spotforecast2.manager.trainer_full.train_new_model") as m:
            handle_training(StubForecaster, model_name="stub", model_dir=tmpdir)
            print(f"Scenario 2 — recent model, retraining called: {m.called}")

# Scenario 3: stale model (10 days old) → retrains at iteration n+1
stale = StubForecaster(2, pd.Timestamp.now("UTC") - pd.Timedelta(days=10))
with tempfile.TemporaryDirectory() as tmpdir:
    with patch("spotforecast2.manager.trainer_full.get_last_model",
               return_value=(2, stale)):
        with patch("spotforecast2.manager.trainer_full.train_new_model") as m:
            handle_training(StubForecaster, model_name="stub", model_dir=tmpdir)
            print(f"Scenario 3 — stale model retrained at iteration {m.call_args[0][1]}")

# Scenario 4: force=True with recent model → retrains unconditionally
with tempfile.TemporaryDirectory() as tmpdir:
    with patch("spotforecast2.manager.trainer_full.get_last_model",
               return_value=(0, recent)):
        with patch("spotforecast2.manager.trainer_full.train_new_model") as m:
            handle_training(
                StubForecaster, model_name="stub", model_dir=tmpdir, force=True
            )
            print(f"Scenario 4 — forced retraining called: {m.called}")
Scenario 1 — first training at iteration 0
Scenario 2 — recent model, retraining called: False
Scenario 3 — stale model retrained at iteration 3
Scenario 4 — forced retraining called: True

search_space_lgbm

manager.trainer_full.search_space_lgbm(trial)

Optuna search space for LightGBM hyperparameters.

Parameters

Name Type Description Default
trial Any An optuna.trial.Trial instance. required

Returns

Name Type Description
dict dict Suggested hyperparameters for the current trial.

Examples

>>> from spotforecast2.manager.trainer_full import search_space_lgbm
>>> # Without running an Optuna study, check that the function is importable and callable
>>> callable(search_space_lgbm)
True
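
For readers unfamiliar with Optuna, a search-space function generally receives a trial, calls its suggest_* methods, and returns the sampled values as a dict. The sketch below shows that shape only: the parameter names and ranges are illustrative assumptions, not the ones used by search_space_lgbm. A small stand-in class (LowerBoundTrial, hypothetical) mimics the suggest_int and suggest_float signatures of optuna.trial.Trial so the example runs without Optuna installed.

```python
from typing import Any, Dict, Optional


class LowerBoundTrial:
    """Hypothetical stand-in for optuna.trial.Trial that always picks the lower bound."""

    def suggest_int(self, name: str, low: int, high: int, *, step: int = 1, log: bool = False) -> int:
        return low

    def suggest_float(
        self, name: str, low: float, high: float, *, step: Optional[float] = None, log: bool = False
    ) -> float:
        return low


def example_search_space(trial: Any) -> Dict[str, Any]:
    """Illustrative search space; names and ranges are assumptions for this sketch."""
    return {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000, step=100),
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        # log=True samples the learning rate on a logarithmic scale
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    }


params = example_search_space(LowerBoundTrial())
```

With Optuna installed, a function of this shape would typically be called inside an objective that is passed to study.optimize, receiving a real Trial instead of the stand-in.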

search_space_xgb

manager.trainer_full.search_space_xgb(trial)

Optuna search space for XGBoost hyperparameters.

Parameters

Name Type Description Default
trial Any An optuna.trial.Trial instance. required

Returns

Name Type Description
dict dict Suggested hyperparameters for the current trial.

Examples

>>> from spotforecast2.manager.trainer_full import search_space_xgb
>>> callable(search_space_xgb)
True

train_new_model

manager.trainer_full.train_new_model(
    model_class,
    n_iteration,
    model_name=None,
    train_size=None,
    save_to_file=True,
    model_dir=None,
    end_dev=None,
    data_filename=None,
    **kwargs,
)

Train a new forecaster model and optionally save it to disk.

This function fetches the latest data, calculates the training cutoff, initializes a model of the given class, triggers the tuning process, and saves the model following the naming convention: {model_name}_forecaster_{n_iteration}.joblib.
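The naming convention and the iteration numbering can be illustrated with two small helpers. This is a sketch only: model_file_path and next_iteration are hypothetical names, and the -1 sentinel for an empty cache is taken from the handle_training example, where get_last_model is patched to return (-1, None).

```python
from pathlib import Path
from typing import Union


def next_iteration(last_iteration: int) -> int:
    """Assumed rule: -1 means no saved model, so the first run gets iteration 0."""
    return last_iteration + 1


def model_file_path(model_dir: Union[str, Path], model_name: str, n_iteration: int) -> Path:
    """Build a path following {model_name}_forecaster_{n_iteration}.joblib."""
    return Path(model_dir) / f"{model_name}_forecaster_{n_iteration}.joblib"


first = model_file_path("models", "lgbm", next_iteration(-1))
fourth = model_file_path("models", "lgbm", next_iteration(2))
```

Here first.name is lgbm_forecaster_0.joblib and fourth.name is lgbm_forecaster_3.joblib.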

Parameters

Name Type Description Default
model_class type The class of the forecaster model to train. The class should accept iteration, end_dev, and train_size in its constructor and provide a tune() method. required
n_iteration int The iteration number for this training run. It acts as an incrementing version number for the model. When using handle_training, the first model starts at iteration 0; each subsequent forced or scheduled retraining uses the last saved model's iteration plus 1. It is primarily used to determine the filename when saving the model to disk (e.g., lgbm_forecaster_0.joblib, lgbm_forecaster_1.joblib). required
model_name Optional[str] Optional name of the model to train. If None, the name is inferred from the model class. Defaults to None. None
train_size Optional[pd.Timedelta] Optional size of the training set as a pandas Timedelta. Determines the lookback window length from end_dev. If provided, the training data will start at end_dev - train_size. If None, all available data up to end_dev is used. Defaults to None. None
save_to_file bool If True, saves the model to disk after training. Defaults to True. True
model_dir Optional[Union[str, Path]] Directory where the model should be saved. If None, defaults to the library’s cache home. None
end_dev Optional[Union[str, pd.Timestamp]] Optional cutoff date for training. This represents the absolute point in time separating training/development data from unseen future data. If None, it is calculated automatically to be one day before the latest available index in the data. None
data_filename Optional[str] Path to the CSV file used for training, preferably absolute (e.g., str(get_data_home() / 'interim/energy_load.csv')). Relative paths are resolved against spotforecast2_safe.data.fetch_data.get_data_home. If None, spotforecast2_safe.data.fetch_data.fetch_data raises a ValueError. Defaults to None. None
**kwargs Any Additional keyword arguments to be passed to the model constructor. {}

Notes

Relationship between train_size and end_dev: The actual training data spans from max(dataset_start, end_dev - train_size) to end_dev.

- If train_size is larger than the available history before end_dev, the framework clips the start date to the beginning of the dataset without raising an error.
- If end_dev is set to a time before the start of the dataset, the training subset will be empty and the forecaster will fail to fit.
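
The window rule above can be worked through with plain datetime arithmetic. The following is a minimal sketch under stated assumptions: training_window is a hypothetical helper, not part of the library, and dataset_start and dataset_end stand for the first and last index of the loaded data. It also applies the documented default of end_dev being one day before the latest index.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def training_window(
    dataset_start: datetime,
    dataset_end: datetime,
    end_dev: Optional[datetime] = None,
    train_size: Optional[timedelta] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper returning (start, end_dev) of the training span."""
    if end_dev is None:
        # Documented default: one day before the latest available index
        end_dev = dataset_end - timedelta(days=1)
    if train_size is None:
        # No lookback limit: use all data up to end_dev
        start = dataset_start
    else:
        # Clip the start to the beginning of the dataset
        start = max(dataset_start, end_dev - train_size)
    return start, end_dev


utc = timezone.utc
data_start = datetime(2020, 1, 1, tzinfo=utc)
data_end = datetime(2026, 1, 1, tzinfo=utc)

# 3 * 365 days before 2025-12-31 falls inside the dataset
normal = training_window(data_start, data_end, datetime(2025, 12, 31, tzinfo=utc), timedelta(days=3 * 365))
# A 10-year lookback exceeds the history, so the start is clipped
clipped = training_window(data_start, data_end, datetime(2025, 12, 31, tzinfo=utc), timedelta(days=10 * 365))
# end_dev before the dataset start yields start > end_dev, i.e. an empty span
empty = training_window(data_start, data_end, datetime(2019, 6, 1, tzinfo=utc), timedelta(days=30))
is_empty = empty[0] > empty[1]
```

In the last case the computed start lies after end_dev, which is the empty-subset situation described in the second bullet.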

Examples

import pandas as pd
from spotforecast2.manager.trainer_full import train_new_model

# Define a mock model class for demonstration
class MyModel:
    def __init__(self, iteration, end_dev, train_size, **kwargs):
        self.iteration = iteration
        self.end_dev = end_dev
        self.train_size = train_size
    def tune(self): print(f"Tuning model {self.iteration} up to {self.end_dev}!")
    def get_params(self): return {}
    @property
    def name(self): return "mymodel"

# Train on a 3 * 365-day window (about 3 years) ending at 2025-12-31 00:00 UTC:
# Note: In a real scenario, this fetches data and saves a joblib file.
# We pass save_to_file=False to avoid writing disk artifacts in the doc example.
from spotforecast2_safe.data.fetch_data import get_package_data_home
demo_file = get_package_data_home() / "demo01.csv"

model = train_new_model(
    model_class=MyModel,
    n_iteration=0,
    train_size=pd.Timedelta(days=3*365),
    end_dev="2025-12-31 00:00+00:00",
    save_to_file=False,
    data_filename=str(demo_file)
)
Tuning model 0 up to 2025-12-31 00:00:00+00:00!

Returns

Name Type Description
Any The trained model instance.