Check if a new model needs to be trained and trigger training if necessary.
Inspects the most recently saved model (if any) and trains a new one when the model cache is empty, the existing model’s end_dev is older than hours_until_retrain hours, or force=True is passed. All training parameters are forwarded verbatim to :func:train_new_model.
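The three retraining triggers above reduce to one boolean decision. The following is a minimal sketch, not the library's code: needs_retraining is a hypothetical helper, "now" is passed in explicitly, and treating "older than" as a strict comparison (>) is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_retraining(
    last_end_dev: Optional[datetime],
    now: datetime,
    hours_until_retrain: float,
    force: bool = False,
) -> bool:
    """Hypothetical helper mirroring the retraining rule described above."""
    if force:
        # An explicit request always triggers training.
        return True
    if last_end_dev is None:
        # No saved model in the cache yet.
        return True
    # Assumption: retrain only when end_dev is strictly older than the threshold.
    return now - last_end_dev > timedelta(hours=hours_until_retrain)


now = datetime(2025, 1, 10, 12, tzinfo=timezone.utc)
fresh = datetime(2025, 1, 10, 0, tzinfo=timezone.utc)   # 12 hours old
stale = datetime(2025, 1, 8, 0, tzinfo=timezone.utc)    # 2.5 days old

print(needs_retraining(None, now, 24))               # True
print(needs_retraining(fresh, now, 24))              # False
print(needs_retraining(stale, now, 24))              # True
print(needs_retraining(fresh, now, 24, force=True))  # True
```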
The class of the forecaster model to train, for example spotforecast2_safe.forecaster.ForecasterLGBM. The class must accept iteration, end_dev, and train_size in its constructor and expose a tune() method.
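The constructor and tune() requirement can be checked before any data is fetched. This sketch uses only the standard inspect module; check_forecaster_class and ToyForecaster are hypothetical names, and spotforecast2 is not assumed to perform such a check itself.

```python
import inspect


def check_forecaster_class(model_class: type) -> None:
    """Hypothetical pre-flight check for the constructor/tune() contract."""
    params = inspect.signature(model_class).parameters
    accepts_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    # Each required argument must be named explicitly or absorbed by **kwargs.
    for name in ("iteration", "end_dev", "train_size"):
        if name not in params and not accepts_var_kwargs:
            raise TypeError(f"{model_class.__name__} must accept '{name}'")
    if not callable(getattr(model_class, "tune", None)):
        raise TypeError(f"{model_class.__name__} must define tune()")


class ToyForecaster:
    # Hypothetical minimal class that satisfies the contract.
    def __init__(self, iteration, end_dev, train_size, **kwargs):
        self.iteration = iteration
        self.end_dev = end_dev
        self.train_size = train_size

    def tune(self):
        pass


check_forecaster_class(ToyForecaster)  # passes silently
```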
Short identifier for the model (e.g. 'lgbm'). Used to locate existing model files and to name the new one. If None, the lower-cased class name is used.
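The fallback name is the full lower-cased class name, which may differ from the short identifier used for existing files. A minimal sketch with a hypothetical resolve_model_name helper and a stand-in class:

```python
from typing import Optional


def resolve_model_name(model_class: type, model_name: Optional[str] = None) -> str:
    """Hypothetical helper: fall back to the lower-cased class name."""
    return model_name if model_name is not None else model_class.__name__.lower()


class ForecasterLGBM:  # stand-in class; only its name matters here
    pass


print(resolve_model_name(ForecasterLGBM))          # forecasterlgbm
print(resolve_model_name(ForecasterLGBM, "lgbm"))  # lgbm
```

Because the fallback yields "forecasterlgbm" rather than "lgbm", passing the model name explicitly keeps lookups consistent with files saved under the short identifier.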
Directory where model files are stored. Forwarded to :func:~spotforecast2_safe.manager.trainer.get_last_model and :func:train_new_model. Defaults to :func:~spotforecast2_safe.data.fetch_data.get_cache_home.
Hard cutoff timestamp passed to the model constructor. When None, :func:train_new_model calculates it automatically as one day before the latest index in the dataset.
>>> from spotforecast2.manager.trainer_full import search_space_lgbm
>>> # Without Optuna, verify that the function exists and is callable
>>> callable(search_space_lgbm)
True
Train a new forecaster model and optionally save it to disk.
This function fetches the latest data, calculates the training cutoff, initializes a model of the given class, runs its tuning process, and, when save_to_file is True, saves the model following the naming convention {model_name}_forecaster_{n_iteration}.joblib.
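The naming convention can be expressed as a one-line path builder. model_path is a hypothetical helper, and the directory argument stands in for wherever the models are stored; how the file is actually written is not shown here.

```python
from pathlib import Path


def model_path(model_dir: Path, model_name: str, n_iteration: int) -> Path:
    """Hypothetical: build {model_name}_forecaster_{n_iteration}.joblib in model_dir."""
    return Path(model_dir) / f"{model_name}_forecaster_{n_iteration}.joblib"


print(model_path(Path("models"), "lgbm", 0).name)  # lgbm_forecaster_0.joblib
```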
The iteration number for this training run, acting as an incrementing version number for the model. When using handle_training, the first model starts at iteration 0; each subsequent forced or scheduled retraining increments it by 1 (get_last_model_iteration + 1). It is primarily used to build the filename when saving the model to disk (e.g., lgbm_forecaster_0.joblib, lgbm_forecaster_1.joblib).
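The "start at 0, then last + 1" rule can be sketched by parsing iterations out of existing filenames. Both helpers are hypothetical and not the library's get_last_model; note the numeric comparison, which ranks iteration 10 above iteration 9 where a plain string sort would not.

```python
import re
from typing import Iterable, Optional


def last_iteration(filenames: Iterable[str], model_name: str) -> Optional[int]:
    """Hypothetical: highest n among '{model_name}_forecaster_{n}.joblib' files."""
    pattern = re.compile(rf"^{re.escape(model_name)}_forecaster_(\d+)\.joblib$")
    iterations = [int(m.group(1)) for f in filenames if (m := pattern.match(f))]
    # Compare as integers, not strings, so that 10 > 9.
    return max(iterations) if iterations else None


def next_iteration(filenames: Iterable[str], model_name: str) -> int:
    """Start at 0 when no model exists, otherwise last iteration + 1."""
    last = last_iteration(filenames, model_name)
    return 0 if last is None else last + 1


existing = ["lgbm_forecaster_0.joblib", "lgbm_forecaster_9.joblib", "lgbm_forecaster_10.joblib"]
print(next_iteration(existing, "lgbm"))  # 11
print(next_iteration([], "lgbm"))        # 0
```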
Optional size of the training set as a pandas Timedelta, i.e. the length of the lookback window ending at end_dev. If provided, the training data starts at end_dev - train_size, clipped to the start of the dataset (see Notes). If None, all available data up to end_dev is used. Defaults to None.
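The start of the training window follows directly from end_dev and train_size. This is a sketch with a hypothetical training_start helper written against the standard datetime types; pandas Timestamp and Timedelta subclass datetime and timedelta, so the same arithmetic applies to them.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def training_start(
    dataset_start: datetime, end_dev: datetime, train_size: Optional[timedelta]
) -> datetime:
    """Hypothetical: first timestamp of the training window."""
    if train_size is None:
        # No lookback limit: use everything from the start of the dataset.
        return dataset_start
    # Look back train_size from end_dev, but never before the dataset begins.
    return max(dataset_start, end_dev - train_size)


utc = timezone.utc
first = datetime(2025, 1, 1, tzinfo=utc)
cutoff = datetime(2025, 1, 31, tzinfo=utc)

print(training_start(first, cutoff, timedelta(days=10)))   # 2025-01-21 00:00:00+00:00
print(training_start(first, cutoff, timedelta(days=100)))  # 2025-01-01 00:00:00+00:00
print(training_start(first, cutoff, None))                 # 2025-01-01 00:00:00+00:00
```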
Optional cutoff date for training. This represents the absolute point in time separating training/development data from unseen future data. If None, it is calculated automatically to be one day before the latest available index in the data.
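The automatic cutoff is simply the latest index minus one day. In this sketch, default_end_dev is hypothetical and the index is a plain list of datetimes standing in for the dataset's DatetimeIndex.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def default_end_dev(index: Sequence[datetime], end_dev: Optional[datetime] = None) -> datetime:
    """Hypothetical: keep an explicit end_dev, else one day before the latest index."""
    if end_dev is not None:
        return end_dev
    return max(index) - timedelta(days=1)


# Hourly timestamps for 2025-03-01, 00:00 to 23:00 UTC.
index = [datetime(2025, 3, 1, h, tzinfo=timezone.utc) for h in range(24)]
print(default_end_dev(index))  # 2025-02-28 23:00:00+00:00
```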
Path to the CSV file used for training (e.g., str(get_data_home() / 'interim/energy_load.csv')). Absolute paths are used as-is; relative paths are resolved against :func:~spotforecast2_safe.data.fetch_data.get_data_home. If None, a ValueError is raised by :func:~spotforecast2_safe.data.fetch_data.fetch_data. Defaults to None.
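The path handling described above can be sketched with pathlib. resolve_data_path is hypothetical, and its data_home argument stands in for the result of get_data_home(); the ValueError for None mirrors the documented behavior of fetch_data rather than reproducing its code.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_data_path(
    data_filename: Optional[Union[str, Path]], data_home: Path
) -> Path:
    """Hypothetical: absolute paths pass through, relative ones join data_home."""
    if data_filename is None:
        raise ValueError("data_filename must be provided")
    path = Path(data_filename)
    # Absolute paths are returned unchanged; relative ones are anchored at data_home.
    return path if path.is_absolute() else Path(data_home) / path


print(resolve_data_path("interim/energy_load.csv", Path("data_home")))
```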
Additional keyword arguments to be passed to the model constructor.
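Forwarding extra keyword arguments means the required constructor arguments are named explicitly and everything else passes through untouched. In this sketch, build_model, RecordingModel, and the random_state keyword are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Optional


class RecordingModel:
    """Hypothetical model that records the extra keyword arguments it receives."""

    def __init__(self, iteration, end_dev, train_size, **kwargs):
        self.iteration = iteration
        self.end_dev = end_dev
        self.train_size = train_size
        self.extra = kwargs


def build_model(model_class: type, n_iteration: int, end_dev: datetime,
                train_size: Optional[timedelta], **kwargs: Any):
    """Hypothetical: pass required arguments by name and forward the rest."""
    return model_class(iteration=n_iteration, end_dev=end_dev,
                       train_size=train_size, **kwargs)


model = build_model(RecordingModel, 0, datetime(2025, 1, 1), None, random_state=42)
print(model.extra)  # {'random_state': 42}
```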
Notes
Relationship between train_size and end_dev: The actual training data spans from max(dataset_start, end_dev - train_size) to end_dev.
- If train_size is larger than the available history before end_dev, the framework gracefully clips the start date to the beginning of the dataset without raising an error.
- If end_dev is set to a time before the start of the dataset, the training subset will be empty and the forecaster will fail to fit.
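Both edge cases can be seen by filtering a small series of timestamps. training_subset is a hypothetical helper operating on a list rather than a DataFrame, and treating both window bounds as inclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def training_subset(
    timestamps: List[datetime], end_dev: datetime, train_size: Optional[timedelta]
) -> List[datetime]:
    """Hypothetical: keep timestamps in [max(dataset_start, end_dev - train_size), end_dev]."""
    if not timestamps:
        return []
    start = min(timestamps)  # dataset_start
    if train_size is not None:
        start = max(start, end_dev - train_size)
    # Assumption: both bounds are inclusive.
    return [t for t in timestamps if start <= t <= end_dev]


utc = timezone.utc
stamps = [datetime(2025, 1, d, tzinfo=utc) for d in range(1, 11)]  # Jan 1..10

print(len(training_subset(stamps, stamps[-1], timedelta(days=3))))    # 4 (Jan 7..10)
print(len(training_subset(stamps, stamps[-1], timedelta(days=365))))  # 10, start clipped to Jan 1
print(len(training_subset(stamps, datetime(2024, 1, 1, tzinfo=utc), None)))  # 0, nothing to fit
```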
Examples
import pandas as pd
from spotforecast2.manager.trainer_full import train_new_model
from spotforecast2_safe.data.fetch_data import get_package_data_home


# Define a mock model class for demonstration
class MyModel:
    def __init__(self, iteration, end_dev, train_size, **kwargs):
        self.iteration = iteration
        self.end_dev = end_dev
        self.train_size = train_size

    def tune(self):
        print(f"Tuning model {self.iteration} up to {self.end_dev}!")

    def get_params(self):
        return {}

    @property
    def name(self):
        return "mymodel"


# Train on 3 * 365 days (roughly 3 years) of data leading up to 2025-12-31.
# Note: in a real scenario, this fetches data and saves a joblib file.
# We pass save_to_file=False to avoid writing disk artifacts in the doc example.
demo_file = get_package_data_home() / "demo01.csv"
model = train_new_model(
    model_class=MyModel,
    n_iteration=0,
    train_size=pd.Timedelta(days=3 * 365),
    end_dev="2025-12-31 00:00+00:00",
    save_to_file=False,
    data_filename=str(demo_file),
)