24 HPT PyTorch Lightning: Diabetes Using a Recurrent Neural Network

In this tutorial, we will show how spotPython can be integrated into the PyTorch Lightning training workflow for a regression task.

This chapter describes the hyperparameter tuning of a PyTorch Lightning network on the Diabetes data set, a toy regression data set from scikit-learn that is provided here as a PyTorch Dataset. Ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442 diabetes patients, together with the response of interest, a quantitative measure of disease progression one year after baseline.

24.1 Step 1: Setup

Before we consider the detailed experimental setup, we select the parameters that affect run time, initial design size, etc.
The parameter MAX_TIME specifies the maximum run time in minutes.
The parameter FUN_EVALS specifies the maximum number of function evaluations.
The parameter INIT_SIZE specifies the initial design size.
The parameter WORKERS specifies the number of workers.
The prefix PREFIX is used for the experiment name and the name of the log file.
The parameter DEVICE specifies the device to use for training.
The parameter TORCH_METRIC specifies the metric used to evaluate the model.

from spotPython.utils.device import getDevice
from math import inf
MAX_TIME = 1
FUN_EVALS = inf
INIT_SIZE = 5
WORKERS = 0
PREFIX="032"
DEVICE = getDevice()
TORCH_METRIC = "mean_squared_error"

Caution: Run time and initial design size should be increased for real experiments

MAX_TIME is set to one minute for demonstration purposes. For real experiments, this should be increased to at least 1 hour.
FUN_EVALS is set to infinity.
INIT_SIZE is set to 5 for demonstration purposes. For real experiments, this should be increased to at least 10.
WORKERS is set to 0 for demonstration purposes. For real experiments, this should be increased. See the warnings that are printed when the number of workers is set to 0.
PREFIX is set to “032”. This is used for the experiment name and the name of the log file.
DEVICE is set to the device that is returned by getDevice(), e.g., a GPU. In the run shown in this chapter, it is "mps".
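The two budget parameters interact: with FUN_EVALS set to infinity, only MAX_TIME (in minutes) can end the run. The following is a minimal sketch of such a stopping rule, assuming tuning stops as soon as either budget is used up; it is not spotPython's internal code, and the function name `budget_exhausted` is hypothetical.

```python
import math
import time

def budget_exhausted(start_time, n_evals, max_time_min, fun_evals):
    """Hypothetical stopping rule: stop when either the time budget
    (in minutes) or the evaluation budget is used up."""
    elapsed_min = (time.time() - start_time) / 60.0
    return elapsed_min >= max_time_min or n_evals >= fun_evals

start = time.time()
# With fun_evals = inf, only the one-minute time budget can stop the run.
print(budget_exhausted(start, n_evals=7, max_time_min=1, fun_evals=math.inf))  # False
```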

Note: Device selection

Although no .cuda() or .to(device) calls are required, because Lightning handles device placement for you (see the LightningModule documentation), we would like to know which device is used. Therefore, we imitate the LightningModule behaviour, which selects the most capable available device.
The method spotPython.utils.device.getDevice() returns the device that is used by Lightning.
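The idea of selecting the most capable available device can be sketched as a simple priority order: CUDA first, then Apple's MPS backend, then the CPU. This is an illustrative sketch, not the implementation of getDevice(); the helper `pick_device` is hypothetical, and in practice its flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical helper: return the most capable available device string."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"

# A machine without CUDA but with MPS, as in the run shown in this chapter:
print(pick_device(cuda_available=False, mps_available=True))  # mps
```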

24.2 Step 2: Initialization of the `fun_control` Dictionary

spotPython uses a Python dictionary for storing the information required for the hyperparameter tuning process.

from spotPython.utils.init import fun_control_init
import numpy as np

fun_control = fun_control_init(
    _L_in=10,
    _L_out=1,
    _torchmetric=TORCH_METRIC,
    PREFIX=PREFIX,
    TENSORBOARD_CLEAN=True,
    device=DEVICE,
    enable_progress_bar=False,
    fun_evals=FUN_EVALS,
    log_level=10,
    max_time=MAX_TIME,
    num_workers=WORKERS,
    show_progress=True,
    test_size=0.1,
    tolerance_x=np.sqrt(np.spacing(1)),
    verbosity=1
    )

Moving TENSORBOARD_PATH: runs/ to TENSORBOARD_PATH_OLD: runs_OLD/runs_2024_04_22_01_45_35
Created spot_tensorboard_path: runs/spot_logs/032_maans14_2024-04-22_01-45-35 for SummaryWriter()
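The value passed as tolerance_x, np.sqrt(np.spacing(1)), is the square root of the float64 machine epsilon, i.e. 2^-26, which appears as 1.4901161193847656e-08 in the printed fun_control dictionary of Step 8. The sketch below reproduces it with the standard library only, using math.ulp(1.0), which has the same value as np.spacing(1) (Python 3.9 or newer is assumed).

```python
import math

# Spacing between 1.0 and the next representable float64 (machine epsilon).
eps = math.ulp(1.0)            # same value as np.spacing(1)
tolerance_x = math.sqrt(eps)   # same value as np.sqrt(np.spacing(1))

print(eps)          # 2.220446049250313e-16
print(tolerance_x)  # 1.4901161193847656e-08
```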

24.3 Step 3: Loading the Diabetes Data Set

from spotPython.hyperparameters.values import set_control_key_value
from spotPython.data.diabetes import Diabetes
dataset = Diabetes()
set_control_key_value(control_dict=fun_control,
                        key="data_set",
                        value=dataset,
                        replace=True)
print(len(dataset))

Note: Data Set and Data Loader

As shown below, a DataLoader from torch.utils.data can be used to check the data.

# Set batch size for DataLoader
batch_size = 5
# Create DataLoader
from torch.utils.data import DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

# Iterate over the data in the DataLoader
for batch in dataloader:
    inputs, targets = batch
    print(f"Batch Size: {inputs.size(0)}")
    print(f"Inputs Shape: {inputs.shape}")
    print(f"Targets Shape: {targets.shape}")
    print("---------------")
    print(f"Inputs: {inputs}")
    print(f"Targets: {targets}")
    break

Batch Size: 5
Inputs Shape: torch.Size([5, 10])
Targets Shape: torch.Size([5])
---------------
Inputs: tensor([[ 0.0381,  0.0507,  0.0617,  0.0219, -0.0442, -0.0348, -0.0434, -0.0026,
          0.0199, -0.0176],
        [-0.0019, -0.0446, -0.0515, -0.0263, -0.0084, -0.0192,  0.0744, -0.0395,
         -0.0683, -0.0922],
        [ 0.0853,  0.0507,  0.0445, -0.0057, -0.0456, -0.0342, -0.0324, -0.0026,
          0.0029, -0.0259],
        [-0.0891, -0.0446, -0.0116, -0.0367,  0.0122,  0.0250, -0.0360,  0.0343,
          0.0227, -0.0094],
        [ 0.0054, -0.0446, -0.0364,  0.0219,  0.0039,  0.0156,  0.0081, -0.0026,
         -0.0320, -0.0466]])
Targets: tensor([151.,  75., 141., 206., 135.])
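With 442 samples and batch_size = 5, iterating over the whole DataLoader (instead of stopping after the first batch) yields ceil(442 / 5) = 89 batches; because drop_last defaults to False, the last batch holds only the 2 remaining samples. The arithmetic as a small sketch:

```python
import math

n_samples = 442   # patients in the Diabetes data set
batch_size = 5

n_batches = math.ceil(n_samples / batch_size)           # full batches plus one partial batch
last_batch = n_samples - (n_batches - 1) * batch_size   # size of the final, partial batch

print(n_batches, last_batch)  # 89 2
```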

24.4 Step 4: Preprocessing

Preprocessing is handled by Lightning and PyTorch. It is described in the LightningDataModule documentation, which also covers the transforms methods.

24.5 Step 5: Select the Core Model (`algorithm`) and `core_model_hyper_dict`

spotPython includes the RNNLightRegression class [SOURCE] for configurable recurrent neural networks. The class is imported below. It inherits from the class Lightning.LightningModule, which is the base class for all models in Lightning. Lightning.LightningModule is a subclass of torch.nn.Module and provides additional functionality for the training and testing of neural networks. The class Lightning.LightningModule is described in the Lightning documentation.

Here we simply add the NN Model to the fun_control dictionary by calling the function add_core_model_to_fun_control:

from spotPython.light.regression.rnnlightregression import RNNLightRegression
from spotPython.hyperdict.light_hyper_dict import LightHyperDict
from spotPython.hyperparameters.values import add_core_model_to_fun_control
add_core_model_to_fun_control(fun_control=fun_control,
                              core_model=RNNLightRegression,
                              hyper_dict=LightHyperDict)

The hyperparameters of the model are specified in the core_model_hyper_dict dictionary [SOURCE].

Note: User specified models and hyperparameter dictionaries

The user can specify a model and a hyperparameter dictionary in a subfolder, e.g., userRNN in the current working directory.
The model and the hyperparameter dictionary are imported with the following code:

from spotPython.hyperparameters.values import add_core_model_to_fun_control
import sys
sys.path.insert(0, './userRNN')
import userrnn
import user_hyper_dict
add_core_model_to_fun_control(fun_control=fun_control,
                              core_model=userrnn.RNNLightRegression,
                              hyper_dict=user_hyper_dict.UserHyperDict)

Example files can be found in the userRNN folder.
These files can be modified by the user.
They can be used without re-compilation of the spotPython source code, if they are located in a subfolder of the current working directory.

24.6 Step 6: Modify `hyper_dict` Hyperparameters for the Selected Algorithm aka `core_model`

spotPython provides functions for modifying the hyperparameters, their bounds and factors as well as for activating and de-activating hyperparameters without re-compilation of the Python source code.

Caution: Small number of epochs for demonstration purposes

epochs and patience are set to small values for demonstration purposes. These values are too small for a real application.
More reasonable values are, e.g.:
- set_control_hyperparameter_value(fun_control, "epochs", [7, 9]) and
- set_control_hyperparameter_value(fun_control, "patience", [2, 7])

from spotPython.hyperparameters.values import set_control_hyperparameter_value

set_control_hyperparameter_value(fun_control, "l1", [3, 8])
set_control_hyperparameter_value(fun_control, "epochs", [7, 9])
set_control_hyperparameter_value(fun_control, "batch_size", [2, 6])
set_control_hyperparameter_value(fun_control, "optimizer", [
                "Adadelta",
                "Adagrad",
                "Adam",
                "Adamax"])
set_control_hyperparameter_value(fun_control, "dropout_prob", [0.01, 0.25])
set_control_hyperparameter_value(fun_control, "lr_mult", [0.5, 5.0])
set_control_hyperparameter_value(fun_control, "patience", [3, 9])
set_control_hyperparameter_value(fun_control, "act_fn",["ReLU"] )
set_control_hyperparameter_value(fun_control, "initialization",["Default"] )

Setting hyperparameter l1 to value [3, 8].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter epochs to value [7, 9].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter batch_size to value [2, 6].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter optimizer to value ['Adadelta', 'Adagrad', 'Adam', 'Adamax'].
Variable type is factor.
Core type is str.
Calling modify_hyper_parameter_levels().
Setting hyperparameter dropout_prob to value [0.01, 0.25].
Variable type is float.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter lr_mult to value [0.5, 5.0].
Variable type is float.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter patience to value [3, 9].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter act_fn to value ['ReLU'].
Variable type is factor.
Core type is instance().
Calling modify_hyper_parameter_levels().
Setting hyperparameter initialization to value ['Default'].
Variable type is factor.
Core type is str.
Calling modify_hyper_parameter_levels().

Now, the dictionary fun_control contains all information needed for the hyperparameter tuning. Before the hyperparameter tuning is started, it is recommended to take a look at the experimental design. The method gen_design_table [SOURCE] generates a design table as follows:

from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))

| name           | type   | default   |   lower |   upper | transform             |
|----------------|--------|-----------|---------|---------|-----------------------|
| l1             | int    | 3         |    3    |    8    | transform_power_2_int |
| epochs         | int    | 4         |    7    |    9    | transform_power_2_int |
| batch_size     | int    | 4         |    2    |    6    | transform_power_2_int |
| act_fn         | factor | ReLU      |    0    |    0    | None                  |
| optimizer      | factor | SGD       |    0    |    3    | None                  |
| dropout_prob   | float  | 0.01      |    0.01 |    0.25 | None                  |
| lr_mult        | float  | 1.0       |    0.5  |    5    | None                  |
| patience       | int    | 2         |    3    |    9    | transform_power_2_int |
| initialization | factor | Default   |    0    |    0    | None                  |
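For the int hyperparameters with transform_power_2_int, the tuner searches over exponents and the model receives two raised to that exponent. The bounds [7, 9] for epochs therefore correspond to 128 to 512 epochs, which matches configurations such as 'epochs': 256 in the run log of Step 8. A minimal sketch of the mapping, assuming it is a plain power of two of an integer exponent; the function below is illustrative and not imported from spotPython.

```python
def power_2_int(exponent: int) -> int:
    """Illustrative stand-in for the transform_power_2_int transformation."""
    return int(2 ** exponent)

# (lower, upper) bounds from the design table for the transformed hyperparameters.
bounds = {"l1": (3, 8), "epochs": (7, 9), "batch_size": (2, 6), "patience": (3, 9)}
for name, (lower, upper) in bounds.items():
    print(f"{name:10s} {power_2_int(lower):4d} .. {power_2_int(upper):4d}")
```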

This allows us to check whether all information is available and correct.

Note: Hyperparameters of the Tuned Model and the fun_control Dictionary

The updated fun_control dictionary can be shown with the command fun_control["core_model_hyper_dict"].
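Factor hyperparameters such as optimizer are encoded as integer indices into their levels list, which is why the design table shows the bounds 0 to 3 for the four selected optimizers. The sketch below copies the levels from the dictionary printed in Step 8 and decodes an index; the helper decode_factor is hypothetical and not part of spotPython.

```python
# Excerpt of fun_control["core_model_hyper_dict"] as printed in Step 8.
hyper_dict = {
    "optimizer": {"levels": ["Adadelta", "Adagrad", "Adam", "Adamax"], "lower": 0, "upper": 3},
    "act_fn": {"levels": ["ReLU"], "lower": 0, "upper": 0},
}

def decode_factor(hyper_dict, name, index):
    """Hypothetical helper: map an integer index to the corresponding factor level."""
    entry = hyper_dict[name]
    if not entry["lower"] <= index <= entry["upper"]:
        raise ValueError(f"{name}: index {index} outside [{entry['lower']}, {entry['upper']}]")
    return entry["levels"][index]

print(decode_factor(hyper_dict, "optimizer", 2))  # Adam
```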

24.7 Step 7: Data Splitting, the Objective (Loss) Function and the Metric

24.7.1 Evaluation

The evaluation procedure requires the specification of two elements:

how the data is split into a training set and a test set, and
the loss function (and a metric).

Caution: Data Splitting in Lightning

The data splitting is handled by Lightning.
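The run log in Step 8 reports train_size: 0.81 and val_size: 0.09, which follow from test_size = 0.1: 90% of the data is kept for training and validation, and that part is split 90:10 again. The sketch below reproduces the logged set sizes (359 training and 39 validation samples), assuming fractional lengths are handled the way torch.utils.data.random_split handles them (floor each share, then distribute the leftover samples one at a time, round-robin). It is not the LightDataModule code, and the test-set size of 44 is derived under that assumption rather than taken from the log.

```python
import math

def split_sizes(n, fractions):
    """Floor each fractional share, then distribute leftovers round-robin."""
    sizes = [math.floor(n * f) for f in fractions]
    for i in range(n - sum(sizes)):
        sizes[i % len(sizes)] += 1
    return sizes

test_size = 0.1
train_size = (1 - test_size) * 0.9   # 0.81, as in the run log
val_size = (1 - test_size) * 0.1     # 0.09, as in the run log

print(split_sizes(442, [train_size, val_size, test_size]))  # [359, 39, 44]
```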

24.7.2 Loss Function

The loss function is specified in the configurable network class [SOURCE]. We will use the mean squared error (MSE).
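The MSE averages the squared differences between predictions and targets, so the reported val_loss is expressed in squared units of the target. Taking the square root gives the RMSE on the original disease-progression scale: the best val_loss in the run of Step 8, 2821.34, corresponds to an RMSE of about 53.1. A plain-Python sketch; the targets are those of the first batch shown in Step 3, while the predictions are invented for illustration.

```python
import math

def mse(predictions, targets):
    """Mean squared error between two equally long sequences."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [151.0, 75.0, 141.0, 206.0, 135.0]       # first batch from Step 3
predictions = [150.0, 80.0, 140.0, 200.0, 135.0]   # invented for illustration
print(mse(predictions, targets))  # 12.6

# Best val_loss (an MSE) reported during tuning, converted to RMSE.
print(round(math.sqrt(2821.33837890625), 1))  # 53.1
```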

24.7.3 Metric

Similar to the loss function, the metric is specified in the configurable network class [SOURCE].

Caution: Loss Function and Metric in Lightning

The loss function and the metric are not hyperparameters that can be tuned with spotPython.
They are handled by Lightning.

24.8 Step 8: Calling the SPOT Function

24.8.1 Preparing the SPOT Call

from spotPython.utils.init import design_control_init, surrogate_control_init
design_control = design_control_init()
set_control_key_value(control_dict=design_control,
                        key="init_size",
                        value=INIT_SIZE,
                        replace=True)

surrogate_control = surrogate_control_init()
set_control_key_value(control_dict=surrogate_control,
                        key="noise",
                        value=True,
                        replace=True)
set_control_key_value(control_dict=surrogate_control,
                        key="n_theta",
                        value=2,
                        replace=True)
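The init_size = 5 points of the initial design are evaluated before any surrogate model is fitted; presumably the first five configurations in the run log of Step 8 form this design, since the progress line appears only from the sixth evaluation on. The following is a generic Latin hypercube sketch in plain Python, assuming a space-filling design of that kind; it is not spotPython's own design generator. Each input range is cut into n equal strata, and every stratum is hit exactly once per dimension.

```python
import random

def latin_hypercube(n, bounds, seed=123):
    """Sample n points so that each of the n equal-width strata of every
    dimension contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        width = (upper - lower) / n
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([lower + (s + rng.random()) * width for s in strata])
    return [list(point) for point in zip(*columns)]

# Bounds of l1, epochs and lr_mult from the design table, with init_size = 5.
# Int-typed variables (l1, epochs) would still need rounding; omitted here.
design = latin_hypercube(5, [(3, 8), (7, 9), (0.5, 5.0)])
for point in design:
    print([round(x, 3) for x in point])
```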

24.8.2 The Objective Function `fun`

The objective function fun from the class HyperLight [SOURCE] is selected next. It implements an interface from PyTorch’s training, validation, and testing methods to spotPython.

from spotPython.fun.hyperlight import HyperLight
fun = HyperLight(log_level=10).fun
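The tuner treats fun as a black box: it passes a matrix of candidate points, one row per configuration, and expects one objective value (here the validation MSE) per row. The toy function below mimics that contract with a cheap formula instead of training a network. The row-wise call fun(X, fun_control) is an assumption based on how spotPython objective functions are usually invoked, toy_fun is hypothetical, and plain lists replace the NumPy arrays spotPython works with to keep the sketch dependency-free.

```python
def toy_fun(X, fun_control=None):
    """Hypothetical stand-in for HyperLight.fun: one scalar loss per row of X."""
    results = []
    for row in X:
        l1_exponent, epochs_exponent = row[0], row[1]
        # Cheap placeholder for "train the network and return val_loss".
        results.append((l1_exponent - 6) ** 2 + (epochs_exponent - 8) ** 2)
    return results

X = [[4, 8], [7, 8], [6, 9]]   # three candidate configurations (exponents)
print(toy_fun(X))  # [4, 1, 1]
```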

24.8.3 Showing the fun_control Dictionary

import pprint
pprint.pprint(fun_control)

{'CHECKPOINT_PATH': 'runs/saved_models/',
 'DATASET_PATH': 'data/',
 'PREFIX': '032',
 'RESULTS_PATH': 'results/',
 'TENSORBOARD_PATH': 'runs/',
 '_L_in': 10,
 '_L_out': 1,
 '_torchmetric': 'mean_squared_error',
 'accelerator': 'auto',
 'converters': None,
 'core_model': <class 'spotPython.light.regression.rnnlightregression.RNNLightRegression'>,
 'core_model_hyper_dict': {'act_fn': {'class_name': 'spotPython.torch.activation',
                                      'core_model_parameter_type': 'instance()',
                                      'default': 'ReLU',
                                      'levels': ['ReLU'],
                                      'lower': 0,
                                      'transform': 'None',
                                      'type': 'factor',
                                      'upper': 0},
                           'batch_size': {'default': 4,
                                          'lower': 2,
                                          'transform': 'transform_power_2_int',
                                          'type': 'int',
                                          'upper': 6},
                           'dropout_prob': {'default': 0.01,
                                            'lower': 0.01,
                                            'transform': 'None',
                                            'type': 'float',
                                            'upper': 0.25},
                           'epochs': {'default': 4,
                                      'lower': 7,
                                      'transform': 'transform_power_2_int',
                                      'type': 'int',
                                      'upper': 9},
                           'initialization': {'core_model_parameter_type': 'str',
                                              'default': 'Default',
                                              'levels': ['Default'],
                                              'lower': 0,
                                              'transform': 'None',
                                              'type': 'factor',
                                              'upper': 0},
                           'l1': {'default': 3,
                                  'lower': 3,
                                  'transform': 'transform_power_2_int',
                                  'type': 'int',
                                  'upper': 8},
                           'lr_mult': {'default': 1.0,
                                       'lower': 0.5,
                                       'transform': 'None',
                                       'type': 'float',
                                       'upper': 5.0},
                           'optimizer': {'class_name': 'torch.optim',
                                         'core_model_parameter_type': 'str',
                                         'default': 'SGD',
                                         'levels': ['Adadelta',
                                                    'Adagrad',
                                                    'Adam',
                                                    'Adamax'],
                                         'lower': 0,
                                         'transform': 'None',
                                         'type': 'factor',
                                         'upper': 3},
                           'patience': {'default': 2,
                                        'lower': 3,
                                        'transform': 'transform_power_2_int',
                                        'type': 'int',
                                        'upper': 9}},
 'core_model_hyper_dict_default': {'act_fn': {'class_name': 'spotPython.torch.activation',
                                              'core_model_parameter_type': 'instance()',
                                              'default': 'ReLU',
                                              'levels': ['Tanh', 'ReLU'],
                                              'lower': 0,
                                              'transform': 'None',
                                              'type': 'factor',
                                              'upper': 1},
                                   'batch_size': {'default': 4,
                                                  'lower': 1,
                                                  'transform': 'transform_power_2_int',
                                                  'type': 'int',
                                                  'upper': 4},
                                   'dropout_prob': {'default': 0.01,
                                                    'lower': 0.0,
                                                    'transform': 'None',
                                                    'type': 'float',
                                                    'upper': 0.25},
                                   'epochs': {'default': 4,
                                              'lower': 4,
                                              'transform': 'transform_power_2_int',
                                              'type': 'int',
                                              'upper': 9},
                                   'initialization': {'core_model_parameter_type': 'str',
                                                      'default': 'Default',
                                                      'levels': ['Default',
                                                                 'Kaiming',
                                                                 'Xavier'],
                                                      'lower': 0,
                                                      'transform': 'None',
                                                      'type': 'factor',
                                                      'upper': 2},
                                   'l1': {'default': 3,
                                          'lower': 3,
                                          'transform': 'transform_power_2_int',
                                          'type': 'int',
                                          'upper': 8},
                                   'lr_mult': {'default': 1.0,
                                               'lower': 0.1,
                                               'transform': 'None',
                                               'type': 'float',
                                               'upper': 10.0},
                                   'optimizer': {'class_name': 'torch.optim',
                                                 'core_model_parameter_type': 'str',
                                                 'default': 'SGD',
                                                 'levels': ['Adadelta',
                                                            'Adagrad',
                                                            'Adam',
                                                            'AdamW',
                                                            'SparseAdam',
                                                            'Adamax',
                                                            'ASGD',
                                                            'NAdam',
                                                            'RAdam',
                                                            'RMSprop',
                                                            'Rprop',
                                                            'SGD'],
                                                 'lower': 0,
                                                 'transform': 'None',
                                                 'type': 'factor',
                                                 'upper': 11},
                                   'patience': {'default': 2,
                                                'lower': 2,
                                                'transform': 'transform_power_2_int',
                                                'type': 'int',
                                                'upper': 6}},
 'core_model_name': None,
 'counter': 0,
 'data': None,
 'data_dir': './data',
 'data_module': None,
 'data_set': <spotPython.data.diabetes.Diabetes object at 0x398de53d0>,
 'data_set_name': None,
 'db_dict_name': None,
 'design': None,
 'device': 'mps',
 'devices': 1,
 'enable_progress_bar': False,
 'eval': None,
 'fun_evals': inf,
 'fun_repeats': 1,
 'horizon': None,
 'infill_criterion': 'y',
 'k_folds': 3,
 'log_graph': False,
 'log_level': 10,
 'loss_function': None,
 'lower': array([3. , 4. , 1. , 0. , 0. , 0. , 0.1, 2. , 0. ]),
 'max_surrogate_points': 30,
 'max_time': 1,
 'metric_params': {},
 'metric_river': None,
 'metric_sklearn': None,
 'metric_sklearn_name': None,
 'metric_torch': None,
 'model_dict': {},
 'n_points': 1,
 'n_samples': None,
 'n_total': None,
 'noise': False,
 'num_workers': 0,
 'ocba_delta': 0,
 'oml_grace_period': None,
 'optimizer': None,
 'path': None,
 'prep_model': None,
 'prep_model_name': None,
 'progress_file': None,
 'save_model': False,
 'scenario': None,
 'seed': 123,
 'show_batch_interval': 1000000,
 'show_models': False,
 'show_progress': True,
 'shuffle': None,
 'sigma': 0.0,
 'spot_tensorboard_path': 'runs/spot_logs/032_maans14_2024-04-22_01-45-35',
 'spot_writer': <torch.utils.tensorboard.writer.SummaryWriter object at 0x398b86410>,
 'target_column': None,
 'target_type': None,
 'task': None,
 'test': None,
 'test_seed': 1234,
 'test_size': 0.1,
 'tolerance_x': 1.4901161193847656e-08,
 'train': None,
 'upper': array([ 8.  ,  9.  ,  4.  ,  1.  , 11.  ,  0.25, 10.  ,  6.  ,  2.  ]),
 'var_name': ['l1',
              'epochs',
              'batch_size',
              'act_fn',
              'optimizer',
              'dropout_prob',
              'lr_mult',
              'patience',
              'initialization'],
 'var_type': ['int',
              'int',
              'int',
              'factor',
              'factor',
              'float',
              'float',
              'int',
              'factor'],
 'verbosity': 1,
 'weight_coeff': 0.0,
 'weights': 1.0,
 'weights_entry': None}

pprint.pprint(design_control)

{'init_size': 5, 'repeats': 1}

pprint.pprint(surrogate_control)

{'log_level': 50,
 'max_Lambda': 1,
 'max_theta': 2.0,
 'metric_factorial': 'canberra',
 'min_Lambda': 1e-09,
 'min_theta': -3.0,
 'model_fun_evals': 10000,
 'model_optimizer': <function differential_evolution at 0x17619cd60>,
 'n_p': 1,
 'n_theta': 2,
 'noise': True,
 'optim_p': False,
 'p_val': 2.0,
 'seed': 124,
 'theta_init_zero': True,
 'var_type': None}

24.8.4 Starting the Hyperparameter Tuning

The spotPython hyperparameter tuning is started by creating a Spot object [SOURCE] and calling its run() method.

from spotPython.spot import spot
spot_tuner = spot.Spot(fun=fun,
                       fun_control=fun_control,
                       design_control=design_control,
                       surrogate_control=surrogate_control)
spot_tuner.run()


In fun(): config:
{'act_fn': ReLU(),
 'batch_size': 64,
 'dropout_prob': 0.19355651674791854,
 'epochs': 256,
 'initialization': 'Default',
 'l1': 16,
 'lr_mult': 1.5691149440098038,
 'optimizer': 'Adam',
 'patience': 32}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 5583.392578125, 'hp_metric': 5583.392578125}

In fun(): config:
{'act_fn': ReLU(),
 'batch_size': 16,
 'dropout_prob': 0.09424169914869776,
 'epochs': 256,
 'initialization': 'Default',
 'l1': 128,
 'lr_mult': 3.35818256351233,
 'optimizer': 'Adadelta',
 'patience': 512}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 3522.114501953125, 'hp_metric': 3522.114501953125}

In fun(): config:
{'act_fn': ReLU(),
 'batch_size': 4,
 'dropout_prob': 0.21164199382623602,
 'epochs': 512,
 'initialization': 'Default',
 'l1': 128,
 'lr_mult': 0.9336514668325573,
 'optimizer': 'Adamax',
 'patience': 16}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2821.33837890625, 'hp_metric': 2821.33837890625}

In fun(): config:
{'act_fn': ReLU(),
 'batch_size': 8,
 'dropout_prob': 0.05728504399550885,
 'epochs': 128,
 'initialization': 'Default',
 'l1': 64,
 'lr_mult': 4.575980093998586,
 'optimizer': 'Adam',
 'patience': 32}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 3466.693603515625, 'hp_metric': 3466.693603515625}

In fun(): config:
{'act_fn': ReLU(),
 'batch_size': 16,
 'dropout_prob': 0.14352914208400058,
 'epochs': 256,
 'initialization': 'Default',
 'l1': 8,
 'lr_mult': 2.4204853123355816,
 'optimizer': 'Adagrad',
 'patience': 128}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 7482.10986328125, 'hp_metric': 7482.10986328125}

In fun(): config:
{'act_fn': ReLU(),
 'batch_size': 32,
 'dropout_prob': 0.25,
 'epochs': 512,
 'initialization': 'Default',
 'l1': 128,
 'lr_mult': 0.5,
 'optimizer': 'Adamax',
 'patience': 8}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2854.076416015625, 'hp_metric': 2854.076416015625}
spotPython tuning: 2821.33837890625 [#########-] 87.57% 

In fun(): config:
{'act_fn': ReLU(),
 'batch_size': 4,
 'dropout_prob': 0.1882325888608727,
 'epochs': 512,
 'initialization': 'Default',
 'l1': 128,
 'lr_mult': 1.1984089487825935,
 'optimizer': 'Adamax',
 'patience': 32}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2890.33056640625, 'hp_metric': 2890.33056640625}
spotPython tuning: 2821.33837890625 [##########] 100.00% Done...
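The progress line reports the best objective value found so far. Collecting the val_loss values of the seven evaluations in the log above shows where it comes from: the third configuration (Adamax, l1 = 128, batch_size = 4) reached 2821.34, and the two later evaluations did not improve on it.

```python
# val_loss values in the order they appear in the tuning log above.
val_losses = [
    5583.392578125, 3522.114501953125, 2821.33837890625, 3466.693603515625,
    7482.10986328125, 2854.076416015625, 2890.33056640625,
]

best_so_far = []
current_best = float("inf")
for loss in val_losses:
    current_best = min(current_best, loss)
    best_so_far.append(current_best)

best_index = val_losses.index(min(val_losses))
print(best_index + 1, best_so_far[-1])  # 3 2821.33837890625
```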

{'CHECKPOINT_PATH': 'runs/saved_models/',
 'DATASET_PATH': 'data/',
 'PREFIX': '032',
 'RESULTS_PATH': 'results/',
 'TENSORBOARD_PATH': 'runs/',
 '_L_in': 10,
 '_L_out': 1,
 '_torchmetric': 'mean_squared_error',
 'accelerator': 'auto',
 'converters': None,
 'core_model': <class 'spotPython.light.regression.rnnlightregression.RNNLightRegression'>,
 'core_model_hyper_dict': {'act_fn': {'class_name': 'spotPython.torch.activation',
                                      'core_model_parameter_type': 'instance()',
                                      'default': 'ReLU',
                                      'levels': ['ReLU'],
                                      'lower': 0,
                                      'transform': 'None',
                                      'type': 'factor',
                                      'upper': 0},
                           'batch_size': {'default': 4,
                                          'lower': 2,
                                          'transform': 'transform_power_2_int',
                                          'type': 'int',
                                          'upper': 6},
                           'dropout_prob': {'default': 0.01,
                                            'lower': 0.01,
                                            'transform': 'None',
                                            'type': 'float',
                                            'upper': 0.25},
                           'epochs': {'default': 4,
                                      'lower': 7,
                                      'transform': 'transform_power_2_int',
                                      'type': 'int',
                                      'upper': 9},
                           'initialization': {'core_model_parameter_type': 'str',
                                              'default': 'Default',
                                              'levels': ['Default'],
                                              'lower': 0,
                                              'transform': 'None',
                                              'type': 'factor',
                                              'upper': 0},
                           'l1': {'default': 3,
                                  'lower': 3,
                                  'transform': 'transform_power_2_int',
                                  'type': 'int',
                                  'upper': 8},
                           'lr_mult': {'default': 1.0,
                                       'lower': 0.5,
                                       'transform': 'None',
                                       'type': 'float',
                                       'upper': 5.0},
                           'optimizer': {'class_name': 'torch.optim',
                                         'core_model_parameter_type': 'str',
                                         'default': 'SGD',
                                         'levels': ['Adadelta',
                                                    'Adagrad',
                                                    'Adam',
                                                    'Adamax'],
                                         'lower': 0,
                                         'transform': 'None',
                                         'type': 'factor',
                                         'upper': 3},
                           'patience': {'default': 2,
                                        'lower': 3,
                                        'transform': 'transform_power_2_int',
                                        'type': 'int',
                                        'upper': 9}},
 'core_model_hyper_dict_default': {'act_fn': {'class_name': 'spotPython.torch.activation',
                                              'core_model_parameter_type': 'instance()',
                                              'default': 'ReLU',
                                              'levels': ['Tanh', 'ReLU'],
                                              'lower': 0,
                                              'transform': 'None',
                                              'type': 'factor',
                                              'upper': 1},
                                   'batch_size': {'default': 4,
                                                  'lower': 1,
                                                  'transform': 'transform_power_2_int',
                                                  'type': 'int',
                                                  'upper': 4},
                                   'dropout_prob': {'default': 0.01,
                                                    'lower': 0.0,
                                                    'transform': 'None',
                                                    'type': 'float',
                                                    'upper': 0.25},
                                   'epochs': {'default': 4,
                                              'lower': 4,
                                              'transform': 'transform_power_2_int',
                                              'type': 'int',
                                              'upper': 9},
                                   'initialization': {'core_model_parameter_type': 'str',
                                                      'default': 'Default',
                                                      'levels': ['Default',
                                                                 'Kaiming',
                                                                 'Xavier'],
                                                      'lower': 0,
                                                      'transform': 'None',
                                                      'type': 'factor',
                                                      'upper': 2},
                                   'l1': {'default': 3,
                                          'lower': 3,
                                          'transform': 'transform_power_2_int',
                                          'type': 'int',
                                          'upper': 8},
                                   'lr_mult': {'default': 1.0,
                                               'lower': 0.1,
                                               'transform': 'None',
                                               'type': 'float',
                                               'upper': 10.0},
                                   'optimizer': {'class_name': 'torch.optim',
                                                 'core_model_parameter_type': 'str',
                                                 'default': 'SGD',
                                                 'levels': ['Adadelta',
                                                            'Adagrad',
                                                            'Adam',
                                                            'AdamW',
                                                            'SparseAdam',
                                                            'Adamax',
                                                            'ASGD',
                                                            'NAdam',
                                                            'RAdam',
                                                            'RMSprop',
                                                            'Rprop',
                                                            'SGD'],
                                                 'lower': 0,
                                                 'transform': 'None',
                                                 'type': 'factor',
                                                 'upper': 11},
                                   'patience': {'default': 2,
                                                'lower': 2,
                                                'transform': 'transform_power_2_int',
                                                'type': 'int',
                                                'upper': 6}},
 'core_model_name': None,
 'counter': 7,
 'data': None,
 'data_dir': './data',
 'data_module': None,
 'data_set': <spotPython.data.diabetes.Diabetes object at 0x398de53d0>,
 'data_set_name': None,
 'db_dict_name': None,
 'design': None,
 'device': 'mps',
 'devices': 1,
 'enable_progress_bar': False,
 'eval': None,
 'fun_evals': inf,
 'fun_repeats': 1,
 'horizon': None,
 'infill_criterion': 'y',
 'k_folds': 3,
 'log_graph': False,
 'log_level': 10,
 'loss_function': None,
 'lower': array([3. , 4. , 1. , 0. , 0. , 0. , 0.1, 2. , 0. ]),
 'max_surrogate_points': 30,
 'max_time': 1,
 'metric_params': {},
 'metric_river': None,
 'metric_sklearn': None,
 'metric_sklearn_name': None,
 'metric_torch': None,
 'model_dict': {},
 'n_points': 1,
 'n_samples': None,
 'n_total': None,
 'noise': False,
 'num_workers': 0,
 'ocba_delta': 0,
 'oml_grace_period': None,
 'optimizer': None,
 'path': None,
 'prep_model': None,
 'prep_model_name': None,
 'progress_file': None,
 'save_model': False,
 'scenario': None,
 'seed': 123,
 'show_batch_interval': 1000000,
 'show_models': False,
 'show_progress': True,
 'shuffle': None,
 'sigma': 0.0,
 'spot_tensorboard_path': 'runs/spot_logs/032_maans14_2024-04-22_01-45-35',
 'spot_writer': <torch.utils.tensorboard.writer.SummaryWriter object at 0x398b86410>,
 'target_column': None,
 'target_type': None,
 'task': None,
 'test': None,
 'test_seed': 1234,
 'test_size': 0.1,
 'tolerance_x': 1.4901161193847656e-08,
 'train': None,
 'upper': array([ 8.  ,  9.  ,  4.  ,  1.  , 11.  ,  0.25, 10.  ,  6.  ,  2.  ]),
 'var_name': ['l1',
              'epochs',
              'batch_size',
              'act_fn',
              'optimizer',
              'dropout_prob',
              'lr_mult',
              'patience',
              'initialization'],
 'var_type': ['int',
              'int',
              'int',
              'factor',
              'factor',
              'float',
              'float',
              'int',
              'factor'],
 'verbosity': 1,
 'weight_coeff': 0.0,
 'weights': 1.0,
 'weights_entry': None}

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │      5583.392578125       │
│         val_loss          │      5583.392578125       │
└───────────────────────────┴───────────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     3522.114501953125     │
│         val_loss          │     3522.114501953125     │
└───────────────────────────┴───────────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     2821.33837890625      │
│         val_loss          │     2821.33837890625      │
└───────────────────────────┴───────────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     3466.693603515625     │
│         val_loss          │     3466.693603515625     │
└───────────────────────────┴───────────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     7482.10986328125      │
│         val_loss          │     7482.10986328125      │
└───────────────────────────┴───────────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     2854.076416015625     │
│         val_loss          │     2854.076416015625     │
└───────────────────────────┴───────────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     2890.33056640625      │
│         val_loss          │     2890.33056640625      │
└───────────────────────────┴───────────────────────────┘

<spotPython.spot.spot.Spot at 0x3c8c13550>

24.9 Step 9: Tensorboard

The results logged during the tuning run (see spot_tensorboard_path in the fun_control dictionary above) can be visualized with TensorBoard. Start it from a shell with:

tensorboard --logdir="runs/"

Further information can be found in the PyTorch Lightning documentation on TensorBoard logging.
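The logged scalars can also be read back programmatically, for example to compare runs in a script instead of in the TensorBoard UI. The following is a minimal sketch that uses TensorBoard's EventAccumulator. The tag name "val_loss" and the demo values are hypothetical; the tags actually written below runs/ by spotPython may differ, so inspect acc.Tags() first.

```python
import tempfile

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator
from torch.utils.tensorboard import SummaryWriter


def read_scalars(logdir: str, tag: str):
    """Return (step, value) pairs logged under `tag` in an event-file directory."""
    acc = EventAccumulator(logdir)
    acc.Reload()  # parse all event files found in logdir
    return [(e.step, e.value) for e in acc.Scalars(tag)]


# Demo: write a few hypothetical loss values, then read them back.
with tempfile.TemporaryDirectory() as logdir:
    writer = SummaryWriter(log_dir=logdir)
    for step, loss in enumerate([4.0, 2.5, 2.0]):
        writer.add_scalar("val_loss", loss, step)
    writer.close()  # flush the event file to disk before reading it
    result = read_scalars(logdir, "val_loss")
    print(result)  # [(0, 4.0), (1, 2.5), (2, 2.0)]
```

For a real run, point read_scalars at the directory stored in fun_control["spot_tensorboard_path"].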

24.10 Step 10: Results

After the hyperparameter tuning run is finished, the results can be analyzed.

spot_tuner.plot_progress(log_y=False,
    filename="./figures/" + PREFIX + "_progress.png")

Progress plot. *Black* dots denote results from the initial design. *Red* dots illustrate the improvement found by the surrogate-model-based optimization.
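The key quantity behind a progress plot is the best objective value found so far. The sketch below computes this running minimum with NumPy and separates the initial design from the surrogate-based evaluations. The objective values are made up for illustration (they are not the results of this run), and the code is not spotPython's plotting implementation.

```python
import numpy as np


def best_so_far(y):
    """Running minimum of the objective values, in evaluation order."""
    return np.minimum.accumulate(np.asarray(y, dtype=float))


# Hypothetical validation losses: the first n_init come from the initial
# design, the remaining ones from surrogate-based infill points.
y = [5500.0, 3500.0, 2900.0, 3400.0, 7400.0, 2850.0, 2880.0]
n_init = 5
curve = best_so_far(y)
print(curve)
print("best value after initial design:", curve[n_init - 1])
print("improvement found by the surrogate:", curve[n_init - 1] - curve[-1])
```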

from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control=fun_control, spot=spot_tuner))

| name           | type   | default   |   lower |   upper | tuned               | transform             |   importance | stars   |
|----------------|--------|-----------|---------|---------|---------------------|-----------------------|--------------|---------|
| l1             | int    | 3         |     3.0 |     8.0 | 7.0                 | transform_power_2_int |       100.00 | ***     |
| epochs         | int    | 4         |     7.0 |     9.0 | 9.0                 | transform_power_2_int |         1.01 | *       |
| batch_size     | int    | 4         |     2.0 |     6.0 | 2.0                 | transform_power_2_int |         1.01 | *       |
| act_fn         | factor | ReLU      |     0.0 |     0.0 | ReLU                | None                  |         0.00 |         |
| optimizer      | factor | SGD       |     0.0 |     3.0 | Adamax              | None                  |         1.01 | *       |
| dropout_prob   | float  | 0.01      |    0.01 |    0.25 | 0.21164199382623602 | None                  |         1.01 | *       |
| lr_mult        | float  | 1.0       |     0.5 |     5.0 | 0.9336514668325573  | None                  |         2.97 | *       |
| patience       | int    | 2         |     3.0 |     9.0 | 4.0                 | transform_power_2_int |         1.01 | *       |
| initialization | factor | Default   |     0.0 |     0.0 | Default             | None                  |         0.00 |         |

spot_tuner.plot_importance(threshold=0.025,
    filename="./figures/" + PREFIX + "_importance.png")

Variable importance plot, threshold 0.025.

24.10.1 Get the Tuned Architecture

from spotPython.hyperparameters.values import get_tuned_architecture
config = get_tuned_architecture(spot_tuner, fun_control)
print(config)

{'l1': 128, 'epochs': 512, 'batch_size': 4, 'act_fn': ReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.21164199382623602, 'lr_mult': 0.9336514668325573, 'patience': 16, 'initialization': 'Default'}
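The design table reports the tuned values on the search scale, whereas the configuration above contains the values that are passed to the model. For the hyperparameters marked transform_power_2_int, the two are related by a power of two. The following sketch re-implements that mapping, assuming the transform computes int(2**x); this assumption is consistent with the table (e.g., l1 = 7.0) and the printed configuration (l1 = 128). The function is our own, not imported from spotPython.

```python
def transform_power_2_int(x: float) -> int:
    """Map a tuned exponent x to the integer 2**x (sketch of the transform)."""
    return int(2 ** x)


# Tuned values on the search scale, copied from the design table above.
tuned = {"l1": 7.0, "epochs": 9.0, "batch_size": 2.0, "patience": 4.0}
decoded = {name: transform_power_2_int(v) for name, v in tuned.items()}
print(decoded)  # {'l1': 128, 'epochs': 512, 'batch_size': 4, 'patience': 16}
```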

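Test on the full data set: the tuned architecture is trained on the training and validation splits and evaluated on the held-out test split (test_size = 0.1).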

from spotPython.light.testmodel import test_model
test_model(config, fun_control)

LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.TESTING
test_size: 0.1 used for test dataset.
LightDataModule.test_dataloader(). Test set size: 45
test_model result: {'val_loss': 3073.3818359375, 'hp_metric': 3073.3818359375}

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │      3073.3818359375      │
│         val_loss          │      3073.3818359375      │
└───────────────────────────┴───────────────────────────┘

(3073.3818359375, 3073.3818359375)
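Assuming the reported val_loss and hp_metric are the mean squared error (consistent with TORCH_METRIC = "mean_squared_error"), the value is easier to interpret after taking the square root, which expresses the typical error in the units of the disease-progression score. A minimal sketch in plain Python:

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Tiny hand-made example: errors of 10 and -20 give (100 + 400) / 2 = 250.
print(mse([150.0, 200.0], [140.0, 220.0]))  # 250.0

# Root mean squared error corresponding to the test loss reported above.
test_mse = 3073.3818359375
rmse = math.sqrt(test_mse)
print(round(rmse, 2))  # 55.44
```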

from spotPython.light.loadmodel import load_light_from_checkpoint

model_loaded = load_light_from_checkpoint(config, fun_control)

config: {'l1': 128, 'epochs': 512, 'batch_size': 4, 'act_fn': ReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.21164199382623602, 'lr_mult': 0.9336514668325573, 'patience': 16, 'initialization': 'Default'}
Loading model with 128_512_4_ReLU_Adamax_0.2116_0.9337_16_Default_TEST from runs/saved_models/128_512_4_ReLU_Adamax_0.2116_0.9337_16_Default_TEST/last.ckpt
Model: RNNLightRegression(
  (rnn_layer): RNN(10, 128, batch_first=True)
  (fc): Linear(in_features=128, out_features=128, bias=True)
  (output_layer): Linear(in_features=128, out_features=1, bias=True)
  (dropout1): Dropout(p=0.21164199382623602, inplace=False)
  (dropout2): Dropout(p=0.0, inplace=False)
  (dropout3): Dropout(p=0.0, inplace=False)
  (activation_fct): ReLU()
)
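The checkpoint directory printed above (128_512_4_ReLU_Adamax_0.2116_0.9337_16_Default_TEST) appears to be built by joining the configuration values with underscores. The sketch below reproduces that naming pattern so that the checkpoint belonging to a given configuration can be located. format_value and make_model_id are hypothetical helpers inferred from the printed name, not spotPython functions, and the local ReLU class only stands in for the activation instance in config.

```python
def format_value(v):
    """Render one hyperparameter value as it appears in the checkpoint name."""
    if isinstance(v, float):
        return str(round(v, 4))  # floats are shortened to four decimals
    if isinstance(v, (int, str)):
        return str(v)
    return type(v).__name__  # objects such as ReLU() contribute their class name


def make_model_id(config: dict, suffix: str = "TEST") -> str:
    """Hypothetical helper: join all config values with underscores."""
    return "_".join(format_value(v) for v in config.values()) + "_" + suffix


class ReLU:  # stand-in for the activation object stored in config
    pass


config = {"l1": 128, "epochs": 512, "batch_size": 4, "act_fn": ReLU(),
          "optimizer": "Adamax", "dropout_prob": 0.21164199382623602,
          "lr_mult": 0.9336514668325573, "patience": 16,
          "initialization": "Default"}
model_id = make_model_id(config)
print(model_id)  # 128_512_4_ReLU_Adamax_0.2116_0.9337_16_Default_TEST
```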

filename = "./figures/" + PREFIX
spot_tuner.plot_important_hyperparameter_contour(filename=filename)

l1:  100.0
epochs:  1.0091142515853175
batch_size:  1.0091142515853175
optimizer:  1.0091142515853175
dropout_prob:  1.0091142515853175
lr_mult:  2.966544621270671
patience:  1.0091142515853175
impo: [['l1', 100.0], ['epochs', 1.0091142515853175], ['batch_size', 1.0091142515853175], ['optimizer', 1.0091142515853175], ['dropout_prob', 1.0091142515853175], ['lr_mult', 2.966544621270671], ['patience', 1.0091142515853175]]
indices: [0, 5, 1, 2, 3, 4, 6]
indices after max_imp selection: [0, 5, 1, 2, 3, 4, 6]
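The printed indices are consistent with sorting the hyperparameters by importance in descending order, with ties kept in their original order. The following sketch reproduces that ordering from the impo list above; it shows one way to obtain it and is not necessarily how spotPython implements the selection.

```python
impo = [["l1", 100.0], ["epochs", 1.0091142515853175],
        ["batch_size", 1.0091142515853175], ["optimizer", 1.0091142515853175],
        ["dropout_prob", 1.0091142515853175], ["lr_mult", 2.966544621270671],
        ["patience", 1.0091142515853175]]


def order_by_importance(impo):
    """Indices sorted by importance, highest first.

    sorted() is stable even with reverse=True, so entries with equal
    importance keep their original relative order.
    """
    return sorted(range(len(impo)), key=lambda i: impo[i][1], reverse=True)


indices = order_by_importance(impo)
print(indices)                          # [0, 5, 1, 2, 3, 4, 6]
print([impo[i][0] for i in indices[:2]])  # ['l1', 'lr_mult']
```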

24.10.2 Parallel Coordinates Plot

spot_tuner.parallel_plot()

Parallel coordinates plot.
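In a parallel coordinates plot, hyperparameters with very different ranges share one figure, so each axis is commonly rescaled to [0, 1]. The sketch below shows such min-max scaling for a few tuned values, with bounds taken from the design table. It illustrates the general technique; we do not claim that parallel_plot() scales its axes exactly this way.

```python
def scale_to_unit(value, lower, upper):
    """Min-max scale a value to [0, 1]; a fixed range (lower == upper) maps to 0."""
    if upper == lower:
        return 0.0
    return (value - lower) / (upper - lower)


# (tuned value, lower, upper) on the search scale, from the design table above.
rows = {
    "l1": (7.0, 3.0, 8.0),
    "epochs": (9.0, 7.0, 9.0),
    "batch_size": (2.0, 2.0, 6.0),
    "lr_mult": (0.9336514668325573, 0.5, 5.0),
}
for name, (value, lower, upper) in rows.items():
    print(f"{name:>10}: {scale_to_unit(value, lower, upper):.3f}")
```

Tuned values close to 0 or 1 sit at the edge of their search interval (here epochs and batch_size), which can be a hint to widen the bounds in a follow-up experiment.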

24.10.3 Cross Validation With Lightning

The KFold class from sklearn.model_selection is used to generate the folds for cross-validation.
This mechanism is used to generate the folds for the final evaluation of the model.
The CrossValidationDataModule class [SOURCE] generates the folds for the hyperparameter tuning process.
It is called from the cv_model function [SOURCE].
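The fold sizes reported below follow directly from how KFold partitions the data: with k = 2, the 442 samples of the Diabetes data set are split into two disjoint halves of 221, and each half serves once as the validation set. The following sketch shows this with scikit-learn on a stand-in index array. The shuffle and random_state settings are our choices and are not necessarily the ones used by CrossValidationDataModule.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(442).reshape(-1, 1)  # stand-in for the 442 Diabetes samples
kf = KFold(n_splits=2, shuffle=True, random_state=123)

sizes = []
for k, (train_idx, val_idx) in enumerate(kf.split(X)):
    # Training and validation indices of one fold never overlap.
    assert not set(train_idx) & set(val_idx)
    sizes.append((len(train_idx), len(val_idx)))
    print(f"k: {k}  train: {len(train_idx)}  val: {len(val_idx)}")
```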

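from spotPython.light.cvmodel import cv_model
from spotPython.hyperparameters.values import set_control_key_value
# Use two folds for this demonstration run.
set_control_key_value(control_dict=fun_control,
                      key="k_folds",
                      value=2,
                      replace=True)
set_control_key_value(control_dict=fun_control,
                      key="test_size",
                      value=0.1,
                      replace=True)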
cv_model(config, fun_control)

k: 0
Train Dataset Size: 221
Val Dataset Size: 221
train_model result: {'val_loss': 3406.28759765625, 'hp_metric': 3406.28759765625}
k: 1
Train Dataset Size: 221
Val Dataset Size: 221
train_model result: {'val_loss': 3142.47998046875, 'hp_metric': 3142.47998046875}

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     3406.28759765625      │
│         val_loss          │     3406.28759765625      │
└───────────────────────────┴───────────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     3142.47998046875      │
│         val_loss          │     3142.47998046875      │
└───────────────────────────┴───────────────────────────┘

3274.3837890625
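The value returned by cv_model is consistent with the arithmetic mean of the two fold losses printed above. A short sketch of this summary step with Python's statistics module; adding the standard deviation is our suggestion for judging how much the folds disagree, not something cv_model reports.

```python
from statistics import mean, stdev

fold_losses = [3406.28759765625, 3142.47998046875]  # k = 0 and k = 1 above
cv_loss = mean(fold_losses)
print(cv_loss)  # 3274.3837890625
print(f"spread across folds (std. dev.): {stdev(fold_losses):.1f}")
```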

24.10.4 Plot all Combinations of Hyperparameters

Warning: this may take a while.

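PLOT_ALL = False
# Color-scale limits for the contour plots; None is assumed to let
# plot_contour derive them from the data.
min_z = None
max_z = None
if PLOT_ALL:
    n = spot_tuner.k
    for i in range(n-1):
        for j in range(i+1, n):
            spot_tuner.plot_contour(i=i, j=j, min_z=min_z, max_z=max_z)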
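The nested loop draws one contour plot for every pair of hyperparameters, so the number of plots grows as n(n-1)/2. The sketch below counts the pairs with itertools.combinations, assuming spot_tuner.k equals the nine hyperparameters listed in var_name. This quadratic growth is the reason for the warning above.

```python
from itertools import combinations


def contour_pairs(n):
    """All index pairs (i, j) with i < j, in the same order as the loop above."""
    return list(combinations(range(n), 2))


pairs = contour_pairs(9)  # nine tuned hyperparameters (see var_name)
print(len(pairs))  # 36
```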

24.10.5 Visualizing the Activation Distribution (Under Development)

Reference:

The following code is based on [PyTorch Lightning TUTORIAL 2: ACTIVATION FUNCTIONS], Author: Phillip Lippe, License: [CC BY-SA], Generated: 2023-03-15T09:52:39.179933.

After training, we can look at the actual activation values inside the model. For instance, how many neurons does ReLU set to zero? Where do most Tanh values lie? To answer these questions, we can write a simple function that takes a trained model, applies it to a batch of input data, and plots histograms of the activations inside the network:

from spotPython.torch.activation import Sigmoid, Tanh, ReLU, LeakyReLU, ELU, Swish
act_fn_by_name = {"sigmoid": Sigmoid, "tanh": Tanh, "relu": ReLU, "leakyrelu": LeakyReLU, "elu": ELU, "swish": Swish}

from spotPython.hyperparameters.values import get_one_config_from_X
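# Best point found by the tuner, expanded to all dimensions.
X = spot_tuner.to_all_dim(spot_tuner.min_X.reshape(1,-1))
config = get_one_config_from_X(X, fun_control)
# Note: _L_in=64 and _L_out=11 do not match the Diabetes data, which have
# 10 input features and 1 target (cf. the loaded model above: RNN(10, 128)).
model = fun_control["core_model"](**config, _L_in=64, _L_out=11, _torchmetric=TORCH_METRIC)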
model

RNNLightRegression(
  (rnn_layer): RNN(64, 128, batch_first=True)
  (fc): Linear(in_features=128, out_features=128, bias=True)
  (output_layer): Linear(in_features=128, out_features=11, bias=True)
  (dropout1): Dropout(p=0.21164199382623602, inplace=False)
  (dropout2): Dropout(p=0.0, inplace=False)
  (dropout3): Dropout(p=0.0, inplace=False)
  (activation_fct): ReLU()
)

# from spotPython.utils.eda import visualize_activations
# visualize_activations(model, color=f"C{0}")
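Because visualize_activations is still under development, the following generic PyTorch sketch shows the underlying technique: forward hooks record the output of each activation module while a batch passes through the network, and the collected values can then be summarized, for example as the fraction of exact zeros after ReLU or as a histogram. The small nn.Sequential model and the random batch are stand-ins; the isinstance filter matches torch.nn.ReLU and torch.nn.Tanh, and we assume it would need adapting for the activation classes from spotPython.torch.activation.

```python
import torch
import torch.nn as nn


def collect_activations(model: nn.Module, x: torch.Tensor) -> dict:
    """Run x through model and return the flattened output of each ReLU/Tanh module."""
    acts = {}
    hooks = []
    for name, module in model.named_modules():
        if isinstance(module, (nn.ReLU, nn.Tanh)):
            # The default argument binds the current module name to this hook.
            def hook(mod, inp, out, name=name):
                acts[name] = out.detach().flatten()
            hooks.append(module.register_forward_hook(hook))
    model.eval()
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()  # leave the model unchanged after inspection
    return acts


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                      nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
x = torch.randn(64, 10)  # stand-in batch with 10 features
acts = collect_activations(model, x)
for name, a in acts.items():
    zero_frac = (a == 0).float().mean().item()
    counts = torch.histc(a, bins=10)  # histogram over the observed value range
    print(f"module {name}: {a.numel()} values, zeros: {zero_frac:.2f}, "
          f"histogram: {counts.int().tolist()}")
```

Passing the counts (or the raw values) to a plotting library such as matplotlib yields the histograms described above.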