23  HPT PyTorch Lightning: Diabetes

In this tutorial, we will show how spotPython can be integrated into the PyTorch Lightning training workflow for a regression task.

This chapter describes the hyperparameter tuning of a PyTorch Lightning network on the Diabetes data set, a PyTorch Dataset for regression based on a toy data set from scikit-learn. Ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline.

23.1 Step 1: Setup

  • Before we consider the detailed experimental setup, we select the parameters that affect run time, initial design size, etc.
  • The parameter MAX_TIME specifies the maximum run time in minutes.
  • The parameter INIT_SIZE specifies the initial design size.
  • The parameter WORKERS specifies the number of workers.
  • The prefix PREFIX is used for the experiment name and the name of the log file.
  • The parameter DEVICE specifies the device to use for training.
from spotPython.utils.device import getDevice
from math import inf

MAX_TIME = 1
FUN_EVALS = inf
INIT_SIZE = 5
WORKERS = 0
PREFIX = "031"
DEVICE = getDevice()
DEVICES = 1
TEST_SIZE = 0.1
TORCH_METRIC = "mean_squared_error"
Caution: Run time and initial design size should be increased for real experiments
  • MAX_TIME is set to one minute for demonstration purposes. For real experiments, this should be increased to at least 1 hour.
  • INIT_SIZE is set to 5 for demonstration purposes. For real experiments, this should be increased to at least 10.
  • WORKERS is set to 0 for demonstration purposes. For real experiments, this should be increased. See the warnings that are printed when the number of workers is set to 0.
Note: Device selection
  • Although no .cuda() or .to(device) calls are required, because Lightning handles these for you (see LIGHTNINGMODULE), we would like to know which device is used. Therefore, we imitate the LightningModule behaviour, which selects the highest-priority available device.
  • The method spotPython.utils.device.getDevice() returns the device that is used by Lightning.
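
For orientation, a device query of this kind can be sketched with plain torch availability checks (a minimal sketch; the actual getDevice() implementation in spotPython may differ):

import torch

def get_device_sketch() -> str:
    # Prefer CUDA, then Apple MPS, then fall back to CPU; roughly the
    # order in which Lightning's "auto" accelerator resolves devices.
    if torch.cuda.is_available():
        return "cuda:0"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"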

23.2 Step 2: Initialization of the fun_control Dictionary

spotPython uses a Python dictionary for storing the information required for the hyperparameter tuning process.

from spotPython.utils.init import fun_control_init
import numpy as np
fun_control = fun_control_init(
    _L_in=10,
    _L_out=1,
    _torchmetric=TORCH_METRIC,
    PREFIX=PREFIX,
    TENSORBOARD_CLEAN=True,
    device=DEVICE,
    enable_progress_bar=False,
    fun_evals=FUN_EVALS,
    log_level=10,
    max_time=MAX_TIME,
    num_workers=WORKERS,
    show_progress=True,
    test_size=TEST_SIZE,
    tolerance_x=np.sqrt(np.spacing(1)),
    )
Moving TENSORBOARD_PATH: runs/ to TENSORBOARD_PATH_OLD: runs_OLD/runs_2024_04_22_01_34_37
Created spot_tensorboard_path: runs/spot_logs/031_maans14_2024-04-22_01-34-37 for SummaryWriter()

23.3 Step 3: Loading the Diabetes Data Set

23.3.1 Data Exploration of the sklearn Diabetes Data Set

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
X, y = load_diabetes(return_X_y=True, as_frame=False)
feature_names = ["age", "sex", "bmi", "bp", "s1_tc", "s2_ldl", "s3_hdl", "s4_tch", "s5_ltg", "s6_glu"]

Note: * Each of these 10 feature variables has been mean centered and scaled by the standard deviation times the square root of n_samples (i.e., the sum of squares of each column totals 1).
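
This scaling can be verified directly on the loaded data (a quick check, reusing X from above):

# Columns are mean-centered and each column's sum of squares is (close to) 1.
print(np.abs(X.mean(axis=0)).max())
print(np.sum(X**2, axis=0))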

fig, axs = plt.subplots(nrows = 5, ncols=2, figsize=(30, 20))
for i, (ax, col) in enumerate(zip(axs.flat, feature_names)):
    x = X[:,i]
    pf = np.polyfit(x, y, 1)
    p = np.poly1d(pf)

    ax.plot(x, y, 'o')
    ax.plot(x, p(x),"r--")

    ax.set_title(col + ' vs disease progression')
    ax.set_xlabel(col)
    ax.set_ylabel('disease progression')

  • HDL (high-density lipoprotein) cholesterol, sometimes called “good” cholesterol, absorbs cholesterol in the blood and carries it back to the liver.
  • The liver then flushes it from the body.
  • High levels of HDL cholesterol can lower your risk for heart disease and stroke.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=0)
from sklearn import linear_model

lin_regr = linear_model.LinearRegression()
lin_regr.fit(X_train, y_train)

# determine the mse of the model
from sklearn.metrics import mean_squared_error
y_pred = lin_regr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error: {mse}")
Mean squared error: 3111.965104291441
print(lin_regr.coef_)
[ -38.78231462 -236.13752074  529.51199073  329.38300284 -653.70276656
  384.03663363   49.11443103  139.13184584  718.01936255   75.34401057]
# plot the coefficients of the model
fig, ax = plt.subplots()
ax.bar(feature_names, lin_regr.coef_)
ax.set_title("Coefficients of the linear regression model")
ax.set_ylabel("Coefficient")
ax.set_xlabel("Feature")
plt.show()

  • Coefficients are indeed well suited to tell us what happens when we change the value of an input feature, but they are not a good means in themselves to measure the general importance of a feature.
  • This is because the value of each coefficient depends on the scale of the input features.
  • For example, if we were to measure the age of a person in minutes instead of years, the coefficient for the feature “age” would shrink to -38.78231462 / 525600 ≈ -0.0000738.
  • Obviously, age expressed in minutes is no less informative than age expressed in years; only the scale of the coefficient changes.
  • This means that the size of a coefficient is not necessarily a good measure of the importance of a feature in a linear model.
print(f"{-38.78231462 / 525600:.4e}")
-7.3787e-05
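
To make this concrete, rescaling the age column and refitting shows the coefficient shrinking by exactly the scaling factor while the fitted model is unchanged (a small illustration, reusing X_train, y_train, and linear_model from above):

X_train_minutes = X_train.copy()
X_train_minutes[:, 0] *= 525600  # age column now in "minute" units
lin_regr_minutes = linear_model.LinearRegression()
lin_regr_minutes.fit(X_train_minutes, y_train)
print(lin_regr_minutes.coef_[0])  # approx. -38.78 / 525600 ≈ -7.38e-05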

23.3.2 Mutual Information

# determine the mutual information of the model
from sklearn.feature_selection import mutual_info_regression
mi = mutual_info_regression(X_train, y_train)
print(f"Mutual information: {mi}")
# generate a bar plot of the mutual information
plt.bar(feature_names, mi)
plt.ylabel('Mutual information')
plt.xlabel('Feature')
plt.title('Mutual information of features')
plt.show()
Mutual information: [0.02773406 0.03559276 0.17420237 0.04975074 0.09714525 0.
 0.08745141 0.10939959 0.12955598 0.14555097]
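
The ranking implied by these values is easier to read after sorting (reusing mi and feature_names from above):

order = np.argsort(mi)[::-1]
for i in order:
    print(f"{feature_names[i]:>8s}: {mi[i]:.3f}")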

23.3.3 SHAP

  • SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
import shap
# rather than use the whole training set to estimate expected values, we summarize with
# a set of weighted kmeans, each weighted by the number of points they represent.
X_train_summary = shap.kmeans(X_train, 10)
ex = shap.KernelExplainer(lin_regr.predict, X_train_summary)
shap_values = ex.shap_values(X_test)
shap.summary_plot(shap_values, X_test)

23.4 The PyTorch Data Set

from spotPython.hyperparameters.values import set_control_key_value
from spotPython.data.diabetes import Diabetes
dataset = Diabetes()
set_control_key_value(control_dict=fun_control,
                        key="data_set",
                        value=dataset,
                        replace=True)
print(len(dataset))
print(dataset.names)
442
['age', 'sex', 'bmi', 'bp', 's1_tc', 's2_ldl', 's3_hdl', 's4_tch', 's5_ltg', 's6_glu']
Note: Data Set and Data Loader
  • As shown below, a DataLoader from torch.utils.data can be used to check the data.
# Set batch size for DataLoader
batch_size = 5
# Create DataLoader
from torch.utils.data import DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

# Iterate over the data in the DataLoader
for batch in dataloader:
    inputs, targets = batch
    print(f"Batch Size: {inputs.size(0)}")
    print(f"Inputs Shape: {inputs.shape}")
    print(f"Targets Shape: {targets.shape}")
    print("---------------")
    print(f"Inputs: {inputs}")
    print(f"Targets: {targets}")
    break
Batch Size: 5
Inputs Shape: torch.Size([5, 10])
Targets Shape: torch.Size([5])
---------------
Inputs: tensor([[ 0.0381,  0.0507,  0.0617,  0.0219, -0.0442, -0.0348, -0.0434, -0.0026,
          0.0199, -0.0176],
        [-0.0019, -0.0446, -0.0515, -0.0263, -0.0084, -0.0192,  0.0744, -0.0395,
         -0.0683, -0.0922],
        [ 0.0853,  0.0507,  0.0445, -0.0057, -0.0456, -0.0342, -0.0324, -0.0026,
          0.0029, -0.0259],
        [-0.0891, -0.0446, -0.0116, -0.0367,  0.0122,  0.0250, -0.0360,  0.0343,
          0.0227, -0.0094],
        [ 0.0054, -0.0446, -0.0364,  0.0219,  0.0039,  0.0156,  0.0081, -0.0026,
         -0.0320, -0.0466]])
Targets: tensor([151.,  75., 141., 206., 135.])

23.5 Step 4: Preprocessing

Preprocessing is handled by Lightning and PyTorch. It is described in the LIGHTNINGDATAMODULE documentation, where you can also find information about the transforms methods.
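
If custom preprocessing is needed, it can be expressed as an ordinary PyTorch Dataset wrapper before the data reaches Lightning. A minimal, hypothetical sketch (the names ScaledDataset and scale are ours, not part of spotPython):

from torch.utils.data import Dataset

class ScaledDataset(Dataset):
    # Wraps a dataset and applies a simple feature transform on access.
    def __init__(self, dataset, scale=1.0):
        self.dataset = dataset
        self.scale = scale

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        x, y = self.dataset[idx]
        return x * self.scale, y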

23.6 Step 5: Select the Core Model (algorithm) and core_model_hyper_dict

spotPython includes the NetLightRegression class [SOURCE] for configurable neural networks. The class is imported here. It inherits from Lightning.LightningModule, the base class for all models in Lightning, which is itself a subclass of torch.nn.Module and adds functionality for training and testing neural networks. Lightning.LightningModule is described in the Lightning documentation.
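
To illustrate the pattern that NetLightRegression follows, a minimal LightningModule for regression might look as follows (a sketch for orientation, assuming the lightning 2.x import style; not the actual spotPython implementation):

import lightning as L
import torch
from torch import nn

class TinyRegressor(L.LightningModule):
    def __init__(self, l_in=10, l_out=1, lr=1e-3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(l_in, 32), nn.ReLU(), nn.Linear(32, l_out))
        self.lr = lr

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.net(x).squeeze(-1), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)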

  • Here we simply add the NN Model to the fun_control dictionary by calling the function add_core_model_to_fun_control:
from spotPython.light.regression.netlightregression import NetLightRegression
from spotPython.hyperdict.light_hyper_dict import LightHyperDict
from spotPython.hyperparameters.values import add_core_model_to_fun_control
add_core_model_to_fun_control(fun_control=fun_control,
                              core_model=NetLightRegression,
                              hyper_dict=LightHyperDict)

The hyperparameters of the model are specified in the core_model_hyper_dict dictionary [SOURCE].

23.7 Step 6: Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model

spotPython provides functions for modifying the hyperparameters, their bounds, and their factor levels, as well as for activating and de-activating hyperparameters, without recompiling the Python source code.

Caution: Small number of epochs for demonstration purposes
  • epochs and patience are set to small values for demonstration purposes. These values are too small for a real application.
  • More reasonable values are, e.g.:
    • set_control_hyperparameter_value(fun_control, "epochs", [7, 9]) and
    • set_control_hyperparameter_value(fun_control, "patience", [2, 7])
from spotPython.hyperparameters.values import set_control_hyperparameter_value

set_control_hyperparameter_value(fun_control, "l1", [4, 6])
set_control_hyperparameter_value(fun_control, "epochs", [9, 10])
set_control_hyperparameter_value(fun_control, "batch_size", [4, 5])
set_control_hyperparameter_value(fun_control, "optimizer", [
                "Adadelta",
                "Adagrad",
                "Adam",
                "AdamW",
                "Adamax",
                "NAdam",
                "RAdam",
                "RMSprop",
                "Rprop"
            ])
set_control_hyperparameter_value(fun_control, "dropout_prob", [0.01, 0.1])
set_control_hyperparameter_value(fun_control, "lr_mult", [0.5, 5.0])
set_control_hyperparameter_value(fun_control, "patience", [5, 7])
set_control_hyperparameter_value(fun_control, "act_fn",[
                "Sigmoid",
                "ReLU",
                "LeakyReLU",
                "Swish"
            ] )
Setting hyperparameter l1 to value [4, 6].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter epochs to value [9, 10].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter batch_size to value [4, 5].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter optimizer to value ['Adadelta', 'Adagrad', 'Adam', 'AdamW', 'Adamax', 'NAdam', 'RAdam', 'RMSprop', 'Rprop'].
Variable type is factor.
Core type is str.
Calling modify_hyper_parameter_levels().
Setting hyperparameter dropout_prob to value [0.01, 0.1].
Variable type is float.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter lr_mult to value [0.5, 5.0].
Variable type is float.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter patience to value [5, 7].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter act_fn to value ['Sigmoid', 'ReLU', 'LeakyReLU', 'Swish'].
Variable type is factor.
Core type is instance().
Calling modify_hyper_parameter_levels().

Now, the dictionary fun_control contains all information needed for the hyperparameter tuning. Before the hyperparameter tuning is started, it is recommended to take a look at the experimental design. The method gen_design_table [SOURCE] generates a design table as follows:

from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))
| name           | type   | default   |   lower |   upper | transform             |
|----------------|--------|-----------|---------|---------|-----------------------|
| l1             | int    | 3         |    4    |     6   | transform_power_2_int |
| epochs         | int    | 4         |    9    |    10   | transform_power_2_int |
| batch_size     | int    | 4         |    4    |     5   | transform_power_2_int |
| act_fn         | factor | ReLU      |    0    |     3   | None                  |
| optimizer      | factor | SGD       |    0    |     8   | None                  |
| dropout_prob   | float  | 0.01      |    0.01 |     0.1 | None                  |
| lr_mult        | float  | 1.0       |    0.5  |     5   | None                  |
| patience       | int    | 2         |    5    |     7   | transform_power_2_int |
| initialization | factor | Default   |    0    |     2   | None                  |

This allows us to check whether all information is available and correct.

Note: Hyperparameters of the Tuned Model and the fun_control Dictionary

The updated fun_control dictionary can be shown with the command fun_control["core_model_hyper_dict"].
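
For example:

import pprint
pprint.pprint(fun_control["core_model_hyper_dict"])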

23.8 Step 7: Data Splitting, the Objective (Loss) Function and the Metric

23.8.1 Evaluation

The evaluation procedure requires the specification of two elements:

  1. how the data is split into a train and a test set, and
  2. the loss function (and a metric).
Caution: Data Splitting in Lightning

The data splitting is handled by Lightning.

23.8.2 Loss Function

The loss function is specified in the configurable network class [SOURCE]. We will use the mean squared error (MSE).
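
For reference, the MSE loss in plain PyTorch terms:

import torch
loss_fn = torch.nn.MSELoss()
# ((2 - 1)^2 + (4 - 2)^2) / 2 = 2.5
print(loss_fn(torch.tensor([2.0, 4.0]), torch.tensor([1.0, 2.0])))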

23.8.3 Metric

  • Similar to the loss function, the metric is specified in the configurable network class [SOURCE].
Caution: Loss Function and Metric in Lightning
  • The loss function and the metric are not hyperparameters that can be tuned with spotPython.
  • They are handled by Lightning.

23.9 Step 8: Calling the SPOT Function

23.9.1 Preparing the SPOT Call

from spotPython.utils.init import design_control_init, surrogate_control_init
design_control = design_control_init(init_size=INIT_SIZE)

surrogate_control = surrogate_control_init(noise=True,
                                            n_theta=2)
Note: Modifying Values in the Control Dictionaries
  • The values in the control dictionaries can be modified with the function set_control_key_value [SOURCE], for example:
set_control_key_value(control_dict=surrogate_control,
                        key="noise",
                        value=True,
                        replace=True)
set_control_key_value(control_dict=surrogate_control,
                        key="n_theta",
                        value=2,
                        replace=True)

23.9.2 The Objective Function fun

The objective function fun from the class HyperLight [SOURCE] is selected next. It provides the interface between PyTorch’s training, validation, and testing methods and spotPython.

from spotPython.fun.hyperlight import HyperLight
fun = HyperLight(log_level=50).fun

23.9.3 Starting the Hyperparameter Tuning

The spotPython hyperparameter tuning is started by calling the Spot function [SOURCE].

from spotPython.spot import spot
spot_tuner = spot.Spot(fun=fun,
                       fun_control=fun_control,
                       design_control=design_control,
                       surrogate_control=surrogate_control)
spot_tuner.run()
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2733.956298828125, 'hp_metric': 2733.956298828125}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2853.91357421875, 'hp_metric': 2853.91357421875}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 3266.3662109375, 'hp_metric': 3266.3662109375}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 5135.49755859375, 'hp_metric': 5135.49755859375}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': nan, 'hp_metric': nan}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2650.9560546875, 'hp_metric': 2650.9560546875}
spotPython tuning: 2650.9560546875 [###-------] 31.31% 
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2707.06982421875, 'hp_metric': 2707.06982421875}
spotPython tuning: 2650.9560546875 [##########] 100.00% Done...

{'CHECKPOINT_PATH': 'runs/saved_models/',
 'DATASET_PATH': 'data/',
 'PREFIX': '031',
 'RESULTS_PATH': 'results/',
 'TENSORBOARD_PATH': 'runs/',
 '_L_in': 10,
 '_L_out': 1,
 '_torchmetric': 'mean_squared_error',
 'accelerator': 'auto',
 'converters': None,
 'core_model': <class 'spotPython.light.regression.netlightregression.NetLightRegression'>,
 'core_model_hyper_dict': {'act_fn': {'class_name': 'spotPython.torch.activation',
                                      'core_model_parameter_type': 'instance()',
                                      'default': 'ReLU',
                                      'levels': ['Sigmoid',
                                                 'ReLU',
                                                 'LeakyReLU',
                                                 'Swish'],
                                      'lower': 0,
                                      'transform': 'None',
                                      'type': 'factor',
                                      'upper': 3},
                           'batch_size': {'default': 4,
                                          'lower': 4,
                                          'transform': 'transform_power_2_int',
                                          'type': 'int',
                                          'upper': 5},
                           'dropout_prob': {'default': 0.01,
                                            'lower': 0.01,
                                            'transform': 'None',
                                            'type': 'float',
                                            'upper': 0.1},
                           'epochs': {'default': 4,
                                      'lower': 9,
                                      'transform': 'transform_power_2_int',
                                      'type': 'int',
                                      'upper': 10},
                           'initialization': {'core_model_parameter_type': 'str',
                                              'default': 'Default',
                                              'levels': ['Default',
                                                         'Kaiming',
                                                         'Xavier'],
                                              'lower': 0,
                                              'transform': 'None',
                                              'type': 'factor',
                                              'upper': 2},
                           'l1': {'default': 3,
                                  'lower': 4,
                                  'transform': 'transform_power_2_int',
                                  'type': 'int',
                                  'upper': 6},
                           'lr_mult': {'default': 1.0,
                                       'lower': 0.5,
                                       'transform': 'None',
                                       'type': 'float',
                                       'upper': 5.0},
                           'optimizer': {'class_name': 'torch.optim',
                                         'core_model_parameter_type': 'str',
                                         'default': 'SGD',
                                         'levels': ['Adadelta',
                                                    'Adagrad',
                                                    'Adam',
                                                    'AdamW',
                                                    'Adamax',
                                                    'NAdam',
                                                    'RAdam',
                                                    'RMSprop',
                                                    'Rprop'],
                                         'lower': 0,
                                         'transform': 'None',
                                         'type': 'factor',
                                         'upper': 8},
                           'patience': {'default': 2,
                                        'lower': 5,
                                        'transform': 'transform_power_2_int',
                                        'type': 'int',
                                        'upper': 7}},
 'core_model_hyper_dict_default': {'act_fn': {'class_name': 'spotPython.torch.activation',
                                              'core_model_parameter_type': 'instance()',
                                              'default': 'ReLU',
                                              'levels': ['Sigmoid',
                                                         'Tanh',
                                                         'ReLU',
                                                         'LeakyReLU',
                                                         'ELU',
                                                         'Swish'],
                                              'lower': 0,
                                              'transform': 'None',
                                              'type': 'factor',
                                              'upper': 5},
                                   'batch_size': {'default': 4,
                                                  'lower': 1,
                                                  'transform': 'transform_power_2_int',
                                                  'type': 'int',
                                                  'upper': 4},
                                   'dropout_prob': {'default': 0.01,
                                                    'lower': 0.0,
                                                    'transform': 'None',
                                                    'type': 'float',
                                                    'upper': 0.25},
                                   'epochs': {'default': 4,
                                              'lower': 4,
                                              'transform': 'transform_power_2_int',
                                              'type': 'int',
                                              'upper': 9},
                                   'initialization': {'core_model_parameter_type': 'str',
                                                      'default': 'Default',
                                                      'levels': ['Default',
                                                                 'Kaiming',
                                                                 'Xavier'],
                                                      'lower': 0,
                                                      'transform': 'None',
                                                      'type': 'factor',
                                                      'upper': 2},
                                   'l1': {'default': 3,
                                          'lower': 3,
                                          'transform': 'transform_power_2_int',
                                          'type': 'int',
                                          'upper': 8},
                                   'lr_mult': {'default': 1.0,
                                               'lower': 0.1,
                                               'transform': 'None',
                                               'type': 'float',
                                               'upper': 10.0},
                                   'optimizer': {'class_name': 'torch.optim',
                                                 'core_model_parameter_type': 'str',
                                                 'default': 'SGD',
                                                 'levels': ['Adadelta',
                                                            'Adagrad',
                                                            'Adam',
                                                            'AdamW',
                                                            'SparseAdam',
                                                            'Adamax',
                                                            'ASGD',
                                                            'NAdam',
                                                            'RAdam',
                                                            'RMSprop',
                                                            'Rprop',
                                                            'SGD'],
                                                 'lower': 0,
                                                 'transform': 'None',
                                                 'type': 'factor',
                                                 'upper': 11},
                                   'patience': {'default': 2,
                                                'lower': 2,
                                                'transform': 'transform_power_2_int',
                                                'type': 'int',
                                                'upper': 6}},
 'core_model_name': None,
 'counter': 6,
 'data': None,
 'data_dir': './data',
 'data_module': None,
 'data_set': <spotPython.data.diabetes.Diabetes object at 0x39f00d350>,
 'data_set_name': None,
 'db_dict_name': None,
 'design': None,
 'device': 'mps',
 'devices': 1,
 'enable_progress_bar': False,
 'eval': None,
 'fun_evals': inf,
 'fun_repeats': 1,
 'horizon': None,
 'infill_criterion': 'y',
 'k_folds': 3,
 'log_graph': False,
 'log_level': 10,
 'loss_function': None,
 'lower': array([3. , 4. , 1. , 0. , 0. , 0. , 0.1, 2. , 0. ]),
 'max_surrogate_points': 30,
 'max_time': 1,
 'metric_params': {},
 'metric_river': None,
 'metric_sklearn': None,
 'metric_sklearn_name': None,
 'metric_torch': None,
 'model_dict': {},
 'n_points': 1,
 'n_samples': None,
 'n_total': None,
 'noise': False,
 'num_workers': 0,
 'ocba_delta': 0,
 'oml_grace_period': None,
 'optimizer': None,
 'path': None,
 'prep_model': None,
 'prep_model_name': None,
 'progress_file': None,
 'save_model': False,
 'scenario': None,
 'seed': 123,
 'show_batch_interval': 1000000,
 'show_models': False,
 'show_progress': True,
 'shuffle': None,
 'sigma': 0.0,
 'spot_tensorboard_path': 'runs/spot_logs/031_maans14_2024-04-22_01-34-37',
 'spot_writer': <torch.utils.tensorboard.writer.SummaryWriter object at 0x391e18f50>,
 'target_column': None,
 'target_type': None,
 'task': None,
 'test': None,
 'test_seed': 1234,
 'test_size': 0.1,
 'tolerance_x': 1.4901161193847656e-08,
 'train': None,
 'upper': array([ 8.  ,  9.  ,  4.  ,  5.  , 11.  ,  0.25, 10.  ,  6.  ,  2.  ]),
 'var_name': ['l1',
              'epochs',
              'batch_size',
              'act_fn',
              'optimizer',
              'dropout_prob',
              'lr_mult',
              'patience',
              'initialization'],
 'var_type': ['int',
              'int',
              'int',
              'factor',
              'factor',
              'float',
              'float',
              'int',
              'factor'],
 'verbosity': 0,
 'weight_coeff': 0.0,
 'weights': 1.0,
 'weights_entry': None}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              2733.956298828125     │
│         val_loss               2733.956298828125     │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              2853.91357421875      │
│         val_loss               2853.91357421875      │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric               3266.3662109375      │
│         val_loss                3266.3662109375      │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              5135.49755859375      │
│         val_loss               5135.49755859375      │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric                     nan            │
│         val_loss                      nan            │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric               2650.9560546875      │
│         val_loss                2650.9560546875      │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              2707.06982421875      │
│         val_loss               2707.06982421875      │
└───────────────────────────┴───────────────────────────┘
<spotPython.spot.spot.Spot at 0x3d450f6d0>

23.10 Step 9: Tensorboard

The textual output shown in the console (or code cell) can be visualized with Tensorboard.

tensorboard --logdir="runs/"

Further information can be found in the PyTorch Lightning documentation for Tensorboard.

23.11 Load the saved experiment and get the hyperparameters (tuned architecture)

from spotPython.utils.file import load_experiment
import pprint
PREFIX="031"
experiment_name = "spot_" + PREFIX + "_experiment.pickle"
spot_tuner, fun_control, design_control, surrogate_control, optimizer_control = load_experiment(experiment_name)
from spotPython.hyperparameters.values import get_tuned_architecture
config = get_tuned_architecture(spot_tuner, fun_control)
pprint.pprint(config)
{'act_fn': ReLU(),
 'batch_size': 32,
 'dropout_prob': 0.04480646755985472,
 'epochs': 1024,
 'initialization': 'Default',
 'l1': 64,
 'lr_mult': 2.166650746218857,
 'optimizer': 'AdamW',
 'patience': 64}

23.12 Step 10: Results

After the hyperparameter tuning run is finished, the results can be analyzed.

spot_tuner.plot_progress(log_y=False)

Progress plot. Black dots denote results from the initial design. Red dots illustrate the improvement found by the surrogate model based optimization.
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control=fun_control, spot=spot_tuner))
| name           | type   | default   |   lower |   upper | tuned               | transform             |   importance | stars   |
|----------------|--------|-----------|---------|---------|---------------------|-----------------------|--------------|---------|
| l1             | int    | 3         |     4.0 |     6.0 | 6.0                 | transform_power_2_int |         0.05 |         |
| epochs         | int    | 4         |     9.0 |    10.0 | 10.0                | transform_power_2_int |         0.05 |         |
| batch_size     | int    | 4         |     4.0 |     5.0 | 5.0                 | transform_power_2_int |         0.05 |         |
| act_fn         | factor | ReLU      |     0.0 |     3.0 | ReLU                | None                  |         0.05 |         |
| optimizer      | factor | SGD       |     0.0 |     8.0 | AdamW               | None                  |       100.00 | ***     |
| dropout_prob   | float  | 0.01      |    0.01 |     0.1 | 0.04480646755985472 | None                  |         0.05 |         |
| lr_mult        | float  | 1.0       |     0.5 |     5.0 | 2.166650746218857   | None                  |         0.05 |         |
| patience       | int    | 2         |     5.0 |     7.0 | 6.0                 | transform_power_2_int |         0.05 |         |
| initialization | factor | Default   |     0.0 |     2.0 | Default             | None                  |         0.05 |         |
spot_tuner.plot_importance(threshold=0.025)

Variable importance plot, threshold 0.025.

23.12.1 Contour Plots of the Hyperparameters

filename = None
spot_tuner.plot_important_hyperparameter_contour(filename=filename, max_imp=3)
l1:  0.05456390401259079
epochs:  0.05456390401259079
batch_size:  0.05456390401259079
act_fn:  0.05456390401259079
optimizer:  100.0
dropout_prob:  0.05456390401259079
lr_mult:  0.05456390401259079
patience:  0.05456390401259079
initialization:  0.05456390401259079
impo: [['l1', 0.05456390401259079], ['epochs', 0.05456390401259079], ['batch_size', 0.05456390401259079], ['act_fn', 0.05456390401259079], ['optimizer', 100.0], ['dropout_prob', 0.05456390401259079], ['lr_mult', 0.05456390401259079], ['patience', 0.05456390401259079], ['initialization', 0.05456390401259079]]
indices: [4, 0, 1, 2, 3, 5, 6, 7, 8]
indices after max_imp selection: [4, 0, 1]

Contour plots.

23.12.2 Parallel Coordinates Plot

spot_tuner.parallel_plot()

Parallel coordinates plot.

23.12.3 Cross Validation With Lightning

  • The KFold class from sklearn.model_selection is used to generate the folds for cross-validation; a standalone illustration follows below.
  • This mechanism is used to generate the folds for the final evaluation of the model.
  • The CrossValidationDataModule class [SOURCE] is used to generate the folds for the hyperparameter tuning process.
  • It is called from the cv_model function [SOURCE].
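
As a standalone illustration of the fold generation, KFold can be run on its own (independent of spotPython):

from sklearn.model_selection import KFold
import numpy as np

kf = KFold(n_splits=2, shuffle=True, random_state=123)
X_dummy = np.arange(10).reshape(-1, 1)
for k, (train_idx, val_idx) in enumerate(kf.split(X_dummy)):
    print(f"k: {k}, train: {train_idx}, val: {val_idx}")
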
from spotPython.light.cvmodel import cv_model
from spotPython.hyperparameters.values import set_control_key_value
set_control_key_value(control_dict=fun_control,
                        key="k_folds",
                        value=2,
                        replace=True)
set_control_key_value(control_dict=fun_control,
                        key="test_size",
                        value=0.6,
                        replace=True)
cv_model(config, fun_control)
k: 0
Train Dataset Size: 221
Val Dataset Size: 221
train_model result: {'val_loss': 2929.623046875, 'hp_metric': 2929.623046875}
k: 1
Train Dataset Size: 221
Val Dataset Size: 221
train_model result: {'val_loss': 2964.51953125, 'hp_metric': 2964.51953125}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric               2929.623046875       │
│         val_loss                2929.623046875       │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric                2964.51953125       │
│         val_loss                 2964.51953125       │
└───────────────────────────┴───────────────────────────┘
2947.0712890625
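
The value returned by cv_model is the mean of the k fold losses, here (2929.623 + 2964.520) / 2 ≈ 2947.071.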

23.13 Test on the full data set

from spotPython.light.testmodel import test_model
test_model(config, fun_control)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.16, val_size: 0.24 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 106
LightDataModule.train_dataloader(). data_train size: 71
LightDataModule.setup(): stage: TrainerFn.TESTING
test_size: 0.6 used for test dataset.
LightDataModule.test_dataloader(). Test set size: 266
test_model result: {'val_loss': 3239.71240234375, 'hp_metric': 3239.71240234375}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric               DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              3239.71240234375      │
│         val_loss               3239.71240234375      │
└───────────────────────────┴───────────────────────────┘
(3239.71240234375, 3239.71240234375)

23.14 Load the last model

from spotPython.light.loadmodel import load_light_from_checkpoint
model_loaded = load_light_from_checkpoint(config, fun_control)
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TEST from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TEST/last.ckpt
Model: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)

23.15 Attributions

from spotPython.utils.file import load_experiment
from spotPython.hyperparameters.values import get_tuned_architecture
from spotPython.plot.xai import get_attributions, plot_attributions

spot_tuner, fun_control, design_control, surrogate_control, optimizer_control = load_experiment("spot_031_experiment.pickle")
config = get_tuned_architecture(spot_tuner, fun_control)
feature_names = fun_control["data_set"].names

23.15.1 Integrated Gradients

df = get_attributions(spot_tuner, fun_control, attr_method="IntegratedGradients")
print(df)
plot_attributions(df)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2771.93896484375, 'hp_metric': 2771.93896484375}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
   Feature Index Feature  IntegratedGradientsAttribution
0              2     bmi                       26.410578
1              3      bp                       21.924248
2              0     age                       21.223116
3              8  s5_ltg                       18.085563
4              9  s6_glu                       13.316550
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              2771.93896484375      │
│         val_loss               2771.93896484375      │
└───────────────────────────┴───────────────────────────┘

23.15.2 Deep Lift

df = get_attributions(spot_tuner, fun_control, attr_method="DeepLift")
print(df)
plot_attributions(df,  attr_method="DeepLift")
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2506.58203125, 'hp_metric': 2506.58203125}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
   Feature Index Feature  DeepLiftAttribution
0              2     bmi            39.669300
1              3      bp            30.352863
2              0     age            28.014923
3              8  s5_ltg            27.539591
4              9  s6_glu            16.869137
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric                2506.58203125       │
│         val_loss                 2506.58203125       │
└───────────────────────────┴───────────────────────────┘

23.15.3 Feature Ablation

df = get_attributions(spot_tuner, fun_control, attr_method="FeatureAblation")
print(df)
plot_attributions(df, attr_method="FeatureAblation")
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2556.08251953125, 'hp_metric': 2556.08251953125}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
   Feature Index Feature  FeatureAblationAttribution
0              2     bmi                   30.711384
1              3      bp                   25.567238
2              8  s5_ltg                   22.287161
3              0     age                   21.791470
4              9  s6_glu                   15.028356
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              2556.08251953125      │
│         val_loss               2556.08251953125      │
└───────────────────────────┴───────────────────────────┘

23.16 Visualizing the Activations, Weights, and Gradients

In neural networks, activations, weights, and gradients are fundamental concepts that play different roles.

  1. Activations:

    Activations refer to the outputs of neurons after applying an activation function. In neural networks, the input passes through each neuron of the network layers, where each unit calculates a weighted sum of its inputs and then applies a non-linear activation function (such as ReLU, Sigmoid, or Tanh). These activation functions help introduce non-linearity into the model, enabling the neural network to learn complex relationships between the input data and the predictions. In short, activations are the outputs that are forwarded by the neurons after applying the activation function.

  2. Weights:

    Weights are parameters within a neural network that control the strength of the connection between two neurons in successive layers. They are adjusted during the training process to enable the neural network to perform the desired task as well as possible. Each input is multiplied by a weight, and the neural network learns by adjusting these weights based on the error between the predictions and the actual values. Adjusting the weights allows the network to recognize patterns and relationships in the input data and use them for predictions or classifications.

  3. Gradients:

    In the context of machine learning and specifically in neural networks, gradients are a measure of the rate of change or the slope of the loss function (a function that measures how well the network performs in predicting the desired output) with respect to the weights. During the training process, the goal is to minimize the value of the loss function to improve the model’s performance. The gradients indicate the direction and size of the steps that need to be taken to adjust the weights in a way that minimizes the loss (known as gradient descent). By repeatedly adjusting the weights in the opposite direction of the gradient, the network can be effectively trained to improve its prediction accuracy.
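
All three concepts can be observed side by side on a single toy layer (plain PyTorch, independent of spotPython). The gradient-descent update then moves each weight a small step against its gradient, w ← w − η · ∂L/∂w:

import torch
from torch import nn

layer = nn.Linear(3, 2)
x = torch.randn(4, 3)
activation = torch.relu(layer(x))  # activations: outputs after the nonlinearity
loss = activation.sum()            # a stand-in for a real loss function
loss.backward()
print(layer.weight)                # weights: the learnable parameters
print(layer.weight.grad)           # gradients: d(loss)/d(weight)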

After we have trained the models, we can look at the actual activation values inside the model. For instance, how many neurons are set to zero by ReLU? Where do we find most values with Tanh? To answer these questions, we can write a simple function which takes a trained model, applies it to a batch of data, and plots the histogram of the activations inside the network.
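
Under the hood, such a function can be built with forward hooks (a minimal sketch; the spotPython helpers used below are more complete):

import torch

def collect_activations(net, batch):
    # Register a hook on every Linear layer that records its output.
    acts, hooks = {}, []
    for name, module in net.named_modules():
        if isinstance(module, torch.nn.Linear):
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, name=name: acts.setdefault(name, out.detach())))
    with torch.no_grad():
        net(batch)
    for h in hooks:
        h.remove()
    return acts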

from spotPython.plot.xai import (get_activations, get_gradients, get_weights, plot_nn_values_hist, plot_nn_values_scatter, visualize_weights, visualize_gradients, visualize_activations, visualize_activations_distributions, visualize_gradient_distributions, visualize_weights_distributions)
import pprint
from spotPython.utils.file import load_experiment
PREFIX = "031"
experiment_name = "spot_" + PREFIX + "_experiment.pickle"
spot_tuner, fun_control, design_control, surrogate_control, optimizer_control = load_experiment(experiment_name)
from spotPython.hyperparameters.values import get_tuned_architecture
config = get_tuned_architecture(spot_tuner, fun_control)
pprint.pprint(config)
batch_size = config["batch_size"]
print(batch_size)
{'act_fn': ReLU(),
 'batch_size': 32,
 'dropout_prob': 0.04480646755985472,
 'epochs': 1024,
 'initialization': 'Default',
 'l1': 64,
 'lr_mult': 2.166650746218857,
 'optimizer': 'AdamW',
 'patience': 64}
32
from spotPython.light.loadmodel import load_light_from_checkpoint
model_loaded = load_light_from_checkpoint(config, fun_control)
model = model_loaded.to("cpu")
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TEST from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TEST/last.ckpt
Model: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)

23.16.1 Weights

weights, index = get_weights(model, return_index=True)
print(index)
[0, 3, 6, 9, 12]
visualize_weights(model, absolute=True, cmap="gray", figsize=(6, 6))
640 values in Layer Layer 0.
36 padding values added.
676 values now in Layer Layer 0.
2048 values in Layer Layer 3.
68 padding values added.
2116 values now in Layer Layer 3.
1024 values in Layer Layer 6.
1024 values now in Layer Layer 6.
512 values in Layer Layer 9.
17 padding values added.
529 values now in Layer Layer 9.
16 values in Layer Layer 12.
16 values now in Layer Layer 12.

visualize_weights_distributions(model, color=f"C{0}")
n:5

23.16.2 Activations

activations = get_activations(model, fun_control=fun_control, batch_size=batch_size, device = "cpu")
net: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
visualize_activations(model, fun_control=fun_control, batch_size=batch_size, device = "cpu", cmap="BlueWhiteRed", absolute=False)
net: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
2048 values in Layer 0.
68 padding values added.
2116 values now in Layer 0.
1024 values in Layer 3.
1024 values now in Layer 3.
1024 values in Layer 6.
1024 values now in Layer 6.
512 values in Layer 9.
17 padding values added.
529 values now in Layer 9.
32 values in Layer 12.
4 padding values added.
36 values now in Layer 12.

  • Absolute values of the activations are plotted:
visualize_activations(model, fun_control=fun_control, batch_size=batch_size, device = "cpu", absolute=True)
net: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
2048 values in Layer 0.
68 padding values added.
2116 values now in Layer 0.
1024 values in Layer 3.
1024 values now in Layer 3.
1024 values in Layer 6.
1024 values now in Layer 6.
512 values in Layer 9.
17 padding values added.
529 values now in Layer 9.
32 values in Layer 12.
4 padding values added.
36 values now in Layer 12.

visualize_activations_distributions(net=model, fun_control=fun_control, batch_size=batch_size, device="cpu", color="C0", columns=2)
net: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
n:5

23.16.3 Gradients

gradients = get_gradients(model, fun_control, batch_size, device="cpu")
visualize_gradients(model, fun_control, batch_size, absolute=True, cmap="BlueWhiteRed", figsize=(6, 6))
640 values in Layer layers.0.weight.
36 padding values added.
676 values now in Layer layers.0.weight.
2048 values in Layer layers.3.weight.
68 padding values added.
2116 values now in Layer layers.3.weight.
1024 values in Layer layers.6.weight.
1024 values now in Layer layers.6.weight.
512 values in Layer layers.9.weight.
17 padding values added.
529 values now in Layer layers.9.weight.
16 values in Layer layers.12.weight.
16 values now in Layer layers.12.weight.

visualize_gradient_distributions(model, fun_control, batch_size=batch_size, color=f"C{0}")
n:5

23.17 Layer Conductance

from spotPython.plot.xai import get_weights_conductance_last_layer, plot_conductance_last_layer
w, c = get_weights_conductance_last_layer(spot_tuner, fun_control)
plot_conductance_last_layer(w, c)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2681.980712890625, 'hp_metric': 2681.980712890625}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2596.02734375, 'hp_metric': 2596.02734375}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=64, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.04480646755985472, inplace=False)
    (3): Linear(in_features=64, out_features=32, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.04480646755985472, inplace=False)
    (6): Linear(in_features=32, out_features=32, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.04480646755985472, inplace=False)
    (9): Linear(in_features=32, out_features=16, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.04480646755985472, inplace=False)
    (12): Linear(in_features=16, out_features=1, bias=True)
  )
)
Conductance analysis for layer:  Linear(in_features=16, out_features=1, bias=True)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              2681.980712890625     │
│         val_loss               2681.980712890625     │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric             DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric                2596.02734375       │
│         val_loss                 2596.02734375       │
└───────────────────────────┴───────────────────────────┘