23 HPT PyTorch Lightning: Diabetes
In this tutorial, we will show how spotPython can be integrated into the PyTorch Lightning training workflow for a regression task.
This chapter describes the hyperparameter tuning of a PyTorch Lightning network on the Diabetes data set, a PyTorch Dataset for regression based on a toy data set from scikit-learn: ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of n = 442 diabetes patients, together with the response of interest, a quantitative measure of disease progression one year after baseline.
23.1 Step 1: Setup
- Before we consider the detailed experimental setup, we select the parameters that affect run time, initial design size, etc.
- The parameter MAX_TIME specifies the maximum run time in minutes.
- The parameter INIT_SIZE specifies the initial design size.
- The parameter WORKERS specifies the number of workers.
- The prefix PREFIX is used for the experiment name and the name of the log file.
- The parameter DEVICE specifies the device to use for training.

from spotPython.utils.device import getDevice
from math import inf

MAX_TIME = 1
FUN_EVALS = inf
INIT_SIZE = 5
WORKERS = 0
PREFIX = "031"
DEVICE = getDevice()
DEVICES = 1
TEST_SIZE = 0.1
TORCH_METRIC = "mean_squared_error"

- MAX_TIME is set to one minute for demonstration purposes. For real experiments, this should be increased to at least 1 hour.
- INIT_SIZE is set to 5 for demonstration purposes. For real experiments, this should be increased to at least 10.
- WORKERS is set to 0 for demonstration purposes. For real experiments, this should be increased. See the warnings that are printed when the number of workers is set to 0.
- Although no .cuda() or .to(device) calls are required, because Lightning handles these for you (see LIGHTNINGMODULE), we would like to know which device is used. Therefore, we imitate the LightningModule behaviour, which selects the highest device.
- The method spotPython.utils.device.getDevice() returns the device that is used by Lightning.
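For orientation, comparable device-selection logic can be written in plain PyTorch. The following is a minimal sketch; the priority order cuda > mps > cpu is an assumption for illustration, and getDevice() remains the authoritative implementation:

import torch

def get_device_sketch() -> str:
    # Hypothetical re-implementation for illustration only.
    if torch.cuda.is_available():
        return "cuda:0"   # prefer a CUDA GPU if present
    if torch.backends.mps.is_available():
        return "mps"      # Apple Silicon GPU
    return "cpu"

print(get_device_sketch())  # e.g. 'mps' on the machine used in this chapter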
23.2 Step 2: Initialization of the fun_control Dictionary
spotPython uses a Python dictionary for storing the information required for the hyperparameter tuning process.
from spotPython.utils.init import fun_control_init
import numpy as np
fun_control = fun_control_init(
    _L_in=10,
    _L_out=1,
    _torchmetric=TORCH_METRIC,
    PREFIX=PREFIX,
    TENSORBOARD_CLEAN=True,
    device=DEVICE,
    enable_progress_bar=False,
    fun_evals=FUN_EVALS,
    log_level=10,
    max_time=MAX_TIME,
    num_workers=WORKERS,
    show_progress=True,
    test_size=0.1,
    tolerance_x=np.sqrt(np.spacing(1)),
)
Moving TENSORBOARD_PATH: runs/ to TENSORBOARD_PATH_OLD: runs_OLD/runs_2024_04_22_01_34_37
Created spot_tensorboard_path: runs/spot_logs/031_maans14_2024-04-22_01-34-37 for SummaryWriter()
23.3 Step 3: Loading the Diabetes Data Set
23.3.1 Data Exploration of the sklearn Diabetes Data Set
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
X, y = load_diabetes(return_X_y=True, as_frame=False)
feature_names = ["age", "sex", "bmi", "bp", "s1_tc", "s2_ldl", "s3_hdl", "s4_tch", "s5_ltg", "s6_glu"]
Note: each of these 10 feature variables has been mean centered and scaled by the standard deviation times the square root of n_samples (i.e., the sum of squares of each column totals 1).
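This normalization can be verified directly. A quick sanity check (plain numpy, assuming the as_frame=False arrays loaded above):

# Verify that the sum of squares of each feature column is (approximately) 1
col_sq_sums = (X ** 2).sum(axis=0)
print(np.allclose(col_sq_sums, 1.0))  # expected: True for the scaled features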
fig, axs = plt.subplots(nrows=5, ncols=2, figsize=(30, 20))
for i, (ax, col) in enumerate(zip(axs.flat, feature_names)):
    x = X[:, i]
    pf = np.polyfit(x, y, 1)
    p = np.poly1d(pf)
    ax.plot(x, y, 'o')
    ax.plot(x, p(x), "r--")
    ax.set_title(col + ' vs disease progression')
    ax.set_xlabel(col)
    ax.set_ylabel('disease progression')
- HDL (high-density lipoprotein) cholesterol, sometimes called “good” cholesterol, absorbs cholesterol in the blood and carries it back to the liver.
- The liver then flushes it from the body.
- High levels of HDL cholesterol can lower your risk for heart disease and stroke.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TEST_SIZE, random_state=0)
from sklearn import linear_model
lin_regr = linear_model.LinearRegression()
lin_regr.fit(X_train, y_train)
# determine the mse of the model
from sklearn.metrics import mean_squared_error
y_pred = lin_regr.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error: {mse}")
Mean squared error: 3111.965104291441
print(lin_regr.coef_)
[ -38.78231462 -236.13752074 529.51199073 329.38300284 -653.70276656
384.03663363 49.11443103 139.13184584 718.01936255 75.34401057]
# plot the coefficients of the model
fig, ax = plt.subplots()
ax.bar(feature_names, lin_regr.coef_)
ax.set_title("Coefficients of the linear regression model")
ax.set_ylabel("Coefficient")
ax.set_xlabel("Feature")
plt.show()
- Coefficients are indeed well suited to tell us what happens when we change the value of an input feature, but they are not a good means in themselves to measure the general importance of a feature.
- This is because the value of each coefficient depends on the scale of the input features.
- For example, if we were to measure the age of a person in minutes instead of years, then the coefficient for the feature "age" would shrink by the same factor, e.g., -38.78231462 / 525600 ≈ -0.000074.
- Clearly, age measured in years is neither more nor less important than age measured in minutes; only the scale of the feature has changed.
- This means that the size of a coefficient is not necessarily a good measure of the importance of a feature in a linear model.
-38.78231462 / (3652460)
-1.0618135344398023e-05
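To make the scale dependence concrete, the following sketch (a hypothetical illustration, not part of the chapter's pipeline) rescales one feature and refits the model; the predictions are unchanged, but the corresponding coefficient shrinks by the same factor:

# Rescale the 'age' column (index 0) by 525600 (years -> minutes) and refit.
X_minutes = X_train.copy()
X_minutes[:, 0] = X_minutes[:, 0] * 525600

lin_regr_minutes = linear_model.LinearRegression().fit(X_minutes, y_train)
print(lin_regr.coef_[0])           # coefficient for age in original units
print(lin_regr_minutes.coef_[0])   # same coefficient divided by 525600
print(np.isclose(lin_regr_minutes.coef_[0],
                 lin_regr.coef_[0] / 525600))  # expected: True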
23.3.2 Mutual Information
# determine the mutual information of the model
from sklearn.feature_selection import mutual_info_regression
mi = mutual_info_regression(X_train, y_train)
print(f"Mutual information: {mi}")
# generate a bar plot of the mutual information
plt.bar(feature_names, mi)
plt.ylabel('Mutual information')
plt.xlabel('Feature')
plt.title('Mutual information of features')
plt.show()
Mutual information: [0.02773406 0.03559276 0.17420237 0.04975074 0.09714525 0.
0.08745141 0.10939959 0.12955598 0.14555097]
23.3.3 SHAP
- SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.
import shap
# rather than use the whole training set to estimate expected values, we summarize with
# a set of weighted kmeans, each weighted by the number of points they represent.
X_train_summary = shap.kmeans(X_train, 10)
ex = shap.KernelExplainer(lin_regr.predict, X_train_summary)
shap_values = ex.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
23.4 The PyTorch data set
from spotPython.hyperparameters.values import set_control_key_value
from spotPython.data.diabetes import Diabetes
dataset = Diabetes()
set_control_key_value(control_dict=fun_control,
                      key="data_set",
                      value=dataset,
                      replace=True)
print(len(dataset))
print(dataset.names)
442
['age', 'sex', 'bmi', 'bp', 's1_tc', 's2_ldl', 's3_hdl', 's4_tch', 's5_ltg', 's6_glu']
- As shown below, a DataLoader from torch.utils.data can be used to check the data.
# Set batch size for DataLoader
batch_size = 5
# Create DataLoader
from torch.utils.data import DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

# Iterate over the data in the DataLoader
for batch in dataloader:
    inputs, targets = batch
    print(f"Batch Size: {inputs.size(0)}")
    print(f"Inputs Shape: {inputs.shape}")
    print(f"Targets Shape: {targets.shape}")
    print("---------------")
    print(f"Inputs: {inputs}")
    print(f"Targets: {targets}")
    break
Batch Size: 5
Inputs Shape: torch.Size([5, 10])
Targets Shape: torch.Size([5])
---------------
Inputs: tensor([[ 0.0381, 0.0507, 0.0617, 0.0219, -0.0442, -0.0348, -0.0434, -0.0026,
0.0199, -0.0176],
[-0.0019, -0.0446, -0.0515, -0.0263, -0.0084, -0.0192, 0.0744, -0.0395,
-0.0683, -0.0922],
[ 0.0853, 0.0507, 0.0445, -0.0057, -0.0456, -0.0342, -0.0324, -0.0026,
0.0029, -0.0259],
[-0.0891, -0.0446, -0.0116, -0.0367, 0.0122, 0.0250, -0.0360, 0.0343,
0.0227, -0.0094],
[ 0.0054, -0.0446, -0.0364, 0.0219, 0.0039, 0.0156, 0.0081, -0.0026,
-0.0320, -0.0466]])
Targets: tensor([151., 75., 141., 206., 135.])
23.5 Step 4: Preprocessing
Preprocessing is handled by Lightning and PyTorch. It is described in the LIGHTNINGDATAMODULE documentation, where you can also find information about the transforms methods.
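To illustrate what a LightningDataModule looks like, here is a minimal, hypothetical sketch. The class name DiabetesDataModule and the 90/10 split are assumptions for illustration only; spotPython ships its own LightDataModule, whose log output appears later in this chapter:

import torch
from torch.utils.data import DataLoader, random_split
from pytorch_lightning import LightningDataModule

class DiabetesDataModule(LightningDataModule):
    # Illustrative sketch; not spotPython's LightDataModule.
    def __init__(self, dataset, batch_size: int = 32):
        super().__init__()
        self.dataset = dataset
        self.batch_size = batch_size

    def setup(self, stage=None):
        # 90/10 train/val split; a preprocessing transform could be applied here
        n_val = int(0.1 * len(self.dataset))
        self.train_set, self.val_set = random_split(
            self.dataset, [len(self.dataset) - n_val, n_val]
        )

    def train_dataloader(self):
        return DataLoader(self.train_set, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.val_set, batch_size=self.batch_size)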
23.6 Step 5: Select the Core Model (algorithm) and core_model_hyper_dict
spotPython includes the NetLightRegression class [SOURCE] for configurable neural networks. The class is imported here. It inherits from the class Lightning.LightningModule, which is the base class for all models in Lightning. Lightning.LightningModule is a subclass of torch.nn.Module and provides additional functionality for the training and testing of neural networks. The class Lightning.LightningModule is described in the Lightning documentation.
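To illustrate what such a configurable model inherits, here is a minimal, hypothetical LightningModule for regression. This is a sketch only; NetLightRegression is the class actually used in this chapter, and the layer sizes and learning rate here are illustrative assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_lightning import LightningModule

class TinyRegressor(LightningModule):
    # Hypothetical example; not spotPython's NetLightRegression.
    def __init__(self, l_in: int = 10, l1: int = 64, lr: float = 1e-3):
        super().__init__()
        self.lr = lr
        self.net = nn.Sequential(nn.Linear(l_in, l1), nn.ReLU(), nn.Linear(l1, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.mse_loss(self(x), y)  # MSE, the loss used in this chapter
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)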
- Here we simply add the NN model to the fun_control dictionary by calling the function add_core_model_to_fun_control:
from spotPython.light.regression.netlightregression import NetLightRegression
from spotPython.hyperdict.light_hyper_dict import LightHyperDict
from spotPython.hyperparameters.values import add_core_model_to_fun_control
add_core_model_to_fun_control(fun_control=fun_control,
                              core_model=NetLightRegression,
                              hyper_dict=LightHyperDict)
The hyperparameters of the model are specified in the core_model_hyper_dict dictionary [SOURCE].
23.7 Step 6: Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model
spotPython provides functions for modifying the hyperparameters, their bounds and factors, as well as for activating and de-activating hyperparameters without re-compilation of the Python source code.
- epochs and patience are set to small values for demonstration purposes. These values are too small for a real application.
- More reasonable values are, e.g., set_control_hyperparameter_value(fun_control, "epochs", [7, 9]) and set_control_hyperparameter_value(fun_control, "patience", [2, 7]).
from spotPython.hyperparameters.values import set_control_hyperparameter_value
set_control_hyperparameter_value(fun_control, "l1", [4, 6])
set_control_hyperparameter_value(fun_control, "epochs", [9, 10])
set_control_hyperparameter_value(fun_control, "batch_size", [4, 5])
set_control_hyperparameter_value(fun_control, "optimizer", [
    "Adadelta",
    "Adagrad",
    "Adam",
    "AdamW",
    "Adamax",
    "NAdam",
    "RAdam",
    "RMSprop",
    "Rprop"])
set_control_hyperparameter_value(fun_control, "dropout_prob", [0.01, 0.1])
set_control_hyperparameter_value(fun_control, "lr_mult", [0.5, 5.0])
set_control_hyperparameter_value(fun_control, "patience", [5, 7])
set_control_hyperparameter_value(fun_control, "act_fn", [
    "Sigmoid",
    "ReLU",
    "LeakyReLU",
    "Swish"])
Setting hyperparameter l1 to value [4, 6].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter epochs to value [9, 10].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter batch_size to value [4, 5].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter optimizer to value ['Adadelta', 'Adagrad', 'Adam', 'AdamW', 'Adamax', 'NAdam', 'RAdam', 'RMSprop', 'Rprop'].
Variable type is factor.
Core type is str.
Calling modify_hyper_parameter_levels().
Setting hyperparameter dropout_prob to value [0.01, 0.1].
Variable type is float.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter lr_mult to value [0.5, 5.0].
Variable type is float.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter patience to value [5, 7].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter act_fn to value ['Sigmoid', 'ReLU', 'LeakyReLU', 'Swish'].
Variable type is factor.
Core type is instance().
Calling modify_hyper_parameter_levels().
Now, the dictionary fun_control contains all information needed for the hyperparameter tuning. Before the hyperparameter tuning is started, it is recommended to take a look at the experimental design. The method gen_design_table [SOURCE] generates a design table as follows:
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))
| name | type | default | lower | upper | transform |
|----------------|--------|-----------|---------|---------|-----------------------|
| l1 | int | 3 | 4 | 6 | transform_power_2_int |
| epochs | int | 4 | 9 | 10 | transform_power_2_int |
| batch_size | int | 4 | 4 | 5 | transform_power_2_int |
| act_fn | factor | ReLU | 0 | 3 | None |
| optimizer | factor | SGD | 0 | 8 | None |
| dropout_prob | float | 0.01 | 0.01 | 0.1 | None |
| lr_mult | float | 1.0 | 0.5 | 5 | None |
| patience | int | 2 | 5 | 7 | transform_power_2_int |
| initialization | factor | Default | 0 | 2 | None |
This allows us to check whether all information is available and whether it is correct.
fun_control Dictionary
The updated fun_control dictionary can be shown with the command fun_control["core_model_hyper_dict"].
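For example, a small usage sketch for inspecting the tuned ranges:

import pprint
# Show the updated hyperparameter dictionary of the core model
pprint.pprint(fun_control["core_model_hyper_dict"])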
23.8 Step 7: Data Splitting, the Objective (Loss) Function and the Metric
23.8.1 Evaluation
The evaluation procedure requires the specification of two elements:
- how the data is split into a train and a test set
- the loss function (and a metric).
The data splitting is handled by Lightning.
23.8.2 Loss Function
The loss function is specified in the configurable network class [SOURCE]. We will use MSE.
23.8.3 Metric
- Similar to the loss function, the metric is specified in the configurable network class [SOURCE].
- The loss function and the metric are not hyperparameters that can be tuned with spotPython.
- They are handled by Lightning.
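Both quantities are instances of mean squared error here. A tiny sketch comparing torch's functional loss with the torchmetrics metric (that TORCH_METRIC = "mean_squared_error" refers to a torchmetrics name is an assumption for this illustration):

import torch
import torch.nn.functional as F
import torchmetrics.functional as tmf

y_hat = torch.tensor([2.0, 0.0, 1.0])
y     = torch.tensor([1.0, 0.0, 3.0])
print(F.mse_loss(y_hat, y))              # tensor(1.6667) -- the loss
print(tmf.mean_squared_error(y_hat, y))  # tensor(1.6667) -- the metric, identical here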
23.9 Step 8: Calling the SPOT Function
23.9.1 Preparing the SPOT Call
from spotPython.utils.init import design_control_init, surrogate_control_init
design_control = design_control_init(init_size=INIT_SIZE)
surrogate_control = surrogate_control_init(noise=True,
                                           n_theta=2)
- The values in the control dictionaries can be modified with the function set_control_key_value [SOURCE], for example:
set_control_key_value(control_dict=surrogate_control,
key="noise",
value=True,
replace=True)
set_control_key_value(control_dict=surrogate_control,
key="n_theta",
value=2,
replace=True)
23.9.2 The Objective Function fun
The objective function fun from the class HyperLight [SOURCE] is selected next. It implements an interface from PyTorch's training, validation, and testing methods to spotPython.
from spotPython.fun.hyperlight import HyperLight
fun = HyperLight(log_level=50).fun
23.9.3 Starting the Hyperparameter Tuning
The spotPython hyperparameter tuning is started by calling the Spot function [SOURCE].
from spotPython.spot import spot
spot_tuner = spot.Spot(fun=fun,
                       fun_control=fun_control,
                       design_control=design_control,
                       surrogate_control=surrogate_control)
spot_tuner.run()
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2733.956298828125, 'hp_metric': 2733.956298828125}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2853.91357421875, 'hp_metric': 2853.91357421875}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 3266.3662109375, 'hp_metric': 3266.3662109375}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 5135.49755859375, 'hp_metric': 5135.49755859375}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': nan, 'hp_metric': nan}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2650.9560546875, 'hp_metric': 2650.9560546875}
spotPython tuning: 2650.9560546875 [###-------] 31.31%
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2707.06982421875, 'hp_metric': 2707.06982421875}
spotPython tuning: 2650.9560546875 [##########] 100.00% Done...
{'CHECKPOINT_PATH': 'runs/saved_models/',
'DATASET_PATH': 'data/',
'PREFIX': '031',
'RESULTS_PATH': 'results/',
'TENSORBOARD_PATH': 'runs/',
'_L_in': 10,
'_L_out': 1,
'_torchmetric': 'mean_squared_error',
'accelerator': 'auto',
'converters': None,
'core_model': <class 'spotPython.light.regression.netlightregression.NetLightRegression'>,
'core_model_hyper_dict': {'act_fn': {'class_name': 'spotPython.torch.activation',
'core_model_parameter_type': 'instance()',
'default': 'ReLU',
'levels': ['Sigmoid',
'ReLU',
'LeakyReLU',
'Swish'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 3},
'batch_size': {'default': 4,
'lower': 4,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 5},
'dropout_prob': {'default': 0.01,
'lower': 0.01,
'transform': 'None',
'type': 'float',
'upper': 0.1},
'epochs': {'default': 4,
'lower': 9,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 10},
'initialization': {'core_model_parameter_type': 'str',
'default': 'Default',
'levels': ['Default',
'Kaiming',
'Xavier'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 2},
'l1': {'default': 3,
'lower': 4,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 6},
'lr_mult': {'default': 1.0,
'lower': 0.5,
'transform': 'None',
'type': 'float',
'upper': 5.0},
'optimizer': {'class_name': 'torch.optim',
'core_model_parameter_type': 'str',
'default': 'SGD',
'levels': ['Adadelta',
'Adagrad',
'Adam',
'AdamW',
'Adamax',
'NAdam',
'RAdam',
'RMSprop',
'Rprop'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 8},
'patience': {'default': 2,
'lower': 5,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 7}},
'core_model_hyper_dict_default': {'act_fn': {'class_name': 'spotPython.torch.activation',
'core_model_parameter_type': 'instance()',
'default': 'ReLU',
'levels': ['Sigmoid',
'Tanh',
'ReLU',
'LeakyReLU',
'ELU',
'Swish'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 5},
'batch_size': {'default': 4,
'lower': 1,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 4},
'dropout_prob': {'default': 0.01,
'lower': 0.0,
'transform': 'None',
'type': 'float',
'upper': 0.25},
'epochs': {'default': 4,
'lower': 4,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 9},
'initialization': {'core_model_parameter_type': 'str',
'default': 'Default',
'levels': ['Default',
'Kaiming',
'Xavier'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 2},
'l1': {'default': 3,
'lower': 3,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 8},
'lr_mult': {'default': 1.0,
'lower': 0.1,
'transform': 'None',
'type': 'float',
'upper': 10.0},
'optimizer': {'class_name': 'torch.optim',
'core_model_parameter_type': 'str',
'default': 'SGD',
'levels': ['Adadelta',
'Adagrad',
'Adam',
'AdamW',
'SparseAdam',
'Adamax',
'ASGD',
'NAdam',
'RAdam',
'RMSprop',
'Rprop',
'SGD'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 11},
'patience': {'default': 2,
'lower': 2,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 6}},
'core_model_name': None,
'counter': 6,
'data': None,
'data_dir': './data',
'data_module': None,
'data_set': <spotPython.data.diabetes.Diabetes object at 0x39f00d350>,
'data_set_name': None,
'db_dict_name': None,
'design': None,
'device': 'mps',
'devices': 1,
'enable_progress_bar': False,
'eval': None,
'fun_evals': inf,
'fun_repeats': 1,
'horizon': None,
'infill_criterion': 'y',
'k_folds': 3,
'log_graph': False,
'log_level': 10,
'loss_function': None,
'lower': array([3. , 4. , 1. , 0. , 0. , 0. , 0.1, 2. , 0. ]),
'max_surrogate_points': 30,
'max_time': 1,
'metric_params': {},
'metric_river': None,
'metric_sklearn': None,
'metric_sklearn_name': None,
'metric_torch': None,
'model_dict': {},
'n_points': 1,
'n_samples': None,
'n_total': None,
'noise': False,
'num_workers': 0,
'ocba_delta': 0,
'oml_grace_period': None,
'optimizer': None,
'path': None,
'prep_model': None,
'prep_model_name': None,
'progress_file': None,
'save_model': False,
'scenario': None,
'seed': 123,
'show_batch_interval': 1000000,
'show_models': False,
'show_progress': True,
'shuffle': None,
'sigma': 0.0,
'spot_tensorboard_path': 'runs/spot_logs/031_maans14_2024-04-22_01-34-37',
'spot_writer': <torch.utils.tensorboard.writer.SummaryWriter object at 0x391e18f50>,
'target_column': None,
'target_type': None,
'task': None,
'test': None,
'test_seed': 1234,
'test_size': 0.1,
'tolerance_x': 1.4901161193847656e-08,
'train': None,
'upper': array([ 8. , 9. , 4. , 5. , 11. , 0.25, 10. , 6. , 2. ]),
'var_name': ['l1',
'epochs',
'batch_size',
'act_fn',
'optimizer',
'dropout_prob',
'lr_mult',
'patience',
'initialization'],
'var_type': ['int',
'int',
'int',
'factor',
'factor',
'float',
'float',
'int',
'factor'],
'verbosity': 0,
'weight_coeff': 0.0,
'weights': 1.0,
'weights_entry': None}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2733.956298828125 │ │ val_loss │ 2733.956298828125 │ └───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2853.91357421875 │ │ val_loss │ 2853.91357421875 │ └───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 3266.3662109375 │ │ val_loss │ 3266.3662109375 │ └───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 5135.49755859375 │ │ val_loss │ 5135.49755859375 │ └───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ nan │ │ val_loss │ nan │ └───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2650.9560546875 │ │ val_loss │ 2650.9560546875 │ └───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2707.06982421875 │ │ val_loss │ 2707.06982421875 │ └───────────────────────────┴───────────────────────────┘
<spotPython.spot.spot.Spot at 0x3d450f6d0>
23.10 Step 9: Tensorboard
The textual output shown in the console (or code cell) can be visualized with Tensorboard.
tensorboard --logdir="runs/"
Further information can be found in the PyTorch Lightning documentation for Tensorboard.
23.11 Load the saved experiment and get the hyperparameters (tuned architecture)
from spotPython.utils.file import load_experiment
import pprint
="031"
PREFIX= "spot_" + PREFIX + "_experiment.pickle"
experiment_name = load_experiment(experiment_name)
spot_tuner, fun_control, design_control, surrogate_control, optimizer_control from spotPython.hyperparameters.values import get_tuned_architecture
= get_tuned_architecture(spot_tuner, fun_control)
config pprint.pprint(config)
{'act_fn': ReLU(),
'batch_size': 32,
'dropout_prob': 0.04480646755985472,
'epochs': 1024,
'initialization': 'Default',
'l1': 64,
'lr_mult': 2.166650746218857,
'optimizer': 'AdamW',
'patience': 64}
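Note that the integer hyperparameters were tuned on a log2 scale: the transform_power_2_int entries in the design table map the tuned values to the configuration shown above. A small sketch of this mapping (the one-line definition is an assumption based on the function's name and the observed values):

# Sketch of the power-of-two transform named in the design table
def transform_power_2_int(x: int) -> int:
    return 2 ** x

print(transform_power_2_int(6))   # l1 = 6          -> 64
print(transform_power_2_int(10))  # epochs = 10     -> 1024
print(transform_power_2_int(5))   # batch_size = 5  -> 32
print(transform_power_2_int(6))   # patience = 6    -> 64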
23.12 Step 10: Results
After the hyperparameter tuning run is finished, the results can be analyzed.
spot_tuner.plot_progress(log_y=False)
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control=fun_control, spot=spot_tuner))
| name | type | default | lower | upper | tuned | transform | importance | stars |
|----------------|--------|-----------|---------|---------|---------------------|-----------------------|--------------|---------|
| l1 | int | 3 | 4.0 | 6.0 | 6.0 | transform_power_2_int | 0.05 | |
| epochs | int | 4 | 9.0 | 10.0 | 10.0 | transform_power_2_int | 0.05 | |
| batch_size | int | 4 | 4.0 | 5.0 | 5.0 | transform_power_2_int | 0.05 | |
| act_fn | factor | ReLU | 0.0 | 3.0 | ReLU | None | 0.05 | |
| optimizer | factor | SGD | 0.0 | 8.0 | AdamW | None | 100.00 | *** |
| dropout_prob | float | 0.01 | 0.01 | 0.1 | 0.04480646755985472 | None | 0.05 | |
| lr_mult | float | 1.0 | 0.5 | 5.0 | 2.166650746218857 | None | 0.05 | |
| patience | int | 2 | 5.0 | 7.0 | 6.0 | transform_power_2_int | 0.05 | |
| initialization | factor | Default | 0.0 | 2.0 | Default | None | 0.05 | |
spot_tuner.plot_importance(threshold=0.025)
23.12.1 Contour Plots of the Hyperparameters
filename = None
spot_tuner.plot_important_hyperparameter_contour(filename=filename, max_imp=3)
l1: 0.05456390401259079
epochs: 0.05456390401259079
batch_size: 0.05456390401259079
act_fn: 0.05456390401259079
optimizer: 100.0
dropout_prob: 0.05456390401259079
lr_mult: 0.05456390401259079
patience: 0.05456390401259079
initialization: 0.05456390401259079
impo: [['l1', 0.05456390401259079], ['epochs', 0.05456390401259079], ['batch_size', 0.05456390401259079], ['act_fn', 0.05456390401259079], ['optimizer', 100.0], ['dropout_prob', 0.05456390401259079], ['lr_mult', 0.05456390401259079], ['patience', 0.05456390401259079], ['initialization', 0.05456390401259079]]
indices: [4, 0, 1, 2, 3, 5, 6, 7, 8]
indices after max_imp selection: [4, 0, 1]
23.12.2 Parallel Coordinates Plot
spot_tuner.parallel_plot()
Parallel coordinates plots
23.12.3 Cross Validation With Lightning
- The KFold class from sklearn.model_selection is used to generate the folds for cross-validation. This mechanism is used to generate the folds for the final evaluation of the model; a minimal KFold sketch is shown after this list.
- The CrossValidationDataModule class [SOURCE] is used to generate the folds for the hyperparameter tuning process. It is called from the cv_model function [SOURCE].
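As referenced above, a standalone KFold sketch (plain sklearn, independent of spotPython's CrossValidationDataModule; with n_splits=2 the fold sizes match the 221/221 reported below):

from sklearn.model_selection import KFold

kf = KFold(n_splits=2, shuffle=True, random_state=123)
for k, (train_idx, val_idx) in enumerate(kf.split(X)):
    # Each fold yields disjoint train/validation index arrays
    print(f"k: {k}, train size: {len(train_idx)}, val size: {len(val_idx)}")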
from spotPython.light.cvmodel import cv_model
from spotPython.hyperparameters.values import set_control_key_value
set_control_key_value(control_dict=fun_control,
                      key="k_folds",
                      value=2,
                      replace=True)
set_control_key_value(control_dict=fun_control,
                      key="test_size",
                      value=0.6,
                      replace=True)
cv_model(config, fun_control)
k: 0
Train Dataset Size: 221
Val Dataset Size: 221
train_model result: {'val_loss': 2929.623046875, 'hp_metric': 2929.623046875}
k: 1
Train Dataset Size: 221
Val Dataset Size: 221
train_model result: {'val_loss': 2964.51953125, 'hp_metric': 2964.51953125}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2929.623046875 │ │ val_loss │ 2929.623046875 │ └───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2964.51953125 │ │ val_loss │ 2964.51953125 │ └───────────────────────────┴───────────────────────────┘
2947.0712890625
23.13 Test on the full data set
from spotPython.light.testmodel import test_model
test_model(config, fun_control)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.16, val_size: 0.24 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 106
LightDataModule.train_dataloader(). data_train size: 71
LightDataModule.setup(): stage: TrainerFn.TESTING
test_size: 0.6 used for test dataset.
LightDataModule.test_dataloader(). Test set size: 266
test_model result: {'val_loss': 3239.71240234375, 'hp_metric': 3239.71240234375}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Test metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 3239.71240234375 │ │ val_loss │ 3239.71240234375 │ └───────────────────────────┴───────────────────────────┘
(3239.71240234375, 3239.71240234375)
23.14 Load the last model
from spotPython.light.loadmodel import load_light_from_checkpoint
model_loaded = load_light_from_checkpoint(config, fun_control)
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TEST from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TEST/last.ckpt
Model: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
23.15 Attributions
from spotPython.utils.file import load_experiment
from spotPython.hyperparameters.values import get_tuned_architecture
from spotPython.plot.xai import get_attributions, plot_attributions
spot_tuner, fun_control, design_control, surrogate_control, optimizer_control = load_experiment("spot_031_experiment.pickle")
config = get_tuned_architecture(spot_tuner, fun_control)
feature_names = fun_control["data_set"].names
23.15.1 Integrated Gradients
df = get_attributions(spot_tuner, fun_control, attr_method="IntegratedGradients")
print(df)
plot_attributions(df)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2771.93896484375, 'hp_metric': 2771.93896484375}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
Feature Index Feature IntegratedGradientsAttribution
0 2 bmi 26.410578
1 3 bp 21.924248
2 0 age 21.223116
3 8 s5_ltg 18.085563
4 9 s6_glu 13.316550
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2771.93896484375 │ │ val_loss │ 2771.93896484375 │ └───────────────────────────┴───────────────────────────┘
23.15.2 Deep Lift
df = get_attributions(spot_tuner, fun_control, attr_method="DeepLift")
print(df)
plot_attributions(df, attr_method="DeepLift")
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2506.58203125, 'hp_metric': 2506.58203125}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
Feature Index Feature DeepLiftAttribution
0 2 bmi 39.669300
1 3 bp 30.352863
2 0 age 28.014923
3 8 s5_ltg 27.539591
4 9 s6_glu 16.869137
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2506.58203125 │ │ val_loss │ 2506.58203125 │ └───────────────────────────┴───────────────────────────┘
23.15.3 Feature Ablation
df = get_attributions(spot_tuner, fun_control, attr_method="FeatureAblation")
print(df)
plot_attributions(df, attr_method="FeatureAblation")
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2556.08251953125, 'hp_metric': 2556.08251953125}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
Feature Index Feature FeatureAblationAttribution
0 2 bmi 30.711384
1 3 bp 25.567238
2 8 s5_ltg 22.287161
3 0 age 21.791470
4 9 s6_glu 15.028356
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2556.08251953125 │ │ val_loss │ 2556.08251953125 │ └───────────────────────────┴───────────────────────────┘
23.16 Visualizing the Activations, Weights, and Gradients
In neural networks, activations, weights, and gradients are fundamental concepts that play different roles.
Activations:
Activations refer to the outputs of neurons after applying an activation function. In neural networks, the input passes through each neuron of the network layers, where each unit calculates a weighted sum of its inputs and then applies a non-linear activation function (such as ReLU, Sigmoid, or Tanh). These activation functions help introduce non-linearity into the model, enabling the neural network to learn complex relationships between the input data and the predictions. In short, activations are the outputs that are forwarded by the neurons after applying the activation function.
Weights:
Weights are parameters within a neural network that control the strength of the connection between two neurons in successive layers. They are adjusted during the training process to enable the neural network to perform the desired task as well as possible. Each input is multiplied by a weight, and the neural network learns by adjusting these weights based on the error between the predictions and the actual values. Adjusting the weights allows the network to recognize patterns and relationships in the input data and use them for predictions or classifications.
Gradients:
In the context of machine learning and specifically in neural networks, gradients are a measure of the rate of change or the slope of the loss function (a function that measures how well the network performs in predicting the desired output) with respect to the weights. During the training process, the goal is to minimize the value of the loss function to improve the model’s performance. The gradients indicate the direction and size of the steps that need to be taken to adjust the weights in a way that minimizes the loss (known as gradient descent). By repeatedly adjusting the weights in the opposite direction of the gradient, the network can be effectively trained to improve its prediction accuracy.
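All three quantities can be inspected directly in PyTorch. The following standalone sketch (a hypothetical helper, not spotPython's implementation) collects activations with forward hooks and reads weights and gradients after one backward pass:

import torch
import torch.nn as nn

# Toy network for illustration; the dimensions are arbitrary assumptions.
net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()  # store the layer's output
    return hook

for name, module in net.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(name))

x = torch.randn(8, 10)
loss = net(x).pow(2).mean()
loss.backward()

print(activations.keys())                      # activations per ReLU layer
print(net[0].weight.shape)                     # weights of the first layer
print(net[0].weight.grad.abs().mean().item())  # mean gradient magnitude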
- The following code is based on [PyTorch Lightning TUTORIAL 2: ACTIVATION FUNCTIONS], Author: Phillip Lippe, License: [CC BY-SA], Generated: 2023-03-15T09:52:39.179933.
After we have trained the models, we can look at the actual activation values inside the model. For instance, how many neurons are set to zero in ReLU? Where do we find most values in Tanh? To answer these questions, we can write a simple function which takes a trained model, applies it to a batch of data, and plots the histogram of the activations inside the network:
from spotPython.plot.xai import (get_activations, get_gradients, get_weights, plot_nn_values_hist, plot_nn_values_scatter, visualize_weights, visualize_gradients, visualize_activations, visualize_activations_distributions, visualize_gradient_distributions, visualize_weights_distributions)
import pprint
from spotPython.utils.file import load_experiment
= "031"
PREFIX = "spot_" + PREFIX + "_experiment.pickle"
experiment_name = load_experiment(experiment_name) spot_tuner, fun_control, design_control, surrogate_control, optimizer_control
from spotPython.hyperparameters.values import get_tuned_architecture
= get_tuned_architecture(spot_tuner, fun_control)
config
pprint.pprint(config)= config["batch_size"]
batch_size print(batch_size)
{'act_fn': ReLU(),
'batch_size': 32,
'dropout_prob': 0.04480646755985472,
'epochs': 1024,
'initialization': 'Default',
'l1': 64,
'lr_mult': 2.166650746218857,
'optimizer': 'AdamW',
'patience': 64}
32
from spotPython.light.loadmodel import load_light_from_checkpoint
model_loaded = load_light_from_checkpoint(config, fun_control)
model = model_loaded.to("cpu")
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TEST from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TEST/last.ckpt
Model: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
23.16.1 Weights
weights, index = get_weights(model, return_index=True)
print(index)
[0, 3, 6, 9, 12]
visualize_weights(model, absolute=True, cmap="gray", figsize=(6, 6))
640 values in Layer Layer 0.
36 padding values added.
676 values now in Layer Layer 0.
2048 values in Layer Layer 3.
68 padding values added.
2116 values now in Layer Layer 3.
1024 values in Layer Layer 6.
1024 values now in Layer Layer 6.
512 values in Layer Layer 9.
17 padding values added.
529 values now in Layer Layer 9.
16 values in Layer Layer 12.
16 values now in Layer Layer 12.
=f"C{0}") visualize_weights_distributions(model, color
n:5
23.16.2 Activations
activations = get_activations(model, fun_control=fun_control, batch_size=batch_size, device="cpu")
net: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
visualize_activations(model, fun_control=fun_control, batch_size=batch_size, device="cpu", cmap="BlueWhiteRed", absolute=False)
net: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
2048 values in Layer 0.
68 padding values added.
2116 values now in Layer 0.
1024 values in Layer 3.
1024 values now in Layer 3.
1024 values in Layer 6.
1024 values now in Layer 6.
512 values in Layer 9.
17 padding values added.
529 values now in Layer 9.
32 values in Layer 12.
4 padding values added.
36 values now in Layer 12.
- Absolute values of the activations are plotted:
visualize_activations(model, fun_control=fun_control, batch_size=batch_size, device="cpu", absolute=True)
net: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
2048 values in Layer 0.
68 padding values added.
2116 values now in Layer 0.
1024 values in Layer 3.
1024 values now in Layer 3.
1024 values in Layer 6.
1024 values now in Layer 6.
512 values in Layer 9.
17 padding values added.
529 values now in Layer 9.
32 values in Layer 12.
4 padding values added.
36 values now in Layer 12.
visualize_activations_distributions(net=model, fun_control=fun_control, batch_size=batch_size, device="cpu", color="C0", columns=2)
net: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
n:5
23.16.3 Gradients
gradients = get_gradients(model, fun_control, batch_size, device="cpu")
visualize_gradients(model, fun_control, batch_size, absolute=True, cmap="BlueWhiteRed", figsize=(6, 6))
640 values in Layer layers.0.weight.
36 padding values added.
676 values now in Layer layers.0.weight.
2048 values in Layer layers.3.weight.
68 padding values added.
2116 values now in Layer layers.3.weight.
1024 values in Layer layers.6.weight.
1024 values now in Layer layers.6.weight.
512 values in Layer layers.9.weight.
17 padding values added.
529 values now in Layer layers.9.weight.
16 values in Layer layers.12.weight.
16 values now in Layer layers.12.weight.
visualize_gradient_distributions(model, fun_control, batch_size=batch_size, color=f"C{0}")
n:5
23.17 Layer Conductance
from spotPython.plot.xai import get_weights_conductance_last_layer, plot_conductance_last_layer
w, c = get_weights_conductance_last_layer(spot_tuner, fun_control)
plot_conductance_last_layer(w, c)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2681.980712890625, 'hp_metric': 2681.980712890625}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.81, val_size: 0.09 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 39
LightDataModule.train_dataloader(). data_train size: 359
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 39
train_model result: {'val_loss': 2596.02734375, 'hp_metric': 2596.02734375}
config: {'l1': 64, 'epochs': 1024, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'AdamW', 'dropout_prob': 0.04480646755985472, 'lr_mult': 2.166650746218857, 'patience': 64, 'initialization': 'Default'}
Loading model with 64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN from runs/saved_models/64_1024_32_ReLU_AdamW_0.0448_2.1667_64_Default_TRAIN/last.ckpt
Model: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.04480646755985472, inplace=False)
(3): Linear(in_features=64, out_features=32, bias=True)
(4): ReLU()
(5): Dropout(p=0.04480646755985472, inplace=False)
(6): Linear(in_features=32, out_features=32, bias=True)
(7): ReLU()
(8): Dropout(p=0.04480646755985472, inplace=False)
(9): Linear(in_features=32, out_features=16, bias=True)
(10): ReLU()
(11): Dropout(p=0.04480646755985472, inplace=False)
(12): Linear(in_features=16, out_features=1, bias=True)
)
)
Conductance analysis for layer: Linear(in_features=16, out_features=1, bias=True)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2681.980712890625 │ │ val_loss │ 2681.980712890625 │ └───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ Validate metric ┃ DataLoader 0 ┃ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ hp_metric │ 2596.02734375 │ │ val_loss │ 2596.02734375 │ └───────────────────────────┴───────────────────────────┘
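For readers who want to reproduce such a conductance analysis by hand, here is a minimal sketch with Captum (that spotPython's xai helpers build on Captum is an assumption; the random stand-in batch is hypothetical, and in practice the real Diabetes inputs would be used):

# Hedged sketch: layer conductance for the last linear layer of the loaded model
import torch
from captum.attr import LayerConductance

model.eval()                                  # disable dropout for attribution
last_layer = model.layers[12]                 # Linear(in_features=16, out_features=1)
lc = LayerConductance(model, last_layer)

inputs = torch.randn(32, 10)                  # stand-in batch of 32 samples
cond = lc.attribute(inputs)                   # conductance per neuron of the layer
print(cond.shape)                             # expected: torch.Size([32, 1])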