from spotPython.utils.device import getDevice
from math import inf
MAX_TIME = 1
FUN_EVALS = inf
FUN_REPEATS = 1
OCBA_DELTA = 0
REPEATS = 1
INIT_SIZE = 3
WORKERS = 0
PREFIX = "033"
DEVICE = getDevice()
DEVICES = 1
TEST_SIZE = 0.3
TORCH_METRIC = "mean_squared_error"
25 HPT PyTorch Lightning: User Specified Data Set and Regression Model
In this tutorial, we will show how spotPython can be integrated into the PyTorch Lightning training workflow for a regression task with a user specified data set and a user specified regression model.
This chapter describes the hyperparameter tuning of a PyTorch Lightning network on a user data set, which can be found in the subfolder userData of this notebook. The network can be found in the subfolder userModel.
25.1 Step 1: Setup
- Before we consider the detailed experimental setup, we select the parameters that affect run time, initial design size, etc.
- The parameter MAX_TIME specifies the maximum run time in minutes.
- The parameter INIT_SIZE specifies the initial design size.
- The parameter WORKERS specifies the number of workers.
- The prefix PREFIX is used for the experiment name and the name of the log file.
- The parameter DEVICE specifies the device to use for training.
- MAX_TIME is set to one minute for demonstration purposes. For real experiments, this should be increased to at least 1 hour.
- INIT_SIZE is set to a small value for demonstration purposes. For real experiments, this should be increased to at least 10.
- WORKERS is set to 0 for demonstration purposes. For real experiments, this should be increased. See the warnings that are printed when the number of workers is set to 0.
- Although there are no .cuda() or .to(device) calls required, because Lightning does these for you (see LIGHTNINGMODULE), we would like to know which device is used. Therefore, we imitate the LightningModule behaviour, which selects the highest device.
- The method spotPython.utils.device.getDevice() returns the device that is used by Lightning.
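The selection rule can be pictured with a small sketch. It assumes, purely for illustration, a priority order of CUDA before MPS before CPU. The function pick_device and its two flags are hypothetical names and are not taken from the implementation of getDevice().

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device name using an assumed priority: CUDA > MPS > CPU."""
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"


# With PyTorch installed, the flags could be obtained from
# torch.cuda.is_available() and torch.backends.mps.is_available().
selected = pick_device(cuda_available=False, mps_available=True)
print(selected)
```

On a machine without CUDA but with Apple's MPS backend, this sketch selects "mps", which is the device name that appears in the fun_control dictionary printed later in this chapter.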
25.2 Step 2: Initialization of the fun_control Dictionary
spotPython uses a Python dictionary for storing the information required for the hyperparameter tuning process.
from spotPython.utils.init import fun_control_init
import numpy as np
fun_control = fun_control_init(
    _L_in=6,
    _L_out=1,
    _torchmetric=TORCH_METRIC,
    PREFIX=PREFIX,
    TENSORBOARD_CLEAN=True,
    device=DEVICE,
    enable_progress_bar=False,
    fun_evals=FUN_EVALS,
    fun_repeats=FUN_REPEATS,
    log_level=50,
    max_time=MAX_TIME,
    num_workers=WORKERS,
    ocba_delta=OCBA_DELTA,
    show_progress=True,
    test_size=TEST_SIZE,
    tolerance_x=np.sqrt(np.spacing(1)),
    verbosity=1,
    )
Moving TENSORBOARD_PATH: runs/ to TENSORBOARD_PATH_OLD: runs_OLD/runs_2024_04_22_02_01_25
Created spot_tensorboard_path: runs/spot_logs/033_maans14_2024-04-22_02-01-25 for SummaryWriter()
25.3 Step 3: Loading the User Specified Data Set
# from spotPython.hyperparameters.values import set_control_key_value
# from spotPython.data.pkldataset import PKLDataset
# import torch
# dataset = PKLDataset(directory="./userData/",
# filename="data_sensitive.pkl",
# target_column='N',
# feature_type=torch.float32,
# target_type=torch.float32,
# rmNA=True)
# set_control_key_value(control_dict=fun_control,
# key="data_set",
# value=dataset,
# replace=True)
# print(len(dataset))
- As shown below, a DataLoader from torch.utils.data can be used to check the data.
# if the package pyhcf is installed then print "pyhcf is installed" else print "pyhcf is not installed"
try:
    import pyhcf
    print("pyhcf is installed")
    from pyhcf.data.loadHcfData import load_hcf_data
    dataset = load_hcf_data(A=True, H=True,
                            param_list=['H', 'D', 'L', 'K', 'E', 'I', 'N'],
                            target='N', rmNA=True, rmMF=True, scale_data=True, return_X_y=False)
except ImportError:
    print("pyhcf is not installed")
    from spotPython.data.pkldataset import PKLDataset
    import torch
    dataset = PKLDataset(directory="./userData/",
                         filename="data_sensitive.pkl",
                         target_column='N',
                         feature_type=torch.float32,
                         target_type=torch.float32,
                         rmNA=True)
pyhcf is installed
Loading data for ['H', 'D', 'L', 'K', 'E', 'I', 'N']...
from spotPython.hyperparameters.values import set_control_key_value
set_control_key_value(control_dict=fun_control,
                      key="data_set",
                      value=dataset,
                      replace=True)
print(len(dataset))
41837
# Set batch size for DataLoader
batch_size = 5
# Create DataLoader
from torch.utils.data import DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

# Iterate over the data in the DataLoader
for batch in dataloader:
    inputs, targets = batch
    print(f"Batch Size: {inputs.size(0)}")
    print(f"Inputs Shape: {inputs.shape}")
    print(f"Targets Shape: {targets.shape}")
    print("---------------")
    print(f"Inputs: {inputs}")
    print(f"Targets: {targets}")
    break
Batch Size: 5
Inputs Shape: torch.Size([5, 6])
Targets Shape: torch.Size([5])
---------------
Inputs: tensor([[0.0033, 0.4000, 0.0000, 0.7500, 1.0000, 0.1667],
[0.0246, 0.4000, 0.0435, 0.7500, 1.0000, 0.1667],
[0.0275, 0.4000, 0.0435, 0.7500, 1.0000, 0.1667],
[0.0285, 0.4000, 0.0435, 0.7500, 1.0000, 0.1667],
[0.0285, 0.4000, 0.0435, 0.7500, 1.0000, 0.1667]])
Targets: tensor([4.5764, 4.9073, 6.2846, 5.5094, 5.6079])
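To make the batching concrete, the following plain-Python sketch mimics how consecutive batches cover a data set when shuffling is off and the last, smaller batch is kept (the DataLoader default). The helper iter_batches is a hypothetical stand-in and does not use PyTorch. The sample count is the value of len(dataset) printed above.

```python
def iter_batches(n_samples: int, batch_size: int):
    """Yield (start, stop) index pairs that cover n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


n_samples = 41837  # len(dataset) as printed above
batch_size = 5
batches = list(iter_batches(n_samples, batch_size))
n_batches = len(batches)
last_batch_size = batches[-1][1] - batches[-1][0]
print(n_batches, last_batch_size)
```

Because 41837 is not a multiple of 5, the final batch is smaller than the others. Code that inspects inputs.size(0) should therefore not assume a fixed batch size.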
25.4 Step 4: Preprocessing
Preprocessing is handled by Lightning and PyTorch. It is described in the LIGHTNINGDATAMODULE documentation. Here you can find information about the transforms methods.
25.5 Step 5: Select the Core Model (algorithm) and core_model_hyper_dict
spotPython includes the NetLightRegression class [SOURCE] for configurable neural networks. The class is imported here. It inherits from the class Lightning.LightningModule, which is the base class for all models in Lightning. Lightning.LightningModule is a subclass of torch.nn.Module and provides additional functionality for the training and testing of neural networks. The class Lightning.LightningModule is described in the Lightning documentation.
- Here we simply add the NN Model to the fun_control dictionary by calling the function add_core_model_to_fun_control:
We can use a configuration from the spotPython package:
from spotPython.light.regression.netlightregression import NetLightRegression
from spotPython.hyperdict.light_hyper_dict import LightHyperDict
from spotPython.hyperparameters.values import add_core_model_to_fun_control
add_core_model_to_fun_control(fun_control=fun_control,
                              core_model=NetLightRegression,
                              hyper_dict=LightHyperDict)
- Alternatively, we can use a user configuration from the subdirectory userModel:
from spotPython.hyperparameters.values import add_core_model_to_fun_control
import sys
sys.path.insert(0, './userModel')
import netlightregression
import light_hyper_dict
add_core_model_to_fun_control(fun_control=fun_control,
                              core_model=netlightregression.NetLightRegression,
                              hyper_dict=light_hyper_dict.LightHyperDict)
The hyperparameters of the model are specified in the core_model_hyper_dict dictionary [SOURCE].
25.6 Step 6: Modify hyper_dict Hyperparameters for the Selected Algorithm aka core_model
spotPython provides functions for modifying the hyperparameters, their bounds and factors as well as for activating and de-activating hyperparameters without re-compilation of the Python source code.
- epochs and patience are set to small values for demonstration purposes. These values are too small for a real application.
- More reasonable values are, e.g.: set_control_hyperparameter_value(fun_control, "epochs", [7, 9]) and set_control_hyperparameter_value(fun_control, "patience", [2, 7])
- The following hyperparameters (Table 25.1) have generated acceptable results (obtained in pre-experimental runs):
| Hyperparameter | Value |
|---|---|
| act_fn | LeakyReLU |
| batch_size | 16 |
| dropout_prob | 0.01 |
| epochs | 512 |
| initialization | Default |
| l1 | 128 |
| lr_mult | 0.5 |
| optimizer | Adagrad |
| patience | 16 |
Therefore, we will use these values as the starting point for the hyperparameter tuning.
from spotPython.hyperparameters.values import set_control_hyperparameter_value
set_control_hyperparameter_value(fun_control, "l1", [3, 4])
set_control_hyperparameter_value(fun_control, "epochs", [2, 4])
set_control_hyperparameter_value(fun_control, "batch_size", [3, 6])
set_control_hyperparameter_value(fun_control, "optimizer", [
                "Adadelta",
                "Adamax",
                "Adagrad"
            ])
set_control_hyperparameter_value(fun_control, "dropout_prob", [0.005, 0.25])
set_control_hyperparameter_value(fun_control, "lr_mult", [0.25, 5.0])
set_control_hyperparameter_value(fun_control, "patience", [2, 3])
set_control_hyperparameter_value(fun_control, "act_fn", [
                "ReLU",
                "LeakyReLU",
            ])
set_control_hyperparameter_value(fun_control, "initialization", ["Default"])
Setting hyperparameter l1 to value [3, 4].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter epochs to value [2, 4].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter batch_size to value [3, 6].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter optimizer to value ['Adadelta', 'Adamax', 'Adagrad'].
Variable type is factor.
Core type is str.
Calling modify_hyper_parameter_levels().
Setting hyperparameter dropout_prob to value [0.005, 0.25].
Variable type is float.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter lr_mult to value [0.25, 5.0].
Variable type is float.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter patience to value [2, 3].
Variable type is int.
Core type is None.
Calling modify_hyper_parameter_bounds().
Setting hyperparameter act_fn to value ['ReLU', 'LeakyReLU'].
Variable type is factor.
Core type is instance().
Calling modify_hyper_parameter_levels().
Setting hyperparameter initialization to value ['Default'].
Variable type is factor.
Core type is str.
Calling modify_hyper_parameter_levels().
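The log output distinguishes two cases: numeric hyperparameters have their bounds modified, whereas factor hyperparameters have their levels modified. The sketch below illustrates that dispatch on a toy dictionary. The helper set_hyperparameter and the shortened level list are hypothetical and only mirror the structure of the entries in core_model_hyper_dict. They are not the spotPython implementation.

```python
def set_hyperparameter(hyper_dict: dict, name: str, value: list) -> None:
    """Hypothetical helper: update levels for factors, bounds otherwise."""
    entry = hyper_dict[name]
    if entry["type"] == "factor":
        # Factors are encoded as integer indices into the list of levels.
        entry["levels"] = list(value)
        entry["lower"] = 0
        entry["upper"] = len(value) - 1
    else:
        # Numeric hyperparameters receive new lower and upper bounds.
        entry["lower"], entry["upper"] = value


# Toy dictionary with a shortened (illustrative) list of optimizer levels.
hyper_dict = {
    "l1": {"type": "int", "default": 3, "lower": 3, "upper": 8},
    "optimizer": {"type": "factor", "default": "SGD",
                  "levels": ["Adadelta", "Adagrad", "Adam", "SGD"],
                  "lower": 0, "upper": 3},
}
set_hyperparameter(hyper_dict, "l1", [3, 4])
set_hyperparameter(hyper_dict, "optimizer", ["Adadelta", "Adamax", "Adagrad"])
print(hyper_dict["l1"]["upper"], hyper_dict["optimizer"]["upper"])
```

Encoding factors as indices explains why the design table reports lower 0 and upper 2 for the optimizer after three levels were selected.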
Now, the dictionary fun_control contains all information needed for the hyperparameter tuning. Before the hyperparameter tuning is started, it is recommended to take a look at the experimental design. The method gen_design_table [SOURCE] generates a design table as follows:
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control))
| name | type | default | lower | upper | transform |
|----------------|--------|-----------|---------|---------|-----------------------|
| l1 | int | 3 | 3 | 4 | transform_power_2_int |
| epochs | int | 4 | 2 | 4 | transform_power_2_int |
| batch_size | int | 4 | 3 | 6 | transform_power_2_int |
| act_fn | factor | ReLU | 0 | 1 | None |
| optimizer | factor | SGD | 0 | 2 | None |
| dropout_prob | float | 0.01 | 0.005 | 0.25 | None |
| lr_mult | float | 1.0 | 0.25 | 5 | None |
| patience | int | 2 | 2 | 3 | transform_power_2_int |
| initialization | factor | Default | 0 | 0 | None |
This allows checking whether all information is available and correct.
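The transform column matters when reading the bounds: for hyperparameters marked with transform_power_2_int, the lower and upper values are exponents, not the values passed to the network. The following sketch assumes the transform is simply 2 raised to the given integer, which is consistent with the configurations logged later in this chapter. The function body is an illustration, not a copy of the library code.

```python
def transform_power_2_int(x: int) -> int:
    """Sketch of the transform named in the table: map an exponent to 2**x."""
    return int(2 ** x)


# Bounds as listed in the design table (exponents).
bounds = {"l1": (3, 4), "epochs": (2, 4), "batch_size": (3, 6), "patience": (2, 3)}

# Effective ranges seen by the model after the transform.
effective = {name: (transform_power_2_int(lo), transform_power_2_int(hi))
             for name, (lo, hi) in bounds.items()}
print(effective)
```

Under this assumption, l1 ranges from 8 to 16 neurons and batch_size from 8 to 64 samples.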
fun_control Dictionary
The updated fun_control dictionary can be shown with the command fun_control["core_model_hyper_dict"].
25.7 Step 7: Data Splitting, the Objective (Loss) Function and the Metric
25.7.1 Evaluation
The evaluation procedure requires the specification of two elements:
- how the data is split into a train and a test set
- the loss function (and a metric).
The data splitting is handled by Lightning.
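The split sizes that appear in the logs of this chapter (train_size 0.49, val_size 0.21, test_size 0.3) can be reproduced with a short calculation. The sketch below follows the rule documented for fractional lengths in torch.utils.data.random_split: each length is floored and any remainder is handed out one sample at a time, starting with the first split. Whether the data module uses exactly this call is an assumption. The helper split_lengths is hypothetical.

```python
import math


def split_lengths(n_samples: int, fractions: list) -> list:
    """Floor each fractional length, then distribute the remainder round-robin."""
    lengths = [math.floor(n_samples * f) for f in fractions]
    remainder = n_samples - sum(lengths)
    for i in range(remainder):
        lengths[i % len(lengths)] += 1
    return lengths


n_samples = 41837  # size of the data set used in this chapter

# Fitting stage: train, validation, and test fractions.
fit_split = split_lengths(n_samples, [0.49, 0.21, 0.3])
# Testing stage: test fraction and the rest.
test_split = split_lengths(n_samples, [0.3, 0.7])
print(fit_split, test_split)
```

The validation fraction 0.21 is 30 percent of the 70 percent that remains after the test set is removed, and 0.49 is the rest. The rounding rule also explains why split sizes computed in different stages can differ by one sample.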
25.7.2 Loss Function
The loss function is specified in the configurable network class [SOURCE]. We will use MSE.
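For reference, the mean squared error averages the squared differences between targets and predictions: MSE = (1/n) * sum((y_i - yhat_i)^2). The plain-Python sketch below computes it for three target values taken from the batch printed earlier. The predictions are made up for illustration, and the function is a stand-in, not the implementation used in the network class.

```python
def mean_squared_error(y_true: list, y_pred: list) -> float:
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


targets = [4.5764, 4.9073, 6.2846]  # first three targets of the batch above
predictions = [5.0, 5.0, 5.0]       # made-up predictions for illustration
mse = mean_squared_error(targets, predictions)
print(mse)
```

Because the errors are squared, the result is expressed in squared units of the target, so a validation loss near 29 corresponds to a typical error of roughly 5 in the units of the target.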
25.7.3 Metric
- Similar to the loss function, the metric is specified in the configurable network class [SOURCE].
- The loss function and the metric are not hyperparameters that can be tuned with spotPython.
- They are handled by Lightning.
25.8 Step 8: Calling the SPOT Function
25.8.1 Preparing the SPOT Call
from spotPython.utils.init import design_control_init, surrogate_control_init
design_control = design_control_init(init_size=INIT_SIZE,
                                     repeats=REPEATS,)

surrogate_control = surrogate_control_init(noise=True,
                                           n_theta=2,
                                           min_Lambda=1e-6,
                                           max_Lambda=10,
                                           log_level=50,)
- The values in the control dictionaries can be modified with the function set_control_key_value [SOURCE], for example:
set_control_key_value(control_dict=surrogate_control,
key="noise",
value=True,
replace=True)
set_control_key_value(control_dict=surrogate_control,
key="n_theta",
value=2,
replace=True)
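The role of the replace argument can be illustrated with a toy function. The sketch assumes that an existing key is overwritten only when replace is True and that a missing key is always added. This reading is an assumption about set_control_key_value, and set_key_value below is a hypothetical stand-in operating on a plain dictionary.

```python
def set_key_value(control_dict: dict, key: str, value, replace: bool = False) -> None:
    """Hypothetical stand-in: add a key, overwrite it only if replace is True."""
    if replace or key not in control_dict:
        control_dict[key] = value


surrogate = {"noise": False, "n_theta": 1}
set_key_value(surrogate, "noise", True, replace=True)  # existing key, overwritten
set_key_value(surrogate, "n_theta", 2)                 # existing key, kept as is
set_key_value(surrogate, "log_level", 50)              # new key, added
print(surrogate)
```

Under this assumption, passing replace=True, as in the calls above, is what guarantees that a value set during initialization is actually changed.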
25.8.2 The Objective Function fun
The objective function fun from the class HyperLight [SOURCE] is selected next. It implements an interface from PyTorch’s training, validation, and testing methods to spotPython.
from spotPython.fun.hyperlight import HyperLight
fun = HyperLight(log_level=50).fun
25.8.3 Showing the fun_control Dictionary
import pprint
pprint.pprint(fun_control)
{'CHECKPOINT_PATH': 'runs/saved_models/',
'DATASET_PATH': 'data/',
'PREFIX': '033',
'RESULTS_PATH': 'results/',
'TENSORBOARD_PATH': 'runs/',
'_L_in': 6,
'_L_out': 1,
'_torchmetric': 'mean_squared_error',
'accelerator': 'auto',
'converters': None,
'core_model': <class 'netlightregression.NetLightRegression'>,
'core_model_hyper_dict': {'act_fn': {'class_name': 'spotPython.torch.activation',
'core_model_parameter_type': 'instance()',
'default': 'ReLU',
'levels': ['ReLU', 'LeakyReLU'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 1},
'batch_size': {'default': 4,
'lower': 3,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 6},
'dropout_prob': {'default': 0.01,
'lower': 0.005,
'transform': 'None',
'type': 'float',
'upper': 0.25},
'epochs': {'default': 4,
'lower': 2,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 4},
'initialization': {'core_model_parameter_type': 'str',
'default': 'Default',
'levels': ['Default'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 0},
'l1': {'default': 3,
'lower': 3,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 4},
'lr_mult': {'default': 1.0,
'lower': 0.25,
'transform': 'None',
'type': 'float',
'upper': 5.0},
'optimizer': {'class_name': 'torch.optim',
'core_model_parameter_type': 'str',
'default': 'SGD',
'levels': ['Adadelta',
'Adamax',
'Adagrad'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 2},
'patience': {'default': 2,
'lower': 2,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 3}},
'core_model_hyper_dict_default': {'act_fn': {'class_name': 'spotPython.torch.activation',
'core_model_parameter_type': 'instance()',
'default': 'ReLU',
'levels': ['Sigmoid',
'Tanh',
'ReLU',
'LeakyReLU',
'ELU',
'Swish'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 5},
'batch_size': {'default': 4,
'lower': 1,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 4},
'dropout_prob': {'default': 0.01,
'lower': 0.0,
'transform': 'None',
'type': 'float',
'upper': 0.25},
'epochs': {'default': 4,
'lower': 4,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 9},
'initialization': {'core_model_parameter_type': 'str',
'default': 'Default',
'levels': ['Default',
'Kaiming',
'Xavier'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 2},
'l1': {'default': 3,
'lower': 3,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 8},
'lr_mult': {'default': 1.0,
'lower': 0.1,
'transform': 'None',
'type': 'float',
'upper': 10.0},
'optimizer': {'class_name': 'torch.optim',
'core_model_parameter_type': 'str',
'default': 'SGD',
'levels': ['Adadelta',
'Adagrad',
'Adam',
'AdamW',
'SparseAdam',
'Adamax',
'ASGD',
'NAdam',
'RAdam',
'RMSprop',
'Rprop',
'SGD'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 11},
'patience': {'default': 2,
'lower': 2,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 6}},
'core_model_name': None,
'counter': 0,
'data': None,
'data_dir': './data',
'data_module': None,
'data_set': <torch.utils.data.dataset.TensorDataset object at 0x38f6f9110>,
'data_set_name': None,
'db_dict_name': None,
'design': None,
'device': 'mps',
'devices': 1,
'enable_progress_bar': False,
'eval': None,
'fun_evals': inf,
'fun_repeats': 1,
'horizon': None,
'infill_criterion': 'y',
'k_folds': 3,
'log_graph': False,
'log_level': 50,
'loss_function': None,
'lower': array([3. , 4. , 1. , 0. , 0. , 0. , 0.1, 2. , 0. ]),
'max_surrogate_points': 30,
'max_time': 1,
'metric_params': {},
'metric_river': None,
'metric_sklearn': None,
'metric_sklearn_name': None,
'metric_torch': None,
'model_dict': {},
'n_points': 1,
'n_samples': None,
'n_total': None,
'noise': False,
'num_workers': 0,
'ocba_delta': 0,
'oml_grace_period': None,
'optimizer': None,
'path': None,
'prep_model': None,
'prep_model_name': None,
'progress_file': None,
'save_model': False,
'scenario': None,
'seed': 123,
'show_batch_interval': 1000000,
'show_models': False,
'show_progress': True,
'shuffle': None,
'sigma': 0.0,
'spot_tensorboard_path': 'runs/spot_logs/033_maans14_2024-04-22_02-01-25',
'spot_writer': <torch.utils.tensorboard.writer.SummaryWriter object at 0x1753aab50>,
'target_column': None,
'target_type': None,
'task': None,
'test': None,
'test_seed': 1234,
'test_size': 0.3,
'tolerance_x': 1.4901161193847656e-08,
'train': None,
'upper': array([ 8. , 9. , 4. , 5. , 11. , 0.25, 10. , 6. , 2. ]),
'var_name': ['l1',
'epochs',
'batch_size',
'act_fn',
'optimizer',
'dropout_prob',
'lr_mult',
'patience',
'initialization'],
'var_type': ['int',
'int',
'int',
'factor',
'factor',
'float',
'float',
'int',
'factor'],
'verbosity': 1,
'weight_coeff': 0.0,
'weights': 1.0,
'weights_entry': None}
25.8.4 Starting the Hyperparameter Tuning
The spotPython hyperparameter tuning is started by calling the Spot function [SOURCE].
from spotPython.spot import spot
spot_tuner = spot.Spot(fun=fun,
                       fun_control=fun_control,
                       design_control=design_control,
                       surrogate_control=surrogate_control)
spot_tuner.run()
In fun(): config:
{'act_fn': LeakyReLU(),
'batch_size': 16,
'dropout_prob': 0.020345615289778483,
'epochs': 8,
'initialization': 'Default',
'l1': 16,
'lr_mult': 3.5380370864571606,
'optimizer': 'Adamax',
'patience': 8}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.49, val_size: 0.21 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 8785
LightDataModule.train_dataloader(). data_train size: 20501
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 8785
train_model result: {'val_loss': 28.93893051147461, 'hp_metric': 28.93893051147461}
In fun(): config:
{'act_fn': ReLU(),
'batch_size': 16,
'dropout_prob': 0.23254269132436722,
'epochs': 4,
'initialization': 'Default',
'l1': 8,
'lr_mult': 0.6593438339617097,
'optimizer': 'Adadelta',
'patience': 4}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.49, val_size: 0.21 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 8785
LightDataModule.train_dataloader(). data_train size: 20501
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 8785
train_model result: {'val_loss': 40.207130432128906, 'hp_metric': 40.207130432128906}
In fun(): config:
{'act_fn': LeakyReLU(),
'batch_size': 32,
'dropout_prob': 0.15478450721867254,
'epochs': 16,
'initialization': 'Default',
'l1': 8,
'lr_mult': 2.628500799878493,
'optimizer': 'Adagrad',
'patience': 8}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.49, val_size: 0.21 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 8785
LightDataModule.train_dataloader(). data_train size: 20501
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 8785
train_model result: {'val_loss': 30.875810623168945, 'hp_metric': 30.875810623168945}
In fun(): config:
{'act_fn': LeakyReLU(),
'batch_size': 8,
'dropout_prob': 0.005,
'epochs': 4,
'initialization': 'Default',
'l1': 16,
'lr_mult': 5.0,
'optimizer': 'Adamax',
'patience': 8}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.49, val_size: 0.21 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 8785
LightDataModule.train_dataloader(). data_train size: 20501
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 8785
train_model result: {'val_loss': 30.511211395263672, 'hp_metric': 30.511211395263672}
spotPython tuning: 28.93893051147461 [########--] 75.44%
In fun(): config:
{'act_fn': LeakyReLU(),
'batch_size': 8,
'dropout_prob': 0.005,
'epochs': 4,
'initialization': 'Default',
'l1': 16,
'lr_mult': 3.7678241462885973,
'optimizer': 'Adamax',
'patience': 8}
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.49, val_size: 0.21 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 8785
LightDataModule.train_dataloader(). data_train size: 20501
LightDataModule.setup(): stage: TrainerFn.VALIDATING
LightDataModule.val_dataloader(). Val. set size: 8785
train_model result: {'val_loss': 30.22126579284668, 'hp_metric': 30.22126579284668}
spotPython tuning: 28.93893051147461 [##########] 100.00% Done...
{'CHECKPOINT_PATH': 'runs/saved_models/',
'DATASET_PATH': 'data/',
'PREFIX': '033',
'RESULTS_PATH': 'results/',
'TENSORBOARD_PATH': 'runs/',
'_L_in': 6,
'_L_out': 1,
'_torchmetric': 'mean_squared_error',
'accelerator': 'auto',
'converters': None,
'core_model': <class 'netlightregression.NetLightRegression'>,
'core_model_hyper_dict': {'act_fn': {'class_name': 'spotPython.torch.activation',
'core_model_parameter_type': 'instance()',
'default': 'ReLU',
'levels': ['ReLU', 'LeakyReLU'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 1},
'batch_size': {'default': 4,
'lower': 3,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 6},
'dropout_prob': {'default': 0.01,
'lower': 0.005,
'transform': 'None',
'type': 'float',
'upper': 0.25},
'epochs': {'default': 4,
'lower': 2,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 4},
'initialization': {'core_model_parameter_type': 'str',
'default': 'Default',
'levels': ['Default'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 0},
'l1': {'default': 3,
'lower': 3,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 4},
'lr_mult': {'default': 1.0,
'lower': 0.25,
'transform': 'None',
'type': 'float',
'upper': 5.0},
'optimizer': {'class_name': 'torch.optim',
'core_model_parameter_type': 'str',
'default': 'SGD',
'levels': ['Adadelta',
'Adamax',
'Adagrad'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 2},
'patience': {'default': 2,
'lower': 2,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 3}},
'core_model_hyper_dict_default': {'act_fn': {'class_name': 'spotPython.torch.activation',
'core_model_parameter_type': 'instance()',
'default': 'ReLU',
'levels': ['Sigmoid',
'Tanh',
'ReLU',
'LeakyReLU',
'ELU',
'Swish'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 5},
'batch_size': {'default': 4,
'lower': 1,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 4},
'dropout_prob': {'default': 0.01,
'lower': 0.0,
'transform': 'None',
'type': 'float',
'upper': 0.25},
'epochs': {'default': 4,
'lower': 4,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 9},
'initialization': {'core_model_parameter_type': 'str',
'default': 'Default',
'levels': ['Default',
'Kaiming',
'Xavier'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 2},
'l1': {'default': 3,
'lower': 3,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 8},
'lr_mult': {'default': 1.0,
'lower': 0.1,
'transform': 'None',
'type': 'float',
'upper': 10.0},
'optimizer': {'class_name': 'torch.optim',
'core_model_parameter_type': 'str',
'default': 'SGD',
'levels': ['Adadelta',
'Adagrad',
'Adam',
'AdamW',
'SparseAdam',
'Adamax',
'ASGD',
'NAdam',
'RAdam',
'RMSprop',
'Rprop',
'SGD'],
'lower': 0,
'transform': 'None',
'type': 'factor',
'upper': 11},
'patience': {'default': 2,
'lower': 2,
'transform': 'transform_power_2_int',
'type': 'int',
'upper': 6}},
'core_model_name': None,
'counter': 5,
'data': None,
'data_dir': './data',
'data_module': None,
'data_set': <torch.utils.data.dataset.TensorDataset object at 0x38f6f9110>,
'data_set_name': None,
'db_dict_name': None,
'design': None,
'device': 'mps',
'devices': 1,
'enable_progress_bar': False,
'eval': None,
'fun_evals': inf,
'fun_repeats': 1,
'horizon': None,
'infill_criterion': 'y',
'k_folds': 3,
'log_graph': False,
'log_level': 50,
'loss_function': None,
'lower': array([3. , 4. , 1. , 0. , 0. , 0. , 0.1, 2. , 0. ]),
'max_surrogate_points': 30,
'max_time': 1,
'metric_params': {},
'metric_river': None,
'metric_sklearn': None,
'metric_sklearn_name': None,
'metric_torch': None,
'model_dict': {},
'n_points': 1,
'n_samples': None,
'n_total': None,
'noise': False,
'num_workers': 0,
'ocba_delta': 0,
'oml_grace_period': None,
'optimizer': None,
'path': None,
'prep_model': None,
'prep_model_name': None,
'progress_file': None,
'save_model': False,
'scenario': None,
'seed': 123,
'show_batch_interval': 1000000,
'show_models': False,
'show_progress': True,
'shuffle': None,
'sigma': 0.0,
'spot_tensorboard_path': 'runs/spot_logs/033_maans14_2024-04-22_02-01-25',
'spot_writer': <torch.utils.tensorboard.writer.SummaryWriter object at 0x1753aab50>,
'target_column': None,
'target_type': None,
'task': None,
'test': None,
'test_seed': 1234,
'test_size': 0.3,
'tolerance_x': 1.4901161193847656e-08,
'train': None,
'upper': array([ 8. , 9. , 4. , 5. , 11. , 0.25, 10. , 6. , 2. ]),
'var_name': ['l1',
'epochs',
'batch_size',
'act_fn',
'optimizer',
'dropout_prob',
'lr_mult',
'patience',
'initialization'],
'var_type': ['int',
'int',
'int',
'factor',
'factor',
'float',
'float',
'int',
'factor'],
'verbosity': 1,
'weight_coeff': 0.0,
'weights': 1.0,
'weights_entry': None}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     28.93893051147461     │
│         val_loss          │     28.93893051147461     │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │    40.207130432128906     │
│         val_loss          │    40.207130432128906     │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │    30.875810623168945     │
│         val_loss          │    30.875810623168945     │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │    30.511211395263672     │
│         val_loss          │    30.511211395263672     │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     30.22126579284668     │
│         val_loss          │     30.22126579284668     │
└───────────────────────────┴───────────────────────────┘
<spotPython.spot.spot.Spot at 0x39eee52d0>
25.9 Step 9: Tensorboard
The textual output shown in the console (or code cell) can be visualized with Tensorboard.
tensorboard --logdir="runs/"
Further information can be found in the PyTorch Lightning documentation for Tensorboard.
25.10 Step 10: Results
After the hyperparameter tuning run is finished, the results can be analyzed.
if spot_tuner.noise:
print(spot_tuner.min_mean_X)
print(spot_tuner.min_mean_y)
else:
print(spot_tuner.min_X)
print(spot_tuner.min_y)
[4. 3. 4. 1. 1. 0.02034562
3.53803709 3. ]
28.93893051147461
spot_tuner.plot_progress(log_y=False,
    filename="./figures/" + PREFIX + "_progress.png")
from spotPython.utils.eda import gen_design_table
print(gen_design_table(fun_control=fun_control, spot=spot_tuner))
| name | type | default | lower | upper | tuned | transform | importance | stars |
|----------------|--------|-----------|---------|---------|----------------------|-----------------------|--------------|---------|
| l1 | int | 3 | 3.0 | 4.0 | 4.0 | transform_power_2_int | 0.15 | . |
| epochs | int | 4 | 2.0 | 4.0 | 3.0 | transform_power_2_int | 0.01 | |
| batch_size | int | 4 | 3.0 | 6.0 | 4.0 | transform_power_2_int | 0.04 | |
| act_fn | factor | ReLU | 0.0 | 1.0 | LeakyReLU | None | 100.00 | *** |
| optimizer | factor | SGD | 0.0 | 2.0 | Adamax | None | 0.06 | |
| dropout_prob | float | 0.01 | 0.005 | 0.25 | 0.020345615289778483 | None | 0.01 | |
| lr_mult | float | 1.0 | 0.25 | 5.0 | 3.5380370864571606 | None | 0.00 | |
| patience | int | 2 | 2.0 | 3.0 | 3.0 | transform_power_2_int | 4.40 | * |
| initialization | factor | Default | 0.0 | 0.0 | Default | None | 0.00 | |
spot_tuner.plot_importance(threshold=0.025,
    filename="./figures/" + PREFIX + "_importance.png")
25.10.1 Get the Tuned Architecture
from spotPython.hyperparameters.values import get_tuned_architecture
config = get_tuned_architecture(spot_tuner, fun_control)
print(config)
{'l1': 16, 'epochs': 8, 'batch_size': 16, 'act_fn': LeakyReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.020345615289778483, 'lr_mult': 3.5380370864571606, 'patience': 8, 'initialization': 'Default'}
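The relation between the raw optimum printed by spot_tuner.min_X and this configuration can be traced by hand. The sketch below decodes the eight printed values under two assumptions: hyperparameters with transform_power_2_int are stored as exponents, and factors are stored as indices into their level lists. The initialization entry does not appear in the vector, presumably because it has a single level. The decode function is hypothetical, and the values are the rounded numbers from the printout.

```python
# Level lists as set in Step 6.
levels = {
    "act_fn": ["ReLU", "LeakyReLU"],
    "optimizer": ["Adadelta", "Adamax", "Adagrad"],
}
# Hyperparameters stored as exponents of two (see the design table).
power_2 = {"l1", "epochs", "batch_size", "patience"}
names = ["l1", "epochs", "batch_size", "act_fn", "optimizer",
         "dropout_prob", "lr_mult", "patience"]
# Rounded values as printed by spot_tuner.min_X.
min_X = [4.0, 3.0, 4.0, 1.0, 1.0, 0.02034562, 3.53803709, 3.0]


def decode(names: list, x: list, levels: dict, power_2: set) -> dict:
    """Hypothetical decoder from the tuner's vector to readable values."""
    decoded = {}
    for name, value in zip(names, x):
        if name in levels:
            decoded[name] = levels[name][int(value)]
        elif name in power_2:
            decoded[name] = 2 ** int(value)
        else:
            decoded[name] = value
    return decoded


config_sketch = decode(names, min_X, levels, power_2)
print(config_sketch)
```

The decoded values agree with the configuration returned by get_tuned_architecture, except that the real configuration holds an activation instance rather than its name.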
- Test on the full data set
from spotPython.light.testmodel import test_model
test_model(config, fun_control)
LightDataModule.setup(): stage: TrainerFn.FITTING
train_size: 0.49, val_size: 0.21 used for train & val data.
LightDataModule.val_dataloader(). Val. set size: 8785
LightDataModule.train_dataloader(). data_train size: 20501
LightDataModule.setup(): stage: TrainerFn.TESTING
test_size: 0.3 used for test dataset.
LightDataModule.test_dataloader(). Test set size: 12552
test_model result: {'val_loss': 28.924518585205078, 'hp_metric': 28.924518585205078}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │    28.924518585205078     │
│         val_loss          │    28.924518585205078     │
└───────────────────────────┴───────────────────────────┘
(28.924518585205078, 28.924518585205078)
from spotPython.light.loadmodel import load_light_from_checkpoint
model_loaded = load_light_from_checkpoint(config, fun_control)
config: {'l1': 16, 'epochs': 8, 'batch_size': 16, 'act_fn': LeakyReLU(), 'optimizer': 'Adamax', 'dropout_prob': 0.020345615289778483, 'lr_mult': 3.5380370864571606, 'patience': 8, 'initialization': 'Default'}
Loading model with 16_8_16_LeakyReLU_Adamax_0.0203_3.538_8_Default_TEST from runs/saved_models/16_8_16_LeakyReLU_Adamax_0.0203_3.538_8_Default_TEST/last.ckpt
Model: NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=6, out_features=16, bias=True)
(1): LeakyReLU()
(2): Dropout(p=0.020345615289778483, inplace=False)
(3): Linear(in_features=16, out_features=8, bias=True)
(4): LeakyReLU()
(5): Dropout(p=0.020345615289778483, inplace=False)
(6): Linear(in_features=8, out_features=8, bias=True)
(7): LeakyReLU()
(8): Dropout(p=0.020345615289778483, inplace=False)
(9): Linear(in_features=8, out_features=4, bias=True)
(10): LeakyReLU()
(11): Dropout(p=0.020345615289778483, inplace=False)
(12): Linear(in_features=4, out_features=1, bias=True)
)
)
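The printed model suggests how the single hyperparameter l1 determines all hidden layer widths: the pattern is l1, l1/2, l1/2, l1/4. The sketch below reproduces the input and output sizes of the linear layers from that pattern. The rule is inferred from the printout only, and the helper names are hypothetical.

```python
def hidden_widths(l1: int) -> list:
    """Hidden layer widths inferred from the printed model: l1, l1/2, l1/2, l1/4."""
    return [l1, l1 // 2, l1 // 2, l1 // 4]


def layer_shapes(n_in: int, l1: int, n_out: int) -> list:
    """Return (in_features, out_features) pairs for the chain of linear layers."""
    sizes = [n_in] + hidden_widths(l1) + [n_out]
    return list(zip(sizes[:-1], sizes[1:]))


shapes = layer_shapes(n_in=6, l1=16, n_out=1)
print(shapes)
```

With l1 = 16, six input features, and one output, the pairs match the Linear layers listed above. A larger l1 widens every hidden layer proportionally.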
filename = "./figures/" + PREFIX
spot_tuner.plot_important_hyperparameter_contour(filename=filename)
l1: 0.151373792283411
epochs: 0.009783739500863815
batch_size: 0.04228580273492884
act_fn: 100.0
optimizer: 0.05739501694981163
dropout_prob: 0.008912834170537995
lr_mult: 0.0014135274127400966
patience: 4.404167655040114
impo: [['l1', 0.151373792283411], ['epochs', 0.009783739500863815], ['batch_size', 0.04228580273492884], ['act_fn', 100.0], ['optimizer', 0.05739501694981163], ['dropout_prob', 0.008912834170537995], ['lr_mult', 0.0014135274127400966], ['patience', 4.404167655040114]]
indices: [3, 7, 0, 4, 2, 1, 5, 6]
indices after max_imp selection: [3, 7, 0, 4, 2, 1, 5, 6]
25.10.2 Parallel Coordinates Plot
spot_tuner.parallel_plot()
Parallel coordinates plots
25.10.3 Cross Validation With Lightning
- The KFold class from sklearn.model_selection is used to generate the folds for cross-validation.
- This mechanism is used to generate the folds for the final evaluation of the model.
- The CrossValidationDataModule class [SOURCE] is used to generate the folds for the hyperparameter tuning process.
- It is called from the cv_model function [SOURCE].
from spotPython.light.cvmodel import cv_model
set_control_key_value(control_dict=fun_control,
                      key="k_folds",
                      value=2,
                      replace=True)
set_control_key_value(control_dict=fun_control,
                      key="test_size",
                      value=0.6,
                      replace=True)
cv_model(config, fun_control)
k: 0
Train Dataset Size: 20918
Val Dataset Size: 20919
train_model result: {'val_loss': 28.17633819580078, 'hp_metric': 28.17633819580078}
k: 1
Train Dataset Size: 20919
Val Dataset Size: 20918
train_model result: {'val_loss': 27.567384719848633, 'hp_metric': 27.567384719848633}
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │     28.17633819580078     │
│         val_loss          │     28.17633819580078     │
└───────────────────────────┴───────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric         │    27.567384719848633     │
│         val_loss          │    27.567384719848633     │
└───────────────────────────┴───────────────────────────┘
27.871861457824707
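The fold sizes and the final number in this output can be checked with a few lines of plain Python. The sketch uses the fold-size rule documented for scikit-learn's KFold (the first n mod k folds receive one extra sample) and assumes that the value reported at the end is the arithmetic mean of the per-fold validation losses. The helper kfold_val_sizes is hypothetical and does not call scikit-learn.

```python
def kfold_val_sizes(n_samples: int, k: int) -> list:
    """Validation fold sizes: the first n_samples % k folds get one extra sample."""
    base, extra = divmod(n_samples, k)
    return [base + 1 if i < extra else base for i in range(k)]


n_samples = 41837
val_sizes = kfold_val_sizes(n_samples, k=2)
train_sizes = [n_samples - v for v in val_sizes]

# Per-fold validation losses copied from the output above.
fold_losses = [28.17633819580078, 27.567384719848633]
cv_score = sum(fold_losses) / len(fold_losses)
print(val_sizes, train_sizes, cv_score)
```

With two folds, each fold trains on about half of the data, so the per-fold losses are not directly comparable to the loss obtained with the 49/21/30 split used during tuning.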
25.10.4 Plot all Combinations of Hyperparameters
- Warning: this may take a while.
PLOT_ALL = False
# Note: min_z and max_z are not defined in this chapter and must be set before enabling PLOT_ALL.
if PLOT_ALL:
    n = spot_tuner.k
    for i in range(n-1):
        for j in range(i+1, n):
            spot_tuner.plot_contour(i=i, j=j, min_z=min_z, max_z=max_z)
25.10.5 Visualizing the Activation Distribution (Under Development)
- The following code is based on [PyTorch Lightning TUTORIAL 2: ACTIVATION FUNCTIONS], Author: Phillip Lippe, License: [CC BY-SA], Generated: 2023-03-15T09:52:39.179933.
After we have trained the models, we can look at the actual activation values that we find inside the model. For instance, how many neurons are set to zero in ReLU? Where do we find most values in Tanh? To answer these questions, we can write a simple function which takes a trained model, applies it to a batch of images, and plots the histogram of the activations inside the network:
from spotPython.torch.activation import Sigmoid, Tanh, ReLU, LeakyReLU, ELU, Swish
act_fn_by_name = {"sigmoid": Sigmoid, "tanh": Tanh, "relu": ReLU, "leakyrelu": LeakyReLU, "elu": ELU, "swish": Swish}
from spotPython.hyperparameters.values import get_one_config_from_X
X = spot_tuner.to_all_dim(spot_tuner.min_X.reshape(1,-1))
config = get_one_config_from_X(X, fun_control)
model = fun_control["core_model"](**config, _L_in=64, _L_out=11, _torchmetric=TORCH_METRIC)
model
NetLightRegression(
(layers): Sequential(
(0): Linear(in_features=64, out_features=16, bias=True)
(1): LeakyReLU()
(2): Dropout(p=0.020345615289778483, inplace=False)
(3): Linear(in_features=16, out_features=8, bias=True)
(4): LeakyReLU()
(5): Dropout(p=0.020345615289778483, inplace=False)
(6): Linear(in_features=8, out_features=8, bias=True)
(7): LeakyReLU()
(8): Dropout(p=0.020345615289778483, inplace=False)
(9): Linear(in_features=8, out_features=4, bias=True)
(10): LeakyReLU()
(11): Dropout(p=0.020345615289778483, inplace=False)
(12): Linear(in_features=4, out_features=11, bias=True)
)
)
# from spotPython.utils.eda import visualize_activations
# visualize_activations(model, color=f"C{0}")