25 Basic Lightning Module

25.1 Introduction

This chapter implements a basic PyTorch Lightning module. It is based on the Lightning documentation [LIGHTNINGMODULE]. The running example is the following LightningTransformer class, which contains only the required methods:

import lightning as L
import torch
from lightning.pytorch.demos import Transformer


class LightningTransformer(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.model = Transformer(vocab_size=vocab_size)

    def forward(self, inputs, target):
        return self.model(inputs, target)

    def training_step(self, batch, batch_idx):
        inputs, target = batch
        output = self(inputs, target)
        loss = torch.nn.functional.nll_loss(output, target.view(-1))
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.model.parameters(), lr=0.1)
A LightningModule organizes your PyTorch code into six sections (a combined skeleton follows the list):

- Initialization (__init__ and setup())
- Train Loop (training_step())
- Validation Loop (validation_step())
- Test Loop (test_step())
- Prediction Loop (predict_step())
- Optimizers and LR Schedulers (configure_optimizers())
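Taken together, the six sections give a class skeleton roughly like the following. This is a minimal sketch, not the chapter's transformer example; the class name LitSkeleton, the Linear layer sizes, and the learning rate are placeholders chosen for illustration:

import lightning as L
import torch


class LitSkeleton(L.LightningModule):
    def __init__(self):                            # 1. initialization
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):     # 2. train loop
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        return loss

    def validation_step(self, batch, batch_idx):   # 3. validation loop
        ...

    def test_step(self, batch, batch_idx):         # 4. test loop
        ...

    def predict_step(self, batch, batch_idx):      # 5. prediction loop
        x, _ = batch
        return self.layer(x)

    def configure_optimizers(self):                # 6. optimizers and LR schedulers
        return torch.optim.Adam(self.parameters(), lr=1e-3)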
The Trainer automates every required step in a clear and reproducible way. It is the most important part of PyTorch Lightning. It is responsible for training, testing, and validating the model. The Lightning core structure looks like this:

net = MyLightningModuleNet()
trainer = Trainer()
trainer.fit(net)
There are no .cuda() or .to(device) calls required. Lightning does these for you.

# don't do in Lightning
x = torch.Tensor(2, 3)
x = x.cuda()
x = x.to(device)

# do this instead
x = x  # leave it alone!

# or to init a new tensor
new_x = torch.Tensor(2, 3)
new_x = new_x.to(x)
A LightningModule is a torch.nn.Module, but with added functionality. For example:

net = Net.load_from_checkpoint(PATH)
net.freeze()
out = net(x)
25.2 Starter Example: Transformer

The LightningTransformer class from the introduction contains the only required methods for setting up a transformer model. It is a subclass of LightningModule and can be trained as follows:
from lightning.pytorch.demos import WikiText2
from torch.utils.data import DataLoader

dataset = WikiText2()
dataloader = DataLoader(dataset)
model = LightningTransformer(vocab_size=dataset.vocab_size)

trainer = L.Trainer(fast_dev_run=100)
trainer.fit(model=model, train_dataloaders=dataloader)
25.3 Lightning Core Methods
The LightningModule has many convenient methods, but the core ones you need to know about are shown in Table 25.1.
Method | Description |
---|---|
__init__ and setup() | Initializes the model. |
forward() | Performs a forward pass through the model. Used to run data through the model only (separate from training_step()). |
training_step() | Performs a complete training step. |
validation_step() | Performs a complete validation step. |
test_step() | Performs a complete test step. |
predict_step() | Performs a complete prediction step. |
configure_optimizers() | Configures the optimizers and learning-rate schedulers. |

Table 25.1: Core methods of the LightningModule.
We will take a closer look at these methods.
25.3.1 Training Step
25.3.1.1 Basics
To activate the training loop, override the training_step() method. If you want to calculate epoch-level metrics and log them, use log().
class LightningTransformer(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.model = Transformer(vocab_size=vocab_size)

    def training_step(self, batch, batch_idx):
        inputs, target = batch
        output = self.model(inputs, target)
        loss = torch.nn.functional.nll_loss(output, target.view(-1))

        # logs metrics for each training_step,
        # and the average across the epoch, to the progress bar and logger
        self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss
The log() method automatically reduces the requested metrics across a complete epoch and across devices.
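For multi-device training, the epoch-level value can additionally be synchronized across processes. A hedged sketch of such a call inside training_step() (reduce_fx and sync_dist are optional arguments of log(); sync_dist defaults to False):

# sketch: average the metric over the epoch and synchronize it across devices
self.log("train_loss", loss, on_epoch=True, reduce_fx="mean", sync_dist=True)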
25.3.1.2 Background
- Here is the pseudocode of what the log() method does under the hood:
outs = []
for batch_idx, batch in enumerate(train_dataloader):
    # forward
    loss = training_step(batch, batch_idx)
    outs.append(loss.detach())

    # clear gradients
    optimizer.zero_grad()
    # backward
    loss.backward()
    # update parameters
    optimizer.step()

# note: in reality, we do this incrementally, instead of keeping all outputs in memory
epoch_metric = torch.mean(torch.stack(outs))
- In the case that you need to make use of all the outputs from each training_step(), override the on_train_epoch_end() method.
class LightningTransformer(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.model = Transformer(vocab_size=vocab_size)
        self.training_step_outputs = []

    def training_step(self, batch, batch_idx):
        inputs, target = batch
        output = self.model(inputs, target)
        loss = torch.nn.functional.nll_loss(output, target.view(-1))
        preds = ...
        self.training_step_outputs.append(preds)
        return loss

    def on_train_epoch_end(self):
        all_preds = torch.stack(self.training_step_outputs)
        # do something with all preds
        ...
        self.training_step_outputs.clear()  # free memory
25.3.2 Validation Step
25.3.2.1 Basics
To activate the validation loop while training, override the validation_step() method.
class LightningTransformer(L.LightningModule):
    def validation_step(self, batch, batch_idx):
        inputs, target = batch
        output = self.model(inputs, target)
        loss = torch.nn.functional.nll_loss(output, target.view(-1))
        self.log("val_loss", loss)
        return loss
25.3.2.2 Background
- You can also run just the validation loop on your validation dataloaders by overriding validation_step() and calling validate().
model = LightningTransformer(vocab_size=dataset.vocab_size)
trainer = L.Trainer()
trainer.validate(model)
- In the case that you need to make use of all the outputs from each validation_step(), override the on_validation_epoch_end() method. Note that this method is called before on_train_epoch_end().
class LightningTransformer(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.model = Transformer(vocab_size=vocab_size)
        self.validation_step_outputs = []

    def validation_step(self, batch, batch_idx):
        inputs, target = batch
        output = self.model(inputs, target)
        loss = torch.nn.functional.nll_loss(output, target.view(-1))
        pred = ...
        self.validation_step_outputs.append(pred)
        return pred

    def on_validation_epoch_end(self):
        all_preds = torch.stack(self.validation_step_outputs)
        # do something with all preds
        ...
        self.validation_step_outputs.clear()  # free memory
25.3.3 Test Step
The process for enabling a test loop is the same as the process for enabling a validation loop. For this you need to override the test_step() method. The only difference is that the test loop is only called when test() is used.
def test_step(self, batch, batch_idx):
    inputs, target = batch
    output = self.model(inputs, target)
    loss = torch.nn.functional.nll_loss(output, target.view(-1))
    self.log("test_loss", loss)
    return loss
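The test loop is then invoked explicitly, typically after training has finished. A minimal sketch, assuming a dataloader that serves the held-out test data:

trainer = L.Trainer()
trainer.test(model, dataloaders=dataloader)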
25.3.4 Predict Step
25.3.4.1 Basics
By default, the predict_step() method runs the forward() method. In order to customize this behaviour, simply override the predict_step() method.
class LightningTransformer(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.model = Transformer(vocab_size=vocab_size)

    def predict_step(self, batch):
        inputs, target = batch
        return self.model(inputs, target)
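The prediction loop is started with predict(), which collects the return values of predict_step() into a list. A sketch, reusing the dataloader from the starter example:

trainer = L.Trainer()
predictions = trainer.predict(model, dataloaders=dataloader)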
25.3.4.2 Background
- If you want to perform inference with the system, you can add a forward() method to the LightningModule.
- When using forward(), you are responsible for calling eval() and using the no_grad() context manager.
class LightningTransformer(L.LightningModule):
    def __init__(self, vocab_size):
        super().__init__()
        self.model = Transformer(vocab_size=vocab_size)

    def forward(self, batch):
        inputs, target = batch
        return self.model(inputs, target)

    def training_step(self, batch, batch_idx):
        inputs, target = batch
        output = self.model(inputs, target)
        loss = torch.nn.functional.nll_loss(output, target.view(-1))
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.model.parameters(), lr=0.1)


model = LightningTransformer(vocab_size=dataset.vocab_size)

model.eval()
with torch.no_grad():
    batch = dataloader.dataset[0]
    pred = model(batch)
25.4 Lightning Extras
This section covers some additional features of Lightning.
25.4.1 Lightning: Save Hyperparameters
Oftentimes we train many versions of a model. You might share that model or come back to it a few months later, at which point it is very useful to know how that model was trained (i.e., what learning rate, neural network, etc.).
Lightning has a standardized way of saving the information for you in checkpoints and YAML files. The goal here is to improve readability and reproducibility.
Use save_hyperparameters() within your LightningModule's __init__ method. It will enable Lightning to store all the provided arguments under the self.hparams attribute. These hyperparameters will also be stored within the model checkpoint, which simplifies model re-instantiation after training.
class LitMNIST(L.LightningModule):
    def __init__(self, layer_1_dim=128, learning_rate=1e-2):
        super().__init__()

        # call this to save (layer_1_dim=128, learning_rate=1e-2) to the checkpoint
        self.save_hyperparameters()

        # equivalent
        self.save_hyperparameters("layer_1_dim", "learning_rate")

        # now possible to access layer_1_dim from hparams
        self.hparams.layer_1_dim
25.4.2 Lightning: Model Loading
LightningModules that have hyperparameters automatically saved with save_hyperparameters() can conveniently be loaded and instantiated directly from a checkpoint with load_from_checkpoint():

# to load, specify the other args
model = LitMNIST.load_from_checkpoint(PATH, loss_fx=torch.nn.SomeOtherLoss, generator_network=MyGenerator())
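Arguments that were stored with save_hyperparameters() do not have to be passed again when loading. A sketch (PATH stands for an existing checkpoint file):

# hyperparameters saved in __init__ are restored from the checkpoint
model = LitMNIST.load_from_checkpoint(PATH)
print(model.hparams.layer_1_dim)  # e.g. 128, the value stored in the checkpoint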
25.5 Starter Example: Linear Neural Network
We will use the LightningModule to create a simple neural network for regression. It will be implemented as the LightningBasic class.
25.5.2 Hyperparameters
The argument l1 will be treated as a hyperparameter, so it will be tuned in the following steps. Besides l1, additional hyperparameters are act_fn and dropout_prob.

The arguments _L_in, _L_out, and _torchmetric are not hyperparameters, but are needed to create the network. The first two are specified by the data and the latter by user preference (the desired evaluation metric).
25.5.3 The LightningBasic Class
import lightning as L
import torch
import torch.nn.functional as F
import torchmetrics.functional.regression
from torch import nn
from spotpython.hyperparameters.architecture import get_hidden_sizes
class LightningBasic(L.LightningModule):
    def __init__(
        self,
        l1: int,
        act_fn: nn.Module,
        dropout_prob: float,
        _L_in: int,
        _L_out: int,
        _torchmetric: str,
        *args,
        **kwargs,
    ):
        super().__init__()
        self._L_in = _L_in
        self._L_out = _L_out
        self._torchmetric = _torchmetric
        self.metric = getattr(torchmetrics.functional.regression, _torchmetric)
        # _L_in and _L_out are not hyperparameters, but are needed to create the network
        # _torchmetric is not a hyperparameter, but is needed to calculate the loss
        self.save_hyperparameters(ignore=["_L_in", "_L_out", "_torchmetric"])
        # set dummy input array for Tensorboard Graphs
        # set log_graph=True in Trainer to see the graph (in traintest.py)
        hidden_sizes = get_hidden_sizes(_L_in=self._L_in, l1=l1, n=4)
        # Create the network based on the specified hidden sizes
        layers = []
        layer_sizes = [self._L_in] + hidden_sizes
        layer_size_last = layer_sizes[0]
        for layer_size in layer_sizes[1:]:
            layers += [
                nn.Linear(layer_size_last, layer_size),
                self.hparams.act_fn,
                nn.Dropout(self.hparams.dropout_prob),
            ]
            layer_size_last = layer_size
        layers += [nn.Linear(layer_sizes[-1], self._L_out)]
        # nn.Sequential summarizes a list of modules into a single module,
        # applying them in sequence
        self.layers = nn.Sequential(*layers)

    def _calculate_loss(self, batch):
        x, y = batch
        y = y.view(len(y), 1)
        y_hat = self.layers(x)
        loss = self.metric(y_hat, y)
        return loss

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

    def training_step(self, batch: tuple) -> torch.Tensor:
        loss = self._calculate_loss(batch)
        self.log("train_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

    def validation_step(self, batch: tuple) -> torch.Tensor:
        loss = self._calculate_loss(batch)
        # logs metrics for each validation_step,
        # and the average across the epoch, to the progress bar and logger
        self.log("val_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

    def test_step(self, batch, batch_idx):
        loss = self._calculate_loss(batch)
        # logs metrics for each test_step,
        # and the average across the epoch, to the progress bar and logger
        self.log("test_loss", loss, on_step=True, on_epoch=True, prog_bar=True, logger=True)
        return loss

    def predict_step(self, batch, batch_idx, dataloader_idx=0):
        x, _ = batch
        y_hat = self.layers(x)
        return y_hat

    def configure_optimizers(self):
        return torch.optim.Adam(self.layers.parameters(), lr=0.02)
We can instantiate the LightningBasic class as follows:

model_base = LightningBasic(
    l1=20,
    act_fn=nn.ReLU(),
    dropout_prob=0.01,
    _L_in=10,
    _L_out=1,
    _torchmetric="mean_squared_error",
)
It has the following structure:
print(model_base)
LightningBasic(
(layers): Sequential(
(0): Linear(in_features=10, out_features=20, bias=True)
(1): ReLU()
(2): Dropout(p=0.01, inplace=False)
(3): Linear(in_features=20, out_features=10, bias=True)
(4): ReLU()
(5): Dropout(p=0.01, inplace=False)
(6): Linear(in_features=10, out_features=10, bias=True)
(7): ReLU()
(8): Dropout(p=0.01, inplace=False)
(9): Linear(in_features=10, out_features=5, bias=True)
(10): ReLU()
(11): Dropout(p=0.01, inplace=False)
(12): Linear(in_features=5, out_features=1, bias=True)
)
)
from spotpython.plot.xai import viz_net

viz_net(net=model_base, device="cpu", filename="model_architecture700", format="png")
25.5.4 The Data Set: Diabetes
We will use the Diabetes [DOC] data set from the spotpython package, which is a PyTorch Dataset for regression based on a data set from scikit-learn. It consists of DataFrame entries, which were converted to PyTorch tensors.
Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline.
The Diabetes data set has the following properties:

- Number of Instances: 442
- Number of Attributes: The first 10 columns are numeric predictive values.
- Target: Column 11 is a quantitative measure of disease progression one year after baseline.
- Attribute Information:
  - age: age in years
  - sex
  - bmi: body mass index
  - bp: average blood pressure
  - s1: tc, total serum cholesterol
  - s2: ldl, low-density lipoproteins
  - s3: hdl, high-density lipoproteins
  - s4: tch, total cholesterol / HDL
  - s5: ltg, possibly log of serum triglycerides level
  - s6: glu, blood sugar level
from torch.utils.data import DataLoader
from spotpython.data.diabetes import Diabetes
import torch

dataset = Diabetes(feature_type=torch.float32, target_type=torch.float32)
# Set batch size for DataLoader to 2 for demonstration purposes
batch_size = 2
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
for batch in dataloader:
    inputs, targets = batch
    print(f"Batch Size: {inputs.size(0)}")
    print("---------------")
    print(f"Inputs: {inputs}")
    print(f"Targets: {targets}")
    break
Batch Size: 2
---------------
Inputs: tensor([[ 0.0381, 0.0507, 0.0617, 0.0219, -0.0442, -0.0348, -0.0434, -0.0026,
0.0199, -0.0176],
[-0.0019, -0.0446, -0.0515, -0.0263, -0.0084, -0.0192, 0.0744, -0.0395,
-0.0683, -0.0922]])
Targets: tensor([151., 75.])
25.5.5 The DataLoaders
Before we can call the Trainer to fit, validate, and test the model, we need to create the DataLoaders for each of these steps. The DataLoaders are used to load the data into the model in batches and need the batch_size.
import torch
from spotpython.data.diabetes import Diabetes
from torch.utils.data import DataLoader

batch_size = 8

dataset = Diabetes(target_type=torch.float)
train1_set, test_set = torch.utils.data.random_split(dataset, [0.6, 0.4])
train_set, val_set = torch.utils.data.random_split(train1_set, [0.6, 0.4])
print(f"Full Data Set: {len(dataset)}")
print(f"Train Set: {len(train_set)}")
print(f"Validation Set: {len(val_set)}")
print(f"Test Set: {len(test_set)}")

train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, drop_last=True, pin_memory=True)
test_loader = DataLoader(test_set, batch_size=batch_size)
val_loader = DataLoader(val_set, batch_size=batch_size)
Full Data Set: 442
Train Set: 160
Validation Set: 106
Test Set: 176
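Note that random_split() draws the subsets at random, so the membership of the splits changes between runs. If a reproducible split is needed, a seeded generator can be passed (an optional sketch, not used above):

gen = torch.Generator().manual_seed(42)
train1_set, test_set = torch.utils.data.random_split(dataset, [0.6, 0.4], generator=gen)
train_set, val_set = torch.utils.data.random_split(train1_set, [0.6, 0.4], generator=gen)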
25.5.6 The Trainer
Now we are ready to train the model. We will use the Trainer class from the lightning package. For demonstration purposes, we will train the model for 100 epochs only.

epochs = 100

trainer = L.Trainer(max_epochs=epochs, enable_progress_bar=True)
trainer.fit(model=model_base, train_dataloaders=train_loader)
trainer.validate(model_base, val_loader)

# automatically loads the best weights for you
out = trainer.test(model_base, test_loader, verbose=True)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃        DataLoader 0       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│      test_loss_epoch      │     3239.458740234375     │
└───────────────────────────┴───────────────────────────┘
yhat = trainer.predict(model_base, test_loader)
# convert the list of tensors to a numpy array
yhat = torch.cat(yhat).numpy()
yhat.shape
25.5.7 Using a DataModule
Instead of creating the three DataLoaders manually, we can use the LightDataModule class from the spotpython package.

from spotpython.data.lightdatamodule import LightDataModule

dataset = Diabetes(target_type=torch.float)
data_module = LightDataModule(dataset=dataset, batch_size=5, test_size=0.4)
data_module.setup()
There is a minor difference in the sizes of the data sets compared to the manual split above, due to the random splitting, as can be seen in the following code:
print(f"Full Data Set: {len(dataset)}")
print(f"Training set size: {len(data_module.data_train)}")
print(f"Validation set size: {len(data_module.data_val)}")
print(f"Test set size: {len(data_module.data_test)}")
Full Data Set: 442
Training set size: 160
Validation set size: 106
Test set size: 177
The DataModule can be used to train the model as follows:

trainer = L.Trainer(max_epochs=epochs, enable_progress_bar=False)
trainer.fit(model=model_base, datamodule=data_module)
trainer.validate(model=model_base, datamodule=data_module, verbose=True, ckpt_path=None)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      Validate metric      ┃        DataLoader 0       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│      val_loss_epoch       │      3125.3056640625      │
└───────────────────────────┴───────────────────────────┘
[{'val_loss_epoch': 3125.3056640625}]
trainer.test(model=model_base, datamodule=data_module, verbose=True, ckpt_path=None)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃        DataLoader 0       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│      test_loss_epoch      │     2250.994384765625     │
└───────────────────────────┴───────────────────────────┘
[{'test_loss_epoch': 2250.994384765625}]
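Predictions can be obtained in the same way as before. Since trainer.test() above used the data module's test split, the corresponding dataloader can be reused for predict(); a sketch, assuming LightDataModule exposes the test split via test_dataloader() after setup():

yhat = trainer.predict(model_base, dataloaders=data_module.test_dataloader())
yhat = torch.cat(yhat).numpy()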
25.6 Using spotpython with PyTorch Lightning
import os
from math import inf
import warnings
"ignore")
warnings.filterwarnings(from spotpython.data.diabetes import Diabetes
from spotpython.hyperdict.light_hyper_dict import LightHyperDict
from spotpython.fun.hyperlight import HyperLight
from spotpython.utils.init import (fun_control_init, surrogate_control_init, design_control_init)
from spotpython.utils.eda import gen_design_table
from spotpython.spot import spot
from spotpython.utils.file import get_experiment_filename
PREFIX = "700"
data_set = Diabetes()
fun_control = fun_control_init(
    PREFIX=PREFIX,
    save_experiment=True,
    fun_evals=inf,
    fun_repeats=2,
    max_time=1,
    data_set=data_set,
    core_model_name="light.regression.NNLinearRegressor",
    hyperdict=LightHyperDict,
    _L_in=10,
    _L_out=1,
    TENSORBOARD_CLEAN=True,
    tensorboard_log=True,
    noise=True,
    ocba_delta=1,
)
fun = HyperLight().fun

from spotpython.hyperparameters.values import set_hyperparameter

set_hyperparameter(fun_control, "optimizer", ["Adadelta", "Adam", "Adamax"])
set_hyperparameter(fun_control, "l1", [3, 4])
set_hyperparameter(fun_control, "epochs", [3, 7])
set_hyperparameter(fun_control, "batch_size", [4, 11])
set_hyperparameter(fun_control, "dropout_prob", [0.0, 0.025])
set_hyperparameter(fun_control, "patience", [2, 3])

design_control = design_control_init(init_size=10, repeats=2)

print(gen_design_table(fun_control))

spot_tuner = spot.Spot(fun=fun, fun_control=fun_control, design_control=design_control)
res = spot_tuner.run()

spot_tuner.plot_progress()
print(gen_design_table(fun_control=fun_control, spot=spot_tuner))
Moving TENSORBOARD_PATH: runs/ to TENSORBOARD_PATH_OLD: runs_OLD/runs_2024_11_24_22_30_21
Created spot_tensorboard_path: runs/spot_logs/700_maans08_2024-11-24_22-30-21 for SummaryWriter()
module_name: light
submodule_name: regression
model_name: NNLinearRegressor
| name | type | default | lower | upper | transform |
|----------------|--------|-----------|---------|---------|-----------------------|
| l1 | int | 3 | 3 | 4 | transform_power_2_int |
| epochs | int | 4 | 3 | 7 | transform_power_2_int |
| batch_size | int | 4 | 4 | 11 | transform_power_2_int |
| act_fn | factor | ReLU | 0 | 5 | None |
| optimizer | factor | SGD | 0 | 2 | None |
| dropout_prob | float | 0.01 | 0 | 0.025 | None |
| lr_mult | float | 1.0 | 0.1 | 10 | None |
| patience | int | 2 | 2 | 3 | transform_power_2_int |
| batch_norm | factor | 0 | 0 | 1 | None |
| initialization | factor | Default | 0 | 4 | None |
In fun(): config:
{'act_fn': Sigmoid(),
'batch_norm': False,
'batch_size': 2048,
'dropout_prob': 0.010469763733360567,
'epochs': 16,
'initialization': 'xavier_uniform',
'l1': 16,
'lr_mult': 4.135888451953213,
'optimizer': 'Adam',
'patience': 4}
train_model result: {'val_loss': 23995.974609375, 'hp_metric': 23995.974609375}
In fun(): config:
{'act_fn': Sigmoid(),
'batch_norm': False,
'batch_size': 2048,
'dropout_prob': 0.010469763733360567,
'epochs': 16,
'initialization': 'xavier_uniform',
'l1': 16,
'lr_mult': 4.135888451953213,
'optimizer': 'Adam',
'patience': 4}
train_model result: {'val_loss': 23910.283203125, 'hp_metric': 23910.283203125}
In fun(): config:
{'act_fn': ReLU(),
'batch_norm': False,
'batch_size': 64,
'dropout_prob': 0.0184251494885258,
'epochs': 32,
'initialization': 'kaiming_normal',
'l1': 8,
'lr_mult': 3.1418668140600845,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 23983.330078125, 'hp_metric': 23983.330078125}
In fun(): config:
{'act_fn': ReLU(),
'batch_norm': False,
'batch_size': 64,
'dropout_prob': 0.0184251494885258,
'epochs': 32,
'initialization': 'kaiming_normal',
'l1': 8,
'lr_mult': 3.1418668140600845,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 3391.68994140625, 'hp_metric': 3391.68994140625}
In fun(): config:
{'act_fn': ELU(),
'batch_norm': True,
'batch_size': 256,
'dropout_prob': 0.00996276270809942,
'epochs': 64,
'initialization': 'Default',
'l1': 16,
'lr_mult': 8.543578103398445,
'optimizer': 'Adadelta',
'patience': 4}
train_model result: {'val_loss': 23819.19140625, 'hp_metric': 23819.19140625}
In fun(): config:
{'act_fn': ELU(),
'batch_norm': True,
'batch_size': 256,
'dropout_prob': 0.00996276270809942,
'epochs': 64,
'initialization': 'Default',
'l1': 16,
'lr_mult': 8.543578103398445,
'optimizer': 'Adadelta',
'patience': 4}
train_model result: {'val_loss': 23305.740234375, 'hp_metric': 23305.740234375}
In fun(): config:
{'act_fn': LeakyReLU(),
'batch_norm': True,
'batch_size': 512,
'dropout_prob': 0.004305336774252681,
'epochs': 8,
'initialization': 'kaiming_normal',
'l1': 8,
'lr_mult': 0.3009268823483702,
'optimizer': 'Adamax',
'patience': 4}
train_model result: {'val_loss': 24075.27734375, 'hp_metric': 24075.27734375}
In fun(): config:
{'act_fn': LeakyReLU(),
'batch_norm': True,
'batch_size': 512,
'dropout_prob': 0.004305336774252681,
'epochs': 8,
'initialization': 'kaiming_normal',
'l1': 8,
'lr_mult': 0.3009268823483702,
'optimizer': 'Adamax',
'patience': 4}
train_model result: {'val_loss': 23953.0, 'hp_metric': 23953.0}
In fun(): config:
{'act_fn': Tanh(),
'batch_norm': True,
'batch_size': 128,
'dropout_prob': 0.021718144359373085,
'epochs': 32,
'initialization': 'kaiming_uniform',
'l1': 16,
'lr_mult': 8.005670267977834,
'optimizer': 'Adam',
'patience': 8}
train_model result: {'val_loss': 23886.41015625, 'hp_metric': 23886.41015625}
In fun(): config:
{'act_fn': Tanh(),
'batch_norm': True,
'batch_size': 128,
'dropout_prob': 0.021718144359373085,
'epochs': 32,
'initialization': 'kaiming_uniform',
'l1': 16,
'lr_mult': 8.005670267977834,
'optimizer': 'Adam',
'patience': 8}
train_model result: {'val_loss': 23888.15234375, 'hp_metric': 23888.15234375}
In fun(): config:
{'act_fn': LeakyReLU(),
'batch_norm': False,
'batch_size': 32,
'dropout_prob': 0.023931753071792624,
'epochs': 16,
'initialization': 'xavier_normal',
'l1': 16,
'lr_mult': 1.2532486761645163,
'optimizer': 'Adamax',
'patience': 8}
train_model result: {'val_loss': 24041.34765625, 'hp_metric': 24041.34765625}
In fun(): config:
{'act_fn': LeakyReLU(),
'batch_norm': False,
'batch_size': 32,
'dropout_prob': 0.023931753071792624,
'epochs': 16,
'initialization': 'xavier_normal',
'l1': 16,
'lr_mult': 1.2532486761645163,
'optimizer': 'Adamax',
'patience': 8}
train_model result: {'val_loss': 24045.0078125, 'hp_metric': 24045.0078125}
In fun(): config:
{'act_fn': ELU(),
'batch_norm': False,
'batch_size': 512,
'dropout_prob': 0.0074444117802003025,
'epochs': 8,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 9.535342719713716,
'optimizer': 'Adam',
'patience': 8}
train_model result: {'val_loss': 24010.197265625, 'hp_metric': 24010.197265625}
In fun(): config:
{'act_fn': ELU(),
'batch_norm': False,
'batch_size': 512,
'dropout_prob': 0.0074444117802003025,
'epochs': 8,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 9.535342719713716,
'optimizer': 'Adam',
'patience': 8}
train_model result: {'val_loss': 24001.833984375, 'hp_metric': 24001.833984375}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 32,
'dropout_prob': 0.0012790404219919403,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 2.4659566199812857,
'optimizer': 'Adadelta',
'patience': 4}
train_model result: {'val_loss': 5544.39990234375, 'hp_metric': 5544.39990234375}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 32,
'dropout_prob': 0.0012790404219919403,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 2.4659566199812857,
'optimizer': 'Adadelta',
'patience': 4}
train_model result: {'val_loss': 10747.9541015625, 'hp_metric': 10747.9541015625}
In fun(): config:
{'act_fn': Tanh(),
'batch_norm': False,
'batch_size': 128,
'dropout_prob': 0.0153979445945591,
'epochs': 32,
'initialization': 'xavier_uniform',
'l1': 8,
'lr_mult': 6.089028896372417,
'optimizer': 'Adamax',
'patience': 4}
train_model result: {'val_loss': 23718.490234375, 'hp_metric': 23718.490234375}
In fun(): config:
{'act_fn': Tanh(),
'batch_norm': False,
'batch_size': 128,
'dropout_prob': 0.0153979445945591,
'epochs': 32,
'initialization': 'xavier_uniform',
'l1': 8,
'lr_mult': 6.089028896372417,
'optimizer': 'Adamax',
'patience': 4}
train_model result: {'val_loss': 23682.900390625, 'hp_metric': 23682.900390625}
In fun(): config:
{'act_fn': ReLU(),
'batch_norm': True,
'batch_size': 1024,
'dropout_prob': 0.013939072152682473,
'epochs': 64,
'initialization': 'xavier_uniform',
'l1': 16,
'lr_mult': 5.8899766345108855,
'optimizer': 'Adam',
'patience': 8}
train_model result: {'val_loss': 24041.25, 'hp_metric': 24041.25}
In fun(): config:
{'act_fn': ReLU(),
'batch_norm': True,
'batch_size': 1024,
'dropout_prob': 0.013939072152682473,
'epochs': 64,
'initialization': 'xavier_uniform',
'l1': 16,
'lr_mult': 5.8899766345108855,
'optimizer': 'Adam',
'patience': 8}
train_model result: {'val_loss': 24040.76171875, 'hp_metric': 24040.76171875}
In fun(): config:
{'act_fn': ReLU(),
'batch_norm': False,
'batch_size': 64,
'dropout_prob': 0.0184251494885258,
'epochs': 32,
'initialization': 'kaiming_normal',
'l1': 8,
'lr_mult': 3.1418668140600845,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 23983.330078125, 'hp_metric': 23983.330078125}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': False,
'batch_size': 16,
'dropout_prob': 0.025,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 0.1,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 23142.373046875, 'hp_metric': 23142.373046875}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': False,
'batch_size': 16,
'dropout_prob': 0.025,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 0.1,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 2999.470703125, 'hp_metric': 2999.470703125}
spotpython tuning: 2999.470703125 [###-------] 27.50%
In fun(): config:
{'act_fn': Swish(),
'batch_norm': False,
'batch_size': 16,
'dropout_prob': 0.025,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 0.1,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 3127.93896484375, 'hp_metric': 3127.93896484375}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 16,
'dropout_prob': 1.3735480035911157e-05,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 6.852508798588737,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 6169.18408203125, 'hp_metric': 6169.18408203125}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 16,
'dropout_prob': 1.3735480035911157e-05,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 6.852508798588737,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 4684.66552734375, 'hp_metric': 4684.66552734375}
spotpython tuning: 2999.470703125 [#####-----] 48.92%
In fun(): config:
{'act_fn': Swish(),
'batch_norm': False,
'batch_size': 16,
'dropout_prob': 0.025,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 0.1,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 23172.3203125, 'hp_metric': 23172.3203125}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 16,
'dropout_prob': 0.014398448777489579,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 0.10008800547563546,
'optimizer': 'Adadelta',
'patience': 4}
train_model result: {'val_loss': 24067.509765625, 'hp_metric': 24067.509765625}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 16,
'dropout_prob': 0.014398448777489579,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 0.10008800547563546,
'optimizer': 'Adadelta',
'patience': 4}
train_model result: {'val_loss': 24102.650390625, 'hp_metric': 24102.650390625}
spotpython tuning: 2999.470703125 [########--] 77.62%
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 16,
'dropout_prob': 1.3735480035911157e-05,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 6.852508798588737,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 6485.90625, 'hp_metric': 6485.90625}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': False,
'batch_size': 2048,
'dropout_prob': 0.0015803470928499983,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 4.802051661116844,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 13812.83984375, 'hp_metric': 13812.83984375}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': False,
'batch_size': 2048,
'dropout_prob': 0.0015803470928499983,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 4.802051661116844,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 23325.818359375, 'hp_metric': 23325.818359375}
spotpython tuning: 2999.470703125 [##########] 95.19%
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 16,
'dropout_prob': 1.3735480035911157e-05,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 6.852508798588737,
'optimizer': 'Adadelta',
'patience': 8}
train_model result: {'val_loss': 9430.7490234375, 'hp_metric': 9430.7490234375}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 32,
'dropout_prob': 0.0010128687836724918,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 2.455487025188125,
'optimizer': 'Adadelta',
'patience': 4}
train_model result: {'val_loss': 7487.3359375, 'hp_metric': 7487.3359375}
In fun(): config:
{'act_fn': Swish(),
'batch_norm': True,
'batch_size': 32,
'dropout_prob': 0.0010128687836724918,
'epochs': 128,
'initialization': 'kaiming_uniform',
'l1': 8,
'lr_mult': 2.455487025188125,
'optimizer': 'Adadelta',
'patience': 4}
train_model result: {'val_loss': 7131.48486328125, 'hp_metric': 7131.48486328125}
spotpython tuning: 2999.470703125 [##########] 100.00% Done...
Experiment saved to spot_700_experiment.pickle
| name | type | default | lower | upper | tuned | transform | importance | stars |
|----------------|--------|-----------|---------|---------|------------------------|-----------------------|--------------|---------|
| l1 | int | 3 | 3.0 | 4.0 | 3.0 | transform_power_2_int | 0.46 | . |
| epochs | int | 4 | 3.0 | 7.0 | 7.0 | transform_power_2_int | 0.06 | |
| batch_size | int | 4 | 4.0 | 11.0 | 4.0 | transform_power_2_int | 19.97 | * |
| act_fn | factor | ReLU | 0.0 | 5.0 | Swish | None | 2.23 | * |
| optimizer | factor | SGD | 0.0 | 2.0 | Adadelta | None | 98.19 | *** |
| dropout_prob | float | 0.01 | 0.0 | 0.025 | 1.3735480035911157e-05 | None | 2.50 | * |
| lr_mult | float | 1.0 | 0.1 | 10.0 | 6.852508798588737 | None | 0.24 | . |
| patience | int | 2 | 2.0 | 3.0 | 3.0 | transform_power_2_int | 39.57 | * |
| batch_norm | factor | 0 | 0.0 | 1.0 | 1 | None | 3.31 | * |
| initialization | factor | Default | 0.0 | 4.0 | kaiming_uniform | None | 100.00 | *** |