39  Explainable AI with SpotPython and PyTorch

from spotpython.data.diabetes import Diabetes
from spotpython.hyperdict.light_hyper_dict import LightHyperDict
from spotpython.fun.hyperlight import HyperLight
from spotpython.utils.init import (fun_control_init, surrogate_control_init, design_control_init)
from spotpython.spot import Spot
from spotpython.utils.file import get_experiment_filename
from spotpython.hyperparameters.values import set_hyperparameter
from math import inf

PREFIX="602_12_1"

data_set = Diabetes()

fun_control = fun_control_init(
    save_experiment=True,
    PREFIX=PREFIX,
    fun_evals=inf,
    max_time=1,
    data_set=data_set,
    core_model_name="light.regression.NNLinearRegressor",
    hyperdict=LightHyperDict,
    _L_in=10,
    _L_out=1)

fun = HyperLight().fun


set_hyperparameter(fun_control, "optimizer", [ "Adadelta", "Adam", "Adamax"])
set_hyperparameter(fun_control, "l1", [3,7])
set_hyperparameter(fun_control, "epochs", [10,12])
set_hyperparameter(fun_control, "batch_size", [4,11])
set_hyperparameter(fun_control, "dropout_prob", [0.0, 0.025])
set_hyperparameter(fun_control, "patience", [2,9])

design_control = design_control_init(init_size=7)

S = Spot(fun=fun, fun_control=fun_control, design_control=design_control)
module_name: light
submodule_name: regression
model_name: NNLinearRegressor
Experiment saved to 602_12_1_exp.pkl

39.1 Running the Hyperparameter Tuning or Loading the Existing Model

S.run()
train_model result: {'val_loss': nan, 'hp_metric': nan}
train_model result: {'val_loss': 3589.231201171875, 'hp_metric': 3589.231201171875}
train_model result: {'val_loss': 3321.471923828125, 'hp_metric': 3321.471923828125}
train_model result: {'val_loss': 4960.193359375, 'hp_metric': 4960.193359375}
train_model result: {'val_loss': 4307.60595703125, 'hp_metric': 4307.60595703125}
train_model result: {'val_loss': 5645.673828125, 'hp_metric': 5645.673828125}
train_model result: {'val_loss': 4062.306884765625, 'hp_metric': 4062.306884765625}
train_model result: {'val_loss': 4190.7880859375, 'hp_metric': 4190.7880859375}
spotpython tuning: 3321.471923828125 [#---------] 5.43% 
train_model result: {'val_loss': 4271.2177734375, 'hp_metric': 4271.2177734375}
spotpython tuning: 3321.471923828125 [##--------] 16.38% 
train_model result: {'val_loss': 4208.11865234375, 'hp_metric': 4208.11865234375}
spotpython tuning: 3321.471923828125 [##--------] 19.71% 
train_model result: {'val_loss': 3138.77587890625, 'hp_metric': 3138.77587890625}
spotpython tuning: 3138.77587890625 [##--------] 21.51% 
train_model result: {'val_loss': 14858.453125, 'hp_metric': 14858.453125}
spotpython tuning: 3138.77587890625 [###-------] 25.04% 
train_model result: {'val_loss': 4250.97265625, 'hp_metric': 4250.97265625}
spotpython tuning: 3138.77587890625 [######----] 62.62% 
train_model result: {'val_loss': 378039665950720.0, 'hp_metric': 378039665950720.0}
spotpython tuning: 3138.77587890625 [##########] 100.00% Done...

Experiment saved to 602_12_1_res.pkl
<spotpython.spot.spot.Spot at 0x15622cb00>
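
  • Instead of re-running the tuning in a later session, the saved experiment can be reloaded from the pickle files reported above. A minimal sketch, assuming 602_12_1_res.pkl sits in the current working directory and was written with Python’s pickle module:
import pickle

# Sketch only: reload the saved result object; what exactly is stored in the pickle
# (e.g., the Spot object itself) depends on the spotpython version in use.
with open("602_12_1_res.pkl", "rb") as f:
    result = pickle.load(f)
print(type(result))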

39.2 Results from the Hyperparameter Tuning Experiment

  • After the hyperparameter tuning is finished, the following information is available:
    • the S object and
    • the associated fun_control dictionary
S.print_results(print_screen=True)
min y: 3138.77587890625
l1: 4.0
epochs: 12.0
batch_size: 5.0
act_fn: 2.0
optimizer: 1.0
dropout_prob: 0.003528741652944332
lr_mult: 5.090832865590933
patience: 2.0
batch_norm: 0.0
initialization: 1.0
S.plot_progress()

39.2.1 Getting the Best Model, i.e., the Tuned Architecture

  • The method get_tuned_architecture [DOC] returns the best model architecture found during the hyperparameter tuning.
  • It returns the transformed values, i.e., batch_size = 2^x if the hyperparameter batch_size was transformed with the transform_power_2_int function.
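  • As a quick check, the raw values reported by print_results() above can be mapped to the tuned values shown in the config below with the power-of-two transform:
# Worked example of the 2**x transform applied to the raw values from print_results()
for name, raw in {"l1": 4, "epochs": 12, "batch_size": 5, "patience": 2}.items():
    print(f"{name}: 2**{raw} = {2 ** raw}")
# l1: 2**4 = 16, epochs: 2**12 = 4096, batch_size: 2**5 = 32, patience: 2**2 = 4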
from spotpython.hyperparameters.values import get_tuned_architecture
import pprint
config = get_tuned_architecture(S)
pprint.pprint(config)
{'act_fn': ReLU(),
 'batch_norm': False,
 'batch_size': 32,
 'dropout_prob': 0.003528741652944332,
 'epochs': 4096,
 'initialization': 'kaiming_uniform',
 'l1': 16,
 'lr_mult': 5.090832865590933,
 'optimizer': 'Adam',
 'patience': 4}
  • Note: get_tuned_architecture has the option force_minX, which has no effect in this case.
from spotpython.hyperparameters.values import get_tuned_architecture
config = get_tuned_architecture(S, force_minX=True)
pprint.pprint(config)
{'act_fn': ReLU(),
 'batch_norm': False,
 'batch_size': 32,
 'dropout_prob': 0.003528741652944332,
 'epochs': 4096,
 'initialization': 'kaiming_uniform',
 'l1': 16,
 'lr_mult': 5.090832865590933,
 'optimizer': 'Adam',
 'patience': 4}

39.3 Training the Tuned Architecture on the Test Data

  • Since we are interested in the explainability of the model, we will train the tuned architecture on the test data.
  • spotpython’s test_model function [DOC] is used to train the model on the test data.
  • Note: So far, we have not used any information about the NN’s weights and biases; only the architecture, which is available as the config, is used.
  • spotpython uses the TensorBoard logger to save the training process in the ./runs directory. Therefore, we have to enable the TensorBoard logger in the fun_control dictionary. To get a clean start, any existing runs folder should be removed first, as sketched below.
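  • A minimal sketch of such a cleanup step (assuming the TensorBoard logs live in ./runs relative to the working directory):
import shutil
from pathlib import Path

# Remove an existing ./runs folder so the TensorBoard logs start from a clean state.
if Path("runs").exists():
    shutil.rmtree("runs")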
from spotpython.light.testmodel import test_model
from spotpython.light.loadmodel import load_light_from_checkpoint
fun_control.update({"tensorboard_log": True})
test_model(config, fun_control)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric               DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│         hp_metric              2800.421630859375     │
│         val_loss               2800.421630859375     │
└───────────────────────────┴───────────────────────────┘
test_model result: {'val_loss': 2800.421630859375, 'hp_metric': 2800.421630859375}
(2800.421630859375, 2800.421630859375)
model = load_light_from_checkpoint(config, fun_control)
config: {'l1': 16, 'epochs': 4096, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.003528741652944332, 'lr_mult': 5.090832865590933, 'patience': 4, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TEST from runs/saved_models/16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TEST/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=320, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.003528741652944332, inplace=False)
    (3): Linear(in_features=320, out_features=160, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.003528741652944332, inplace=False)
    (6): Linear(in_features=160, out_features=320, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.003528741652944332, inplace=False)
    (9): Linear(in_features=320, out_features=160, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.003528741652944332, inplace=False)
    (12): Linear(in_features=160, out_features=160, bias=True)
    (13): ReLU()
    (14): Dropout(p=0.003528741652944332, inplace=False)
    (15): Linear(in_features=160, out_features=80, bias=True)
    (16): ReLU()
    (17): Dropout(p=0.003528741652944332, inplace=False)
    (18): Linear(in_features=80, out_features=80, bias=True)
    (19): ReLU()
    (20): Dropout(p=0.003528741652944332, inplace=False)
    (21): Linear(in_features=80, out_features=1, bias=True)
  )
)

39.3.0.1 Details of the Training Process on the Test Data

  • The test_model method initializes the model with the tuned architecture as follows:
model = fun_control["core_model"](**config, _L_in=_L_in, _L_out=_L_out, _torchmetric=_torchmetric)
  • Then, the Lightning Trainer is initialized with the fun_control dictionary and the model as follows:

    trainer = L.Trainer(
        default_root_dir=os.path.join(fun_control["CHECKPOINT_PATH"], config_id),
        max_epochs=model.hparams.epochs,
        accelerator=fun_control["accelerator"],
        devices=fun_control["devices"],
        logger=TensorBoardLogger(
            save_dir=fun_control["TENSORBOARD_PATH"],
            version=config_id,
            default_hp_metric=True,
            log_graph=fun_control["log_graph"],
        ),
        callbacks=[
            EarlyStopping(monitor="val_loss", patience=config["patience"], mode="min", strict=False, verbose=False),
            ModelCheckpoint(
                dirpath=os.path.join(fun_control["CHECKPOINT_PATH"], config_id), save_last=True
            ),
        ],
        enable_progress_bar=enable_progress_bar,
    )
    trainer.fit(model=model, datamodule=dm)
    test_result = trainer.test(datamodule=dm, ckpt_path="last")
  • As shown in the code above, the last checkpoint is saved.

  • spotpython’s method load_light_from_checkpoint is used to load the last checkpoint and to get the model’s weights and biases. It requires the config and the fun_control dictionary as input to find the correct checkpoint.

  • Now, the model is trained and the weights and biases are available.
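  • Since the loaded model is a regular PyTorch module, its learned parameters can be listed directly with the standard PyTorch API, for example:
# List the names and shapes of the learned weight and bias tensors.
for name, param in model.named_parameters():
    print(f"{name}: {tuple(param.shape)}")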

39.4 Visualizing the Neural Network Architecture

# get the device
from spotpython.utils.device import getDevice
device = getDevice()
from spotpython.plot.xai import viz_net
viz_net(model, device=device)

[Figure: network architecture visualization produced by viz_net]

39.5 XAI Methods

  • spotpython provides methods to explain the model’s predictions. The following neural network elements can be analyzed:

39.5.1 Weights

  • Weights are the parameters of the neural network that are learned from the data during training. They connect neurons between layers and determine the strength and direction of the signal sent from one neuron to another. The network adjusts the weights during training to minimize the error between the predicted output and the actual output.
  • Interpretation of the weights: A high weight value indicates a strong influence of the input neuron on the output. Positive weights suggest a positive correlation, whereas negative weights suggest an inverse relationship between neurons.
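  • As a small, self-contained illustration (independent of the tuned model): a Linear layer stores one weight per input-output connection and one bias per output neuron, and the sign of each weight encodes the direction of the influence.
import torch.nn as nn

# A Linear layer with 3 inputs and 2 outputs holds a (2, 3) weight matrix and a (2,) bias vector.
layer = nn.Linear(in_features=3, out_features=2)
print(layer.weight.shape)  # torch.Size([2, 3])
print(layer.bias.shape)    # torch.Size([2])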

39.5.2 Activations

  • Activations are the outputs produced by neurons after applying an activation function to the weighted sum of inputs. The activation function (e.g., ReLU, sigmoid, tanh) adds non-linearity to the model, allowing it to learn more complex relationships.
  • Interpretation of the activations: The value of activations indicates the intensity of the signal passed to the next layer. Certain activation patterns can highlight which features or parts of the data the network is focusing on.
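  • For illustration, applying ReLU to a small vector of weighted sums shows how the non-linearity shapes the signal that is passed to the next layer:
import torch
import torch.nn as nn

# ReLU keeps positive pre-activations and sets negative ones to zero.
z = torch.tensor([-1.5, 0.0, 2.3])  # weighted sums of the inputs of three neurons
print(nn.ReLU()(z))                 # tensor([0.0000, 0.0000, 2.3000])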

39.5.3 Gradients

  • Gradients are the partial derivatives of the loss function with respect to different parameters (weights) of the network. During backpropagation, gradients are used to update the weights in the direction that reduces the loss by methods like gradient descent.
  • Interpretation of the gradients: The magnitude of the gradient indicates how much a parameter should change to reduce the error. A large gradient implies a steeper slope and a bigger update, while a small gradient suggests that the parameter is near an optimal point. If gradients are too small (vanishing gradient problem), the network may learn slowly or stop learning. If they are too large (exploding gradient problem), the updates may be unstable.
  • spotpython provides the method get_gradients to get the gradients of the model.
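  • Conceptually, the gradient of the loss with respect to a single weight can be computed with PyTorch’s autograd, as in this toy example:
import torch

# Toy example: loss = (w * x - y)**2, so d(loss)/dw = 2 * x * (w * x - y).
w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(5.0)
loss = (w * x - y) ** 2
loss.backward()
print(w.grad)  # tensor(6.) = 2 * 3 * (2 * 3 - 5)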
from spotpython.plot.xai import (
    get_activations, get_gradients, get_weights, visualize_weights, visualize_gradients,
    visualize_mean_activations, visualize_gradient_distributions,
    visualize_weights_distributions, visualize_activations_distributions)
batch_size = config["batch_size"]

39.5.4 Getting the Weights

from spotpython.plot.xai import sort_layers
weights, _ = get_weights(model)
# sort_layers(weights)
visualize_weights(model, absolute=True, cmap="GreenYellowRed", figsize=(6, 6))
3200 values in Layer Layer 0. Geometry: (320, 10)

51200 values in Layer Layer 3. Geometry: (160, 320)

51200 values in Layer Layer 6. Geometry: (320, 160)

51200 values in Layer Layer 9. Geometry: (160, 320)

25600 values in Layer Layer 12. Geometry: (160, 160)

12800 values in Layer Layer 15. Geometry: (80, 160)

6400 values in Layer Layer 18. Geometry: (80, 80)

80 values in Layer Layer 21. Geometry: (1, 80)

visualize_weights_distributions(model, color=f"C{0}", columns=4)
n:8

39.5.5 Getting the Activations

from spotpython.plot.xai import get_activations
activations, mean_activations, layer_sizes = get_activations(net=model, fun_control=fun_control, batch_size=batch_size, device=device)
train_size: 0.36, val_size: 0.24, test_sie: 0.4 for splitting train & val data.
train samples: 160, val samples: 106 generated for train & val data.
LightDataModule.train_dataloader(). data_train size: 160
visualize_mean_activations(mean_activations, layer_sizes=layer_sizes, absolute=True, cmap="GreenYellowRed", figsize=(6, 6))
320 values in Layer 0. Geometry: (1, 320)

160 values in Layer 3. Geometry: (1, 160)

320 values in Layer 6. Geometry: (1, 320)

160 values in Layer 9. Geometry: (1, 160)

160 values in Layer 12. Geometry: (1, 160)

80 values in Layer 15. Geometry: (1, 80)

80 values in Layer 18. Geometry: (1, 80)

visualize_activations_distributions(activations=activations,
                                    net=model, color="C0", columns=4)

39.5.6 Getting the Gradients

gradients, _ = get_gradients(net=model, fun_control=fun_control, batch_size=batch_size, device=device)
train_size: 0.36, val_size: 0.24, test_sie: 0.4 for splitting train & val data.
train samples: 160, val samples: 106 generated for train & val data.
LightDataModule.train_dataloader(). data_train size: 160
visualize_gradients(model, fun_control, batch_size, absolute=True, cmap="GreenYellowRed", figsize=(6, 6), device=device)
train_size: 0.36, val_size: 0.24, test_sie: 0.4 for splitting train & val data.
train samples: 160, val samples: 106 generated for train & val data.
LightDataModule.train_dataloader(). data_train size: 160
3200 values in Layer layers.0.weight. Geometry: (320, 10)

51200 values in Layer layers.3.weight. Geometry: (160, 320)

51200 values in Layer layers.6.weight. Geometry: (320, 160)

51200 values in Layer layers.9.weight. Geometry: (160, 320)

25600 values in Layer layers.12.weight. Geometry: (160, 160)

12800 values in Layer layers.15.weight. Geometry: (80, 160)

6400 values in Layer layers.18.weight. Geometry: (80, 80)

80 values in Layer layers.21.weight. Geometry: (1, 80)

visualize_gradient_distributions(model, fun_control, batch_size=batch_size, color=f"C{0}", device=device, columns=3)
train_size: 0.36, val_size: 0.24, test_sie: 0.4 for splitting train & val data.
train samples: 160, val samples: 106 generated for train & val data.
LightDataModule.train_dataloader(). data_train size: 160
n:8

39.6 Feature Attributions

39.6.1 Integrated Gradients

from spotpython.plot.xai import get_attributions, plot_attributions
df_att = get_attributions(S, fun_control, attr_method="IntegratedGradients", n_rel=10)
plot_attributions(df_att, attr_method="IntegratedGradients")
train_model result: {'val_loss': 3042.60986328125, 'hp_metric': 3042.60986328125}
config: {'l1': 16, 'epochs': 4096, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.003528741652944332, 'lr_mult': 5.090832865590933, 'patience': 4, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN from runs/saved_models/16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=320, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.003528741652944332, inplace=False)
    (3): Linear(in_features=320, out_features=160, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.003528741652944332, inplace=False)
    (6): Linear(in_features=160, out_features=320, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.003528741652944332, inplace=False)
    (9): Linear(in_features=320, out_features=160, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.003528741652944332, inplace=False)
    (12): Linear(in_features=160, out_features=160, bias=True)
    (13): ReLU()
    (14): Dropout(p=0.003528741652944332, inplace=False)
    (15): Linear(in_features=160, out_features=80, bias=True)
    (16): ReLU()
    (17): Dropout(p=0.003528741652944332, inplace=False)
    (18): Linear(in_features=80, out_features=80, bias=True)
    (19): ReLU()
    (20): Dropout(p=0.003528741652944332, inplace=False)
    (21): Linear(in_features=80, out_features=1, bias=True)
  )
)
train_size: 0.36, val_size: 0.24, test_sie: 0.4 for splitting test data.
test samples: 177 generated for test data.
LightDataModule.test_dataloader(). Test set size: 177

39.6.2 Deep Lift

df_lift = get_attributions(S, fun_control, attr_method="DeepLift", n_rel=10)
print(df_lift)
plot_attributions(df_lift, attr_method="DeepLift")
train_model result: {'val_loss': 2877.055419921875, 'hp_metric': 2877.055419921875}
config: {'l1': 16, 'epochs': 4096, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.003528741652944332, 'lr_mult': 5.090832865590933, 'patience': 4, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN from runs/saved_models/16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=320, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.003528741652944332, inplace=False)
    (3): Linear(in_features=320, out_features=160, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.003528741652944332, inplace=False)
    (6): Linear(in_features=160, out_features=320, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.003528741652944332, inplace=False)
    (9): Linear(in_features=320, out_features=160, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.003528741652944332, inplace=False)
    (12): Linear(in_features=160, out_features=160, bias=True)
    (13): ReLU()
    (14): Dropout(p=0.003528741652944332, inplace=False)
    (15): Linear(in_features=160, out_features=80, bias=True)
    (16): ReLU()
    (17): Dropout(p=0.003528741652944332, inplace=False)
    (18): Linear(in_features=80, out_features=80, bias=True)
    (19): ReLU()
    (20): Dropout(p=0.003528741652944332, inplace=False)
    (21): Linear(in_features=80, out_features=1, bias=True)
  )
)
train_size: 0.36, val_size: 0.24, test_sie: 0.4 for splitting test data.
test samples: 177 generated for test data.
LightDataModule.test_dataloader(). Test set size: 177
   Feature Index Feature  DeepLiftAttribution
0              3      bp           963.972290
1              0     age           850.080261
2              1     sex           839.660645
3              2     bmi           774.651489
4              8  s5_ltg           727.053894
5              9  s6_glu           700.455994
6              6  s3_hdl           677.958496
7              5  s2_ldl           490.871857
8              4   s1_tc           414.463257
9              7  s4_tch           363.745087

39.6.3 Feature Ablation

df_fl = get_attributions(S, fun_control, attr_method="FeatureAblation", n_rel=10)
train_model result: {'val_loss': 3676.307861328125, 'hp_metric': 3676.307861328125}
config: {'l1': 16, 'epochs': 4096, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.003528741652944332, 'lr_mult': 5.090832865590933, 'patience': 4, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN from runs/saved_models/16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=320, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.003528741652944332, inplace=False)
    (3): Linear(in_features=320, out_features=160, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.003528741652944332, inplace=False)
    (6): Linear(in_features=160, out_features=320, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.003528741652944332, inplace=False)
    (9): Linear(in_features=320, out_features=160, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.003528741652944332, inplace=False)
    (12): Linear(in_features=160, out_features=160, bias=True)
    (13): ReLU()
    (14): Dropout(p=0.003528741652944332, inplace=False)
    (15): Linear(in_features=160, out_features=80, bias=True)
    (16): ReLU()
    (17): Dropout(p=0.003528741652944332, inplace=False)
    (18): Linear(in_features=80, out_features=80, bias=True)
    (19): ReLU()
    (20): Dropout(p=0.003528741652944332, inplace=False)
    (21): Linear(in_features=80, out_features=1, bias=True)
  )
)
train_size: 0.36, val_size: 0.24, test_sie: 0.4 for splitting test data.
test samples: 177 generated for test data.
LightDataModule.test_dataloader(). Test set size: 177
print(df_fl)
plot_attributions(df_fl, attr_method="FeatureAblation")
   Feature Index Feature  FeatureAblationAttribution
0              3      bp                  579.265625
1              0     age                  498.394257
2              1     sex                  401.831909
3              2     bmi                  401.706512
4              8  s5_ltg                  397.067322
5              6  s3_hdl                  359.057617
6              9  s6_glu                  323.271332
7              5  s2_ldl                  231.789368
8              4   s1_tc                  104.779633
9              7  s4_tch                   75.582512

39.7 Conductance

from spotpython.plot.xai import plot_conductance_last_layer, get_weights_conductance_last_layer
weights_last, layer_conductance_last = get_weights_conductance_last_layer(S, fun_control)
plot_conductance_last_layer(weights_last, layer_conductance_last, figsize=(6, 6))
train_model result: {'val_loss': 2850.458251953125, 'hp_metric': 2850.458251953125}
config: {'l1': 16, 'epochs': 4096, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.003528741652944332, 'lr_mult': 5.090832865590933, 'patience': 4, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN from runs/saved_models/16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=320, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.003528741652944332, inplace=False)
    (3): Linear(in_features=320, out_features=160, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.003528741652944332, inplace=False)
    (6): Linear(in_features=160, out_features=320, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.003528741652944332, inplace=False)
    (9): Linear(in_features=320, out_features=160, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.003528741652944332, inplace=False)
    (12): Linear(in_features=160, out_features=160, bias=True)
    (13): ReLU()
    (14): Dropout(p=0.003528741652944332, inplace=False)
    (15): Linear(in_features=160, out_features=80, bias=True)
    (16): ReLU()
    (17): Dropout(p=0.003528741652944332, inplace=False)
    (18): Linear(in_features=80, out_features=80, bias=True)
    (19): ReLU()
    (20): Dropout(p=0.003528741652944332, inplace=False)
    (21): Linear(in_features=80, out_features=1, bias=True)
  )
)
train_model result: {'val_loss': 3232.12451171875, 'hp_metric': 3232.12451171875}
config: {'l1': 16, 'epochs': 4096, 'batch_size': 32, 'act_fn': ReLU(), 'optimizer': 'Adam', 'dropout_prob': 0.003528741652944332, 'lr_mult': 5.090832865590933, 'patience': 4, 'batch_norm': False, 'initialization': 'kaiming_uniform'}
Loading model with 16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN from runs/saved_models/16_4096_32_ReLU_Adam_0.0035_5.0908_4_False_kaiming_uniform_TRAIN/last.ckpt
Model: NNLinearRegressor(
  (layers): Sequential(
    (0): Linear(in_features=10, out_features=320, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.003528741652944332, inplace=False)
    (3): Linear(in_features=320, out_features=160, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.003528741652944332, inplace=False)
    (6): Linear(in_features=160, out_features=320, bias=True)
    (7): ReLU()
    (8): Dropout(p=0.003528741652944332, inplace=False)
    (9): Linear(in_features=320, out_features=160, bias=True)
    (10): ReLU()
    (11): Dropout(p=0.003528741652944332, inplace=False)
    (12): Linear(in_features=160, out_features=160, bias=True)
    (13): ReLU()
    (14): Dropout(p=0.003528741652944332, inplace=False)
    (15): Linear(in_features=160, out_features=80, bias=True)
    (16): ReLU()
    (17): Dropout(p=0.003528741652944332, inplace=False)
    (18): Linear(in_features=80, out_features=80, bias=True)
    (19): ReLU()
    (20): Dropout(p=0.003528741652944332, inplace=False)
    (21): Linear(in_features=80, out_features=1, bias=True)
  )
)
Conductance analysis for layer:  Linear(in_features=80, out_features=1, bias=True)