39.3 Training the Tuned Architecture on the Test Data
Since we are interested in the explainability of the model, we will train the tuned architecture on the test data.
spotpython’s test_model function [DOC] is used to train the model on the test data.
Note: Up to this point, we have not used any information about the NN’s weights and biases. Only the architecture, which is available as the config, is used.
spotpython uses the TensorBoard logger to save the training process in the ./runs directory. Therefore, we have to enable the TensorBoard logger in the fun_control dictionary. To get a clean start, we remove any existing runs folder.
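Removing the runs folder is plain Python and does not require a spotpython utility; a minimal sketch:

import os
import shutil

# Remove an existing ./runs directory so that TensorBoard starts from a clean state.
if os.path.exists("./runs"):
    shutil.rmtree("./runs")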
from spotpython.light.testmodel import test_model
from spotpython.light.loadmodel import load_light_from_checkpoint
fun_control.update({"tensorboard_log": True})
test_model(config, fun_control)
As shown in the code above, the last checkpoint is saved.
spotpython’s method load_light_from_checkpoint is used to load the last checkpoint and to get the model’s weights and biases. It requires the fun_control dictionary and the config_id as input to find the correct checkpoint.
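As a hedged sketch, and assuming that the config_id of the tuned configuration is available from the tuning run, loading the checkpoint could look as follows; the exact argument names and order of load_light_from_checkpoint may differ between spotpython versions, so please consult the [DOC] reference.

# Hedged sketch: assumes load_light_from_checkpoint accepts the config_id and the
# fun_control dictionary, as described above, and returns the trained model.
model = load_light_from_checkpoint(config_id, fun_control)
print(model)  # inspect the architecture together with its trained weights and biases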
Now, the model is trained and the weights and biases are available.
39.4 Visualizing the Neural Network Architecture
# get the device
from spotpython.utils.device import getDevice
device = getDevice()
from spotpython.plot.xai import viz_net
viz_net(model, device=device)
39.5 XAI Methods
spotpython provides methods to explain the model’s predictions. The following neural network elements can be analyzed:
39.5.1 Weights
Weights are the parameters of the neural network that are learned from the data during training. They connect neurons between layers and determine the strength and direction of the signal sent from one neuron to another. The network adjusts the weights during training to minimize the error between the predicted output and the actual output.
Interpretation of the weights: A high weight value indicates a strong influence of the input neuron on the output. Positive weights suggest a positive correlation, whereas negative weights suggest an inverse relationship between neurons.
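As a library-agnostic illustration (plain PyTorch, not a spotpython function), the learned weights of the loaded model can be inspected layer by layer; model is assumed to be the network loaded from the checkpoint above.

# Hedged sketch: summarize the weight tensors of a trained PyTorch model.
for name, param in model.named_parameters():
    if "weight" in name:
        w = param.detach().cpu()
        print(f"{name}: shape={tuple(w.shape)}, "
              f"mean={w.mean().item():.4f}, std={w.std().item():.4f}")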
39.5.2 Activations
Activations are the outputs produced by neurons after applying an activation function to the weighted sum of inputs. The activation function (e.g., ReLU, sigmoid, tanh) adds non-linearity to the model, allowing it to learn more complex relationships.
Interpretation of the activations: The value of activations indicates the intensity of the signal passed to the next layer. Certain activation patterns can highlight which features or parts of the data the network is focusing on.
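The following sketch again uses plain PyTorch forward hooks rather than a spotpython function; x is assumed to be a batch of inputs from the data set used above.

import torch

activations = {}

def save_activation(layer_name):
    # Forward hook that stores a layer's output under the given name.
    def hook(module, inputs, output):
        activations[layer_name] = output.detach().cpu()
    return hook

# Register the hook on every leaf module (layers without children).
handles = [
    module.register_forward_hook(save_activation(name))
    for name, module in model.named_modules()
    if len(list(module.children())) == 0
]

with torch.no_grad():
    _ = model(x.to(device))  # one forward pass fills the activations dict

for handle in handles:
    handle.remove()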
39.5.3 Gradients
Gradients are the partial derivatives of the loss function with respect to different parameters (weights) of the network. During backpropagation, gradients are used to update the weights in the direction that reduces the loss by methods like gradient descent.
Interpretation of the gradients: The magnitude of the gradient indicates how much a parameter should change to reduce the error. A large gradient implies a steeper slope and a bigger update, while a small gradient suggests that the parameter is near an optimal point. If gradients are too small (vanishing gradient problem), the network may learn slowly or stop learning. If they are too large (exploding gradient problem), the updates may be unstable.
spotpython provides the method get_gradients to get the gradients of the model.
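The exact signature of get_gradients is not reproduced here. As an illustration of the underlying idea in plain PyTorch, assuming a batch (x, y) and a mean-squared-error loss, the gradients can be read off after one forward and one backward pass:

import torch.nn as nn

criterion = nn.MSELoss()  # assumed loss; replace with the loss used for the task

model.zero_grad()
loss = criterion(model(x.to(device)), y.to(device))
loss.backward()  # backpropagation fills param.grad for every parameter

for name, param in model.named_parameters():
    if param.grad is not None:
        g = param.grad.detach().cpu()
        print(f"{name}: grad mean={g.mean().item():.4e}, grad norm={g.norm().item():.4e}")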