nn.mlp.MLP

nn.mlp.MLP(
    in_channels,
    hidden_channels=None,
    norm_layer=None,
    activation_layer=torch.nn.ReLU,
    inplace=None,
    bias=True,
    dropout=0.0,
    lr=1.0,
    l1=64,
    num_hidden_layers=2,
    output_dim=1,
)

This class implements the multi-layer perceptron (MLP) module.

Parameters

in_channels (int): Number of channels of the input. Required.
hidden_channels (List[int]): List of the hidden channel dimensions. Note that the last element of this list is the output dimension of the network. Default: None.
norm_layer (Callable[..., torch.nn.Module]): Norm layer that will be stacked on top of the linear layer. If None, this layer won't be used. Default: None.
activation_layer (Callable[..., torch.nn.Module]): Activation function that will be stacked on top of the normalization layer (if not None), otherwise on top of the linear layer. If None, this layer won't be used. Default: torch.nn.ReLU.
inplace (bool): Parameter for the activation layer, which can optionally do the operation in-place. Default: None, which uses the respective default values of the activation_layer and Dropout layer.
bias (bool): Whether to use bias in the linear layer. Default: True.
dropout (float): The probability for the dropout layer. Default: 0.0.
lr (float): Unified learning rate multiplier. This value is automatically scaled to optimizer-specific learning rates using the map_lr() function. A value of 1.0 corresponds to the optimizer's default learning rate. Default: 1.0.
l1 (int): Number of neurons in each hidden layer. Only used if hidden_channels is None. Default: 64.
num_hidden_layers (int): Number of hidden layers. Only used if hidden_channels is None. Default: 2.
output_dim (int): Output dimension of the network. Only used if hidden_channels is None. Default: 1.

Note

Parameter Definitions:

  • hidden_channels: This defines the explicit architecture of the MLP. It is a list where each element is the size of a layer. The last element is the output dimension. Example: [32, 32, 1] means two hidden layers of size 32 and an output layer of size 1.

  • l1 and num_hidden_layers: These are helper parameters often used in hyperparameter optimization (see get_default_parameters()). They will only be used if hidden_channels is None.

    • l1: The number of neurons in each hidden layer.
    • num_hidden_layers: The number of hidden layers before the output layer.

    They describe the architecture in a more compact way but are less flexible than hidden_channels. Relationship: To convert l1 and num_hidden_layers to hidden_channels for a given output_dim: hidden_channels = [l1] * num_hidden_layers + [output_dim]
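
    For example, with l1=32, num_hidden_layers=2, and output_dim=1 (values chosen only for illustration):

    >>> l1, num_hidden_layers, output_dim = 32, 2, 1
    >>> [l1] * num_hidden_layers + [output_dim]
    [32, 32, 1]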

Examples

Basic usage:

>>> import torch
>>> from spotoptim.nn.mlp import MLP
>>> # Input: 10 features. One hidden layer with 20 neurons. Output layer (last entry of hidden_channels): 30 features.
>>> mlp = MLP(in_channels=10, hidden_channels=[20, 30])
>>> x = torch.randn(5, 10)
>>> output = mlp(x)
>>> print(output.shape)
torch.Size([5, 30])

Using get_optimizer:

>>> model = MLP(in_channels=10, hidden_channels=[32, 1], lr=0.5)
>>> optimizer = model.get_optimizer("Adam")  # effective lr = 0.5 * Adam default 0.001 = 0.0005
>>> print(optimizer)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.0005
    maximize: False
    weight_decay: 0
)
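
A minimal training-step sketch using the model and optimizer from above; the loss function and data are illustrative assumptions, not part of the MLP API:

>>> import torch
>>> criterion = torch.nn.MSELoss()
>>> x = torch.randn(8, 10)  # batch of 8 samples, 10 features (matches in_channels=10)
>>> y = torch.randn(8, 1)   # targets match the output dimension (last entry of hidden_channels)
>>> optimizer.zero_grad()
>>> loss = criterion(model(x), y)
>>> loss.backward()
>>> optimizer.step()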

Using l1 and num_hidden_layers parameters: This example shows how the hyperparameters suggested by get_default_parameters() describe the architecture. Instead of constructing the hidden_channels list manually, l1, num_hidden_layers, and output_dim are passed directly to the constructor.

>>> input_dim = 10
>>> output_dim = 1
>>>
>>> # Hyperparameters (e.g., from spotoptim tuning)
>>> l1 = 64
>>> num_hidden_layers = 2
>>>
>>> # Equivalent explicit architecture:
>>> # hidden_channels = [l1] * num_hidden_layers + [output_dim] = [64, 64, 1],
>>> # i.e. two hidden layers of 64 neurons and an output layer of size 1.
>>> # Instead of building this list, we pass l1, num_hidden_layers, and output_dim directly to the constructor.
>>> model = MLP(in_channels=input_dim, l1=l1, num_hidden_layers=num_hidden_layers, output_dim=output_dim)
>>> print(model)
MLP(
  (0): Linear(in_features=10, out_features=64, bias=True)
  (1): ReLU()
  (2): Dropout(p=0.0, inplace=False)
  (3): Linear(in_features=64, out_features=64, bias=True)
  (4): ReLU()
  (5): Dropout(p=0.0, inplace=False)
  (6): Linear(in_features=64, out_features=1, bias=True)
  (7): Dropout(p=0.0, inplace=False)
)
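
A forward pass with this model yields the expected output dimension (the batch size of 5 is chosen only for illustration):

>>> import torch
>>> x = torch.randn(5, input_dim)
>>> model(x).shape
torch.Size([5, 1])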

Getting default parameters for tuning:

>>> params = MLP.get_default_parameters()
>>> print(params.names())
['l1', 'num_hidden_layers', 'activation', 'lr', 'optimizer']

Methods

get_default_parameters: Returns a ParameterSet populated with default hyperparameters for this model.
get_optimizer: Get a PyTorch optimizer configured for this model.

get_default_parameters

nn.mlp.MLP.get_default_parameters()

Returns a ParameterSet populated with default hyperparameters for this model.

Note

Since the MLP structure is generic (a list of hidden channels), the default parameters provided here are a starting point that assumes a simple structure similar to LinearRegressor (l1 units per hidden layer, num_hidden_layers hidden layers). They may need adjustment for specific architectures.

Returns

ParameterSet: Default hyperparameters.

Examples

>>> params = MLP.get_default_parameters()
>>> print(params.names())
['l1', 'num_hidden_layers', 'activation', 'lr', 'optimizer']

get_optimizer

nn.mlp.MLP.get_optimizer(optimizer_name='Adam', lr=None, **kwargs)

Get a PyTorch optimizer configured for this model.

Parameters

optimizer_name (str): Name of the optimizer from torch.optim. Default: "Adam".
lr (float): Unified learning rate multiplier. If None, uses self.lr. This value is automatically scaled to optimizer-specific learning rates. A value of 1.0 corresponds to the optimizer's default learning rate. Default: None (uses self.lr).
**kwargs (Any): Additional optimizer-specific parameters.

Returns

optim.Optimizer: Configured optimizer instance ready for training.
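
Examples

A sketch of selecting a different optimizer and forwarding an optimizer-specific keyword argument; this assumes that "SGD" resolves to torch.optim.SGD and that momentum is passed through via **kwargs:

>>> from spotoptim.nn.mlp import MLP
>>> model = MLP(in_channels=10, hidden_channels=[32, 1])
>>> optimizer = model.get_optimizer("SGD", lr=0.5, momentum=0.9)
>>> type(optimizer).__name__
'SGD'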