nn.mlp.MLP
nn.mlp.MLP(
    in_channels,
    hidden_channels=None,
    norm_layer=None,
    activation_layer=torch.nn.ReLU,
    inplace=None,
    bias=True,
    dropout=0.0,
    lr=1.0,
    l1=64,
    num_hidden_layers=2,
    output_dim=1,
)

This block implements the multi-layer perceptron (MLP) module.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| in_channels | int | Number of channels of the input. | required |
| hidden_channels | List[int] | List of the hidden channel dimensions. Note that the last element of this list is the output dimension of the network. | None |
| norm_layer | Callable[…, torch.nn.Module] | Norm layer that will be stacked on top of the linear layer. If None, this layer won’t be used. | None |
| activation_layer | Callable[…, torch.nn.Module] | Activation function which will be stacked on top of the normalization layer (if not None), otherwise on top of the linear layer. If None, this layer won’t be used. | torch.nn.ReLU |
| inplace | bool | Parameter for the activation layer, which can optionally do the operation in-place. A value of None uses the respective default values of the activation_layer and Dropout layer. | None |
| bias | bool | Whether to use bias in the linear layer. | True |
| dropout | float | The probability for the dropout layer. | 0.0 |
| lr | float | Unified learning rate multiplier. This value is automatically scaled to optimizer-specific learning rates using the map_lr() function. A value of 1.0 corresponds to the optimizer’s default learning rate. | 1.0 |
| l1 | int | Number of neurons in each hidden layer. Will only be used if hidden_channels is None. | 64 |
| num_hidden_layers | int | Number of hidden layers. Will only be used if hidden_channels is None. | 2 |
| output_dim | int | Output dimension of the network. Will only be used if hidden_channels is None. | 1 |
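For illustration, the normalization, activation, and dropout options can be combined as sketched below (torch.nn.BatchNorm1d and torch.nn.GELU are illustrative choices, not the defaults):

>>> import torch
>>> from spotoptim.nn.mlp import MLP
>>> mlp = MLP(
...     in_channels=10,
...     hidden_channels=[20, 1],
...     norm_layer=torch.nn.BatchNorm1d,   # norm layer stacked on top of each hidden linear layer
...     activation_layer=torch.nn.GELU,    # activation stacked on top of the norm layer
...     dropout=0.1,                       # dropout probability applied after the activation
... )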
Note
Parameter Definitions:

- hidden_channels: Defines the explicit architecture of the MLP. It is a list where each element is the size of a layer; the last element is the output dimension. Example: [32, 32, 1] means two hidden layers of size 32 and an output layer of size 1.
- l1 and num_hidden_layers: Helper parameters often used in hyperparameter optimization (see get_default_parameters()). They will only be used if hidden_channels is None.
  - l1: The number of neurons in each hidden layer.
  - num_hidden_layers: The number of hidden layers before the output layer.

  They describe the architecture in a more compact way but are less flexible than hidden_channels.

Relationship: To convert l1 and num_hidden_layers to hidden_channels for a given output_dim: hidden_channels = [l1] * num_hidden_layers + [output_dim]
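For instance, with the defaults (l1=64, num_hidden_layers=2, output_dim=1) this yields:

>>> l1, num_hidden_layers, output_dim = 64, 2, 1
>>> [l1] * num_hidden_layers + [output_dim]
[64, 64, 1]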
Examples
Basic usage:
>>> import torch
>>> from spotoptim.nn.mlp import MLP
>>> # Input: 10 features. One hidden layer with 20 neurons. Output: 30 features (the last entry of hidden_channels).
>>> mlp = MLP(in_channels=10, hidden_channels=[20, 30])
>>> x = torch.randn(5, 10)
>>> output = mlp(x)
>>> print(output.shape)
torch.Size([5, 30])

Using get_optimizer:
>>> model = MLP(in_channels=10, hidden_channels=[32, 1], lr=0.5)
>>> optimizer = model.get_optimizer("Adam")  # effective lr: 0.5 * 0.001 (Adam default) = 0.0005
>>> print(optimizer)
Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
capturable: False
differentiable: False
eps: 1e-08
foreach: None
fused: None
lr: 0.0005
maximize: False
weight_decay: 0
)

Using l1 and num_hidden_layers parameters: This example shows how the hyperparameters suggested by get_default_parameters() map to a hidden_channels list, and how they can be passed directly to the constructor.
>>> input_dim = 10
>>> output_dim = 1
>>>
>>> # Hyperparameters (e.g., from spotoptim tuning)
>>> l1 = 64
>>> num_hidden_layers = 2
>>>
>>> # Equivalent hidden_channels list: [l1] * num_hidden_layers + [output_dim]
>>> # -> [64, 64, 1]: two hidden layers of size 64 and an output layer of size 1.
>>> # Alternatively, pass l1, num_hidden_layers, and output_dim directly:
>>> model = MLP(in_channels=input_dim, l1=l1, num_hidden_layers=num_hidden_layers, output_dim=output_dim)
>>> print(model)
MLP(
(0): Linear(in_features=10, out_features=64, bias=True)
(1): ReLU()
(2): Dropout(p=0.0, inplace=False)
(3): Linear(in_features=64, out_features=64, bias=True)
(4): ReLU()
(5): Dropout(p=0.0, inplace=False)
(6): Linear(in_features=64, out_features=1, bias=True)
(7): Dropout(p=0.0, inplace=False)
)

Getting default parameters for tuning:
>>> params = MLP.get_default_parameters()
>>> print(params.names())
['l1', 'num_hidden_layers', 'activation', 'lr', 'optimizer']

Methods
| Name | Description |
|---|---|
| get_default_parameters | Returns a ParameterSet populated with default hyperparameters for this model. |
| get_optimizer | Get a PyTorch optimizer configured for this model. |
get_default_parameters
nn.mlp.MLP.get_default_parameters()

Returns a ParameterSet populated with default hyperparameters for this model.
Note
Since the MLP structure is generic (a list of hidden channels), the default parameters provided here are a starting point that assumes a simple structure similar to LinearRegressor (l1 units per hidden layer, num_hidden_layers hidden layers). This may need adjustment for specific architectures.
Returns
| Name | Type | Description |
|---|---|---|
| ParameterSet | ParameterSet | Default hyperparameters. |
Examples
>>> params = MLP.get_default_parameters()
>>> print(params.names())
['l1', 'num_hidden_layers', 'activation', 'lr', 'optimizer']

get_optimizer
nn.mlp.MLP.get_optimizer(optimizer_name='Adam', lr=None, **kwargs)

Get a PyTorch optimizer configured for this model.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| optimizer_name | str | Name of an optimizer class from torch.optim. | 'Adam' |
| lr | float | Unified learning rate multiplier. If None, uses self.lr. This value is automatically scaled to optimizer-specific learning rates. A value of 1.0 corresponds to the optimizer’s default learning rate. | None |
| **kwargs | Any | Additional optimizer-specific parameters. | {} |
Returns
| Name | Type | Description |
|---|---|---|
| | optim.Optimizer | Configured optimizer instance ready for training. |
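Examples

A minimal usage sketch, assuming additional keyword arguments are forwarded unchanged to the torch.optim class (here torch.optim.SGD; momentum is an illustrative optimizer-specific argument, not a spotoptim default):

>>> from spotoptim.nn.mlp import MLP
>>> model = MLP(in_channels=10, hidden_channels=[32, 1])
>>> optimizer = model.get_optimizer("SGD", momentum=0.9)  # momentum is passed via **kwargs
>>> optimizer.zero_grad()  # standard torch.optim.Optimizer methods are available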