8  Factor Variables for Categorical Hyperparameters

SpotOptim supports factor variables for optimizing categorical hyperparameters such as activation functions, optimizers, or any other discrete string-based choice. Factor variables are converted automatically between string values (the external interface) and integers (used internally by the optimizer), so categorical hyperparameters can be tuned with the same workflow as numeric ones.

8.1 Overview

What are Factor Variables?

Factor variables allow you to specify categorical choices as tuples of strings in the bounds. SpotOptim handles the conversion:

  1. String tuples in bounds → Internal integer mapping (0, 1, 2, …)
  2. Optimization uses integers internally for surrogate modeling
  3. Objective function receives strings after automatic conversion
  4. Results return strings (not integers)

Module: spotoptim.SpotOptim

Key Features:

  • Define categorical choices as string tuples: ("ReLU", "Sigmoid", "Tanh")
  • Automatic integer↔︎string conversion
  • Seamless integration with neural network hyperparameters
  • Mix factor variables with numeric/integer variables (see the sketch below)
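
For orientation, mixing variable types only requires listing each kind in bounds and naming it in var_type; a minimal sketch of the declaration pattern (a complete workflow follows in Section 8.3):

bounds = [
    (-4, -2),                      # continuous: log10(learning rate)
    (16, 128),                     # integer: layer width
    ("ReLU", "Sigmoid", "Tanh"),   # factor: activation function
]
var_type = ["float", "int", "factor"]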

8.2 Quick Start

8.2.1 Basic Factor Variable Usage

from spotoptim import SpotOptim
import numpy as np

def objective_function(X):
    """Objective function receives string values."""
    results = []
    for params in X:
        activation = params[0]  # This is a string!
        print(f"Testing activation: {activation}")
        
        # Simple scoring based on activation choice (for demonstration)
        # In real use, you would train a model and return actual performance
        scores = {
            "ReLU": 3500.0,
            "Sigmoid": 4200.0,
            "Tanh": 3800.0,
            "LeakyReLU": 3600.0
        }
        score = scores.get(activation, 5000.0) + np.random.normal(0, 100)
        results.append(score)
    return np.array(results)  # Return numpy array

# Define bounds with factor variable
optimizer = SpotOptim(
    fun=objective_function,
    bounds=[("ReLU", "Sigmoid", "Tanh", "LeakyReLU")],
    var_type=["factor"],
    max_iter=20,
    seed=42
)

result = optimizer.optimize()
print(f"\nBest activation: {result.x[0]}")  # Returns string, e.g., "ReLU"
print(f"Best score: {result.fun:.4f}")
Testing activation: ReLU
Testing activation: Sigmoid
Testing activation: Tanh
Testing activation: LeakyReLU
Testing activation: Tanh
Testing activation: LeakyReLU
Testing activation: Sigmoid
Testing activation: Sigmoid
Testing activation: LeakyReLU
Testing activation: LeakyReLU
Testing activation: LeakyReLU
Testing activation: ReLU
Testing activation: Sigmoid
Testing activation: Tanh
Testing activation: ReLU
Testing activation: Sigmoid
Testing activation: LeakyReLU
Testing activation: Sigmoid
Testing activation: ReLU
Testing activation: Sigmoid

Best activation: ReLU
Best score: 3327.5082

8.2.2 Neural Network Activation Function Optimization

import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor
import numpy as np

def train_and_evaluate(X):
    """Train models with different activation functions."""
    results = []
    
    for params in X:
        activation = params[0]  # String: "ReLU", "Sigmoid", etc.
        
        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders()
        
        # Create model with the activation function
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=64,
            num_hidden_layers=2,
            activation=activation  # Pass string directly!
        )
        
        # Train model
        optimizer = model.get_optimizer("Adam", lr=0.01)
        criterion = nn.MSELoss()
        
        for epoch in range(50):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluate
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                test_loss += criterion(predictions, batch_y).item()
        
        avg_loss = test_loss / len(test_loader)
        results.append(avg_loss)
    
    return np.array(results)  # Return numpy array

# Optimize activation function choice
optimizer = SpotOptim(
    fun=train_and_evaluate,
    bounds=[("ReLU", "Sigmoid", "Tanh", "LeakyReLU", "ELU")],
    var_type=["factor"],
    max_iter=30
)

result = optimizer.optimize()
print(f"Best activation function: {result.x[0]}")
print(f"Best test MSE: {result.fun:.4f}")
Best activation function: Sigmoid
Best test MSE: 26416.9928

8.3 Mixed Variable Types

8.3.1 Combining Factor, Integer, and Continuous Variables

import numpy as np
import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor

def comprehensive_optimization(X):
    """Optimize learning rate, layer size, depth, and activation."""
    results = []
    
    for params in X:
        log_lr = params[0]      # Continuous (log scale)
        l1 = int(params[1])     # Integer
        n_layers = int(params[2])  # Integer
        activation = params[3]   # Factor (string)
        
        lr = 10 ** log_lr  # Convert from log scale
        
        print(f"lr={lr:.6f}, l1={l1}, layers={n_layers}, activation={activation}")
        
        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders(
            batch_size=32,
            random_state=42
        )
        
        # Create model
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=l1,
            num_hidden_layers=n_layers,
            activation=activation
        )
        
        # Train
        optimizer = model.get_optimizer("Adam", lr=lr)
        criterion = nn.MSELoss()
        
        for epoch in range(30):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluate
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                test_loss += criterion(predictions, batch_y).item()
        
        results.append(test_loss / len(test_loader))
    
    return np.array(results)

# Optimize all four hyperparameters simultaneously
optimizer = SpotOptim(
    fun=comprehensive_optimization,
    bounds=[
        (-4, -2),                                    # log10(learning_rate)
        (16, 128),                                   # l1 (neurons per layer)
        (0, 4),                                      # num_hidden_layers
        ("ReLU", "Sigmoid", "Tanh", "LeakyReLU")   # activation function
    ],
    var_type=["float", "int", "int", "factor"],
    max_iter=50
)

result = optimizer.optimize()

# Results contain original string values
print("\nOptimization Results:")
print(f"Best learning rate: {10**result.x[0]:.6f}")
print(f"Best layer size: {int(result.x[1])}")
print(f"Best num layers: {int(result.x[2])}")
print(f"Best activation: {result.x[3]}")  # String value!
print(f"Best test MSE: {result.fun:.4f}")
lr=0.000184, l1=82, layers=3, activation=Sigmoid
lr=0.001436, l1=71, layers=0, activation=Tanh
lr=0.002149, l1=16, layers=1, activation=ReLU
lr=0.000135, l1=105, layers=3, activation=Tanh
lr=0.005725, l1=116, layers=3, activation=Sigmoid
lr=0.002769, l1=37, layers=2, activation=LeakyReLU
lr=0.006991, l1=93, layers=4, activation=Sigmoid
lr=0.000502, l1=50, layers=1, activation=Tanh
lr=0.000394, l1=49, layers=2, activation=Sigmoid
lr=0.000931, l1=119, layers=2, activation=Tanh
lr=0.005968, l1=116, layers=3, activation=Sigmoid
lr=0.002241, l1=62, layers=3, activation=ReLU
lr=0.000534, l1=24, layers=1, activation=Tanh
lr=0.005972, l1=116, layers=3, activation=Sigmoid
lr=0.000936, l1=68, layers=2, activation=ReLU
lr=0.001352, l1=65, layers=2, activation=LeakyReLU
lr=0.000660, l1=23, layers=2, activation=Sigmoid
lr=0.000411, l1=47, layers=1, activation=Tanh
lr=0.001725, l1=90, layers=3, activation=Sigmoid
lr=0.000418, l1=105, layers=2, activation=Sigmoid
lr=0.005264, l1=79, layers=2, activation=Sigmoid
lr=0.002308, l1=74, layers=2, activation=Sigmoid
lr=0.003242, l1=32, layers=0, activation=Sigmoid
lr=0.000638, l1=72, layers=1, activation=Sigmoid
lr=0.000995, l1=56, layers=2, activation=Tanh
lr=0.002578, l1=46, layers=3, activation=Sigmoid
lr=0.003320, l1=74, layers=1, activation=LeakyReLU
lr=0.000688, l1=74, layers=1, activation=Tanh
lr=0.001720, l1=65, layers=2, activation=Tanh
lr=0.003828, l1=90, layers=2, activation=ReLU
lr=0.000990, l1=85, layers=3, activation=LeakyReLU
lr=0.001191, l1=120, layers=2, activation=Sigmoid
lr=0.001512, l1=94, layers=1, activation=LeakyReLU
lr=0.000225, l1=71, layers=1, activation=Sigmoid
lr=0.001115, l1=97, layers=1, activation=Sigmoid
lr=0.003666, l1=79, layers=1, activation=Sigmoid
lr=0.000471, l1=60, layers=0, activation=Sigmoid
lr=0.000628, l1=48, layers=1, activation=Sigmoid
lr=0.005669, l1=90, layers=2, activation=Tanh
lr=0.000363, l1=71, layers=2, activation=ReLU
lr=0.000231, l1=112, layers=3, activation=LeakyReLU
lr=0.000731, l1=103, layers=1, activation=LeakyReLU
lr=0.000111, l1=46, layers=3, activation=Sigmoid
lr=0.000101, l1=31, layers=1, activation=Tanh
lr=0.001515, l1=80, layers=3, activation=Tanh
lr=0.006210, l1=67, layers=3, activation=Tanh
lr=0.000449, l1=112, layers=4, activation=Tanh
lr=0.000234, l1=28, layers=0, activation=LeakyReLU
lr=0.002183, l1=120, layers=3, activation=Sigmoid
lr=0.002917, l1=16, layers=3, activation=ReLU

Optimization Results:
Best learning rate: 0.005968
Best layer size: 116
Best num layers: 3
Best activation: Sigmoid
Best test MSE: 26381.4785

8.4 Multiple Factor Variables

8.4.1 Optimizing Both Activation and Optimizer

from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor
import torch
import torch.nn as nn
import numpy as np

def optimize_activation_and_optimizer(X):
    """Optimize both activation function and optimizer choice."""
    results = []
    
    for params in X:
        activation = params[0]      # Factor variable 1
        optimizer_name = params[1]  # Factor variable 2
        lr = 10 ** params[2]        # Continuous variable
        
        train_loader, test_loader, _ = get_diabetes_dataloaders()
        
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=64,
            num_hidden_layers=2,
            activation=activation
        )
        
        # Use the optimizer string
        optimizer = model.get_optimizer(optimizer_name, lr=lr)
        criterion = nn.MSELoss()
        
        # Train
        for epoch in range(30):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluate
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                test_loss += criterion(predictions, batch_y).item()
        
        results.append(test_loss / len(test_loader))
    
    return np.array(results)  # Return numpy array

# Two factor variables + one continuous
opt = SpotOptim(
    fun=optimize_activation_and_optimizer,
    bounds=[
        ("ReLU", "Tanh", "Sigmoid", "LeakyReLU"),    # Activation
        ("Adam", "SGD", "RMSprop", "AdamW"),         # Optimizer
        (-4, -2)                                      # log10(lr)
    ],
    var_type=["factor", "factor", "float"],
    max_iter=40
)

result = opt.optimize()
print(f"Best activation: {result.x[0]}")
print(f"Best optimizer: {result.x[1]}")
print(f"Best learning rate: {10**result.x[2]:.6f}")
Best activation: Tanh
Best optimizer: SGD
Best learning rate: 0.009786
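
Because result.X keeps the original string values for factor columns (see Section 8.5.2), you can tally which activation/optimizer combinations were explored. A minimal sketch, reusing the result object from the run above:

from collections import Counter

# Count how often each (activation, optimizer) pair was evaluated
pairs = Counter((row[0], row[1]) for row in result.X)
for (act, opt_name), n in pairs.most_common():
    print(f"{act:>10s} + {opt_name:<8s}: {n} evaluations")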

8.5 Advanced Usage

8.5.1 Custom Categorical Choices

Factor variables work with any string values, not just activation functions:

from spotoptim import SpotOptim
import numpy as np

def train_model_with_config(dropout_policy, batch_norm, weight_init):
    """Simulate model training with different configurations."""
    # In real use, this would train an actual model
    # Here we return synthetic scores for demonstration
    base_score = 3000.0
    
    # Dropout impact
    dropout_scores = {"none": 200, "light": 0, "heavy": 100}
    # Batch norm impact
    bn_scores = {"before": -50, "after": 0, "none": 150}
    # Weight init impact
    init_scores = {"xavier": 0, "kaiming": -30, "normal": 100}
    
    score = (base_score + 
             dropout_scores.get(dropout_policy, 0) + 
             bn_scores.get(batch_norm, 0) + 
             init_scores.get(weight_init, 0) +
             np.random.normal(0, 50))
    
    return score

def train_with_config(X):
    """Objective function with various categorical choices."""
    results = []
    
    for params in X:
        dropout_policy = params[0]  # "none", "light", "heavy"
        batch_norm = params[1]       # "before", "after", "none"
        weight_init = params[2]      # "xavier", "kaiming", "normal"
        
        # Use these strings to configure your model
        score = train_model_with_config(
            dropout_policy=dropout_policy,
            batch_norm=batch_norm,
            weight_init=weight_init
        )
        results.append(score)
    
    return np.array(results)  # Return numpy array

optimizer = SpotOptim(
    fun=train_with_config,
    bounds=[
        ("none", "light", "heavy"),           # Dropout policy
        ("before", "after", "none"),          # Batch norm position
        ("xavier", "kaiming", "normal")       # Weight initialization
    ],
    var_type=["factor", "factor", "factor"],
    max_iter=25,
    seed=42
)

result = optimizer.optimize()
print("Best configuration:")
print(f"  Dropout: {result.x[0]}")
print(f"  Batch norm: {result.x[1]}")
print(f"  Weight init: {result.x[2]}")
print(f"  Score: {result.fun:.4f}")
Best configuration:
  Dropout: light
  Batch norm: before
  Weight init: xavier
  Score: 2863.7541

8.5.2 Viewing All Evaluated Configurations

import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor
import numpy as np

def train_and_evaluate(X):
    """Train models with different activation functions."""
    results = []
    
    for params in X:
        l1 = int(params[0])         # Integer: layer size
        activation = params[1]       # String: activation function
        
        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders()
        
        # Create model with the activation function
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=l1,
            num_hidden_layers=2,
            activation=activation  # Pass string directly!
        )
        
        # Train model
        optimizer = model.get_optimizer("Adam", lr=0.01)
        criterion = nn.MSELoss()
        
        for epoch in range(50):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluate
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                test_loss += criterion(predictions, batch_y).item()
        
        avg_loss = test_loss / len(test_loader)
        results.append(avg_loss)
    
    return np.array(results)

optimizer = SpotOptim(
    fun=train_and_evaluate,
    bounds=[
        (16, 128),                                   # Layer size
        ("ReLU", "Sigmoid", "Tanh", "LeakyReLU")   # Activation
    ],
    var_type=["int", "factor"],  # IMPORTANT: Specify variable types!
    max_iter=30,
    seed=42
)

result = optimizer.optimize()

# Access all evaluated configurations
print("\nAll evaluated configurations:")
print("Layer Size | Activation | Test MSE")
print("-" * 42)
for i in range(min(10, len(result.X))):  # Show first 10
    l1 = int(result.X[i, 0])
    activation = result.X[i, 1]  # String value!
    loss = result.y[i]
    print(f"{l1:10d} | {activation:10s} | {loss:.4f}")

# Find top 5 configurations
sorted_indices = result.y.argsort()[:5]
print("\nTop 5 configurations:")
for idx in sorted_indices:
    print(f"l1={int(result.X[idx, 0]):3d}, "
          f"activation={result.X[idx, 1]:10s}, "
          f"MSE={result.y[idx]:.4f}")

All evaluated configurations:
Layer Size | Activation | Test MSE
------------------------------------------
        41 | Tanh       | 26641.4141
       118 | Sigmoid    | 26307.1569
        26 | Tanh       | 26664.7982
       108 | Sigmoid    | 26354.6230
        71 | LeakyReLU  | 26520.9492
        34 | Tanh       | 26541.5020
        87 | ReLU       | 26506.7585
       101 | Tanh       | 26480.0638
        55 | Sigmoid    | 26636.6940
        74 | ReLU       | 26542.9134

Top 5 configurations:
l1=115, activation=Sigmoid   , MSE=26227.4974
l1=118, activation=Sigmoid   , MSE=26307.1569
l1=103, activation=Sigmoid   , MSE=26339.0430
l1=108, activation=Sigmoid   , MSE=26354.6230
l1= 99, activation=Sigmoid   , MSE=26367.6986
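
For larger studies it can be convenient to load the evaluated configurations into a pandas DataFrame for sorting and grouping. A minimal sketch, assuming pandas is installed and reusing the result object from the run above:

import pandas as pd

# result.X is an object array; numeric columns arrive as floats, factor columns as strings
df = pd.DataFrame(result.X, columns=["l1", "activation"])
df["l1"] = df["l1"].astype(int)
df["test_mse"] = result.y

# Mean test MSE per activation function
print(df.groupby("activation")["test_mse"].mean().sort_values())

# Five best configurations overall
print(df.nsmallest(5, "test_mse"))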

8.6 How It Works

8.6.1 Internal Mechanism

SpotOptim handles factor variables through automatic conversion:

  1. Initialization: String tuples in bounds are detected

    bounds = [("ReLU", "Sigmoid", "Tanh")]
    # Internally mapped to: {0: "ReLU", 1: "Sigmoid", 2: "Tanh"}
    # Bounds become: [(0, 2)]
  2. Sampling: Initial design samples from [0, n_levels-1] and rounds to integers

    # Samples might be: [0.3, 1.8, 2.1]
    # After rounding: [0, 2, 2]
  3. Evaluation: Before calling the objective function, integers → strings

    # [0, 2, 2] → ["ReLU", "Tanh", "Tanh"]
    # Objective function receives strings
  4. Optimization: Surrogate model works with integers [0, n_levels-1]

  5. Results: Final results mapped back to strings

    result.x[0]  # Returns "ReLU", not 0
    result.X     # All rows contain strings for factor variables
    array([[41.0, 'Tanh'],
           [118.0, 'Sigmoid'],
           [26.0, 'Tanh'],
           [108.0, 'Sigmoid'],
           [71.0, 'LeakyReLU'],
           [34.0, 'Tanh'],
           [87.0, 'ReLU'],
           [101.0, 'Tanh'],
           [55.0, 'Sigmoid'],
           [74.0, 'ReLU'],
           [115.0, 'Sigmoid'],
           [114.0, 'Sigmoid'],
           [52.0, 'Sigmoid'],
           [45.0, 'Sigmoid'],
           [35.0, 'Sigmoid'],
           [26.0, 'Sigmoid'],
           [127.0, 'Tanh'],
           [128.0, 'ReLU'],
           [31.0, 'Sigmoid'],
           [55.0, 'LeakyReLU'],
           [99.0, 'Sigmoid'],
           [123.0, 'Sigmoid'],
           [24.0, 'ReLU'],
           [103.0, 'Sigmoid'],
           [95.0, 'Sigmoid'],
           [53.0, 'ReLU'],
           [110.0, 'Sigmoid'],
           [36.0, 'Sigmoid'],
           [28.0, 'ReLU'],
           [17.0, 'ReLU']], dtype=object)
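
Conceptually, each factor dimension behaves like a small lookup table between integer codes and string levels. The sketch below only illustrates this round trip; it is not SpotOptim's internal code:

import numpy as np

# Illustration only -- not SpotOptim's internal implementation
levels = ("ReLU", "Sigmoid", "Tanh")            # factor levels from the bounds
code_to_level = dict(enumerate(levels))          # {0: "ReLU", 1: "Sigmoid", 2: "Tanh"}

# The surrogate works on [0, n_levels - 1]; proposals are rounded to integer codes ...
raw = np.array([0.3, 1.8, 2.1])
codes = np.clip(np.round(raw), 0, len(levels) - 1).astype(int)   # array([0, 2, 2])

# ... and mapped back to strings before the objective function is called
strings = [code_to_level[c] for c in codes]      # ['ReLU', 'Tanh', 'Tanh']
print(codes, strings)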

8.6.2 Variable Type Auto-Detection

If you don’t specify var_type, SpotOptim automatically detects factor variables:

# Example 1: Explicit var_type (recommended)
# This shows the syntax - replace my_function with your actual function

# optimizer = SpotOptim(
#     fun=my_function,
#     bounds=[(-4, -2), ("ReLU", "Tanh")],
#     var_type=["float", "factor"]  # Explicit
# )

# Example 2: Auto-detection (works but less explicit)
# optimizer = SpotOptim(
#     fun=my_function,
#     bounds=[(-4, -2), ("ReLU", "Tanh")]
#     # var_type automatically set to ["float", "factor"]
# )

# Here's a working example:
from spotoptim import SpotOptim
import numpy as np

def demo_function(X):
    results = []
    for params in X:
        lr = 10 ** params[0]  # Continuous parameter
        activation = params[1]  # Factor parameter
        score = 3000 + lr * 100 + {"ReLU": 0, "Tanh": 50}.get(activation, 100)
        results.append(score + np.random.normal(0, 10))
    return np.array(results)

# With explicit var_type (recommended)
optimizer = SpotOptim(
    fun=demo_function,
    bounds=[(-4, -2), ("ReLU", "Tanh")],
    var_type=["float", "factor"],  # Explicit is clearer
    max_iter=10,
    seed=42
)

result = optimizer.optimize()
print(f"Best lr: {10**result.x[0]:.6f}, Best activation: {result.x[1]}")
Best lr: 0.000489, Best activation: ReLU

8.7 Complete Example: Full Workflow

"""
Complete example: Neural network hyperparameter optimization with factor variables.
"""
import numpy as np
import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor


def objective_function(X):
    """Train and evaluate models with given hyperparameters."""
    results = []
    
    for params in X:
        # Extract hyperparameters
        log_lr = params[0]
        l1 = int(params[1])
        num_layers = int(params[2])
        activation = params[3]  # String!
        
        lr = 10 ** log_lr
        
        print(f"Testing: lr={lr:.6f}, l1={l1}, layers={num_layers}, "
              f"activation={activation}")
        
        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders(
            test_size=0.2,
            batch_size=32,
            random_state=42
        )
        
        # Create and train model
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=l1,
            num_hidden_layers=num_layers,
            activation=activation
        )
        
        optimizer = model.get_optimizer("Adam", lr=lr)
        criterion = nn.MSELoss()
        
        # Training loop
        num_epochs = 30
        for epoch in range(num_epochs):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluation
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                test_loss += loss.item()
        
        avg_test_loss = test_loss / len(test_loader)
        results.append(avg_test_loss)
        print(f"  → Test MSE: {avg_test_loss:.4f}")
    
    return np.array(results)


def main():
    print("=" * 80)
    print("Neural Network Hyperparameter Optimization with Factor Variables")
    print("=" * 80)
    
    # Define optimization problem
    optimizer = SpotOptim(
        fun=objective_function,
        bounds=[
            (-4, -2),                                    # log10(learning_rate)
            (16, 128),                                   # l1 (neurons)
            (0, 4),                                      # num_hidden_layers
            ("ReLU", "Sigmoid", "Tanh", "LeakyReLU")   # activation (factor!)
        ],
        var_type=["float", "int", "int", "factor"],
        max_iter=50,
        seed=42
    )
    
    # Run optimization
    print("\nStarting optimization...")
    result = optimizer.optimize()
    
    # Display results
    print("\n" + "=" * 80)
    print("OPTIMIZATION RESULTS")
    print("=" * 80)
    print(f"Best learning rate: {10**result.x[0]:.6f}")
    print(f"Best layer size (l1): {int(result.x[1])}")
    print(f"Best num hidden layers: {int(result.x[2])}")
    print(f"Best activation function: {result.x[3]}")  # String value!
    print(f"Best test MSE: {result.fun:.4f}")
    
    # Show top 5 configurations
    print("\n" + "=" * 80)
    print("TOP 5 CONFIGURATIONS")
    print("=" * 80)
    sorted_indices = result.y.argsort()[:5]
    print(f"{'Rank':<6} {'LR':<12} {'L1':<6} {'Layers':<8} "
          f"{'Activation':<12} {'MSE':<10}")
    print("-" * 80)
    for rank, idx in enumerate(sorted_indices, 1):
        lr = 10 ** result.X[idx, 0]
        l1 = int(result.X[idx, 1])
        layers = int(result.X[idx, 2])
        activation = result.X[idx, 3]
        mse = result.y[idx]
        print(f"{rank:<6} {lr:<12.6f} {l1:<6} {layers:<8} "
              f"{activation:<12} {mse:<10.4f}")
    
    # Train final model with best configuration
    print("\n" + "=" * 80)
    print("TRAINING FINAL MODEL")
    print("=" * 80)
    
    best_lr = 10 ** result.x[0]
    best_l1 = int(result.x[1])
    best_layers = int(result.x[2])
    best_activation = result.x[3]
    
    print(f"Configuration: lr={best_lr:.6f}, l1={best_l1}, "
          f"layers={best_layers}, activation={best_activation}")
    
    train_loader, test_loader, _ = get_diabetes_dataloaders(
        test_size=0.2,
        batch_size=32,
        random_state=42
    )
    
    final_model = LinearRegressor(
        input_dim=10,
        output_dim=1,
        l1=best_l1,
        num_hidden_layers=best_layers,
        activation=best_activation
    )
    
    optimizer_final = final_model.get_optimizer("Adam", lr=best_lr)
    criterion = nn.MSELoss()
    
    # Extended training
    num_epochs = 100
    print(f"\nTraining for {num_epochs} epochs...")
    for epoch in range(num_epochs):
        final_model.train()
        train_loss = 0.0
        for batch_X, batch_y in train_loader:
            predictions = final_model(batch_X)
            loss = criterion(predictions, batch_y)
            optimizer_final.zero_grad()
            loss.backward()
            optimizer_final.step()
            train_loss += loss.item()
        
        if (epoch + 1) % 20 == 0:
            avg_train_loss = train_loss / len(train_loader)
            print(f"Epoch {epoch+1}/{num_epochs}: Train MSE = {avg_train_loss:.4f}")
    
    # Final evaluation
    final_model.eval()
    final_test_loss = 0.0
    with torch.no_grad():
        for batch_X, batch_y in test_loader:
            predictions = final_model(batch_X)
            final_test_loss += criterion(predictions, batch_y).item()
    
    final_avg_loss = final_test_loss / len(test_loader)
    print(f"\nFinal Test MSE: {final_avg_loss:.4f}")
    print("=" * 80)


if __name__ == "__main__":
    main()
================================================================================
Neural Network Hyperparameter Optimization with Factor Variables
================================================================================

Starting optimization...
Testing: lr=0.007002, l1=101, layers=2, activation=ReLU
  → Test MSE: 26595.1302
Testing: lr=0.000604, l1=50, layers=2, activation=ReLU
  → Test MSE: 26641.1608
Testing: lr=0.000149, l1=67, layers=1, activation=Tanh
  → Test MSE: 26616.5931
Testing: lr=0.000296, l1=40, layers=0, activation=Tanh
  → Test MSE: 26556.3835
Testing: lr=0.004887, l1=116, layers=2, activation=Sigmoid
  → Test MSE: 26603.4837
Testing: lr=0.001772, l1=124, layers=3, activation=Sigmoid
  → Test MSE: 26641.1367
Testing: lr=0.001107, l1=36, layers=4, activation=Sigmoid
  → Test MSE: 26626.1178
Testing: lr=0.003708, l1=20, layers=1, activation=LeakyReLU
  → Test MSE: 26663.0352
Testing: lr=0.000861, l1=90, layers=1, activation=Tanh
  → Test MSE: 26570.0729
Testing: lr=0.000237, l1=78, layers=3, activation=Tanh
  → Test MSE: 26644.3626
Testing: lr=0.005015, l1=26, layers=0, activation=LeakyReLU
  → Test MSE: 26595.7357
Testing: lr=0.000298, l1=88, layers=1, activation=ReLU
  → Test MSE: 26525.2005
Testing: lr=0.000301, l1=88, layers=1, activation=ReLU
  → Test MSE: 26616.7741
Testing: lr=0.000860, l1=58, layers=1, activation=LeakyReLU
  → Test MSE: 26680.8242
Testing: lr=0.003960, l1=66, layers=1, activation=Tanh
  → Test MSE: 26608.5612
Testing: lr=0.000884, l1=58, layers=2, activation=Tanh
  → Test MSE: 26600.1680
Testing: lr=0.003688, l1=76, layers=1, activation=Tanh
  → Test MSE: 26585.8685
Testing: lr=0.003183, l1=108, layers=3, activation=Tanh
  → Test MSE: 26602.7181
Testing: lr=0.000471, l1=69, layers=2, activation=ReLU
  → Test MSE: 26625.1087
Testing: lr=0.008072, l1=82, layers=2, activation=Sigmoid
  → Test MSE: 26504.7884
Testing: lr=0.004529, l1=86, layers=0, activation=LeakyReLU
  → Test MSE: 26550.3483
Testing: lr=0.000117, l1=70, layers=0, activation=Sigmoid
  → Test MSE: 26648.5879
Testing: lr=0.002727, l1=86, layers=2, activation=Sigmoid
  → Test MSE: 26687.7266
Testing: lr=0.000347, l1=42, layers=2, activation=ReLU
  → Test MSE: 26630.2812
Testing: lr=0.000872, l1=109, layers=3, activation=Sigmoid
  → Test MSE: 26698.2982
Testing: lr=0.000243, l1=94, layers=1, activation=Tanh
  → Test MSE: 26600.1732
Testing: lr=0.000339, l1=125, layers=1, activation=Tanh
  → Test MSE: 26642.8379
Testing: lr=0.000565, l1=34, layers=3, activation=Sigmoid
  → Test MSE: 26614.5781
Testing: lr=0.000663, l1=58, layers=3, activation=Tanh
  → Test MSE: 26635.9447
Testing: lr=0.000304, l1=57, layers=2, activation=ReLU
  → Test MSE: 26609.9961
Testing: lr=0.000165, l1=54, layers=1, activation=Sigmoid
  → Test MSE: 26580.0833
Testing: lr=0.003692, l1=110, layers=3, activation=Tanh
  → Test MSE: 26607.2572
Testing: lr=0.001682, l1=64, layers=2, activation=Sigmoid
  → Test MSE: 26665.3021
Testing: lr=0.001871, l1=115, layers=3, activation=Sigmoid
  → Test MSE: 26550.1654
Testing: lr=0.000208, l1=40, layers=2, activation=Tanh
  → Test MSE: 26659.4779
Testing: lr=0.000191, l1=100, layers=2, activation=ReLU
  → Test MSE: 26655.9225
Testing: lr=0.002624, l1=45, layers=3, activation=Tanh
  → Test MSE: 26578.6816
Testing: lr=0.008604, l1=84, layers=2, activation=Tanh
  → Test MSE: 26566.8008
Testing: lr=0.000506, l1=74, layers=4, activation=LeakyReLU
  → Test MSE: 26621.5280
Testing: lr=0.006887, l1=90, layers=1, activation=ReLU
  → Test MSE: 26648.8704
Testing: lr=0.004136, l1=58, layers=2, activation=Tanh
  → Test MSE: 26644.8346
Testing: lr=0.000282, l1=70, layers=1, activation=LeakyReLU
  → Test MSE: 26654.3620
Testing: lr=0.002332, l1=105, layers=1, activation=Tanh
  → Test MSE: 26644.0013
Testing: lr=0.000769, l1=117, layers=4, activation=Tanh
  → Test MSE: 26605.3704
Testing: lr=0.000399, l1=116, layers=2, activation=Sigmoid
  → Test MSE: 26696.8646
Testing: lr=0.007122, l1=110, layers=2, activation=ReLU
  → Test MSE: 26568.1309
Testing: lr=0.002259, l1=46, layers=1, activation=Sigmoid
  → Test MSE: 26663.3522
Testing: lr=0.002115, l1=43, layers=3, activation=Sigmoid
  → Test MSE: 26526.7422
Testing: lr=0.001023, l1=42, layers=2, activation=Tanh
  → Test MSE: 26585.4219
Testing: lr=0.007406, l1=81, layers=0, activation=LeakyReLU
  → Test MSE: 26570.9642

================================================================================
OPTIMIZATION RESULTS
================================================================================
Best learning rate: 0.008072
Best layer size (l1): 82
Best num hidden layers: 2
Best activation function: Sigmoid
Best test MSE: 26504.7884

================================================================================
TOP 5 CONFIGURATIONS
================================================================================
Rank   LR           L1     Layers   Activation   MSE       
--------------------------------------------------------------------------------
1      0.008072     82     2        Sigmoid      26504.7884
2      0.000298     88     1        ReLU         26525.2005
3      0.002115     43     3        Sigmoid      26526.7422
4      0.001871     115    3        Sigmoid      26550.1654
5      0.004529     86     0        LeakyReLU    26550.3483

================================================================================
TRAINING FINAL MODEL
================================================================================
Configuration: lr=0.008072, l1=82, layers=2, activation=Sigmoid

Training for 100 epochs...
Epoch 20/100: Train MSE = 32429.5312
Epoch 40/100: Train MSE = 32134.7218
Epoch 60/100: Train MSE = 27624.7871
Epoch 80/100: Train MSE = 28991.9951
Epoch 100/100: Train MSE = 31000.9487

Final Test MSE: 26428.3424
================================================================================

8.8 Best Practices

8.8.1 Do’s

Use descriptive string values

bounds=[("xavier_uniform", "kaiming_normal", "orthogonal")]

Explicitly specify var_type for clarity

var_type=["float", "int", "factor"]

Access results as strings

# Example: Accessing factor variable results as strings
# (This assumes you've run an optimization with activation as a factor variable)

# If you have a result from the previous examples:
# best_activation = result.x[3]  # For 4-parameter optimization
# Or for simpler cases:
# best_activation = result.x[0]  # For single-parameter optimization

# Example with inline optimization:
from spotoptim import SpotOptim
import numpy as np

def quick_test(X):
    results = []
    for params in X:
        activation = params[0]
        score = {"ReLU": 3500, "Tanh": 3600}.get(activation, 4000)
        results.append(score + np.random.normal(0, 50))
    return np.array(results)

opt = SpotOptim(
    fun=quick_test,
    bounds=[("ReLU", "Tanh")],
    var_type=["factor"],
    max_iter=10,
    seed=42
)
result = opt.optimize()

# Access as string - this is the correct way
best_activation = result.x[0]  # String value like "ReLU"
print(f"Best activation: {best_activation} (type: {type(best_activation).__name__})")

# You can use it directly in your model
# model = LinearRegressor(activation=best_activation)
Best activation: ReLU (type: str)

Mix factor variables with numeric/integer variables

bounds=[(-4, -2), (16, 128), ("ReLU", "Tanh")]
var_type=["float", "int", "factor"]

8.8.2 Don’ts

Don’t use integers in factor bounds

# Wrong: Use strings, not integers
bounds=[(0, 1, 2)]  # Wrong!
bounds=[("ReLU", "Sigmoid", "Tanh")]  # Correct!

Don’t expect integers in objective function

def objective(X):
    activation = X[0][2]
    # activation is a string, not an integer!
    # Don't do: if activation == 0:  # Wrong!
    # Do: if activation == "ReLU":   # Correct!

Don’t manually convert factor variables

# SpotOptim handles conversion automatically
# Don't do manual mapping in your objective function

Don’t use empty tuples

# Wrong: Empty tuple
bounds=[()]

# Correct: At least one string
bounds=[("ReLU",)]  # Single choice (will be treated as fixed)

8.9 Troubleshooting

8.9.1 Common Issues

Issue: Objective function receives integers instead of strings

Solution: Ensure you’re using the latest version of SpotOptim with factor variable support. Factor variables are automatically converted before calling the objective function.
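
A quick way to diagnose this is a defensive type check at the top of your objective function; a minimal sketch:

import numpy as np

def objective(X):
    results = []
    for params in X:
        activation = params[0]
        # Factor parameters should arrive as strings; fail loudly if they do not
        if not isinstance(activation, str):
            raise TypeError(
                f"Expected a string for the factor variable, "
                f"got {type(activation).__name__}: {activation!r}"
            )
        results.append(0.0)  # placeholder; replace with the real evaluation
    return np.array(results)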


Issue: ValueError: could not convert string to float

Solution: This occurs if there’s a version mismatch. Update SpotOptim to ensure the object array conversion is implemented correctly.


Issue: Results show integers instead of strings

Solution: Check that you’re accessing result.x (mapped values) instead of internal arrays. The result object automatically maps factor variables to their original strings.


Issue: Single-level factor variables cause dimension reduction

Behavior: If a factor variable has only one choice, e.g., ("ReLU",), SpotOptim treats it as a fixed dimension and may reduce the dimensionality. This is expected behavior.

Solution: Use at least two choices for optimization, or remove single-choice dimensions from bounds.
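
If one choice should stay fixed while other hyperparameters are tuned, a common pattern is to hard-code it inside the objective function instead of passing a single-level factor. A minimal sketch with a placeholder score (replace it with real training and evaluation):

from spotoptim import SpotOptim
import numpy as np

def objective(X):
    results = []
    for params in X:
        lr = 10 ** params[0]     # optimized: learning rate (log scale)
        activation = "ReLU"      # fixed choice, hard-coded instead of a one-level factor
        # Placeholder score; in real use, train and evaluate a model with `activation`
        results.append((np.log10(lr) + 3) ** 2)
    return np.array(results)

opt = SpotOptim(
    fun=objective,
    bounds=[(-4, -2)],           # only the learning rate remains in the bounds
    var_type=["float"],
    max_iter=10,
    seed=42
)
result = opt.optimize()
print(f"Best lr: {10**result.x[0]:.6f}")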

8.10 Summary

Factor variables in SpotOptim enable:

  • Categorical optimization: Optimize over discrete string choices
  • Automatic conversion: Seamless integer↔︎string mapping
  • Neural network hyperparameters: Optimize activation functions, optimizers, etc.
  • Mixed variable types: Combine with continuous and integer variables
  • Clean interface: Objective functions work with strings directly
  • String results: Final results contain original string values

Factor variables make categorical hyperparameter optimization as easy as continuous optimization!

8.11 Jupyter Notebook

Note