6  Factor Variables for Categorical Hyperparameters

SpotOptim supports factor variables for optimizing categorical hyperparameters such as activation functions, optimizers, or any other discrete string-based choice. Factor variables are converted automatically between string values (the external interface) and integers (used internally during optimization), so categorical hyperparameters can be tuned as easily as numeric ones.

6.1 Overview

What are Factor Variables?

Factor variables allow you to specify categorical choices as tuples of strings in the bounds. SpotOptim handles the conversion:

  1. String tuples in bounds → Internal integer mapping (0, 1, 2, …)
  2. Optimization uses integers internally for surrogate modeling
  3. Objective function receives strings after automatic conversion
  4. Results return strings (not integers)

Module: spotoptim.SpotOptim

Key Features:

  • Define categorical choices as string tuples: ("ReLU", "Sigmoid", "Tanh")
  • Automatic integer ↔ string conversion
  • Seamless integration with neural network hyperparameters
  • Mix factor variables with numeric/integer variables

6.2 Quick Start

6.2.1 Basic Factor Variable Usage

from spotoptim import SpotOptim
import numpy as np

def objective_function(X):
    """Objective function receives string values."""
    results = []
    for params in X:
        activation = params[0]  # This is a string!
        print(f"Testing activation: {activation}")
        
        # Simple scoring based on activation choice (for demonstration)
        # In real use, you would train a model and return actual performance
        scores = {
            "ReLU": 3500.0,
            "Sigmoid": 4200.0,
            "Tanh": 3800.0,
            "LeakyReLU": 3600.0
        }
        score = scores.get(activation, 5000.0) + np.random.normal(0, 100)
        results.append(score)
    return np.array(results)  # Return numpy array

# Define bounds with factor variable
optimizer = SpotOptim(
    fun=objective_function,
    bounds=[("ReLU", "Sigmoid", "Tanh", "LeakyReLU")],
    var_type=["factor"],
    max_iter=20,
    seed=42
)

result = optimizer.optimize()
print(f"\nBest activation: {result.x[0]}")  # Returns string, e.g., "ReLU"
print(f"Best score: {result.fun:.4f}")
Testing activation: ReLU
Testing activation: Sigmoid
Testing activation: Tanh
Testing activation: LeakyReLU
Testing activation: Tanh
Testing activation: Sigmoid
Testing activation: Sigmoid
Testing activation: Sigmoid
Testing activation: Sigmoid
Testing activation: ReLU
Testing activation: Sigmoid
Testing activation: Tanh
Testing activation: LeakyReLU
Testing activation: Sigmoid
Testing activation: LeakyReLU
Testing activation: Tanh
Testing activation: Sigmoid
Testing activation: LeakyReLU
Testing activation: Sigmoid
Testing activation: Tanh

Best activation: ReLU
Best score: 3445.5154

6.2.2 Neural Network Activation Function Optimization

import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor
import numpy as np

def train_and_evaluate(X):
    """Train models with different activation functions."""
    results = []
    
    for params in X:
        activation = params[0]  # String: "ReLU", "Sigmoid", etc.
        
        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders()
        
        # Create model with the activation function
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=64,
            num_hidden_layers=2,
            activation=activation  # Pass string directly!
        )
        
        # Train model
        optimizer = model.get_optimizer("Adam", lr=0.01)
        criterion = nn.MSELoss()
        
        for epoch in range(50):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluate
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                test_loss += criterion(predictions, batch_y).item()
        
        avg_loss = test_loss / len(test_loader)
        results.append(avg_loss)
    
    return np.array(results)  # Return numpy array

# Optimize activation function choice
optimizer = SpotOptim(
    fun=train_and_evaluate,
    bounds=[("ReLU", "Sigmoid", "Tanh", "LeakyReLU", "ELU")],
    var_type=["factor"],
    max_iter=30
)

result = optimizer.optimize()
print(f"Best activation function: {result.x[0]}")
print(f"Best test MSE: {result.fun:.4f}")
Best activation function: Sigmoid
Best test MSE: 26456.8711

6.3 Mixed Variable Types

6.3.1 Combining Factor, Integer, and Continuous Variables

import numpy as np
import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor

def comprehensive_optimization(X):
    """Optimize learning rate, layer size, depth, and activation."""
    results = []
    
    for params in X:
        log_lr = params[0]      # Continuous (log scale)
        l1 = int(params[1])     # Integer
        n_layers = int(params[2])  # Integer
        activation = params[3]   # Factor (string)
        
        lr = 10 ** log_lr  # Convert from log scale
        
        print(f"lr={lr:.6f}, l1={l1}, layers={n_layers}, activation={activation}")
        
        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders(
            batch_size=32,
            random_state=42
        )
        
        # Create model
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=l1,
            num_hidden_layers=n_layers,
            activation=activation
        )
        
        # Train
        optimizer = model.get_optimizer("Adam", lr=lr)
        criterion = nn.MSELoss()
        
        for epoch in range(30):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluate
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                test_loss += criterion(predictions, batch_y).item()
        
        results.append(test_loss / len(test_loader))
    
    return np.array(results)

# Optimize all four hyperparameters simultaneously
optimizer = SpotOptim(
    fun=comprehensive_optimization,
    bounds=[
        (-4, -2),                                    # log10(learning_rate)
        (16, 128),                                   # l1 (neurons per layer)
        (0, 4),                                      # num_hidden_layers
        ("ReLU", "Sigmoid", "Tanh", "LeakyReLU")   # activation function
    ],
    var_type=["float", "int", "int", "factor"],
    max_iter=50
)

result = optimizer.optimize()

# Results contain original string values
print("\nOptimization Results:")
print(f"Best learning rate: {10**result.x[0]:.6f}")
print(f"Best layer size: {int(result.x[1])}")
print(f"Best num layers: {int(result.x[2])}")
print(f"Best activation: {result.x[3]}")  # String value!
print(f"Best test MSE: {result.fun:.4f}")
lr=0.000775, l1=42, layers=1, activation=Sigmoid
lr=0.004186, l1=123, layers=3, activation=Tanh
lr=0.000108, l1=59, layers=3, activation=Tanh
lr=0.000477, l1=28, layers=0, activation=Sigmoid
lr=0.001504, l1=111, layers=4, activation=ReLU
lr=0.002682, l1=72, layers=0, activation=ReLU
lr=0.000176, l1=64, layers=2, activation=Tanh
lr=0.002165, l1=91, layers=3, activation=LeakyReLU
lr=0.007202, l1=26, layers=2, activation=Sigmoid
lr=0.000287, l1=95, layers=1, activation=LeakyReLU
lr=0.001520, l1=68, layers=2, activation=Tanh
lr=0.001202, l1=90, layers=3, activation=Tanh
lr=0.000775, l1=42, layers=1, activation=Sigmoid
lr=0.000634, l1=57, layers=3, activation=ReLU
lr=0.000294, l1=126, layers=0, activation=Tanh
lr=0.002022, l1=104, layers=2, activation=ReLU
lr=0.001235, l1=75, layers=1, activation=LeakyReLU
lr=0.003422, l1=38, layers=1, activation=Sigmoid
lr=0.000847, l1=73, layers=0, activation=Sigmoid
lr=0.006355, l1=37, layers=2, activation=ReLU
lr=0.000202, l1=48, layers=0, activation=Tanh
lr=0.000670, l1=105, layers=2, activation=Tanh
lr=0.001992, l1=86, layers=4, activation=ReLU
lr=0.001361, l1=125, layers=2, activation=ReLU
lr=0.000359, l1=66, layers=4, activation=Sigmoid
lr=0.003103, l1=20, layers=0, activation=Sigmoid
lr=0.001268, l1=49, layers=2, activation=Sigmoid
lr=0.000659, l1=25, layers=3, activation=LeakyReLU
lr=0.002193, l1=98, layers=1, activation=Sigmoid
lr=0.008913, l1=40, layers=2, activation=Tanh
lr=0.003028, l1=28, layers=0, activation=Sigmoid
lr=0.005547, l1=82, layers=3, activation=Sigmoid
lr=0.002702, l1=99, layers=2, activation=Tanh
lr=0.000111, l1=91, layers=3, activation=ReLU
lr=0.000151, l1=33, layers=3, activation=Tanh
lr=0.002101, l1=91, layers=0, activation=Sigmoid
lr=0.000146, l1=33, layers=4, activation=Sigmoid
lr=0.008771, l1=33, layers=3, activation=LeakyReLU
lr=0.005618, l1=94, layers=2, activation=Tanh
lr=0.000111, l1=17, layers=4, activation=Sigmoid
lr=0.001076, l1=52, layers=2, activation=Sigmoid
lr=0.002047, l1=110, layers=2, activation=Tanh
lr=0.005745, l1=29, layers=3, activation=Sigmoid
lr=0.000372, l1=96, layers=4, activation=Sigmoid
lr=0.008027, l1=66, layers=2, activation=ReLU
lr=0.000162, l1=106, layers=3, activation=ReLU
lr=0.002745, l1=102, layers=4, activation=Tanh
lr=0.001636, l1=64, layers=2, activation=Tanh
lr=0.000167, l1=124, layers=1, activation=LeakyReLU
lr=0.001095, l1=83, layers=2, activation=LeakyReLU

Optimization Results:
Best learning rate: 0.000775
Best layer size: 42
Best num layers: 1
Best activation: Sigmoid
Best test MSE: 26432.8789

6.4 Multiple Factor Variables

6.4.1 Optimizing Both Activation and Optimizer

import torch  # needed for torch.no_grad() during evaluation
import torch.nn as nn
import numpy as np
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor

def optimize_activation_and_optimizer(X):
    """Optimize both activation function and optimizer choice."""
    results = []
    
    for params in X:
        activation = params[0]      # Factor variable 1
        optimizer_name = params[1]  # Factor variable 2
        lr = 10 ** params[2]        # Continuous variable
        
        train_loader, test_loader, _ = get_diabetes_dataloaders()
        
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=64,
            num_hidden_layers=2,
            activation=activation
        )
        
        # Use the optimizer string
        optimizer = model.get_optimizer(optimizer_name, lr=lr)
        criterion = nn.MSELoss()
        
        # Train
        for epoch in range(30):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluate
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                test_loss += criterion(predictions, batch_y).item()
        
        results.append(test_loss / len(test_loader))
    
    return np.array(results)  # Return numpy array

# Two factor variables + one continuous
opt = SpotOptim(
    fun=optimize_activation_and_optimizer,
    bounds=[
        ("ReLU", "Tanh", "Sigmoid", "LeakyReLU"),    # Activation
        ("Adam", "SGD", "RMSprop", "AdamW"),         # Optimizer
        (-4, -2)                                      # log10(lr)
    ],
    var_type=["factor", "factor", "float"],
    max_iter=40
)

result = opt.optimize()
print(f"Best activation: {result.x[0]}")
print(f"Best optimizer: {result.x[1]}")
print(f"Best learning rate: {10**result.x[2]:.6f}")
Best activation: LeakyReLU
Best optimizer: SGD
Best learning rate: 0.009024

6.5 Advanced Usage

6.5.1 Custom Categorical Choices

Factor variables work with any string values, not just activation functions:

from spotoptim import SpotOptim
import numpy as np

def train_model_with_config(dropout_policy, batch_norm, weight_init):
    """Simulate model training with different configurations."""
    # In real use, this would train an actual model
    # Here we return synthetic scores for demonstration
    base_score = 3000.0
    
    # Dropout impact
    dropout_scores = {"none": 200, "light": 0, "heavy": 100}
    # Batch norm impact
    bn_scores = {"before": -50, "after": 0, "none": 150}
    # Weight init impact
    init_scores = {"xavier": 0, "kaiming": -30, "normal": 100}
    
    score = (base_score + 
             dropout_scores.get(dropout_policy, 0) + 
             bn_scores.get(batch_norm, 0) + 
             init_scores.get(weight_init, 0) +
             np.random.normal(0, 50))
    
    return score

def train_with_config(X):
    """Objective function with various categorical choices."""
    results = []
    
    for params in X:
        dropout_policy = params[0]  # "none", "light", "heavy"
        batch_norm = params[1]       # "before", "after", "none"
        weight_init = params[2]      # "xavier", "kaiming", "normal"
        
        # Use these strings to configure your model
        score = train_model_with_config(
            dropout_policy=dropout_policy,
            batch_norm=batch_norm,
            weight_init=weight_init
        )
        results.append(score)
    
    return np.array(results)  # Return numpy array

optimizer = SpotOptim(
    fun=train_with_config,
    bounds=[
        ("none", "light", "heavy"),           # Dropout policy
        ("before", "after", "none"),          # Batch norm position
        ("xavier", "kaiming", "normal")       # Weight initialization
    ],
    var_type=["factor", "factor", "factor"],
    max_iter=25,
    seed=42
)

result = optimizer.optimize()
print("Best configuration:")
print(f"  Dropout: {result.x[0]}")
print(f"  Batch norm: {result.x[1]}")
print(f"  Weight init: {result.x[2]}")
print(f"  Score: {result.fun:.4f}")
Best configuration:
  Dropout: light
  Batch norm: before
  Weight init: kaiming
  Score: 2881.9803

6.5.2 Viewing All Evaluated Configurations

import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor
import numpy as np

def train_and_evaluate(X):
    """Train models with different activation functions."""
    results = []
    
    for params in X:
        l1 = int(params[0])         # Integer: layer size
        activation = params[1]       # String: activation function
        
        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders()
        
        # Create model with the activation function
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=l1,
            num_hidden_layers=2,
            activation=activation  # Pass string directly!
        )
        
        # Train model
        optimizer = model.get_optimizer("Adam", lr=0.01)
        criterion = nn.MSELoss()
        
        for epoch in range(50):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluate
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                test_loss += criterion(predictions, batch_y).item()
        
        avg_loss = test_loss / len(test_loader)
        results.append(avg_loss)
    
    return np.array(results)

optimizer = SpotOptim(
    fun=train_and_evaluate,
    bounds=[
        (16, 128),                                   # Layer size
        ("ReLU", "Sigmoid", "Tanh", "LeakyReLU")   # Activation
    ],
    var_type=["int", "factor"],  # IMPORTANT: Specify variable types!
    max_iter=30,
    seed=42
)

result = optimizer.optimize()

# Access all evaluated configurations
print("\nAll evaluated configurations:")
print("Layer Size | Activation | Test MSE")
print("-" * 42)
for i in range(min(10, len(result.X))):  # Show first 10
    l1 = int(result.X[i, 0])
    activation = result.X[i, 1]  # String value!
    loss = result.y[i]
    print(f"{l1:10d} | {activation:10s} | {loss:.4f}")

# Find top 5 configurations
sorted_indices = result.y.argsort()[:5]
print("\nTop 5 configurations:")
for idx in sorted_indices:
    print(f"l1={int(result.X[idx, 0]):3d}, "
          f"activation={result.X[idx, 1]:10s}, "
          f"MSE={result.y[idx]:.4f}")

All evaluated configurations:
Layer Size | Activation | Test MSE
------------------------------------------
        41 | Tanh       | 26585.9069
       118 | Sigmoid    | 26440.2975
        26 | Tanh       | 26613.8763
       108 | Sigmoid    | 26272.2806
        71 | LeakyReLU  | 26514.7331
        34 | Tanh       | 26540.7526
        87 | ReLU       | 26580.0137
       101 | Tanh       | 26504.7318
        55 | Sigmoid    | 26537.6081
        74 | ReLU       | 26540.1217

Top 5 configurations:
l1=108, activation=Sigmoid   , MSE=26272.2806
l1= 93, activation=Sigmoid   , MSE=26290.7604
l1=115, activation=Sigmoid   , MSE=26370.8145
l1=127, activation=Sigmoid   , MSE=26414.1354
l1= 42, activation=Sigmoid   , MSE=26424.8112

6.6 How It Works

6.6.1 Internal Mechanism

SpotOptim handles factor variables through automatic conversion:

  1. Initialization: String tuples in bounds are detected

    bounds = [("ReLU", "Sigmoid", "Tanh")]
    # Internally mapped to: {0: "ReLU", 1: "Sigmoid", 2: "Tanh"}
    # Bounds become: [(0, 2)]
  2. Sampling: Initial design samples from [0, n_levels-1] and rounds to integers

    # Samples might be: [0.3, 1.8, 2.1]
    # After rounding: [0, 2, 2]
  3. Evaluation: Before calling objective function, integers → strings

    # [0, 2, 2] → ["ReLU", "Tanh", "Tanh"]
    # Objective function receives strings
  4. Optimization: Surrogate model works with integers [0, n_levels-1]

  5. Results: Final results mapped back to strings

    result.x[0]  # Returns "ReLU", not 0
    result.X     # All rows contain strings for factor variables
    array([[41.0, 'Tanh'],
           [118.0, 'Sigmoid'],
           [26.0, 'Tanh'],
           [108.0, 'Sigmoid'],
           [71.0, 'LeakyReLU'],
           [34.0, 'Tanh'],
           [87.0, 'ReLU'],
           [101.0, 'Tanh'],
           [55.0, 'Sigmoid'],
           [74.0, 'ReLU'],
           [109.0, 'Sigmoid'],
           [20.0, 'Tanh'],
           [87.0, 'Tanh'],
           [107.0, 'LeakyReLU'],
           [75.0, 'Tanh'],
           [53.0, 'Tanh'],
           [35.0, 'Sigmoid'],
           [93.0, 'Sigmoid'],
           [38.0, 'Tanh'],
           [96.0, 'Sigmoid'],
           [112.0, 'Tanh'],
           [127.0, 'Sigmoid'],
           [54.0, 'Sigmoid'],
           [64.0, 'LeakyReLU'],
           [115.0, 'Sigmoid'],
           [75.0, 'Sigmoid'],
           [42.0, 'Sigmoid'],
           [66.0, 'Sigmoid'],
           [94.0, 'LeakyReLU'],
           [79.0, 'Tanh']], dtype=object)
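
To make these steps concrete, here is a minimal sketch of the mapping, written independently of SpotOptim’s internals purely for illustration; the helper names (levels, numeric_bounds, decode) are not part of the SpotOptim API.

bounds = [(16, 128), ("ReLU", "Sigmoid", "Tanh", "LeakyReLU")]

# Step 1: detect factor dimensions and build a level table per dimension
levels = {i: list(b) for i, b in enumerate(bounds) if isinstance(b[0], str)}
numeric_bounds = [(0, len(b) - 1) if i in levels else b
                  for i, b in enumerate(bounds)]

def decode(x_internal):
    """Map an internal numeric point back to user-facing values (steps 3 and 5)."""
    return [levels[i][int(round(v))] if i in levels else v
            for i, v in enumerate(x_internal)]

print(numeric_bounds)       # [(16, 128), (0, 3)]
print(decode([41.0, 2.1]))  # [41.0, 'Tanh']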

6.6.2 Variable Type Auto-Detection

If you don’t specify var_type, SpotOptim automatically detects factor variables:

# Example 1: Explicit var_type (recommended)
# This shows the syntax - replace my_function with your actual function

# optimizer = SpotOptim(
#     fun=my_function,
#     bounds=[(-4, -2), ("ReLU", "Tanh")],
#     var_type=["float", "factor"]  # Explicit
# )

# Example 2: Auto-detection (works but less explicit)
# optimizer = SpotOptim(
#     fun=my_function,
#     bounds=[(-4, -2), ("ReLU", "Tanh")]
#     # var_type automatically set to ["float", "factor"]
# )

# Here's a working example:
from spotoptim import SpotOptim
import numpy as np

def demo_function(X):
    results = []
    for params in X:
        lr = 10 ** params[0]  # Continuous parameter
        activation = params[1]  # Factor parameter
        score = 3000 + lr * 100 + {"ReLU": 0, "Tanh": 50}.get(activation, 100)
        results.append(score + np.random.normal(0, 10))
    return np.array(results)

# With explicit var_type (recommended)
optimizer = SpotOptim(
    fun=demo_function,
    bounds=[(-4, -2), ("ReLU", "Tanh")],
    var_type=["float", "factor"],  # Explicit is clearer
    max_iter=10,
    seed=42
)

result = optimizer.optimize()
print(f"Best lr: {10**result.x[0]:.6f}, Best activation: {result.x[1]}")
Best lr: 0.001083, Best activation: ReLU

6.7 Complete Example: Full Workflow

"""
Complete example: Neural network hyperparameter optimization with factor variables.
"""
import numpy as np
import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor


def objective_function(X):
    """Train and evaluate models with given hyperparameters."""
    results = []
    
    for params in X:
        # Extract hyperparameters
        log_lr = params[0]
        l1 = int(params[1])
        num_layers = int(params[2])
        activation = params[3]  # String!
        
        lr = 10 ** log_lr
        
        print(f"Testing: lr={lr:.6f}, l1={l1}, layers={num_layers}, "
              f"activation={activation}")
        
        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders(
            test_size=0.2,
            batch_size=32,
            random_state=42
        )
        
        # Create and train model
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=l1,
            num_hidden_layers=num_layers,
            activation=activation
        )
        
        optimizer = model.get_optimizer("Adam", lr=lr)
        criterion = nn.MSELoss()
        
        # Training loop
        num_epochs = 30
        for epoch in range(num_epochs):
            model.train()
            for batch_X, batch_y in train_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        
        # Evaluation
        model.eval()
        test_loss = 0.0
        with torch.no_grad():
            for batch_X, batch_y in test_loader:
                predictions = model(batch_X)
                loss = criterion(predictions, batch_y)
                test_loss += loss.item()
        
        avg_test_loss = test_loss / len(test_loader)
        results.append(avg_test_loss)
        print(f"  → Test MSE: {avg_test_loss:.4f}")
    
    return np.array(results)


def main():
    print("=" * 80)
    print("Neural Network Hyperparameter Optimization with Factor Variables")
    print("=" * 80)
    
    # Define optimization problem
    optimizer = SpotOptim(
        fun=objective_function,
        bounds=[
            (-4, -2),                                    # log10(learning_rate)
            (16, 128),                                   # l1 (neurons)
            (0, 4),                                      # num_hidden_layers
            ("ReLU", "Sigmoid", "Tanh", "LeakyReLU")   # activation (factor!)
        ],
        var_type=["float", "int", "int", "factor"],
        max_iter=50,
        seed=42
    )
    
    # Run optimization
    print("\nStarting optimization...")
    result = optimizer.optimize()
    
    # Display results
    print("\n" + "=" * 80)
    print("OPTIMIZATION RESULTS")
    print("=" * 80)
    print(f"Best learning rate: {10**result.x[0]:.6f}")
    print(f"Best layer size (l1): {int(result.x[1])}")
    print(f"Best num hidden layers: {int(result.x[2])}")
    print(f"Best activation function: {result.x[3]}")  # String value!
    print(f"Best test MSE: {result.fun:.4f}")
    
    # Show top 5 configurations
    print("\n" + "=" * 80)
    print("TOP 5 CONFIGURATIONS")
    print("=" * 80)
    sorted_indices = result.y.argsort()[:5]
    print(f"{'Rank':<6} {'LR':<12} {'L1':<6} {'Layers':<8} "
          f"{'Activation':<12} {'MSE':<10}")
    print("-" * 80)
    for rank, idx in enumerate(sorted_indices, 1):
        lr = 10 ** result.X[idx, 0]
        l1 = int(result.X[idx, 1])
        layers = int(result.X[idx, 2])
        activation = result.X[idx, 3]
        mse = result.y[idx]
        print(f"{rank:<6} {lr:<12.6f} {l1:<6} {layers:<8} "
              f"{activation:<12} {mse:<10.4f}")
    
    # Train final model with best configuration
    print("\n" + "=" * 80)
    print("TRAINING FINAL MODEL")
    print("=" * 80)
    
    best_lr = 10 ** result.x[0]
    best_l1 = int(result.x[1])
    best_layers = int(result.x[2])
    best_activation = result.x[3]
    
    print(f"Configuration: lr={best_lr:.6f}, l1={best_l1}, "
          f"layers={best_layers}, activation={best_activation}")
    
    train_loader, test_loader, _ = get_diabetes_dataloaders(
        test_size=0.2,
        batch_size=32,
        random_state=42
    )
    
    final_model = LinearRegressor(
        input_dim=10,
        output_dim=1,
        l1=best_l1,
        num_hidden_layers=best_layers,
        activation=best_activation
    )
    
    optimizer_final = final_model.get_optimizer("Adam", lr=best_lr)
    criterion = nn.MSELoss()
    
    # Extended training
    num_epochs = 100
    print(f"\nTraining for {num_epochs} epochs...")
    for epoch in range(num_epochs):
        final_model.train()
        train_loss = 0.0
        for batch_X, batch_y in train_loader:
            predictions = final_model(batch_X)
            loss = criterion(predictions, batch_y)
            optimizer_final.zero_grad()
            loss.backward()
            optimizer_final.step()
            train_loss += loss.item()
        
        if (epoch + 1) % 20 == 0:
            avg_train_loss = train_loss / len(train_loader)
            print(f"Epoch {epoch+1}/{num_epochs}: Train MSE = {avg_train_loss:.4f}")
    
    # Final evaluation
    final_model.eval()
    final_test_loss = 0.0
    with torch.no_grad():
        for batch_X, batch_y in test_loader:
            predictions = final_model(batch_X)
            final_test_loss += criterion(predictions, batch_y).item()
    
    final_avg_loss = final_test_loss / len(test_loader)
    print(f"\nFinal Test MSE: {final_avg_loss:.4f}")
    print("=" * 80)


if __name__ == "__main__":
    main()
================================================================================
Neural Network Hyperparameter Optimization with Factor Variables
================================================================================

Starting optimization...
Testing: lr=0.007002, l1=101, layers=2, activation=ReLU
  → Test MSE: 26604.4577
Testing: lr=0.000604, l1=50, layers=2, activation=ReLU
  → Test MSE: 26608.5286
Testing: lr=0.000149, l1=67, layers=1, activation=Tanh
  → Test MSE: 26589.4004
Testing: lr=0.000296, l1=40, layers=0, activation=Tanh
  → Test MSE: 26712.7474
Testing: lr=0.004887, l1=116, layers=2, activation=Sigmoid
  → Test MSE: 26595.3854
Testing: lr=0.001772, l1=124, layers=3, activation=Sigmoid
  → Test MSE: 26668.8835
Testing: lr=0.001107, l1=36, layers=4, activation=Sigmoid
  → Test MSE: 26695.1491
Testing: lr=0.003708, l1=20, layers=1, activation=LeakyReLU
  → Test MSE: 26677.1152
Testing: lr=0.000861, l1=90, layers=1, activation=Tanh
  → Test MSE: 26615.1491
Testing: lr=0.000237, l1=78, layers=3, activation=Tanh
  → Test MSE: 26604.9362
Testing: lr=0.006341, l1=113, layers=2, activation=Sigmoid
  → Test MSE: 26770.0299
Testing: lr=0.009463, l1=24, layers=0, activation=LeakyReLU
  → Test MSE: 26653.6100
Testing: lr=0.000761, l1=94, layers=4, activation=Tanh
  → Test MSE: 26614.2786
Testing: lr=0.001869, l1=90, layers=2, activation=Tanh
  → Test MSE: 26642.5664
Testing: lr=0.000604, l1=50, layers=2, activation=ReLU
  → Test MSE: 26662.8516
Testing: lr=0.003722, l1=82, layers=1, activation=Tanh
  → Test MSE: 26593.2461
Testing: lr=0.007645, l1=96, layers=3, activation=Sigmoid
  → Test MSE: 26647.5534
Testing: lr=0.000769, l1=40, layers=1, activation=Tanh
  → Test MSE: 26654.1263
Testing: lr=0.000235, l1=109, layers=4, activation=LeakyReLU
  → Test MSE: 26627.3477
Testing: lr=0.000359, l1=76, layers=3, activation=Sigmoid
  → Test MSE: 26798.4629
Testing: lr=0.004959, l1=50, layers=2, activation=Tanh
  → Test MSE: 26586.9772
Testing: lr=0.002494, l1=57, layers=3, activation=LeakyReLU
  → Test MSE: 26605.1699
Testing: lr=0.005807, l1=20, layers=0, activation=Sigmoid
  → Test MSE: 26687.4818
Testing: lr=0.002939, l1=19, layers=1, activation=Sigmoid
  → Test MSE: 26646.8073
Testing: lr=0.001263, l1=98, layers=4, activation=ReLU
  → Test MSE: 26600.8574
Testing: lr=0.001226, l1=105, layers=3, activation=Sigmoid
  → Test MSE: 26522.8008
Testing: lr=0.004431, l1=32, layers=1, activation=Sigmoid
  → Test MSE: 26511.9095
Testing: lr=0.001367, l1=58, layers=2, activation=Sigmoid
  → Test MSE: 26633.4505
Testing: lr=0.006778, l1=81, layers=4, activation=Tanh
  → Test MSE: 26587.1406
Testing: lr=0.002189, l1=112, layers=4, activation=Sigmoid
  → Test MSE: 26520.4076
Testing: lr=0.004558, l1=24, layers=2, activation=Tanh
  → Test MSE: 26623.3346
Testing: lr=0.000658, l1=125, layers=0, activation=Tanh
  → Test MSE: 26698.7467
Testing: lr=0.000272, l1=119, layers=2, activation=Tanh
  → Test MSE: 26638.9967
Testing: lr=0.000133, l1=64, layers=2, activation=Tanh
  → Test MSE: 26644.1510
Testing: lr=0.002172, l1=70, layers=2, activation=LeakyReLU
  → Test MSE: 26562.2077
Testing: lr=0.000223, l1=28, layers=3, activation=Sigmoid
  → Test MSE: 26629.1589
Testing: lr=0.006065, l1=53, layers=3, activation=Sigmoid
  → Test MSE: 26679.7598
Testing: lr=0.000352, l1=42, layers=4, activation=ReLU
  → Test MSE: 26609.8027
Testing: lr=0.003464, l1=124, layers=2, activation=Tanh
  → Test MSE: 26642.4284
Testing: lr=0.000219, l1=37, layers=3, activation=ReLU
  → Test MSE: 26614.7116
Testing: lr=0.002619, l1=70, layers=3, activation=ReLU
  → Test MSE: 26654.7917
Testing: lr=0.004686, l1=123, layers=2, activation=ReLU
  → Test MSE: 26578.9798
Testing: lr=0.000165, l1=44, layers=0, activation=ReLU
  → Test MSE: 26695.6628
Testing: lr=0.000917, l1=93, layers=1, activation=Sigmoid
  → Test MSE: 26729.6816
Testing: lr=0.001789, l1=117, layers=1, activation=Tanh
  → Test MSE: 26631.9160
Testing: lr=0.000134, l1=101, layers=4, activation=Sigmoid
  → Test MSE: 26622.4082
Testing: lr=0.004937, l1=108, layers=2, activation=ReLU
  → Test MSE: 26609.0150
Testing: lr=0.004047, l1=93, layers=1, activation=ReLU
  → Test MSE: 26549.9668
Testing: lr=0.000997, l1=112, layers=4, activation=Tanh
  → Test MSE: 26635.5697
Testing: lr=0.005449, l1=52, layers=4, activation=Sigmoid
  → Test MSE: 26562.9772

================================================================================
OPTIMIZATION RESULTS
================================================================================
Best learning rate: 0.004431
Best layer size (l1): 32
Best num hidden layers: 1
Best activation function: Sigmoid
Best test MSE: 26511.9095

================================================================================
TOP 5 CONFIGURATIONS
================================================================================
Rank   LR           L1     Layers   Activation   MSE       
--------------------------------------------------------------------------------
1      0.004431     32     1        Sigmoid      26511.9095
2      0.002189     112    4        Sigmoid      26520.4076
3      0.001226     105    3        Sigmoid      26522.8008
4      0.004047     93     1        ReLU         26549.9668
5      0.002172     70     2        LeakyReLU    26562.2077

================================================================================
TRAINING FINAL MODEL
================================================================================
Configuration: lr=0.004431, l1=32, layers=1, activation=Sigmoid

Training for 100 epochs...
Epoch 20/100: Train MSE = 29525.4520
Epoch 40/100: Train MSE = 28613.4528
Epoch 60/100: Train MSE = 30026.4541
Epoch 80/100: Train MSE = 31194.9251
Epoch 100/100: Train MSE = 28811.9515

Final Test MSE: 26571.4329
================================================================================

6.8 Best Practices

6.8.1 Do’s

Use descriptive string values

bounds=[("xavier_uniform", "kaiming_normal", "orthogonal")]

Explicitly specify var_type for clarity

var_type=["float", "int", "factor"]

Access results as strings

# Example: Accessing factor variable results as strings
# (This assumes you've run an optimization with activation as a factor variable)

# If you have a result from the previous examples:
# best_activation = result.x[3]  # For 4-parameter optimization
# Or for simpler cases:
# best_activation = result.x[0]  # For single-parameter optimization

# Example with inline optimization:
from spotoptim import SpotOptim
import numpy as np

def quick_test(X):
    results = []
    for params in X:
        activation = params[0]
        score = {"ReLU": 3500, "Tanh": 3600}.get(activation, 4000)
        results.append(score + np.random.normal(0, 50))
    return np.array(results)

opt = SpotOptim(
    fun=quick_test,
    bounds=[("ReLU", "Tanh")],
    var_type=["factor"],
    max_iter=10,
    seed=42
)
result = opt.optimize()

# Access as string - this is the correct way
best_activation = result.x[0]  # String value like "ReLU"
print(f"Best activation: {best_activation} (type: {type(best_activation).__name__})")

# You can use it directly in your model
# model = LinearRegressor(activation=best_activation)
Best activation: ReLU (type: str)

Mix factor variables with numeric/integer variables

bounds=[(-4, -2), (16, 128), ("ReLU", "Tanh")]
var_type=["float", "int", "factor"]

6.8.2 Don’ts

Don’t use integers in factor bounds

# Wrong: Use strings, not integers
bounds=[(0, 1, 2)]  # Wrong!
bounds=[("ReLU", "Sigmoid", "Tanh")]  # Correct!

Don’t expect integers in objective function

def objective(X):
    for params in X:
        activation = params[0]  # the factor variable arrives as a string
        # Don't do: if activation == 0:        # Wrong!
        # Do:       if activation == "ReLU":   # Correct!

Don’t manually convert factor variables

# SpotOptim handles conversion automatically
# Don't do manual mapping in your objective function

Don’t use empty tuples

# Wrong: Empty tuple
bounds=[()]

# Correct: At least one string
bounds=[("ReLU",)]  # Single choice (will be treated as fixed)

6.9 Troubleshooting

6.9.1 Common Issues

Issue: Objective function receives integers instead of strings

Solution: Ensure you’re using the latest version of SpotOptim with factor variable support. Factor variables are automatically converted before calling the objective function.
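
If you want to fail fast while debugging, a minimal guard at the top of your objective function makes the problem explicit. This is a plain-Python check, not a SpotOptim feature:

import numpy as np

def objective(X):
    results = []
    for params in X:
        activation = params[0]
        # Fail fast if the factor value was not converted to a string
        if not isinstance(activation, str):
            raise TypeError(
                f"Expected str for factor variable, got {type(activation).__name__}"
            )
        results.append({"ReLU": 3500.0, "Tanh": 3800.0}.get(activation, 5000.0))
    return np.array(results)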


Issue: ValueError: could not convert string to float

Solution: This usually indicates a version mismatch. Update SpotOptim so that the object-array conversion for factor variables is handled correctly.


Issue: Results show integers instead of strings

Solution: Check that you’re accessing result.x (mapped values) instead of internal arrays. The result object automatically maps factor variables to their original strings.
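
A quick way to verify is to print the types of the mapped values. The snippet below assumes result comes from the two-parameter run in Section 6.5.2 (layer size and activation):

# Factor entries in result.x should already be strings
for name, value in zip(["l1", "activation"], result.x):
    print(f"{name}: {value!r} ({type(value).__name__})")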


Issue: Single-level factor variables cause dimension reduction

Behavior: If a factor variable has only one choice, e.g., ("ReLU",), SpotOptim treats it as a fixed dimension and may reduce the dimensionality. This is expected behavior.

Solution: Use at least two choices for optimization, or remove single-choice dimensions from bounds.
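
If a choice really is fixed, a simple pattern is to hard-code it in the objective function and keep only genuinely variable dimensions in the bounds. The sketch below uses a synthetic score for illustration; a real objective would train a model with the fixed activation:

import numpy as np

FIXED_ACTIVATION = "ReLU"  # the single choice, kept out of the bounds entirely

def objective(X):
    results = []
    for params in X:
        l1 = int(params[0])  # only the genuinely variable dimension is optimized
        # Synthetic score; a real objective would build and train a model
        # with activation=FIXED_ACTIVATION here
        results.append(3000.0 + abs(l1 - 64) + np.random.normal(0, 10))
    return np.array(results)

# optimizer = SpotOptim(fun=objective, bounds=[(16, 128)], var_type=["int"], ...)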

6.10 Summary

Factor variables in SpotOptim enable:

  • Categorical optimization: Optimize over discrete string choices
  • Automatic conversion: Seamless integer ↔ string mapping
  • Neural network hyperparameters: Optimize activation functions, optimizers, etc.
  • Mixed variable types: Combine with continuous and integer variables
  • Clean interface: Objective functions work with strings directly
  • String results: Final results contain original string values

Factor variables make categorical hyperparameter optimization as easy as continuous optimization!

6.11 Jupyter Notebook

Note