import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from typing import Tuple
from spotoptim import SpotOptim
from spotoptim.nn.linear_regressor import LinearRegressor
21 Hyperparameter Tuning for Physics-Informed Neural Networks
Using SpotOptim to Optimize PINN Architecture and Training
22 Overview
This tutorial demonstrates how to use SpotOptim for hyperparameter optimization of Physics-Informed Neural Networks (PINNs). We’ll optimize the network architecture and training parameters to find the best configuration for solving an ordinary differential equation.
Building on the basic PINN demo, we’ll now systematically search for optimal:
- Number of neurons per hidden layer
- Number of hidden layers
- Activation function (categorical)
- Optimizer algorithm (categorical)
- Learning rate (log-scale)
- Physics loss weight (log-scale)
22.1 Key Features
22.1.1 1. PyTorch Dataset and DataLoader
Following PyTorch best practices from the official tutorial, we implement:
- Custom Dataset Classes: Separate classes for supervised data (PINNDataset) and collocation points (CollocationDataset)
- DataLoader Integration: Efficient batch processing with configurable batch size, shuffling, and parallel loading
- Proper Data Separation: Clean separation of training, validation, and collocation data
- Gradient Tracking: Automatic gradient handling for collocation points needed in physics loss
Benefits:
- Modularity: Clean separation between data and model code
- Efficiency: Batch processing and optional parallel data loading
- Scalability: Easy to extend to larger datasets
- Best Practices: Follows PyTorch conventions used across the ecosystem
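As a quick preview of this pattern, here is a generic sketch with toy tensors (illustrative only; the PINN-specific PINNDataset and CollocationDataset classes are defined in the Data Generation section below):
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Minimal Dataset: wraps paired tensors and exposes length and indexing."""
    def __init__(self, x: torch.Tensor, y: torch.Tensor):
        self.x, self.y = x, y
    def __len__(self) -> int:
        return len(self.x)
    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# DataLoader adds batching and optional shuffling on top of the Dataset
loader = DataLoader(ToyDataset(torch.randn(100, 1), torch.randn(100, 1)),
                    batch_size=16, shuffle=True)
for batch_x, batch_y in loader:
    pass  # a training step would consume the batch here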
22.1.2 2. Automatic Transformation Handling
This tutorial also showcases SpotOptim’s var_trans feature for automatic variable transformations. Learning rates and regularization parameters are often best explored on a log scale, but manually transforming values is tedious and error-prone. With var_trans, you simply specify:
var_trans = [None, None, "log10", "log10"]
SpotOptim then:
- Optimizes internally in log-transformed space (efficient exploration)
- Passes original-scale values to your objective function (no manual conversion needed)
- Displays all results in original scale (easy interpretation)
This eliminates the need for manual 10**x conversions throughout your code!
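As a minimal sketch of the difference (the train_with_lr helper below is hypothetical and stands in for any training routine; only the handling of bounds and the objective matters here):
import numpy as np

# Manual approach (what var_trans avoids): search in log10 space and
# convert back inside the objective.
# train_with_lr is a hypothetical stand-in for your training routine.
bounds_log = [(np.log10(1e-4), np.log10(1e-1))]      # log10(learning rate)
def objective_manual(X):
    lrs = 10 ** X[:, 0]                              # manual 10**x conversion
    return np.array([train_with_lr(lr) for lr in lrs])

# With var_trans: bounds and objective both stay in the original scale.
bounds = [(1e-4, 1e-1)]                              # learning rate
var_trans = ["log10"]                                # SpotOptim transforms internally
def objective(X):
    return np.array([train_with_lr(lr) for lr in X[:, 0]])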
23 The Problem
We’re solving the same ODE as in the basic PINN demo:
\[ \frac{dy}{dt} + 0.1 y - \sin\left(\frac{\pi t}{2}\right) = 0 \]
with initial condition \(y(0) = 0\).
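For reference (not used by the code below), this linear first-order ODE can be solved exactly with an integrating factor. With \(\alpha = 0.1\) and \(\omega = \pi/2\), the solution is
\[ y(t) = \frac{\alpha \sin(\omega t) - \omega \cos(\omega t) + \omega e^{-\alpha t}}{\alpha^2 + \omega^2}, \]
which satisfies \(y(0) = 0\) and can serve as an independent check on the numerical reference solution generated below.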
24 Setup
# Set random seed for reproducibility
torch.manual_seed(42)
np.random.seed(42)
Set number of epochs for training, maximum number of iterations for hyperparameter optimization, and initial design size. For this example we use a small number of epochs and iterations to keep the runtime short.
N_EPOCHS = 500
MAX_ITER = 15
N_INITIAL = 10
25 Data Generation
Following PyTorch best practices, we’ll create custom Dataset classes for our PINN data.
25.1 Custom Dataset Classes
We’ll create two dataset types:
- PINNDataset for supervised data (training and validation)
- CollocationDataset for physics-informed collocation points
from torch.utils.data import Dataset, DataLoader
def oscillator(
n_steps: int = 3000,
t_min: float = 0.0,
t_max: float = 30.0,
y0: float = 0.0,
alpha: float = 0.1,
omega: float = np.pi / 2
) -> Tuple[torch.Tensor, torch.Tensor]:
"""
Solve ODE: dy/dt + alpha*y - sin(omega*t) = 0
using RK2 (midpoint method).
Returns:
t_tensor: Time points, shape (n_steps, 1)
y_tensor: Solution values, shape (n_steps, 1)
"""
t_step = (t_max - t_min) / n_steps
t_points = np.arange(t_min, t_min + n_steps * t_step, t_step)[:n_steps]
y = [y0]
for t_current_step_end in t_points[1:]:
t_midpoint = t_current_step_end - t_step / 2.0
y_prev = y[-1]
slope_at_t_mid = -alpha * y_prev + np.sin(omega * t_midpoint)
y_intermediate = y_prev + (t_step / 2.0) * slope_at_t_mid
slope_at_t_end = -alpha * y_intermediate + np.sin(omega * t_current_step_end)
y_next = y_prev + t_step * slope_at_t_end
y.append(y_next)
t_tensor = torch.tensor(t_points, dtype=torch.float32).view(-1, 1)
y_tensor = torch.tensor(y, dtype=torch.float32).view(-1, 1)
return t_tensor, y_tensor
class PINNDataset(Dataset):
"""PyTorch Dataset for PINN supervised data (training/validation).
This dataset stores time-solution pairs (t, y) for supervised learning.
Args:
t (torch.Tensor): Time points, shape (n_samples, 1)
y (torch.Tensor): Solution values, shape (n_samples, 1)
"""
def __init__(self, t: torch.Tensor, y: torch.Tensor):
self.t = t
self.y = y
def __len__(self) -> int:
return len(self.t)
def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
return self.t[idx], self.y[idx]
class CollocationDataset(Dataset):
"""PyTorch Dataset for PINN collocation points.
This dataset stores time points where physics loss is evaluated.
Gradients are required for computing derivatives in the PDE.
Args:
t (torch.Tensor): Collocation time points, shape (n_points, 1)
"""
def __init__(self, t: torch.Tensor):
# Store collocation points with gradient tracking
self.t = t.requires_grad_(True)
def __len__(self) -> int:
return len(self.t)
def __getitem__(self, idx: int) -> torch.Tensor:
# Return single collocation point (still requires_grad)
return self.t[idx].unsqueeze(0)
Generate exact solution using Runge-Kutta 2 (RK2) method:
x_exact, y_exact = oscillator()
# Create training data (sparse sampling)
t_train = x_exact[0:3000:119]
y_train = y_exact[0:3000:119]
# Create validation data (different sampling for unbiased evaluation)
t_val = x_exact[50:3000:120]
y_val = y_exact[50:3000:120]
# Create collocation points for physics loss
t_physics = torch.linspace(0, 30, 50).view(-1, 1)
# Create Dataset objects
train_dataset = PINNDataset(t_train, y_train)
val_dataset = PINNDataset(t_val, y_val)
collocation_dataset = CollocationDataset(t_physics)
print(f"Training dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")
print(f"Collocation dataset size: {len(collocation_dataset)}")
print(f"\nSample from training dataset:")
t_sample, y_sample = train_dataset[0]
print(f" t: {t_sample.item():.4f}, y: {y_sample.item():.4f}")Training dataset size: 26
Validation dataset size: 25
Collocation dataset size: 50
Sample from training dataset:
t: 0.0000, y: 0.0000
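As an optional sanity check (a sketch, not part of the tutorial pipeline; it assumes SciPy is available, which is not otherwise required here), the hand-rolled RK2 solution can be compared against SciPy's adaptive integrator:
from scipy.integrate import solve_ivp

# Integrate dy/dt = -0.1*y + sin(pi*t/2), y(0) = 0, on the same time grid
sol = solve_ivp(
    lambda t, y: -0.1 * y + np.sin(np.pi * t / 2),
    t_span=(0.0, 30.0),
    y0=[0.0],
    t_eval=x_exact.squeeze().numpy(),
    rtol=1e-8,
    atol=1e-10,
)
print(f"Max |RK2 - solve_ivp|: {np.max(np.abs(y_exact.squeeze().numpy() - sol.y[0])):.2e}")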
26 Define the PINN Training Function
This function creates DataLoaders and trains a PINN with given hyperparameters. Following PyTorch best practices, we use DataLoader for efficient batch processing.
def train_pinn(
l1: int,
num_layers: int,
activation: str,
optimizer_name: str,
lr_unified: float,
alpha: float,
n_epochs: int = N_EPOCHS,
batch_size: int = 16,
verbose: bool = False
) -> float:
"""
Train a PINN with specified hyperparameters using DataLoaders.
Args:
l1: Number of neurons per hidden layer
num_layers: Number of hidden layers
activation: Activation function ("Tanh", "ReLU", "Sigmoid", "GELU")
optimizer_name: Optimizer algorithm ("Adam", "SGD", "RMSprop", "AdamW")
lr_unified: Unified learning rate multiplier
alpha: Weight for physics loss
n_epochs: Number of training epochs
batch_size: Batch size for DataLoader
verbose: Whether to print progress
Returns:
Validation mean squared error
"""
# Set seed for reproducibility
torch.manual_seed(42)
# Create DataLoaders
train_loader = DataLoader(
train_dataset,
batch_size=batch_size,
shuffle=True,
num_workers=0,
pin_memory=False
)
val_loader = DataLoader(
val_dataset,
batch_size=batch_size,
shuffle=False,
num_workers=0,
pin_memory=False
)
# For collocation points, we can use full batch since it's small
collocation_loader = DataLoader(
collocation_dataset,
batch_size=len(collocation_dataset),
shuffle=False,
num_workers=0
)
# Create model
model = LinearRegressor(
input_dim=1,
output_dim=1,
l1=l1,
num_hidden_layers=num_layers,
activation=activation,
lr=lr_unified
)
# Get optimizer
optimizer = model.get_optimizer(optimizer_name)
# Training loop
for epoch in range(n_epochs):
model.train()
epoch_loss = 0.0
# Get collocation points (full batch)
t_physics_batch = next(iter(collocation_loader))
# Ensure gradients are enabled
t_physics_batch = t_physics_batch.requires_grad_(True)
# Iterate over training batches
for batch_t, batch_y in train_loader:
optimizer.zero_grad()
# Data Loss
y_pred = model(batch_t)
loss_data = torch.mean((y_pred - batch_y)**2)
# Physics Loss (computed on full collocation set)
y_physics = model(t_physics_batch)
dy_dt = torch.autograd.grad(
y_physics,
t_physics_batch,
torch.ones_like(y_physics),
create_graph=True,
retain_graph=True
)[0]
# PDE residual: dy/dt + 0.1*y - sin(pi*t/2) = 0
physics_residual = dy_dt + 0.1 * y_physics - torch.sin(np.pi * t_physics_batch / 2)
loss_physics = torch.mean(physics_residual**2)
# Total Loss
loss = loss_data + alpha * loss_physics
loss.backward()
optimizer.step()
epoch_loss += loss.item()
if verbose and (epoch + 1) % 2000 == 0:
avg_loss = epoch_loss / len(train_loader)
print(f" Epoch {epoch+1}/{n_epochs}: Avg Loss = {avg_loss:.6f}")
# Evaluate on validation set
model.eval()
val_mse = 0.0
with torch.no_grad():
for batch_t, batch_y in val_loader:
y_pred = model(batch_t)
val_mse += torch.mean((batch_y - y_pred)**2).item()
val_mse /= len(val_loader)
return val_mse
# Test the function with default parameters
print("Testing PINN training function with DataLoaders...")
test_error = train_pinn(
l1=32,
num_layers=2,
activation="Tanh",
optimizer_name="Adam",
lr_unified=3.0,
alpha=0.06,
batch_size=16,
n_epochs=N_EPOCHS,
verbose=True
)
print(f"\nTest validation MSE: {test_error:.6f}")Testing PINN training function with DataLoaders...
Test validation MSE: 0.188752
27 Hyperparameter Optimization with SpotOptim
Now we’ll use SpotOptim to find the best hyperparameters:
27.1 Define the Objective Function
def objective_pinn(X):
"""
Objective function for SpotOptim.
Args:
X: Array of hyperparameter configurations, shape (n_configs, 6)
Each row: [l1, num_layers, activation, optimizer, lr_unified, alpha]
Note: SpotOptim handles log transformations and factor mapping automatically
Returns:
Array of validation errors
"""
results = []
for i, params in enumerate(X):
# Extract parameters (already in original scale thanks to var_trans)
# Factor variables (activation, optimizer) are returned as strings
l1 = int(params[0]) # Number of neurons
num_layers = int(params[1]) # Number of hidden layers
activation = params[2] # Activation function
optimizer_name = params[3] # Optimizer algorithm
lr_unified = params[4] # Learning rate
alpha = params[5] # Physics weight
print(f"\nConfiguration {i+1}/{len(X)}:")
print(f" l1={l1}, num_layers={num_layers}, activation={activation}, ")
print(f" optimizer={optimizer_name}, lr_unified={lr_unified:.4f}, alpha={alpha:.4f}")
# Train PINN with these hyperparameters
val_error = train_pinn(
l1=l1,
num_layers=num_layers,
activation=activation,
optimizer_name=optimizer_name,
lr_unified=lr_unified,
alpha=alpha,
n_epochs=N_EPOCHS,
verbose=False
)
print(f" Validation MSE: {val_error:.6f}")
results.append(val_error)
return np.array(results)
We test the objective function with two configurations:
X_test = np.array([
[32, 2, "Tanh", "Adam", 3.0, 0.06], # Baseline config
[64, 3, "ReLU", "AdamW", 2.0, 0.04] # Alternative config
], dtype=object)
test_results = objective_pinn(X_test)
print(f"\nTest results: {test_results}")
Configuration 1/2:
l1=32, num_layers=2, activation=Tanh,
optimizer=Adam, lr_unified=3.0000, alpha=0.0600
Validation MSE: 0.188752
Configuration 2/2:
l1=64, num_layers=3, activation=ReLU,
optimizer=AdamW, lr_unified=2.0000, alpha=0.0400
Validation MSE: 0.169017
Test results: [0.18875158 0.16901682]
27.2 Run the Optimization
Use tensorboard --logdir=runs from a shell in the current directory (where this notebook is located) to visualize the optimization process.
Setting the bounds for the search space:
# Define search space with var_trans for automatic log-scale handling
bounds = [
(16, 128), # l1: neurons per layer (16 to 128)
(1, 4), # num_layers: 1 to 4 hidden layers
("Tanh", "ReLU", "Sigmoid", "GELU"), # activation: activation function
("Adam", "SGD", "RMSprop", "AdamW"), # optimizer: optimizer algorithm
(0.1, 10.0), # lr_unified: learning rate (0.1 to 10)
(0.01, 1.0) # alpha: physics weight (0.01 to 1.0)
]
Specify the variable types and transformations. Use var_trans to handle log-scale transformations automatically; factor variables don’t need transformations (None):
var_type = ["int", "int", "factor", "factor", "float", "float"]
var_name = ["l1", "num_layers", "activation", "optimizer", "lr_unified", "alpha"]
var_trans = [None, None, None, None, "log10", "log10"]
# Create optimizer
optimizer = SpotOptim(
fun=objective_pinn,
bounds=bounds,
var_type=var_type,
var_name=var_name,
var_trans=var_trans, # Automatic log-scale handling!
max_iter=MAX_ITER,
n_initial=N_INITIAL,
seed=42,
verbose=True,
tensorboard_clean=True,
tensorboard_log=True
)
Factor variable at dimension 2:
Levels: ['Tanh', 'ReLU', 'Sigmoid', 'GELU']
Mapped to integers: 0 to 3
Factor variable at dimension 3:
Levels: ['Adam', 'SGD', 'RMSprop', 'AdamW']
Mapped to integers: 0 to 3
Removed old TensorBoard logs: runs/spotoptim_20251220_100904
Cleaned 1 old TensorBoard log directory
TensorBoard logging enabled: runs/spotoptim_20251220_114016
Display the search space configuration. The transform column shows the applied transformations: lr_unified and alpha use a log10 transformation internally, which enables efficient exploration of log-scale parameters. All values shown are in original scale (not transformed).
design_table = optimizer.print_design_table(tablefmt="github")
print(design_table)
| name | type | lower | upper | default | transform |
|------------|--------|---------|---------|-----------|-------------|
| l1 | int | 16.0 | 128.0 | 72 | - |
| num_layers | int | 1.0 | 4.0 | 2 | - |
| activation | factor | Tanh | GELU | Sigmoid | - |
| optimizer | factor | Adam | AdamW | RMSprop | - |
| lr_unified | float | 0.1 | 10.0 | 5.05 | log10 |
| alpha | float | 0.01 | 1.0 | 0.505 | log10 |
Run the optimization:
result = optimizer.optimize()
Configuration 1/10:
l1=19, num_layers=3, activation=ReLU,
optimizer=SGD, lr_unified=9.5756, alpha=0.1603
Validation MSE: 0.196427
Configuration 2/10:
l1=30, num_layers=3, activation=ReLU,
optimizer=Adam, lr_unified=0.8430, alpha=0.0412
Validation MSE: 0.190747
Configuration 3/10:
l1=110, num_layers=4, activation=ReLU,
optimizer=AdamW, lr_unified=1.9457, alpha=0.3866
Validation MSE: 0.172875
Configuration 4/10:
l1=74, num_layers=1, activation=Sigmoid,
optimizer=RMSprop, lr_unified=0.1014, alpha=0.1050
Validation MSE: 0.197574
Configuration 5/10:
l1=41, num_layers=3, activation=Sigmoid,
optimizer=RMSprop, lr_unified=0.5877, alpha=0.7301
Validation MSE: 0.196817
Configuration 6/10:
l1=52, num_layers=1, activation=Sigmoid,
optimizer=RMSprop, lr_unified=0.2023, alpha=0.0145
Validation MSE: 0.198142
Configuration 7/10:
l1=71, num_layers=2, activation=Sigmoid,
optimizer=SGD, lr_unified=3.2551, alpha=0.4300
Validation MSE: 0.204070
Configuration 8/10:
l1=120, num_layers=3, activation=ReLU,
optimizer=AdamW, lr_unified=0.3330, alpha=0.0220
Validation MSE: 0.185089
Configuration 9/10:
l1=98, num_layers=2, activation=Tanh,
optimizer=SGD, lr_unified=4.3915, alpha=0.0293
Validation MSE: 0.193439
Configuration 10/10:
l1=87, num_layers=2, activation=GELU,
optimizer=SGD, lr_unified=1.4861, alpha=0.0949
Validation MSE: nan
Warning: 1 initial design point(s) returned NaN/inf and will be ignored (reduced from 10 to 9 points)
Note: Initial design size (9) is smaller than requested (10) due to NaN/inf values
Initial best: f(x) = 0.172875
Configuration 1/1:
l1=110, num_layers=4, activation=ReLU,
optimizer=AdamW, lr_unified=1.8630, alpha=0.3873
Validation MSE: 0.167599
Iteration 1: New best f(x) = 0.167599
Configuration 1/1:
l1=52, num_layers=3, activation=ReLU,
optimizer=RMSprop, lr_unified=0.1454, alpha=0.0135
Validation MSE: 0.184578
Iteration 2: f(x) = 0.184578
Attempt 2/10: Previous point was duplicate after rounding, trying fallback
Acquisition failure: Using random space-filling design as fallback.
Configuration 1/1:
l1=115, num_layers=1, activation=Tanh,
optimizer=SGD, lr_unified=2.9394, alpha=0.0115
Validation MSE: nan
Warning: Found 1 NaN/inf value(s), replacing with adaptive penalty (max + 3*std = 0.2379)
Iteration 3: f(x) = 0.287547
Configuration 1/1:
l1=111, num_layers=3, activation=ReLU,
optimizer=RMSprop, lr_unified=0.6980, alpha=0.1358
Validation MSE: 0.202440
Iteration 4: f(x) = 0.202440
Configuration 1/1:
l1=53, num_layers=1, activation=GELU,
optimizer=RMSprop, lr_unified=0.4545, alpha=0.0140
Validation MSE: 0.809881
Iteration 5: f(x) = 0.809881
Configuration 1/1:
l1=110, num_layers=4, activation=ReLU,
optimizer=AdamW, lr_unified=0.9493, alpha=0.3979
Validation MSE: 0.168582
Iteration 6: f(x) = 0.168582
TensorBoard writer closed. View logs with: tensorboard --logdir=runs/spotoptim_20251220_114016
28 Results Analysis
28.1 Best Configuration
Display best hyperparameters using print_best() method. With var_trans, results are already in original scale!
optimizer.print_best(result)
Best Solution Found:
--------------------------------------------------
l1: 110
num_layers: 4
activation: ReLU
optimizer: AdamW
lr_unified: 1.8630
alpha: 0.3873
Objective Value: 0.1676
Total Evaluations: 15
Store values for later use in visualizations. Values are already in original scale thanks to var_trans. Factor variables are returned as strings.
best_l1 = int(result.x[0])
best_num_layers = int(result.x[1])
best_activation = result.x[2]
best_optimizer = result.x[3]
best_lr_unified = result.x[4]
best_alpha = result.x[5]
best_val_error = result.fun
print(f"Best activation: {best_activation}")
print(f"Best optimizer: {best_optimizer}")Best activation: ReLU
Best optimizer: AdamW
28.1.1 Results Table with Importance Scores
Display comprehensive results table with importance scores
table = optimizer.print_results_table(show_importance=True, tablefmt="github")
print(table)
| name | type | default | lower | upper | tuned | transform | importance | stars |
|------------|--------|-----------|---------|---------|---------------------|-------------|--------------|---------|
| l1 | int | 72 | 16.0 | 128.0 | 110 | - | 10.99 | . |
| num_layers | int | 2 | 1.0 | 4.0 | 4 | - | 28.95 | . |
| activation | factor | Sigmoid | Tanh | GELU | ReLU | - | 33.13 | . |
| optimizer | factor | RMSprop | Adam | AdamW | AdamW | - | 2.77 | |
| lr_unified | float | 5.05 | 0.1 | 10.0 | 1.8630186679138154 | log10 | 7.77 | |
| alpha | float | 0.505 | 0.01 | 1.0 | 0.38731466430172934 | log10 | 16.38 | . |
Interpretation: ***: >99%, **: >75%, *: >50%, .: >10%
28.2 Optimization History
optimizer.plot_progress(log_y=True, ylabel="Validation MSE")
28.3 Surrogate Visualization
Visualize the surrogate model’s learned response surface for the most important hyperparameter combinations:
# Plot top 3 most important hyperparameter combinations
optimizer.plot_important_hyperparameter_contour(max_imp=3)
Plotting surrogate contours for top 3 most important parameters:
activation: importance = 33.13% (type: factor)
num_layers: importance = 28.95% (type: int)
alpha: importance = 16.38% (type: float)
Generating 3 surrogate plots...
Plotting activation vs num_layers
Plotting activation vs alpha
Plotting num_layers vs alpha
28.4 Parameter Distribution Analysis
optimizer.plot_parameter_scatter(
result,
ylabel="Validation MSE",
cmap="plasma",
figsize=(14, 12)
)
29 Train Final Model with Best Hyperparameters
Now let’s train a final model with the optimized hyperparameters using DataLoaders:
print("Training final model with best hyperparameters using DataLoaders...")
print(f"Training for 30,000 epochs...")
# Set seed for reproducibility
torch.manual_seed(42)
# Create DataLoaders for final training
final_batch_size = 16
train_loader_final = DataLoader(
train_dataset,
batch_size=final_batch_size,
shuffle=True,
num_workers=0
)
collocation_loader_final = DataLoader(
collocation_dataset,
batch_size=len(collocation_dataset),
shuffle=False,
num_workers=0
)
# Create model with best hyperparameters
final_model = LinearRegressor(
input_dim=1,
output_dim=1,
l1=best_l1,
num_hidden_layers=best_num_layers,
activation=best_activation,
lr=best_lr_unified
)
optimizer_final = final_model.get_optimizer(best_optimizer)
# Training with history tracking
loss_history = []
n_epochs_final = 30000
for epoch in range(n_epochs_final):
final_model.train()
epoch_loss = 0.0
# Get collocation points
t_physics_batch = next(iter(collocation_loader_final))
t_physics_batch = t_physics_batch.requires_grad_(True)
# Iterate over training batches
for batch_t, batch_y in train_loader_final:
optimizer_final.zero_grad()
# Data Loss
y_pred = final_model(batch_t)
loss_data = torch.mean((y_pred - batch_y)**2)
# Physics Loss
y_physics = final_model(t_physics_batch)
dy_dt = torch.autograd.grad(
y_physics,
t_physics_batch,
torch.ones_like(y_physics),
create_graph=True,
retain_graph=True
)[0]
physics_residual = dy_dt + 0.1 * y_physics - torch.sin(np.pi * t_physics_batch / 2)
loss_physics = torch.mean(physics_residual**2)
# Total Loss
loss = loss_data + best_alpha * loss_physics
loss.backward()
optimizer_final.step()
epoch_loss += loss.item()
# Record average loss every 100 epochs
if (epoch + 1) % 100 == 0:
avg_loss = epoch_loss / len(train_loader_final)
loss_history.append(avg_loss)
if (epoch + 1) % 5000 == 0:
avg_loss = epoch_loss / len(train_loader_final)
print(f" Epoch {epoch+1}/{n_epochs_final}: Avg Loss = {avg_loss:.6f}")
print("Training completed!")
# Plot training history
plt.figure(figsize=(10, 5))
plt.plot(loss_history, linewidth=1.5)
plt.xlabel('Iteration (×100)', fontsize=11)
plt.ylabel('Average Total Loss per Batch', fontsize=11)
plt.title('Final Model Training History', fontsize=12)
plt.grid(True, alpha=0.3)
plt.yscale('log')
plt.tight_layout()
plt.show()
Training final model with best hyperparameters using DataLoaders...
Training for 30,000 epochs...
Epoch 5000/30000: Avg Loss = 0.159071
Epoch 10000/30000: Avg Loss = 0.053157
Epoch 15000/30000: Avg Loss = 0.031900
Epoch 20000/30000: Avg Loss = 0.082692
Epoch 25000/30000: Avg Loss = 0.030881
Epoch 30000/30000: Avg Loss = 0.042042
Training completed!
30 Evaluate Final Model
# Create validation DataLoader for evaluation
val_loader_final = DataLoader(
val_dataset,
batch_size=len(val_dataset),
shuffle=False,
num_workers=0
)
# Evaluate on validation set
final_model.eval()
with torch.no_grad():
# Validation MSE using DataLoader
val_mse_total = 0.0
for batch_t, batch_y in val_loader_final:
y_pred = final_model(batch_t)
val_mse_total += torch.mean((y_pred - batch_y)**2).item()
final_val_mse = val_mse_total / len(val_loader_final)
# Predict on full domain for visualization
y_pred_full = final_model(x_exact)
full_mse = torch.mean((y_pred_full - y_exact)**2).item()
# Compute maximum absolute error
max_error = torch.max(torch.abs(y_pred_full - y_exact)).item()
print("\nFinal Model Performance:")
print("-" * 50)
print(f" Validation MSE: {final_val_mse:.6f}")
print(f" Full domain MSE: {full_mse:.6f}")
print(f" Maximum absolute error: {max_error:.6f}")
Final Model Performance:
--------------------------------------------------
Validation MSE: 0.014159
Full domain MSE: 0.009609
Maximum absolute error: 0.213946
31 Visualize Final Solution
# Generate predictions
final_model.eval()
with torch.no_grad():
y_pred = final_model(x_exact)
fig, axes = plt.subplots(2, 1, figsize=(12, 10))
# Plot 1: Solution comparison
ax1 = axes[0]
ax1.plot(x_exact.numpy(), y_exact.numpy(), 'b-', linewidth=2.5,
label='Exact solution', alpha=0.8)
ax1.plot(x_exact.numpy(), y_pred.numpy(), 'r--', linewidth=2,
label='PINN prediction (optimized)', alpha=0.8)
# Plot training data from dataset
ax1.scatter(train_dataset.t.numpy(), train_dataset.y.numpy(),
color='tab:orange', s=80, label='Training data',
zorder=5, edgecolors='black', linewidth=0.5)
# Plot collocation points
t_collocation = collocation_dataset.t.detach()
ax1.scatter(t_collocation.numpy(),
final_model(t_collocation).detach().numpy(),
color='green', marker='x', s=50,
label='Collocation points', alpha=0.7, zorder=4)
ax1.set_xlabel('Time t', fontsize=12)
ax1.set_ylabel('Solution y(t)', fontsize=12)
ax1.set_title('Optimized PINN Solution vs Exact Solution', fontsize=13, fontweight='bold')
ax1.legend(fontsize=11, loc='best')
ax1.grid(True, alpha=0.3)
# Plot 2: Error
ax2 = axes[1]
error = torch.abs(y_pred - y_exact)
ax2.plot(x_exact.numpy(), error.numpy(), 'r-', linewidth=2)
ax2.axhline(y=max_error, color='gray', linestyle='--', linewidth=1,
label=f'Max error = {max_error:.6f}')
ax2.set_xlabel('Time t', fontsize=12)
ax2.set_ylabel('Absolute Error |y_exact - y_PINN|', fontsize=12)
ax2.set_title('Approximation Error (Optimized Model)', fontsize=13, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
32 Comparison with Baseline
Let’s compare the optimized configuration with a baseline:
print("Training baseline model for comparison...")
# Baseline configuration (from basic PINN demo)
baseline_config = {
'l1': 32,
'num_layers': 3,
'activation': 'Tanh',
'optimizer': 'Adam',
'lr_unified': 3.0,
'alpha': 0.06
}
print(f"\nBaseline Configuration:")
for key, val in baseline_config.items():
print(f" {key}: {val}")
# Train baseline
baseline_error = train_pinn(
l1=baseline_config['l1'],
num_layers=baseline_config['num_layers'],
activation=baseline_config['activation'],
optimizer_name=baseline_config['optimizer'],
lr_unified=baseline_config['lr_unified'],
alpha=baseline_config['alpha'],
n_epochs=N_EPOCHS,
verbose=False
)
print(f"\nValidation MSE Comparison:")
print("-" * 50)
print(f" Baseline: {baseline_error:.6f}")
print(f" Optimized: {best_val_error:.6f}")
print(f" Improvement: {(1 - best_val_error/baseline_error)*100:.1f}%")
# Bar plot comparison
fig, ax = plt.subplots(figsize=(8, 6))
configs = ['Baseline', 'Optimized']
errors = [baseline_error, best_val_error]
colors = ['tab:blue', 'tab:green']
bars = ax.bar(configs, errors, color=colors, alpha=0.7, edgecolor='black', linewidth=1.5)
# Add value labels on bars
for bar, error in zip(bars, errors):
height = bar.get_height()
ax.text(bar.get_x() + bar.get_width()/2., height,
f'{error:.6f}',
ha='center', va='bottom', fontsize=11, fontweight='bold')
ax.set_ylabel('Validation MSE', fontsize=12)
ax.set_title('Model Comparison: Baseline vs Optimized', fontsize=13, fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')
plt.tight_layout()
plt.show()
Training baseline model for comparison...
Baseline Configuration:
l1: 32
num_layers: 3
activation: Tanh
optimizer: Adam
lr_unified: 3.0
alpha: 0.06
Validation MSE Comparison:
--------------------------------------------------
Baseline: 0.185787
Optimized: 0.167599
Improvement: 9.8%
33 Hyperparameter Sensitivity Analysis
Let’s analyze how sensitive the model is to each hyperparameter using the enhanced plot_parameter_scatter() method with Spearman correlation:
# Use the enhanced plot_parameter_scatter method with correlation display
optimizer.plot_parameter_scatter(
result,
ylabel="Validation MSE",
cmap="plasma",
figsize=(15, 10),
show_correlation=True,
log_y=True
)
33.1 Sensitivity Analysis (Spearman Correlation)
# Use the new sensitivity_spearman() method for tabular output
optimizer.sensitivity_spearman()
Sensitivity Analysis (Spearman Correlation):
--------------------------------------------------
l1 : -0.102 (p=0.717)
num_layers : -0.835 (p=0.000) ***
activation : (categorical variable, use visual inspection)
optimizer : (categorical variable, use visual inspection)
lr_unified : -0.043 (p=0.879)
alpha : -0.300 (p=0.277)
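If you want to reproduce these numbers by hand, the evaluation history can be fed to scipy.stats.spearmanr directly. The sketch below assumes the evaluated design points are exposed as optimizer.X_ alongside optimizer.y_ (y_ is used in the summary below; the X_ name is an assumption and may differ in your SpotOptim version):
from scipy.stats import spearmanr

# Assumed attributes: optimizer.X_ (evaluated configurations) and optimizer.y_ (objective values)
X_hist = np.asarray(optimizer.X_, dtype=object)
y_hist = np.asarray(optimizer.y_, dtype=float)

for j, name in enumerate(var_name):
    if var_type[j] == "factor":
        print(f"{name:12s}: categorical variable, inspect visually")
        continue
    rho, p = spearmanr(X_hist[:, j].astype(float), y_hist)
    print(f"{name:12s}: rho = {rho:+.3f} (p = {p:.3f})")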
34 Summary
34.1 Key Findings
# Get optimization history for statistics
history = optimizer.y_
print("\n" + "="*70)
print("HYPERPARAMETER OPTIMIZATION SUMMARY")
print("="*70)
print("\n1. BEST CONFIGURATION FOUND:")
print(f" - Neurons per layer (l1): {best_l1}")
print(f" - Number of hidden layers: {best_num_layers}")
print(f" - Activation function: {best_activation}")
print(f" - Optimizer: {best_optimizer}")
print(f" - Learning rate: {best_lr_unified:.4f}")
print(f" - Physics weight (alpha): {best_alpha:.4f}")
print("\n2. PERFORMANCE:")
print(f" - Validation MSE: {best_val_error:.6f}")
print(f" - Full domain MSE: {full_mse:.6f}")
print(f" - Maximum absolute error: {max_error:.6f}")
print("\n3. OPTIMIZATION STATISTICS:")
print(f" - Total evaluations: {result.nfev}")
print(f" - Initial best: {history[0]:.6f}")
print(f" - Final best: {best_val_error:.6f}")
print(f" - Improvement: {(1 - best_val_error/history[0])*100:.1f}%")
print("\n4. COMPARISON TO BASELINE:")
print(f" - Baseline MSE: {baseline_error:.6f}")
print(f" - Optimized MSE: {best_val_error:.6f}")
print(f" - Improvement: {(1 - best_val_error/baseline_error)*100:.1f}%")
print("\n" + "="*70)
======================================================================
HYPERPARAMETER OPTIMIZATION SUMMARY
======================================================================
1. BEST CONFIGURATION FOUND:
- Neurons per layer (l1): 110
- Number of hidden layers: 4
- Activation function: ReLU
- Optimizer: AdamW
- Learning rate: 1.8630
- Physics weight (alpha): 0.3873
2. PERFORMANCE:
- Validation MSE: 0.167599
- Full domain MSE: 0.009609
- Maximum absolute error: 0.213946
3. OPTIMIZATION STATISTICS:
- Total evaluations: 15
- First evaluation: 0.196427
- Final best: 0.167599
- Improvement: 14.7%
4. COMPARISON TO BASELINE:
- Baseline MSE: 0.185787
- Optimized MSE: 0.167599
- Improvement: 9.8%
======================================================================
34.2 Recommendations
Based on the hyperparameter optimization results:
Network Architecture:
- The optimal architecture was found with {best_l1} neurons and {best_num_layers} hidden layers
- Best activation function: {best_activation}
- This balances model capacity with training efficiency
Optimizer Selection:
- Best optimizer: {best_optimizer}
- Different optimizers have different convergence characteristics for PINNs
Learning Rate:
- Optimal unified learning rate: {best_lr_unified:.4f}
- This translates to an actual Adam learning rate of {best_lr_unified * 0.001:.6f} (spelled out in the snippet after this list)
Physics Loss Weight:
- Optimal alpha: {best_alpha:.4f}
- This balances data fitting with physics constraint satisfaction
Training Strategy:
- Start with a broad search space to explore different architectures
- Use var_trans with "log10" for learning rate and physics weight parameters
- This enables efficient exploration of log-scale parameters without manual transformations
- Validate on held-out data to prevent overfitting to training points
Benefits of var_trans and Factor Variables:
- Factor variables: Categorical choices (activation, optimizer) handled automatically
- SpotOptim maps strings to integers internally and back to strings in results
- Cleaner code: No manual 10**x conversions in the objective function
- Fewer errors: Eliminates confusion about which scale values are in
- Better optimization: Searches efficiently in transformed space
- Easier interpretation: All results displayed in original scale
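To spell out the learning-rate conversion mentioned above: lr_unified acts as a multiplier on an assumed base Adam learning rate of 0.001 (the 0.001 factor used in the recommendation), so the tuned value translates as follows:
# Effective Adam learning rate implied by the tuned unified rate
# (assumes the 0.001 base rate used in the recommendation above).
actual_lr = best_lr_unified * 0.001
print(f"Unified lr {best_lr_unified:.4f} -> effective Adam lr {actual_lr:.6f}")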
34.3 Using These Results
To use the optimized configuration in your own PINN problems:
# Create optimized PINN
model = LinearRegressor(
input_dim=1,
output_dim=1,
l1={best_l1},
num_hidden_layers={best_num_layers},
activation="{best_activation}",
lr={best_lr_unified:.4f}
)
optimizer = model.get_optimizer("{best_optimizer}")
# Use alpha={best_alpha:.4f} for physics loss weight
loss = data_loss + {best_alpha:.4f} * physics_loss
34.4 Using var_trans for Your Hyperparameter Optimization
When setting up optimization for your own PINN problems:
from spotoptim import SpotOptim
# Define search space with factor variables and log-scale parameters
bounds = [
(16, 128), # neurons (integer)
(1, 4), # layers (integer)
("Tanh", "ReLU", "Sigmoid", "GELU"), # activation (factor)
("Adam", "SGD", "RMSprop", "AdamW"), # optimizer (factor)
(0.1, 10.0), # learning rate (log-scale)
(0.01, 1.0) # physics weight (log-scale)
]
var_type = ["int", "int", "factor", "factor", "float", "float"]
var_trans = [None, None, None, None, "log10", "log10"]
opt = SpotOptim(
fun=your_objective_function,
bounds=bounds,
var_type=var_type,
var_trans=var_trans, # Automatic log-scale and factor handling!
max_iter=MAX_ITER,
n_initial=N_INITIAL
)
result = opt.optimize()
Your objective function receives parameters in original scale - no manual transformations needed!
34.5 Future Directions
Consider exploring:
- Adaptive physics weights that change during training
- Architecture search including skip connections or residual blocks
- Batch size optimization as an additional hyperparameter
- Multi-objective optimization balancing accuracy and computational cost
- Transfer learning from pre-optimized configurations
- Learning rate schedules with different decay strategies
Note: The specific optimal values depend on the problem, data distribution, and computational budget. Always validate results on held-out test data.
34.6 Jupyter Notebook
- The Jupyter-Notebook of this chapter is available on GitHub in the Sequential Parameter Optimization Cookbook Repository