18  Kriging Surrogate Models in SpotOptim

18.1 Introduction

SpotOptim provides a powerful Kriging (Gaussian Process) surrogate model that can significantly enhance optimization performance. This tutorial introduces the Kriging class, explains its theory and parameters, and demonstrates how to use it effectively with SpotOptim.

18.1.1 What is Kriging?

Kriging, also known as Gaussian Process regression, is a sophisticated interpolation method that:

  • Predicts function values at untested points based on observed data
  • Provides uncertainty estimates indicating confidence in predictions
  • Models smooth functions common in engineering and scientific optimization
  • Handles noisy observations through regression approaches

The mathematical foundation of Kriging is based on the assumption that the objective function \(f(\mathbf{x})\) can be modeled as:

\[ f(\mathbf{x}) = \mu + Z(\mathbf{x}) \]

where \(\mu\) is a constant mean and \(Z(\mathbf{x})\) is a Gaussian process with correlation structure. The correlation between two points \(\mathbf{x}_i\) and \(\mathbf{x}_j\) is typically modeled using a Gaussian (RBF) correlation function:

\[ R(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\sum_{k=1}^{d} \theta_k |x_{i,k} - x_{j,k}|^2\right) \]

where \(\theta_k > 0\) are the correlation length parameters that control smoothness in each dimension.
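
To make this concrete, here is a minimal NumPy sketch (illustrative values only, not SpotOptim’s internal implementation) that evaluates the correlation matrix for a small 2D sample:

import numpy as np

# Pairwise Gaussian correlations R_ij = exp(-sum_k theta_k |x_ik - x_jk|^2)
X = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 0.0]])
theta = np.array([1.0, 2.0])  # illustrative correlation parameters

diff = X[:, None, :] - X[None, :, :]           # pairwise differences, shape (3, 3, 2)
R = np.exp(-np.sum(theta * diff**2, axis=-1))  # correlation matrix, shape (3, 3)
print(np.round(R, 4))

Nearby points receive correlations close to 1 and distant points close to 0; the weights \(\theta_k\) control how quickly the correlation decays in each dimension.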

18.1.2 Why Use Kriging in SpotOptim?

  1. Better Surrogate Models: More accurate predictions than simple interpolation
  2. Uncertainty Quantification: Know where the model is uncertain
  3. Mixed Variable Types: Handle continuous, integer, and categorical variables
  4. Multiple Methods: Choose between interpolation, regression, and reinterpolation
  5. Customizable: Control regularization, correlation parameters, and more

18.2 Basic Usage

18.2.1 Creating a Simple Kriging Model

Let’s start with the most basic usage, creating a Kriging model with default settings:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

# Simple objective function
def sphere(X):
    """Sphere function: f(x) = sum(x^2)"""
    return np.sum(X**2, axis=1)

# Create Kriging surrogate with defaults
kriging = Kriging(seed=42)

# Use with SpotOptim
optimizer = SpotOptim(
    fun=sphere,
    bounds=[(-5, 5), (-5, 5)],
    surrogate=kriging,  # Use Kriging instead of default GP
    max_iter=20,
    n_initial=10,
    seed=42,
    verbose=True
)

result = optimizer.optimize()
print(f"\nBest solution: {result.x}")
print(f"Best value: {result.fun:.6f}")
TensorBoard logging disabled
Initial best: f(x) = 2.420807
Iter 1 | Best: 0.001883 | Rate: 1.00 | Evals: 55.0%
Iter 2 | Best: 0.000325 | Rate: 1.00 | Evals: 60.0%
Iter 3 | Best: 0.000260 | Rate: 1.00 | Evals: 65.0%
Iter 4 | Best: 0.000260 | Curr: 0.003399 | Rate: 0.75 | Evals: 70.0%
Iter 5 | Best: 0.000260 | Curr: 0.001198 | Rate: 0.60 | Evals: 75.0%
Iter 6 | Best: 0.000260 | Curr: 0.001311 | Rate: 0.50 | Evals: 80.0%
Iter 7 | Best: 0.000260 | Curr: 0.001334 | Rate: 0.43 | Evals: 85.0%
Iter 8 | Best: 0.000260 | Curr: 0.001252 | Rate: 0.38 | Evals: 90.0%
Iter 9 | Best: 0.000002 | Rate: 0.44 | Evals: 95.0%
Iter 10 | Best: 0.000000 | Rate: 0.50 | Evals: 100.0%

Best solution: [-0.0003981   0.00049417]
Best value: 0.000000

18.2.2 Default vs Custom Surrogate

SpotOptim uses a Gaussian Process Regressor by default. Here’s how Kriging compares:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def rosenbrock(X):
    """Rosenbrock function: (1-x)^2 + 100(y-x^2)^2"""
    x = X[:, 0]
    y = X[:, 1]
    return (1 - x)**2 + 100 * (y - x**2)**2

# With default Gaussian Process
opt_default = SpotOptim(
    fun=rosenbrock,
    bounds=[(-2, 2), (-2, 2)],
    max_iter=30,
    n_initial=15,
    seed=42,
    verbose=False
)
result_default = opt_default.optimize()

# With Kriging surrogate
kriging = Kriging(method='regression', seed=42)
opt_kriging = SpotOptim(
    fun=rosenbrock,
    bounds=[(-2, 2), (-2, 2)],
    surrogate=kriging,
    max_iter=30,
    n_initial=15,
    seed=42,
    verbose=False
)
result_kriging = opt_kriging.optimize()

print("Comparison:")
print(f"Default GP:  f(x) = {result_default.fun:.6f}")
print(f"Kriging:     f(x) = {result_kriging.fun:.6f}")
Comparison:
Default GP:  f(x) = 0.219777
Kriging:     f(x) = 0.004172

Both approaches work well, but Kriging offers more control over the surrogate behavior.

18.3 Understanding Kriging Parameters

18.3.1 Method: Interpolation vs Regression

The method parameter controls how Kriging handles data:

  • “interpolation”: Exact interpolation, passes through all training points
  • “regression”: Allows smoothing, better for noisy data
  • “reinterpolation”: Hybrid approach

import numpy as np
from spotoptim.surrogate import Kriging
import matplotlib.pyplot as plt

# Create noisy training data
np.random.seed(42)
X_train = np.linspace(0, 2*np.pi, 10).reshape(-1, 1)
y_train = np.sin(X_train.ravel()) + 0.1 * np.random.randn(10)

# Test data for smooth predictions
X_test = np.linspace(0, 2*np.pi, 100).reshape(-1, 1)
y_true = np.sin(X_test.ravel())

# Compare methods
methods = ['interpolation', 'regression', 'reinterpolation']
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for ax, method in zip(axes, methods):
    model = Kriging(method=method, seed=42, model_fun_evals=50)
    model.fit(X_train, y_train)
    y_pred, y_std = model.predict(X_test, return_std=True)
    
    # Plot
    ax.plot(X_test, y_true, 'k--', label='True function', linewidth=2)
    ax.plot(X_test, y_pred, 'b-', label='Prediction', linewidth=2)
    ax.fill_between(X_test.ravel(), 
                     y_pred - 2*y_std, 
                     y_pred + 2*y_std, 
                     alpha=0.3, label='95% CI')
    ax.scatter(X_train, y_train, c='r', s=50, zorder=5, label='Training data')
    ax.set_title(f'Method: {method}')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Key differences:")
print("- Interpolation: Passes exactly through training points")
print("- Regression: Smooths over noisy data (recommended)")
print("- Reinterpolation: Balances between the two")

Key differences:
- Interpolation: Passes exactly through training points
- Regression: Smooths over noisy data (recommended)
- Reinterpolation: Balances between the two

Recommendation: Use method='regression' for most optimization problems, especially with noisy objectives.

18.3.2 Noise Parameter and Regularization

The noise parameter adds a small nugget effect for numerical stability:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def noisy_ackley(X):
    """Ackley function with observation noise"""
    a = 20
    b = 0.2
    c = 2 * np.pi
    n = X.shape[1]
    
    sum_sq = np.sum(X**2, axis=1)
    sum_cos = np.sum(np.cos(c * X), axis=1)
    
    base = -a * np.exp(-b * np.sqrt(sum_sq / n)) - np.exp(sum_cos / n) + a + np.e
    noise = np.random.normal(0, 0.5, size=base.shape)  # Add noise
    return base + noise

# Very small noise (near interpolation)
kriging_small = Kriging(noise=1e-10, method='interpolation', seed=42)
opt_small = SpotOptim(
    fun=noisy_ackley,
    bounds=[(-5, 5), (-5, 5)],
    surrogate=kriging_small,
    max_iter=25,
    n_initial=12,
    seed=42,
    verbose=False
)
result_small = opt_small.optimize()

# Larger noise (more robust to noise)
kriging_large = Kriging(noise=0.01, method='interpolation', seed=42)
opt_large = SpotOptim(
    fun=noisy_ackley,
    bounds=[(-5, 5), (-5, 5)],
    surrogate=kriging_large,
    max_iter=25,
    n_initial=12,
    seed=42,
    verbose=False
)
result_large = opt_large.optimize()

print("Impact of noise parameter:")
print(f"Small noise (1e-10): f(x) = {result_small.fun:.6f}")
print(f"Large noise (0.01):  f(x) = {result_large.fun:.6f}")
print("\nNote: For noisy functions, use 'regression' method instead,")
print("      which automatically optimizes the Lambda (nugget) parameter.")
Impact of noise parameter:
Small noise (1e-10): f(x) = 0.361890
Large noise (0.01):  f(x) = 0.090893

Note: For noisy functions, use 'regression' method instead,
      which automatically optimizes the Lambda (nugget) parameter.

Best Practice: For noisy functions, use method='regression' which automatically optimizes the regularization parameter instead of fixing it.
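
To see this in action, here is a small sketch: fit a regression-mode model on noisy samples and read off the optimized nugget. It assumes the Lambda_ attribute shown later in Section 18.7.5, which appears to report the nugget on a log₁₀ scale, consistent with the min_Lambda/max_Lambda bounds.

import numpy as np
from spotoptim.surrogate import Kriging

# Noisy samples of the sphere function (illustrative data)
rng = np.random.default_rng(42)
X = rng.uniform(-5, 5, size=(20, 2))
y = np.sum(X**2, axis=1) + rng.normal(0, 0.5, size=20)

# 'regression' optimizes the nugget (Lambda) jointly with theta
model = Kriging(method='regression', seed=42)
model.fit(X, y)
print(f"Optimized Lambda (log10): {model.Lambda_:.4f}")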

18.3.3 Correlation Length Parameters (Theta)

The theta parameters control smoothness. SpotOptim optimizes these automatically:

  • min_theta: Minimum log₁₀(θ) value (default: -3.0 → θ = 0.001)
  • max_theta: Maximum log₁₀(θ) value (default: 2.0 → θ = 100)

Smaller θ → smoother function, larger θ → more wiggly function.
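
The effect is easy to see numerically. This short sketch (illustrative values) evaluates the 1D correlation \(R(h) = \exp(-\theta h^2)\) at increasing distances for a small and a large θ:

import numpy as np

# Correlation at increasing distances h for two theta values
h = np.linspace(0.0, 2.0, 5)
for theta in (0.1, 10.0):
    R = np.exp(-theta * h**2)
    print(f"theta={theta:5.1f} | R: {np.round(R, 3)}")

With θ = 0.1, points two units apart remain strongly correlated (a smooth model); with θ = 10, the correlation collapses almost immediately (a wiggly model).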

import numpy as np
from spotoptim.surrogate import Kriging

# Sample data
X_train = np.array([[0.0], [0.5], [1.0], [1.5], [2.0]])
y_train = np.sin(X_train.ravel())

# Tight bounds (smoother)
model_smooth = Kriging(min_theta=-1.0, max_theta=0.0, seed=42, model_fun_evals=30)
model_smooth.fit(X_train, y_train)

# Wide bounds (more flexible)
model_flexible = Kriging(min_theta=-3.0, max_theta=2.0, seed=42, model_fun_evals=30)
model_flexible.fit(X_train, y_train)

print("Theta bounds and optimal values:")
print(f"Smooth model:   bounds=[-1.0, 0.0], theta={model_smooth.theta_}")
print(f"Flexible model: bounds=[-3.0, 2.0], theta={model_flexible.theta_}")
print("\nThe optimizer automatically finds the best theta within the bounds.")
Theta bounds and optimal values:
Smooth model:   bounds=[-1.0, 0.0], theta=[-0.95216449]
Flexible model: bounds=[-3.0, 2.0], theta=[-0.95309049]

The optimizer automatically finds the best theta within the bounds.

Default Values: The defaults min_theta=-3.0 and max_theta=2.0 work well for most problems.

18.3.4 Isotropic vs Anisotropic Correlation

  • Isotropic (isotropic=True): Single θ for all dimensions (faster, fewer parameters)
  • Anisotropic (isotropic=False): Different θ per dimension (more flexible, default)

import numpy as np
from spotoptim.surrogate import Kriging

# 3D problem
np.random.seed(42)
X_train = np.random.rand(15, 3) * 10 - 5
y_train = X_train[:, 0]**2 + 0.1*X_train[:, 1]**2 + 2*X_train[:, 2]**2

# Isotropic: one theta for all dimensions
model_iso = Kriging(isotropic=True, seed=42, model_fun_evals=40)
model_iso.fit(X_train, y_train)

# Anisotropic: different theta per dimension
model_aniso = Kriging(isotropic=False, seed=42, model_fun_evals=40)
model_aniso.fit(X_train, y_train)

print("Correlation structure:")
print(f"Isotropic:   n_theta={model_iso.n_theta}, theta={model_iso.theta_}")
print(f"Anisotropic: n_theta={model_aniso.n_theta}, theta={model_aniso.theta_}")
print("\nAnisotropic can capture different smoothness in each dimension.")

# Test predictions
X_test = np.array([[0.0, 0.0, 0.0]])
y_iso = model_iso.predict(X_test)
y_aniso = model_aniso.predict(X_test)
print(f"\nPrediction at origin:")
print(f"Isotropic:   {y_iso[0]:.4f}")
print(f"Anisotropic: {y_aniso[0]:.4f}")
Correlation structure:
Isotropic:   n_theta=1, theta=[-3.]
Anisotropic: n_theta=3, theta=[-2.23292358 -3.         -1.97228776]

Anisotropic can capture different smoothness in each dimension.

Prediction at origin:
Isotropic:   -0.0666
Anisotropic: -1.0267

When to Use:

  • Use isotropic for faster fitting when dimensions have similar characteristics
  • Use anisotropic (default) when dimensions behave differently

18.4 Advanced Features

18.4.1 Mixed Variable Types

Kriging supports mixed variable types: continuous (float), integer (int), and categorical (factor):

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def mixed_objective(X):
    """
    Optimize a system with:
    - x0: continuous learning rate (0.001 to 0.1)
    - x1: integer number of layers (1 to 5)
    - x2: categorical activation (0=ReLU, 1=Tanh, 2=Sigmoid)
    """
    X = np.atleast_2d(X)
    learning_rate = X[:, 0]
    n_layers = X[:, 1]
    activation = X[:, 2]
    
    # Simplified model: prefer lr~0.01, 3 layers, ReLU
    loss = (learning_rate - 0.01)**2 + (n_layers - 3)**2
    loss += np.where(activation == 0, 0.0,  # ReLU is best
                     np.where(activation == 1, 0.5,  # Tanh is ok
                              1.0))  # Sigmoid is worst
    return loss

# Define bounds and types
bounds = [
    (0.001, 0.1),  # learning_rate (float)
    (1, 5),        # n_layers (int)
    (0, 2)         # activation (factor: 0, 1, or 2)
]

var_type = ['float', 'int', 'factor']

# Create Kriging with mixed types
kriging_mixed = Kriging(
    method='regression',
    var_type=var_type,
    seed=42,
    model_fun_evals=50
)

# Optimize
optimizer = SpotOptim(
    fun=mixed_objective,
    bounds=bounds,
    var_type=var_type,
    surrogate=kriging_mixed,
    max_iter=30,
    n_initial=15,
    seed=42,
    verbose=True
)

result = optimizer.optimize()

print(f"\nOptimal configuration:")
print(f"Learning rate: {result.x[0]:.4f}")
print(f"Num layers:    {int(result.x[1])}")
print(f"Activation:    {int(result.x[2])} (0=ReLU, 1=Tanh, 2=Sigmoid)")
print(f"Loss:          {result.fun:.6f}")
TensorBoard logging disabled
Initial best: f(x) = 0.000173
Iter 1 | Best: 0.000081 | Rate: 1.00 | Evals: 53.3%
Optimizer candidate 1/3 was duplicate/invalid.
Iter 2 | Best: 0.000038 | Rate: 1.00 | Evals: 56.7%
Optimizer candidate 1/3 was duplicate/invalid.
Iter 3 | Best: 0.000008 | Rate: 1.00 | Evals: 60.0%
Optimizer candidate 1/3 was duplicate/invalid.
Iter 4 | Best: 0.000008 | Curr: 0.000079 | Rate: 0.75 | Evals: 63.3%
Optimizer candidate 1/3 was duplicate/invalid.
Iter 5 | Best: 0.000002 | Rate: 0.80 | Evals: 66.7%
Optimizer candidate 1/3 was duplicate/invalid.
Iter 6 | Best: 0.000001 | Rate: 0.83 | Evals: 70.0%
Iter 7 | Best: 0.000001 | Curr: 0.000001 | Rate: 0.71 | Evals: 73.3%
Iter 8 | Best: 0.000001 | Curr: 0.000004 | Rate: 0.62 | Evals: 76.7%
Iter 9 | Best: 0.000001 | Curr: 0.500006 | Rate: 0.56 | Evals: 80.0%
Iter 10 | Best: 0.000001 | Curr: 0.500001 | Rate: 0.50 | Evals: 83.3%
Iter 11 | Best: 0.000001 | Curr: 0.000001 | Rate: 0.45 | Evals: 86.7%
Iter 12 | Best: 0.000001 | Curr: 0.000001 | Rate: 0.42 | Evals: 90.0%
Iter 13 | Best: 0.000001 | Curr: 0.500001 | Rate: 0.38 | Evals: 93.3%
Iter 14 | Best: 0.000001 | Curr: 0.500001 | Rate: 0.36 | Evals: 96.7%
Iter 15 | Best: 0.000001 | Curr: 0.500001 | Rate: 0.33 | Evals: 100.0%

Optimal configuration:
Learning rate: 0.0092
Num layers:    3
Activation:    0 (0=ReLU, 1=Tanh, 2=Sigmoid)
Loss:          0.000001

Key Points:

  • Set var_type in both Kriging and SpotOptim
  • Factor variables use specialized distance metrics (default: Canberra); see the sketch after this list
  • Integer variables are treated as ordered but discrete
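
As a quick illustration of what these metrics compute (using SciPy’s reference implementations; the factor encodings below are made up), compare Canberra and Hamming on two points that differ in one of two factors:

from scipy.spatial.distance import canberra, hamming

# Canberra: sum_k |u_k - v_k| / (|u_k| + |v_k|)
# Hamming:  fraction of entries that differ
u, v = [0, 2], [1, 2]
print(f"canberra = {canberra(u, v):.3f}")  # |0-1|/(0+1) + |2-2|/(2+2) = 1.000
print(f"hamming  = {hamming(u, v):.3f}")   # 1 of 2 entries differ -> 0.500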

18.4.2 Customizing the Distance Metric for Factors

For categorical variables, you can choose different distance metrics.

import numpy as np
from spotoptim.surrogate import Kriging

# Categorical data: 2 factor variables
X_train = np.array([
    [0, 0],  # Category A, Color Red
    [1, 0],  # Category B, Color Red
    [0, 1],  # Category A, Color Blue
    [1, 1],  # Category B, Color Blue
    [2, 2],  # Category C, Color Green
])
y_train = np.array([1.0, 1.5, 1.2, 2.0, 3.5])

# Different distance metrics
metrics = ['canberra', 'hamming']

for metric in metrics:
    model = Kriging(
        method='regression',
        var_type=['factor', 'factor'],
        metric_factorial=metric,
        seed=42,
        model_fun_evals=40
    )
    model.fit(X_train, y_train)
    
    # Test prediction
    X_test = np.array([[1, 1]])
    y_pred = model.predict(X_test)
    
    print(f"Metric: {metric:10s} | Prediction at [1,1]: {y_pred[0]:.4f}")

print("\nAvailable metrics: 'canberra' (default), 'hamming', 'jaccard', etc.")
Metric: canberra   | Prediction at [1,1]: 1.9463
Metric: hamming    | Prediction at [1,1]: 2.0000

Available metrics: 'canberra' (default), 'hamming', 'jaccard', etc.

18.4.3 Handling High-Dimensional Problems

For high-dimensional problems, Kriging can become computationally expensive. Here are strategies:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def high_dim_sphere(X):
    """10-dimensional sphere function"""
    return np.sum(X**2, axis=1)

# Strategy 1: Use isotropic correlation (fewer parameters)
kriging_iso = Kriging(
    method='regression',
    isotropic=True,  # Single theta for all 10 dimensions
    seed=42,
    model_fun_evals=50  # Faster optimization
)

optimizer_iso = SpotOptim(
    fun=high_dim_sphere,
    bounds=[(-5, 5)] * 10,  # 10 dimensions
    surrogate=kriging_iso,
    max_iter=50,
    n_initial=30,
    seed=42,
    verbose=False
)

result_iso = optimizer_iso.optimize()

# Strategy 2: Use max_surrogate_points to limit training set size
kriging_limited = Kriging(method='regression', seed=42)

optimizer_limited = SpotOptim(
    fun=high_dim_sphere,
    bounds=[(-5, 5)] * 10,
    surrogate=kriging_limited,
    max_iter=50,
    n_initial=30,
    max_surrogate_points=100,  # Limit surrogate training set
    seed=42,
    verbose=False
)

result_limited = optimizer_limited.optimize()

print("High-dimensional optimization strategies:")
print(f"Isotropic Kriging:      f(x) = {result_iso.fun:.6f}")
print(f"Limited training set:   f(x) = {result_limited.fun:.6f}")
print("\nFor >5 dimensions, consider isotropic=True or limit training set.")
High-dimensional optimization strategies:
Isotropic Kriging:      f(x) = 0.006626
Limited training set:   f(x) = 0.038973

For >5 dimensions, consider isotropic=True or limit training set.

18.4.4 Uncertainty Quantification

Kriging provides uncertainty estimates, useful for exploration vs exploitation:

import numpy as np
from spotoptim.surrogate import Kriging
import matplotlib.pyplot as plt

# 1D example for visualization
np.random.seed(42)
X_train = np.array([[0.0], [1.0], [3.0], [5.0], [6.0]])
y_train = np.sin(X_train.ravel()) + 0.05 * np.random.randn(5)

# Fit Kriging model
model = Kriging(method='regression', seed=42, model_fun_evals=50)
model.fit(X_train, y_train)

# Dense test points
X_test = np.linspace(-0.5, 6.5, 200).reshape(-1, 1)
y_pred, y_std = model.predict(X_test, return_std=True)

# True function
y_true = np.sin(X_test.ravel())

# Plot
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(X_test, y_true, 'k--', label='True function', linewidth=2)
plt.plot(X_test, y_pred, 'b-', label='Kriging prediction', linewidth=2)
plt.fill_between(X_test.ravel(), 
                 y_pred - 2*y_std, 
                 y_pred + 2*y_std, 
                 alpha=0.3, color='blue', label='95% confidence')
plt.scatter(X_train, y_train, c='red', s=100, zorder=5, 
           edgecolors='black', label='Training points')
plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.title('Kriging Predictions with Uncertainty', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
plt.plot(X_test, y_std, 'r-', linewidth=2)
plt.scatter(X_train, np.zeros_like(X_train), c='blue', s=100, 
           zorder=5, edgecolors='black', label='Training points')
plt.xlabel('x', fontsize=12)
plt.ylabel('Standard Deviation', fontsize=12)
plt.title('Uncertainty Estimates', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Key observations:")
print("- Uncertainty is low near training points")
print("- Uncertainty is high far from training data")
print("- This guides acquisition functions (e.g., EI exploits low uncertainty)")

Key observations:
- Uncertainty is low near training points
- Uncertainty is high far from training data
- This guides acquisition functions (e.g., EI trades off low predictions against high uncertainty)
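
To connect this to acquisition, here is a minimal sketch of Expected Improvement for minimization, computed from the model, X_test, and y_train defined above. This is illustrative, not SpotOptim’s internal acquisition code.

import numpy as np
from scipy.stats import norm

# EI(x) = (f_best - mu) * Phi(z) + sigma * phi(z),  with z = (f_best - mu) / sigma
f_best = y_train.min()
mu, sigma = model.predict(X_test, return_std=True)
sigma = np.maximum(sigma, 1e-12)  # guard against zero std at training points
z = (f_best - mu) / sigma
ei = (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

print(f"Next point by EI: x = {X_test[np.argmax(ei)][0]:.3f}")

Points with either a low predicted mean or a large standard deviation score highly, which is exactly the exploration/exploitation trade-off described above.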

18.5 Practical Examples

18.5.1 Example 1: Optimizing the Rastrigin Function

The Rastrigin function is highly multimodal, making it a challenging test case:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def rastrigin(X):
    """
    Rastrigin function: highly multimodal
    Global minimum: f(0,...,0) = 0
    """
    A = 10
    n = X.shape[1]
    return A * n + np.sum(X**2 - A * np.cos(2 * np.pi * X), axis=1)

# Configure Kriging for multimodal function
kriging = Kriging(
    method='regression',      # Smooth over local variations
    min_theta=-3.0,          # Allow flexible correlation
    max_theta=2.0,
    seed=42,
    model_fun_evals=100      # More budget for better surrogate
)

optimizer = SpotOptim(
    fun=rastrigin,
    bounds=[(-5.12, 5.12), (-5.12, 5.12)],
    surrogate=kriging,
    acquisition='ei',         # Expected Improvement for exploration
    max_iter=60,
    n_initial=30,
    seed=42,
    verbose=True
)

result = optimizer.optimize()

print(f"\nRastrigin optimization:")
print(f"Best solution: {result.x}")
print(f"Best value:    {result.fun:.6f} (global optimum: 0.0)")
print(f"Distance to optimum: {np.linalg.norm(result.x):.6f}")
TensorBoard logging disabled
Initial best: f(x) = 6.442225
Iter 1 | Best: 6.442225 | Curr: 28.951136 | Rate: 0.00 | Evals: 51.7%
Iter 2 | Best: 6.442225 | Curr: 22.873904 | Rate: 0.00 | Evals: 53.3%
Iter 3 | Best: 6.442225 | Curr: 21.817218 | Rate: 0.00 | Evals: 55.0%
Iter 4 | Best: 6.442225 | Curr: 7.079611 | Rate: 0.00 | Evals: 56.7%
Iter 5 | Best: 6.442225 | Curr: 16.543245 | Rate: 0.00 | Evals: 58.3%
Iter 6 | Best: 6.442225 | Curr: 11.563456 | Rate: 0.00 | Evals: 60.0%
Iter 7 | Best: 6.442225 | Curr: 26.971256 | Rate: 0.00 | Evals: 61.7%
Iter 8 | Best: 2.626286 | Rate: 0.12 | Evals: 63.3%
Iter 9 | Best: 2.626286 | Curr: 11.785249 | Rate: 0.11 | Evals: 65.0%
Iter 10 | Best: 2.626286 | Curr: 4.200729 | Rate: 0.10 | Evals: 66.7%
Iter 11 | Best: 1.812638 | Rate: 0.18 | Evals: 68.3%
Iter 12 | Best: 1.812638 | Curr: 19.259015 | Rate: 0.17 | Evals: 70.0%
Iter 13 | Best: 1.812638 | Curr: 11.302721 | Rate: 0.15 | Evals: 71.7%
Iter 14 | Best: 1.150421 | Rate: 0.21 | Evals: 73.3%
Iter 15 | Best: 1.150421 | Curr: 4.869078 | Rate: 0.20 | Evals: 75.0%
Iter 16 | Best: 1.150421 | Curr: 2.108171 | Rate: 0.19 | Evals: 76.7%
Iter 17 | Best: 1.150421 | Curr: 23.557487 | Rate: 0.18 | Evals: 78.3%
Iter 18 | Best: 1.150421 | Curr: 23.474958 | Rate: 0.17 | Evals: 80.0%
Iter 19 | Best: 1.150421 | Curr: 45.686095 | Rate: 0.16 | Evals: 81.7%
Iter 20 | Best: 1.150421 | Curr: 43.120920 | Rate: 0.15 | Evals: 83.3%
Iter 21 | Best: 1.150421 | Curr: 20.306436 | Rate: 0.14 | Evals: 85.0%
Iter 22 | Best: 1.150421 | Curr: 21.343287 | Rate: 0.14 | Evals: 86.7%
Iter 23 | Best: 1.150421 | Curr: 10.682663 | Rate: 0.13 | Evals: 88.3%
Iter 24 | Best: 1.150421 | Curr: 24.431941 | Rate: 0.12 | Evals: 90.0%
Iter 25 | Best: 1.150421 | Curr: 13.310950 | Rate: 0.12 | Evals: 91.7%
Iter 26 | Best: 1.150421 | Curr: 25.096914 | Rate: 0.12 | Evals: 93.3%
Iter 27 | Best: 1.107241 | Rate: 0.15 | Evals: 95.0%
Iter 28 | Best: 1.107241 | Curr: 27.919970 | Rate: 0.14 | Evals: 96.7%
Iter 29 | Best: 1.107241 | Curr: 32.112536 | Rate: 0.14 | Evals: 98.3%
Iter 30 | Best: 1.107241 | Curr: 33.179155 | Rate: 0.13 | Evals: 100.0%

Rastrigin optimization:
Best solution: [-0.99278878 -0.02371262]
Best value:    1.107241 (global optimum: 0.0)
Distance to optimum: 0.993072

18.5.2 Example 2: Robust Optimization with Noise

When optimizing noisy functions, Kriging’s regression mode helps:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def noisy_beale(X):
    """
    Beale function with Gaussian noise
    Global minimum: f(3, 0.5) = 0
    """
    x = X[:, 0]
    y = X[:, 1]
    
    term1 = (1.5 - x + x * y)**2
    term2 = (2.25 - x + x * y**2)**2
    term3 = (2.625 - x + x * y**3)**2
    
    base = term1 + term2 + term3
    noise = np.random.normal(0, 0.5, size=base.shape)
    
    return base + noise

# Use regression method for noisy objectives
kriging_robust = Kriging(
    method='regression',        # Automatically optimizes Lambda (nugget)
    min_Lambda=-9.0,           # Allow small regularization
    max_Lambda=-2.0,           # But not too large
    seed=42,
    model_fun_evals=80
)

# Use repeated evaluations for noise handling
optimizer = SpotOptim(
    fun=noisy_beale,
    bounds=[(-4.5, 4.5), (-4.5, 4.5)],
    surrogate=kriging_robust,
    max_iter=50,
    n_initial=20,
    repeats_initial=3,         # Evaluate each initial point 3 times
    repeats_surrogate=2,       # Evaluate each new point 2 times
    seed=42,
    verbose=True
)

result = optimizer.optimize()

print(f"\nNoisy Beale optimization:")
print(f"Best solution: {result.x}")
print(f"Best mean value: {optimizer.min_mean_y:.6f}")
print(f"Variance at best: {optimizer.min_var_y:.6f}")
print(f"True optimum: [3.0, 0.5]")
print(f"Distance: {np.linalg.norm(result.x - np.array([3.0, 0.5])):.4f}")
TensorBoard logging disabled
Initial best: f(x) = 9.823312, mean best: f(x) = 10.144184

Noisy Beale optimization:
Best solution: [ 0.34789079 -0.3014163 ]
Best mean value: 10.144184
Variance at best: 0.068775
True optimum: [3.0, 0.5]
Distance: 2.7706

18.5.3 Example 3: Real-World Machine Learning Hyperparameter Tuning

Optimize hyperparameters for a neural network:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def ml_objective(X):
    """
    Simulate training a neural network
    Variables:
    - x0: learning_rate (log scale: 1e-4 to 1e-1)
    - x1: num_layers (integer: 1 to 5)
    - x2: hidden_size (integer: 16 to 256)
    - x3: dropout_rate (float: 0.0 to 0.5)
    
    Returns validation error
    """
    X = np.atleast_2d(X)
    lr = X[:, 0]
    n_layers = X[:, 1]
    hidden_size = X[:, 2]
    dropout = X[:, 3]
    
    # Simulate validation error (lower is better)
    # Optimal around: lr=0.001, 3 layers, 64 hidden, 0.2 dropout
    error = (np.log10(lr) + 3)**2  # Prefer lr ~ 0.001
    error += (n_layers - 3)**2     # Prefer 3 layers
    error += ((hidden_size - 64) / 32)**2  # Prefer 64 hidden units
    error += (dropout - 0.2)**2 * 10  # Prefer 0.2 dropout
    
    # Add some noise to simulate training variance
    error += np.random.normal(0, 0.1, size=error.shape)
    
    return error

# Setup optimization
bounds = [
    (1e-4, 1e-1),  # learning_rate
    (1, 5),        # num_layers
    (16, 256),     # hidden_size
    (0.0, 0.5)     # dropout_rate
]

var_type = ['float', 'int', 'int', 'float']

# Use Kriging with mixed types
kriging_ml = Kriging(
    method='regression',
    var_type=var_type,
    seed=42,
    model_fun_evals=60
)

optimizer = SpotOptim(
    fun=ml_objective,
    bounds=bounds,
    var_type=var_type,
    var_trans=['log10', None, None, None],  # Log scale for learning rate
    surrogate=kriging_ml,
    max_iter=40,
    n_initial=20,
    seed=42,
    verbose=True
)

result = optimizer.optimize()

print(f"\nOptimal hyperparameters:")
print(f"Learning rate: {result.x[0]:.6f}")
print(f"Num layers:    {int(result.x[1])}")
print(f"Hidden size:   {int(result.x[2])}")
print(f"Dropout rate:  {result.x[3]:.3f}")
print(f"Validation error: {result.fun:.6f}")
TensorBoard logging disabled
Initial best: f(x) = 1.187624
Iter 1 | Best: 1.187624 | Curr: 5.461273 | Rate: 0.00 | Evals: 52.5%
Iter 2 | Best: 1.187624 | Curr: 6.230768 | Rate: 0.00 | Evals: 55.0%
Iter 3 | Best: 1.187624 | Curr: 2.531065 | Rate: 0.00 | Evals: 57.5%
Iter 4 | Best: 1.187624 | Curr: 1.231990 | Rate: 0.00 | Evals: 60.0%
Iter 5 | Best: 0.816258 | Rate: 0.20 | Evals: 62.5%
Iter 6 | Best: 0.630677 | Rate: 0.33 | Evals: 65.0%
Iter 7 | Best: 0.569314 | Rate: 0.43 | Evals: 67.5%
Iter 8 | Best: 0.569314 | Curr: 1.300450 | Rate: 0.38 | Evals: 70.0%
Iter 9 | Best: 0.486376 | Rate: 0.44 | Evals: 72.5%
Iter 10 | Best: 0.479102 | Rate: 0.50 | Evals: 75.0%
Iter 11 | Best: 0.443912 | Rate: 0.55 | Evals: 77.5%
Iter 12 | Best: 0.443912 | Curr: 0.667579 | Rate: 0.50 | Evals: 80.0%
Iter 13 | Best: 0.443912 | Curr: 0.522793 | Rate: 0.46 | Evals: 82.5%
Iter 14 | Best: 0.418696 | Rate: 0.50 | Evals: 85.0%
Iter 15 | Best: 0.418696 | Curr: 0.618720 | Rate: 0.47 | Evals: 87.5%
Iter 16 | Best: 0.408472 | Rate: 0.50 | Evals: 90.0%
Iter 17 | Best: 0.408472 | Curr: 0.550742 | Rate: 0.47 | Evals: 92.5%
Iter 18 | Best: 0.332956 | Rate: 0.50 | Evals: 95.0%
Iter 19 | Best: 0.332956 | Curr: 0.389700 | Rate: 0.47 | Evals: 97.5%
Iter 20 | Best: 0.332956 | Curr: 0.538207 | Rate: 0.45 | Evals: 100.0%

Optimal hyperparameters:
Learning rate: 0.000666
Num layers:    3
Hidden size:   74
Dropout rate:  0.000
Validation error: 0.332956

18.5.4 Example 4: Comparing Kriging Methods

Let’s compare all three Kriging methods on the same problem:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def levy(X):
    """
    Levy function (2D)
    Global minimum: f(1, 1) = 0
    """
    x = X[:, 0]
    y = X[:, 1]
    
    w1 = 1 + (x - 1) / 4
    w2 = 1 + (y - 1) / 4
    
    term1 = np.sin(np.pi * w1)**2
    term2 = (w1 - 1)**2 * (1 + 10 * np.sin(np.pi * w1 + 1)**2)
    term3 = (w2 - 1)**2 * (1 + np.sin(2 * np.pi * w2)**2)
    
    return term1 + term2 + term3

methods = ['interpolation', 'regression', 'reinterpolation']
results_methods = []

for method in methods:
    kriging = Kriging(
        method=method,
        seed=42,
        model_fun_evals=60
    )
    
    optimizer = SpotOptim(
        fun=levy,
        bounds=[(-10, 10), (-10, 10)],
        surrogate=kriging,
        max_iter=40,
        n_initial=20,
        seed=42,
        verbose=False
    )
    
    result = optimizer.optimize()
    results_methods.append((method, result.fun, result.x))
    
    print(f"{method:15s} | f(x) = {result.fun:8.6f} | x = {result.x}")

print(f"\nGlobal optimum: f(1, 1) = 0.0")
print("\nAll methods find good solutions, but 'regression' is most robust.")
interpolation   | f(x) = 0.000455 | x = [0.98159731 0.96484576]
regression      | f(x) = 0.004572 | x = [0.93674684 1.05645873]
reinterpolation | f(x) = 0.004572 | x = [0.93674684 1.05645873]

Global optimum: f(1, 1) = 0.0

All methods find good solutions, but 'regression' is most robust.

18.5.5 Example 5: Sensitivity to Theta Bounds

Theta bounds control the range of smoothness. Let’s see their impact:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

def griewank(X):
    """Griewank function"""
    sum_sq = np.sum(X**2 / 4000, axis=1)
    prod_cos = np.prod(np.cos(X / np.sqrt(np.arange(1, X.shape[1] + 1))), axis=1)
    return sum_sq - prod_cos + 1

# Different theta bounds
theta_configs = [
    (-2.0, 1.0, "Tight (smoother)"),
    (-3.0, 2.0, "Default"),
    (-4.0, 3.0, "Wide (more flexible)")
]

for min_theta, max_theta, label in theta_configs:
    kriging = Kriging(
        method='regression',
        min_theta=min_theta,
        max_theta=max_theta,
        seed=42,
        model_fun_evals=50
    )
    
    optimizer = SpotOptim(
        fun=griewank,
        bounds=[(-600, 600), (-600, 600)],
        surrogate=kriging,
        max_iter=35,
        n_initial=18,
        seed=42,
        verbose=False
    )
    
    result = optimizer.optimize()
    print(f"{label:20s} [{min_theta:5.1f}, {max_theta:4.1f}] | f(x) = {result.fun:8.6f}")

print("\nDefault bounds work well for most problems.")
Tight (smoother)     [ -2.0,  1.0] | f(x) = 11.777737
Default              [ -3.0,  2.0] | f(x) = 11.509271
Wide (more flexible) [ -4.0,  3.0] | f(x) = 3.857089

Default bounds work well for most problems; wider bounds can help on large domains like this one.

18.6 Comparing Surrogates

18.6.1 Kriging vs Gaussian Process vs Random Forest

Let’s compare different surrogate models:

import numpy as np
from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging, SimpleKriging
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, ConstantKernel
from sklearn.ensemble import RandomForestRegressor

def schwefel(X):
    """Schwefel function"""
    return 418.9829 * X.shape[1] - np.sum(X * np.sin(np.sqrt(np.abs(X))), axis=1)

bounds = [(-500, 500), (-500, 500)]

# 1. Kriging (full-featured)
kriging = Kriging(method='regression', seed=42, model_fun_evals=60)
opt_kriging = SpotOptim(fun=schwefel, bounds=bounds, surrogate=kriging,
                        max_iter=35, n_initial=18, seed=42, verbose=False)
result_kriging = opt_kriging.optimize()

# 2. SimpleKriging (lightweight)
simple_kriging = SimpleKriging(noise=1e-10, seed=42)
opt_simple = SpotOptim(fun=schwefel, bounds=bounds, surrogate=simple_kriging,
                       max_iter=35, n_initial=18, seed=42, verbose=False)
result_simple = opt_simple.optimize()

# 3. Gaussian Process (sklearn default)
kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=2.5)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10,
                               normalize_y=True, random_state=42)
opt_gp = SpotOptim(fun=schwefel, bounds=bounds, surrogate=gp,
                   max_iter=35, n_initial=18, seed=42, verbose=False)
result_gp = opt_gp.optimize()

# 4. Random Forest
rf = RandomForestRegressor(n_estimators=100, random_state=42)
opt_rf = SpotOptim(fun=schwefel, bounds=bounds, surrogate=rf,
                   max_iter=35, n_initial=18, seed=42, verbose=False)
result_rf = opt_rf.optimize()

print("Surrogate Model Comparison:")
print(f"Kriging (full):     f(x) = {result_kriging.fun:8.2f}")
print(f"SimpleKriging:      f(x) = {result_simple.fun:8.2f}")
print(f"Gaussian Process:   f(x) = {result_gp.fun:8.2f}")
print(f"Random Forest:      f(x) = {result_rf.fun:8.2f}")
print(f"\nGlobal optimum: f(420.9687, 420.9687) = 0.0")
print("\nKriging offers:")
print("- Mixed variable types (continuous, integer, categorical)")
print("- Multiple methods (interpolation, regression, reinterpolation)")
print("- Explicit control over regularization and correlation")
Surrogate Model Comparison:
Kriging (full):     f(x) =   128.22
SimpleKriging:      f(x) =   128.74
Gaussian Process:   f(x) =   118.44
Random Forest:      f(x) =   197.51

Global optimum: f(420.9687, 420.9687) = 0.0

Kriging offers:
- Mixed variable types (continuous, integer, categorical)
- Multiple methods (interpolation, regression, reinterpolation)
- Explicit control over regularization and correlation

18.7 Best Practices

18.7.1 1. Choosing the Right Method

# For smooth, deterministic functions
kriging = Kriging(method='interpolation', noise=1e-10, seed=42)

# For general optimization (RECOMMENDED)
kriging = Kriging(method='regression', seed=42)

# For noisy functions with repeated evaluations
kriging = Kriging(method='regression', seed=42)
# Use with repeats_initial and repeats_surrogate in SpotOptim

18.7.2 2. Setting Model Complexity

# For low-dimensional problems (<5D)
kriging = Kriging(
    method='regression',
    isotropic=False,           # Anisotropic (default)
    model_fun_evals=100,       # More budget for better fit
    seed=42
)

# For high-dimensional problems (>5D)
kriging = Kriging(
    method='regression',
    isotropic=True,            # Fewer parameters
    model_fun_evals=50,        # Faster fitting
    seed=42
)

18.7.3 3. Handling Different Variable Types

# Mixed types example
bounds = [
    (0.0, 10.0),    # continuous
    (1, 10),        # integer
    (0, 3)          # categorical (4 categories)
]

var_type = ['float', 'int', 'factor']

kriging = Kriging(
    method='regression',
    var_type=var_type,
    metric_factorial='canberra',  # For factor variables
    seed=42
)

optimizer = SpotOptim(
    fun=objective,
    bounds=bounds,
    var_type=var_type,
    surrogate=kriging,
    seed=42
)

18.7.4 4. Reproducibility

Always set the seed for reproducible results:

# Both Kriging and SpotOptim should have seeds
kriging = Kriging(method='regression', seed=42)

optimizer = SpotOptim(
    fun=objective,
    bounds=bounds,
    surrogate=kriging,
    seed=42  # Same seed or different seed
)

18.7.5 5. Monitoring Surrogate Quality

Check the negative log-likelihood after fitting:

import numpy as np
from spotoptim.surrogate import Kriging

# Sample data
X = np.random.rand(20, 2) * 10 - 5
y = np.sum(X**2, axis=1)

model = Kriging(method='regression', seed=42)
model.fit(X, y)

print(f"Negative log-likelihood: {model.negLnLike:.4f}")
print(f"Optimized theta: {model.theta_}")
print(f"Optimized Lambda: {model.Lambda_:.4f}")
print(f"Condition number of Psi: {model.cnd_Psi:.2e}")

print("\nLower negLnLike = better fit")
print("Check condition number for numerical stability")
Negative log-likelihood: -15.4723
Optimized theta: [-2.99999999 -3.        ]
Optimized Lambda: -8.9984
Condition number of Psi: 3.14e+14

Lower negLnLike = better fit
Check condition number for numerical stability
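
Beyond the likelihood, a simple holdout check is a useful sanity test (our suggestion, not a SpotOptim built-in): fit on part of the data and measure prediction error on the rest.

import numpy as np
from spotoptim.surrogate import Kriging

# Fit on 20 points, evaluate on 10 held-out points
rng = np.random.default_rng(42)
X = rng.uniform(-5, 5, size=(30, 2))
y = np.sum(X**2, axis=1)

model = Kriging(method='regression', seed=42)
model.fit(X[:20], y[:20])
rmse = np.sqrt(np.mean((model.predict(X[20:]) - y[20:])**2))
print(f"Holdout RMSE: {rmse:.4f}")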

18.8 Common Issues and Solutions

18.8.1 Issue 1: Slow Fitting for Large Datasets

Problem: Kriging becomes slow with many training points.

Solution: Limit surrogate training set size:

# Use max_surrogate_points in SpotOptim
optimizer = SpotOptim(
    fun=objective,
    bounds=bounds,
    surrogate=kriging,
    max_surrogate_points=200,  # Limit to 200 points
    selection_method='distant', # or 'best'
    seed=42
)

18.8.2 Issue 2: Poor Predictions for Categorical Variables

Problem: Kriging doesn’t handle factors well.

Solution: Try different distance metrics:

import numpy as np
from spotoptim.surrogate import Kriging

# Small illustrative dataset: one continuous and one factor variable
X = np.array([[0.1, 0], [0.4, 1], [0.7, 2], [0.9, 0], [0.2, 1]])
y = np.array([1.0, 2.0, 3.0, 1.5, 2.2])

# Try different metrics and compare predictions
for metric in ['canberra', 'hamming', 'jaccard']:
    kriging = Kriging(
        method='regression',
        var_type=['float', 'factor'],
        metric_factorial=metric,
        seed=42
    )
    kriging.fit(X, y)
    print(f"{metric:10s} | prediction at [0.5, 1]: "
          f"{kriging.predict(np.array([[0.5, 1]]))[0]:.4f}")

18.8.3 Issue 3: Numerical Instability

Problem: Correlation matrix is nearly singular.

Solution: Increase regularization:

# For interpolation method
kriging = Kriging(
    method='interpolation',
    noise=1e-6,  # Increase from default 1e-8
    seed=42
)

# Or use regression method (recommended)
kriging = Kriging(
    method='regression',
    min_Lambda=-6.0,  # Don't go too small
    seed=42
)

18.8.4 Issue 4: Overfitting to Noisy Data

Problem: Kriging fits noise instead of underlying function.

Solution: Use regression method with reasonable Lambda bounds:

kriging = Kriging(
    method='regression',
    min_Lambda=-5.0,  # Not too small
    max_Lambda=-1.0,  # Not too large
    seed=42
)

18.9 Summary

18.9.1 Key Takeaways

  1. Use method='regression' for most optimization problems
  2. Set seed for reproducibility
  3. Use var_type for mixed variable types
  4. Default parameters work well for most cases
  5. Use isotropic=True for high-dimensional problems (>5D)

18.9.2 Quick Reference

from spotoptim import SpotOptim
from spotoptim.surrogate import Kriging

# Recommended configuration for general use
kriging = Kriging(
    method='regression',        # Smooths over noise
    seed=42                     # Reproducibility
)

optimizer = SpotOptim(
    fun=objective,
    bounds=bounds,
    surrogate=kriging,
    max_iter=50,
    n_initial=20,
    seed=42,
    verbose=True
)

result = optimizer.optimize()

18.9.3 Parameter Cheat Sheet

Parameter          Default         When to Change
method             'regression'    Use 'interpolation' for exact fit
noise              sqrt(eps)       Rarely; use method='regression' instead
min_theta          -3.0            Adjust if default range doesn’t work
max_theta          2.0             Adjust if default range doesn’t work
isotropic          False           Set True for >5 dimensions
var_type           ['num']         Set for mixed variable types
metric_factorial   'canberra'      Try 'hamming' for factors
model_fun_evals    100             Reduce for faster fitting
seed               124             Always set for reproducibility

18.10 Jupyter Notebook
