15  Reproducibility in SpotOptim

15.1 Introduction

SpotOptim provides full support for reproducible optimization runs through the seed parameter. This is essential for:

  • Scientific research: Ensuring experiments can be replicated
  • Debugging: Reproducing specific optimization behaviors
  • Benchmarking: Fair comparison between different configurations
  • Production: Consistent results in deployed applications

When you specify a seed, running the same optimization multiple times in the same environment produces identical results. Without a seed, each run explores the search space differently, which can be useful for robustness testing.
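
The effect of a seed can be illustrated without SpotOptim itself. The following minimal sketch uses NumPy's random generator directly; the helper `draw_points` is hypothetical and only stands in for any seeded sampling step:

```python
import numpy as np

def draw_points(seed=None, n=3):
    """Hypothetical helper: sample n points in [-5, 5]^2 from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-5, 5, size=(n, 2))

a = draw_points(seed=42)
b = draw_points(seed=42)
c = draw_points()  # no seed: the generator is initialized from fresh entropy

print("Same seed, same points:", np.array_equal(a, b))
print("No seed, same points:  ", np.array_equal(a, c))
```

The same principle applies to every random step of an optimizer: fixing the seed fixes the sequence of random numbers, and therefore the points that are sampled.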

15.2 Basic Usage

15.2.1 Making Optimization Reproducible

To ensure reproducible results, simply specify the seed parameter when creating the optimizer:

import numpy as np
from spotoptim import SpotOptim

def sphere(X):
    """Simple sphere function: f(x) = sum(x^2)"""
    return np.sum(X**2, axis=1)

# Reproducible optimization
optimizer = SpotOptim(
    fun=sphere,
    bounds=[(-5, 5), (-5, 5)],
    max_iter=30,
    n_initial=15,
    seed=42,  # This ensures reproducibility
    verbose=True
)

result = optimizer.optimize()
print(f"Best solution: {result.x}")
print(f"Best value: {result.fun}")
TensorBoard logging disabled
Initial best: f(x) = 5.542803
Iter 1 | Best: 0.000459 | Rate: 1.00 | Evals: 53.3%
Iter 2 | Best: 0.000014 | Rate: 1.00 | Evals: 56.7%
Iter 3 | Best: 0.000010 | Rate: 1.00 | Evals: 60.0%
Iter 5 | Best: 0.000010 | Rate: 0.80 | Evals: 66.7%
Iter 6 | Best: 0.000010 | Rate: 0.83 | Evals: 70.0%
Iter 11 | Best: 0.000009 | Rate: 0.55 | Evals: 86.7%
Iter 14 | Best: 0.000008 | Rate: 0.50 | Evals: 96.7%
Best solution: [4.11141873e-05 2.79358818e-03]
Best value: 7.805825304638719e-06

Key Point: Running this code multiple times in the same environment produces the same result. Across machines or library versions, minor numerical differences may occur (see Common Questions).

15.2.2 Running Independent Experiments

If you don’t specify a seed, each optimization run will explore the search space differently:

# Non-reproducible: different results each time
optimizer = SpotOptim(
    fun=sphere,
    bounds=[(-5, 5), (-5, 5)],
    max_iter=30,
    n_initial=15
    # No seed specified
)

result = optimizer.optimize()
# Results will vary between runs

This is useful when you want to:

  • Explore different regions of the search space
  • Test the robustness of your results
  • Run multiple independent optimization attempts
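
An unseeded run cannot be repeated afterwards. If an interesting result should remain reproducible, one option is to draw a fresh seed explicitly, log it, and pass it to the optimizer. The sketch below uses only the standard library; `fresh_seed` is a hypothetical helper, and it assumes the optimizer accepts any non-negative 32-bit integer as a seed:

```python
import secrets

def fresh_seed():
    """Hypothetical helper: draw an unpredictable 32-bit seed that can be logged."""
    return secrets.randbits(32)

seed = fresh_seed()
print(f"Using seed {seed}")  # keep this line in the run log

# The logged value can then be passed on, e.g. seed=seed when creating the optimizer.
```

Each run still explores the search space differently, but any individual run can be repeated later by reusing the logged value.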

15.3 Practical Examples

15.3.1 Example 1: Comparing Different Configurations

When comparing different optimizer settings, use the same seed so that differences in the results come from the settings rather than from a different random number stream:

import numpy as np
from spotoptim import SpotOptim

def rosenbrock(X):
    """Rosenbrock function"""
    x = X[:, 0]
    y = X[:, 1]
    return (1 - x)**2 + 100 * (y - x**2)**2

# Configuration 1: More initial points
opt1 = SpotOptim(
    fun=rosenbrock,
    bounds=[(-2, 2), (-2, 2)],
    max_iter=50,
    n_initial=20,
    seed=42  # Same seed for fair comparison
)
result1 = opt1.optimize()

# Configuration 2: Fewer initial points, leaving more of the max_iter budget for sequential iterations
opt2 = SpotOptim(
    fun=rosenbrock,
    bounds=[(-2, 2), (-2, 2)],
    max_iter=50,
    n_initial=10,
    seed=42  # Same seed
)
result2 = opt2.optimize()

print(f"Config 1 (more initial): {result1.fun:.6f}")
print(f"Config 2 (fewer initial): {result2.fun:.6f}")
Config 1 (more initial): 0.001750
Config 2 (fewer initial): 0.005165
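
A single seed gives one paired observation. To judge whether a difference between configurations is systematic, one option is to repeat the comparison over several shared seeds. The sketch below uses a seeded random search as a stand-in (it is not SpotOptim, and `random_search` is a hypothetical helper) so that it runs with NumPy alone:

```python
import numpy as np

def rosenbrock(X):
    """Rosenbrock function, evaluated row-wise."""
    x, y = X[:, 0], X[:, 1]
    return (1 - x)**2 + 100 * (y - x**2)**2

def random_search(fun, bounds, n_evals, seed):
    """Hypothetical stand-in optimizer: seeded uniform random search."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_evals, len(bounds)))
    return float(np.min(fun(X)))

bounds = [(-2, 2), (-2, 2)]
seeds = [42, 123, 456, 789, 1011]

# Evaluate both configurations under every seed (paired design)
small_budget = [random_search(rosenbrock, bounds, 50, s) for s in seeds]
large_budget = [random_search(rosenbrock, bounds, 200, s) for s in seeds]

# Count on how many seeds the larger budget found a lower value
wins = sum(big < small for big, small in zip(large_budget, small_budget))
print(f"Larger budget wins on {wins} of {len(seeds)} seeds")
```

The same loop structure would apply to two SpotOptim configurations: create both optimizers with the current seed inside the loop and collect `result.fun` for each.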

15.3.2 Example 2: Reproducible Research Experiment

For scientific papers or reports, always use a fixed seed and document it:

import numpy as np
from spotoptim import SpotOptim

def rastrigin(X):
    """Rastrigin function (multimodal)"""
    A = 10
    n = X.shape[1]
    return A * n + np.sum(X**2 - A * np.cos(2 * np.pi * X), axis=1)

# Documented seed for reproducibility
RANDOM_SEED = 12345

optimizer = SpotOptim(
    fun=rastrigin,
    bounds=[(-5.12, 5.12), (-5.12, 5.12), (-5.12, 5.12)],
    max_iter=100,
    n_initial=30,
    seed=RANDOM_SEED,
    verbose=True
)

result = optimizer.optimize()

print(f"\nExperiment Results (seed={RANDOM_SEED}):")
print(f"Best solution: {result.x}")
print(f"Best value: {result.fun}")
print(f"Iterations: {result.nit}")
print(f"Function evaluations: {result.nfev}")

# These results can now be cited in a paper
TensorBoard logging disabled
Initial best: f(x) = 20.392774
Iter 2 | Best: 16.117880 | Rate: 0.50 | Evals: 32.0%
Iter 3 | Best: 9.476452 | Rate: 0.67 | Evals: 33.0%
Iter 4 | Best: 8.708107 | Rate: 0.75 | Evals: 34.0%
Iter 6 | Best: 8.654858 | Rate: 0.67 | Evals: 36.0%
Iter 8 | Best: 6.182059 | Rate: 0.62 | Evals: 38.0%
Iter 10 | Best: 3.579214 | Rate: 0.60 | Evals: 40.0%
Iter 12 | Best: 2.001850 | Rate: 0.58 | Evals: 42.0%
Iter 15 | Best: 1.994746 | Rate: 0.53 | Evals: 45.0%
Iter 20 | Best: 1.992141 | Rate: 0.45 | Evals: 50.0%
Iter 21 | Best: 1.989949 | Rate: 0.48 | Evals: 51.0%
Iter 22 | Best: 1.989944 | Rate: 0.50 | Evals: 52.0%
Optimizer candidate 1/3 was duplicate/invalid.
Iter 37 | Best: 1.989919 | Rate: 0.32 | Evals: 67.0%
Iter 38 | Best: 1.989919 | Rate: 0.34 | Evals: 68.0%
Optimizer candidate 1/3 was duplicate/invalid.
Iter 56 | Best: 1.989919 | Rate: 0.25 | Evals: 86.0%
Iter 58 | Best: 1.989919 | Rate: 0.26 | Evals: 88.0%
Iter 70 | Best: 1.989919 | Rate: 0.23 | Evals: 100.0%

Experiment Results (seed=12345):
Best solution: [ 9.94999942e-01 -9.94979062e-01 -6.10362377e-05]
Best value: 1.9899192742928875
Iterations: 70
Function evaluations: 100

15.3.3 Example 3: Multiple Independent Runs

To test robustness, run the same optimization with different seeds:

import numpy as np
from spotoptim import SpotOptim

def ackley(X):
    """Ackley function"""
    a = 20
    b = 0.2
    c = 2 * np.pi
    n = X.shape[1]
    
    sum_sq = np.sum(X**2, axis=1)
    sum_cos = np.sum(np.cos(c * X), axis=1)
    
    return -a * np.exp(-b * np.sqrt(sum_sq / n)) - np.exp(sum_cos / n) + a + np.e

# Run 5 independent optimizations
results = []
seeds = [42, 123, 456, 789, 1011]

for seed in seeds:
    optimizer = SpotOptim(
        fun=ackley,
        bounds=[(-5, 5), (-5, 5)],
        max_iter=40,
        n_initial=20,
        seed=seed,
        verbose=False
    )
    result = optimizer.optimize()
    results.append(result.fun)
    print(f"Run with seed {seed:4d}: f(x) = {result.fun:.6f}")

# Analyze robustness
print(f"\nBest result: {min(results):.6f}")
print(f"Worst result: {max(results):.6f}")
print(f"Mean: {np.mean(results):.6f}")
print(f"Std dev: {np.std(results):.6f}")
Run with seed   42: f(x) = 0.000905
Run with seed  123: f(x) = 0.001456
Run with seed  456: f(x) = 0.003297
Run with seed  789: f(x) = 0.000651
Run with seed 1011: f(x) = 0.003043

Best result: 0.000651
Worst result: 0.003297
Mean: 0.001870
Std dev: 0.001096
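
Note that `np.std` computes the population standard deviation (dividing by n) by default. With only five runs, the sample standard deviation (dividing by n - 1) and the median are often reported as well. A minimal sketch with the standard library, using the five final values printed above:

```python
import statistics

# Final values copied from the five runs reported above
values = [0.000905, 0.001456, 0.003297, 0.000651, 0.003043]

mean = statistics.mean(values)
median = statistics.median(values)
pop_std = statistics.pstdev(values)    # divides by n, like np.std(values)
sample_std = statistics.stdev(values)  # divides by n - 1, like np.std(values, ddof=1)

print(f"Mean:              {mean:.6f}")
print(f"Median:            {median:.6f}")
print(f"Std dev (n):       {pop_std:.6f}")
print(f"Std dev (n - 1):   {sample_std:.6f}")
```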

15.3.4 Example 4: Reproducible Initial Design

The seed ensures that even the initial design points are reproducible:

import numpy as np
from spotoptim import SpotOptim

def simple_quadratic(X):
    return np.sum((X - 1)**2, axis=1)

# Create two optimizers with same seed
opt1 = SpotOptim(
    fun=simple_quadratic,
    bounds=[(-5, 5), (-5, 5)],
    max_iter=25,
    n_initial=10,
    seed=999
)

opt2 = SpotOptim(
    fun=simple_quadratic,
    bounds=[(-5, 5), (-5, 5)],
    max_iter=25,
    n_initial=10,
    seed=999  # Same seed
)

# Run both optimizations
result1 = opt1.optimize()
result2 = opt2.optimize()

# Verify identical results
print("Initial design points are identical:", 
      np.allclose(opt1.X_[:10], opt2.X_[:10]))
print("All evaluated points are identical:", 
      np.allclose(opt1.X_, opt2.X_))
print("All function values are identical:", 
      np.allclose(opt1.y_, opt2.y_))
print("Best solutions are identical:", 
      np.allclose(result1.x, result2.x))
Initial design points are identical: True
All evaluated points are identical: True
All function values are identical: True
Best solutions are identical: True
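
`np.allclose` accepts small numerical differences (by default a relative tolerance of 1e-5 and an absolute tolerance of 1e-8). If bit-for-bit identity is the claim being verified, `np.array_equal` is the stricter check. The following sketch shows the difference on a small, made-up array:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9  # tiny perturbation, e.g. from a different library build

close = np.allclose(a, b)            # within the default tolerances
exact = np.array_equal(a, b)         # requires every element to match exactly
exact_copy = np.array_equal(a, a.copy())

print("allclose:             ", close)
print("array_equal:          ", exact)
print("array_equal with copy:", exact_copy)
```

Applied to the example above, the stricter check would be `np.array_equal(opt1.X_, opt2.X_)`.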

15.3.5 Example 5: Custom Initial Design with Seed

Even when providing a custom initial design, the seed ensures reproducible subsequent iterations:

import numpy as np
from spotoptim import SpotOptim

def beale(X):
    """Beale function"""
    x = X[:, 0]
    y = X[:, 1]
    term1 = (1.5 - x + x * y)**2
    term2 = (2.25 - x + x * y**2)**2
    term3 = (2.625 - x + x * y**3)**2
    return term1 + term2 + term3

# Custom initial design (e.g., from previous knowledge)
X_start = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [2.0, 2.0],
    [-1.0, -1.0]
])

# Run twice with same seed and initial design
opt1 = SpotOptim(
    fun=beale,
    bounds=[(-4.5, 4.5), (-4.5, 4.5)],
    max_iter=30,
    n_initial=10,
    seed=777
)
result1 = opt1.optimize(X0=X_start)

opt2 = SpotOptim(
    fun=beale,
    bounds=[(-4.5, 4.5), (-4.5, 4.5)],
    max_iter=30,
    n_initial=10,
    seed=777  # Same seed
)
result2 = opt2.optimize(X0=X_start)

print("Results are identical:", np.allclose(result1.x, result2.x))
print(f"Best value: {result1.fun:.6f}")
Results are identical: True
Best value: 1.236328

15.4 Advanced Topics

15.4.1 Seed and Noisy Functions

When optimizing noisy functions with repeated evaluations, the seed ensures reproducible noise:

import numpy as np
from spotoptim import SpotOptim

def noisy_sphere(X):
    """Sphere function with Gaussian noise"""
    base = np.sum(X**2, axis=1)
    noise = np.random.normal(0, 0.1, size=base.shape)
    return base + noise

optimizer = SpotOptim(
    fun=noisy_sphere,
    bounds=[(-5, 5), (-5, 5)],
    max_iter=40,
    n_initial=20,
    repeats_initial=3,  # 3 evaluations per point
    repeats_surrogate=2,
    seed=42  # Ensures same noise pattern
)

result = optimizer.optimize()
print(f"Best mean value: {optimizer.min_mean_y:.6f}")
print(f"Variance at best: {optimizer.min_var_y:.6f}")
Best mean value: -0.033586
Variance at best: 0.009499

Important: With the same seed, even the noise will be identical across runs!
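
The objective above draws its noise from NumPy's global random state. If the noise should be reproducible independently of whatever else touches that global state, one option is to give the objective its own seeded generator. This is a sketch under that assumption; `make_noisy_sphere` and `noise_seed` are hypothetical names, not part of SpotOptim:

```python
import numpy as np

def make_noisy_sphere(noise_seed, sigma=0.1):
    """Return a noisy sphere objective that owns its random generator."""
    rng = np.random.default_rng(noise_seed)

    def noisy_sphere(X):
        base = np.sum(X**2, axis=1)
        return base + rng.normal(0.0, sigma, size=base.shape)

    return noisy_sphere

X = np.array([[0.0, 0.0], [1.0, 2.0]])
f1 = make_noisy_sphere(noise_seed=42)
f2 = make_noisy_sphere(noise_seed=42)

y1 = f1(X)
y2 = f2(X)
y1_again = f1(X)  # second call on the same generator: new noise draw

print("Same noise seed, same values:", np.array_equal(y1, y2))
print("Repeated call, same values:  ", np.array_equal(y1, y1_again))
```

The returned function can be passed as `fun` like any other objective. Repeated evaluations of the same point still differ from each other, which is what repeated-evaluation settings rely on, while the whole sequence of noise values is fixed by `noise_seed`.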

15.4.2 Different Seeds for Different Exploration

Use different seeds to explore different regions systematically:

import numpy as np
from spotoptim import SpotOptim

def griewank(X):
    """Griewank function"""
    sum_sq = np.sum(X**2 / 4000, axis=1)
    prod_cos = np.prod(np.cos(X / np.sqrt(np.arange(1, X.shape[1] + 1))), axis=1)
    return sum_sq - prod_cos + 1

# Systematic exploration with different seeds
best_overall = float('inf')
best_seed = None

for seed in range(10, 20):  # Seeds 10-19
    optimizer = SpotOptim(
        fun=griewank,
        bounds=[(-600, 600), (-600, 600)],
        max_iter=50,
        n_initial=25,
        seed=seed
    )
    result = optimizer.optimize()
    
    if result.fun < best_overall:
        best_overall = result.fun
        best_seed = seed
    
    print(f"Seed {seed}: f(x) = {result.fun:.6f}")

print(f"\nBest result with seed {best_seed}: {best_overall:.6f}")
Seed 10: f(x) = 7.953579
Seed 11: f(x) = 0.828587
Seed 12: f(x) = 2.252967
Seed 13: f(x) = 0.243745
Seed 14: f(x) = 0.092171
Seed 15: f(x) = 0.383212
Seed 16: f(x) = 9.048331
Seed 17: f(x) = 0.548623
Seed 18: f(x) = 2.079286
Seed 19: f(x) = 0.246417

Best result with seed 14: 0.092171

15.5 Best Practices

15.5.1 1. Always Use Seeds for Production Code

# Good: Reproducible
optimizer = SpotOptim(fun=objective, bounds=bounds, seed=42)

# Risky: Non-reproducible
optimizer = SpotOptim(fun=objective, bounds=bounds)

15.5.2 2. Document Your Seeds

# Configuration for experiment reported in Section 4.2
EXPERIMENT_SEED = 2024
MAX_ITERATIONS = 100

optimizer = SpotOptim(
    fun=my_objective,
    bounds=my_bounds,
    max_iter=MAX_ITERATIONS,
    seed=EXPERIMENT_SEED
)
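
Documentation is most useful when it travels with the results. One possible approach, sketched here with the standard library only, is to store the seed together with the other settings and basic environment information in a small JSON record; the field names are illustrative assumptions:

```python
import json
import platform
import sys

EXPERIMENT_SEED = 2024
MAX_ITERATIONS = 100

# Settings and environment details needed to repeat the run
run_record = {
    "seed": EXPERIMENT_SEED,
    "max_iter": MAX_ITERATIONS,
    "python": platform.python_version(),
    "platform": sys.platform,
}

serialized = json.dumps(run_record, indent=2, sort_keys=True)
print(serialized)

# Reading the record back recovers the exact settings
restored = json.loads(serialized)
```

Writing `serialized` to a file next to the result files keeps the seed and the environment description from getting separated from the numbers they produced.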

15.5.3 3. Use Different Seeds for Different Experiments

# Different experiments should use different seeds
BASELINE_SEED = 100
EXPERIMENT_A_SEED = 200
EXPERIMENT_B_SEED = 300
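
Instead of choosing each seed by hand, the per-experiment seeds can also be derived from one documented master seed. A minimal sketch using NumPy's `SeedSequence`; the experiment names and `MASTER_SEED` are illustrative, and it assumes the optimizer accepts 32-bit unsigned integers as seeds:

```python
import numpy as np

MASTER_SEED = 2024  # the only number that needs documenting
names = ["baseline", "experiment_a", "experiment_b"]

# generate_state returns reproducible 32-bit unsigned integers
state = np.random.SeedSequence(MASTER_SEED).generate_state(len(names))
seeds = {name: int(s) for name, s in zip(names, state)}

for name, seed in seeds.items():
    print(f"{name}: seed={seed}")
```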

15.5.4 4. Test Robustness Across Multiple Seeds

# Run same optimization with multiple seeds
for seed in [42, 123, 456, 789, 1011]:
    optimizer = SpotOptim(fun=objective, bounds=bounds, seed=seed)
    result = optimizer.optimize()
    # Analyze results

15.6 What the Seed Controls

The seed parameter ensures reproducibility by controlling:

  1. Initial Design Generation: Latin Hypercube Sampling produces the same initial points
  2. Surrogate Model: Gaussian Process random initialization is identical
  3. Acquisition Optimization: Differential evolution explores the same candidates
  4. Random Sampling: Any random exploration uses the same random numbers

Together, these make the entire optimization pipeline deterministic and reproducible in a fixed software environment.
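
To make the first item concrete, the sketch below implements a basic Latin Hypercube design with a seeded NumPy generator. It is an illustration of the technique, not SpotOptim's internal implementation; `latin_hypercube` is a hypothetical helper:

```python
import numpy as np

def latin_hypercube(n_points, bounds, seed):
    """Hypothetical helper: basic Latin Hypercube sample within the given bounds."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    dim = bounds.shape[0]

    # One stratum per point in each dimension, with a random offset inside it
    u = (np.arange(n_points)[:, None] + rng.random((n_points, dim))) / n_points

    # Shuffle the strata independently in every dimension
    for j in range(dim):
        u[:, j] = rng.permutation(u[:, j])

    # Scale from the unit cube to the requested bounds
    low, high = bounds[:, 0], bounds[:, 1]
    return low + u * (high - low)

design_a = latin_hypercube(10, [(-5, 5), (-5, 5)], seed=42)
design_b = latin_hypercube(10, [(-5, 5), (-5, 5)], seed=42)
design_c = latin_hypercube(10, [(-5, 5), (-5, 5)], seed=43)

print("Same seed, same design:     ", np.array_equal(design_a, design_b))
print("Different seed, same design:", np.array_equal(design_a, design_c))
```

Because every later step (surrogate fitting, acquisition optimization) builds on the initial design, a reproducible design is the precondition for a reproducible run.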

15.7 Common Questions

Q: Can I use seed=0?
A: Yes, any integer (including 0) is a valid seed.

Q: Will different Python versions give the same results?
A: Generally yes, but minor numerical differences may occur due to underlying library changes. Use the same environment for exact reproducibility.

Q: Does the seed affect the objective function?
A: The seed is intended for SpotOptim’s internal random processes. If your objective function has its own randomness, control that separately, for example by giving it its own seeded random generator.

Q: How do I choose a good seed value?
A: Any integer works. Common choices are 42, 123, or dates (e.g., 20241112). What matters is consistency, not the specific value.

15.8 Summary

  • Use seed parameter for reproducible optimization
  • Same seed → identical results (every time)
  • No seed → different results (random exploration)
  • Essential for research, debugging, and production
  • Document your seeds for transparency
  • Test robustness with multiple different seeds

15.9 Jupyter Notebook

Note