18  Infill Criteria

This chapter describes, analyzes, and compares different infill criteria. An infill criterion defines how the next point \(x_{n+1}\) is selected using the surrogate model \(S\). Expected improvement is a popular infill criterion in Bayesian optimization.

18.1 Expected Improvement

Expected Improvement (EI) is one of the most influential and widely used infill criteria in surrogate-based optimization, particularly in Bayesian optimization. It balances the fundamental trade-off between exploitation (sampling where the surrogate predicts good values) and exploration (sampling where the surrogate is uncertain).

The concept of Expected Improvement was formalized by Jones, Schonlau, and Welch (1998) and builds upon the theoretical foundation established by Močkus (1974). It provides an elegant mathematical framework that naturally combines both exploitation and exploration in a single criterion, making it particularly well-suited for expensive black-box optimization problems.

18.1.1 The Philosophy Behind Expected Improvement

The core idea of Expected Improvement is deceptively simple yet mathematically sophisticated. Rather than simply choosing the point where the surrogate model predicts the best value (pure exploitation) or the point with the highest uncertainty (pure exploration), EI asks a more nuanced question:

“What is the expected value of improvement over the current best observation if we evaluate the objective function at point \(x\)?”

This approach naturally balances exploitation and exploration because:

  • Points near the current best solution have a reasonable chance of improvement (exploitation)
  • Points in unexplored regions with high uncertainty may yield surprising improvements (exploration)
  • The mathematical expectation provides a principled way to combine these considerations

18.1.2 Mathematical Definition

18.1.2.1 Setup and Notation

Consider a Gaussian Process (Kriging) surrogate model fitted to \(n\) observations \(\{(x^{(i)}, y^{(i)})\}_{i=1}^n\), where \(y^{(i)} = f(x^{(i)})\) are the expensive function evaluations. Let \(f_{best} = \min_{i=1,\ldots,n} y^{(i)}\) be the best (minimum) observed value so far.

At any unobserved point \(x\), the Gaussian Process provides:

  • A predictive mean: \(\hat{f}(x) = \mu(x)\)
  • A predictive standard deviation: \(s(x) = \sigma(x)\)

Under the GP model, the unknown function value \(f(x)\) at such a point is normally distributed, conditional on the observed data: \[ f(x) \sim \mathcal{N}(\mu(x), \sigma^2(x)) \]

18.1.2.2 The Improvement Function

The improvement at point \(x\) is defined as: \[ I(x) = \max(f_{best} - f(x), 0) \]

This represents how much better the function value at \(x\) is compared to the current best. Note that \(I(x) = 0\) if \(f(x) \geq f_{best}\) (no improvement).

Definition 18.1 (Expected Improvement Formula) The Expected Improvement is the expectation of the improvement function: \[ EI(x) = \mathbb{E}[I(x)] = \mathbb{E}[\max(f_{best} - f(x), 0)] \]
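
The definition can be checked numerically by sampling from the predictive distribution at a single point. The following sketch estimates \(\mathbb{E}[I(x)]\) in this way; the values of `mu`, `sigma`, and `f_best` and the helper name `ei_monte_carlo` are illustrative assumptions and not part of spotpython.

```python
import random


def ei_monte_carlo(mu, sigma, f_best, n_samples=200_000, seed=0):
    """Estimate E[max(f_best - f, 0)] for f ~ N(mu, sigma^2) by sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        f = rng.gauss(mu, sigma)          # one draw from the predictive distribution
        total += max(f_best - f, 0.0)     # improvement I(x) for this draw
    return total / n_samples


# Hypothetical predictive distribution at one candidate point x
estimate = ei_monte_carlo(mu=0.0, sigma=1.0, f_best=0.0)
print(f"Monte Carlo estimate of EI: {estimate:.4f}")
```

For \(\mu(x) = f_{best}\) and \(\sigma(x) = 1\) the exact expectation is \(1/\sqrt{2\pi} \approx 0.399\), so the estimate should land close to that value.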

Since \(f(x)\) is normally distributed under the GP model, this expectation has a closed-form solution:

\[ EI(x) = \begin{cases} (f_{best} - \mu(x)) \Phi\left(\frac{f_{best} - \mu(x)}{\sigma(x)}\right) + \sigma(x) \phi\left(\frac{f_{best} - \mu(x)}{\sigma(x)}\right) & \text{if } \sigma(x) > 0 \\ 0 & \text{if } \sigma(x) = 0 \end{cases} \]

where:

  • \(\Phi(\cdot)\) is the cumulative distribution function (CDF) of the standard normal distribution
  • \(\phi(\cdot)\) is the probability density function (PDF) of the standard normal distribution
  • \(Z = \frac{f_{best} - \mu(x)}{\sigma(x)}\), the argument of \(\Phi\) and \(\phi\) above, is the standardized improvement
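
The closed form follows from a short calculation. The sketch below assumes \(\sigma(x) > 0\), writes \(p\) for the predictive density of \(f(x)\), and uses the substitution \(f = \mu(x) + \sigma(x) u\) with \(u \sim \mathcal{N}(0, 1)\):

\[ \begin{aligned} EI(x) &= \int_{-\infty}^{f_{best}} (f_{best} - f)\, p(f)\, df = \int_{-\infty}^{Z} \bigl(f_{best} - \mu(x) - \sigma(x) u\bigr)\, \phi(u)\, du \\ &= (f_{best} - \mu(x)) \int_{-\infty}^{Z} \phi(u)\, du - \sigma(x) \int_{-\infty}^{Z} u\, \phi(u)\, du \\ &= (f_{best} - \mu(x))\, \Phi(Z) + \sigma(x)\, \phi(Z) \end{aligned} \]

The last step uses \(\int_{-\infty}^{Z} u\, \phi(u)\, du = -\phi(Z)\), which holds because \(\phi'(u) = -u\, \phi(u)\).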

18.1.2.3 Alternative Formulation

The Expected Improvement can also be written as: \[ EI(x) = \sigma(x) \left[ Z \Phi(Z) + \phi(Z) \right] \]

where \(Z = \frac{f_{best} - \mu(x)}{\sigma(x)}\) is the standardized improvement.
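
Both formulations are straightforward to evaluate. The following sketch uses only the Python standard library and expresses \(\Phi\) through the error function; the function names are illustrative assumptions and not the spotpython API.

```python
import math


def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal CDF Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimization; returns 0 when sigma is 0."""
    if sigma <= 0.0:
        return 0.0
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm_cdf(z) + sigma * norm_pdf(z)


def expected_improvement_alt(mu, sigma, f_best):
    """Alternative form sigma * (Z * Phi(Z) + phi(Z))."""
    if sigma <= 0.0:
        return 0.0
    z = (f_best - mu) / sigma
    return sigma * (z * norm_cdf(z) + norm_pdf(z))


# Hypothetical predictions (mu, sigma) at three candidate points
f_best = 1.0
for mu, sigma in [(0.8, 0.1), (1.5, 1.0), (1.0, 0.0)]:
    ei = expected_improvement(mu, sigma, f_best)
    ei_alt = expected_improvement_alt(mu, sigma, f_best)
    print(f"mu={mu}, sigma={sigma}: EI={ei:.4f}, alternative form={ei_alt:.4f}")
```

The two functions agree up to floating-point rounding, since the alternative form only factors \(\sigma(x)\) out of both terms.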

18.1.3 Understanding the Components

The EI formula elegantly combines two terms:

  1. Exploitation Term: \((f_{best} - \mu(x)) \Phi(Z)\)
    • Larger when \(\mu(x)\) is small (good predicted value)
    • Weighted by the probability that \(f(x) < f_{best}\)
  2. Exploration Term: \(\sigma(x) \phi(Z)\)
    • Larger when \(\sigma(x)\) is large (high uncertainty)
    • Represents the potential for discovering unexpectedly good values
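
To see how the two terms trade off, the sketch below evaluates them for two hypothetical candidates: one with a slightly better predicted mean and little uncertainty, and one with a worse predicted mean but large uncertainty. All numbers and the helper name `ei_terms` are assumptions chosen for illustration.

```python
import math


def ei_terms(mu, sigma, f_best):
    """Return (exploitation term, exploration term) of EI for sigma > 0."""
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(Z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(Z)
    return (f_best - mu) * cdf, sigma * pdf


f_best = 1.0
candidates = {
    "near the best, low uncertainty": (0.95, 0.05),
    "unexplored, high uncertainty": (1.50, 1.00),
}
for name, (mu, sigma) in candidates.items():
    exploit, explore = ei_terms(mu, sigma, f_best)
    print(f"{name}: exploitation={exploit:+.4f}, "
          f"exploration={explore:.4f}, EI={exploit + explore:.4f}")
```

In this made-up setting the uncertain candidate attains the larger EI. Its first term is negative because \(\mu(x) > f_{best}\), but the second term more than compensates.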

18.2 EI: Implementation in spotpython

The spotpython package implements Expected Improvement in its Kriging class. Here’s how it works in practice:

18.2.1 Key Implementation Details

  1. Negative Expected Improvement: In optimization contexts, spotpython often returns the negative Expected Improvement because many optimization algorithms are designed to minimize rather than maximize objectives.

  2. Logarithmic Transformation: To handle numerical issues and improve optimization stability, spotpython often works with \(\log_{10}(EI)\):

    ExpImp = np.log10(EITermOne + EITermTwo + self.eps)
    return float(-ExpImp)  # Negative for minimization
  3. Numerical Stability: A small constant (self.eps) is added inside the logarithm so that the result stays finite when EI becomes very small or underflows to zero.

18.2.2 Code Example from the Kriging Class

def _pred(self, x: np.ndarray) -> Tuple[float, float, float]:
    """Computes Kriging prediction including Expected Improvement."""
    # ... [prediction calculations] ...
    
    # Compute Expected Improvement
    if self.return_ei:
        yBest = np.min(y)  # Current best observation
        
        # First term: (f_best - mu) * Phi(Z)
        EITermOne = (yBest - f) * (0.5 + 0.5 * erf((1 / np.sqrt(2)) * ((yBest - f) / s)))
        
        # Second term: sigma * phi(Z)
        EITermTwo = s * (1 / np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((yBest - f) ** 2 / SSqr))
        
        # Expected Improvement (in log scale)
        ExpImp = np.log10(EITermOne + EITermTwo + self.eps)
        
        return float(f), float(s), float(-ExpImp)  # Return negative EI
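
The excerpt relies on quantities computed in the elided part of the method. A standalone sketch of the same calculation for a single point is given below; `neg_log10_ei` is a hypothetical helper, and the default `eps` is an assumed placeholder rather than spotpython's actual value.

```python
import math


def neg_log10_ei(f, s, y_best, eps=1e-12):
    """Negative log10 of (EI + eps) for a prediction f with standard deviation s > 0."""
    z = (y_best - f) / s
    term_one = (y_best - f) * (0.5 + 0.5 * math.erf(z / math.sqrt(2.0)))
    term_two = s * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return -math.log10(term_one + term_two + eps)


y_best = 0.0
# Same predictive standard deviation, with means increasingly worse than y_best
for f in [0.0, 1.0, 5.0, 50.0]:
    print(f"f={f:5.1f}: -log10(EI + eps) = {neg_log10_ei(f, 1.0, y_best):.3f}")
```

Far from the best value both terms underflow to zero in floating point. The added `eps` keeps the logarithm finite, so the returned value saturates at \(-\log_{10}(\text{eps})\) instead of raising an error.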

18.3 Practical Advantages of Expected Improvement

  1. Automatic Balance: EI naturally balances exploitation and exploration without requiring manual tuning of weights or parameters.
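  2. Scale Behavior: EI is measured in the units of the objective. If the objective is multiplied by a positive constant and the surrogate is refitted accordingly, EI is multiplied by the same constant, so the location of its maximum does not change.
  3. Theoretical Foundation: EI has a decision-theoretic justification: it is the expected utility of one additional evaluation when utility is measured as improvement over the current best.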
  4. Efficient Optimization: The smooth, differentiable nature of EI makes it suitable for gradient-based optimization of the acquisition function.
  5. Proven Performance: EI has been successfully applied across numerous domains and consistently performs well in practice.
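
The role of the objective's scale can be made concrete with the closed-form expression from Section 18.1.2: if the predictive mean, the predictive standard deviation, and the best value are all multiplied by a constant \(c > 0\), EI is multiplied by \(c\) as well, so the ranking of candidate points is unchanged. The sketch below checks this for made-up numbers; `ei` is an illustrative helper, not a spotpython function, and the check assumes that the surrogate's predictions scale with the data.

```python
import math


def ei(mu, sigma, f_best):
    """Closed-form expected improvement for minimization (sigma > 0)."""
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_best - mu) * cdf + sigma * pdf


f_best = 2.0
candidates = [(1.9, 0.2), (2.5, 1.0), (3.0, 0.5)]   # hypothetical (mu, sigma) pairs
c = 1000.0                                          # rescaling factor for the objective

original = [ei(mu, sigma, f_best) for mu, sigma in candidates]
rescaled = [ei(c * mu, c * sigma, c * f_best) for mu, sigma in candidates]

for e0, e1 in zip(original, rescaled):
    print(f"EI={e0:.5f}, EI after rescaling={e1:.2f}, ratio={e1 / e0:.1f}")
```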

18.4 Connection to the Hyperparameter Tuning Cookbook

In the context of hyperparameter tuning, Expected Improvement plays a crucial role in:

  • Sequential Model-Based Optimization: EI guides the selection of which hyperparameter configurations to evaluate next
  • Efficient Resource Utilization: By balancing exploration and exploitation, EI helps find good hyperparameters with fewer expensive model training runs
  • Automated Optimization: EI provides a principled, automatic way to navigate the hyperparameter space without manual intervention

The implementation in spotpython makes Expected Improvement accessible for practical hyperparameter optimization tasks, providing both the theoretical rigor of Bayesian optimization and the computational efficiency needed for real-world applications.

18.5 Example: Spot and the 1-dim Sphere Function

import numpy as np
from math import inf
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot
from spotpython.utils.init import fun_control_init, surrogate_control_init, design_control_init
import matplotlib.pyplot as plt

18.5.1 The Objective Function: 1-dim Sphere

  • The spotpython package provides several classes of objective functions.
  • We will use an analytical objective function, i.e., a function that can be described by a (closed) formula: \[f(x) = x^2 \]
fun = Analytical().fun_sphere
  • The size of the lower bound vector determines the problem dimension.
  • Here we will use np.array([-1]), i.e., a one-dim function.
TensorBoard

Similar to the one-dimensional case, which was introduced in Section 13.8, we can use TensorBoard to monitor the progress of the optimization. We will use the same code, only the prefix is different:

from spotpython.utils.init import fun_control_init
PREFIX = "07_Y"
fun_control = fun_control_init(
    PREFIX=PREFIX,
    fun_evals = 25,
    lower = np.array([-1]),
    upper = np.array([1]),
    tolerance_x = np.sqrt(np.spacing(1)),)
design_control = design_control_init(init_size=10)
spot_1 = Spot(
            fun=fun,
            fun_control=fun_control,
            design_control=design_control)
spot_1.run()
spotpython tuning: 4.74409224815101e-10 [####------] 44.00% 
spotpython tuning: 4.74409224815101e-10 [#####-----] 48.00% 
spotpython tuning: 4.74409224815101e-10 [#####-----] 52.00% 
spotpython tuning: 4.74409224815101e-10 [######----] 56.00% 
spotpython tuning: 1.6645032376738785e-10 [######----] 60.00% 
spotpython tuning: 1.6645032376738785e-10 [######----] 64.00% 
spotpython tuning: 1.6645032376738785e-10 [#######---] 68.00% 
spotpython tuning: 1.6645032376738785e-10 [#######---] 72.00% 
spotpython tuning: 1.6645032376738785e-10 [########--] 76.00% 
spotpython tuning: 1.6645032376738785e-10 [########--] 80.00% 
spotpython tuning: 1.6645032376738785e-10 [########--] 84.00% 
spotpython tuning: 1.6645032376738785e-10 [#########-] 88.00% 
spotpython tuning: 1.6645032376738785e-10 [#########-] 92.00% 
spotpython tuning: 1.6645032376738785e-10 [##########] 96.00% 
spotpython tuning: 1.6645032376738785e-10 [##########] 100.00% Done...

Experiment saved to 07_Y_res.pkl
<spotpython.spot.spot.Spot at 0x10bda6270>

18.5.2 Results

spot_1.print_results()
min y: 1.6645032376738785e-10
x0: 1.2901562842050875e-05
[['x0', np.float64(1.2901562842050875e-05)]]
spot_1.plot_progress(log_y=True)

TensorBoard visualization of the spotpython optimization process and the surrogate model.

18.6 Same, but with EI as infill_criterion

PREFIX = "07_EI_ISO"
fun_control = fun_control_init(
    PREFIX=PREFIX,
    lower = np.array([-1]),
    upper = np.array([1]),
    fun_evals = 25,
    tolerance_x = np.sqrt(np.spacing(1)),
    infill_criterion = "ei")
spot_1_ei = Spot(fun=fun,
                     fun_control=fun_control)
spot_1_ei.run()
spotpython tuning: 1.6739119739724672e-09 [####------] 44.00% 
spotpython tuning: 1.6739119739724672e-09 [#####-----] 48.00% 
spotpython tuning: 1.6739119739724672e-09 [#####-----] 52.00% 
spotpython tuning: 1.6739119739724672e-09 [######----] 56.00% 
spotpython tuning: 5.969349640837553e-12 [######----] 60.00% 
spotpython tuning: 5.969349640837553e-12 [######----] 64.00% 
spotpython tuning: 5.969349640837553e-12 [#######---] 68.00% 
spotpython tuning: 5.969349640837553e-12 [#######---] 72.00% 
spotpython tuning: 5.969349640837553e-12 [########--] 76.00% 
spotpython tuning: 5.969349640837553e-12 [########--] 80.00% 
spotpython tuning: 5.969349640837553e-12 [########--] 84.00% 
spotpython tuning: 5.969349640837553e-12 [#########-] 88.00% 
spotpython tuning: 5.969349640837553e-12 [#########-] 92.00% 
spotpython tuning: 5.969349640837553e-12 [##########] 96.00% 
spotpython tuning: 5.969349640837553e-12 [##########] 100.00% Done...

Experiment saved to 07_EI_ISO_res.pkl
<spotpython.spot.spot.Spot at 0x125fccad0>
spot_1_ei.plot_progress(log_y=True)

spot_1_ei.print_results()
min y: 5.969349640837553e-12
x0: 2.443225253806442e-06
[['x0', np.float64(2.443225253806442e-06)]]

TensorBoard visualization of the spotpython optimization process and the surrogate model. Expected improvement, isotropic Kriging.

18.7 Non-isotropic Kriging

PREFIX = "07_EI_NONISO"
fun_control = fun_control_init(
    PREFIX=PREFIX,
    lower = np.array([-1, -1]),
    upper = np.array([1, 1]),
    fun_evals = 25,
    tolerance_x = np.sqrt(np.spacing(1)),
    infill_criterion = "ei")
surrogate_control = surrogate_control_init(
    n_theta=2,
    method="interpolation",
    )
spot_2_ei_noniso = Spot(fun=fun,
                   fun_control=fun_control,
                   surrogate_control=surrogate_control)
spot_2_ei_noniso.run()
spotpython tuning: 1.8879649092418398e-05 [####------] 44.00% 
spotpython tuning: 1.8879649092418398e-05 [#####-----] 48.00% 
spotpython tuning: 1.8879649092418398e-05 [#####-----] 52.00% 
spotpython tuning: 1.8879649092418398e-05 [######----] 56.00% 
spotpython tuning: 1.8879649092418398e-05 [######----] 60.00% 
spotpython tuning: 1.8879649092418398e-05 [######----] 64.00% 
spotpython tuning: 1.8879649092418398e-05 [#######---] 68.00% 
spotpython tuning: 1.8879649092418398e-05 [#######---] 72.00% 
spotpython tuning: 1.8879649092418398e-05 [########--] 76.00% 
spotpython tuning: 1.8879649092418398e-05 [########--] 80.00% 
spotpython tuning: 1.8879649092418398e-05 [########--] 84.00% 
spotpython tuning: 1.8879649092418398e-05 [#########-] 88.00% 
spotpython tuning: 1.8879649092418398e-05 [#########-] 92.00% 
spotpython tuning: 1.8879649092418398e-05 [##########] 96.00% 
spotpython tuning: 1.8879649092418398e-05 [##########] 100.00% Done...

Experiment saved to 07_EI_NONISO_res.pkl
<spotpython.spot.spot.Spot at 0x1260977a0>
spot_2_ei_noniso.plot_progress(log_y=True)

spot_2_ei_noniso.print_results()
min y: 1.8879649092418398e-05
x0: 0.0016422868343098733
x1: 0.004022753167455201
[['x0', np.float64(0.0016422868343098733)],
 ['x1', np.float64(0.004022753167455201)]]
spot_2_ei_noniso.surrogate.plot()

TensorBoard visualization of the spotpython optimization process and the surrogate model. Expected improvement, non-isotropic Kriging.

18.8 Using sklearn Surrogates

18.8.1 The spot Loop

The spot loop consists of the following steps:

  1. Init: Build initial design \(X\)
  2. Evaluate initial design on real objective \(f\): \(y = f(X)\)
  3. Build surrogate: \(S = S(X,y)\)
  4. Optimize on surrogate: \(X_0 = \text{optimize}(S)\)
  5. Evaluate on real objective: \(y_0 = f(X_0)\)
  6. Impute (Infill) new points: \(X = X \cup X_0\), \(y = y \cup y_0\).
  7. Go to 3.
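
The steps can be sketched in a few lines of Python. The example below is a deliberately simplified stand-in, not the spotpython implementation: it assumes a one-dimensional sphere objective, uses a quadratic polynomial fitted with `np.polyfit` as the surrogate \(S\), and optimizes the surrogate on a fixed grid. All variable names are illustrative.

```python
import numpy as np


def f(X):
    """Objective: one-dimensional sphere function, evaluated pointwise."""
    return X ** 2


# 1. Init: build the initial design X
X = np.array([-1.0, -0.4, 0.7])
# 2. Evaluate the initial design on the real objective
y = f(X)

grid = np.linspace(-1.0, 1.0, 201)   # candidate points for the surrogate search
for _ in range(3):
    # 3. Build surrogate: S = S(X, y), here a least-squares quadratic
    coeffs = np.polyfit(X, y, deg=2)
    # 4. Optimize on surrogate: pick the grid point with the smallest prediction
    X0 = grid[np.argmin(np.polyval(coeffs, grid))]
    # 5. Evaluate on real objective
    y0 = f(X0)
    # 6. Infill: add the new point to the data
    X = np.append(X, X0)
    y = np.append(y, y0)
    # 7. Go to 3 (next loop iteration)

print(f"best x: {X[np.argmin(y)]:.4f}, best y: {np.min(y):.6f}")
```

Because the objective here is itself quadratic, the surrogate is exact after the first fit and the later iterations propose the same point again. With a Kriging surrogate and an infill criterion such as EI, as used in the spotpython examples of this chapter, each iteration typically contributes new information.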

The spot loop is implemented in R as follows:

Visual representation of the model based search with SPOT. Taken from: Bartz-Beielstein, T., and Zaefferer, M. Hyperparameter tuning approaches. In Hyperparameter Tuning for Machine and Deep Learning with R - A Practical Guide, E. Bartz, T. Bartz-Beielstein, M. Zaefferer, and O. Mersmann, Eds. Springer, 2022, ch. 4, pp. 67–114.

18.8.2 spot: The Initial Model

18.8.2.1 Example: Modifying the initial design size

This is the “Example: Modifying the initial design size” from Chapter 4.5.1 in [bart21i].

spot_ei = Spot(fun=fun,
                fun_control=fun_control_init(
                lower = np.array([-1,-1]),
                upper= np.array([1,1])), 
                design_control = design_control_init(init_size=5))
spot_ei.run()
spotpython tuning: 0.13773784008577408 [####------] 40.00% 
spotpython tuning: 0.137092032817552 [#####-----] 46.67% 
spotpython tuning: 0.13507127750732323 [#####-----] 53.33% 
spotpython tuning: 0.12519833727871527 [######----] 60.00% 
spotpython tuning: 0.09323163938334049 [#######---] 66.67% 
spotpython tuning: 0.057966805090302165 [#######---] 73.33% 
spotpython tuning: 0.010203880941217082 [########--] 80.00% 
spotpython tuning: 0.0030660266707283317 [#########-] 86.67% 
spotpython tuning: 0.0030473908765633047 [#########-] 93.33% 
spotpython tuning: 0.0030473908765633047 [##########] 100.00% Done...

Experiment saved to 000_res.pkl
<spotpython.spot.spot.Spot at 0x126267d10>
spot_ei.plot_progress()

np.min(spot_1.y), np.min(spot_ei.y)
(np.float64(1.6645032376738785e-10), np.float64(0.0030473908765633047))

18.8.3 Init: Build Initial Design

from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
from spotpython.fun.objectivefunctions import Analytical
gen = SpaceFilling(2)
rng = np.random.RandomState(1)
lower = np.array([-5,-0])
upper = np.array([10,15])
fun = Analytical().fun_branin

X = gen.scipy_lhd(10, lower=lower, upper = upper)
print(X)
y = fun(X, fun_control=fun_control)
print(y)
[[ 8.97647221 13.41926847]
 [ 0.66946019  1.22344228]
 [ 5.23614115 13.78185824]
 [ 5.6149825  11.5851384 ]
 [-1.72963184  1.66516096]
 [-4.26945568  7.1325531 ]
 [ 1.26363761 10.17935555]
 [ 2.88779942  8.05508969]
 [-3.39111089  4.15213772]
 [ 7.30131231  5.22275244]]
[128.95676449  31.73474356 172.89678121 126.71295908  64.34349975
  70.16178611  48.71407916  31.77322887  76.91788181  30.69410529]
S = Kriging(name='kriging',  seed=123)
S.fit(X, y)
S.plot()

gen = SpaceFilling(2, seed=123)
X0 = gen.scipy_lhd(3)
gen = SpaceFilling(2, seed=345)
X1 = gen.scipy_lhd(3)
X2 = gen.scipy_lhd(3)
gen = SpaceFilling(2, seed=123)
X3 = gen.scipy_lhd(3)
X0, X1, X2, X3
(array([[0.77254938, 0.31539299],
        [0.59321338, 0.93854273],
        [0.27469803, 0.3959685 ]]),
 array([[0.78373509, 0.86811887],
        [0.06692621, 0.6058029 ],
        [0.41374778, 0.00525456]]),
 array([[0.121357  , 0.69043832],
        [0.41906219, 0.32838498],
        [0.86742658, 0.52910374]]),
 array([[0.77254938, 0.31539299],
        [0.59321338, 0.93854273],
        [0.27469803, 0.3959685 ]]))

18.8.4 Evaluate

18.8.5 Build Surrogate

18.8.6 A Simple Predictor

The code below shows how to use a simple model for prediction.

  • Assume that only two (very costly) measurements are available:

    1. f(0) = 0.5
    2. f(2) = 2.5
  • We are interested in the value at \(x_0 = 1\), i.e., \(f(x_0 = 1)\), but cannot run an additional, third experiment.

from sklearn import linear_model
X = np.array([[0], [2]])
y = np.array([0.5, 2.5])
S_lm = linear_model.LinearRegression()
S_lm = S_lm.fit(X, y)
X0 = np.array([[1]])
y0 = S_lm.predict(X0)
print(y0)
[1.5]
  • Central Idea:
    • Evaluation of the surrogate model S_lm is much cheaper (and/or much faster) than running the real-world experiment \(f\).

18.9 Gaussian Process Regression: Basic Introductory Example

This example is taken from the scikit-learn documentation. A Gaussian process with an RBF kernel is fitted to six noise-free training points; fitting optimizes the kernel hyperparameters. The fitted model is then used to compute the mean prediction over the full data set and to plot the 95% confidence interval.

import numpy as np
import matplotlib.pyplot as plt
import math as m
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
y = np.squeeze(X * np.sin(X))
rng = np.random.RandomState(1)
training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
X_train, y_train = X[training_indices], y[training_indices]

kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
gaussian_process.fit(X_train, y_train)
gaussian_process.kernel_

mean_prediction, std_prediction = gaussian_process.predict(X, return_std=True)

plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, mean_prediction, label="Mean prediction")
plt.fill_between(
    X.ravel(),
    mean_prediction - 1.96 * std_prediction,
    mean_prediction + 1.96 * std_prediction,
    alpha=0.5,
    label=r"95% confidence interval",
)
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("sk-learn Version: Gaussian process regression on noise-free dataset")

from spotpython.surrogate.kriging import Kriging
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.RandomState(1)
X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
y = np.squeeze(X * np.sin(X))
training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
X_train, y_train = X[training_indices], y[training_indices]


S = Kriging(name='kriging',  seed=123, log_level=50, cod_type="norm")
S.fit(X_train, y_train)

mean_prediction, std_prediction, ei = S.predict(X, return_val="all")

std_prediction

plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, mean_prediction, label="Mean prediction")
plt.fill_between(
    X.ravel(),
    mean_prediction - 1.96 * std_prediction,
    mean_prediction + 1.96 * std_prediction,
    alpha=0.5,
    label=r"95% confidence interval",
)
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("spotpython Version: Gaussian process regression on noise-free dataset")

18.10 The Surrogate: Using scikit-learn models

The default surrogate is spotpython's internal Kriging model.

S_0 = Kriging(name='kriging', seed=123)

Models from scikit-learn can be selected, e.g., Gaussian Process:

# Needed for the sklearn surrogates:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn import linear_model
from sklearn import tree
import pandas as pd
kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
S_GP = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
  • and many more:
S_Tree = DecisionTreeRegressor(random_state=0)
S_LM = linear_model.LinearRegression()
S_Ridge = linear_model.Ridge()
S_RF = RandomForestRegressor(max_depth=2, random_state=0) 
  • The scikit-learn GP model S_GP is selected.
S = S_GP
isinstance(S, GaussianProcessRegressor)
True
from spotpython.fun.objectivefunctions import Analytical
fun = Analytical().fun_branin
fun_control = fun_control_init(
    lower = np.array([-5,-0]),
    upper = np.array([10,15]),
    fun_evals = 15)    
design_control = design_control_init(init_size=5)
spot_GP = Spot(fun=fun, 
                    fun_control=fun_control,
                    surrogate=S, 
                    design_control=design_control)
spot_GP.run()
spotpython tuning: 24.51465459019188 [####------] 40.00% 
spotpython tuning: 11.003092545432404 [#####-----] 46.67% 
spotpython tuning: 11.003092545432404 [#####-----] 53.33% 
spotpython tuning: 7.281405479109784 [######----] 60.00% 
spotpython tuning: 7.281405479109784 [#######---] 66.67% 
spotpython tuning: 7.281405479109784 [#######---] 73.33% 
spotpython tuning: 2.9520033012954237 [########--] 80.00% 
spotpython tuning: 2.9520033012954237 [#########-] 86.67% 
spotpython tuning: 2.1049818033904044 [#########-] 93.33% 
spotpython tuning: 1.9431597967021723 [##########] 100.00% Done...

Experiment saved to 000_res.pkl
<spotpython.spot.spot.Spot at 0x1263134a0>
spot_GP.y
array([ 69.32459936, 152.38491454, 107.92560483,  24.51465459,
        76.73500031,  86.30426863,  11.00309255,  16.11758333,
         7.28140548,  21.82343562,  10.96088904,   2.9520033 ,
         3.02912616,   2.1049818 ,   1.9431598 ])
spot_GP.plot_progress()

spot_GP.print_results()
min y: 1.9431597967021723
x0: 10.0
x1: 2.99858238342458
[['x0', np.float64(10.0)], ['x1', np.float64(2.99858238342458)]]

18.11 Additional Examples

# Needed for the sklearn surrogates:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn import linear_model
from sklearn import tree
import pandas as pd
kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
S_GP = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
from spotpython.surrogate.kriging import Kriging
import numpy as np
import spotpython
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot

S_K = Kriging(name='kriging',
              seed=123,
              log_level=50,
              infill_criterion = "y",
              n_theta=1,
              method="interpolation",
              cod_type="norm")
fun = Analytical().fun_sphere

fun_control = fun_control_init(
    lower = np.array([-1,-1]),
    upper = np.array([1,1]),
    fun_evals = 25)

spot_S_K = Spot(fun=fun,
                     fun_control=fun_control,
                     surrogate=S_K,
                     design_control=design_control,
                     surrogate_control=surrogate_control)
spot_S_K.run()
spotpython tuning: 0.13771720249971786 [##--------] 24.00% 
spotpython tuning: 0.008765811130597791 [###-------] 28.00% 
spotpython tuning: 0.002838288758657914 [###-------] 32.00% 
spotpython tuning: 0.0008164210951892503 [####------] 36.00% 
spotpython tuning: 0.0003661048177839494 [####------] 40.00% 
spotpython tuning: 0.0003589648342263893 [####------] 44.00% 
spotpython tuning: 0.0003589648342263893 [#####-----] 48.00% 
spotpython tuning: 0.00032902762400155227 [#####-----] 52.00% 
spotpython tuning: 0.0002817371331525184 [######----] 56.00% 
spotpython tuning: 0.0001682443401655298 [######----] 60.00% 
spotpython tuning: 2.039354315945154e-05 [######----] 64.00% 
spotpython tuning: 1.5898357927868756e-06 [#######---] 68.00% 
spotpython tuning: 7.231797257673966e-07 [#######---] 72.00% 
spotpython tuning: 4.7009088690905644e-07 [########--] 76.00% 
spotpython tuning: 3.8991843792581266e-07 [########--] 80.00% 
spotpython tuning: 3.7436106441025836e-07 [########--] 84.00% 
spotpython tuning: 3.7287987551444754e-07 [#########-] 88.00% 
spotpython tuning: 3.7287987551444754e-07 [#########-] 92.00% 
spotpython tuning: 3.7287987551444754e-07 [##########] 96.00% 
spotpython tuning: 3.7287987551444754e-07 [##########] 100.00% Done...

Experiment saved to 000_res.pkl
<spotpython.spot.spot.Spot at 0x1259a2870>
spot_S_K.plot_progress(log_y=True)

spot_S_K.surrogate.plot()

spot_S_K.print_results()
min y: 3.7287987551444754e-07
x0: -0.0006065092770223268
x1: 7.089691389829288e-05
[['x0', np.float64(-0.0006065092770223268)],
 ['x1', np.float64(7.089691389829288e-05)]]

18.11.1 Optimize on Surrogate

18.11.2 Evaluate on Real Objective

18.11.3 Impute / Infill new Points

18.12 Tests

import numpy as np
from spotpython.spot import Spot
from spotpython.fun.objectivefunctions import Analytical

fun_sphere = Analytical().fun_sphere

fun_control = fun_control_init(
                    lower=np.array([-1, -1]),
                    upper=np.array([1, 1]),
                    n_points = 2)
spot_1 = Spot(
    fun=fun_sphere,
    fun_control=fun_control,
)

# (S-2) Initial Design:
spot_1.X = spot_1.design.scipy_lhd(
    spot_1.design_control["init_size"], lower=spot_1.lower, upper=spot_1.upper
)
print(spot_1.X)

# (S-3): Eval initial design:
spot_1.y = spot_1.fun(spot_1.X)
print(spot_1.y)

spot_1.fit_surrogate()
X0 = spot_1.suggest_new_X()
print(X0)
assert X0.size == spot_1.n_points * spot_1.k
[[ 0.86352963  0.7892358 ]
 [-0.24407197 -0.83687436]
 [ 0.36481882  0.8375811 ]
 [ 0.415331    0.54468512]
 [-0.56395091 -0.77797854]
 [-0.90259409 -0.04899292]
 [-0.16484832  0.35724741]
 [ 0.05170659  0.07401196]
 [-0.78548145 -0.44638164]
 [ 0.64017497 -0.30363301]]
[1.36857656 0.75992983 0.83463487 0.46918172 0.92329124 0.8170764
 0.15480068 0.00815134 0.81623768 0.502017  ]
[[0.03166402 0.03957873]
 [0.03166914 0.03957624]]

18.13 EI: The Famous Schonlau Example

from spotpython.surrogate.kriging import Kriging
import numpy as np
import matplotlib.pyplot as plt

# Training inputs of the example (five points, unevenly spaced)
X_train = np.array([1., 2., 3., 4., 12.]).reshape(-1,1)
y_train = np.array([0., -1.75, -2, -0.5, 5.])

S = Kriging(name='kriging',  seed=123, log_level=50, n_theta=1, method="interpolation", cod_type="norm")
S.fit(X_train, y_train)

X = np.linspace(start=0, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, mean_prediction, label="Mean prediction")
if True:
    plt.fill_between(
        X.ravel(),
        mean_prediction - 2 * std_prediction,
        mean_prediction + 2 * std_prediction,
        alpha=0.5,
        label=r"95% confidence interval",
    )
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noise-free dataset")

#plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
# plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, -ei, label="Expected Improvement")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noise-free dataset")

S.get_model_params()
{'log_theta_lambda': array([-0.99002527]),
 'U': array([[1.00000001e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00],
        [9.02737603e-01, 4.30191626e-01, 0.00000000e+00, 0.00000000e+00,
         0.00000000e+00],
        [6.64119362e-01, 7.04830290e-01, 2.49318571e-01, 0.00000000e+00,
         0.00000000e+00],
        [3.98156512e-01, 7.08262302e-01, 5.57958584e-01, 1.68873137e-01,
         0.00000000e+00],
        [4.19706687e-06, 7.48476021e-05, 7.85849126e-04, 5.55938288e-03,
         9.99984242e-01]]),
 'X': array([[ 1.],
        [ 2.],
        [ 3.],
        [ 4.],
        [12.]]),
 'y': array([ 0.  , -1.75, -2.  , -0.5 ,  5.  ]),
 'negLnLike': np.float64(1.2078820477330403)}

18.14 EI: The Forrester Example

from spotpython.surrogate.kriging import Kriging
import numpy as np
import matplotlib.pyplot as plt
import spotpython
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot

# exact x locations are unknown:
X_train = np.array([0.0, 0.175, 0.225, 0.3, 0.35, 0.375, 0.5,1]).reshape(-1,1)

fun = Analytical().fun_forrester
fun_control = fun_control_init(
    PREFIX="07_EI_FORRESTER",
    sigma=1.0,
    seed=123,)
y_train = fun(X_train, fun_control=fun_control)

S = Kriging(name='kriging',  seed=123, log_level=50, n_theta=1, method="interpolation", cod_type="norm")
S.fit(X_train, y_train)

X = np.linspace(start=0, stop=1, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, mean_prediction, label="Mean prediction")
if True:
    plt.fill_between(
        X.ravel(),
        mean_prediction - 2 * std_prediction,
        mean_prediction + 2 * std_prediction,
        alpha=0.5,
        label=r"95% confidence interval",
    )
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noise-free dataset")

#plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
# plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, -ei, label="Expected Improvement")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noise-free dataset")

18.15 Noise

import numpy as np
import spotpython
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot
from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
import matplotlib.pyplot as plt

gen = SpaceFilling(1)
rng = np.random.RandomState(1)
lower = np.array([-10])
upper = np.array([10])
fun = Analytical().fun_sphere
fun_control = fun_control_init(
    PREFIX="07_Y",
    sigma=2.0,
    seed=123,)
X = gen.scipy_lhd(10, lower=lower, upper = upper)
print(X)
y = fun(X, fun_control=fun_control)
print(y)
y.shape
X_train = X.reshape(-1,1)
y_train = y

S = Kriging(name='kriging',
            seed=123,
            log_level=50,
            n_theta=1,
            method="interpolation")
S.fit(X_train, y_train)

X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")

#plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
plt.scatter(X_train, y_train, label="Observations")
#plt.plot(X, ei, label="Expected Improvement")
plt.plot(X_axis, mean_prediction, label="Mean prediction")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Sphere: Gaussian process regression on noisy dataset")
[[ 0.63529627]
 [-4.10764204]
 [-0.44071975]
 [ 9.63125638]
 [-8.3518118 ]
 [-3.62418901]
 [ 4.15331   ]
 [ 3.4468512 ]
 [ 6.36049088]
 [-7.77978539]]
[-1.57464135 16.13714981  2.77008442 93.14904827 71.59322218 14.28895359
 15.9770567  12.96468767 39.82265329 59.88028242]

S.get_model_params()
{'log_theta_lambda': array([-1.10547476]),
 'U': array([[ 1.00000001e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 1.71273420e-01,  9.85223543e-01,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 9.13185648e-01,  1.94770737e-01,  3.57989311e-01,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 1.75066965e-03, -3.03963173e-04, -3.32220779e-03,
          9.99992910e-01,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 1.77266598e-03,  2.46779757e-01, -1.18173383e-01,
         -3.20690193e-04,  9.61837602e-01,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 2.40962648e-01,  9.54670161e-01,  1.27460012e-01,
          2.92823322e-04, -4.96183483e-02,  1.08783176e-01,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 3.78787902e-01, -6.10436927e-02, -3.99469260e-01,
          9.30038103e-02, -3.40797821e-02,  2.28886571e-01,
          7.94366109e-01,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 5.37923928e-01, -8.19698319e-02, -4.73894997e-01,
          4.72464311e-02, -3.81494553e-02,  2.47600403e-01,
          6.30909812e-01,  1.27677658e-01,  0.00000000e+00,
          0.00000000e+00],
        [ 7.64573844e-02, -1.31037818e-02, -1.13704605e-01,
          4.31578080e-01, -1.06049066e-02,  7.65591659e-02,
          6.91377243e-01, -4.55944025e-01,  3.20831704e-01,
          0.00000000e+00],
        [ 3.87015427e-03,  3.51787204e-01, -1.60406611e-01,
         -4.32752122e-04,  9.03358179e-01, -1.23536920e-01,
          1.89427140e-02,  3.06145331e-02,  1.92052594e-02,
          1.32355746e-01]]),
 'X': array([[ 0.63529627],
        [-4.10764204],
        [-0.44071975],
        [ 9.63125638],
        [-8.3518118 ],
        [-3.62418901],
        [ 4.15331   ],
        [ 3.4468512 ],
        [ 6.36049088],
        [-7.77978539]]),
 'y': array([-1.57464135, 16.13714981,  2.77008442, 93.14904827, 71.59322218,
        14.28895359, 15.9770567 , 12.96468767, 39.82265329, 59.88028242]),
 'negLnLike': np.float64(26.185053861403652)}
S = Kriging(name='kriging',
            seed=123,
            log_level=50,
            n_theta=1,
            method="regression")
S.fit(X_train, y_train)

X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X_axis, mean_prediction, label="Mean prediction")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Sphere: Gaussian process regression with nugget on noisy dataset")

S.get_model_params()
{'log_theta_lambda': array([-2.96944858, -4.36747214]),
 'U': array([[ 1.00002145e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 9.76133029e-01,  2.17272217e-01,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 9.98737153e-01,  4.96011067e-02,  1.03313745e-02,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 9.16817553e-01, -3.60197553e-01, -8.88468998e-02,
          1.47825687e-01,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 9.16974148e-01,  3.94762935e-01, -3.27890052e-02,
          3.66328503e-02,  3.07645906e-02,  0.00000000e+00,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 9.80701686e-01,  1.95395269e-01,  2.97962139e-03,
         -1.95305835e-03,  2.02283513e-03,  8.42704730e-03,
          0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 9.86788177e-01, -1.55737151e-01, -1.99497526e-02,
          3.88600274e-02,  2.80752262e-03, -2.58965848e-03,
          1.07358109e-02,  0.00000000e+00,  0.00000000e+00,
          0.00000000e+00],
        [ 9.91533639e-01, -1.25471967e-01, -1.37303482e-02,
          2.92651839e-02,  2.04205703e-03, -2.17491982e-03,
          6.31892482e-03,  8.18125303e-03,  0.00000000e+00,
          0.00000000e+00],
        [ 9.65423727e-01, -2.45324121e-01, -4.38018569e-02,
          7.58583497e-02,  4.13729684e-03, -2.99254932e-03,
          6.62089183e-03,  2.90075086e-03,  8.03708950e-03,
          0.00000000e+00],
        [ 9.26819918e-01,  3.72515229e-01, -2.67187334e-02,
          3.00084948e-02,  2.43303379e-02,  1.24104117e-03,
         -3.28660593e-04, -2.24039384e-04,  1.35180886e-04,
          8.48929786e-03]]),
 'X': array([[ 0.63529627],
        [-4.10764204],
        [-0.44071975],
        [ 9.63125638],
        [-8.3518118 ],
        [-3.62418901],
        [ 4.15331   ],
        [ 3.4468512 ],
        [ 6.36049088],
        [-7.77978539]]),
 'y': array([-1.57464135, 16.13714981,  2.77008442, 93.14904827, 71.59322218,
        14.28895359, 15.9770567 , 12.96468767, 39.82265329, 59.88028242]),
 'negLnLike': np.float64(21.820591738048208)}
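
The two fits above differ in that the regression variant estimates an additional parameter (the second entry of `log_theta_lambda`), the nugget. The following minimal NumPy sketch illustrates the general mechanism and is not spotpython's implementation: adding a nugget \(\lambda\) to the diagonal of the correlation matrix lets the predictor deviate from the observations instead of interpolating them. The function `kriging_mean`, the fixed `theta`, and the toy data are assumptions made for illustration.

```python
import numpy as np


def kriging_mean(X, y, X_new, theta, lam):
    """Ordinary-Kriging mean with Gaussian correlation and nugget lam (sketch)."""
    def corr(A, B):
        # Gaussian correlation between two sets of 1-d points
        d2 = (A[:, None, 0] - B[None, :, 0]) ** 2
        return np.exp(-theta * d2)

    n = X.shape[0]
    Psi = corr(X, X) + lam * np.eye(n)  # the nugget enters on the diagonal only
    one = np.ones(n)
    # Generalized least-squares estimate of the constant mean
    mu = (one @ np.linalg.solve(Psi, y)) / (one @ np.linalg.solve(Psi, one))
    w = np.linalg.solve(Psi, y - mu * one)
    return mu + corr(X_new, X) @ w


# Toy data (hypothetical): noisy sphere function on an equidistant grid
rng = np.random.default_rng(0)
X_demo = np.linspace(-10, 10, 10).reshape(-1, 1)
y_demo = X_demo.ravel() ** 2 + rng.normal(scale=2.0, size=10)

fit_interp = kriging_mean(X_demo, y_demo, X_demo, theta=0.1, lam=1e-10)
fit_regr = kriging_mean(X_demo, y_demo, X_demo, theta=0.1, lam=0.1)

# Largest deviation between the predictor and the observations
max_res_interp = np.max(np.abs(fit_interp - y_demo))
max_res_regr = np.max(np.abs(fit_regr - y_demo))
print(max_res_interp, max_res_regr)
```

With a negligible nugget the predictor reproduces the noisy observations almost exactly, whereas the larger nugget leaves visible residuals at the training points, which is the smoothing effect seen in the regression plot.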

18.16 Cubic Function

import numpy as np
import matplotlib.pyplot as plt
from spotpython.fun.objectivefunctions import Analytical
from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
from spotpython.utils.init import fun_control_init

# Latin hypercube design with 10 points in [-10, 10]
gen = SpaceFilling(1)
lower = np.array([-10])
upper = np.array([10])
# Cubic function evaluated with noise level sigma
fun = Analytical().fun_cubed
fun_control = fun_control_init(
    PREFIX="07_Y",
    sigma=10.0,
    seed=123,)

X = gen.scipy_lhd(10, lower=lower, upper=upper)
print(X)
y = fun(X, fun_control=fun_control)
print(y)
X_train = X.reshape(-1, 1)
y_train = y

S = Kriging(name='kriging',  seed=123, log_level=50, n_theta=1, method="interpolation")
S.fit(X_train, y_train)

X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X_axis, mean_prediction, label="Mean prediction")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Cubed: Gaussian process regression on noisy dataset")
[[ 0.63529627]
 [-4.10764204]
 [-0.44071975]
 [ 9.63125638]
 [-8.3518118 ]
 [-3.62418901]
 [ 4.15331   ]
 [ 3.4468512 ]
 [ 6.36049088]
 [-7.77978539]]
[  -9.63480707  -72.98497325   12.7936499   895.34567477 -573.35961837
  -41.83176425   65.27989461   46.37081417  254.1530734  -474.09587355]

S = Kriging(name='kriging',  seed=123, log_level=0, n_theta=1, method="regression")
S.fit(X_train, y_train)

X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X_axis, mean_prediction, label="Mean prediction")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Cubed: Gaussian process regression with nugget on noisy dataset")

import numpy as np
import matplotlib.pyplot as plt
from spotpython.fun.objectivefunctions import Analytical
from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
from spotpython.utils.init import fun_control_init

# Latin hypercube design with 10 points in [-10, 10]
gen = SpaceFilling(1)
lower = np.array([-10])
upper = np.array([10])
# Runge function evaluated with noise level sigma
fun = Analytical().fun_runge
fun_control = fun_control_init(
    PREFIX="07_Y",
    sigma=0.25,
    seed=123,)

X = gen.scipy_lhd(10, lower=lower, upper=upper)
print(X)
y = fun(X, fun_control=fun_control)
print(y)
X_train = X.reshape(-1, 1)
y_train = y

S = Kriging(name='kriging',  seed=123, log_level=50, n_theta=1, method="interpolation")
S.fit(X_train, y_train)

X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X_axis, mean_prediction, label="Mean prediction")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Runge: Gaussian process regression on noisy dataset")
[[ 0.63529627]
 [-4.10764204]
 [-0.44071975]
 [ 9.63125638]
 [-8.3518118 ]
 [-3.62418901]
 [ 4.15331   ]
 [ 3.4468512 ]
 [ 6.36049088]
 [-7.77978539]]
[ 0.46517267 -0.03599548  1.15933822  0.05915901  0.24419145  0.21502359
 -0.10432134  0.21312309 -0.05502681 -0.06434374]

S = Kriging(name='kriging',
            seed=123,
            log_level=50,
            n_theta=1,
            method="regression")
S.fit(X_train, y_train)

X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X_axis, mean_prediction, label="Mean prediction")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Runge: Gaussian process regression with nugget on noisy dataset")

18.17 Modifying Lambda Search Space

S = Kriging(name='kriging',
            seed=123,
            log_level=50,
            n_theta=1,
            method="regression",
            min_Lambda=0.1,
            max_Lambda=10)
S.fit(X_train, y_train)

print(f"Lambda: {S.Lambda}")
Lambda: [0.1]
X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X_axis, mean_prediction, label="Mean prediction")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Runge: Gaussian process regression with nugget on noisy dataset. Modified Lambda search space.")
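
The printed `Lambda` above coincides with `min_Lambda`, which suggests that the estimate ended at the lower bound of the modified search space. The following sketch shows how box bounds constrain a likelihood-based nugget estimate in general. It assumes a fixed `theta`, toy data, and the textbook concentrated negative log-likelihood \(\frac{n}{2}\ln\hat{\sigma}^2 + \frac{1}{2}\ln|\Psi|\); the names `neg_log_like`, `lam_lo`, and `lam_hi` are hypothetical. spotpython's optimizer and its parameter scaling may differ.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def neg_log_like(lam, X, y, theta=0.1):
    """Concentrated negative log-likelihood of a Kriging model with nugget lam."""
    n = X.shape[0]
    d2 = (X[:, None, 0] - X[None, :, 0]) ** 2
    Psi = np.exp(-theta * d2) + lam * np.eye(n)
    L = np.linalg.cholesky(Psi)  # Psi = L L^T

    def solve(b):
        # Solve Psi a = b via the Cholesky factor
        return np.linalg.solve(L.T, np.linalg.solve(L, b))

    one = np.ones(n)
    mu = (one @ solve(y)) / (one @ solve(one))
    r = y - mu * one
    sigma2 = (r @ solve(r)) / n
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (n * np.log(sigma2) + log_det)


# Toy data (hypothetical): noisy sphere function on an equidistant grid
rng = np.random.default_rng(0)
X_demo = np.linspace(-10, 10, 10).reshape(-1, 1)
y_demo = X_demo.ravel() ** 2 + rng.normal(scale=2.0, size=10)

# Restrict the nugget search to [lam_lo, lam_hi]
lam_lo, lam_hi = 0.1, 10.0
res = minimize_scalar(
    neg_log_like, bounds=(lam_lo, lam_hi), args=(X_demo, y_demo), method="bounded"
)
print(res.x)
```

Whatever the data, the bounded search can only return a value inside the interval; if the unconstrained optimum lies outside, the estimate is pushed toward the nearest bound.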