import numpy as np
from math import inf
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot
from spotpython.utils.init import fun_control_init, surrogate_control_init, design_control_init
import matplotlib.pyplot as plt
18 Infill Criteria
This chapter describes, analyzes, and compares different infill criteria. An infill criterion defines how the next point \(x_{n+1}\) is selected from the surrogate model \(S\). Expected improvement is a popular infill criterion in Bayesian optimization.
18.1 Expected Improvement
Expected Improvement (EI) is one of the most influential and widely used infill criteria in surrogate-based optimization, particularly in Bayesian optimization. It selects the next evaluation point \(x_{n+1}\) by balancing the fundamental trade-off between exploitation (sampling where the surrogate predicts good values) and exploration (sampling where the surrogate is uncertain).
The concept of Expected Improvement was formalized by Jones, Schonlau, and Welch (1998) and builds upon the theoretical foundation established by Močkus (1974). It provides an elegant mathematical framework that naturally combines both exploitation and exploration in a single criterion, making it particularly well-suited for expensive black-box optimization problems.
18.1.1 The Philosophy Behind Expected Improvement
The core idea of Expected Improvement is deceptively simple yet mathematically sophisticated. Rather than simply choosing the point where the surrogate model predicts the best value (pure exploitation) or the point with the highest uncertainty (pure exploration), EI asks a more nuanced question:
“What is the expected value of improvement over the current best observation if we evaluate the objective function at point \(x\)?”
This approach naturally balances exploitation and exploration because:
- Points near the current best solution have a reasonable chance of improvement (exploitation)
- Points in unexplored regions with high uncertainty may yield surprising improvements (exploration)
- The mathematical expectation provides a principled way to combine these considerations
18.1.2 Mathematical Definition
18.1.2.1 Setup and Notation
Consider a Gaussian Process (Kriging) surrogate model fitted to \(n\) observations \(\{(x^{(i)}, y^{(i)})\}_{i=1}^n\), where \(y^{(i)} = f(x^{(i)})\) are the expensive function evaluations. Let \(f_{best} = \min_{i=1,\ldots,n} y^{(i)}\) be the best (minimum) observed value so far.
At any unobserved point \(x\), the Gaussian Process provides:
- A predictive mean: \(\hat{f}(x) = \mu(x)\)
- A predictive standard deviation: \(s(x) = \sigma(x)\)
The GP assumes that the true function value \(f(x)\) follows a normal distribution: \[ f(x) \sim \mathcal{N}(\mu(x), \sigma^2(x)) \]
18.1.2.2 The Improvement Function
The improvement at point \(x\) is defined as: \[ I(x) = \max(f_{best} - f(x), 0) \]
This represents how much better the function value at \(x\) is compared to the current best. Note that \(I(x) = 0\) if \(f(x) \geq f_{best}\) (no improvement).
Definition 18.1 (Expected Improvement Formula) The Expected Improvement is the expectation of the improvement function: \[ EI(x) = \mathbb{E}[I(x)] = \mathbb{E}[\max(f_{best} - f(x), 0)] \]
Since \(f(x)\) is normally distributed under the GP model, this expectation has a closed-form solution:
\[ EI(x) = \begin{cases} (f_{best} - \mu(x)) \Phi\left(\frac{f_{best} - \mu(x)}{\sigma(x)}\right) + \sigma(x) \phi\left(\frac{f_{best} - \mu(x)}{\sigma(x)}\right) & \text{if } \sigma(x) > 0 \\ 0 & \text{if } \sigma(x) = 0 \end{cases} \]
where:
- \(\Phi(\cdot)\) is the cumulative distribution function (CDF) of the standard normal distribution
- \(\phi(\cdot)\) is the probability density function (PDF) of the standard normal distribution
- \(Z = \frac{f_{best} - \mu(x)}{\sigma(x)}\) is the standardized improvement
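The closed-form expression can be checked numerically. The following is a minimal sketch, not the spotpython implementation: the function names `expected_improvement` and `expected_improvement_mc` and the example values for \(\mu\), \(\sigma\), and \(f_{best}\) are illustrative assumptions. It evaluates the formula with the standard library and compares it with a Monte Carlo estimate of \(\mathbb{E}[\max(f_{best} - f(x), 0)]\).

```python
import math
import random


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimization, assuming f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 0.0  # no predictive uncertainty: EI is defined as 0 in the text
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)
    return (f_best - mu) * cdf + sigma * pdf


def expected_improvement_mc(mu, sigma, f_best, n=200_000, seed=0):
    """Monte Carlo estimate of E[max(f_best - f(x), 0)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += max(f_best - rng.gauss(mu, sigma), 0.0)
    return total / n


# Hypothetical prediction: mean slightly worse than f_best, moderate uncertainty.
ei_exact = expected_improvement(mu=0.2, sigma=0.5, f_best=0.0)
ei_mc = expected_improvement_mc(mu=0.2, sigma=0.5, f_best=0.0)
print(ei_exact, ei_mc)
```

The two numbers should agree up to Monte Carlo error, even though the predicted mean is worse than \(f_{best}\): the uncertainty alone makes an improvement possible.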
18.1.2.3 Alternative Formulation
The Expected Improvement can also be written as: \[ EI(x) = \sigma(x) \left[ Z \Phi(Z) + \phi(Z) \right] \]
where \(Z = \frac{f_{best} - \mu(x)}{\sigma(x)}\) is the standardized improvement.
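A short derivation sketch shows where this form comes from, using only the normality assumption \(f(x) \sim \mathcal{N}(\mu(x), \sigma^2(x))\) with \(\sigma(x) > 0\). Write \(f(x) = \mu(x) + \sigma(x)\,u\) with \(u \sim \mathcal{N}(0, 1)\). Then \(f_{best} - f(x) = \sigma(x)\,(Z - u)\), which is positive exactly when \(u < Z\), so
\[ EI(x) = \int_{-\infty}^{Z} \sigma(x)\,(Z - u)\,\phi(u)\,du = \sigma(x)\left[ Z\,\Phi(Z) - \int_{-\infty}^{Z} u\,\phi(u)\,du \right] = \sigma(x)\left[ Z\,\Phi(Z) + \phi(Z) \right]. \]
The last step uses \(\phi'(u) = -u\,\phi(u)\), hence \(\int_{-\infty}^{Z} u\,\phi(u)\,du = -\phi(Z)\). Substituting \(\sigma(x)\,Z = f_{best} - \mu(x)\) recovers the case \(\sigma(x) > 0\) of the closed-form expression in Definition 18.1.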
18.1.3 Understanding the Components
The EI formula elegantly combines two terms:
- Exploitation Term: \((f_{best} - \mu(x)) \Phi(Z)\)
  - Larger when \(\mu(x)\) is small (good predicted value)
  - Weighted by the probability that \(f(x) < f_{best}\)
- Exploration Term: \(\sigma(x) \phi(Z)\)
  - Larger when \(\sigma(x)\) is large (high uncertainty)
  - Represents the potential for discovering unexpectedly good values
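To make the two terms concrete, the sketch below evaluates them separately for two hypothetical candidates. The helper `ei_terms` and all numeric values are assumptions chosen for illustration; they do not come from a fitted model.

```python
import math


def ei_terms(mu, sigma, f_best):
    """Return (exploitation term, exploration term) of EI for sigma > 0."""
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # Phi(Z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(Z)
    return (f_best - mu) * cdf, sigma * pdf


f_best = 0.0

# Candidate A: predicted slightly better than f_best, little uncertainty.
exploit_a, explore_a = ei_terms(mu=-0.1, sigma=0.05, f_best=f_best)

# Candidate B: predicted worse than f_best, but very uncertain.
exploit_b, explore_b = ei_terms(mu=0.3, sigma=1.0, f_best=f_best)

print("A:", exploit_a, explore_a, exploit_a + explore_a)
print("B:", exploit_b, explore_b, exploit_b + explore_b)
```

For candidate A the exploitation term dominates. For candidate B the exploitation term is negative, because \(\mu(x) > f_{best}\), and the total EI comes entirely from the exploration term. In this constructed case the uncertain candidate B ends up with the larger EI.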
18.2 EI: Implementation in spotpython
The spotpython package implements Expected Improvement in its Kriging class. Here’s how it works in practice:
18.2.1 Key Implementation Details
Negative Expected Improvement: In optimization contexts, spotpython often returns the negative Expected Improvement because many optimization algorithms are designed to minimize rather than maximize objectives.
Logarithmic Transformation: To handle numerical issues and improve optimization stability, spotpython often works with \(\log_{10}(EI)\):
ExpImp = np.log10(EITermOne + EITermTwo + self.eps)
return float(-ExpImp)  # Negative for minimization
Numerical Stability: A small epsilon value (self.eps) is added so that the logarithm stays finite when EI becomes very small or zero.
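The effect of the shifted logarithm can be seen in a small sketch. The function name `neg_log10_ei` and the value `eps=1e-12` are assumptions for illustration; the actual value of `self.eps` in spotpython is not stated here.

```python
import math


def neg_log10_ei(ei, eps=1e-12):
    """Negative log10 of EI, shifted by eps so that EI = 0 stays finite."""
    return -math.log10(ei + eps)


# A large EI, a tiny EI, and an EI of exactly zero.
values = [1e-1, 1e-6, 0.0]
scores = [neg_log10_ei(v) for v in values]
print(scores)
```

Because the sign is flipped, a minimizer applied to these scores prefers the candidate with the largest EI. Without `eps`, the last entry would raise a math domain error.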
18.2.2 Code Example from the Kriging Class
def _pred(self, x: np.ndarray) -> Tuple[float, float, float]:
    """Computes Kriging prediction including Expected Improvement."""
    # ... [prediction calculations] ...
    # Compute Expected Improvement
    if self.return_ei:
        yBest = np.min(y)  # Current best observation
        # First term: (f_best - mu) * Phi(Z)
        EITermOne = (yBest - f) * (0.5 + 0.5 * erf((1 / np.sqrt(2)) * ((yBest - f) / s)))
        # Second term: sigma * phi(Z)
        EITermTwo = s * (1 / np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((yBest - f) ** 2 / SSqr))
        # Expected Improvement (in log scale)
        ExpImp = np.log10(EITermOne + EITermTwo + self.eps)
        return float(f), float(s), float(-ExpImp)  # Return negative EI
18.3 Practical Advantages of Expected Improvement
- Automatic Balance: EI naturally balances exploitation and exploration without requiring manual tuning of weights or parameters.
- Scale Invariance: EI is relatively invariant to the scale of the objective function.
- Theoretical Foundation: EI has strong theoretical justification from decision theory and information theory.
- Efficient Optimization: The smooth, differentiable nature of EI makes it suitable for gradient-based optimization of the acquisition function.
- Proven Performance: EI has been successfully applied across numerous domains and consistently performs well in practice.
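One way to read the scale-invariance remark in the list above is sketched below. It assumes that rescaling the objective by a constant \(c > 0\) rescales the surrogate's predictive mean, its standard deviation, and \(f_{best}\) by the same factor; this is an assumption about the surrogate, not a statement about spotpython. Under that assumption the EI values scale by \(c\), while the ranking of candidates stays the same. The candidates and the function name are illustrative.

```python
import math


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimization, assuming f(x) ~ N(mu, sigma^2), sigma > 0."""
    z = (f_best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (f_best - mu) * cdf + sigma * pdf


c = 1000.0  # hypothetical rescaling factor of the objective
f_best = 0.0
candidates = [(-0.1, 0.05), (0.3, 1.0), (0.0, 0.2)]  # hypothetical (mu, sigma) pairs

ei = [expected_improvement(m, s, f_best) for m, s in candidates]
ei_scaled = [expected_improvement(c * m, c * s, c * f_best) for m, s in candidates]

# Candidate indices ordered by increasing EI, before and after rescaling.
ranking = sorted(range(len(candidates)), key=lambda i: ei[i])
ranking_scaled = sorted(range(len(candidates)), key=lambda i: ei_scaled[i])
print(ranking, ranking_scaled)
```

The standardized improvement \(Z\) is unchanged by the rescaling, so the point that maximizes EI does not move even though the EI magnitude does.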
18.4 Connection to the Hyperparameter Tuning Cookbook
In the context of hyperparameter tuning, Expected Improvement plays a crucial role in:
- Sequential Model-Based Optimization: EI guides the selection of which hyperparameter configurations to evaluate next
- Efficient Resource Utilization: By balancing exploration and exploitation, EI helps find good hyperparameters with fewer expensive model training runs
- Automated Optimization: EI provides a principled, automatic way to navigate the hyperparameter space without manual intervention
The implementation in spotpython makes Expected Improvement accessible for practical hyperparameter optimization tasks, providing both the theoretical rigor of Bayesian optimization and the computational efficiency needed for real-world applications.
18.5 Example: Spot and the 1-dim Sphere Function
18.5.1 The Objective Function: 1-dim Sphere
- The spotpython package provides several classes of objective functions.
- We will use an analytical objective function, i.e., a function that can be described by a (closed) formula: \[f(x) = x^2 \]
fun = Analytical().fun_sphere
- The size of the lower bound vector determines the problem dimension.
- Here we will use np.array([-1]), i.e., a one-dim function.
Similar to the one-dimensional case, which was introduced in Section 13.8, we can use TensorBoard to monitor the progress of the optimization. We will use the same code, only the prefix is different:
from spotpython.utils.init import fun_control_init
PREFIX = "07_Y"
fun_control = fun_control_init(
    PREFIX=PREFIX,
    fun_evals=25,
    lower=np.array([-1]),
    upper=np.array([1]),
    tolerance_x=np.sqrt(np.spacing(1)),)
design_control = design_control_init(init_size=10)
spot_1 = Spot(
    fun=fun,
    fun_control=fun_control,
    design_control=design_control)
spot_1.run()
spotpython tuning: 4.74409224815101e-10 [####------] 44.00%
spotpython tuning: 4.74409224815101e-10 [#####-----] 48.00%
spotpython tuning: 4.74409224815101e-10 [#####-----] 52.00%
spotpython tuning: 4.74409224815101e-10 [######----] 56.00%
spotpython tuning: 1.6645032376738785e-10 [######----] 60.00%
spotpython tuning: 1.6645032376738785e-10 [######----] 64.00%
spotpython tuning: 1.6645032376738785e-10 [#######---] 68.00%
spotpython tuning: 1.6645032376738785e-10 [#######---] 72.00%
spotpython tuning: 1.6645032376738785e-10 [########--] 76.00%
spotpython tuning: 1.6645032376738785e-10 [########--] 80.00%
spotpython tuning: 1.6645032376738785e-10 [########--] 84.00%
spotpython tuning: 1.6645032376738785e-10 [#########-] 88.00%
spotpython tuning: 1.6645032376738785e-10 [#########-] 92.00%
spotpython tuning: 1.6645032376738785e-10 [##########] 96.00%
spotpython tuning: 1.6645032376738785e-10 [##########] 100.00% Done...
Experiment saved to 07_Y_res.pkl
<spotpython.spot.spot.Spot at 0x10bda6270>
18.5.2 Results
spot_1.print_results()
min y: 1.6645032376738785e-10
x0: 1.2901562842050875e-05
[['x0', np.float64(1.2901562842050875e-05)]]
spot_1.plot_progress(log_y=True)
18.6 Same, but with EI as infill_criterion
PREFIX = "07_EI_ISO"
fun_control = fun_control_init(
    PREFIX=PREFIX,
    lower=np.array([-1]),
    upper=np.array([1]),
    fun_evals=25,
    tolerance_x=np.sqrt(np.spacing(1)),
    infill_criterion="ei")
spot_1_ei = Spot(fun=fun,
                 fun_control=fun_control)
spot_1_ei.run()
spotpython tuning: 1.6739119739724672e-09 [####------] 44.00%
spotpython tuning: 1.6739119739724672e-09 [#####-----] 48.00%
spotpython tuning: 1.6739119739724672e-09 [#####-----] 52.00%
spotpython tuning: 1.6739119739724672e-09 [######----] 56.00%
spotpython tuning: 5.969349640837553e-12 [######----] 60.00%
spotpython tuning: 5.969349640837553e-12 [######----] 64.00%
spotpython tuning: 5.969349640837553e-12 [#######---] 68.00%
spotpython tuning: 5.969349640837553e-12 [#######---] 72.00%
spotpython tuning: 5.969349640837553e-12 [########--] 76.00%
spotpython tuning: 5.969349640837553e-12 [########--] 80.00%
spotpython tuning: 5.969349640837553e-12 [########--] 84.00%
spotpython tuning: 5.969349640837553e-12 [#########-] 88.00%
spotpython tuning: 5.969349640837553e-12 [#########-] 92.00%
spotpython tuning: 5.969349640837553e-12 [##########] 96.00%
spotpython tuning: 5.969349640837553e-12 [##########] 100.00% Done...
Experiment saved to 07_EI_ISO_res.pkl
<spotpython.spot.spot.Spot at 0x125fccad0>
spot_1_ei.plot_progress(log_y=True)
spot_1_ei.print_results()
min y: 5.969349640837553e-12
x0: 2.443225253806442e-06
[['x0', np.float64(2.443225253806442e-06)]]
18.7 Non-isotropic Kriging
PREFIX = "07_EI_NONISO"
fun_control = fun_control_init(
    PREFIX=PREFIX,
    lower=np.array([-1, -1]),
    upper=np.array([1, 1]),
    fun_evals=25,
    tolerance_x=np.sqrt(np.spacing(1)),
    infill_criterion="ei")
surrogate_control = surrogate_control_init(
    n_theta=2,
    method="interpolation",
    )
spot_2_ei_noniso = Spot(fun=fun,
                        fun_control=fun_control,
                        surrogate_control=surrogate_control)
spot_2_ei_noniso.run()
spotpython tuning: 1.8879649092418398e-05 [####------] 44.00%
spotpython tuning: 1.8879649092418398e-05 [#####-----] 48.00%
spotpython tuning: 1.8879649092418398e-05 [#####-----] 52.00%
spotpython tuning: 1.8879649092418398e-05 [######----] 56.00%
spotpython tuning: 1.8879649092418398e-05 [######----] 60.00%
spotpython tuning: 1.8879649092418398e-05 [######----] 64.00%
spotpython tuning: 1.8879649092418398e-05 [#######---] 68.00%
spotpython tuning: 1.8879649092418398e-05 [#######---] 72.00%
spotpython tuning: 1.8879649092418398e-05 [########--] 76.00%
spotpython tuning: 1.8879649092418398e-05 [########--] 80.00%
spotpython tuning: 1.8879649092418398e-05 [########--] 84.00%
spotpython tuning: 1.8879649092418398e-05 [#########-] 88.00%
spotpython tuning: 1.8879649092418398e-05 [#########-] 92.00%
spotpython tuning: 1.8879649092418398e-05 [##########] 96.00%
spotpython tuning: 1.8879649092418398e-05 [##########] 100.00% Done...
Experiment saved to 07_EI_NONISO_res.pkl
<spotpython.spot.spot.Spot at 0x1260977a0>
spot_2_ei_noniso.plot_progress(log_y=True)
spot_2_ei_noniso.print_results()
min y: 1.8879649092418398e-05
x0: 0.0016422868343098733
x1: 0.004022753167455201
[['x0', np.float64(0.0016422868343098733)],
['x1', np.float64(0.004022753167455201)]]
spot_2_ei_noniso.surrogate.plot()
18.8 Using sklearn Surrogates
18.8.1 The spot Loop
The spot loop consists of the following steps:
1. Init: Build initial design \(X\)
2. Evaluate initial design on real objective \(f\): \(y = f(X)\)
3. Build surrogate: \(S = S(X,y)\)
4. Optimize on surrogate: \(X_0 = \text{optimize}(S)\)
5. Evaluate on real objective: \(y_0 = f(X_0)\)
6. Impute (Infill) new points: \(X = X \cup X_0\), \(y = y \cup y_0\).
7. Go to 3.
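The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the spotpython implementation: the surrogate is a least-squares quadratic (instead of Kriging), the surrogate is optimized on a fixed grid, the initial design is uniform random, and the names `fit_quadratic` and `surrogate_loop` are hypothetical.

```python
import random


def f(x):
    """Stand-in for the expensive objective (here: the 1-dim sphere function)."""
    return x * x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    A = [[s[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, 3):
            m = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= m * A[col][c]
            rhs[r] -= m * rhs[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (rhs[i] - tail) / A[i][i]
    return coef


def surrogate_loop(lower=-1.0, upper=1.0, init_size=5, fun_evals=10, seed=1):
    rng = random.Random(seed)
    X = [rng.uniform(lower, upper) for _ in range(init_size)]  # 1. initial design
    y = [f(x) for x in X]                                      # 2. evaluate on f
    grid = [lower + (upper - lower) * i / 1000 for i in range(1001)]
    while len(X) < fun_evals:                                  # 7. go to 3
        a, b, c = fit_quadratic(X, y)                          # 3. build surrogate
        x0 = min(grid, key=lambda x: a + b * x + c * x * x)    # 4. optimize on surrogate
        y0 = f(x0)                                             # 5. evaluate on f
        X.append(x0)                                           # 6. infill new point
        y.append(y0)
    return X, y


X, y = surrogate_loop()
print(min(y))
```

The loop spends the evaluation budget `fun_evals` on the real objective only in steps 2 and 5; all other work happens on the cheap surrogate.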
The spot loop is implemented in R as follows:
18.8.2 spot: The Initial Model
18.8.2.1 Example: Modifying the initial design size
This is the “Example: Modifying the initial design size” from Chapter 4.5.1 in [bart21i].
spot_ei = Spot(fun=fun,
               fun_control=fun_control_init(
                   lower=np.array([-1,-1]),
                   upper=np.array([1,1])),
               design_control=design_control_init(init_size=5))
spot_ei.run()
spotpython tuning: 0.13773784008577408 [####------] 40.00%
spotpython tuning: 0.137092032817552 [#####-----] 46.67%
spotpython tuning: 0.13507127750732323 [#####-----] 53.33%
spotpython tuning: 0.12519833727871527 [######----] 60.00%
spotpython tuning: 0.09323163938334049 [#######---] 66.67%
spotpython tuning: 0.057966805090302165 [#######---] 73.33%
spotpython tuning: 0.010203880941217082 [########--] 80.00%
spotpython tuning: 0.0030660266707283317 [#########-] 86.67%
spotpython tuning: 0.0030473908765633047 [#########-] 93.33%
spotpython tuning: 0.0030473908765633047 [##########] 100.00% Done...
Experiment saved to 000_res.pkl
<spotpython.spot.spot.Spot at 0x126267d10>
spot_ei.plot_progress()
np.min(spot_1.y), np.min(spot_ei.y)
(np.float64(1.6645032376738785e-10), np.float64(0.0030473908765633047))
18.8.3 Init: Build Initial Design
from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
from spotpython.fun.objectivefunctions import Analytical
gen = SpaceFilling(2)
rng = np.random.RandomState(1)
lower = np.array([-5,-0])
upper = np.array([10,15])
fun = Analytical().fun_branin

X = gen.scipy_lhd(10, lower=lower, upper = upper)
print(X)
y = fun(X, fun_control=fun_control)
print(y)
[[ 8.97647221 13.41926847]
[ 0.66946019 1.22344228]
[ 5.23614115 13.78185824]
[ 5.6149825 11.5851384 ]
[-1.72963184 1.66516096]
[-4.26945568 7.1325531 ]
[ 1.26363761 10.17935555]
[ 2.88779942 8.05508969]
[-3.39111089 4.15213772]
[ 7.30131231 5.22275244]]
[128.95676449 31.73474356 172.89678121 126.71295908 64.34349975
70.16178611 48.71407916 31.77322887 76.91788181 30.69410529]
S = Kriging(name='kriging', seed=123)
S.fit(X, y)
S.plot()
gen = SpaceFilling(2, seed=123)
X0 = gen.scipy_lhd(3)
gen = SpaceFilling(2, seed=345)
X1 = gen.scipy_lhd(3)
X2 = gen.scipy_lhd(3)
gen = SpaceFilling(2, seed=123)
X3 = gen.scipy_lhd(3)
X0, X1, X2, X3
(array([[0.77254938, 0.31539299],
[0.59321338, 0.93854273],
[0.27469803, 0.3959685 ]]),
array([[0.78373509, 0.86811887],
[0.06692621, 0.6058029 ],
[0.41374778, 0.00525456]]),
array([[0.121357 , 0.69043832],
[0.41906219, 0.32838498],
[0.86742658, 0.52910374]]),
array([[0.77254938, 0.31539299],
[0.59321338, 0.93854273],
[0.27469803, 0.3959685 ]]))
18.8.4 Evaluate
18.8.5 Build Surrogate
18.8.6 A Simple Predictor
The code below shows how to use a simple model for prediction.
Assume that only two (very costly) measurements are available:
- f(0) = 0.5
- f(2) = 2.5
We are interested in the value at \(x_0 = 1\), i.e., \(f(x_0 = 1)\), but cannot run an additional, third experiment.
from sklearn import linear_model
X = np.array([[0], [2]])
y = np.array([0.5, 2.5])
S_lm = linear_model.LinearRegression()
S_lm = S_lm.fit(X, y)
X0 = np.array([[1]])
y0 = S_lm.predict(X0)
print(y0)
[1.5]
- Central Idea: Evaluation of the surrogate model S_lm is much cheaper (and/or much faster) than running the real-world experiment \(f\).
18.9 Gaussian Processes regression: basic introductory example
This example was taken from scikit-learn. After fitting our model, we see that the hyperparameters of the kernel have been optimized. Now, we will use our kernel to compute the mean prediction of the full dataset and plot the 95% confidence interval.
import numpy as np
import matplotlib.pyplot as plt
import math as m
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
y = np.squeeze(X * np.sin(X))
rng = np.random.RandomState(1)
training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
X_train, y_train = X[training_indices], y[training_indices]

kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
gaussian_process = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
gaussian_process.fit(X_train, y_train)
gaussian_process.kernel_

mean_prediction, std_prediction = gaussian_process.predict(X, return_std=True)

plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, mean_prediction, label="Mean prediction")
plt.fill_between(
    X.ravel(),
    mean_prediction - 1.96 * std_prediction,
    mean_prediction + 1.96 * std_prediction,
    alpha=0.5,
    label=r"95% confidence interval",
)
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("sk-learn Version: Gaussian process regression on noise-free dataset")
from spotpython.surrogate.kriging import Kriging
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.RandomState(1)
X = np.linspace(start=0, stop=10, num=1_000).reshape(-1, 1)
y = np.squeeze(X * np.sin(X))
training_indices = rng.choice(np.arange(y.size), size=6, replace=False)
X_train, y_train = X[training_indices], y[training_indices]

S = Kriging(name='kriging', seed=123, log_level=50, cod_type="norm")
S.fit(X_train, y_train)

mean_prediction, std_prediction, ei = S.predict(X, return_val="all")

std_prediction

plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, mean_prediction, label="Mean prediction")
plt.fill_between(
    X.ravel(),
    mean_prediction - 1.96 * std_prediction,
    mean_prediction + 1.96 * std_prediction,
    alpha=0.5,
    label=r"95% confidence interval",
)
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("spotpython Version: Gaussian process regression on noise-free dataset")
18.10 The Surrogate: Using scikit-learn models
Default is the internal kriging surrogate.
S_0 = Kriging(name='kriging', seed=123)
Models from scikit-learn can be selected, e.g., Gaussian Process:
# Needed for the sklearn surrogates:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn import linear_model
from sklearn import tree
import pandas as pd
kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
S_GP = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
- and many more:
S_Tree = DecisionTreeRegressor(random_state=0)
S_LM = linear_model.LinearRegression()
S_Ridge = linear_model.Ridge()
S_RF = RandomForestRegressor(max_depth=2, random_state=0)
- The scikit-learn GP model S_GP is selected.
S = S_GP
isinstance(S, GaussianProcessRegressor)
True
from spotpython.fun.objectivefunctions import Analytical
fun = Analytical().fun_branin
fun_control = fun_control_init(
    lower=np.array([-5,-0]),
    upper=np.array([10,15]),
    fun_evals=15)
design_control = design_control_init(init_size=5)
spot_GP = Spot(fun=fun,
               fun_control=fun_control,
               surrogate=S,
               design_control=design_control)
spot_GP.run()
spotpython tuning: 24.51465459019188 [####------] 40.00%
spotpython tuning: 11.003092545432404 [#####-----] 46.67%
spotpython tuning: 11.003092545432404 [#####-----] 53.33%
spotpython tuning: 7.281405479109784 [######----] 60.00%
spotpython tuning: 7.281405479109784 [#######---] 66.67%
spotpython tuning: 7.281405479109784 [#######---] 73.33%
spotpython tuning: 2.9520033012954237 [########--] 80.00%
spotpython tuning: 2.9520033012954237 [#########-] 86.67%
spotpython tuning: 2.1049818033904044 [#########-] 93.33%
spotpython tuning: 1.9431597967021723 [##########] 100.00% Done...
Experiment saved to 000_res.pkl
<spotpython.spot.spot.Spot at 0x1263134a0>
spot_GP.y
array([ 69.32459936, 152.38491454, 107.92560483, 24.51465459,
76.73500031, 86.30426863, 11.00309255, 16.11758333,
7.28140548, 21.82343562, 10.96088904, 2.9520033 ,
3.02912616, 2.1049818 , 1.9431598 ])
spot_GP.plot_progress()
spot_GP.print_results()
min y: 1.9431597967021723
x0: 10.0
x1: 2.99858238342458
[['x0', np.float64(10.0)], ['x1', np.float64(2.99858238342458)]]
18.11 Additional Examples
# Needed for the sklearn surrogates:
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn import linear_model
from sklearn import tree
import pandas as pd
kernel = 1 * RBF(length_scale=1.0, length_scale_bounds=(1e-2, 1e2))
S_GP = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
from spotpython.surrogate.kriging import Kriging
import numpy as np
import spotpython
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot

S_K = Kriging(name='kriging',
              seed=123,
              log_level=50,
              infill_criterion="y",
              n_theta=1,
              method="interpolation",
              cod_type="norm")
fun = Analytical().fun_sphere

fun_control = fun_control_init(
    lower=np.array([-1,-1]),
    upper=np.array([1,1]),
    fun_evals=25)

spot_S_K = Spot(fun=fun,
                fun_control=fun_control,
                surrogate=S_K,
                design_control=design_control,
                surrogate_control=surrogate_control)
spot_S_K.run()
spotpython tuning: 0.13771720249971786 [##--------] 24.00%
spotpython tuning: 0.008765811130597791 [###-------] 28.00%
spotpython tuning: 0.002838288758657914 [###-------] 32.00%
spotpython tuning: 0.0008164210951892503 [####------] 36.00%
spotpython tuning: 0.0003661048177839494 [####------] 40.00%
spotpython tuning: 0.0003589648342263893 [####------] 44.00%
spotpython tuning: 0.0003589648342263893 [#####-----] 48.00%
spotpython tuning: 0.00032902762400155227 [#####-----] 52.00%
spotpython tuning: 0.0002817371331525184 [######----] 56.00%
spotpython tuning: 0.0001682443401655298 [######----] 60.00%
spotpython tuning: 2.039354315945154e-05 [######----] 64.00%
spotpython tuning: 1.5898357927868756e-06 [#######---] 68.00%
spotpython tuning: 7.231797257673966e-07 [#######---] 72.00%
spotpython tuning: 4.7009088690905644e-07 [########--] 76.00%
spotpython tuning: 3.8991843792581266e-07 [########--] 80.00%
spotpython tuning: 3.7436106441025836e-07 [########--] 84.00%
spotpython tuning: 3.7287987551444754e-07 [#########-] 88.00%
spotpython tuning: 3.7287987551444754e-07 [#########-] 92.00%
spotpython tuning: 3.7287987551444754e-07 [##########] 96.00%
spotpython tuning: 3.7287987551444754e-07 [##########] 100.00% Done...
Experiment saved to 000_res.pkl
<spotpython.spot.spot.Spot at 0x1259a2870>
spot_S_K.plot_progress(log_y=True)
spot_S_K.surrogate.plot()
spot_S_K.print_results()
min y: 3.7287987551444754e-07
x0: -0.0006065092770223268
x1: 7.089691389829288e-05
[['x0', np.float64(-0.0006065092770223268)],
['x1', np.float64(7.089691389829288e-05)]]
18.11.1 Optimize on Surrogate
18.11.2 Evaluate on Real Objective
18.11.3 Impute / Infill new Points
18.12 Tests
import numpy as np
from spotpython.spot import Spot
from spotpython.fun.objectivefunctions import Analytical

fun_sphere = Analytical().fun_sphere

fun_control = fun_control_init(
    lower=np.array([-1, -1]),
    upper=np.array([1, 1]),
    n_points=2)
spot_1 = Spot(
    fun=fun_sphere,
    fun_control=fun_control,
    )

# (S-2) Initial Design:
spot_1.X = spot_1.design.scipy_lhd(
    spot_1.design_control["init_size"], lower=spot_1.lower, upper=spot_1.upper
)
print(spot_1.X)

# (S-3): Eval initial design:
spot_1.y = spot_1.fun(spot_1.X)
print(spot_1.y)

spot_1.fit_surrogate()
X0 = spot_1.suggest_new_X()
print(X0)
assert X0.size == spot_1.n_points * spot_1.k
[[ 0.86352963 0.7892358 ]
[-0.24407197 -0.83687436]
[ 0.36481882 0.8375811 ]
[ 0.415331 0.54468512]
[-0.56395091 -0.77797854]
[-0.90259409 -0.04899292]
[-0.16484832 0.35724741]
[ 0.05170659 0.07401196]
[-0.78548145 -0.44638164]
[ 0.64017497 -0.30363301]]
[1.36857656 0.75992983 0.83463487 0.46918172 0.92329124 0.8170764
0.15480068 0.00815134 0.81623768 0.502017 ]
[[0.03166402 0.03957873]
[0.03166914 0.03957624]]
18.13 EI: The Famous Schonlau Example
X_train0 = np.array([1, 2, 3, 4, 12]).reshape(-1,1)
X_train = np.linspace(start=0, stop=10, num=5).reshape(-1, 1)
from spotpython.surrogate.kriging import Kriging
import numpy as np
import matplotlib.pyplot as plt

X_train = np.array([1., 2., 3., 4., 12.]).reshape(-1,1)
y_train = np.array([0., -1.75, -2, -0.5, 5.])

S = Kriging(name='kriging', seed=123, log_level=50, n_theta=1, method="interpolation", cod_type="norm")
S.fit(X_train, y_train)

X = np.linspace(start=0, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, mean_prediction, label="Mean prediction")
if True:
    plt.fill_between(
        X.ravel(),
        mean_prediction - 2 * std_prediction,
        mean_prediction + 2 * std_prediction,
        alpha=0.5,
        label=r"95% confidence interval",
    )
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noise-free dataset")

#plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
# plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, -ei, label="Expected Improvement")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noise-free dataset")
S.get_model_params()
{'log_theta_lambda': array([-0.99002527]),
'U': array([[1.00000001e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[9.02737603e-01, 4.30191626e-01, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[6.64119362e-01, 7.04830290e-01, 2.49318571e-01, 0.00000000e+00,
0.00000000e+00],
[3.98156512e-01, 7.08262302e-01, 5.57958584e-01, 1.68873137e-01,
0.00000000e+00],
[4.19706687e-06, 7.48476021e-05, 7.85849126e-04, 5.55938288e-03,
9.99984242e-01]]),
'X': array([[ 1.],
[ 2.],
[ 3.],
[ 4.],
[12.]]),
'y': array([ 0. , -1.75, -2. , -0.5 , 5. ]),
'negLnLike': np.float64(1.2078820477330403)}
18.14 EI: The Forrester Example
from spotpython.surrogate.kriging import Kriging
import numpy as np
import matplotlib.pyplot as plt
import spotpython
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot

# exact x locations are unknown:
X_train = np.array([0.0, 0.175, 0.225, 0.3, 0.35, 0.375, 0.5,1]).reshape(-1,1)

fun = Analytical().fun_forrester
fun_control = fun_control_init(
    PREFIX="07_EI_FORRESTER",
    sigma=1.0,
    seed=123,)
y_train = fun(X_train, fun_control=fun_control)

S = Kriging(name='kriging', seed=123, log_level=50, n_theta=1, method="interpolation", cod_type="norm")
S.fit(X_train, y_train)

X = np.linspace(start=0, stop=1, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X, return_val="all")

plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, mean_prediction, label="Mean prediction")
if True:
    plt.fill_between(
        X.ravel(),
        mean_prediction - 2 * std_prediction,
        mean_prediction + 2 * std_prediction,
        alpha=0.5,
        label=r"95% confidence interval",
    )
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noise-free dataset")

#plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
# plt.scatter(X_train, y_train, label="Observations")
plt.plot(X, -ei, label="Expected Improvement")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noise-free dataset")
18.15 Noise
import numpy as np
import spotpython
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot
from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
import matplotlib.pyplot as plt

gen = SpaceFilling(1)
rng = np.random.RandomState(1)
lower = np.array([-10])
upper = np.array([10])
fun = Analytical().fun_sphere
fun_control = fun_control_init(
    PREFIX="07_Y",
    sigma=2.0,
    seed=123,)
X = gen.scipy_lhd(10, lower=lower, upper = upper)
print(X)
y = fun(X, fun_control=fun_control)
print(y)
y.shape
X_train = X.reshape(-1,1)
y_train = y

S = Kriging(name='kriging',
            seed=123,
            log_level=50,
            n_theta=1,
            method="interpolation")
S.fit(X_train, y_train)

X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")

#plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
plt.scatter(X_train, y_train, label="Observations")
#plt.plot(X, ei, label="Expected Improvement")
plt.plot(X_axis, mean_prediction, label="mue")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Sphere: Gaussian process regression on noisy dataset")
[[ 0.63529627]
[-4.10764204]
[-0.44071975]
[ 9.63125638]
[-8.3518118 ]
[-3.62418901]
[ 4.15331 ]
[ 3.4468512 ]
[ 6.36049088]
[-7.77978539]]
[-1.57464135 16.13714981 2.77008442 93.14904827 71.59322218 14.28895359
15.9770567 12.96468767 39.82265329 59.88028242]
S.get_model_params()
{'log_theta_lambda': array([-1.10547476]),
'U': array([[ 1.00000001e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 1.71273420e-01, 9.85223543e-01, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 9.13185648e-01, 1.94770737e-01, 3.57989311e-01,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 1.75066965e-03, -3.03963173e-04, -3.32220779e-03,
9.99992910e-01, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 1.77266598e-03, 2.46779757e-01, -1.18173383e-01,
-3.20690193e-04, 9.61837602e-01, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 2.40962648e-01, 9.54670161e-01, 1.27460012e-01,
2.92823322e-04, -4.96183483e-02, 1.08783176e-01,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 3.78787902e-01, -6.10436927e-02, -3.99469260e-01,
9.30038103e-02, -3.40797821e-02, 2.28886571e-01,
7.94366109e-01, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 5.37923928e-01, -8.19698319e-02, -4.73894997e-01,
4.72464311e-02, -3.81494553e-02, 2.47600403e-01,
6.30909812e-01, 1.27677658e-01, 0.00000000e+00,
0.00000000e+00],
[ 7.64573844e-02, -1.31037818e-02, -1.13704605e-01,
4.31578080e-01, -1.06049066e-02, 7.65591659e-02,
6.91377243e-01, -4.55944025e-01, 3.20831704e-01,
0.00000000e+00],
[ 3.87015427e-03, 3.51787204e-01, -1.60406611e-01,
-4.32752122e-04, 9.03358179e-01, -1.23536920e-01,
1.89427140e-02, 3.06145331e-02, 1.92052594e-02,
1.32355746e-01]]),
'X': array([[ 0.63529627],
[-4.10764204],
[-0.44071975],
[ 9.63125638],
[-8.3518118 ],
[-3.62418901],
[ 4.15331 ],
[ 3.4468512 ],
[ 6.36049088],
[-7.77978539]]),
'y': array([-1.57464135, 16.13714981, 2.77008442, 93.14904827, 71.59322218,
14.28895359, 15.9770567 , 12.96468767, 39.82265329, 59.88028242]),
'negLnLike': np.float64(26.185053861403652)}
S = Kriging(name='kriging',
            seed=123,
            log_level=50,
            n_theta=1,
            method="regression")
S.fit(X_train, y_train)
X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")
#plt.plot(X, y, label=r"$f(x) = x \sin(x)$", linestyle="dotted")
plt.scatter(X_train, y_train, label="Observations")
#plt.plot(X, ei, label="Expected Improvement")
plt.plot(X_axis, mean_prediction, label="mue")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Sphere: Gaussian process regression with nugget on noisy dataset")
S.get_model_params()
{'log_theta_lambda': array([-2.96944858, -4.36747214]),
'U': array([[ 1.00002145e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 9.76133029e-01, 2.17272217e-01, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 9.98737153e-01, 4.96011067e-02, 1.03313745e-02,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 9.16817553e-01, -3.60197553e-01, -8.88468998e-02,
1.47825687e-01, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 9.16974148e-01, 3.94762935e-01, -3.27890052e-02,
3.66328503e-02, 3.07645906e-02, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 9.80701686e-01, 1.95395269e-01, 2.97962139e-03,
-1.95305835e-03, 2.02283513e-03, 8.42704730e-03,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 9.86788177e-01, -1.55737151e-01, -1.99497526e-02,
3.88600274e-02, 2.80752262e-03, -2.58965848e-03,
1.07358109e-02, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00],
[ 9.91533639e-01, -1.25471967e-01, -1.37303482e-02,
2.92651839e-02, 2.04205703e-03, -2.17491982e-03,
6.31892482e-03, 8.18125303e-03, 0.00000000e+00,
0.00000000e+00],
[ 9.65423727e-01, -2.45324121e-01, -4.38018569e-02,
7.58583497e-02, 4.13729684e-03, -2.99254932e-03,
6.62089183e-03, 2.90075086e-03, 8.03708950e-03,
0.00000000e+00],
[ 9.26819918e-01, 3.72515229e-01, -2.67187334e-02,
3.00084948e-02, 2.43303379e-02, 1.24104117e-03,
-3.28660593e-04, -2.24039384e-04, 1.35180886e-04,
8.48929786e-03]]),
'X': array([[ 0.63529627],
[-4.10764204],
[-0.44071975],
[ 9.63125638],
[-8.3518118 ],
[-3.62418901],
[ 4.15331 ],
[ 3.4468512 ],
[ 6.36049088],
[-7.77978539]]),
'y': array([-1.57464135, 16.13714981, 2.77008442, 93.14904827, 71.59322218,
14.28895359, 15.9770567 , 12.96468767, 39.82265329, 59.88028242]),
'negLnLike': np.float64(21.820591738048208)}
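The printed parameters can be cross-checked by hand. The following sketch is not part of spotpython; it assumes that `log_theta_lambda` stores base-10 logarithms of \(\theta\) and \(\lambda\), that the correlation is Gaussian, \(\exp(-\theta d^2)\), with \(\lambda\) added on the diagonal, and that `U` is a lower-triangular Cholesky factor. These assumptions are inferred from the printed numbers, not taken from the library documentation.

```python
import numpy as np

# Values copied from the printed output of the regression fit above.
log_theta, log_lambda = -2.96944858, -4.36747214
theta = 10.0 ** log_theta   # assumption: base-10 logarithm
lam = 10.0 ** log_lambda    # assumption: base-10 logarithm

x0, x1 = 0.63529627, -4.10764204  # first two training points

# Assumption: Gaussian correlation with the nugget lam on the diagonal.
psi_00 = 1.0 + lam
psi_10 = np.exp(-theta * (x1 - x0) ** 2)

# First column of a lower Cholesky factor L of Psi:
# L[0, 0] = sqrt(Psi[0, 0]) and L[1, 0] = Psi[1, 0] / L[0, 0].
u_00 = np.sqrt(psi_00)
u_10 = psi_10 / u_00

# Compare with the printed U[0, 0] = 1.00002145 and U[1, 0] = 0.976133029.
print(u_00, u_10)
```

Under these assumptions the two recomputed entries agree with the first column of the printed `U`, which suggests that the regression fit selected a very small nugget for the sphere data.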
18.16 Cubic Function
import numpy as np
import spotpython
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot
from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
from spotpython.utils.init import fun_control_init
import matplotlib.pyplot as plt
gen = SpaceFilling(1)
rng = np.random.RandomState(1)
lower = np.array([-10])
upper = np.array([10])
fun = Analytical().fun_cubed
fun_control = fun_control_init(
    PREFIX="07_Y",
    sigma=10.0,
    seed=123,)
X = gen.scipy_lhd(10, lower=lower, upper=upper)
print(X)
y = fun(X, fun_control=fun_control)
print(y)
y.shape
X_train = X.reshape(-1,1)
y_train = y
S = Kriging(name='kriging', seed=123, log_level=50, n_theta=1, method="interpolation")
S.fit(X_train, y_train)
X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")
plt.scatter(X_train, y_train, label="Observations")
#plt.plot(X, ei, label="Expected Improvement")
plt.plot(X_axis, mean_prediction, label="mue")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Cubed: Gaussian process regression on noisy dataset")
[[ 0.63529627]
[-4.10764204]
[-0.44071975]
[ 9.63125638]
[-8.3518118 ]
[-3.62418901]
[ 4.15331 ]
[ 3.4468512 ]
[ 6.36049088]
[-7.77978539]]
[ -9.63480707 -72.98497325 12.7936499 895.34567477 -573.35961837
-41.83176425 65.27989461 46.37081417 254.1530734 -474.09587355]
S = Kriging(name='kriging', seed=123, log_level=0, n_theta=1, method="regression")
S.fit(X_train, y_train)
X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")
plt.scatter(X_train, y_train, label="Observations")
#plt.plot(X, ei, label="Expected Improvement")
plt.plot(X_axis, mean_prediction, label="mue")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Cubed: Gaussian process with nugget regression on noisy dataset")
import numpy as np
import spotpython
from spotpython.fun.objectivefunctions import Analytical
from spotpython.spot import Spot
from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
from spotpython.utils.init import fun_control_init
import matplotlib.pyplot as plt
gen = SpaceFilling(1)
rng = np.random.RandomState(1)
lower = np.array([-10])
upper = np.array([10])
fun = Analytical().fun_runge
fun_control = fun_control_init(
    PREFIX="07_Y",
    sigma=0.25,
    seed=123,)
X = gen.scipy_lhd(10, lower=lower, upper=upper)
print(X)
y = fun(X, fun_control=fun_control)
print(y)
y.shape
X_train = X.reshape(-1,1)
y_train = y
S = Kriging(name='kriging', seed=123, log_level=50, n_theta=1, method="interpolation")
S.fit(X_train, y_train)
X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")
plt.scatter(X_train, y_train, label="Observations")
#plt.plot(X, ei, label="Expected Improvement")
plt.plot(X_axis, mean_prediction, label="mue")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression on noisy dataset")
[[ 0.63529627]
[-4.10764204]
[-0.44071975]
[ 9.63125638]
[-8.3518118 ]
[-3.62418901]
[ 4.15331 ]
[ 3.4468512 ]
[ 6.36049088]
[-7.77978539]]
[ 0.46517267 -0.03599548 1.15933822 0.05915901 0.24419145 0.21502359
-0.10432134 0.21312309 -0.05502681 -0.06434374]
S = Kriging(name='kriging',
            seed=123,
            log_level=50,
            n_theta=1,
            method="regression")
S.fit(X_train, y_train)
X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")
plt.scatter(X_train, y_train, label="Observations")
#plt.plot(X, ei, label="Expected Improvement")
plt.plot(X_axis, mean_prediction, label="mue")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression with nugget on noisy dataset")
18.17 Modifying Lambda Search Space
S = Kriging(name='kriging',
            seed=123,
            log_level=50,
            n_theta=1,
            method="regression",
            min_Lambda=0.1,
            max_Lambda=10)
S.fit(X_train, y_train)
print(f"Lambda: {S.Lambda}")
Lambda: [0.1]
X_axis = np.linspace(start=-13, stop=13, num=1000).reshape(-1, 1)
mean_prediction, std_prediction, ei = S.predict(X_axis, return_val="all")
plt.scatter(X_train, y_train, label="Observations")
#plt.plot(X, ei, label="Expected Improvement")
plt.plot(X_axis, mean_prediction, label="mue")
plt.legend()
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
_ = plt.title("Gaussian process regression with nugget on noisy dataset. Modified Lambda search space.")
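Why the lower bound on Lambda matters can be illustrated without spotpython. The following is a minimal NumPy sketch, not the library's implementation: `krige_mean` is a hypothetical helper that computes an ordinary Kriging mean with a Gaussian kernel, and the fixed `theta`, the equally spaced design, and the noisy sphere data are all assumptions chosen for illustration. With a near-zero nugget the predictor reproduces the noisy observations; with a nugget of 0.1 it no longer passes through them.

```python
import numpy as np

def krige_mean(X, y, X_new, theta, lam):
    """Ordinary Kriging mean, Gaussian kernel, nugget lam (hypothetical helper)."""
    Psi = np.exp(-theta * (X[:, None] - X[None, :]) ** 2) + lam * np.eye(len(X))
    one = np.ones(len(X))
    # Generalized least-squares estimate of the constant mean.
    mu = one @ np.linalg.solve(Psi, y) / (one @ np.linalg.solve(Psi, one))
    # Correlations between new points and training points (no nugget here).
    psi_new = np.exp(-theta * (X_new[:, None] - X[None, :]) ** 2)
    return mu + psi_new @ np.linalg.solve(Psi, y - mu)

rng = np.random.default_rng(1)
X = np.linspace(-10, 10, 10)                       # assumed 1-D design
y = X**2 + rng.normal(scale=10.0, size=X.size)     # noisy sphere values

theta = 0.1  # assumed fixed; a real fit would estimate it by maximum likelihood
# Largest deviation between prediction and observation at the training points.
res_interp = np.abs(krige_mean(X, y, X, theta, lam=1e-10) - y).max()
res_regr = np.abs(krige_mean(X, y, X, theta, lam=0.1) - y).max()
print(res_interp, res_regr)
```

Raising `min_Lambda` therefore forces a minimum amount of smoothing, which is consistent with the fitted value `Lambda: [0.1]` sitting at the lower bound of the modified search space.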