from spotpython.design.spacefilling import SpaceFilling
from spotpython.surrogate.kriging import Kriging
from spotpython.fun.objectivefunctions import Analytical
import numpy as np
22 Factorial Variables
Until now, we have considered continuous variables. However, in many applications, the variables are not continuous, but rather discrete or categorical. For example, the number of layers in a neural network, the number of trees in a random forest, or the type of kernel in a support vector machine are all discrete variables. In the following, we will consider a simple example with two numerical variables and one categorical variable.
First, we generate the test data set for fitting the Kriging model. We use the SpaceFilling class to generate the first two dimensions of \(n=30\) design points. The third dimension is a categorical variable, which can take the values \(0\), \(1\), or \(2\).
gen = SpaceFilling(2)
n = 30
rng = np.random.RandomState(1)
lower = np.array([-5,-0])
upper = np.array([10,15])
fun_orig = Analytical().fun_branin
fun = Analytical().fun_branin_factor

# two numerical dimensions from a Latin hypercube design
X0 = gen.scipy_lhd(n, lower=lower, upper = upper)
# third dimension: categorical levels 0, 1, or 2
X1 = np.random.randint(low=0, high=3, size=(n,))
X = np.c_[X0, X1]
print(X[:5,:])
[[-2.84117593 5.97308949 2. ]
[-3.61017994 6.90781409 0. ]
[ 9.91204705 5.09395275 0. ]
[-4.4616725 1.3617128 2. ]
[-2.40987728 8.05505365 1. ]]
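The design above can be approximated without spotpython. The following is a minimal sketch that assumes `scipy.stats.qmc.LatinHypercube` as the space-filling sampler; it is not necessarily what `SpaceFilling.scipy_lhd` does internally, and the seeds are arbitrary, so the generated points will differ from the ones printed above.

```python
import numpy as np
from scipy.stats import qmc

n = 30
lower = np.array([-5.0, 0.0])
upper = np.array([10.0, 15.0])

# Latin hypercube sample in the unit square, rescaled to [lower, upper]
sampler = qmc.LatinHypercube(d=2, seed=1)  # seed chosen arbitrarily
unit_sample = sampler.random(n=n)
X0 = qmc.scale(unit_sample, lower, upper)

# Categorical third column with levels 0, 1, 2 (upper bound 3 is exclusive)
rng = np.random.default_rng(1)
X1 = rng.integers(low=0, high=3, size=n)

# Combine numerical and categorical columns into one design matrix
X = np.c_[X0, X1]
print(X.shape)  # (30, 3)
```

The categorical column is stored as a float in the combined array, which is why the levels appear as `2.`, `0.`, and `1.` in the printed design.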
The objective function is the fun_branin_factor method of the Analytical class [SOURCE]. It calculates the Branin function of \((x_1, x_2)\) with an additional factor based on the value of \(x_3\). If \(x_3 = 1\), the value of the Branin function is increased by 10. If \(x_3 = 2\), the value of the Branin function is decreased by 10. Otherwise, the value of the Branin function is not changed.
y = fun(X)
y_orig = fun_orig(X0)
# columns: x1, x2, x3, Branin value, Branin value with factor offset
data = np.c_[X, y_orig, y]
print(data[:5,:])
[[ -2.84117593 5.97308949 2. 32.09388125 22.09388125]
[ -3.61017994 6.90781409 0. 43.965223 43.965223 ]
[ 9.91204705 5.09395275 0. 6.25588575 6.25588575]
[ -4.4616725 1.3617128 2. 212.41884106 202.41884106]
[ -2.40987728 8.05505365 1. 9.25981051 19.25981051]]
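The description of fun_branin_factor can be checked by hand. The sketch below is a stand-alone reimplementation, not spotpython's code: it assumes the standard Branin constants (\(a=1\), \(b=5.1/(4\pi^2)\), \(c=5/\pi\), \(r=6\), \(s=10\), \(t=1/(8\pi)\)), and the helper names `branin` and `branin_factor` are made up for illustration. Under these assumptions it should agree with the first and fifth rows of the output above.

```python
import numpy as np

def branin(x1, x2):
    """Standard Branin function (constants are an assumption, see lead-in)."""
    a = 1.0
    b = 5.1 / (4.0 * np.pi**2)
    c = 5.0 / np.pi
    r = 6.0
    s = 10.0
    t = 1.0 / (8.0 * np.pi)
    return a * (x2 - b * x1**2 + c * x1 - r) ** 2 + s * (1.0 - t) * np.cos(x1) + s

def branin_factor(X):
    """Branin value shifted by +10 if x3 == 1, by -10 if x3 == 2, unchanged otherwise."""
    X = np.atleast_2d(X)
    y = branin(X[:, 0], X[:, 1])
    offset = np.where(X[:, 2] == 1, 10.0, np.where(X[:, 2] == 2, -10.0, 0.0))
    return y + offset

# First and fifth design points from the printed output
rows = np.array([
    [-2.84117593, 5.97308949, 2.0],
    [-2.40987728, 8.05505365, 1.0],
])
y_plain = branin(rows[:, 0], rows[:, 1])
y_shifted = branin_factor(rows)
print(np.c_[y_plain, y_shifted])
```

The difference between the two columns is exactly the offset: \(-10\) for the row with \(x_3 = 2\) and \(+10\) for the row with \(x_3 = 1\).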
We fit two Kriging models, one with three numerical variables and one with two numerical variables and one categorical variable. We then compare the predictions of the two models.
# model that treats all three variables as numerical
S = Kriging(name='kriging', seed=123, log_level=50, method="interpolation", var_type=["num", "num", "num"])
S.fit(X, y)
# model that treats the third variable as a factor
Sf = Kriging(name='kriging', seed=123, log_level=50, method="interpolation", var_type=["num", "num", "factor"])
Sf.fit(X, y)
Anisotropic model: n_theta set to 3
Anisotropic model: n_theta set to 3
Kriging(eps=np.float64(1.4901161193847656e-08), method='interpolation', model_fun_evals=100, model_optimizer=<function differential_evolution at 0x1109263e0>, name='kriging', seed=123, var_type=['num', 'num', 'factor'])
Parameters
Parameter | Value |
---|---|
eps | np.float64(1....193847656e-08) |
penalty | 10000.0 |
method | 'interpolation' |
noise | False |
var_type | ['num', 'num', ...] |
name | 'kriging' |
seed | 123 |
model_optimizer | <function dif...t 0x1109263e0> |
model_fun_evals | 100 |
min_theta | -3.0 |
max_theta | 2.0 |
theta_init_zero | False |
p_val | 2.0 |
n_p | 1 |
optim_p | False |
min_p | 1.0 |
max_p | 2.0 |
min_Lambda | -9.0 |
max_Lambda | 0.0 |
log_level | 50 |
spot_writer | None |
counter | None |
metric_factorial | 'canberra' |
isotropic | False |
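The parameter listing shows metric_factorial set to 'canberra'. To get a feeling for what a Canberra distance does to the level codes \(0\), \(1\), and \(2\), the sketch below evaluates `scipy.spatial.distance.canberra` on each pair of levels. How exactly the Kriging class applies this metric inside its correlation function is not shown in this chapter, so the sketch only illustrates the metric itself, not the library's internals.

```python
from scipy.spatial.distance import canberra

levels = [0, 1, 2]

# Canberra distance for one coordinate: |u - v| / (|u| + |v|)
pair_distances = {}
for i in levels:
    for j in levels:
        if i < j:
            pair_distances[(i, j)] = canberra([i], [j])

for pair, dist in pair_distances.items():
    print(pair, round(dist, 4))
```

On these integer codes the distances between levels are not proportional to the numerical gap: levels 0 and 2 are as far apart as levels 0 and 1, whereas a purely numerical treatment would place level 2 twice as far from level 0 as level 1.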
We can now compare the predictions of the two models. We generate a new test data set and, for each model, calculate the sum of the absolute differences between its predictions and the true values of the objective function. If treating the third variable as a factor is beneficial, the sum of the absolute differences of the factor model should be smaller than that of the purely numerical model.
n = 100
k = 100
y_true = np.zeros(n*k)
y_pred= np.zeros(n*k)
y_factor_pred= np.zeros(n*k)
for i in range(k):
    # new test batch: two numerical columns plus one categorical column
    X0 = gen.scipy_lhd(n, lower=lower, upper = upper)
    X1 = np.random.randint(low=0, high=3, size=(n,))
    X = np.c_[X0, X1]
    # slice of the result arrays that belongs to batch i
    a = i*n
    b = (i+1)*n
    y_true[a:b] = fun(X)
    y_pred[a:b] = S.predict(X)
    y_factor_pred[a:b] = Sf.predict(X)
import pandas as pd
df = pd.DataFrame({"y":y_true, "Prediction":y_pred, "Prediction_factor":y_factor_pred})
df.head()
| | y | Prediction | Prediction_factor |
|---|---|---|---|
0 | 16.684749 | 14.065908 | 14.857172 |
1 | 105.865258 | 105.792426 | 105.710852 |
2 | 49.811774 | 49.251145 | 49.937606 |
3 | 18.177150 | 18.152566 | 18.621294 |
4 | 10.968377 | -2.918720 | 2.995571 |
df.tail()
| | y | Prediction | Prediction_factor |
|---|---|---|---|
9995 | 93.620503 | 93.935619 | 93.764760 |
9996 | 86.187178 | 96.735830 | 87.559558 |
9997 | 29.494401 | 30.541060 | 30.166253 |
9998 | 25.390268 | 26.009600 | 28.823019 |
9999 | 16.261264 | 15.555043 | 16.906403 |
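s=np.sum(np.abs(y_pred - y_true))
sf=np.sum(np.abs(y_factor_pred - y_true))
res = (sf - s)
print(res)
-25450.289482965745
The difference is negative, so on this test set the model with the factor variable has the smaller sum of absolute errors.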
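The comparison can be put on a per-point scale. The sketch below wraps the error measure in a small helper (`sum_abs_error` is a made-up name), applies it to toy arrays that are invented for illustration, and then divides the reported difference by the \(100 \times 100 = 10000\) test points.

```python
import numpy as np

def sum_abs_error(y_true, y_pred):
    """Sum of absolute differences between predictions and true values."""
    return float(np.sum(np.abs(np.asarray(y_pred) - np.asarray(y_true))))

# Toy values, invented for illustration (not taken from the experiment)
y_true_toy = np.array([10.0, 20.0, 30.0])
pred_num_toy = np.array([13.0, 18.0, 36.0])     # absolute errors 3, 2, 6
pred_factor_toy = np.array([11.0, 19.0, 31.0])  # absolute errors 1, 1, 1

s_toy = sum_abs_error(y_true_toy, pred_num_toy)
sf_toy = sum_abs_error(y_true_toy, pred_factor_toy)
res_toy = sf_toy - s_toy  # negative means the factor model is closer

# Per-point version of the difference reported in the text
res_reported = -25450.289482965745
n_points = 100 * 100
mean_diff = res_reported / n_points
print(res_toy, mean_diff)
```

A mean difference of about \(-2.55\) per test point says that, on average, the factor model's absolute error is roughly 2.5 units smaller than that of the purely numerical model.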
from spotpython.plot.validation import plot_actual_vs_predicted
plot_actual_vs_predicted(y_test=df["y"], y_pred=df["Prediction"], title="Default")
plot_actual_vs_predicted(y_test=df["y"], y_pred=df["Prediction_factor"], title="Factor")
22.1 Jupyter Notebook
- The Jupyter-Notebook of this lecture is available on GitHub in the Hyperparameter-Tuning-Cookbook Repository