1  Introduction: Optimization

1.1 Optimization, Simulation, and Surrogate Modeling

  • We will consider the interplay between
    • mathematical models,
    • numerical approximation,
    • simulation,
    • computer experiments, and
    • field data
  • Experimental design will play a key role in our developments, but not in the classical regression and response surface methodology sense
  • Challenging real-data/real-simulation examples benefiting from modern surrogate modeling methodology
  • We will consider the classical, response surface methodology (RSM) approach, and then move on to more modern approaches
  • All approaches are based on surrogates

1.2 Surrogates

  • Gathering data is expensive, and sometimes getting exactly the data you want is impossible or unethical
  • Surrogate: substitute for the real thing
  • In statistics, draws from predictive equations derived from a fitted model can act as a surrogate for the data-generating mechanism
  • Benefits of the surrogate approach:
    • Surrogate could represent a cheaper way to explore relationships, and entertain “what ifs?”
    • Surrogates favor faithful yet pragmatic reproduction of dynamics:
      • interpretation,
      • establishing causality, or
      • identification
    • Many numerical simulators are deterministic, whereas field observations are noisy or have measurement error

1.2.1 Costs of Simulation

  • Computer simulations are generally cheaper (but not always!) than physical observation
  • Some computer simulations can be just as expensive as field experimentation, but computer modeling is regarded as easier because:
    • the experimental apparatus is better understood
    • more aspects may be controlled.

1.2.2 Mathematical Models and Meta-Models

  • Use of mathematical models leveraging numerical solvers has been commonplace for some time
  • Mathematical models became more complex, requiring more resources to simulate/solve numerically
  • Practitioners increasingly relied on meta-models built off of limited simulation campaigns

1.2.3 Surrogates = Trained Meta-models

  • Data collected via expensive computer evaluations were used to tune flexible functional forms that could be used in lieu of further simulation to
    • save money or computational resources;
    • cope with an inability to perform future runs (expired licenses, off-line or over-impacted supercomputers)
  • Trained meta-models became known as surrogates

1.2.4 Computer Experiments

  • Computer experiment: design, running, and fitting meta-models.
    • Like an ordinary statistical experiment, except the data are generated by computer codes rather than physical or field observations, or surveys
  • Surrogate modeling is statistical modeling of computer experiments

1.2.5 Limits of Mathematical Modeling

  • Mathematical biologists, economists and others had reached the limit of equilibrium-based mathematical modeling with cute closed-form solutions
  • Stochastic simulations replace deterministic solvers based on FEM, Navier–Stokes or Euler methods
  • Agent-based simulation models are used to explore predator–prey (Lotka–Volterra) dynamics, the spread of disease, and the management of inventory or of patients in health insurance markets
  • Consequence: the distinction between surrogate and statistical model is all but gone

1.2.6 Example: Why Computer Simulations are Necessary

  • You can’t seed a real community with Ebola and watch what happens
  • If there’s (real) field data, say on a historical epidemic, further experimentation may be almost entirely limited to the mathematical and computer modeling side
  • Classical statistical methods offer little guidance

1.2.7 Simulation Requirements

  • Simulation should
    • enable rich diagnostics that help criticize the model,
    • support understanding of its sensitivity to inputs and other configurations, and
    • provide the ability to optimize and refine, both automatically and with expert intervention
  • And it has to do all that while remaining computationally tractable
  • One perspective is the so-called response surface methodology (RSM): a poster child from industrial statistics’ heyday, well before information technology became a dominant industry

1.3 Applications of Surrogate Models

The four most common usages of surrogate models are:

  • Augmenting Expensive Simulations: Surrogate models act as a ‘curve fit’ to approximate the results of expensive simulation codes, enabling predictions without rerunning the primary source. This provides significant speed improvements while maintaining useful accuracy.
  • Calibration of Predictive Codes: Surrogates bridge the gap between simpler, faster but less accurate models and more accurate, slower models. This multi-fidelity approach allows for improved accuracy without the full computational expense.
  • Handling Noisy or Missing Data: Surrogates smooth out random or systematic errors in experimental or computational data, filling gaps and revealing overall trends while filtering out extraneous details.
  • Data Mining and Insight Generation: Surrogates help identify functional relationships between variables and their impact on results. They enable engineers to focus on critical variables and visualize data trends effectively.

1.4 The Curse of Dimensionality

The “curse of dimensionality” refers to the exponential increase in computational complexity and data requirements as the number of dimensions (variables) in a problem grows. In high-dimensional spaces, the amount of data needed to maintain the same level of accuracy or coverage increases dramatically. For example, if a one-dimensional space requires \(n\) samples for a certain accuracy, a \(k\)-dimensional space would require \(n^k\) samples. This makes tasks like optimization, sampling, and modeling computationally expensive and often impractical in high-dimensional settings.

1.5 Updating a Surrogate Model

A surrogate model is updated by incorporating new data points, known as infill points, into the model to improve its accuracy and predictive capabilities. This process is iterative and involves the following steps:

  • Identify Regions of Interest: The surrogate model is analyzed to determine areas where it is inaccurate or where further exploration is needed. This could be regions with high uncertainty or areas where the model predicts promising results (e.g., potential optima).
  • Select Infill Points: Infill points are new data points chosen based on specific criteria, such as:
    • Exploitation: Sampling near predicted optima to refine the solution.
    • Exploration: Sampling in regions of high uncertainty to improve the model globally.
    • Balanced Approach: Combining exploitation and exploration to ensure both local and global improvements.
  • Evaluate the True Function: The true function (e.g., a simulation or experiment) is evaluated at the selected infill points to obtain their corresponding outputs.
  • Update the Surrogate Model: The surrogate model is retrained or updated using the new data, including the infill points, to improve its accuracy.
  • Repeat: The process is repeated until the model meets predefined accuracy criteria or the computational budget is exhausted.

Definition 1.1 (Infill Points) Infill points are strategically chosen new data points added to the surrogate model. They are selected to:

  • Reduce uncertainty in the model.
  • Improve predictions in regions of interest.
  • Enhance the model’s ability to identify optima or trends.

The selection of infill points is often guided by infill criteria, such as:

  • Expected Improvement (EI): Maximizing the expected improvement over the current best solution.
  • Uncertainty Reduction: Sampling where the model’s predictions have high variance.
  • Probability of Improvement (PI): Sampling where the probability of improving the current best solution is highest.

The iterative infill-points updating process ensures that the surrogate model becomes increasingly accurate and useful for optimization or decision-making tasks.
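To make this loop concrete, here is a minimal, self-contained sketch of expected-improvement (EI) infill on a one-dimensional toy problem. It uses scikit-learn's GaussianProcessRegressor as the surrogate rather than the spotpython machinery used later in this chapter; the toy objective f, the candidate grid, and all settings are illustrative assumptions rather than anything prescribed by the text.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Hypothetical, cheap stand-in for an expensive simulation (minimization)
    return np.sin(8 * x) + x

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (5, 1))                    # initial design
y = f(X).ravel()
cand = np.linspace(0, 1, 201).reshape(-1, 1)     # candidate infill locations

for _ in range(10):                              # infill iterations (budget)
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    mu, sd = gp.predict(cand, return_std=True)
    imp = y.min() - mu                           # predicted improvement over best point
    z = imp / np.maximum(sd, 1e-12)
    ei = imp * norm.cdf(z) + sd * norm.pdf(z)    # expected improvement criterion
    x_new = cand[np.argmax(ei)].reshape(1, -1)   # select infill point
    X = np.vstack([X, x_new])                    # evaluate the true function there
    y = np.append(y, f(x_new).ravel())           # update the surrogate's data

print("Best observed value:", y.min())

Maximizing EI trades off exploitation (low predicted mean) against exploration (large predictive standard deviation), which is why it is a popular default infill criterion.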

1.6 Sampling Plans

1.6.1 Ideas and Concepts

Engineering design often requires the construction of a surrogate model \(\hat{f}\) to approximate the expensive response of a black-box function \(f\). The function \(f(x)\) represents a continuous metric (e.g., quality, cost, or performance) defined over a design space \(D \subset \mathbb{R}^k\), where \(x\) is a \(k\)-dimensional vector of design variables. Since evaluating \(f\) is costly, only a sparse set of samples is used to construct \(\hat{f}\), which can then provide inexpensive predictions for any \(x \in D\).

The process involves:

  • Sampling discrete observations of \(f\).
  • Using these samples to construct an approximation \(\hat{f}\).
  • Ensuring the surrogate model is well-posed, meaning it is mathematically valid and can generalize predictions effectively.

A sampling plan

\[ X = \left\{ x^{(i)} \in D \mid i = 1, \ldots, n \right\} \]

determines the spatial arrangement of observations. While some models require a minimum number of data points \(n\), once this threshold is met, a surrogate model can be constructed to approximate \(f\) efficiently.

A well-posed model does not always perform well because its ability to generalize depends heavily on the sampling plan used to collect data. If the sampling plan is poorly designed, the model may fail to capture critical behaviors in the design space. For example:

  • Extreme Sampling: Measuring performance only at the extreme values of parameters may miss important behaviors in the center of the design space, leading to incomplete understanding.
  • Uneven Sampling: Concentrating samples in certain regions while neglecting others forces the model to extrapolate over unsampled areas, potentially resulting in inaccurate or misleading predictions. Additionally, in some cases, the data may come from external sources or be limited in scope, leaving little control over the sampling plan. This can further restrict the model’s ability to generalize effectively.

1.6.2 The ‘Curse of Dimensionality’ and How to Avoid It

The curse of dimensionality refers to the exponential increase in computational effort as the number of design variables grows. For a one-dimensional space, sampling \(n\) locations may suffice for accurate predictions, but for a \(k\)-dimensional space, the required number of observations increases to \(n^k\). This makes high-dimensional problems computationally expensive and often impractical.

Example 1.1 (Example: Curse of Dimensionality) Consider a simple example where we want to model the cost of a car tire based on its wheel diameter. With one variable (wheel diameter), we might need 10 simulations to get a good estimate of the cost. Now, if we add seven more variables (e.g., tread pattern, rubber type, etc.), giving eight variables in total, the number of simulations required at 10 levels per variable increases to \(10^8\) (100 million), because the number of combinations of design variables grows exponentially with the number of dimensions. The computational budget required to evaluate all combinations becomes infeasible: at roughly one simulation per hour, it would take about 11,416 years to complete the runs, making it impractical to explore the design space fully.
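A quick back-of-the-envelope check of these numbers; the one-hour-per-simulation rate is an assumption made here purely for illustration.

# Sanity check of the arithmetic in Example 1.1
# Assumption (illustration only): each simulation takes about one hour
simulations = 10 ** 8                 # 10 levels for each of 8 variables
hours_per_year = 24 * 365
print(f"{simulations:,} runs take about {simulations / hours_per_year:,.0f} years")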

1.6.3 Physical versus Computational Experiments

Physical experiments are prone to experimental errors from three main sources:

  • Human error: Mistakes made by the experimenter.
  • Random error: Measurement inaccuracies that vary unpredictably.
  • Systematic error: Consistent bias due to flaws in the experimental setup.

The key distinction is repeatability: systematic errors remain constant across repetitions, while random errors vary.

Computational experiments, on the other hand, are deterministic and free from random errors. However, they are still affected by:

  • Human error: Bugs in code or incorrect boundary conditions.
  • Systematic error: Biases from model simplifications (e.g., inviscid flow approximations) or finite resolution (e.g., insufficient mesh resolution).

The term “noise” is used differently in physical and computational contexts. In physical experiments, it refers to random errors, while in computational experiments, it often refers to systematic errors.

Understanding these differences is crucial for designing experiments and applying techniques like Gaussian process-based approximations. For physical experiments, replication mitigates random errors, but this is unnecessary for deterministic computational experiments.

1.6.4 Designing Preliminary Experiments (Screening)

Minimizing the number of design variables \(x_1, x_2, \dots, x_k\) is crucial before modeling the objective function \(f\). This process, called screening, aims to reduce dimensionality without compromising the analysis. If \(f\) is at least once differentiable over the design domain \(D\), the partial derivative \(\frac{\partial f}{\partial x_i}\) can be used to classify variables:

  • Negligible Variables: If \(\frac{\partial f}{\partial x_i} = 0, \, \forall x \in D\), the variable \(x_i\) can be safely neglected.
  • Linear Additive Variables: If \(\frac{\partial f}{\partial x_i} = \text{constant} \neq 0, \, \forall x \in D\), the effect of \(x_i\) is linear and additive.
  • Nonlinear Variables: If \(\frac{\partial f}{\partial x_i} = g(x_i), \, \forall x \in D\), where \(g(x_i)\) is a non-constant function, \(f\) is nonlinear in \(x_i\).
  • Interactive Nonlinear Variables: If \(\frac{\partial f}{\partial x_i} = g(x_i, x_j, \dots), \, \forall x \in D\), where \(g(x_i, x_j, \dots)\) is a function involving interactions with other variables, \(f\) is nonlinear in \(x_i\) and interacts with \(x_j\).

Measuring \(\frac{\partial f}{\partial x_i}\) across the entire design space is often infeasible due to limited budgets. The percentage of time allocated to screening depends on the problem: If many variables are expected to be inactive, thorough screening can significantly improve model accuracy by reducing dimensionality. If most variables are believed to impact the objective, focus should shift to modeling instead. Screening is a trade-off between computational cost and model accuracy, and its effectiveness depends on the specific problem context.
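To make the classification above concrete, here is a short symbolic-differentiation sketch. The test function \(f = 3x_1 + x_2^2 + x_2 x_3 + 7\) and the use of sympy are illustrative assumptions, chosen only so that all four categories appear.

import sympy as sp

x1, x2, x3, x4 = sp.symbols("x1 x2 x3 x4")
# Hypothetical test function, chosen to exercise all four categories
f = 3 * x1 + x2**2 + x2 * x3 + 7

for v in (x1, x2, x3, x4):
    print(v, "->", sp.diff(f, v))
# x1 -> 3            constant and nonzero: linear additive
# x2 -> 2*x2 + x3    depends on x2 and x3: nonlinear and interacting
# x3 -> x2           depends on x2: involved in an interaction
# x4 -> 0            identically zero: negligible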

1.6.4.1 Estimating the Distribution of Elementary Effects

In order to simplify the presentation of what follows, we make, without loss of generality, the assumption that the design space \(D = [0, 1]^k\); that is, we normalize all variables into the unit cube. We shall adhere to this convention for the rest of the book and strongly urge the reader to do likewise when implementing any algorithms described here, as this step not only yields clearer mathematics in some cases but also safeguards against scaling issues.

Before proceeding with the description of the Morris algorithm, we need to define an important statistical concept. Let us restrict our design space \(D\) to a \(k\)-dimensional, \(p\)-level full factorial grid, that is,

\[ x_i \in \{0, \frac{1}{p-1}, \frac{2}{p-1}, \dots, 1\}, \quad \text{ for } i = 1, \dots, k. \]

For a given baseline value \(x \in D\), let \(d_i(x)\) denote the elementary effect of \(x_i\), defined as

\[ d_i(x) = \frac{f(x_1, \dots, x_{i-1}, x_i + \Delta, x_{i+1}, \dots, x_k) - f(x)}{\Delta}, \quad i = 1, \dots, k, \tag{1.1}\]

where the step size \(\Delta\) is a multiple of the grid spacing \(1/(p-1)\),

\[\Delta = \frac{\xi}{p-1}, \quad \xi \in \mathbb{N}^*, \]

and the baseline \(x \in D\) is chosen such that its components satisfy \(x_i \leq 1 - \Delta\), so that the perturbed point remains inside the grid. The elementary effect \(d_i(x)\) measures the sensitivity of the function \(f\) to changes in the variable \(x_i\) at the point \(x\).

Morris’s method aims to estimate the parameters of the distribution of elementary effects associated with each variable. A large measure of central tendency indicates that a variable has a significant influence on the objective function across the design space, while a large measure of spread suggests that the variable is involved in interactions or contributes to the nonlinearity of \(f\). In practice, the sample mean and standard deviation of a set of \(d_i(x)\) values, calculated in different parts of the design space, are used for this estimation.

To ensure efficiency, the preliminary sampling plan \(X\) should be designed so that each evaluation of the objective function \(f\) contributes to the calculation of two elementary effects, rather than just one (as would occur with a naive random spread of baseline \(x\) values and adding \(\Delta\) to one variable). Additionally, the sampling plan should provide a specified number (e.g., \(r\)) of elementary effects for each variable, independently drawn with replacement. For a detailed discussion on constructing such a sampling plan, readers are encouraged to consult Morris’s original paper (Morris, 1991). Here, we focus on describing the process itself.

A random orientation \(B^*\) of the sampling plan \(B\) can be constructed as follows:

  • Let \(B\) be a \((k+1) \times k\) matrix of 0s and 1s with the property that for each column \(i\), there are two rows that differ only in their \(i\)-th entries.
  • Compute a random orientation of \(B\), denoted \(B^*\):

\[ B^* = \left( \mathbf{1}_{k+1,1}\, x^* + (\Delta/2) \left[ (2B - \mathbf{1}_{k+1,k}) D^* + \mathbf{1}_{k+1,k} \right] \right) P^*, \]

where:

  • \(D^*\) is a \(k\)-dimensional diagonal matrix with diagonal elements \(\pm 1\) (equal probability),
  • \(\mathbf{1}_{m,n}\) is an \(m \times n\) matrix of 1s,
  • \(x^*\) is a randomly chosen point in the \(p\)-level design space (limited by \(\Delta\)),
  • \(P^*\) is a \(k \times k\) random permutation matrix with one 1 per column and row.

spotpython provides a Python implementation to compute \(B^*\), see https://github.com/sequential-parameter-optimization/spotPython/blob/main/src/spotpython/utils/effects.py.

Here is the corresponding code:

import numpy as np
import pandas as pd

def randorient(k, p, xi, seed=None):
    # Initialize random number generator with the provided seed
    if seed is not None:
        rng = np.random.default_rng(seed)
    else:
        rng = np.random.default_rng()

    # Step length
    Delta = xi / (p - 1)

    m = k + 1

    # A truncated p-level grid in one dimension
    xs = np.arange(0, 1 - Delta, 1 / (p - 1))
    xsl = len(xs)
    if xsl < 1:
        print(f"xi = {xi}.")
        print(f"p = {p}.")
        print(f"Delta = {Delta}.")
        print(f"p - 1 = {p - 1}.")
        raise ValueError(f"The number of levels xsl is {xsl}, but it must be greater than 0.")

    # Basic sampling matrix
    B = np.vstack((np.zeros((1, k)), np.tril(np.ones((k, k)))))

    # Randomization

    # Matrix with +1s and -1s on the diagonal with equal probability
    Dstar = np.diag(2 * rng.integers(0, 2, size=k) - 1)

    # Random base value
    xstar = xs[rng.integers(0, xsl, size=k)]

    # Permutation matrix
    Pstar = np.zeros((k, k))
    rp = rng.permutation(k)
    for i in range(k):
        Pstar[i, rp[i]] = 1

    # A random orientation of the sampling matrix
    Bstar = (np.ones((m, 1)) @ xstar.reshape(1, -1) + (Delta / 2) * ((2 * B - np.ones((m, k))) @ Dstar + np.ones((m, k)))) @ Pstar

    return Bstar

The following code snippet generates a random orientation of a sampling matrix Bstar using the randorient() function. The input parameters are:

  • k = 3: The number of design variables (dimensions).
  • p = 3: The number of levels in the grid for each variable.
  • xi = 1: A parameter used to calculate the step size Delta.

Step-size calculation is performed as follows: Delta = xi / (p - 1) = 1 / (3 - 1) = 0.5, which determines the spacing between levels in the grid.

Next, the random sampling matrix is constructed:

  • A truncated grid is created with levels [0, 0.5] (based on Delta).
  • A basic sampling matrix B is constructed, which is a lower triangular matrix with 0s and 1s.

Then, randomization is applied:

  • Dstar: A diagonal matrix with random entries of +1 or -1.
  • xstar: A random starting point from the grid.
  • Pstar: A random permutation matrix.

Random orientation is applied to the basic sampling matrix B to create Bstar. This involves scaling, shifting, and permuting the rows and columns of B.

The final output is the matrix Bstar, which represents a random orientation of the sampling plan. Each row corresponds to a sampled point in the design space, and each column corresponds to a design variable.

# Example usage
k = 3
p = 3
xi = 1
Bstar = randorient(k, p, xi)
print(f"Random orientation of the sampling matrix:\n{Bstar}")
Random orientation of the sampling matrix:
[[0.  0.  0. ]
 [0.  0.  0.5]
 [0.5 0.  0.5]
 [0.5 0.5 0.5]]

To obtain \(r\) elementary effects for each variable, the screening plan is built from \(r\) random orientations:

\[ X = \begin{pmatrix} B^*_1 \\ B^*_2 \\ \vdots \\ B^*_r \end{pmatrix} \]

The screening plan implementation in Python is as follows (see https://github.com/sequential-parameter-optimization/spotPython/blob/main/src/spotpython/utils/effects.py):

def screeningplan(k, p, xi, r):
    # Empty list to accumulate screening plan rows
    X = []

    for i in range(r):
        X.append(randorient(k, p, xi))

    # Concatenate list of arrays into a single array
    X = np.vstack(X)

    return X

The function screeningplan() generates a screening plan by calling randorient() \(r\) times: it creates a list of random orientations and concatenates them into a single array, which represents the screening plan. The plan is then used as follows:

  • The value of the objective function \(f\) is computed for each row of the screening plan matrix \(X\). These values are stored in a column vector \(t\) of size \(r(k + 1) \times 1\), where:

  • r is the number of random orientations.

  • k is the number of design variables.

The elementary effects are then calculated as follows:

  • For each random orientation, adjacent rows of the screening plan matrix X and their corresponding function values from t are used.
  • These values are inserted into Equation 1.1 to compute elementary effects for each variable. An elementary effect measures the sensitivity of the objective function to changes in a specific variable.

The results can then be analyzed statistically. After collecting a sample of \(r\) elementary effects for each variable:

  • The sample mean (central tendency) is computed to indicate the overall influence of the variable.
  • The sample standard deviation (spread) is computed to capture variability, which may indicate interactions or nonlinearity.

The results (sample means and standard deviations) are plotted on a chart for comparison. This helps identify which variables have the most significant impact on the objective function and whether their effects are linear or involve interactions.
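The following sketch shows how these statistics can be recovered from a screening plan \(X\) and the vector of objective values \(t\). The helper name is hypothetical, and the layout assumption (each block of \(k+1\) consecutive rows is one random orientation whose adjacent rows differ in exactly one variable by \(\pm\Delta\)) follows the construction of randorient() above; spotpython's screening_plot() performs this analysis internally, so this is only an illustrative reimplementation.

import numpy as np

def elementary_effect_stats(X, t, k, r, xi, p):
    # Hypothetical helper: sample mean and standard deviation of the
    # r elementary effects per variable, following Equation 1.1
    Delta = xi / (p - 1)
    effects = np.zeros((k, r))
    for j in range(r):                      # one block per random orientation
        rows = slice(j * (k + 1), (j + 1) * (k + 1))
        Xb, tb = X[rows], t[rows]
        for i in range(k):                  # k elementary effects per block
            step = Xb[i + 1] - Xb[i]        # adjacent rows differ in one variable
            var = int(np.argmax(np.abs(step)))
            assert np.isclose(abs(step[var]), Delta)
            effects[var, j] = (tb[i + 1] - tb[i]) / step[var]
    return effects.mean(axis=1), effects.std(axis=1)

Plotting the resulting means against the standard deviations, as in Figure 1.1, separates influential, roughly linear variables (large mean, small spread) from nonlinear or interacting ones (large spread).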

import numpy as np
from spotpython.utils.effects import screening_plot, screeningplan
from spotpython.fun.objectivefunctions import Analytical
fun = Analytical()
k = 10
p = 10
xi = 1
r = 25
X = screeningplan(k=k, p=p, xi=xi, r=r)  # shape (r x (k+1), k)
value_range = np.array([
    [150, 220,   6, -10, 16, 0.5, 0.08, 2.5, 1700, 0.025],
    [200, 300,  10,  10, 45, 1.0, 0.18, 6.0, 2500, 0.08 ],
])
labels = [
    "S_W", "W_fw", "A", "Lambda",
    "q",   "lambda", "tc", "N_z",
    "W_dg", "W_p"
]
screening_plot(
    X=X,
    fun=fun.fun_wingwt,
    bounds=value_range,
    xi=xi,
    p=p,
    labels=labels,
)
Figure 1.1: Estimated means and standard deviations of the elementary effects for the 10 design variables of the wing weight function. Example based on Forrester, Sóbester, and Keane (2008).

1.7 Designing a Sampling Plan

1.7.1 Stratification

A feature shared by all of the approximation models discussed in Forrester, Sóbester, and Keane (2008) is that they are more accurate in the vicinity of the points where we have evaluated the objective function. In later chapters we will delve into the laws that quantify our decaying trust in the model as we move away from a known, sampled point, but for the purposes of the present discussion we shall merely draw the intuitive conclusion that a uniform level of model accuracy throughout the design space requires a uniform spread of points. A sampling plan possessing this feature is said to be space-filling.

The most straightforward way of sampling a design space in a uniform fashion is by means of a rectangular grid of points. This is the full factorial sampling technique referred to in the section about the curse of dimensionality.

Here is a simplified version of a Python function that samples the unit hypercube at all levels in all dimensions, with the \(k\)-vector \(q\) containing the number of points required along each dimension; see https://github.com/sequential-parameter-optimization/spotPython/blob/main/src/spotpython/utils/sampling.py.

The variable Edges specifies whether the points should be equally spaced from edge to edge (Edges=1) or placed in the centres of \(n = q_1 \times q_2 \times \ldots \times q_k\) bins filling the unit hypercube (for any other value of Edges).

import numpy as np
from typing import Tuple, Optional
import matplotlib.pyplot as plt


def fullfactorial(q, Edges=1) -> np.ndarray:
    """Generates a full factorial sampling plan in the unit cube.

    Args:
        q (list or np.ndarray):
            A list or array containing the number of points along each dimension (k-vector).
        Edges (int, optional):
            Determines spacing of points. If `Edges=1`, points are equally spaced from edge to edge (default).
            Otherwise, points will be in the centers of n = q[0]*q[1]*...*q[k-1] bins filling the unit cube.

    Returns:
        (np.ndarray): Full factorial sampling plan as an array of shape (n, k), where n is the total number of points and k is the number of dimensions.

    Raises:
        ValueError: If any dimension in `q` is less than 2.

    Notes:
        Many thanks to the original author of this code, A Sobester, for providing the original Matlab code under the GNU Licence. Original Matlab Code: Copyright 2007 A Sobester:
        "This program is free software: you can redistribute it and/or modify  it
        under the terms of the GNU Lesser General Public License as published by
        the Free Software Foundation, either version 3 of the License, or any
        later version.
        This program is distributed in the hope that it will be useful, but
        WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
        General Public License for more details.
        You should have received a copy of the GNU General Public License and GNU
        Lesser General Public License along with this program. If not, see
        <http://www.gnu.org/licenses/>."

    Examples:
        >>> from spotpython.utils.sampling import fullfactorial
        >>> q = [3, 2]
        >>> X = fullfactorial(q, Edges=0)
        >>> print(X)
                [[0.         0.        ]
                [0.         0.75      ]
                [0.41666667 0.        ]
                [0.41666667 0.75      ]
                [0.83333333 0.        ]
                [0.83333333 0.75      ]]
        >>> X = fullfactorial(q, Edges=1)
        >>> print(X)
                [[0.  0. ]
                [0.  1. ]
                [0.5 0. ]
                [0.5 1. ]
                [1.  0. ]
                [1.  1. ]]

    """
    q = np.array(q)
    if np.min(q) < 2:
        raise ValueError("You must have at least two points per dimension.")

    # Total number of points in the sampling plan
    n = np.prod(q)

    # Number of dimensions
    k = len(q)

    # Pre-allocate memory for the sampling plan
    X = np.zeros((n, k))

    # Additional phantom element
    q = np.append(q, 1)

    for j in range(k):
        if Edges == 1:
            one_d_slice = np.linspace(0, 1, q[j])
        else:
            one_d_slice = np.linspace(1 / (2 * q[j]), 1, q[j]) - 1 / (2 * q[j])

        column = np.array([])

        while len(column) < n:
            for ll in range(q[j]):
                column = np.append(column, np.ones(np.prod(q[j + 1 : k])) * one_d_slice[ll])

        X[:, j] = column

    return X
from spotpython.utils.sampling import fullfactorial
q = [3, 2]
X = fullfactorial(q, Edges=0)
print(X)
[[0.         0.        ]
 [0.         0.75      ]
 [0.41666667 0.        ]
 [0.41666667 0.75      ]
 [0.83333333 0.        ]
 [0.83333333 0.75      ]]
X = fullfactorial(q, Edges=1)
print(X)
[[0.  0. ]
 [0.  1. ]
 [0.5 0. ]
 [0.5 1. ]
 [1.  0. ]
 [1.  1. ]]

The full factorial sampling plan method generates a uniform sampling design by creating a grid of points across all dimensions. For example, calling fullfactorial([3, 4, 5], 1) produces a three-dimensional sampling plan with 3, 4, and 5 levels along each dimension, respectively. While this approach satisfies the uniformity criterion, it has two significant limitations:

  • Restricted Design Sizes: The method only works for designs where the total number of points \(n\) can be expressed as the product of the number of levels in each dimension, i.e., \(n = q_1 \times q_2 \times \cdots \times q_k\).

  • Overlapping Projections: When the sampling points are projected onto individual axes, sets of points may overlap, reducing the effectiveness of the sampling plan. This can lead to non-uniform coverage in the projections, which may not fully represent the design space; a short numerical check of this effect follows below.
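A quick numerical check of the overlapping-projection issue, using the fullfactorial() function shown above (the 4 × 4 example is arbitrary):

import numpy as np
from spotpython.utils.sampling import fullfactorial

# A 4 x 4 full factorial has 16 points, but projecting onto either axis
# yields only 4 distinct values, each repeated 4 times
X = fullfactorial([4, 4], Edges=1)
print(X.shape[0], "points,", np.unique(X[:, 0]).size, "distinct values on axis 1")

Latin hypercube sampling, introduced next, removes this overlap by guaranteeing \(n\) distinct values in every one-dimensional projection.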

1.7.2 Latin Squares and Random Latin Hypercubes

To improve the uniformity of projections for any individual variable, the range of that variable can be divided into a large number of equal-sized bins, and random subsamples of equal size can be generated within these bins. This method is called stratified random sampling. Extending this idea to all dimensions results in a stratified sampling plan, commonly implemented using Latin hypercube sampling.

For two-dimensional discrete variables, a Latin square ensures uniform projections. An \((n \times n)\) Latin square is constructed by filling each row and column with a permutation of \(\{1, 2, \dots, n\}\), ensuring each number appears only once per row and column. For example, for \(n = 4\), a Latin square might look like this:

2   1   3   4
3   2   4   1
1   4   2   3
4   3   1   2
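One simple way to generate such a square is by taking cyclic shifts of a single random permutation; this construction is only one of many and is shown here as an illustrative sketch.

import numpy as np

rng = np.random.default_rng(42)
n = 4
base = rng.permutation(np.arange(1, n + 1))              # a random permutation of 1..n
square = np.array([np.roll(base, shift) for shift in range(n)])
print(square)  # every row and every column contains each number exactly once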

Latin Hypercubes are the multidimensional extension of Latin squares. The design space is divided into equal-sized hypercubes (bins), and one point is placed in each bin. The placement ensures that moving along any axis from an occupied bin does not encounter another occupied bin. This guarantees uniform projections across all dimensions. To construct a Latin hypercube:

  • Represent the sampling plan as an \(n \times k\) matrix \(X\), where \(n\) is the number of points and \(k\) is the number of dimensions.
  • Fill each column of \(X\) with a random permutation of \(\{1, 2, \dots, n\}\).
  • Normalize the plan into the unit hypercube \([0, 1]^k\).

This approach ensures multidimensional stratification and uniformity in projections. Here is the code:

import numpy as np

def rlh(n: int, k: int, edges: int = 0) -> np.ndarray:
    # Initialize array
    X = np.zeros((n, k), dtype=float)

    # Fill with random permutations
    for i in range(k):
        X[:, i] = np.random.permutation(n)

    # Adjust normalization based on the edges flag
    if edges == 1:
        # [X=0..n-1] -> [0..1]
        X = X / (n - 1)
    else:
        # Points at true midpoints
        # [X=0..n-1] -> [0.5/n..(n-0.5)/n]
        X = (X + 0.5) / n

    return X
import numpy as np
from spotpython.utils.sampling import rlh
# Generate a 2D Latin hypercube with 5 points and edges=0
X = rlh(n=5, k=2, edges=0)
print(X)
[[0.9 0.1]
 [0.3 0.7]
 [0.5 0.9]
 [0.7 0.5]
 [0.1 0.3]]
import numpy as np
from spotpython.utils.sampling import rlh
# Generate a 2D Latin hypercube with 5 points and edges=1
X = rlh(n=5, k=2, edges=1)
print(X)
[[0.75 0.25]
 [0.   0.  ]
 [1.   1.  ]
 [0.25 0.5 ]
 [0.5  0.75]]
import numpy as np
import matplotlib.pyplot as plt
from spotpython.utils.sampling import rlh

# Generate a 2D Latin hypercube with 5 points and edges=0
X = rlh(n=5, k=2, edges=0)

# Plot the points
plt.figure(figsize=(6, 6))
plt.scatter(X[:, 0], X[:, 1], color='blue', s=50, label='Hypercube Points')
plt.grid(True, linestyle='--', alpha=0.7)
plt.title('2D Latin Hypercube Sampling (5 Points, Edges=0)')
plt.xlabel('Dimension 1')
plt.ylabel('Dimension 2')
plt.xlim(0, 1)
plt.ylim(0, 1)
plt.legend()
plt.show()

Goals
  • How to choose models and optimizers for solving real-world problems
  • How to use simulation to understand and improve processes

1.8 Jupyter Notebook

Note