18 Point Selection Implementation in SpotOptim

18.1 Overview

This feature automatically selects a subset of evaluated points for surrogate model training when the total number of points exceeds a specified threshold.

The mechanism mirrors the point selection functionality in spotpython’s Spot class.

18.2 Implementation Details

18.2.1 Parameters

Added to SpotOptim.__init__:

max_surrogate_points (int, optional): Maximum number of points to use for surrogate fitting. If None (the default), all evaluated points are used
selection_method (str, default=‘distant’): Method for selecting points (‘distant’ or ‘best’)

18.2.2 Methods

select_distant_points(X, y, k)
- Uses K-means clustering to find k clusters
- Selects the point closest to each cluster center
- Ensures space-filling properties for surrogate training
- Mimics spotpython.utils.aggregate.select_distant_points

select_best_cluster(X, y, k)
- Uses K-means clustering to find k clusters
- Computes mean objective value for each cluster
- Selects all points from the cluster with the best (lowest) mean value
- Mimics spotpython.utils.aggregate.select_best_cluster

_selection_dispatcher(X, y)
- Dispatcher method that routes to the appropriate selection function
- Returns all points if max_surrogate_points is None
- Mimics spotpython.spot.spot.Spot.selection_dispatcher
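
The two selection strategies described above can be sketched with sklearn's KMeans. This is a minimal illustration of the idea, not SpotOptim's actual source: the `seed` argument, the `n_init=10` setting, and the function bodies are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_distant_points(X, y, k, seed=0):
    """Sketch: keep the point closest to each of k K-means centers."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    # transform() returns the distance of every point to every center, shape (n, k)
    dists = km.transform(X)
    idx = np.argmin(dists, axis=0)  # index of the nearest point per center
    return X[idx], y[idx]


def select_best_cluster(X, y, k, seed=0):
    """Sketch: keep all points of the cluster with the lowest mean objective."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    labels = km.labels_
    # Mean objective value per cluster
    means = np.array([y[labels == j].mean() for j in range(k)])
    mask = labels == int(np.argmin(means))
    return X[mask], y[mask]


# Demo data: random points evaluated on the sphere function
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(30, 3))
y = np.sum(X**2, axis=1)

Xd, yd = select_distant_points(X, y, k=5)
Xb, yb = select_best_cluster(X, y, k=5)
print("distant:", Xd.shape, "best cluster:", Xb.shape)
```

Note that in this sketch the 'distant' strategy returns one point per cluster center, whereas the 'best' strategy returns however many points fall into the best cluster, so its subset size depends on the clustering.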

The method _fit_surrogate(X, y) checks whether X.shape[0] > self.max_surrogate_points. If so, it calls _selection_dispatcher to obtain a subset and fits the surrogate only on the selected points. This matches the logic in spotpython.spot.spot.Spot.fit_surrogate.
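
The gating logic can be illustrated with a small stand-alone class. Everything here is hypothetical scaffolding for the example: the class name `SurrogateFitter`, the placeholder selector `take_first_k`, the `selectors` mapping, the `n_train_` attribute, and the choice of `GaussianProcessRegressor` as surrogate are assumptions, not SpotOptim's API. The explicit `None` guard is also an assumption, added because comparing an integer with `None` raises a `TypeError` in Python 3.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def take_first_k(X, y, k):
    """Placeholder selector (hypothetical): keeps the first k points."""
    return X[:k], y[:k]


class SurrogateFitter:
    """Hypothetical stand-in for the surrogate-fitting part of an optimizer."""

    def __init__(self, max_surrogate_points=None, selection_method="distant",
                 selectors=None):
        self.max_surrogate_points = max_surrogate_points
        self.selection_method = selection_method
        # Map method names to selection callables; K-means based selectors
        # would be plugged in here instead of the placeholder.
        self.selectors = selectors or {"distant": take_first_k,
                                       "best": take_first_k}
        self.surrogate = GaussianProcessRegressor()

    def _selection_dispatcher(self, X, y):
        # No limit configured: return all points unchanged
        if self.max_surrogate_points is None:
            return X, y
        if self.selection_method not in self.selectors:
            raise ValueError(f"Unknown selection_method: {self.selection_method!r}")
        select = self.selectors[self.selection_method]
        return select(X, y, self.max_surrogate_points)

    def _fit_surrogate(self, X, y):
        # Select a subset only when a limit is set and exceeded
        limit = self.max_surrogate_points
        if limit is not None and X.shape[0] > limit:
            X, y = self._selection_dispatcher(X, y)
        self.surrogate.fit(X, y)
        self.n_train_ = X.shape[0]  # number of points actually used
        return self


rng = np.random.default_rng(1)
X = rng.uniform(-5, 5, size=(30, 3))
y = np.sum(X**2, axis=1)

limited = SurrogateFitter(max_surrogate_points=10)._fit_surrogate(X, y)
unlimited = SurrogateFitter()._fit_surrogate(X, y)
print(limited.n_train_, unlimited.n_train_)
```

Passing the selectors in as a mapping keeps the dispatcher independent of how the subset is chosen, which is one way to express the routing that the method list describes.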

18.3 Key Differences from spotpython

While the implementation follows spotpython’s design, there is one notable difference: SpotOptim uses simplified clustering, calling sklearn’s KMeans directly instead of a custom implementation.

18.4 Example Usage

This example demonstrates the point selection feature with a limited number of surrogate points. Increase MAX_ITER, N_INITIAL, and MAX_SURROGATE_POINTS to see more pronounced effects.

from spotoptim import SpotOptim
import numpy as np

MAX_ITER = 20
N_INITIAL = 5
MAX_SURROGATE_POINTS = 10

# Define an example objective function
def sphere(X):
    """Simple sphere function for demonstration"""
    return np.sum(X**2, axis=1)

bounds = [(-5, 5), (-5, 5), (-5, 5)]

# Without point selection (default behavior)
optimizer1 = SpotOptim(
    fun=sphere,
    bounds=bounds,
    max_iter=MAX_ITER,
    n_initial=N_INITIAL,
    seed=42
)
result1 = optimizer1.optimize()
print(f"Without selection - Best value: {result1.fun:.6f}")
print(f"Total points evaluated: {result1.nfev}")

# With point selection using distant method
optimizer2 = SpotOptim(
    fun=sphere,
    bounds=bounds,
    max_iter=MAX_ITER,
    n_initial=N_INITIAL,
    max_surrogate_points=MAX_SURROGATE_POINTS,
    selection_method='distant',
    seed=42
)
result2 = optimizer2.optimize()
print(f"\nWith 'distant' selection - Best value: {result2.fun:.6f}")
print(f"Total points evaluated: {result2.nfev}")
print(f"Max surrogate points: {optimizer2.max_surrogate_points}")

# With point selection using best cluster method
optimizer3 = SpotOptim(
    fun=sphere,
    bounds=bounds,
    max_iter=MAX_ITER,
    n_initial=N_INITIAL,
    max_surrogate_points=MAX_SURROGATE_POINTS,
    selection_method='best',
    seed=42
)
result3 = optimizer3.optimize()
print(f"\nWith 'best' selection - Best value: {result3.fun:.6f}")
print(f"Total points evaluated: {result3.nfev}")
print(f"Max surrogate points: {optimizer3.max_surrogate_points}")

Without selection - Best value: 0.000008
Total points evaluated: 20

With 'distant' selection - Best value: 2.380503
Total points evaluated: 20
Max surrogate points: 10

With 'best' selection - Best value: 1.830392
Total points evaluated: 20
Max surrogate points: 10

18.5 Benefits

Scalability: Enables efficient optimization with many function evaluations
Computational efficiency: Reduces surrogate training time for large datasets
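Maintained accuracy: Careful point selection aims to preserve model quality, although in the small example above both runs with selection reached worse best values than the run without it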
Flexibility: Two selection methods for different optimization scenarios

18.6 Comparison with spotpython

Feature	spotpython	SpotOptim
Point selection via clustering	✓	✓
‘distant’ method	✓	✓
‘best’ method	✓	✓
Selection dispatcher	✓	✓
Nyström approximation	✓	✗
Modular design	✓ (utils.aggregate)	✓ (class methods)

18.7 Jupyter Notebook

Note

The Jupyter Notebook for this chapter is available on GitHub in the Sequential Parameter Optimization Cookbook Repository.