This feature automatically selects a subset of the evaluated points for training the surrogate model when the total number of evaluated points exceeds a user-specified threshold.
It adds a point selection mechanism to SpotOptim that mirrors the corresponding functionality of the Spot class in spotpython.
14.2 Implementation Details
14.2.1 Parameters
Added to SpotOptim.__init__:
max_surrogate_points (int, optional): Maximum number of points used to fit the surrogate. If it is not set, all evaluated points are used (the default behavior).
selection_method (str, default='distant'): Method for selecting points, either 'distant' or 'best'
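This section does not spell out how selection_method='best' chooses its points. The sketch below assumes one plausible reading of the name: cluster the evaluated points with K-means into k clusters and keep every point of the cluster whose mean objective value is lowest. The function name select_best_cluster and the seed argument are illustrative assumptions, not SpotOptim's API.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_best_cluster(X, y, k, seed=0):
    """Hypothetical 'best' selection: keep the K-means cluster with the lowest mean y."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    cluster_ids = np.unique(labels)  # only clusters that actually received points
    mean_y = np.array([y[labels == c].mean() for c in cluster_ids])
    best = cluster_ids[np.argmin(mean_y)]  # cluster with the lowest average objective
    mask = labels == best
    return X[mask], y[mask]


rng = np.random.default_rng(2)
X = rng.uniform(-5, 5, size=(40, 2))
y = np.sum(X**2, axis=1)  # sphere objective
X_best, y_best = select_best_cluster(X, y, k=5)
print(len(X_best), y_best.mean(), y.mean())
```

Under this reading, the number of returned points equals the size of the chosen cluster, so it is usually smaller than k. That concentrates the surrogate on a promising region instead of spreading it over the whole search space.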
14.2.2 Methods
_select_distant_points(X, y, k)
Uses K-means clustering to find k clusters
Selects the point closest to each cluster center
Spreads the selected points across the evaluated region, giving the surrogate a space-filling training set
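The three steps above can be sketched with scikit-learn's KMeans. This is a minimal illustration, not SpotOptim's code: the standalone function, the seed argument, and the choice to pick the nearest point among each cluster's own members (which guarantees k distinct points) are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_distant_points(X, y, k, seed=0):
    """Pick k well-spread points: one representative per K-means cluster (sketch)."""
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    chosen = []
    for j, center in enumerate(kmeans.cluster_centers_):
        # Indices of the points assigned to cluster j
        members = np.flatnonzero(kmeans.labels_ == j)
        if members.size == 0:
            continue  # defensive: skip a cluster without members
        # Keep the member closest (Euclidean distance) to the cluster center
        dists = np.linalg.norm(X[members] - center, axis=1)
        chosen.append(members[np.argmin(dists)])
    chosen = np.asarray(chosen)
    return X[chosen], y[chosen]


rng = np.random.default_rng(1)
X = rng.uniform(-5, 5, size=(50, 3))
y = np.sum(X**2, axis=1)  # sphere objective
X_sub, y_sub = select_distant_points(X, y, k=10)
print(X_sub.shape, y_sub.shape)
```

Because each representative is an actually evaluated point, its objective value is known exactly, so the reduced training set contains no synthetic data.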
The method _fit_surrogate(X, y) checks whether X.shape[0] > self.max_surrogate_points. If so, it calls _selection_dispatcher to obtain a subset of the points and fits the surrogate only on that subset; otherwise, the surrogate is fitted on all evaluated points. This logic matches spotpython.spot.spot.Spot.fit_surrogate.
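The threshold check can be sketched as a standalone function. RecordingSurrogate, keep_lowest_y, and the selector argument are hypothetical stand-ins: the selector plays the role of _selection_dispatcher, and any function mapping (X, y, k) to a subset (X_sub, y_sub) can be passed in its place.

```python
import numpy as np


class RecordingSurrogate:
    """Hypothetical stand-in surrogate that only records its training data."""

    def fit(self, X, y):
        self.X_train_, self.y_train_ = X, y
        return self


def keep_lowest_y(X, y, k):
    """Placeholder selector: the k points with the smallest objective values."""
    idx = np.argsort(y)[:k]
    return X[idx], y[idx]


def fit_surrogate(surrogate, X, y, max_surrogate_points=None, selector=keep_lowest_y):
    # Select a subset only when a cap is set and the data exceed it
    if max_surrogate_points is not None and X.shape[0] > max_surrogate_points:
        X, y = selector(X, y, max_surrogate_points)
    # Fit on the (possibly reduced) data
    return surrogate.fit(X, y)


rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(15, 2))
y = np.sum(X**2, axis=1)
print(fit_surrogate(RecordingSurrogate(), X, y, max_surrogate_points=10).X_train_.shape)  # (10, 2)
print(fit_surrogate(RecordingSurrogate(), X, y).X_train_.shape)  # (15, 2)
```

Treating None as "no cap" keeps the default behavior unchanged: the comparison is skipped and every evaluated point reaches the surrogate.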
14.3 Key Differences from spotpython
The implementation follows spotpython's design with one difference: spotoptim simplifies the clustering step by using scikit-learn's KMeans directly instead of a custom implementation.
14.4 Example Usage
This example demonstrates the point selection feature with a limited number of surrogate points. Increase MAX_ITER, N_INITIAL, and MAX_SURROGATE_POINTS to see more pronounced effects.
```python
from spotoptim import SpotOptim
import numpy as np

MAX_ITER = 20
N_INITIAL = 5
MAX_SURROGATE_POINTS = 10

# Define an example objective function
def sphere(X):
    """Simple sphere function for demonstration"""
    return np.sum(X**2, axis=1)

bounds = [(-5, 5), (-5, 5), (-5, 5)]

# Without point selection (default behavior)
optimizer1 = SpotOptim(
    fun=sphere,
    bounds=bounds,
    max_iter=MAX_ITER,
    n_initial=N_INITIAL,
    seed=42
)
result1 = optimizer1.optimize()
print(f"Without selection - Best value: {result1.fun:.6f}")
print(f"Total points evaluated: {result1.nfev}")

# With point selection using distant method
optimizer2 = SpotOptim(
    fun=sphere,
    bounds=bounds,
    max_iter=MAX_ITER,
    n_initial=N_INITIAL,
    max_surrogate_points=MAX_SURROGATE_POINTS,
    selection_method='distant',
    seed=42
)
result2 = optimizer2.optimize()
print(f"\nWith 'distant' selection - Best value: {result2.fun:.6f}")
print(f"Total points evaluated: {result2.nfev}")
print(f"Max surrogate points: {optimizer2.max_surrogate_points}")

# With point selection using best cluster method
optimizer3 = SpotOptim(
    fun=sphere,
    bounds=bounds,
    max_iter=MAX_ITER,
    n_initial=N_INITIAL,
    max_surrogate_points=MAX_SURROGATE_POINTS,
    selection_method='best',
    seed=42
)
result3 = optimizer3.optimize()
print(f"\nWith 'best' selection - Best value: {result3.fun:.6f}")
print(f"Total points evaluated: {result3.nfev}")
print(f"Max surrogate points: {optimizer3.max_surrogate_points}")
```
```
Without selection - Best value: 0.000034
Total points evaluated: 20

With 'distant' selection - Best value: 2.380385
Total points evaluated: 20
Max surrogate points: 10

With 'best' selection - Best value: 2.549832
Total points evaluated: 20
Max surrogate points: 10
```
14.5 Benefits
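Scalability: Keeps surrogate fitting tractable as the number of function evaluations grows
Computational efficiency: Reduces surrogate training time once many points have been evaluated
Model quality: Careful point selection aims to preserve surrogate quality with fewer points; in the short example above, however, both selection methods reached worse best values than fitting on all points, so the trade-off should be checked for each problem
Flexibility: Two selection methods ('distant' and 'best') for different optimization scenarios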
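To put the computational-efficiency benefit in rough numbers, assume a Gaussian-process (Kriging) surrogate. Its exact fit is dominated by a Cholesky factorization of the n × n correlation matrix, which costs on the order of n³ operations and is repeated for every likelihood evaluation during hyperparameter tuning. Capping the training set at k points then shrinks that cost by about (n/k)³; for the 20 evaluations and 10 surrogate points of the example:

```latex
\frac{\text{cost}(n)}{\text{cost}(k)} \approx \frac{n^3}{k^3} = \left(\frac{n}{k}\right)^3 = \left(\frac{20}{10}\right)^3 = 8,
\qquad
\left(\frac{1000}{100}\right)^3 = 1000 .
```

The saving therefore matters little in short runs and grows quickly as the number of evaluations rises far above max_surrogate_points.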