Overview
This document describes the var_type implementation in SpotOptim, which allows users to specify different data types for optimization variables.
Supported Variable Types
SpotOptim supports three main data types:
1. 'float'
- Purpose: Continuous optimization with Python floats
- Behavior: No rounding applied, values remain continuous
- Use case: Standard continuous optimization variables
- Example: Temperature (23.5°C), Distance (1.234m)
2. 'int'
- Purpose: Discrete integer optimization
- Behavior: Float values are automatically rounded to integers
- Use case: Count variables, discrete parameters
- Example: Number of layers (5), Population size (100)
3. 'factor'
- Purpose: Unordered categorical data
- Behavior: Internally mapped to integer values (0, 1, 2, …)
- Use case: Categorical choices like colors, algorithms, modes
- Example: Color (“red”→0, “green”→1, “blue”→2)
- Note: The actual string-to-int mapping is external to SpotOptim; the optimizer works with the integer representation
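Because the string-to-int mapping lives outside SpotOptim, it can help to keep encoding and decoding in one place. The following is a minimal sketch; the names COLORS, color_to_int, color_bounds, and decode_color are hypothetical and not part of SpotOptim.

```python
# Hypothetical helper names; SpotOptim only ever sees the integer codes.
COLORS = ["red", "green", "blue"]

# Category label -> integer code (0, 1, 2, ...)
color_to_int = {name: code for code, name in enumerate(COLORS)}

# Bounds for the factor variable cover the valid codes: (0, n_categories - 1)
color_bounds = (0, len(COLORS) - 1)


def decode_color(value):
    """Map an optimizer value (e.g. 2.0) back to its category label."""
    return COLORS[int(round(value))]


print(color_to_int)       # {'red': 0, 'green': 1, 'blue': 2}
print(color_bounds)       # (0, 2)
print(decode_color(2.0))  # blue
```

Deriving the bounds from the category list keeps the two from drifting apart when a category is added or removed.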
Implementation Details
Where var_type is Used
The var_type parameter is applied at each of the following stages of the optimization process:
Initialization (__init__):
- Stored as self.var_type
- Default: ["float"] * n_dim if not specified
Initial Design Generation (_generate_initial_design):
- Applies type constraints via _repair_non_numeric()
- Ensures initial points respect variable types
New Point Suggestion (_suggest_next_point):
- Applies type constraints to acquisition function optimization results
- Ensures suggested points respect variable types
User-Provided Initial Design (optimize):
- Applies type constraints to X0 if provided
- Ensures consistency regardless of input source
Mesh Grid Generation (_generate_mesh_grid):
- Used for plotting, respects variable types
- Ensures visualization shows correct discrete/continuous behavior
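The internals of _generate_mesh_grid are not reproduced in this document. As an illustration of what a type-aware plotting grid can look like, the sketch below snaps non-float axes to integer values before building the grid. The function typed_mesh_grid and its num parameter are assumptions made for this example, not SpotOptim's actual code.

```python
import numpy as np


def typed_mesh_grid(bounds, var_type, num=5):
    """Hypothetical sketch: build a mesh grid whose non-float axes are integers."""
    axes = []
    for (lo, hi), vt in zip(bounds, var_type):
        axis = np.linspace(lo, hi, num)
        if vt != "float":
            # Round to integers and drop the duplicates that rounding creates
            axis = np.unique(np.around(axis))
        axes.append(axis)
    return np.meshgrid(*axes)


# One continuous axis and one integer axis
G0, G1 = typed_mesh_grid([(0.0, 1.0), (0, 2)], ["float", "int"], num=5)
print(np.unique(G1))  # integer axis contains only 0, 1, 2
print(G0.shape)       # (3, 5): 3 integer levels by 5 continuous levels
```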
Core Method: _repair_non_numeric()
This method enforces variable type constraints:
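def _repair_non_numeric(self, X: np.ndarray, var_type: List[str]) -> np.ndarray:
    """Round non-continuous values to integers."""
    # True for every column whose type is not "float" (i.e. "int" or "factor")
    mask = np.isin(var_type, ["float"], invert=True)
    # Round only the masked columns; note that X is modified in place
    X[:, mask] = np.around(X[:, mask])
    return X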
Logic:
- Variables with type 'float': No change (continuous)
- Variables with type 'int' or 'factor': Rounded to integers
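To see the rounding logic in isolation, the sketch below applies the same mask-and-round steps to a small array. It is a standalone function (repair_non_numeric, a name chosen for this example) that works on a copy rather than in place, which is an assumption made here for clarity. Note that np.around rounds exact halves to the nearest even integer, so 2.5 becomes 2.0 while 3.5 becomes 4.0.

```python
import numpy as np


def repair_non_numeric(X, var_type):
    """Standalone sketch of the mask-and-round logic; returns a repaired copy."""
    mask = np.isin(var_type, ["float"], invert=True)
    X = X.copy()
    X[:, mask] = np.around(X[:, mask])
    return X


X = np.array([[1.26, 2.5, 0.7],
              [-0.4, 3.5, 1.2]])
repaired = repair_non_numeric(X, ["float", "int", "factor"])
print(repaired)
# Column 0 ("float") is untouched; columns 1 and 2 are rounded:
# row 0: 1.26, 2.0, 1.0
# row 1: -0.4, 4.0, 1.0
```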
Example Usage
Example 1: All Float Variables (Default)
import numpy as np
from spotoptim import SpotOptim

# Example 1: All float variables (default)
opt1 = SpotOptim(
    fun=lambda X: np.sum(X**2, axis=1),
    bounds=[(0, 10), (0, 10), (0, 10)],
    max_iter=20,
    n_initial=10,
    seed=42,
    # var_type defaults to ["float", "float", "float"]
)
result1 = opt1.optimize()
print(f"Best value: {result1.fun:.6f}")
print(f"Best point (floats): {result1.x}")
Best value: 0.000000
Best point (floats): [0. 0. 0.]
Example 2: Pure Integer Optimization
import numpy as np
from spotoptim import SpotOptim

def discrete_func(X):
    return np.sum(X**2, axis=1)

bounds = [(-5, 5), (-5, 5)]
var_type = ["int", "int"]
opt = SpotOptim(
    fun=discrete_func,
    bounds=bounds,
    var_type=var_type,
    max_iter=20,
    n_initial=10,
    seed=42,
)
result = opt.optimize()
print(f"Best value: {result.fun:.6f}")
print(f"Best point (integers): {result.x}")
print("Note: Values are rounded to integers")
Best value: 0.000000
Best point (integers): [ 0. -0.]
Note: Values are rounded to integers
Example 3: Categorical (Factor) Variables
import numpy as np
from spotoptim import SpotOptim

def categorical_func(X):
    # Assume X[:, 0] represents 3 categories: 0, 1, 2
    # Category 0 is best
    return (X[:, 0]**2) + (X[:, 1]**2)

bounds = [(0, 2), (0, 3)]  # 3 and 4 categories respectively
var_type = ["factor", "factor"]
opt = SpotOptim(
    fun=categorical_func,
    bounds=bounds,
    var_type=var_type,
    max_iter=20,
    n_initial=10,
    seed=42,
)
result = opt.optimize()
print(f"Best value: {result.fun:.6f}")
print(f"Best point (categories): {result.x}")
print("Note: Values are integers representing categories")
Best value: 0.000000
Best point (categories): [0. 0.]
Note: Values are integers representing categories
Example 4: Mixed Variable Types
import numpy as np
from spotoptim import SpotOptim

def mixed_func(X):
    # X[:, 0]: continuous temperature
    # X[:, 1]: discrete number of iterations
    # X[:, 2]: categorical algorithm choice (0, 1, 2)
    return X[:, 0]**2 + X[:, 1]**2 + X[:, 2]**2

bounds = [(-5, 5), (1, 100), (0, 2)]
var_type = ["float", "int", "factor"]
var_name = ["temperature", "iterations", "algorithm"]
opt = SpotOptim(
    fun=mixed_func,
    bounds=bounds,
    var_type=var_type,
    var_name=var_name,
    max_iter=20,
    n_initial=10,
    seed=42,
)
result = opt.optimize()
print(f"Best value: {result.fun:.6f}")
print(f"Best point: {result.x}")
print(f" {var_name[0]} (float): {result.x[0]:.6f}")
print(f" {var_name[1]} (int): {int(result.x[1])}")
print(f" {var_name[2]} (factor): {int(result.x[2])}")
Best value: 1.000010
Best point: [-0.00309017 1. 0. ]
temperature (float): -0.003090
iterations (int): 1
algorithm (factor): 0
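In practice, a mixed-type objective often needs real Python types rather than floats in an array. The sketch below shows one way to convert each row before calling the underlying evaluation. The names ALGORITHMS, evaluate, and mixed_objective, as well as the toy cost formula, are hypothetical and only illustrate the pattern; the sketch assumes the optimizer passes a 2D array with one candidate per row, as in the examples above.

```python
import numpy as np

# Hypothetical category labels for the factor variable (codes 0, 1, 2)
ALGORITHMS = ["sgd", "adam", "rmsprop"]


def evaluate(temperature, iterations, algorithm):
    """Toy cost function that takes properly typed arguments."""
    penalty = {"sgd": 0.0, "adam": 1.0, "rmsprop": 2.0}[algorithm]
    return temperature**2 + iterations + penalty


def mixed_objective(X):
    """Convert each row to (float, int, str) and evaluate it."""
    X = np.atleast_2d(X)
    return np.array([
        evaluate(float(row[0]), int(row[1]), ALGORITHMS[int(row[2])])
        for row in X
    ])


X = np.array([[0.5, 3.0, 1.0],
              [-1.0, 1.0, 0.0]])
values = mixed_objective(X)
print(values)  # costs 4.25 and 2.0
```

A function like mixed_objective could be passed as fun together with var_type = ["float", "int", "factor"], since the rounded values in columns 1 and 2 convert cleanly with int().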
Key Findings
Type Persistence: Variable types are correctly maintained throughout the entire optimization process, from initial design through all iterations.
Automatic Enforcement: The _repair_non_numeric() method is applied to the generated initial design, to user-provided initial designs, and to every newly suggested point, so evaluated points respect their variable types.
Three Explicit Types: Only 'float', 'int', and 'factor' are supported. The legacy 'num' type has been removed for clarity.
User-Provided Data: Type constraints are applied even to user-provided initial designs, ensuring consistency.
Plotting Compatibility: The plotting functionality respects variable types, ensuring correct visualization of discrete vs. continuous variables.
Recommendations
- Always specify var_type explicitly for clarity, especially in mixed-type problems
- Use appropriate bounds for factor variables (e.g., (0, n_categories-1))
- External mapping for string categories: Maintain your own mapping dictionary outside SpotOptim (e.g., {"red": 0, "green": 1, "blue": 2})
- Validation: The current implementation does not check that the length of var_type matches the length of bounds; users should verify this themselves
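Until such a check exists in the library, a small guard can be run before constructing the optimizer. The function check_var_type_length below is a hypothetical user-side helper, not part of SpotOptim.

```python
def check_var_type_length(bounds, var_type):
    """Raise ValueError if var_type and bounds differ in length."""
    if len(var_type) != len(bounds):
        raise ValueError(
            f"var_type has {len(var_type)} entries but bounds has {len(bounds)}"
        )


bounds = [(-5, 5), (1, 100), (0, 2)]
var_type = ["float", "int", "factor"]
check_var_type_length(bounds, var_type)  # matching lengths: passes silently
```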
Future Enhancements (Optional)
Potential improvements that could be added:
- Validation: Add validation in __init__ to check len(var_type) == len(bounds)
- String Categories: Add built-in support for automatic string-to-int mapping
- Ordered Categories: Support ordered categorical variables (ordinal data)
- Type Checking: Validate that var_type values are one of the allowed strings
- Bounds Checking: Warn if factor bounds are not integer ranges
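As a rough sketch of what the type-checking and bounds-checking items could look like, the helper below rejects unknown type strings and warns when a factor variable has non-integer bounds. The names ALLOWED_TYPES and check_types_and_bounds are assumptions for illustration and do not describe existing SpotOptim behavior.

```python
import warnings

# The three types named in this document
ALLOWED_TYPES = ("float", "int", "factor")


def check_types_and_bounds(bounds, var_type):
    """Hypothetical checks: valid type names and integer factor bounds."""
    for i, ((lo, hi), vt) in enumerate(zip(bounds, var_type)):
        if vt not in ALLOWED_TYPES:
            raise ValueError(
                f"var_type[{i}]={vt!r} is not one of {ALLOWED_TYPES}"
            )
        if vt == "factor" and (float(lo) != int(lo) or float(hi) != int(hi)):
            warnings.warn(
                f"factor variable {i} has non-integer bounds ({lo}, {hi})"
            )


# Valid configuration: no error, no warning
check_types_and_bounds([(-5, 5), (0, 2)], ["float", "factor"])
```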