7 Variable Transformations for Search Space Scaling
SpotOptim supports automatic variable transformations to improve optimization in poorly scaled search spaces. Instead of handling transformations manually (e.g., working in log scale for learning rates), you specify them via the var_trans parameter and SpotOptim handles everything internally.
7.1 Overview
What are Variable Transformations?
Variable transformations allow you to specify how search space dimensions should be scaled during optimization:
Original scale (user interface): input bounds, returned results, and plots
Transformed scale (internal): initial design generation, surrogate model fitting, and acquisition function optimization
7.2 Why Use Transformations?
7.2.1 Problem: Parameters Spanning Orders of Magnitude
Some hyperparameters span multiple orders of magnitude:
Learning rates: 0.0001 to 1.0 (4 orders of magnitude)
Regularization: 0.001 to 100 (5 orders of magnitude)
Network sizes: 10 to 1000 neurons
Direct optimization in these spaces is inefficient because:
Surrogate models struggle with extreme scales
Uniform sampling wastes evaluations in unimportant regions (see the sketch after this list)
Acquisition functions behave poorly with skewed distributions
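The sampling issue in particular is easy to demonstrate. The following quick check uses plain NumPy (independent of SpotOptim) to show that uniform sampling over a learning-rate range spanning four orders of magnitude places almost all points in the top decade, whereas log-uniform sampling spreads them evenly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform sampling over (1e-5, 1e-1): most points land in the top decade.
samples = rng.uniform(1e-5, 1e-1, size=10_000)
print(f"Uniform: fraction above 0.01: {np.mean(samples > 1e-2):.2%}")          # roughly 90%

# Log-uniform sampling spreads points evenly across all four decades.
log_samples = 10 ** rng.uniform(np.log10(1e-5), np.log10(1e-1), size=10_000)
print(f"Log-uniform: fraction above 0.01: {np.mean(log_samples > 1e-2):.2%}")  # roughly 25%
```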
7.2.2 Solution: Logarithmic and Other Transformations
Transform the space for optimization while maintaining user-friendly interfaces:
```python
# Without transformations (manual approach)
bounds = [(-4, 0)]          # log10(lr): awkward for users
lr = 10 ** params[0]        # Manual transformation in objective

# With transformations (automatic)
bounds = [(0.0001, 1.0)]    # lr in natural scale
var_trans = ["log10"]       # SpotOptim handles transformation
lr = params[0]              # Already in original scale!
```
7.3 Quick Start
7.3.1 Basic Log-Scale Transformation
```python
from spotoptim import SpotOptim
import numpy as np

def objective_function(X):
    """Objective receives parameters in ORIGINAL scale."""
    results = []
    for params in X:
        lr = params[0]      # Already in [0.001, 0.1] - original scale!
        alpha = params[1]   # Already in [0.01, 1.0] - original scale!
        # Simulate model training
        score = (lr - 0.01) ** 2 + (alpha - 0.1) ** 2 + np.random.normal(0, 0.01)
        results.append(score)
    return np.array(results)

# Create optimizer with transformations
optimizer = SpotOptim(
    fun=objective_function,
    bounds=[
        (0.001, 0.1),   # learning rate (original scale)
        (0.01, 1.0)     # alpha (original scale)
    ],
    var_trans=["log10", "log10"],   # Both use log10 transformation
    var_name=["lr", "alpha"],
    max_iter=20,
    seed=42
)

# Run optimization
result = optimizer.optimize()

print(f"Best lr: {result.x[0]:.6f}")        # In original scale
print(f"Best alpha: {result.x[1]:.6f}")     # In original scale
print(f"Best score: {result.fun:.6f}")
```
Best lr: 0.002188
Best alpha: 0.063046
Best score: -0.008857
7.4 Supported Transformations
SpotOptim supports the following transformations:
| Transformation | Forward (x → t) | Inverse (t → x) | Use Case |
|----------------|-----------------|-----------------|----------|
| "log10" | t = log₁₀(x) | x = 10^t | Learning rates, regularization |
| "log" or "ln" | t = ln(x) | x = e^t | Natural exponential scales |
| "sqrt" | t = √x | x = t² | Moderate scaling |
| "exp" | t = e^x | x = ln(t) | Inverse of natural log |
| "square" | t = x² | x = √t | Inverse of sqrt |
| "cube" | t = x³ | x = ∛t | Strong scaling |
| "inv" or "reciprocal" | t = 1/x | x = 1/t | Reciprocal relationships |
| "pow(base, x)" | t = base^x | x = log_base(t) | Power scaling (e.g. pow(2, x)) |
| "log(x, base)" | t = log_base(x) | x = base^t | Log scaling with custom base |
| None or "id" | t = x | x = t | No transformation |
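The forward and inverse columns are exact inverses of each other. As a quick illustration (plain NumPy, not SpotOptim's internal implementation), a few of the pairs from the table can be verified to round-trip:

```python
import numpy as np

x = np.array([1e-4, 1e-2, 1.0])   # example values in the original scale

# "log10": t = log10(x), x = 10**t
assert np.allclose(10 ** np.log10(x), x)

# "sqrt": t = sqrt(x), x = t**2
assert np.allclose(np.sqrt(x) ** 2, x)

# "inv": t = 1/x, x = 1/t
assert np.allclose(1.0 / (1.0 / x), x)

print("Forward/inverse pairs round-trip correctly.")
```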
7.4.1 Transformation Guidelines
When to use "log10" or "log":
Parameters spanning multiple orders of magnitude
Learning rates: (1e-5, 1e-1) → uniform sampling in log space
Regularization parameters: (1e-6, 1e2)
Batch sizes, hidden units when range is large
When to use "sqrt":
Moderate scaling (1-2 orders of magnitude)
Batch sizes: (16, 512)
Number of neurons: (32, 256)
When to use "inv" (reciprocal):
Inverse relationships (e.g., 1/temperature)
When smaller values are more important
Dynamic Transformations:
"pow(base, x)": Exponential scaling. Example: transform="pow(2, x)" with bounds [2, 5] leads to internal search on [4, 32].
"log(x, base)": Logarithmic scaling with custom base. Example: transform="log(x, 2)".
When to use None:
Parameters with narrow ranges
Already well-scaled parameters
Categorical indices (use with var_type=["factor"])
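Below is a minimal sketch of the dynamic "pow(base, x)" transformation described above. It assumes the constructor arguments shown in the earlier examples and follows the convention used throughout this chapter: the objective receives values in the original (here, exponent) scale while the internal search runs on the expanded [4, 32] range. The batch-size interpretation and the variable name log2_batch_size are illustrative assumptions.

```python
import numpy as np
from spotoptim import SpotOptim

def objective(X):
    """Receives values in the ORIGINAL scale, i.e. exponents in [2, 5]."""
    results = []
    for params in X:
        exponent = params[0]                      # original scale: 2 .. 5
        batch_size = int(round(2 ** exponent))    # hypothetical use: 4 .. 32
        results.append((batch_size - 16) ** 2)    # dummy score favoring mid-sized batches
    return np.array(results)

# Bounds stay on the exponent scale; internally the search runs on
# [2**2, 2**5] = [4, 32], as described above.
optimizer = SpotOptim(
    fun=objective,
    bounds=[(2, 5)],
    var_trans=["pow(2, x)"],
    var_name=["log2_batch_size"],
    max_iter=15,
    seed=42
)
result = optimizer.optimize()
print(f"Best exponent: {result.x[0]:.3f}")
```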
7.5 Detailed Examples
7.5.1 Example 1: Neural Network Hyperparameter Tuning
```python
import torch
import torch.nn as nn
from spotoptim import SpotOptim
from spotoptim.data import get_diabetes_dataloaders
from spotoptim.nn.linear_regressor import LinearRegressor
import numpy as np

def train_neural_network(X):
    """Train neural network with hyperparameters in original scale."""
    results = []
    for params in X:
        # All parameters in original scale
        hidden_size = int(params[0])    # [16, 256]
        num_layers = int(params[1])     # [1, 4]
        lr = params[2]                  # [0.0001, 0.1]
        weight_decay = params[3]        # [1e-6, 0.01]

        print(f"Training: hidden={hidden_size}, layers={num_layers}, "
              f"lr={lr:.6f}, wd={weight_decay:.6f}")

        # Load data
        train_loader, test_loader, _ = get_diabetes_dataloaders(batch_size=32)

        # Create model
        model = LinearRegressor(
            input_dim=10,
            output_dim=1,
            l1=hidden_size,
            num_hidden_layers=num_layers,
            activation="ReLU",
            lr=lr
        )

        # Get optimizer with weight decay
        optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

        # Train
        model.train()
        for epoch in range(50):
            for batch_x, batch_y in train_loader:
                optimizer.zero_grad()
                outputs = model(batch_x)
                loss = nn.MSELoss()(outputs, batch_y)
                loss.backward()
                optimizer.step()

        # Evaluate
        model.eval()
        total_loss = 0
        with torch.no_grad():
            for batch_x, batch_y in test_loader:
                outputs = model(batch_x)
                loss = nn.MSELoss()(outputs, batch_y)
                total_loss += loss.item()

        avg_loss = total_loss / len(test_loader)
        results.append(avg_loss)

    return np.array(results)

# Create optimizer with appropriate transformations
optimizer = SpotOptim(
    fun=train_neural_network,
    bounds=[
        (16, 256),      # hidden_size: moderate range
        (1, 4),         # num_layers: small range
        (0.0001, 0.1),  # lr: 3 orders of magnitude
        (1e-6, 0.01)    # weight_decay: 4 orders of magnitude
    ],
    var_trans=[
        "sqrt",     # sqrt for hidden_size
        None,       # no transformation for num_layers
        "log10",    # log10 for learning rate
        "log10"     # log10 for weight_decay
    ],
    var_type=["int", "int", "float", "float"],
    var_name=["hidden_size", "num_layers", "lr", "weight_decay"],
    max_iter=30,
    n_initial=10,
    seed=42
)

result = optimizer.optimize()

print("\nBest Configuration:")
print(f"  Hidden Size: {int(result.x[0])}")
print(f"  Num Layers: {int(result.x[1])}")
print(f"  Learning Rate: {result.x[2]:.6f}")
print(f"  Weight Decay: {result.x[3]:.8f}")
print(f"  Best Loss: {result.fun:.6f}")
```
The printed results include a “trans” column listing each parameter’s transformation type, which makes it easy to see which parameters were optimized in which scale.
7.7 Internal Architecture
Understanding how transformations work internally can help debug issues and understand behavior:
7.7.1 Flow Diagram
User Input (Original Scale)
↓
[Transform to Internal Scale]
↓
Optimization (Transformed Scale)
• Initial design generation
• Surrogate model fitting
• Acquisition function optimization
↓
[Inverse Transform to Original Scale]
↓
Objective Function Evaluation (Original Scale)
↓
Storage & Results (Original Scale)
7.7.2 Key Components
Bounds Transformation (_transform_bounds()):
Called during initialization
Transforms _original_lower and _original_upper → lower and upper
Updates self.bounds for internal use
Forward Transformation (_transform_X()):
Converts from original scale to transformed scale
Used before surrogate fitting
Used when comparing distances
Inverse Transformation (_inverse_transform_X()):
Converts from transformed scale to original scale
Used before function evaluation
Used when storing results
Storage:
self.X_ stores in original scale
self.best_x_ stores in original scale
All external-facing data in original scale
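The following conceptual sketch mirrors this round trip for a single log10-scaled dimension using plain NumPy; it is an illustration of the flow above, not SpotOptim's internal _transform_X()/_inverse_transform_X() code:

```python
import numpy as np

# User-facing bounds in the original scale
original_lower, original_upper = 1e-4, 1e-1

# 1. Bounds transformation (done once at initialization)
lower, upper = np.log10(original_lower), np.log10(original_upper)   # -4.0, -1.0

# 2. Optimization works in the transformed space,
#    e.g. a uniform initial design over [-4, -1]
rng = np.random.default_rng(42)
X_internal = rng.uniform(lower, upper, size=(5, 1))

# 3. Inverse transform before objective evaluation and storage
X_original = 10 ** X_internal
print(X_original.ravel())   # values back in [1e-4, 1e-1], spread across decades
```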
7.8 Best Practices
7.8.1 1. Choose Appropriate Transformations
```python
# Good: Log scale for learning rate
bounds = [(1e-5, 1e-1)]
var_trans = ["log10"]

# Bad: No transformation for wide range
bounds = [(1e-5, 1e-1)]
var_trans = [None]   # Poor sampling distribution
```
7.8.2 2. Match Transformation to Range
```python
# Wide range (>3 orders of magnitude): use log
bounds = [(1e-6, 1e-2)]
var_trans = ["log10"]

# Moderate range (1-2 orders): use sqrt
bounds = [(10, 500)]
var_trans = ["sqrt"]

# Narrow range (<1 order): no transformation
bounds = [(-1, 1)]
var_trans = [None]
```
7.8.3 3. Validate Transformation Choice
```python
# Check if transformation makes sense
import numpy as np

# Original space
x_orig = np.linspace(0.001, 1.0, 10)
print("Original:", x_orig)

# Log10 transformed space
x_trans = np.log10(x_orig)
print("Transformed:", x_trans)

print("Range ratio:", np.ptp(x_trans) / np.ptp(x_orig))
# Should be much more uniform distribution
```
7.8.4 4. Combine with Variable Types
```python
# Mix transformations with variable types
optimizer = SpotOptim(
    fun=objective,
    bounds=[
        (10, 200),                      # int with sqrt
        ("ReLU", "Tanh", "Sigmoid"),    # factor (no transform)
        (0.0001, 0.1),                  # float with log10
        (0.01, 1.0)                     # float with log10
    ],
    var_type=["int", "factor", "float", "float"],
    var_trans=["sqrt", None, "log10", "log10"],
    var_name=["neurons", "activation", "lr", "dropout"]
)
```
7.9 Troubleshooting
7.9.1 Issue: Values Out of Bounds
Problem: Objective function receives values outside specified bounds.
Solution: This should not happen with transformations. If it does:
```python
# Check transformation is applied correctly
print(f"Original bounds: {optimizer._original_lower} to {optimizer._original_upper}")
print(f"Transformed bounds: {optimizer.lower} to {optimizer.upper}")
print(f"Transformations: {optimizer.var_trans}")
```
7.9.2 Issue: Poor Optimization Performance
Problem: Optimization doesn’t find good solutions.
Possible causes:
Wrong transformation type for the parameter scale
Transformation not needed (adding unnecessary complexity)
Bounds too wide or too narrow
Solution:
```python
# Try different transformations
for trans in [None, "log10", "sqrt"]:
    optimizer = SpotOptim(
        fun=objective,
        bounds=[(0.001, 1.0)],
        var_trans=[trans],
        max_iter=20,
        seed=42
    )
    result = optimizer.optimize()
    print(f"Transformation: {trans}, Best: {result.fun:.6f}")
```
7.9.3 Issue: Transformation Not Applied
Problem: Transformation doesn’t seem to affect optimization.
Check:
```python
# Verify var_trans length matches dimensions
print(f"Number of dimensions: {len(optimizer.bounds)}")
print(f"Number of transformations: {len(optimizer.var_trans)}")
# These must match!

# Check transformation is not None/"id"
print(f"Transformations: {optimizer.var_trans}")
```
7.10 Comparison: Manual vs Automatic Transformations
7.10.1 Manual Approach (Old Way)
```python
def objective_manual(X):
    """Manual transformation - error-prone!"""
    results = []
    for params in X:
        # Must remember to transform
        lr = 10 ** params[0]       # Was in log scale
        alpha = 10 ** params[1]    # Was in log scale

        # Use parameters
        score = compute_score(lr, alpha)
        results.append(score)
    return np.array(results)

# Bounds in log scale - confusing!
optimizer = SpotOptim(
    fun=objective_manual,
    bounds=[(-4, -1), (-2, 0)],            # log10 scale
    var_name=["log10_lr", "log10_alpha"]   # Confusing names
)
result = optimizer.optimize()

# Must transform back for interpretation
best_lr = 10 ** result.x[0]
best_alpha = 10 ** result.x[1]
```
7.10.2 Automatic Approach (New Way)
```python
def objective_auto(X):
    """Automatic transformation - clean!"""
    results = []
    for params in X:
        # Already in original scale
        lr = params[0]
        alpha = params[1]

        # Use parameters directly
        score = compute_score(lr, alpha)
        results.append(score)
    return np.array(results)

# Bounds in natural scale - intuitive!
optimizer = SpotOptim(
    fun=objective_auto,
    bounds=[(0.0001, 0.1), (0.01, 1.0)],
    var_trans=["log10", "log10"],   # Specify transformation
    var_name=["lr", "alpha"]        # Natural names
)
result = optimizer.optimize()

# Results already in original scale
best_lr = result.x[0]
best_alpha = result.x[1]
```
7.11 Summary
Key Takeaways:
✅ Use var_trans to specify transformations for each dimension
✅ Transformations improve optimization for poorly scaled spaces
✅ All user interfaces (bounds, results, plots) use original scale
✅ Optimization happens internally in transformed space
✅ Common transformations: "log10" for learning rates, "sqrt" for moderate scaling
✅ View transformations in tables with “trans” column
When to Use:
Parameters spanning multiple orders of magnitude → "log10" or "log"
Moderate scaling (1-2 orders) → "sqrt"
Reciprocal relationships → "inv"
Well-scaled parameters → None
Benefits:
Better surrogate model performance
More efficient sampling
Improved optimization convergence
User-friendly interface (no manual transformations in objective function)