This document describes how to use the parallelization features in SpotOptim to accelerate optimization runs, particularly for computationally expensive objective functions.
7.1 Overview
SpotOptim utilizes a Steady-State Asynchronous Parallelization strategy when n_jobs > 1. This approach is designed to maximize resource utilization by ensuring that as soon as a worker is free, a new task is assigned, without waiting for batches of tasks to complete.
7.2 How it Works
With n_jobs > 1, the process flow is as follows:
Parallel Initial Design:
The n_initial * repeats_initial initial design evaluations are managed by the parallel executor.
The first n_jobs evaluations are dispatched to separate worker processes.
As soon as a job completes, its result is collected and the next initial design run is dispatched.
This continues until all initial design runs have returned their values.
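The dispatch-and-refill pattern above can be sketched with Python's standard concurrent.futures. SpotOptim itself uses joblib internally, so treat this as an illustrative model of the scheduling logic, not the actual implementation; the names evaluate and design are placeholders:

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def evaluate(x):
    """Stand-in for one expensive objective evaluation."""
    time.sleep(0.01)
    return sum(v * v for v in x)

design = [(i, -i) for i in range(10)]  # 10 initial design points
n_jobs = 4
results = {}

with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    # Dispatch the first n_jobs evaluations
    pending = {pool.submit(evaluate, x): x for x in design[:n_jobs]}
    next_idx = n_jobs
    while pending:
        # Block until at least one worker is free, then refill immediately
        done, _ = wait(pending, return_when=FIRST_COMPLETED)
        for fut in done:
            x = pending.pop(fut)
            results[x] = fut.result()
            if next_idx < len(design):
                pending[pool.submit(evaluate, design[next_idx])] = design[next_idx]
                next_idx += 1

print(len(results))  # -> 10: all initial design points evaluated
```

The key point is that refilling happens per completed job, not per batch: a free worker never waits for the rest of its batch.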
First Surrogate Fit:
Once all initial evaluations are complete, the first surrogate model is built (fitted) using the comprehensive initial dataset.
Parallel Search Initialization:
n_jobs searches (optimizations) on this initial surrogate model are dispatched to run in parallel.
Steady-State Loop:
Dispatch & Collect: The loop manages a continuous stream of tasks.
Search: If a Search task is ready (returns a candidate \(x_{cand}\)), this point is immediately sent to the evaluation function to compute \(y_{new}\).
Update & Refit: As soon as \(y_{new}\) is available, the global surrogate model is fitted again (including the new \(x_{cand}, y_{new}\)).
New Search: A new Search task is then dispatched using this continuously updated surrogate model.
This cycle repeats, ensuring the surrogate is always updated with the latest available information for every new search.
7.3 Benchmark Example
The following example demonstrates the speedup achieved by using parallelization on a simulated expensive objective function.
7.3.1 Benchmark Script
We compare sequential execution (n_jobs=1) against parallel execution (n_jobs=4) for a task simulating 4 independent optimization runs.
import os
import time
import warnings
import numpy as np
import matplotlib.pyplot as plt
from spotoptim import SpotOptim
from sklearn.exceptions import ConvergenceWarning

os.environ["PYTHONWARNINGS"] = "ignore"
warnings.filterwarnings("ignore")
warnings.filterwarnings("ignore", category=ConvergenceWarning)

def expensive_objective(X):
    import time
    import numpy as np
    # Simulate a computationally expensive function:
    # sleep for 0.05 seconds per point
    n_points = X.shape[0]
    time.sleep(0.05 * n_points)
    # Simple sphere function
    return np.sum(X**2, axis=1)

def run_benchmark():
    n_runs = 4
    n_iter_per_run = 10
    print("Benchmark Configuration:")
    print("  Objective cost: 0.05s per evaluation")
    print(f"  Runs: {n_runs}")
    print(f"  Iters per run: {n_iter_per_run}")

    # --- Sequential Execution (n_jobs=1) ---
    print("\nStarting Sequential Benchmark (n_jobs=1)...")
    start_seq = time.time()
    for i in range(n_runs):
        optimizer = SpotOptim(
            fun=expensive_objective,
            bounds=[(-5, 5)] * 2,
            max_iter=n_iter_per_run,
            n_initial=5,
            n_jobs=1,
            seed=42 + i,
            verbose=False
        )
        optimizer.optimize()
    end_seq = time.time()
    time_seq = end_seq - start_seq
    print(f"Sequential Total Time: {time_seq:.2f}s")

    # --- Parallel Execution (n_jobs=4) ---
    print("\nStarting Parallel Benchmark (n_jobs=4)...")
    start_par = time.time()
    optimizer_par = SpotOptim(
        fun=expensive_objective,
        bounds=[(-5, 5)] * 2,
        max_iter=n_iter_per_run,
        n_initial=5,
        n_jobs=n_runs,  # 4 parallel tasks
        seed=42,
        verbose=False
    )
    optimizer_par.optimize()
    end_par = time.time()
    time_par = end_par - start_par
    print(f"Parallel Total Time: {time_par:.2f}s")

    # --- Results ---
    speedup = time_seq / time_par
    print("-" * 30)
    print(f"Speedup: {speedup:.2f}x")

    # --- Plotting ---
    labels = ['Sequential', 'Parallel (n_jobs=4)']
    times = [time_seq, time_par]
    plt.figure(figsize=(8, 6))
    bars = plt.bar(labels, times, color=['skyblue', 'salmon'])
    plt.ylabel('Total Time (s)')
    plt.title(f'Optimization Time Comparison\n(Speedup: {speedup:.2f}x)')
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    # Add text labels on bars
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2., height,
                 f'{height:.2f}s', ha='center', va='bottom')
    plt.show()

if __name__ == "__main__":
    run_benchmark()
Benchmark Configuration:
Objective cost: 0.05s per evaluation
Runs: 4
Iters per run: 10
Starting Sequential Benchmark (n_jobs=1)...
Sequential Total Time: 9.09s
Starting Parallel Benchmark (n_jobs=4)...
Parallel Total Time: 4.54s
------------------------------
Speedup: 2.00x
7.3.2 Results
Running the benchmark on a standard multi-core machine yields significant speedups. With a simulated delay of 0.05s per evaluation, the run shown above gives:
Sequential Time: ~9.09s
Parallel Time: ~4.54s
Speedup: 2.00x
Note: Actual speedup depends on the overhead of process spawning and the nature of the objective function. For very fast objective functions, the overhead of parallelization might outweigh the benefits.
7.4 Best Practices
Use for Expensive Functions: Parallelization is most effective when the function evaluation time dominates the overhead of joblib (pickling data, spawning processes).
Memory Usage: Each parallel worker consumes its own memory. Be mindful of total system memory when setting high n_jobs for memory-intensive problems.
Reproducibility: Setting a seed in SpotOptim ensures that the parallel runs are reproducible, as seeds are deterministically derived for each task.
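One way per-task seeds can be derived deterministically from a single base seed is sketched below. This is an illustrative scheme using the standard library; SpotOptim's internal derivation is not documented here, and derive_task_seeds is a hypothetical helper:

```python
import random

def derive_task_seeds(base_seed, n_tasks):
    """Derive one reproducible child seed per parallel task from a base seed.
    (Illustrative scheme; not necessarily SpotOptim's internal one.)"""
    rng = random.Random(base_seed)
    return [rng.randrange(2**32) for _ in range(n_tasks)]

seeds_a = derive_task_seeds(42, 4)
seeds_b = derive_task_seeds(42, 4)
print(seeds_a == seeds_b)  # same base seed -> identical task seeds: True
```

Because every task seed is a pure function of the base seed, the same base seed reproduces the same stream of task seeds regardless of how many workers execute the tasks.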