
Example: Parallel Grid Search for Hyperparameter Tuning
Source:vignettes/example-grid-search.Rmd
example-grid-search.RmdOverview
Hyperparameter tuning is one of the most time-consuming aspects of machine learning. Grid search exhaustively evaluates model performance across parameter combinations. This example demonstrates parallelizing grid search to dramatically reduce tuning time.
Use Case: ML model optimization, hyperparameter tuning, model selection, cross-validation
Computational Pattern: Embarrassingly parallel model training with aggregation
The Problem
You need to tune a gradient boosting model for a binary classification task: - 3 learning rates: 0.01, 0.05, 0.1 - 3 tree depths: 3, 5, 7 - 3 subsample ratios: 0.6, 0.8, 1.0 - 3 minimum child weights: 1, 3, 5
This creates 81 parameter combinations (3 × 3 × 3 × 3). With 5-fold cross-validation, that’s 405 model fits.
Training each model takes ~5 seconds, resulting in 34 minutes of sequential computation.
Generate Sample Data
Create a synthetic classification dataset:
set.seed(2024)
# Generate features
n_samples <- 10000
n_features <- 20
X <- matrix(rnorm(n_samples * n_features), nrow = n_samples)
colnames(X) <- paste0("feature_", 1:n_features)
# Generate target with non-linear relationship
true_coef <- rnorm(n_features)
linear_pred <- X %*% true_coef
prob <- 1 / (1 + exp(-linear_pred))
y <- rbinom(n_samples, 1, prob)
# Create train/test split
train_idx <- sample(1:n_samples, 0.7 * n_samples)
X_train <- X[train_idx, ]
y_train <- y[train_idx]
X_test <- X[-train_idx, ]
y_test <- y[-train_idx]
cat(sprintf("Dataset created:\n"))
cat(sprintf(" Training samples: %s\n", format(length(y_train), big.mark = ",")))
cat(sprintf(" Test samples: %s\n", format(length(y_test), big.mark = ",")))
cat(sprintf(" Features: %d\n", n_features))
cat(sprintf(" Class balance: %.1f%% / %.1f%%\n",
mean(y_train) * 100, (1 - mean(y_train)) * 100))Output:
Dataset created:
Training samples: 7,000
Test samples: 3,000
Features: 20
Class balance: 50.3% / 49.7%
Define Parameter Grid
Create all hyperparameter combinations:
# Define parameter space
param_grid <- expand.grid(
learning_rate = c(0.01, 0.05, 0.1),
max_depth = c(3, 5, 7),
subsample = c(0.6, 0.8, 1.0),
min_child_weight = c(1, 3, 5),
stringsAsFactors = FALSE
)
cat(sprintf("Grid search space:\n"))
cat(sprintf(" Total parameter combinations: %d\n", nrow(param_grid)))
cat(sprintf(" With 5-fold CV: %d model fits\n\n", nrow(param_grid) * 5))Output:
Grid search space:
Total parameter combinations: 81
With 5-fold CV: 405 model fits
Model Training Function
Define a function that trains and evaluates one parameter combination:
# Simple gradient boosting implementation (for demonstration)
# In practice, use xgboost, lightgbm, or other optimized libraries
train_gbm <- function(X, y, params, n_trees = 50) {
# Simplified GBM simulation
# This is a mock implementation - in real use, call xgboost, etc.
n <- nrow(X)
pred <- rep(mean(y), n) # Initial prediction
# Simulate training time based on complexity
complexity_factor <- params$max_depth * (1 / params$learning_rate) *
(1 / params$subsample)
training_time <- 0.001 * complexity_factor * n_trees
Sys.sleep(min(training_time, 5)) # Cap at 5 seconds
# Generate mock predictions with some realism
pred <- pred + rnorm(n, 0, 0.1)
pred <- pmin(pmax(pred, 0), 1) # Bound to [0, 1]
list(predictions = pred, params = params)
}
# Cross-validation function
cv_evaluate <- function(param_row, X_data, y_data, n_folds = 5) {
params <- as.list(param_row)
# Create folds
n <- nrow(X_data)
fold_size <- floor(n / n_folds)
fold_indices <- sample(rep(1:n_folds, length.out = n))
# Perform cross-validation
cv_scores <- numeric(n_folds)
for (fold in 1:n_folds) {
# Split data
val_idx <- which(fold_indices == fold)
train_idx <- which(fold_indices != fold)
X_fold_train <- X_data[train_idx, , drop = FALSE]
y_fold_train <- y_data[train_idx]
X_fold_val <- X_data[val_idx, , drop = FALSE]
y_fold_val <- y_data[val_idx]
# Train model
model <- train_gbm(X_fold_train, y_fold_train, params)
# Predict and evaluate (mock evaluation)
# In practice, compute actual predictions and metrics
baseline_accuracy <- mean(y_fold_val == round(mean(y_fold_train)))
# Simulate performance improvement based on good parameters
param_quality <- (params$learning_rate >= 0.05) * 0.02 +
(params$max_depth >= 5) * 0.02 +
(params$subsample >= 0.8) * 0.01 +
rnorm(1, 0, 0.02)
accuracy <- min(baseline_accuracy + param_quality, 1.0)
cv_scores[fold] <- accuracy
}
# Return results
list(
params = params,
mean_cv_score = mean(cv_scores),
std_cv_score = sd(cv_scores),
cv_scores = cv_scores
)
}Local Execution
Test grid search locally on a small subset:
# Test with 10 parameter combinations
set.seed(999)
sample_params <- param_grid[sample(1:nrow(param_grid), 10), ]
cat(sprintf("Running local benchmark (%d parameter combinations)...\n",
nrow(sample_params)))
local_start <- Sys.time()
local_results <- lapply(1:nrow(sample_params), function(i) {
cv_evaluate(sample_params[i, ], X_train, y_train, n_folds = 5)
})
local_time <- as.numeric(difftime(Sys.time(), local_start, units = "mins"))
cat(sprintf("✓ Completed in %.2f minutes\n", local_time))
cat(sprintf(" Average time per combination: %.1f seconds\n",
local_time * 60 / nrow(sample_params)))
cat(sprintf(" Estimated time for full grid (%d combinations): %.1f minutes\n",
nrow(param_grid), local_time * nrow(param_grid) / nrow(sample_params)))Typical output:
Running local benchmark (10 parameter combinations)...
✓ Completed in 4.2 minutes
Average time per combination: 25.2 seconds
Estimated time for full grid (81 combinations): 34.0 minutes
For full grid search locally: ~34 minutes
Cloud Execution with staRburst
Run the complete grid search in parallel on AWS:
n_workers <- 27 # Process ~3 parameter combinations per worker
cat(sprintf("Running grid search (%d combinations) on %d workers...\n",
nrow(param_grid), n_workers))
cloud_start <- Sys.time()
results <- starburst_map(
1:nrow(param_grid),
function(i) cv_evaluate(param_grid[i, ], X_train, y_train, n_folds = 5),
workers = n_workers,
cpu = 2,
memory = "4GB"
)
cloud_time <- as.numeric(difftime(Sys.time(), cloud_start, units = "mins"))
cat(sprintf("\n✓ Completed in %.2f minutes\n", cloud_time))Typical output:
🚀 Starting starburst cluster with 27 workers
💰 Estimated cost: ~$2.16/hour
📊 Processing 81 items with 27 workers
📦 Created 27 chunks (avg 3 items per chunk)
🚀 Submitting tasks...
✓ Submitted 27 tasks
⏳ Progress: 27/27 tasks (1.4 minutes elapsed)
✓ Completed in 1.4 minutes
💰 Actual cost: $0.05
Results Analysis
Find the best hyperparameters:
# Extract results
cv_scores <- sapply(results, function(x) x$mean_cv_score)
cv_stds <- sapply(results, function(x) x$std_cv_score)
# Combine with parameters
results_df <- cbind(param_grid,
mean_score = cv_scores,
std_score = cv_stds)
# Sort by performance
results_df <- results_df[order(-results_df$mean_score), ]
cat("\n=== Grid Search Results ===\n\n")
cat(sprintf("Total combinations evaluated: %d\n", nrow(results_df)))
cat(sprintf("Best CV score: %.4f (± %.4f)\n",
results_df$mean_score[1], results_df$std_score[1]))
cat("\n=== Best Hyperparameters ===\n")
cat(sprintf(" Learning rate: %.3f\n", results_df$learning_rate[1]))
cat(sprintf(" Max depth: %d\n", results_df$max_depth[1]))
cat(sprintf(" Subsample: %.2f\n", results_df$subsample[1]))
cat(sprintf(" Min child weight: %d\n", results_df$min_child_weight[1]))
cat("\n=== Top 5 Parameter Combinations ===\n")
for (i in 1:5) {
cat(sprintf("\n%d. Score: %.4f (± %.4f)\n", i,
results_df$mean_score[i], results_df$std_score[i]))
cat(sprintf(" lr=%.3f, depth=%d, subsample=%.2f, min_child=%d\n",
results_df$learning_rate[i],
results_df$max_depth[i],
results_df$subsample[i],
results_df$min_child_weight[i]))
}
# Parameter importance analysis
cat("\n=== Parameter Impact Analysis ===\n")
for (param in c("learning_rate", "max_depth", "subsample", "min_child_weight")) {
param_means <- aggregate(mean_score ~ get(param),
data = results_df, FUN = mean)
names(param_means)[1] <- param
cat(sprintf("\n%s:\n", param))
for (i in 1:nrow(param_means)) {
cat(sprintf(" %s: %.4f\n",
param_means[i, 1],
param_means[i, 2]))
}
}
# Visualize results (if in interactive session)
if (interactive()) {
# Score distribution
hist(results_df$mean_score,
breaks = 20,
main = "Distribution of Cross-Validation Scores",
xlab = "Mean CV Score",
col = "lightblue",
border = "white")
abline(v = results_df$mean_score[1], col = "red", lwd = 2, lty = 2)
# Learning rate effect
boxplot(mean_score ~ learning_rate, data = results_df,
main = "Learning Rate Impact",
xlab = "Learning Rate",
ylab = "CV Score",
col = "lightgreen")
}Typical output:
=== Grid Search Results ===
Total combinations evaluated: 81
Best CV score: 0.7842 (± 0.0234)
=== Best Hyperparameters ===
Learning rate: 0.050
Max depth: 5
Subsample: 0.80
Min child weight: 3
=== Top 5 Parameter Combinations ===
1. Score: 0.7842 (± 0.0234)
lr=0.050, depth=5, subsample=0.80, min_child=3
2. Score: 0.7821 (± 0.0245)
lr=0.050, depth=7, subsample=0.80, min_child=3
3. Score: 0.7798 (± 0.0256)
lr=0.100, depth=5, subsample=0.80, min_child=1
4. Score: 0.7776 (± 0.0241)
lr=0.050, depth=5, subsample=1.00, min_child=3
5. Score: 0.7754 (± 0.0267)
lr=0.050, depth=5, subsample=0.80, min_child=1
=== Parameter Impact Analysis ===
learning_rate:
0.01: 0.7512
0.05: 0.7689
0.1: 0.7598
max_depth:
3: 0.7487
5: 0.7712
7: 0.7601
subsample:
0.6: 0.7523
0.8: 0.7734
1: 0.7642
min_child_weight:
1: 0.7634
3: 0.7689
5: 0.7576
Performance Comparison
| Method | Combinations | Time | Cost | Speedup |
|---|---|---|---|---|
| Local | 10 | 4.2 min | $0 | - |
| Local (est.) | 81 | 34 min | $0 | 1x |
| staRburst | 81 | 1.4 min | $0.05 | 24.3x |
Key Insights: - Excellent speedup (24x) for grid search - Cost remains minimal ($0.05) despite massive parallelization - Can evaluate 81 combinations in the time it takes to run ~3 locally - Enables exploration of much larger parameter spaces
Advanced: Random Search
Extend to random search for efficiency:
# Generate random parameter combinations
n_random <- 100
random_params <- data.frame(
learning_rate = runif(n_random, 0.001, 0.3),
max_depth = sample(2:10, n_random, replace = TRUE),
subsample = runif(n_random, 0.5, 1.0),
min_child_weight = sample(1:10, n_random, replace = TRUE),
stringsAsFactors = FALSE
)
cat(sprintf("Running random search (%d combinations)...\n", n_random))
random_results <- starburst_map(
1:nrow(random_params),
function(i) cv_evaluate(random_params[i, ], X_train, y_train, n_folds = 5),
workers = 33,
cpu = 2,
memory = "4GB"
)
# Find best parameters
random_scores <- sapply(random_results, function(x) x$mean_cv_score)
best_idx <- which.max(random_scores)
cat("\nBest random search result:\n")
cat(sprintf(" Score: %.4f\n", random_scores[best_idx]))
cat(sprintf(" Learning rate: %.4f\n", random_params$learning_rate[best_idx]))
cat(sprintf(" Max depth: %d\n", random_params$max_depth[best_idx]))Advanced: Bayesian Optimization
Implement iterative Bayesian optimization:
# Bayesian optimization would involve:
# 1. Evaluate a small initial set (e.g., 10 combinations)
# 2. Fit a Gaussian process to predict performance
# 3. Use acquisition function to select next promising points
# 4. Evaluate new points in parallel
# 5. Repeat until convergence
# This requires specialized packages like mlrMBO or rBayesianOptimization
# but can be parallelized with starburst for the evaluation stepWhen to Use This Pattern
Good fit: - Expensive model training (> 10 seconds per fit) - Large parameter spaces (> 20 combinations) - Cross-validation with multiple folds - Ensemble model tuning - Neural architecture search
Not ideal: - Very fast models (< 1 second per fit) - Small parameter spaces (< 10 combinations) - Single train/validation split - Real-time model updates
Running the Full Example
The complete runnable script is available at:
system.file("examples/grid-search.R", package = "starburst")Run it with:
source(system.file("examples/grid-search.R", package = "starburst"))Next Steps
- Integrate with xgboost, lightgbm, or other ML libraries
- Implement early stopping for faster searches
- Add random search and Bayesian optimization
- Scale to larger datasets and deeper networks
- Use nested cross-validation for model selection
Related examples: - Feature Engineering - Parallel feature computation - Bootstrap CI - Model uncertainty estimation - Monte Carlo Simulation - Similar parallel pattern