Simulation Optimization
Overview
Simulation optimization finds the best input parameters for a system that can only be evaluated through simulation. Unlike classical optimization where the objective function has a closed form, here each function evaluation requires running a stochastic simulation — expensive, noisy, and often black-box.
┌──────────────┐   parameters    ┌────────────┐  noisy output
│ Optimization │────────────────►│ Simulation │─────────────► statistics
│  Algorithm   │◄────────────────│   Model    │
└──────────────┘   performance   └────────────┘
Ranking and Selection
Given k alternative system configurations, identify the best (highest or lowest mean performance) with a statistical guarantee.
Indifference-Zone Procedures
Guarantee: select the best system with probability >= 1 - alpha, whenever the best is at least delta better than the second best.
Procedure (Rinott's two-stage):
- Take n_0 replications of each system
- Compute sample means and variances
- Calculate total replications needed per system: n_i = max(n_0, ceil((h·S_i / delta)²))
- Take additional replications as needed
- Select system with best sample mean
// Select the system with the smallest mean from simulation results
PROCEDURE RANKING_AND_SELECTION(means, std_errs, delta) → index
    // Simple: return index of minimum mean
    // In practice: apply Rinott or KN procedure for guarantees
    best_index ← 0
    FOR i ← 1 TO LENGTH(means) - 1 DO
        IF means[i] < means[best_index] THEN
            best_index ← i
    RETURN best_index
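The second-stage sample-size rule from Rinott's procedure can be sketched in Python. The constant h depends on n_0, k, and alpha and comes from published tables or numerical integration; it is taken as given here:

```python
import math
from statistics import variance

def rinott_second_stage(first_stage, h, delta):
    """first_stage: list of first-stage observation lists, one per system.
    h: Rinott's constant (from tables, given n_0, k, and alpha).
    delta: indifference-zone parameter.
    Returns the total replications n_i required for each system."""
    n0 = len(first_stage[0])
    totals = []
    for obs in first_stage:
        s_i = math.sqrt(variance(obs))        # first-stage sample std dev S_i
        totals.append(max(n0, math.ceil((h * s_i / delta) ** 2)))
    return totals
```

After taking the additional n_i - n_0 replications, select the system with the best overall sample mean.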
Procedures for Many Systems
- KN procedure (Kim-Nelson): Sequential elimination. After each batch of replications, eliminate systems that are statistically dominated.
- OCBA (Optimal Computing Budget Allocation): Allocate replications to maximize the probability of correct selection; close competitors with high variance receive the most additional sampling.
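The asymptotic OCBA allocation rule can be sketched as follows, assuming smaller-is-better (matching the selection pseudocode above); `means` and `stds` are the current sample statistics:

```python
import math

def ocba_allocation(means, stds, budget):
    """Split a replication budget across systems using the OCBA
    asymptotic allocation (best = smallest sample mean)."""
    b = min(range(len(means)), key=lambda i: means[i])
    others = [i for i in range(len(means)) if i != b]
    r = others[0]                                    # reference non-best system
    ratio = {r: 1.0}
    for i in others[1:]:
        # N_i / N_r = ((sigma_i / delta_bi) / (sigma_r / delta_br))^2
        ratio[i] = ((stds[i] / (means[i] - means[b])) /
                    (stds[r] / (means[r] - means[b]))) ** 2
    # N_b = sigma_b * sqrt(sum over i != b of (N_i / sigma_i)^2)
    ratio[b] = stds[b] * math.sqrt(sum((ratio[i] / stds[i]) ** 2 for i in others))
    total = sum(ratio.values())
    return [budget * ratio[i] / total for i in range(len(means))]
```

In a sequential implementation this rule is re-evaluated after each batch, so the allocation adapts as the sample statistics sharpen.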
Response Surface Methodology (RSM)
Approximate the simulation response with a surrogate model, then optimize the surrogate.
Workflow
- Screening — Identify influential factors (fractional factorial, Morris method)
- First-order model — Fit y = beta_0 + sum(beta_i·x_i) + epsilon using a 2^k factorial design
- Steepest ascent/descent — Follow the gradient of the fitted model
- Second-order model — Near the optimum, fit y = beta_0 + sum(beta_i·x_i) + sum(beta_ii·x_i²) + sum(beta_ij·x_i·x_j) using central composite or Box-Behnken design
- Optimize — Find the stationary point of the quadratic surface
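The last two steps, fitting the second-order model and locating its stationary point, can be sketched with ordinary least squares. This is illustrative and not tied to a specific design; `X` holds whatever design points were run:

```python
import numpy as np

def quadratic_stationary_point(X, y):
    """Fit y = b0 + b.x + x.B.x by least squares over design points X (n x d)
    and return the stationary point x* = -(1/2) B^{-1} b."""
    n, d = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(d)]                                   # linear
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]    # quadratic
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    b = coef[1:1 + d]
    B = np.zeros((d, d))
    k = 1 + d
    for i in range(d):
        for j in range(i, d):
            B[i, j] = coef[k] / (1.0 if i == j else 2.0)   # split cross terms
            B[j, i] = B[i, j]
            k += 1
    return np.linalg.solve(-2.0 * B, b)                    # solve b + 2Bx = 0
```

The stationary point is only a candidate optimum: check the eigenvalues of B to distinguish a minimum, maximum, or saddle before committing replications there.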
Metamodel-Based Optimization
Modern alternatives to classical RSM:
- Kriging / Gaussian Process (GP) — Interpolating surrogate with uncertainty estimates. Enables Bayesian optimization.
- Radial Basis Functions — Flexible interpolation for high-dimensional spaces.
- Neural network surrogates — For very complex response surfaces with large training budgets.
Bayesian Optimization
Use a GP surrogate with an acquisition function to balance exploration and exploitation:
- Expected Improvement (EI): E[max(f(x) - f_best, 0)]
- Upper Confidence Bound (UCB): mu(x) + kappa·sigma(x)
- Knowledge Gradient: value of information from one additional sample
Particularly effective when simulation is very expensive (minutes to hours per run) and the parameter space is moderate-dimensional (< ~20).
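EI has a closed form when the surrogate posterior at a point is Gaussian; a minimal sketch for maximization:

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for maximization, E[max(f(x) - f_best, 0)], when the
    surrogate posterior at x is N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(mu - f_best, 0.0)
    z = (mu - f_best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    return (mu - f_best) * cdf + sigma * pdf
```

The next simulation run goes to the x maximizing this quantity: the mean term rewards exploitation, the sigma term rewards exploration.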
Simulation-Based Optimization
Stochastic Approximation
Robbins-Monro: iteratively update parameters along noisy gradient estimates:
theta_{n+1} = theta_n - a_n · g_hat(theta_n)
where g_hat is a noisy gradient estimate and {a_n} is a diminishing step-size sequence (a_n → 0, Σ a_n = ∞, Σ a_n² < ∞).
Stochastic Gradient Methods
- Finite differences: Estimate gradient by perturbing each parameter. Cost: 2d simulations per iteration for d parameters.
- Simultaneous Perturbation (SPSA): Perturb all parameters at once with random direction. Cost: 2 simulations per iteration regardless of d.
- Likelihood ratio / score function: Differentiate the expectation analytically through the probability distribution.
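A single SPSA iteration can be sketched as follows; in practice the gains a and c follow diminishing schedules as in Robbins-Monro, held fixed here for brevity:

```python
import random

def spsa_step(theta, loss, a, c):
    """One SPSA iteration: two evaluations of a (noisy) loss estimate the
    full d-dimensional gradient via one random simultaneous perturbation."""
    delta = [random.choice([-1.0, 1.0]) for _ in theta]   # Rademacher directions
    plus  = [t + c * d for t, d in zip(theta, delta)]
    minus = [t - c * d for t, d in zip(theta, delta)]
    diff = loss(plus) - loss(minus)
    g_hat = [diff / (2.0 * c * d) for d in delta]         # gradient estimate
    return [t - a * g for t, g in zip(theta, g_hat)]
```

The estimate is biased for any single step but unbiased in expectation over the random directions, which is what makes two simulations per iteration sufficient regardless of dimension.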
Metaheuristics
For combinatorial or non-smooth problems:
- Genetic algorithms — Population of solutions evolved through selection, crossover, mutation
- Simulated annealing — Random perturbations accepted with probability depending on temperature schedule
- Particle swarm — Solutions move through parameter space influenced by personal and global best
- Cross-entropy method — Iteratively update sampling distribution toward elite solutions
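The cross-entropy update is compact enough to sketch directly; this one-dimensional Gaussian version uses illustrative sample and elite sizes:

```python
import random
import statistics

def cross_entropy_minimize(loss, mu, sigma, n_samples=50, n_elite=10, n_iter=40):
    """Cross-entropy method for one continuous parameter: sample candidates
    from N(mu, sigma^2), keep the n_elite with the lowest loss, and refit
    the sampling distribution to those elites."""
    for _ in range(n_iter):
        samples = [random.gauss(mu, sigma) for _ in range(n_samples)]
        elites = sorted(samples, key=loss)[:n_elite]
        mu = statistics.mean(elites)
        sigma = statistics.stdev(elites) + 1e-12   # avoid collapse to zero
    return mu
```

The sampling distribution concentrates geometrically around good regions, so the method needs no gradient information, only loss evaluations.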
Reinforcement Learning for Simulation
RL treats the simulation as an environment. An agent learns a policy mapping states to actions to maximize cumulative reward.
Simulation as RL Environment
┌──────────┐    action     ┌─────────────┐  next state, reward
│ RL Agent │──────────────►│ Simulation  │────────────────────►
│ (Policy) │◄──────────────│ Environment │
└──────────┘  observation  └─────────────┘
Advantages over classical simulation optimization:
- Handles sequential decision-making naturally
- Learns adaptive policies, not just static parameter settings
- Generalizes across states (neural network function approximation)
Challenges:
- Sample efficiency — millions of simulation episodes may be needed
- Sim-to-real gap — policy trained in simulation may fail in reality
- Reward shaping — defining the right reward signal is critical
Applications
- Inventory management — RL agent learns reorder policies
- Traffic signal control — Optimize signal timing based on real-time state
- Manufacturing scheduling — Sequence jobs across machines
- Robot control — Learn locomotion or manipulation in physics simulator
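As a sketch of the inventory application, tabular Q-learning on a toy inventory simulation; the demand distribution, costs, and hyperparameters below are all illustrative:

```python
import random

def q_learning_inventory(episodes=2000, capacity=5, seed=0):
    """Tabular Q-learning on a toy inventory simulation.
    State: on-hand stock. Action: order quantity (delivered immediately).
    Demand ~ uniform{0,1,2}; revenue, holding, and ordering costs are made up."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(capacity + 1) for a in range(capacity + 1)}
    alpha, gamma, eps = 0.1, 0.95, 0.1
    for _ in range(episodes):
        s = rng.randint(0, capacity)
        for _ in range(20):                            # steps per episode
            actions = range(capacity + 1 - s)          # stock may not exceed capacity
            if rng.random() < eps:
                a = rng.choice(list(actions))          # explore
            else:
                a = max(actions, key=lambda x: Q[(s, x)])  # exploit
            demand = rng.randint(0, 2)
            sold = min(s + a, demand)
            s_next = s + a - sold
            reward = 4.0 * sold - 1.0 * s_next - 2.0 * (a > 0)
            best_next = max(Q[(s_next, x)] for x in range(capacity + 1 - s_next))
            Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

The learned greedy policy (argmax over actions per state) is a reorder rule; unlike a static (s, S) parameter search, it adapts the order quantity to the observed state.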
Digital Twins
A digital twin is a live virtual replica of a physical asset, process, or system, continuously synchronized with real-world data.
Architecture
┌─────────────────────────────────────────────┐
│               Physical System               │
│  sensors ──► data stream ──► actuator cmds  │
└──────┬──────────────────────────────┬───────┘
       │ IoT / SCADA                  ▲
       ▼                              │
┌─────────────────────────────────────────────┐
│            Digital Twin Platform            │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  │
│  │   Data   │  │Simulation│  │ Analytics │  │
│  │Ingestion │─►│  Model   │─►│& Decision │  │
│  └──────────┘  └──────────┘  └───────────┘  │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  │
│  │  State   │  │ What-If  │  │ Visualiza-│  │
│  │Estimation│  │ Scenarios│  │ tion & UI │  │
│  └──────────┘  └──────────┘  └───────────┘  │
└─────────────────────────────────────────────┘
Synchronization
The twin must reflect current physical state:
- State estimation — Kalman filters, particle filters, or data assimilation techniques fuse sensor data with model predictions
- Parameter updating — Online calibration adjusts model parameters as the physical system drifts
- Update frequency — Ranges from real-time (milliseconds) for control to periodic (hourly/daily) for strategic planning
Predictive Maintenance
A primary digital twin application. The twin tracks degradation state and predicts remaining useful life (RUL):
- Ingest sensor data (vibration, temperature, current)
- Update degradation model (physics-based, data-driven, or hybrid)
- Simulate forward to predict failure time distribution
- Optimize maintenance schedule: minimize cost = f(downtime, repair, spare parts)
// Simple degradation model with digital twin update
STRUCTURE DegradationTwin
    degradation: real        // current estimated degradation level
    drift_rate: real         // degradation rate parameter
    threshold: real          // failure threshold
    process_noise: real
    measurement_noise: real
    estimation_var: real     // Kalman filter variance

// Kalman filter update with new sensor reading
PROCEDURE UPDATE(twin, measurement, dt)
    // Predict
    predicted ← twin.degradation + twin.drift_rate * dt
    predicted_var ← twin.estimation_var + twin.process_noise * dt
    // Update
    kalman_gain ← predicted_var / (predicted_var + twin.measurement_noise)
    twin.degradation ← predicted + kalman_gain * (measurement - predicted)
    twin.estimation_var ← (1.0 - kalman_gain) * predicted_var

// Estimate remaining useful life
PROCEDURE PREDICT_RUL(twin) → real
    remaining ← twin.threshold - twin.degradation
    IF remaining ≤ 0.0 THEN RETURN 0.0
    RETURN remaining / twin.drift_rate
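A runnable Python translation of the degradation-twin pseudocode; the default noise parameters and initial variance are illustrative:

```python
class DegradationTwin:
    """Scalar Kalman-filter digital twin for a linearly drifting
    degradation signal."""

    def __init__(self, drift_rate, threshold,
                 process_noise=0.01, measurement_noise=0.1):
        self.degradation = 0.0
        self.drift_rate = drift_rate
        self.threshold = threshold
        self.process_noise = process_noise
        self.measurement_noise = measurement_noise
        self.estimation_var = 1.0          # initial estimate uncertainty

    def update(self, measurement, dt):
        # Predict forward along the drift, then correct with the sensor reading
        predicted = self.degradation + self.drift_rate * dt
        predicted_var = self.estimation_var + self.process_noise * dt
        gain = predicted_var / (predicted_var + self.measurement_noise)
        self.degradation = predicted + gain * (measurement - predicted)
        self.estimation_var = (1.0 - gain) * predicted_var

    def predict_rul(self):
        remaining = self.threshold - self.degradation
        return remaining / self.drift_rate if remaining > 0.0 else 0.0
```

Feeding each new sensor reading through `update` keeps the twin synchronized; `predict_rul` then extrapolates the current estimate to the failure threshold.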
Parallel and Distributed Simulation
Parallel Discrete-Event Simulation (PDES)
Partition the simulation model into Logical Processes (LPs) executing on separate processors.
Conservative approach (Chandy-Misra-Bryant):
- LPs only process events when safe (no earlier event can arrive from other LPs)
- Uses null messages or lookahead to avoid deadlock
- No rollback but may have idle time waiting for safety guarantees
Optimistic approach (Time Warp):
- LPs process events speculatively
- If a straggler event arrives with earlier timestamp, rollback via state restoration and anti-messages
- Global Virtual Time (GVT) provides a floor below which no rollback can occur — enables fossil collection
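The conservative safety condition reduces to a minimum over input channels; a minimal sketch:

```python
def safe_timestamp(channel_clocks, lookaheads):
    """Conservative synchronization (Chandy-Misra-Bryant): an LP may process
    any local event whose timestamp does not exceed the minimum, over its
    input channels, of (last received timestamp + that channel's lookahead)."""
    return min(t + la for t, la in zip(channel_clocks, lookaheads))

def can_process(event_time, channel_clocks, lookaheads):
    """True if the event is safe: no earlier event can still arrive."""
    return event_time <= safe_timestamp(channel_clocks, lookaheads)
```

Larger lookahead raises the safe bound and directly reduces idle time, which is why conservative PDES performance hinges on extracting lookahead from the model (e.g. minimum service times).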
Scaling Considerations
| Aspect      | Conservative          | Optimistic                  |
|-------------|-----------------------|-----------------------------|
| Correctness | Guaranteed            | Via rollback                |
| Memory      | Low                   | High (state saving)         |
| Performance | Limited by slowest LP | Better for unbalanced loads |
| Complexity  | Moderate              | High (anti-messages, GVT)   |
Cloud and HPC Simulation
- Embarrassingly parallel: Independent replications across nodes (trivial parallelism)
- Parameter sweeps: Each node runs a different parameter combination
- Large-scale PDES: Federated simulations using HLA (High Level Architecture) standard
- GPU acceleration: Thousands of agents or Monte Carlo paths computed in parallel on GPU
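Independent replications parallelize trivially; a sketch using Python's multiprocessing, where the single-server waiting-time model and its parameters are purely illustrative:

```python
import random
from multiprocessing import Pool

def one_replication(seed):
    """One independent replication: average waiting time over 1000 customers
    of a single-server queue, via the Lindley recursion (toy model)."""
    rng = random.Random(seed)                  # each replication gets its own stream
    wait, total = 0.0, 0.0
    for _ in range(1000):
        interarrival = rng.expovariate(1.0)    # arrival rate 1.0
        service = rng.expovariate(1.25)        # service rate 1.25 (utilization 0.8)
        wait = max(0.0, wait + service - interarrival)
        total += wait
    return total / 1000.0

if __name__ == "__main__":
    with Pool() as pool:                       # one replication per worker task
        results = pool.map(one_replication, range(8))
    print(sum(results) / len(results))         # overall mean waiting time
```

Seeding each replication independently keeps the runs statistically independent, so the across-replication sample variance gives valid confidence intervals.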
Distributed Simulation Standards
- HLA (IEEE 1516): Federates communicate through a Runtime Infrastructure (RTI). Used in defense and large-scale training simulations.
- DIS (IEEE 1278): Protocol for real-time military simulations. Simpler than HLA, uses PDUs (Protocol Data Units) over UDP.
Choosing an Approach
| Problem Type                                | Recommended Method         |
|---------------------------------------------|----------------------------|
| Few alternatives, need guarantee            | Ranking and Selection      |
| Continuous parameters, expensive simulation | Bayesian Optimization      |
| High-dimensional continuous                 | SPSA, stochastic gradient  |
| Combinatorial                               | Metaheuristics (GA, SA)    |
| Sequential decisions                        | Reinforcement Learning     |
| Live system monitoring                      | Digital Twin               |
| Many independent runs                       | Parallel replications      |