Tip of the Week #29


Monte Carlo Stopping Rule, Part 2

In discussing this with Eric Wainwright, Decisioneering's chief scientist, in May 1999, I gained a clearer view of how we should stop trials in a Monte Carlo simulation. Here is a revised approach, replacing what appeared earlier on this page.
Crystal Ball 2000 and later versions feature Precision Control, a stopping function that uses the "bootstrapping" method. Decisioneering has applied for a patent on their algorithm, which I haven't seen. Precision Control can use a variety of statistics in its stopping rules.

Prior Tip 25, Monte Carlo Stopping Rule, Part 1, discussed the standard error of the mean (SEM) statistic as a measure of the precision in calculating a sample mean. It turns out that the conventionally calculated SEM is too large (it overstates the error) when the simulation uses Latin hypercube sampling (LHS).

This tip offers a stopping-rule procedure for use with LHS or some variant of LHS.

What is LHS?

Conventional Monte Carlo sampling uses a random number, ranging from 0 to 1, to sample distribution functions. If you examine a series of random numbers, you will not see a uniform distribution. Instead, you will find gaps and clusters. LHS is an anti-clustering technique. If we use 100 as the number of layers, the system generates random numbers so that each .01 interval on the 0-1 range gets exactly one sample in every 100 trials. This avoids clustering in some intervals and missing samples in others.
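To make the layering idea concrete, here is a minimal one-dimensional LHS sketch in Python (my own illustration, not any vendor's algorithm). It draws one uniform random number from each of 100 equal-width layers, shuffles the layer order, and maps the results through the inverse CDF of the distribution being sampled (a normal distribution is assumed purely for illustration):

    import random
    from statistics import NormalDist

    def lhs_uniforms(n_layers=100):
        """Return n_layers values on (0,1): exactly one from each 1/n_layers interval."""
        u = [(layer + random.random()) / n_layers for layer in range(n_layers)]
        random.shuffle(u)   # remove the ordering so successive trials look random
        return u

    # Sample a Normal(mean=10, sd=2) input with 100 LHS trials.
    dist = NormalDist(mu=10.0, sigma=2.0)
    samples = [dist.inv_cdf(u) for u in lhs_uniforms(100)]

Every .01 interval on the 0-1 range receives exactly one of the 100 trials, which is the anti-clustering property described above.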

The Latin Hypercube Sampling name and idea come from a Latin squares geometric construction by the 18th-century Swiss mathematician Leonhard Euler. Design of experiments (a.k.a. experimental design) also uses a similar technique.

Here is the process that I use with LHS in one dimension (a code sketch follows the steps):

  1. Determine a "batch size" for convergence testing. Consider a batch size that is the lesser of:
      (a) the sample size used for Latin hypercube and correlation sampling in the simulation tool, or
      (b) 1/5 (rounded) of the minimum number of trials anticipated.

  2. After several batch-size intervals, determine whether there have been sufficient trials to assure adequate convergence (the test appears in Steps 5 and 6 below). Determine this by either of the following methods:

  3. Method A. As the simulation is running, accumulate the sample values in batch-sized sets. After completing at least five batch sets, begin the convergence check: calculate the sample mean of each batch set.

    Method B. This uses bootstrapping, a statistical technique involving resampling the data. With the trial values generated thus far:
    (a) Randomly assign the values to batch-size groups (obtaining these by sampling without replacement is ideal).
    (b) Calculate the sample mean of each batch set.

  4. Calculate the standard deviation of the sample means. (This would be a good approximation of the SEM for a simulation of one batch-size of trials.)

  5. Adjust the SEM calculated in the prior step for the actual number of trials in the simulation run. For example, if the batch size is 100 and 500 trials were run, then the SEM of the sample mean is
       SEM500 ≈ SEM100 x sqrt(100/500).

  6. Stop when the SEM is < 1% of the mean (or whatever stopping threshold you choose). Otherwise, return to Step 2 and perform another batch-size of simulation trials.
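Here is a minimal Python sketch of the procedure above, using Method A to generate the batch means (Method B is shown as a separate helper). The function names, the toy model, and the 1% threshold are my own illustrative choices; to keep the example self-contained, the toy model uses plain random sampling, whereas a real tool would generate the trials with LHS.

    import random
    from statistics import mean, stdev

    def run_batch(model, batch_size):
        """Method A: run one batch-size set of trials and return its sample mean."""
        return mean(model() for _ in range(batch_size))

    def bootstrap_batch_means(values, batch_size, n_batches):
        """Method B: resample the trial values generated so far into batch-size groups."""
        shuffled = random.sample(values, len(values))   # sampling without replacement
        return [mean(shuffled[i * batch_size:(i + 1) * batch_size])
                for i in range(n_batches)]

    def simulate_until_converged(model, batch_size, min_batches=5,
                                 rel_tol=0.01, max_batches=200):
        """Run batches of trials until the adjusted SEM is below rel_tol of the mean."""
        batch_means = []
        while True:
            batch_means.append(run_batch(model, batch_size))      # Steps 2-3
            if len(batch_means) < min_batches:
                continue
            n_trials = len(batch_means) * batch_size
            sem_batch = stdev(batch_means)                         # Step 4
            sem = sem_batch * (batch_size / n_trials) ** 0.5       # Step 5
            overall_mean = mean(batch_means)
            if sem < rel_tol * abs(overall_mean) or len(batch_means) >= max_batches:
                return overall_mean, sem, n_trials                 # Step 6

    # Example with a toy cost model (sum of two uniform components):
    emv, sem, n = simulate_until_converged(
        lambda: random.uniform(5, 15) + random.uniform(0, 10), batch_size=100)
    print(f"EMV = {emv:.3f} +/- {sem:.3f}  ({n} trials)")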

If you need precision better than the SEM calculated as above, then run more trials in multiples of the sample size. SEM should decrease in proportion to 1/sqrt(NoTrials). Recall that a 68% confidence interval for the Expected Value is the overall sample mean +/- SEM. The 68% comes from a table of the standard normal distribution. Here is an example statement of the calculation: "We estimate EMV = $1.867 ± .040 million (with 1000 trials)."
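For instance, taking the figures in that statement as given, quadrupling the run from 1000 to 4000 trials should shrink the SEM to roughly 0.040 x sqrt(1000/4000) = $0.020 million.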

This is the best stopping rule that I can offer. Perhaps Eric's Precision Control method in Crystal Ball is better. Compared to conventional Monte Carlo sampling, for most purposes, LHS greatly reduces the number of trials (by a factor of 5 or more) required for the same degree of EV precision.

The Student-t distribution provides an additional refinement, correcting for the small number of values used to estimate the sample standard deviation.
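As a small illustration of that refinement (my own sketch, assuming five batch means and SciPy for the t quantile): with only 4 degrees of freedom, the 95% multiplier is about 2.78 rather than the normal distribution's 1.96.

    from statistics import mean, stdev
    from scipy.stats import t   # SciPy assumed available

    def t_interval(batch_means, batch_size, n_trials, confidence=0.95):
        """Confidence interval for the mean, widened by Student-t for few batch means."""
        sem = stdev(batch_means) * (batch_size / n_trials) ** 0.5
        t_crit = t.ppf(0.5 + confidence / 2, df=len(batch_means) - 1)  # ~2.78 for 5 batches
        m = mean(batch_means)
        return m - t_crit * sem, m + t_crit * sem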

Multi-dimensioned LHS
Other people have likely thought of this as well: if there are n very important uncertainties in the problem, imagine an n-dimensional array. Let's say n = 3, so we can call the array a cube. Suppose that we divide the cube into 10 intervals along each dimension. There are 1000 cells in the cube. Sample each dimension's distribution within its cube-cell interval using LHS. In every 1000 trials, we obtain one value in each cell for each distribution. Ideally, we run the simulation for a whole-number multiple of 1000 trials. This will converge much faster than LHS performed separately on each variable.
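Here is a minimal sketch of that cube idea for n = 3 with 10 intervals per dimension (my own illustration of the scheme described above; the three normal distributions are placeholders). Each 1000-trial pass visits every cell of the cube exactly once, and each variable is sampled within its cell's interval:

    import itertools
    import random
    from statistics import NormalDist

    def cube_pass(dists, n_intervals=10):
        """One pass: one trial per cell of the len(dists)-dimensional grid."""
        cells = list(itertools.product(range(n_intervals), repeat=len(dists)))
        random.shuffle(cells)                      # randomize the order of the trials
        trials = []
        for cell in cells:
            # Sample each variable uniformly within its own 1/n_intervals stratum.
            u = [(layer + random.random()) / n_intervals for layer in cell]
            trials.append([d.inv_cdf(ui) for d, ui in zip(dists, u)])
        return trials                              # n_intervals**n trials per pass

    inputs = [NormalDist(10, 2), NormalDist(50, 5), NormalDist(0, 1)]   # placeholders
    trials = cube_pass(inputs)   # 1000 trials; run whole multiples of this for the simulation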

Update: In searching palisade.com for a technique related to LHS, I came across a note in their knowledge base, "6.23 Confidence intervals in @RISK" (http://kb.palisade.com/index.php?pg=kb.page&id=124). It describes a similar process and cites a published paper: Michael Stein, "Large Sample Properties of Simulations Using Latin Hypercube Sampling," Technometrics 29:2 (May 1987), pages 143-151. A link to a posted copy of the paper is inside the Palisade article. (Otherwise, the paywalled copy is at http://www.jstor.org/stable/1269769; link working 23-May-2019.)

John Schuyler, January 1998. Last revised 24-May-2019.

Copyright © 1998-2019 by John R. Schuyler. All rights reserved. Permission to copy with reproduction of this notice.