I use simulations all the time to help answer questions like: Is this outcome **possible**? What outcomes are **most likely**? **How much data** is enough?

Simulations can **give an answer faster** than detailed calculations. They are **less precise but far more intuitive.** If you run a simulation 10 times and get a certain outcome even once, you know it’s possible. If you get it a few times, you know it’s quite likely. **If you want more confidence, just rerun** the simulation 10 or 100 or 1000 more times.
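The rerun-until-confident idea takes only a few lines outside a spreadsheet, too. Here is a minimal Python sketch; the conversion rates and sample size are illustrative assumptions, not numbers from the post:

```python
import random

def simulate_once(p_base, p_variant, n_per_arm, seed=None):
    """Simulate one A/B test and return the measured relative lift."""
    rng = random.Random(seed)
    conv_a = sum(rng.random() < p_base for _ in range(n_per_arm))
    conv_b = sum(rng.random() < p_variant for _ in range(n_per_arm))
    return conv_b / conv_a - 1 if conv_a else 0.0

# Run the simulation 10 times; for more confidence, just raise range(10).
lifts = [simulate_once(0.05, 0.055, 500, seed=i) for i in range(10)]
negative = sum(lift < 0 for lift in lifts)
print(f"{negative} of 10 runs pointed in the wrong direction")
```

If even one of the 10 runs shows a wrong-direction result, you know that outcome is possible at this sample size; rerunning with a larger `range` tells you how likely.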

### What if?

In the previous post, I included an under-powered simulation in Excel, where we ended up with a 23.7% drop instead of the true +10% lift. Using that template, you can set up a simulation in seconds.

Now, let’s go beyond that example. **What if we collected 14,500 more visitors?** Would such a test be more likely to detect our true effect? Could the measured effect still point in the opposite direction, and if so, by how much? We can add more rows to the template and get some quick answers:

- The **risk of pointing in the opposite direction** is low: a loss of up to 10% would have about a 1 in 100 chance of happening, and would not be statistically significant.
- I should expect the effect to be positive this time, with the **effect range** reliably between +2% and +25%.
- The effect **would need to be inflated by chance** to come out statistically significant, i.e. it would need to be closer to +25% than to the true +10%.
- A total sample of 15,000 is barely adequate.
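The same what-if can be sketched in Python instead of template rows. This is an assumption-laden approximation, not the post's template: I split the 15,000 visitors evenly across two arms and pick a 3% baseline conversion rate, since the template's baseline isn't restated here, so the exact percentages will differ from the numbers above:

```python
import random

def measured_lift(p_base, true_lift, n_per_arm, rng):
    """Simulate one test and return the measured relative lift."""
    p_variant = p_base * (1 + true_lift)
    conv_a = sum(rng.random() < p_base for _ in range(n_per_arm))
    conv_b = sum(rng.random() < p_variant for _ in range(n_per_arm))
    return (conv_b - conv_a) / conv_a if conv_a else 0.0

rng = random.Random(42)
# 15,000 total visitors split evenly; 3% baseline rate is an assumption.
lifts = sorted(measured_lift(0.03, 0.10, 7_500, rng) for _ in range(200))
negative = sum(lift < 0 for lift in lifts) / len(lifts)
lo, hi = lifts[5], lifts[-6]  # central 95% of the 200 measured lifts
print(f"Share of runs pointing in the wrong direction: {negative:.1%}")
print(f"95% of measured lifts fall between {lo:+.0%} and {hi:+.0%}")
```

Sorting the simulated lifts and trimming the tails gives a quick empirical range for the measured effect, which is what the bullet points above summarize.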

### Interpreting unexpected outcomes

What if I ran your test and measured a statistically significant -20% decrease instead? Based on the previous simulation, I would know that the true effect is likely not +10% as I anticipated.

What if I measured a 40% lift? I could conclude that the true effect really is higher than what I predicted. To test this hypothesis, I could rerun the simulation with a lower 5% true lift and confirm that an outcome anywhere near 40% doesn’t happen even once in hundreds of trials.
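That check is easy to script. A minimal sketch, with the same assumptions as before (even split of 15,000 visitors, an assumed 3% baseline rate): rerun hundreds of trials under a 5% true lift and count how often a measured lift of 40% or more shows up.

```python
import random

def measured_lift(p_base, true_lift, n_per_arm, rng):
    """Simulate one test and return the measured relative lift."""
    p_variant = p_base * (1 + true_lift)
    conv_a = sum(rng.random() < p_base for _ in range(n_per_arm))
    conv_b = sum(rng.random() < p_variant for _ in range(n_per_arm))
    return (conv_b - conv_a) / conv_a if conv_a else 0.0

rng = random.Random(7)
# Hundreds of trials under a 5% true lift; 3% baseline is an assumption.
hits = sum(measured_lift(0.03, 0.05, 7_500, rng) >= 0.40 for _ in range(300))
print(f"Measured lifts of 40%+ under a 5% true lift: {hits} of 300 trials")
```

If a 40%+ measured lift essentially never occurs under a modest true lift, the observed 40% is strong evidence that the true effect is larger than predicted.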