### Lowering alpha is a better way of reducing false positive risk

In an A/B test, B might win just by chance. This risk is called **alpha, or the significance level**. In the absence of other information, **false positive risk is greatest if** you stop the test as soon as you see a statistically significant lift. The risk is lowest when you plan your sample size upfront and stop only when that sample size is reached.

One argument for A/A/B tests is that they reduce the risk of a false positive: you check that the two identical As actually perform the same and that B beats both of them. If either condition fails, you reject the winner.

**Yes, when you require the two As to match, you’re lowering your false positive risk**. This is because you are **indirectly choosing a lower significance level cut-off**. Let’s see an example.

Below is the result from a simulated study: *exactly the same data* shown first as an A/B test and then as an A/A/B/B test. A has an average baseline conversion rate of 5% ± 0.5%, and B is a true 15% improvement. The total sample of 27,000 visitors gives good power:

*A/B result with p-value 0.002 is rejected because the two B subsamples did not both win and differed drastically in performance*

Notice that the **A/B split shows a 17% lift with a very low p-value – an ideal outcome.** However, the A/A/B/B split shows an inflated 40% lift for one B subsample, while the other **B lost to one of the As**. So, **if we required both Bs to beat both As and required subsamples to show similar performance, this winning result would not pass.**

Keep reading
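The effect is easy to demonstrate with a quick simulation. This is my own sketch, not part of abstats.js: the per-arm sample size, the seed, and the use of a simple two-proportion z-test are all illustrative assumptions. It runs many null A/A/B/B tests (B identical to A) and compares how often a plain A/B comparison declares a significant winner versus how often the stricter "both Bs beat both As, and identical subsamples agree" rule does.

```javascript
// Sketch: effective alpha of a plain A/B rule vs. a strict A/A/B/B rule,
// under the null hypothesis (B has no real effect). Illustrative only.

// Deterministic LCG so the run is reproducible.
function makeRng(seed) {
  let s = seed >>> 0;
  return () => {
    s = (1664525 * s + 1013904223) >>> 0;
    return (s + 0.5) / 4294967296;
  };
}

// Approximate Binomial(n, p) draw via the normal approximation
// (fine here, since n * p is large).
function binomialApprox(n, p, rng) {
  const z = Math.sqrt(-2 * Math.log(rng())) * Math.cos(2 * Math.PI * rng());
  return Math.max(0, Math.round(n * p + z * Math.sqrt(n * p * (1 - p))));
}

// Two-proportion z statistic (pooled standard error).
function zStat(x1, n1, x2, n2) {
  const p1 = x1 / n1, p2 = x2 / n2, pool = (x1 + x2) / (n1 + n2);
  const se = Math.sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2));
  return (p1 - p2) / se;
}

const rng = makeRng(42);
const trials = 4000, nArm = 6750, p = 0.05; // 4 arms x 6750 = 27,000 visitors
let plainWins = 0, strictWins = 0;

for (let t = 0; t < trials; t++) {
  const [a1, a2, b1, b2] = [0, 0, 0, 0].map(() => binomialApprox(nArm, p, rng));
  // Plain rule: pooled B beats pooled A with |z| > 1.96 (two-sided alpha = 0.05).
  if (zStat(b1 + b2, 2 * nArm, a1 + a2, 2 * nArm) > 1.96) {
    plainWins++;
    // Strict rule: additionally, both B subsamples beat both A subsamples,
    // and neither pair of identical subsamples differs significantly.
    const bothBsWin = Math.min(b1, b2) > Math.max(a1, a2);
    const subsamplesAgree =
      Math.abs(zStat(a1, nArm, a2, nArm)) < 1.96 &&
      Math.abs(zStat(b1, nArm, b2, nArm)) < 1.96;
    if (bothBsWin && subsamplesAgree) strictWins++;
  }
}

console.log(`plain false positive rate:  ${(plainWins / trials).toFixed(4)}`);
console.log(`strict false positive rate: ${(strictWins / trials).toFixed(4)}`);
```

Because the strict rule is a plain win *plus* extra conditions, it can only ever fire on a subset of the plain wins, so its false positive rate is necessarily lower: exactly the "indirectly lower alpha" effect described above.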

### Try a complete statement

Your top-level heading is the first thing visitors see in Google and on your site, so get your message in there. Think of **headings as content, not mere labels**.

This original home page of HelpTheChickens.ca simply restated the URL:

I suggested they take text from their intro paragraph and turn that into the heading:

### Try being more specific

This stock heading from RelateIQ says nothing about a potentially useful product:

Keep reading

### Example

- Baseline 100 sales out of 2000 visitors (5.0% rate)
- Variation B 130 sales out of 2100 visitors (6.2% rate)

B did better than Baseline this time, but what **if we tested again**? How trustworthy is this result?

**abstats.js** is a small **library** that gives you 5 ways to answer this question. You need **no programming skills** to use it. It’s available on this page, so open up the **browser console** and run any of the examples in this article.

### Way #1: Estimate the true conversion rates

With abstats.js, I can easily get 95% confidence intervals for each variation:

```javascript
interval_binary(100, 2000, 0.95) // returns {upper=0.0605, point=0.0509, lower=0.0413}
interval_binary(130, 2100, 0.95) // returns {upper=0.0731, point=0.0628, lower=0.0524}
```

This gives me the point estimate and the margin of error for A and B:

This says that my best estimates for the true conversion rates are Baseline ≈ 5.1% and B ≈ 6.3%. However, A could be as high as 6.0%, while B could be as low as 5.2%. So, it’s plausible that B is actually worse than A but performed better just by chance. **How likely is that to happen?**
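The exact formula inside `interval_binary` isn't shown here, but a Wilson score interval — a standard choice for binomial proportions — reproduces the numbers above to within rounding, so it's a plausible sketch of what the library computes:

```javascript
// Wilson score interval for a binomial proportion. A plausible sketch of
// what interval_binary computes; abstats.js's actual formula may differ
// slightly in the last digit. z = 1.96 corresponds to 95% confidence.
function wilsonInterval(successes, trials, z = 1.96) {
  const p = successes / trials;
  const z2 = z * z;
  const denom = 1 + z2 / trials;
  // Adjusted point estimate, pulled slightly toward 0.5.
  const center = (p + z2 / (2 * trials)) / denom;
  const margin =
    (z * Math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))) / denom;
  return { upper: center + margin, point: center, lower: center - margin };
}

wilsonInterval(100, 2000); // close to {upper: 0.0605, point: 0.0509, lower: 0.0413}
wilsonInterval(130, 2100); // close to {upper: 0.0731, point: 0.0628, lower: 0.0524}
```

Note that the point estimate is not simply successes/trials: the Wilson adjustment nudges it toward 50%, which is why 100/2000 = 5.0% comes back as 5.1%.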

Keep reading

In the **town of Perfectville**, a company ran a winning A/B test with a 20% lift. A few weeks after implementing the winner, they checked their daily conversions data:

*97 days of daily conversion rates in Perfectville showing 20% lift*

The graph perfectly reflected what happened: the baseline increased by 10% during the test, with half the traffic exposed to the winning variation. Then came the week when the test was stopped, followed by a 20% lift once the winner was implemented.

The good people **in nearby Realville** heard about this and ran the test on their site. When they later checked their daily conversions data, they scratched their heads (as they often do in Realville):

*97 days of daily conversion rates in Realville showing same improvement*

The data actually includes the same 10% lift during the test, a gap, and a final 20% improvement. The problem is that the improvement is comparable in size to the natural day-to-day fluctuations in conversion rate, so a true 20% lift doesn’t necessarily show up as an obvious 20% step in the graph.
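A little arithmetic shows why. The traffic numbers here are my own illustrative assumptions (2,000 visitors per day at a 5% baseline, like the earlier example), not Realville's actual data:

```javascript
// How big are natural day-to-day fluctuations compared to a 20% lift?
// visitorsPerDay and baselineRate are illustrative assumptions.
const visitorsPerDay = 2000;
const baselineRate = 0.05;

// Standard deviation of a single day's observed conversion rate
// (binomial sampling noise).
const dailySd = Math.sqrt(baselineRate * (1 - baselineRate) / visitorsPerDay);
const relativeNoise = dailySd / baselineRate; // ~0.097, i.e. ~9.7% of baseline

console.log(`one-sigma daily noise: ±${(relativeNoise * 100).toFixed(1)}% of baseline`);
console.log(`two-sigma daily swing: ±${(2 * relativeNoise * 100).toFixed(1)}%`);
```

With one-sigma noise of roughly ±10%, individual days routinely swing by nearly ±20% — the same size as the lift itself — which is exactly why Realville's graph doesn't show an obvious step.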

Here are 6 reasons why people in Realville might find it difficult to see a lift and what they can do about it.

Keep reading