Visual design is the substance of your test. **Test structure is the backbone.**

In this post, I will introduce **my preferred terms** for describing test structure (things like test conditions, goals, and pages), and I’ll use a **visual language** to cover the basic patterns. Here’s an example:

A simple test showing Gate (Start), Path, and two Goals (the big black circle is primary).


### Convey extra information and mood

A **slow animation** can communicate weight or importance. A **fast animation** can communicate urgency. Animating a headline adds meaning.

On my breath-hold diving site, the countdown starts slow to create tension or suspense. Then it speeds up to create excitement, illustrate the sheer duration of the world breath-hold record, and of course avoid boring the visitor. The headline animation also doubles as a delay for showing the intro paragraph, which becomes a visual cue to continue downward.

This counter turns the headline into a center-piece, communicates mood, and guides the visitor’s gaze downward.


I use simulations all the time to help answer questions like:

- Is this outcome possible?
- What outcomes are most likely?
- How much data is enough?

Simulations can **give an answer faster** than detailed calculations. They are **less precise but far more intuitive.** If you run a simulation 10 times and get a certain outcome even once, you know it’s possible. If you get it a few times, you know it’s quite likely. **If you want more confidence, just rerun** the simulation 10 or 100 or 1000 more times.
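The rerun idea can be sketched in a few lines of JavaScript. All numbers here (5% baseline, a true +10% lift, 2,000 visitors per variation) are illustrative assumptions, not data from any real test:

```javascript
// Simulate one variation: count conversions among `visitors` at a true rate
function simulateConversions(rate, visitors) {
  let conversions = 0;
  for (let i = 0; i < visitors; i++) {
    if (Math.random() < rate) conversions++;
  }
  return conversions;
}

// One simulated test: does B (a true +10% lift) beat A in this sample?
function bWins(baseRate, lift, visitorsPerArm) {
  const a = simulateConversions(baseRate, visitorsPerArm);
  const b = simulateConversions(baseRate * (1 + lift), visitorsPerArm);
  return b > a;
}

// Rerun many times; the win frequency tells you how likely the outcome is
function winFrequency(runs, baseRate, lift, visitorsPerArm) {
  let wins = 0;
  for (let i = 0; i < runs; i++) {
    if (bWins(baseRate, lift, visitorsPerArm)) wins++;
  }
  return wins / runs;
}

console.log(winFrequency(10, 0.05, 0.10, 2000));   // rough answer, instant
console.log(winFrequency(1000, 0.05, 0.10, 2000)); // more confidence
```

Ten runs give a rough answer in an instant; a thousand runs trade a little time for a much more stable frequency.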

### What if?

In the previous post, I included an under-powered simulation in Excel, where we ended up with a 23.7% drop instead of the true +10% lift. Using that template, you can set up a simulation in seconds.


### Simulated visitors, power, and significance on one page

Simulations can answer key questions without painful calculations. If you haven’t gotten around to learning R, here’s an A/B test simulator for Excel or Google Docs. It does a **power calculation**, so you can see the impact that baseline conversion rate, effect size, and traffic have on your chances of detecting an effect. It gives the effect size and **p-value** for each outcome.
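The same kind of power simulation can be sketched in a few lines of JavaScript. This is not the spreadsheet’s internals, just a rough equivalent using a pooled z-test and illustrative parameters (5% baseline, a +15% effect, 13,500 visitors per variation):

```javascript
// Simulate one variation: count conversions among `visitors` at a true rate
function simulateConversions(rate, visitors) {
  let c = 0;
  for (let i = 0; i < visitors; i++) if (Math.random() < rate) c++;
  return c;
}

// Standard normal CDF (Abramowitz & Stegun 26.2.17 polynomial approximation)
function normCdf(z) {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp((-z * z) / 2);
  const p = d * t * (0.3193815 + t * (-0.3565638 +
    t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}

// Two-sided p-value for a difference in conversion rates (pooled z-test)
function pValue(convA, nA, convB, nB) {
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  const z = (convB / nB - convA / nA) / se;
  return 2 * (1 - normCdf(Math.abs(z)));
}

// Power = fraction of simulated tests where the true lift reaches p < 0.05
function estimatePower(runs, baseRate, lift, visitorsPerArm) {
  let significant = 0;
  for (let i = 0; i < runs; i++) {
    const a = simulateConversions(baseRate, visitorsPerArm);
    const b = simulateConversions(baseRate * (1 + lift), visitorsPerArm);
    if (pValue(a, visitorsPerArm, b, visitorsPerArm) < 0.05) significant++;
  }
  return significant / runs;
}

console.log(estimatePower(500, 0.05, 0.15, 13500));
```

Each call to `estimatePower` is itself a simulation of 500 complete A/B tests; the returned fraction is the estimated power at the 0.05 significance level.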

Download Excel A/B test simulator

### Example

Here is the default simulation (every time you load it or refresh it, it’s different):

*The default simulation settings and outcome*


A hypothesis is an explanation of why something is the way it is.

### Example business hypothesis:

“We are a new company, and visitors have doubts about the quality of our product.”

### Do we really create hypotheses to test them?

To see if my example hypothesis is true, it would be best to talk to some potential customers. A/B testing is **not really about testing business hypotheses but about using them** to iterate a design. An A/B tester is not a scientist. He takes the **hypothesis as inspiration** for new visual treatments in order to increase his chances of raising revenue.


### Not a precise way to reduce false positive risk

One argument for A/A/B/B tests is reducing the risk that B will win just by chance (false positive). The thinking is that unless both identical Bs outperform both identical As, you should reject the winner.

**Yes, when you require the two As to match, you’re lowering your false positive risk**. This is simply because you are **indirectly choosing a lower significance level cut-off**. Let’s see an example.

In the next simulation, *exactly the same data* is shown as an A/B test and as an A/A/B/B test. A has a baseline conversion rate of 5% with 0.5% variation. B is a true 15% improvement. The total sample of 27,000 visitors has good power:

*The A/B result with p-value 0.002 is rejected because the two B subsamples did not both win and differ drastically in performance*
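You can check the mechanism yourself with a quick null simulation in JavaScript. Here B is identical to A, so any significant win is a false positive; we compare the plain false positive rate with the rate after additionally requiring both B subsamples to beat both A subsamples. The 5% rate, 2,000 visitors per subsample, and the pooled z-test are my own assumptions for illustration:

```javascript
// Simulate one subsample: count conversions among `visitors` at a true rate
function simulateConversions(rate, visitors) {
  let c = 0;
  for (let i = 0; i < visitors; i++) if (Math.random() < rate) c++;
  return c;
}

// Standard normal CDF (Abramowitz & Stegun 26.2.17 polynomial approximation)
function normCdf(z) {
  const t = 1 / (1 + 0.2316419 * Math.abs(z));
  const d = 0.3989423 * Math.exp((-z * z) / 2);
  const p = d * t * (0.3193815 + t * (-0.3565638 +
    t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return z > 0 ? 1 - p : p;
}

// Two-sided p-value for a difference in proportions (pooled z-test)
function pValue(convA, nA, convB, nB) {
  const pooled = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / nA + 1 / nB));
  const z = (convB / nB - convA / nA) / se;
  return 2 * (1 - normCdf(Math.abs(z)));
}

function falsePositiveRates(runs, rate, nPerSubsample) {
  let plain = 0, filtered = 0;
  for (let i = 0; i < runs; i++) {
    const a1 = simulateConversions(rate, nPerSubsample);
    const a2 = simulateConversions(rate, nPerSubsample);
    const b1 = simulateConversions(rate, nPerSubsample); // B is identical to A
    const b2 = simulateConversions(rate, nPerSubsample);
    const n2 = 2 * nPerSubsample;
    if (b1 + b2 > a1 + a2 && pValue(a1 + a2, n2, b1 + b2, n2) < 0.05) {
      plain++; // significant "win" for B: a false positive
      // extra A/A/B/B requirement: both Bs beat both As
      if (Math.min(b1, b2) > Math.max(a1, a2)) filtered++;
    }
  }
  return { plain: plain / runs, filtered: filtered / runs };
}

console.log(falsePositiveRates(2000, 0.05, 2000));
```

The filtered rate never exceeds the plain rate: the extra requirement simply discards some significant winners, which is what a stricter significance cut-off does, only less precisely.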


### Try a complete statement

A visitor should **learn something just by reading your headings**. Write informative headings. Don’t save content for later.

On HelpTheChickens.ca, we can replace the generic headline with text from their intro paragraph:


### Example

- Baseline 100 sales out of 2000 visitors (5.0% rate)
- Variation B 130 sales out of 2100 visitors (6.2% rate)

B did better than Baseline this time, but what **if we tested again**? How trustworthy is this result?

**abstats.js** is a small **library** that gives you 5 ways to answer this question. You need **no programming skills** to use it. It’s available on this page, so open up the **browser console** and run any of the examples in this article.

### Way #1: Estimate the true conversion rates

With abstats.js, I can easily get 95% confidence intervals for each variation:

`interval_binary(100, 2000, 0.95) // returns {upper=0.0605, point=0.0509, lower=0.0413}`

`interval_binary(130, 2100, 0.95) // returns {upper=0.0731, point=0.0628, lower=0.0524}`

This gives me the point estimate and the margin of error for A and B:

This says that my best estimates for the true conversion rates are Baseline = 5.1% and B = 6.3%. However, A could be as high as 6%, while B could be as low as 5.2%. So, it’s plausible that B is actually worse than A but performed better just by chance. **How likely is that to happen?**
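This excerpt doesn’t show `interval_binary`’s source, but the numbers above are reproduced to within a rounding digit by the Agresti-Coull adjusted confidence interval, so here is a hypothetical re-implementation sketch under that assumption:

```javascript
// Hypothetical sketch of interval_binary, assuming an Agresti-Coull
// adjusted interval. This is a guess at how abstats.js arrives at its
// numbers, not its actual source code.
function intervalBinary(conversions, visitors, confidence) {
  if (confidence !== 0.95) throw new Error('sketch hard-codes z for 95% only');
  const z = 1.959964; // two-sided 95% z-score
  // Agresti-Coull adjustment: add z^2/2 successes and z^2 trials
  const nAdj = visitors + z * z;
  const pAdj = (conversions + (z * z) / 2) / nAdj;
  const margin = z * Math.sqrt((pAdj * (1 - pAdj)) / nAdj);
  return { upper: pAdj + margin, point: pAdj, lower: pAdj - margin };
}

console.log(intervalBinary(100, 2000, 0.95)); // ≈ {upper: 0.0605, point: 0.0509, lower: 0.0412}
console.log(intervalBinary(130, 2100, 0.95)); // ≈ {upper: 0.0731, point: 0.0627, lower: 0.0523}
```

Under this assumption, the adjustment also explains why the point estimate (5.09%) sits slightly above the raw 5.0% rate: the formula adds roughly two extra successes and four extra trials before computing the interval.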


In the **town of Perfectville**, a company ran a winning A/B test with a 20% lift. A few weeks after implementing the winner, they checked their daily conversions data:

*97 days of daily conversion rates in Perfectville showing 20% lift*

The graph perfectly relates what happened: the baseline increased by 10% during the test, with half the traffic exposed to the winning variation. Then came the week when the test was stopped, followed by a lift of 20% once the winner was implemented.

The good people **in nearby Realville** heard about this and ran the test on their site. When they later checked their daily conversions data, they scratched their heads (as they often do in Realville):

*97 days of daily conversion rates in Realville showing same improvement*

The data actually includes the same 10% lift during the test, a gap, and a final 20% improvement. The problem is that the improvement is small relative to the natural fluctuations in daily conversion rates, so a 20% improvement doesn’t necessarily show up as a visible 20% lift.
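To get a feel for how big those natural fluctuations are, here is a small simulation with assumed numbers (a constant 5% true rate, 500 visitors per day, 30 days), which are illustrations rather than Realville’s actual data:

```javascript
// One day's observed conversion rate at a constant true rate:
// binomial noise alone makes this bounce around day to day
function dailyRate(trueRate, visitorsPerDay) {
  let conversions = 0;
  for (let i = 0; i < visitorsPerDay; i++) {
    if (Math.random() < trueRate) conversions++;
  }
  return conversions / visitorsPerDay;
}

const days = [];
for (let d = 0; d < 30; d++) days.push(dailyRate(0.05, 500));
const min = Math.min(...days);
const max = Math.max(...days);
console.log(
  `30 days at a true 5% rate: daily rates ranged from ` +
  `${(min * 100).toFixed(1)}% to ${(max * 100).toFixed(1)}%`
);
// At 500 visitors/day, one standard deviation is about 1 percentage point
// (roughly 20% relative), so a 20% lift hides inside the day-to-day noise.
```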

Here are 6 reasons why people in Realville might find it difficult to see a lift and what they can do about it.


© 2015 License for all content: Attribution not required. No commercial use.