vlad malik

Visual patterns for A/B test structure
2 days ago

Visual design is the substance of your test. Test structure is the backbone.

In this post, I will introduce my preferred terms for describing test structure (things like test conditions, goals, and pages), and I’ll use a visual language to cover the basic patterns. Here’s an example:

Example test

A simple test showing Gate (Start), Path, and two Goals (the big black circle is primary).
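If it helps, the same structure can also be written down as data. Here is a rough sketch, purely my own illustration (the field names below are mine, not terms from the visual language):

// Illustrative only: the example test above encoded as a plain data structure.
// "gate" is where visitors enter, "path" is the page sequence they follow,
// and "goals" are the conversions we measure (one marked primary).
var exampleTest = {
  gate: "/landing",
  path: ["/landing", "/signup"],
  goals: [
    { name: "signup", primary: true },   // the big black circle
    { name: "newsletter", primary: false }
  ]
};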


Kick-ass headline animations convey more meaning
5 days ago

Convey extra information and mood

A slow animation can communicate weight or importance. A fast animation can communicate urgency. Animating a headline adds meaning.

On my breath-hold diving site, the countdown starts slow to create tension or suspense. Then it speeds up to create excitement, illustrate the sheer duration of the world breath-hold record, and, of course, avoid boring the visitor. The headline animation also doubles as a delay for showing the intro paragraph, which becomes a visual cue to continue downward.

Animated counter from zero to 11:35

This counter turns the headline into a centerpiece, communicates mood, and guides the visitor’s gaze downward.
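For the curious, a counter like this takes only a few lines of JavaScript. Here is a minimal sketch under my own assumptions (the element IDs, duration, and easing are mine, not the site's actual code):

// Count a headline from 0:00 up to 11:35, starting slow and speeding up.
// Assumes <h1 id="counter"></h1> and <p id="intro" style="opacity:0"> exist.
var TARGET_SECONDS = 11 * 60 + 35; // 11:35, the world-record breath hold
var DURATION_MS = 6000;            // total animation time (assumed)

function format(seconds) {
  var m = Math.floor(seconds / 60);
  var s = Math.floor(seconds % 60);
  return m + ":" + (s < 10 ? "0" : "") + s;
}

var start = performance.now();
function step(now) {
  var t = Math.min((now - start) / DURATION_MS, 1); // progress from 0 to 1
  var eased = t * t * t;                            // ease-in: slow start, fast finish
  document.getElementById("counter").textContent = format(eased * TARGET_SECONDS);
  if (t < 1) {
    requestAnimationFrame(step);
  } else {
    document.getElementById("intro").style.opacity = 1; // reveal the intro as a cue
  }
}
requestAnimationFrame(step);

The cubic easing is what makes the count feel slow and heavy at first and urgent at the end, and revealing the intro only when the count finishes is what turns the animation into a cue to continue downward.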


Simulations are faster and more intuitive than calculations
4 weeks ago

I use simulations all the time to help answer questions like:

  • Is this outcome possible?
  • What outcomes are most likely?
  • How much data is enough?

Simulations can give an answer faster than detailed calculations. They are less precise but far more intuitive. If you run a simulation 10 times and get a certain outcome even once, you know it’s possible. If you get it a few times, you know it’s quite likely. If you want more confidence, just rerun the simulation 10 or 100 or 1000 more times.
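As a sketch of what I mean, here is the same idea in JavaScript rather than Excel (the traffic and conversion numbers are ones I picked for illustration):

// Simulate one A/B test: n visitors per arm with true rates pA and pB.
function simulateTest(n, pA, pB) {
  var a = 0, b = 0;
  for (var i = 0; i < n; i++) {
    if (Math.random() < pA) a++;
    if (Math.random() < pB) b++;
  }
  return { a: a, b: b };
}

// Rerun the same test 1000 times: 5% baseline, true +10% lift for B.
var runs = 1000, bLost = 0;
for (var r = 0; r < runs; r++) {
  var result = simulateTest(1000, 0.05, 0.055);
  if (result.b < result.a) bLost++; // B looked worse despite being truly better
}
console.log("B lost in " + bLost + " of " + runs + " simulated tests");

With only 1,000 visitors per arm, B loses a surprising share of the runs, which is exactly the kind of answer a simulation gives you at a glance.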

What if?

In the previous post, I included an underpowered simulation in Excel, where we ended up with a 23.7% drop instead of the true +10% lift. Using that template, you can set up a simulation in seconds.


Easy A/B test simulations using Excel / Google Docs
1 month ago

Simulated visitors, power, and significance on one page

Simulations can answer key questions without painful calculations. If you haven’t gotten around to learning R, here’s an A/B test simulator for Excel or Google Docs. It does a power calculation, so you can see the impact that baseline conversion rate, effect size, and traffic have on your chances of detecting an effect. It gives the effect size and p-value for each outcome.

Download Excel A/B test simulator


Here is the default simulation (every time you load it or refresh it, it’s different):


The default simulation settings and outcome
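If you're curious what the power calculation involves, here is a rough JavaScript equivalent of the standard two-proportion formula (my own sketch; the spreadsheet's internals may differ):

// Approximate power of a two-sided two-proportion z-test at alpha = 0.05:
// the chance of detecting a true difference between rates p1 and p2
// given n visitors per variation.
function power(p1, p2, n) {
  var zAlpha = 1.96; // critical z for alpha = 0.05, two-sided
  var pBar = (p1 + p2) / 2;
  var se0 = Math.sqrt(2 * pBar * (1 - pBar) / n);             // SE under H0
  var se1 = Math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n); // SE under H1
  return phi((Math.abs(p2 - p1) - zAlpha * se0) / se1);
}

// Standard normal CDF (Abramowitz-Stegun approximation).
function phi(x) {
  var t = 1 / (1 + 0.2316419 * Math.abs(x));
  var d = 0.3989423 * Math.exp(-x * x / 2);
  var p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return x >= 0 ? 1 - p : p;
}

// Example: 5% baseline, 15% relative lift, 13,500 visitors per variation.
console.log(power(0.05, 0.0575, 13500)); // about 0.78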


Business hypotheses improve A/B tests
3 months ago

A hypothesis is an explanation of why something is the way it is.

Example business hypothesis:

“We are a new company, and visitors have doubts about the quality of our product.”

Do we really create hypotheses to test them?

To see if my example hypothesis is true, it would be best to talk to some potential customers. A/B testing is not really about testing business hypotheses but about using them to iterate a design. An A/B tester is not a scientist. He takes the hypothesis as inspiration for new visual treatments in order to increase his chances of raising revenue.


6 statistical reasons to avoid AAB, ABB, AABB tests
5 months ago

Not a precise way to reduce false positive risk

One argument for A/A/B/B tests is reducing the risk that B will win just by chance (false positive). The thinking is that unless both identical Bs outperform both identical As, you should reject the winner.

Yes, when you require the two As to match, you’re lowering your false positive risk. This is simply because you are indirectly choosing a lower significance level cut-off. Let’s see an example.

In the next simulation, exactly the same data is shown as an A/B test and as an A/A/B/B test. A has an average baseline conversion rate of 5% with 0.5% variation. B is a true 15% improvement. The total sample of 27,000 visitors has good power:



The A/B result with p-value 0.002 is rejected because the two B subsamples did not both win and differ drastically from each other in performance
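You can check the cutoff claim with a quick simulation of your own (a JavaScript sketch, not the Excel workbook): under the null, where B is identical to A, count how often a plain A/B test is falsely significant versus how often the stricter both-Bs-beat-both-As rule also fires.

// Under the null (B identical to A), compare two decision rules:
// 1) naive: pooled A vs pooled B significant at p < 0.05
// 2) strict: also require both B halves to beat both A halves
function binomial(n, p) {
  var k = 0;
  for (var i = 0; i < n; i++) if (Math.random() < p) k++;
  return k;
}

// Two-sided p-value of a two-proportion z-test.
function zTestP(cA, nA, cB, nB) {
  var pPool = (cA + cB) / (nA + nB);
  var se = Math.sqrt(pPool * (1 - pPool) * (1 / nA + 1 / nB));
  return 2 * (1 - phi(Math.abs(cB / nB - cA / nA) / se));
}

// Standard normal CDF (Abramowitz-Stegun approximation).
function phi(x) {
  var t = 1 / (1 + 0.2316419 * Math.abs(x));
  var d = 0.3989423 * Math.exp(-x * x / 2);
  var p = d * t * (0.3193815 + t * (-0.3565638 + t * (1.781478 + t * (-1.821256 + t * 1.330274))));
  return x >= 0 ? 1 - p : p;
}

var runs = 10000, n = 6750, rate = 0.05; // four arms of 6,750 = 27,000 visitors
var naive = 0, strict = 0;
for (var r = 0; r < runs; r++) {
  var a1 = binomial(n, rate), a2 = binomial(n, rate);
  var b1 = binomial(n, rate), b2 = binomial(n, rate);
  var sig = zTestP(a1 + a2, 2 * n, b1 + b2, 2 * n) < 0.05;
  if (sig) naive++;
  if (sig && Math.min(b1, b2) > Math.max(a1, a2)) strict++;
}
console.log("false positive rate: naive " + naive / runs + ", strict " + strict / runs);

The naive rule fires about 5% of the time, as it should. The strict rule fires noticeably less often, which is the same as quietly choosing a lower significance cutoff.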


Pack your headings with content
7 months ago

Try a complete statement

A visitor should learn something just by reading your headings. Write informative headings. Don’t save content for later.

On HelpTheChickens.ca, we can replace the generic headline with text from their intro paragraph:




5 ways to calculate A/B test confidence with 1 line of JavaScript
7 months ago


B did better than Baseline this time, but what if we tested again? How trustworthy is this result?

abstats.js is a small library that gives you 5 ways to answer this question. You need no programming skills to use it. It’s available on this page, so open up the browser console and run any of the examples in this article.

Way #1: Estimate the true conversion rates

With abstats.js, I can easily get 95% confidence intervals for each variation:

interval_binary(100, 2000, 0.95) // returns {upper: 0.0605, point: 0.0509, lower: 0.0413}
interval_binary(130, 2100, 0.95) // returns {upper: 0.0731, point: 0.0628, lower: 0.0524}

This gives me the point estimate and the margin of error for A and B:


This says that my best estimates for the true conversion rates are Baseline = 5.2% and B = 6.2%. However, A could be as high as 6%, while B could be as low as 5.2%. So, it’s plausible that B is actually worse than A but performed better just by chance. How likely is that to happen?
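Incidentally, the numbers above are consistent with an Agresti-Coull (adjusted Wald) interval, so a sketch of what interval_binary might be doing internally could look like this (my own reconstruction, not the abstats.js source):

// Agresti-Coull 95% interval for x conversions out of n visitors.
// A full version would derive z from the confidence argument;
// it is fixed at 1.96 (95%) here to keep the sketch short.
function interval_binary(x, n, confidence) {
  var z = 1.96;
  var nAdj = n + z * z;               // add z^2 pseudo-visitors...
  var point = (x + z * z / 2) / nAdj; // ...and z^2/2 pseudo-conversions
  var margin = z * Math.sqrt(point * (1 - point) / nAdj);
  return { upper: point + margin, point: point, lower: point - margin };
}

console.log(interval_binary(100, 2000, 0.95));
// about { upper: 0.0605, point: 0.0509, lower: 0.0413 }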


I Have An A/B Test Winner, So Why Can’t I See The Lift?
8 months ago

In the town of Perfectville, a company ran a winning A/B test with a 20% lift. A few weeks after implementing the winner, they checked their daily conversions data:



97 days of daily conversion rates in Perfectville showing 20% lift


The graph perfectly reflects what happened: the baseline increased by 10% during the test, since half the traffic converting 20% better lifts the blended rate by 0.5 × 20% = 10%. Then came the week when the test was stopped, followed by a lift of 20% once the winner was implemented.

The good people in nearby Realville heard about this and ran the test on their site. When they later checked their daily conversions data, they scratched their heads (as they often do in Realville):



97 days of daily conversion rates in Realville showing same improvement


The data actually includes the same 10% lift during the test, a gap, and a final 20% improvement. The problem is that the improvement sits on top of natural fluctuations in daily conversion rates, so a 20% improvement doesn’t necessarily look like a 20% lift.
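You can see the problem in a quick simulation (a JavaScript sketch; the daily traffic and baseline are numbers I made up): with binomial noise on a few hundred visitors a day, daily swings are easily as large as the lift itself.

// Simulate noisy daily conversion rates before and after a true +20% lift.
function dailyRate(visitors, trueRate) {
  var conversions = 0;
  for (var i = 0; i < visitors; i++) if (Math.random() < trueRate) conversions++;
  return conversions / visitors;
}

function mean(arr) {
  var sum = 0;
  for (var i = 0; i < arr.length; i++) sum += arr[i];
  return sum / arr.length;
}

var visitorsPerDay = 300, baseline = 0.05; // assumed Realville traffic
var before = [], after = [];
for (var d = 0; d < 30; d++) before.push(dailyRate(visitorsPerDay, baseline));
for (var d = 0; d < 30; d++) after.push(dailyRate(visitorsPerDay, baseline * 1.2));

console.log("30-day average before: " + mean(before).toFixed(4));
console.log("30-day average after:  " + mean(after).toFixed(4));
console.log("worst day after lift:  " + Math.min.apply(null, after).toFixed(4));

At 300 visitors a day and a 5% baseline, one standard deviation of the daily rate is over a full percentage point, so individual days after the lift routinely dip below the old average.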

Here are 6 reasons why people in Realville might find it difficult to see a lift and what they can do about it.


@VladMalik is an interaction designer based in Toronto.
I enjoy breath-hold diving, weight-lifting, and chopping wood.
I eat a purely plant-based diet.

© 2015 License for all content: Attribution not required. No commercial use.