vlad malik

Now thinking about:

Scientists test hypotheses, A/B testers use hypotheses 2 months ago

A hypothesis is an explanation of why something is the way it is.

Example business hypothesis:

“We are a new company, and visitors have doubts about the quality of our product.” It’s practically impossible to test this hypothesis with an A/B test. A/B testers test visual treatments, not business hypotheses.

Is it true?   vs.   Assuming it’s true, how can we use it?

A scientist designs an experiment to challenge the hypothesis and see if it holds up. This methodical, very precise experiment may have no immediate business benefit.

An A/B tester takes the hypothesis as inspiration and builds on it to create a new design. The purpose of testing the new design is to confirm that the new visual treatment has immediate business benefit. Even if successful, such a test does not mean that the business hypothesis is actually true.

Hypotheses provide context, encourage intentionality, lead to stronger concepts

This doesn’t always mean writing formal-sounding statements. It means challenging yourself and your team to justify visual changes and weed out implausible theories.

A hypothesis is a theme that helps focus our efforts. Stating explicit hypotheses can help organize and generate ideas for visual treatments. “Hey, I think our pricing might be confusing. Let’s attack that with a few tests next month.” Whether pricing truly is confusing or not, testing new ideas might reveal better ways of doing pricing.

Hypotheses help reduce exposure to false positives. These days it’s easy to collect tons of data and then make numerous comparisons until you find a significant effect just by chance. In contrast, a well-reasoned hypothesis allows “prior information” to inform test results. For example, say we have some good reason to expect X to create a positive change in user behavior Y, and then we see this confirmed in a test. That result has a higher chance of being trustworthy compared to a random finding.
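The multiple-comparisons problem can be sketched with a small simulation. This is a hypothetical illustration (the function names, traffic numbers, and trial counts are made up, not from any real test): we run many A/A comparisons where nothing actually differs, and count how often a two-proportion z-test comes up “significant” anyway.

```javascript
// Hypothetical sketch: why unplanned comparisons inflate false positives.
// We simulate A/A tests (no real difference) and count how often a
// two-proportion z-test reaches significance purely by chance.

function zTest(convA, nA, convB, nB) {
  const pA = convA / nA, pB = convB / nB;
  const pPool = (convA + convB) / (nA + nB);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / nA + 1 / nB));
  return (pB - pA) / se; // z-score
}

function simulateBinomial(n, p) {
  let conversions = 0;
  for (let i = 0; i < n; i++) if (Math.random() < p) conversions++;
  return conversions;
}

const trials = 200, n = 2000, p = 0.05;
let falsePositives = 0;
for (let t = 0; t < trials; t++) {
  const a = simulateBinomial(n, p), b = simulateBinomial(n, p);
  if (Math.abs(zTest(a, n, b, n)) > 1.96) falsePositives++;
}
console.log(`${falsePositives} of ${trials} A/A tests were "significant"`);
```

Roughly 5% of these comparisons come up significant even though nothing changed, which is exactly the exposure that a well-reasoned prior hypothesis helps protect against.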

Null/alternative hypotheses are not business hypotheses

A null hypothesis for experimental purposes says that “there is no difference between the things compared”, while the alternative hypothesis is that “there is a difference”. But statistical hypotheses have nothing to do with the business hypothesis.

If you run a test and get a statistically significant result, you have indirect evidence about whether a visual treatment caused an effect, but you have no evidence about your business hypothesis, which may be true or false, who knows.

Same business hypothesis, a million visual implementations

If we think that quality is a concern for our visitors, we can add customer testimonials, quality certifications, manufacturing details, reframe “new company” as “innovative”, and so on. For each of these many strategies, there are a million possible visual implementations.

A/B tests are so multi-faceted that they rarely if ever provide compelling evidence regarding a business hypothesis. If we tried customer testimonials and the test was a success, did we prove the hypothesis? No. Maybe testimonials drew more attention to customer service, and quality was never the problem. If the test failed, do we reject the hypothesis? No. Maybe a different way of doing testimonials would work.

If a test fails, we should not give up on a business hypothesis too quickly. Even though an A/B test is usually insufficient to test a business hypothesis, if you attack it from multiple angles, both subtle and more direct, you might get better insight into the business hypothesis. You should test important business hypotheses with more direct methods like user research, surveys, and so on.

Reader should get your main points from headings alone 5 months ago

Try a complete statement

A visitor should learn something just by reading your headings. Write informative headings. Don’t save content for later.

On HelpTheChickens.ca, we can replace the generic headline with text from their intro paragraph:




Beware one-word headings. We can replace the generic heading on this site’s membership page with a value proposition and a call to action:


Keep reading

5 ways to calculate confidence in A/B test results using JavaScript 6 months ago


B did better than Baseline this time, but what if we tested again? How trustworthy is this result?

abstats.js is a small library that gives you 5 ways to answer this question. You need no programming skills to use it. It’s loaded on this page, so open up the browser console and run any of the examples in this article.

Way #1: Estimate the true conversion rates

With abstats.js, I can easily get 95% confidence intervals for each variation:

interval_binary(100, 2000, 0.95) // returns {upper: 0.0605, point: 0.0509, lower: 0.0413}
interval_binary(130, 2100, 0.95) // returns {upper: 0.0731, point: 0.0628, lower: 0.0524}

This gives me the point estimate and the margin of error for A and B:


This says that my best estimates for the true conversion rates are Baseline = 5.1% and B = 6.3%. However, Baseline could be as high as 6%, while B could be as low as 5.2%. So it’s plausible that B is actually worse than Baseline but performed better just by chance. How likely is that to happen?
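The internals of abstats.js aren’t shown here, but a rough stand-in for interval_binary can be sketched with the normal (Wald) approximation. This is a hypothetical implementation for illustration only; the library’s point estimate for 100/2000 above is 0.0509 rather than 0.05, which suggests it uses an adjusted method, so its numbers differ slightly from this sketch.

```javascript
// Hypothetical sketch of a binomial confidence interval using the
// normal (Wald) approximation; abstats.js likely uses a different,
// adjusted method, so its numbers differ slightly.
function intervalBinary(conversions, visitors, confidence) {
  const p = conversions / visitors;             // point estimate
  const z = confidence === 0.95 ? 1.96 : 2.576; // z for 95% or 99%
  const se = Math.sqrt(p * (1 - p) / visitors); // standard error
  return {
    upper: p + z * se,
    point: p,
    lower: p - z * se,
  };
}

console.log(intervalBinary(100, 2000, 0.95));
// ≈ { upper: 0.0596, point: 0.05, lower: 0.0404 }
```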

Keep reading

I Have An A/B Test Winner, So Why Can’t I See The Lift? 6 months ago

In the town of Perfectville, a company ran a winning A/B test with a 20% lift. A few weeks after implementing the winner, they checked their daily conversions data:



97 days of daily conversion rates in Perfectville showing 20% lift


The graph perfectly captured what happened: The baseline increased by 10% during the test, with half the traffic exposed to the winning variation. Then came a week when the test was stopped, followed by a lift of 20% once the winner was implemented.

The good people in nearby Realville heard about this and ran the test on their site. When they later checked their daily conversions data, they scratched their heads (as they often do in Realville):



97 days of daily conversion rates in Realville showing same improvement


The data actually includes the same 10% lift during the test, a gap, and a final 20% improvement. The problem is that the improvement is small relative to the natural day-to-day fluctuations in conversion rates, so a 20% improvement doesn’t necessarily look like a 20% lift on the graph.
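A quick way to see why fluctuations hide a lift is to simulate noisy daily data. This is a hypothetical sketch (the traffic numbers and rates are made up, not Realville’s): each day’s conversion rate is binomial noise around a true rate, with a real 20% lift in the second period.

```javascript
// Hypothetical sketch: daily conversion rates are noisy, so a real 20%
// lift can be hard to spot by eye on a daily graph.
function dailyRate(visitors, p) {
  let conversions = 0;
  for (let i = 0; i < visitors; i++) if (Math.random() < p) conversions++;
  return conversions / visitors;
}

const visitorsPerDay = 2000;
// 30 days at a 5% true rate, then 30 days at 6% (a real 20% lift)
const before = Array.from({ length: 30 }, () => dailyRate(visitorsPerDay, 0.05));
const after  = Array.from({ length: 30 }, () => dailyRate(visitorsPerDay, 0.06));

const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
console.log('before avg:', mean(before).toFixed(4));
console.log('after  avg:', mean(after).toFixed(4));
```

Individual days swing well above and below their period’s average, so the lift only becomes obvious when you compare averages over many days, not day-to-day points.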

Here are 6 reasons why people in Realville might find it difficult to see a lift and what they can do about it.

Keep reading

@VladMalik is an interaction designer based in Toronto.
I enjoy breath-hold diving, weight-lifting, and chopping wood.
I eat only plant foods.