vlad malik

Now thinking about:

Scientists test hypotheses, A/B testers use hypotheses 3 weeks ago

A hypothesis is an explanation of why something is the way it is.

Example hypothesis:

“We are a new company, and visitors have doubts about the quality of our product.” It’s practically impossible to test this hypothesis with an A/B test. A/B testers test visual treatments, not business hypotheses.

Is it true?   vs.   Assuming it’s true, how can we use it?

A scientist designs an experiment to challenge the hypothesis and see if it holds up. This methodical, very precise experiment may have no immediate business benefit.

An A/B tester takes the hypothesis as inspiration and builds on it to create a new design. The purpose of testing the new design is to confirm that the new visual treatment has immediate business benefit. Even if successful, such a test does not mean that the business hypothesis is actually true.

Same business hypothesis, a million visual implementations

If we think that quality is a concern for our visitors, we can add customer testimonials, quality certifications, manufacturing details, reframe “new company” as “innovative”, and so on. For each of these many strategies, there are a million possible visual implementations.

A/B tests are so multi-faceted that they rarely, if ever, provide compelling evidence regarding a business hypothesis. If we tried customer testimonials and the test was a success, did we prove the hypothesis? No. Maybe the testimonials drew more attention to customer service, and quality was never the problem. If the test failed, do we reject the hypothesis? No. Maybe a different way of presenting testimonials would work.

Null/alternative hypotheses are not business hypotheses

A null hypothesis for experimental purposes says that “there is no difference between the things compared”, while the alternative hypothesis is that “there is a difference”. But statistical hypotheses have nothing to do with the business hypothesis.
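To make the distinction concrete, here is a minimal sketch of the statistical hypothesis a two-proportion z-test actually evaluates. This is hypothetical code for illustration only; the numbers and function are mine, not from the article. The null hypothesis says only that the two true conversion rates are equal; it says nothing about why visitors behave the way they do.

// Hypothetical sketch: two-proportion z-test.
// H0: the true conversion rates of A and B are equal. H1: they differ.
// Neither statement mentions the business hypothesis behind the design change.
function zTestTwoProportions(convA, visitsA, convB, visitsB) {
  var pA = convA / visitsA;
  var pB = convB / visitsB;
  var pooled = (convA + convB) / (visitsA + visitsB);
  var se = Math.sqrt(pooled * (1 - pooled) * (1 / visitsA + 1 / visitsB));
  return (pB - pA) / se; // |z| > 1.96 rejects H0 at the 95% level
}

zTestTwoProportions(100, 2000, 130, 2100); // ≈ 1.66, so H0 is not rejected at 95%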

Keep reading

Reader should get your main points from headings alone 4 months ago

Try a complete statement

A visitor should learn something just by reading your headings. Write informative headings. Don’t save content for later.

On HelpTheChickens.ca, we can replace the generic headline with text from their intro paragraph:

 

[Image: headlines-chickens]

 

Beware one-word headings. We can replace the generic heading on this site’s membership page with a value proposition and a call to action:

 
[Image: headline-membership]

Keep reading

5 ways to calculate confidence in A/B test results using JavaScript 5 months ago

Example

B did better than Baseline this time, but what if we tested again? How trustworthy is this result?

abstats.js is a small library that gives you 5 ways to answer this question. You need no programming skills to use it. It’s available on this page, so open up the browser console and run any of the examples in this article.

Way #1: Estimate the true conversion rates

With abstats.js, I can easily get 95% confidence intervals for each variation:

interval_binary(100, 2000, 0.95) // returns {upper: 0.0605, point: 0.0509, lower: 0.0413}
interval_binary(130, 2100, 0.95) // returns {upper: 0.0731, point: 0.0628, lower: 0.0524}

This gives me the point estimate and the margin of error for A and B:

[Chart: point estimates and margins of error for A and B]

This says that my best estimates for the true conversion rates are Baseline (A) = 5.1% and B = 6.3%. However, A could be as high as 6.1%, while B could be as low as 5.2%. So it’s plausible that B is actually worse than A but performed better just by chance. How likely is that to happen?
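For intuition, here is a minimal sketch of a 95% confidence interval for a conversion rate using the plain normal approximation. This is hypothetical code, not interval_binary’s implementation; the library’s slightly different numbers suggest it uses a more refined formula.

// Hypothetical sketch: 95% normal-approximation (Wald) interval for a conversion rate.
// Not how abstats.js computes interval_binary; shown only to illustrate the idea.
function waldInterval(conversions, visitors) {
  var p = conversions / visitors;                         // observed conversion rate
  var margin = 1.96 * Math.sqrt(p * (1 - p) / visitors);  // 1.96 is the z-score for 95% confidence
  return { lower: p - margin, point: p, upper: p + margin };
}

waldInterval(100, 2000) // ≈ {lower: 0.0404, point: 0.05, upper: 0.0596}
waldInterval(130, 2100) // ≈ {lower: 0.0516, point: 0.0619, upper: 0.0722}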

Keep reading

I Have An A/B Test Winner, So Why Can’t I See The Lift? 5 months ago

In the town of Perfectville, a company ran a winning A/B test with a 20% lift. A few weeks after implementing the winner, they checked their daily conversions data:

 

[Graph: 97 days of daily conversion rates in Perfectville showing a 20% lift]

 

The graph perfectly captured what happened: the overall conversion rate rose by 10% during the test, since only half the traffic was exposed to the winning variation. Then came the week when the test was stopped, followed by the full 20% lift once the winner was implemented.
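The 10%-during-test figure is just blending arithmetic. Here is a quick sketch with made-up rates (the article doesn’t give Perfectville’s actual numbers): half the traffic at a 20% lift averages out to a 10% lift overall.

// Hypothetical rates, for illustration only.
var baselineRate = 0.05;                                 // assumed baseline conversion rate
var winnerRate = baselineRate * 1.2;                     // winning variation: a 20% lift
var duringTest = 0.5 * baselineRate + 0.5 * winnerRate;  // 50/50 traffic split during the test
console.log(duringTest / baselineRate - 1);              // ≈ 0.10 -> 10% overall lift during the test
console.log(winnerRate / baselineRate - 1);              // ≈ 0.20 -> full 20% lift after rollout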

The good people in nearby Realville heard about this and ran the test on their site. When they later checked their daily conversions data, they scratched their heads (as they often do in Realville):

 

[Graph: 97 days of daily conversion rates in Realville showing the same improvement]

 

The data actually includes the same 10% lift during the test, the same gap, and the same final 20% improvement. The problem is that the improvement is masked by natural fluctuations in daily conversion rates, so a real 20% lift doesn’t necessarily show up as an obvious 20% jump on the graph.
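To see how ordinary noise can swallow a real lift, here is a minimal simulation. Every number in it is an assumption for illustration, not Realville’s data.

// Hypothetical simulation: daily conversion rates before and after a 20% lift.
// 500 visitors/day and a 5% baseline rate are assumptions, not figures from the article.
function simulateDay(visitors, rate) {
  var conversions = 0;
  for (var i = 0; i < visitors; i++) {
    if (Math.random() < rate) conversions++;
  }
  return conversions / visitors;
}

var before = [];
var after = [];
for (var d = 0; d < 30; d++) {
  before.push(simulateDay(500, 0.05));  // baseline: 5%
  after.push(simulateDay(500, 0.06));   // after rollout: 6%, i.e. a 20% lift
}
console.log(before, after); // daily swings of a point or two easily hide a 1-point lift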

Here are 6 reasons why people in Realville might find it difficult to see a lift and what they can do about it.

Keep reading

@VladMalik is an interaction designer based in Toronto.
I enjoy breath-hold diving, weight-lifting, and chopping wood.
I eat only plant foods.