Statistical Dashboard & Internal PM Tool

Recency: <2016
Role: Product designer, web developer
Collaboration: Solo design with extensive feedback from a colleague

Background

The Visual Website Optimizer (VWO) dashboard shows the performance of page variants in real time. Based on my conversations with clients, the original dashboard omitted many key metrics and gave users too little guidance on questions like:

  • How’s the test doing now? Any early indications?
  • When do we have enough data to stop? What are the risks and trade-offs to stopping now?
  • What’s on deck to be tested next?

What I Did

I put together some tools to help solve these problems for me and my clients:

  • A statistical library in JavaScript focused on A/B testing (a sketch of its core check appears below)
  • A Greasemonkey script to add the missing metrics and rules to VWO
  • Email status updates using PHP and VWO’s API
  • A landing page explaining the free tool’s benefits
  • A project management tool to track test ideas
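
As a flavour of the statistical library, here is a simplified sketch of the kind of core check it performed: a two-proportion z-test comparing a variation against the control. The function and parameter names are illustrative, not the actual code.

    // Illustrative sketch of the library's core check; names are hypothetical.
    // Standard normal CDF (Abramowitz & Stegun approximation).
    function normCdf(z) {
      const t = 1 / (1 + 0.2316419 * Math.abs(z));
      const d = 0.3989422804014327 * Math.exp(-z * z / 2);
      const tail = d * t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 +
                   t * (-1.821255978 + t * 1.330274429))));
      return z >= 0 ? 1 - tail : tail;
    }

    // Two-proportion z-test: is the variation's conversion rate different from the control's?
    function compareVariation(controlVisitors, controlConversions,
                              variantVisitors, variantConversions) {
      const p1 = controlConversions / controlVisitors;
      const p2 = variantConversions / variantVisitors;
      const pooled = (controlConversions + variantConversions) /
                     (controlVisitors + variantVisitors);
      const se = Math.sqrt(pooled * (1 - pooled) *
                           (1 / controlVisitors + 1 / variantVisitors));
      const z = (p2 - p1) / se;
      const pValue = 2 * (1 - normCdf(Math.abs(z)));  // two-sided
      return { controlRate: p1, variantRate: p2, z, pValue };
    }

    // Example: 5000 visitors per version, 250 vs 290 conversions.
    console.log(compareVariation(5000, 250, 5000, 290));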

Marketing Focused On Benefits

I created a page to clearly explain the top three problems I was trying to solve:

Enhanced VWO Overview

The original dashboard started with an overview, which showed the relative performance of each version:

The problems were:

  • No indication of the statistical significance of the results
  • Hard to compare bars as performance differences narrowed over time

I enhanced the overview with:

  • Worst case scenario: a vertical line for easy comparison between versions
  • Margin of error: T-shaped bars showing the margin of error on each estimate
  • Statistical confidence: a p-value for the test

Confidence lines at the top of each bar show the uncertainty. I drew a vertical line at the maximum estimate for V1 (the Control). Now it is easy to see that even if V1’s true performance is at its maximum, the lowest estimates for V2 and V3 still outperform it, which is a strong result.

I added a p-value, which is a standard way of measuring the strength of results. Normally p-values shouldn’t be shown like this on a continuously monitored test, but I had specific reasons for doing so here.
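
To make the T-bars and the worst-case line concrete, here is the basic arithmetic behind them, assuming a normal approximation and an illustrative 95% interval; the data is made up.

    // Illustrative arithmetic behind the T-shaped error bars and the worst-case line.
    const Z = 1.96;  // two-sided 95%

    function marginOfError(conversions, visitors, z = Z) {
      const rate = conversions / visitors;
      return z * Math.sqrt(rate * (1 - rate) / visitors);
    }

    const v1 = { conversions: 240, visitors: 4800 };  // Control
    const v2 = { conversions: 340, visitors: 4900 };  // Variation

    const v1Rate = v1.conversions / v1.visitors;
    const v2Rate = v2.conversions / v2.visitors;

    // Top of the Control's error bar = the vertical worst-case reference line.
    const v1Max = v1Rate + marginOfError(v1.conversions, v1.visitors);
    // Bottom of the variation's error bar.
    const v2Min = v2Rate - marginOfError(v2.conversions, v2.visitors);

    // If even the variation's lower bound clears the Control's upper bound,
    // the variation is ahead under the worst-case comparison.
    console.log(v2Min > v1Max ? "V2 beats V1 even in the worst case" : "Not separated yet");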

Enhanced Main Dashboard

The original dashboard looked like this:

The problems were:

  • No indication of current false positive and false negative risk
  • No margin of error for the improvement
  • “Chance to beat” was not always reliable
  • No indication of how much longer to go

After multiple iterations, the dashboard looked like this:

I made a number of improvements here:

  1. Labeling: I back-calculated the confidence level behind VWO’s margin of error and found it was lower than standard (only 75%). I labeled this clearly.
  2. Added confidence intervals: I used 99% confidence intervals, deliberately wide, to compensate for other statistical shortcuts taken to keep the tool user-friendly. Users could now see a range of uncertainty instead of a single value.
  3. New confidence indicator: I replaced “Chance to Beat” with my own “Actual Confidence”, based on my own algorithm. Users could hover over the values to see what they mean.
  4. Sample size guide: I estimated how much longer a test had to run. When users hovered over the icons, they saw an explanation and a recommendation in plain English. I also applied many rules in the background to show context-specific messages, e.g., if visitor counts were below a best-practice minimum.
  5. Test metrics & risk: I added holistic metrics showing time elapsed and estimated weekly test traffic. I also quantified the false positive risk, taking into account the number of variants being tested (a rough sketch of this and the sample size guide appears after this list).
  6. External calculator link: I added a link to an external calculator that let users manipulate the data and apply special “corrections” not available in VWO.
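
Here is a rough sketch of the two calculations behind points 4 and 5: a rule-of-thumb sample size per variant, and the family-wise false positive risk when several variants are tested. Both are simplified; the actual add-on applied more rules.

    // Point 5: chance of at least one false positive across k variant-vs-control
    // comparisons, each at significance level alpha (assumes independent comparisons).
    function falsePositiveRisk(alpha, numVariants) {
      return 1 - Math.pow(1 - alpha, numVariants);
    }
    console.log(falsePositiveRisk(0.05, 3).toFixed(3));  // 3 variants: ≈ 0.143

    // Point 4: rule-of-thumb visitors needed per variant to detect a given relative
    // lift with ~80% power at alpha = 0.05 (the common 16 * variance / delta^2 rule).
    function visitorsPerVariant(baselineRate, relativeLift) {
      const delta = baselineRate * relativeLift;
      return Math.ceil(16 * baselineRate * (1 - baselineRate) / (delta * delta));
    }
    console.log(visitorsPerVariant(0.05, 0.15));  // ≈ 13512 visitors per variant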

User Feedback

I received feedback from multiple sources, found bugs, and fixed them. The add-on went through 7+ iterations.

Next, I Created Email Alerts

The problem was that VWO had no email update service to keep clients informed. Tracking results for multiple clients across different accounts was also laborious for me. Fortunately, VWO had an API.

I created an email update service that sent bi-weekly test updates to me and my clients, using VWO’s API and PHP to route the emails. I started with a status update showing current performance and the change since the last update:

The email included:

  • All tests and their status
  • Performance of each version, traffic, and statistical assessment
  • Estimate of test duration
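
The original service was written in PHP against VWO’s API. The sketch below shows the same shape of the idea in Node; the endpoint, the response fields, and the delivery step are placeholders rather than VWO’s actual API.

    // Node sketch of the update service's shape. The original was PHP; the URL and
    // response fields here are placeholders, not VWO's real API.
    const API_URL = "https://example.com/vwo-api/tests";  // hypothetical endpoint

    async function buildUpdate(apiToken) {
      const res = await fetch(API_URL, { headers: { Authorization: `Bearer ${apiToken}` } });
      const tests = await res.json();  // assumed shape: [{ name, status, variations: [{ name, visitors, conversions }] }]

      return tests.map(test => {
        const lines = test.variations.map(v => {
          const rate = (100 * v.conversions / v.visitors).toFixed(2);
          return `  ${v.name}: ${rate}% (${v.conversions}/${v.visitors})`;
        });
        return `${test.name} [${test.status}]\n${lines.join("\n")}`;
      }).join("\n\n");
    }

    // The resulting text was then routed as a plain transactional email to each
    // client on the account, on a bi-weekly schedule.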

I then incorporated my own heuristics that weren’t available in VWO. For example, this report included daily performance so I could see how consistent the test was:

For many projects, daily visitor counts were low, so I expanded the weekly summary to show detailed performance. My colleague also suggested making the report more personal, so I added a custom summary at the top, highlighted in yellow:

The red and green colors are also distinguished by minus signs and a difference in tint, so the report remains clear for color-blind users.
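
As an example of the daily-performance heuristic, here is a simplified consistency check: how often the variation came out ahead of the control, day by day. The data shape and the rule are illustrative, not the exact heuristics I used.

    // Simplified daily-consistency check; the actual report applied more heuristics.
    function dailyConsistency(days) {
      // days: [{ date, controlRate, variantRate }]
      const wins = days.filter(d => d.variantRate > d.controlRate).length;
      return { daysAhead: wins, totalDays: days.length };
    }

    const example = [
      { date: "Mon", controlRate: 0.048, variantRate: 0.055 },
      { date: "Tue", controlRate: 0.052, variantRate: 0.050 },
      { date: "Wed", controlRate: 0.047, variantRate: 0.058 },
    ];
    console.log(dailyConsistency(example));  // { daysAhead: 2, totalDays: 3 }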

I also built my own statistical calculator to facilitate both the planning and analysis of tests.

Product Page for the Add-on

The full product page included a clear explanation of what was new, with arrows pointing to specific features and explaining what they mean, to educate users.

MVP / Prototype for Project Management

My clients wanted to see the list of A/B test ideas and their current status. I created a functional prototype to allow us to enter test ideas, clearly articulate the rationale, prioritize, and flag them for testing:

When a test was activated in VWO, it would show up in the list, and anyone on the team could click on it to open the VWO dashboard.

Tool Retired

Eventually VWO updated their statistical model and I retired my tools. I also retired the email updates, because we decided that weekly personal updates with clients were more valuable. Still, going through the prototyping exercise was highly valuable for documenting the process.

Remote Testing Paid Signup Flow

Recency: 2015
Role: Owned project, client-facing designer and A/B test developer
Process: Solo with over a dozen A/B test iterations
Top Challenge: Improve sales by genuinely improving UX

An iterative redesign and A/B testing project to improve the Plans/Payment flow on a dating site and increase paid sign-ups.

Problems With Original

The original process failed to emphasize the top benefits:

There was no guidance on what to choose. Was the 3-month plan long enough?

Buying message packs and the option to enter a coupon code further complicated the choice. Analytics showed that some of the options were never used. Some benefits, like “Extra privacy options”, were unclear. Others, like “Organize singles events”, were not relevant to the average user.

Once the user picked a plan, they went to Step 2:

The 2-step flow was awkward. Step 1 featured the per-month price prominently, yet Step 2 opened with a higher prepay-for-3-months total. Analytics showed people going back and forth between the two steps, suggesting Step 1 wasn’t effective as a gateway page.

My Solution

My final redesign looked like this:

The key aspects of the solution were:

  • Hierarchy & Flow: Simplified to single-page process over multiple steps. Tabs for each plan allowed user to explore plans without flipping back and forth between pages.
  • Value proposition: I turned the top benefit into the headline (“Make A Great First Impression, Send the First Message”). Showed only the best next 3 benefits below instead of many.
  • User-centric: Guided user’s decision with Plain English financial and situational advice. For example, the 6-month plan says: “Pays for itself in X months. Take your time meeting people.” Used more casual language when describing the plans too.
  • Hierarchy: Removed distractions and moved secondary payment options way down to the footer.

Many Iterations Of A/B Experiments

Over multiple tests, I removed various components, such as the message pack footer. I also tried simplifying the choice by setting different defaults and using a single column layout.

Here’s an intermediate variation, which did NOT do better:

I tested setting the default to Plan 1 as well as Plan 2. During the testing, I monitored the impact on sales counts and revenue, as well as user behavior.

After each round of testing, I prepared summary reports and analyses with lessons learned and recommendations.

User Behavior Research

I tracked user behavior in detail, such as whether users chose a plan and then went back and changed their choice. I also tracked how long they spent at each step.
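
Here is a sketch of the kind of client-side instrumentation behind this tracking; the event names and the trackEvent stub stand in for whatever analytics calls were actually used.

    // Illustrative instrumentation for the behaviours described above.
    function trackEvent(name, data) { console.log(name, data); }  // stub for the real analytics call

    let lastPlan = null;
    let stepEnteredAt = Date.now();

    function onPlanSelected(plan) {
      if (lastPlan !== null && lastPlan !== plan) {
        trackEvent("plan_changed", { from: lastPlan, to: plan });  // back-and-forth behaviour
      }
      lastPlan = plan;
    }

    function onStepChange(fromStep, toStep) {
      trackEvent("step_duration", { step: fromStep, seconds: (Date.now() - stepEnteredAt) / 1000 });
      stepEnteredAt = Date.now();
      trackEvent("step_entered", { step: toStep });
    }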

Some results were surprising. For example, setting the default to Plan 1 reduced sales of Plan 1 but tripled sales of Plan 2, yet overall revenue did not increase because of the price differences between the plans.

A/B Test Outcome

I A/B tested this solution and it increased revenue and sales for all plans. It took 3-4 iterations.

Video: What-if analysis using a reverse test duration calculator

In this video, I want to show you a different kind of sample size calculator for your A/B tests. It works backwards compared to traditional calculators, and you might find that more intuitive. The basic premise of this approach is that we mostly don’t know what effect size to expect, so we make projections for a range of outcomes.

Go to the Reverse A/B Test Calculator

Video Transcript

This calculator doesn’t ask you to input power or the effect size you are after, because it assumes that you’re exploring and don’t know what effect size to expect. Instead it just asks you for your current conversion rate and traffic, and then gives you several possible effect sizes that you can reasonably detect on your site and your chance of success for each outcome.

Let’s see how it works. Let’s say you want to run your test on your home page. About 5% of people make it from the home page to a purchase, and you get about 5000 visitors to the home page per week. Let’s run the report with those numbers.

At the top of the report, you’ll see your preliminary estimate. This estimate tries to balance the testing duration with the effect size you can detect. It caps the duration at 8 weeks regardless. Next, it tries to make sure you can detect at minimum a 15% effect.

If I scroll down, you can see what I got: the duration is 6 weeks, and this is optimal for detecting a true 14% lift. I can then adjust my duration up and down and see how it impacts my projections.

The advantage of this report is that it doesn’t give you an estimate for just one effect size; it gives you a range of reasonable what-if scenarios, because we might have little idea what the effect size will be. I can see that if my new version is 10% better or 10% worse, then there is a 50% chance that the effect will actually peek through the noise strongly enough.

But if the effect is 14%, then I have an 80% chance of success, i.e. 80% power. I can then use my judgement to decide whether whatever I’m testing can reasonably beat the existing version by at least 10%, and ideally by 14%. That will depend on how big a change I’m testing, my experience with similar tests elsewhere, how bad the current design is, and so on.
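
For the technically inclined, this is roughly the power arithmetic that produces numbers like these, using a simple two-proportion z-test at a two-sided 5% level; the calculator’s actual model may differ somewhat.

    // Approximate power: chance of a statistically significant result if the true
    // relative lift is `relativeLift`. Simplified model; the calculator may differ.
    function normCdf(z) {  // standard normal CDF (Abramowitz & Stegun approximation)
      const t = 1 / (1 + 0.2316419 * Math.abs(z));
      const d = 0.3989422804014327 * Math.exp(-z * z / 2);
      const tail = d * t * (0.319381530 + t * (-0.356563782 + t * (1.781477937 +
                   t * (-1.821255978 + t * 1.330274429))));
      return z >= 0 ? 1 - tail : tail;
    }

    function power(baselineRate, relativeLift, visitorsPerVariant) {
      const p1 = baselineRate;
      const p2 = baselineRate * (1 + relativeLift);
      const se = Math.sqrt(p1 * (1 - p1) / visitorsPerVariant +
                           p2 * (1 - p2) / visitorsPerVariant);
      const zEffect = Math.abs(p2 - p1) / se;
      return normCdf(zEffect - 1.96);  // 1.96 = two-sided 5% threshold
    }

    // 5% baseline, 5000 visitors/week split across two versions, 6 weeks:
    const perVariant = (5000 * 6) / 2;
    console.log(power(0.05, 0.14, perVariant).toFixed(2));  // ≈ 0.77 with this simplified model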

Another piece of information you can get here is a sense of what the actual observed effect might be. Remember that even if my new version is 14% better and the test is a success, it doesn’t mean the measured effect will actually be 14%. By chance it may be inflated or deflated. So here you can also see the margin of error. This means that if I get a 7.5% lift, I know the true effect might actually be as high as 14%. But if I see a 3% effect, I know the true effect is at most 10%.

I might wonder: if the true effect were 14%, what observed effect might I see halfway through the test? To find out, I can reduce the duration to 3 weeks, find the 14% effect, and see that it might show up as an effect as low as 4.5%.

So far, I assumed there is a true effect. But if I am wrong and my new variation actually has no effect, I might still get a lift – that’s called a false positive. I always like to know what sorts of false positives I can expect.

In this case, let’s put it back to 6 weeks. With this duration, we have a fairly high chance of seeing a false positive of around 5%, positive or negative; the term false positive covers effects in either direction. There is also a small chance of a false positive as high as 10%: the probability is about 5%, small but possible. If I’d like to eliminate that possibility altogether, I can increase my duration. At 9 weeks, the probability is just 1%. And if I scroll back up, I see that with this duration I can also detect smaller effects.
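
The sizes of the false positives quoted here follow from the spread of the observed lift when there is no true effect. Here is a sketch using the same simplified model as above; 1.96 and 2.576 are the z-values for a 5% and a 1% two-sided chance.

    // What size of false positive has a given probability when there is no true
    // effect? (Same simplified model: 5% baseline, traffic split over two versions.)
    function falsePositiveLift(baselineRate, visitorsPerVariant, zThreshold) {
      const se0 = Math.sqrt(2 * baselineRate * (1 - baselineRate) / visitorsPerVariant);
      return (zThreshold * se0) / baselineRate;  // relative lift, either direction
    }

    const sixWeeks = (5000 * 6) / 2;   // visitors per variant
    const nineWeeks = (5000 * 9) / 2;

    console.log((100 * falsePositiveLift(0.05, sixWeeks, 1.96)).toFixed(1) + "%");
    // ≈ 9.9%: at 6 weeks, a false lift of about 10% (either direction) has roughly a 5% chance

    console.log((100 * falsePositiveLift(0.05, nineWeeks, 2.576)).toFixed(1) + "%");
    // ≈ 10.6%: at 9 weeks, a false lift that large has only about a 1% chance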

In the end, whatever happens, you have a much better idea of what to expect. Give it a shot. Let me know how you like it. Thanks for watching.