Once you figure out what you want to test, you need to define what you’re going to measure and where. In this post, I will introduce my preferred terms for describing test structure (things like test conditions, goals, and pages), and I’ll use a visual language to cover the basic patterns.
Gates and Goals
GATE: circle that represents all conditions for entry into the test, including test URL and traffic segmentation (Example: Home page mobile traffic).
GOAL: all success conditions, including confirmation page URL and business rules (Example: Thank you page visit after purchase of premium package).
TIP: If you have a sizable mobile segment, you’ll want to track mobile, tablet, and desktop traffic separately. If your tool doesn’t allow you to segment after the fact, set up 3 separate tests with mutually exclusive gates. Other gates worth distinguishing include existing users vs. new users and ad traffic vs. direct traffic, in case each segment performs differently. Keep sample size in mind: each segment has fewer visitors, which raises the risk of false negatives, and checking many segments raises the risk of false positives.
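To make the idea of mutually exclusive gates concrete, here is a minimal JavaScript sketch. The `gates` list, the shape of the `visitor` object, and the `findGate` function are hypothetical names of my own, not part of any testing tool, and I’m assuming device detection has already happened elsewhere.

```javascript
// Hypothetical gate definitions: one test per device segment on the home page.
const gates = [
  { test: 'home-mobile', path: '/', device: 'mobile' },
  { test: 'home-tablet', path: '/', device: 'tablet' },
  { test: 'home-desktop', path: '/', device: 'desktop' },
];

// Return the single test a visitor qualifies for, or null if none match.
// Each gate requires a different device value, so no visitor can enter
// more than one of these tests.
function findGate(visitor, gateList = gates) {
  const match = gateList.find(
    (g) => g.path === visitor.path && g.device === visitor.device
  );
  return match ? match.test : null;
}
```

A visitor described as `{ path: '/', device: 'tablet' }` would land in the tablet test only, and a visitor on any other page would not enter a test at all.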
Primary vs. Secondary Goals
Page visits are generally more reliable than clicks, so they are the preferred primary metric. Clicks on links or form submits are often secondary metrics. There are many other customized types of metrics based on user behavior and business rules.
TIP: Whenever possible, track both the start and end of an interaction, e.g., track clicks on a link and the visit to the destination page.
TIP: Tracking how many people start completing form fields is a good measure of intention, e.g., track keydown or change events on key form fields. It can also highlight anomalies in other goals. You can also track attention (the user scrolled to and stopped at the element being tested) as a secondary metric via scroll tracking and setTimeout().
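Here is one way the attention metric could be sketched in JavaScript. It treats “stopped at the element” as “no scroll events for a short dwell time while the element is in view”, which is my own approximation. The names `createAttentionTracker`, `isInView`, `onAttention`, and `dwellMs` are hypothetical, and the timer functions are injectable only so the logic can be exercised outside a browser.

```javascript
// Fire an "attention" goal once scrolling has paused for dwellMs while the
// tested element is in view. All names are illustrative, not a tool's API.
function createAttentionTracker({
  isInView,
  onAttention,
  dwellMs = 2000,
  setTimer = setTimeout,
  clearTimer = clearTimeout,
}) {
  let timer = null;
  let fired = false;

  // Attach the returned function as a scroll listener.
  return function handleScroll() {
    if (fired) return; // count attention at most once per page view
    if (timer !== null) {
      clearTimer(timer); // the user is still scrolling, so restart the wait
      timer = null;
    }
    if (isInView()) {
      timer = setTimer(() => {
        fired = true;
        timer = null;
        onAttention();
      }, dwellMs);
    }
  };
}

// Browser wiring (assumes an element with the hypothetical id "cta"):
//   const el = document.getElementById('cta');
//   const inView = () => {
//     const r = el.getBoundingClientRect();
//     return r.top >= 0 && r.bottom <= window.innerHeight;
//   };
//   window.addEventListener(
//     'scroll',
//     createAttentionTracker({ isInView: inView, onAttention: () => { /* send goal */ } }),
//     { passive: true }
//   );
```

The same pattern works for the form-start metric: listen for keydown or change on the key fields and fire the goal the first time either occurs.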
Goal Depth
Your primary goal might be directly on the page you’re testing or further down the funnel.
A direct goal happens on the test page and is your ideal end goal (e.g., an AJAX payment event).
A shallow goal is a relative term for a goal at or near your test gate that’s not ideal. For example, a visit to the checkout page is a shallow goal relative to a primary goal of completing the purchase.
A deep goal lies further away from the gate and is usually your primary goal. However, you might track other deep goals that are not primary (e.g., post-purchase downloads, dashboard engagement). Changes to deeper goals are harder to detect using statistics, because counts are lower.
TIP: If your primary metric won’t produce enough data in the time you have, then choose the next best metric.
Conditional Goals
Behavioral and business conditions can be added to goals. For example, fire a conversion when a timer expires, a scroll position is reached, several steps are completed, or a user successfully logs out and returns again. You can map goals to any user behavior.
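As an illustration of a goal that fires only after several steps are completed, here is a minimal JavaScript sketch. `createConditionalGoal`, `requiredSteps`, and `onConvert` are hypothetical names; a real testing tool would supply its own conversion call.

```javascript
// Sketch of a multi-step conditional goal.
function createConditionalGoal(requiredSteps, onConvert) {
  const remaining = new Set(requiredSteps);
  let converted = false;

  // Call this whenever the visitor completes a step (in any order).
  // Returns true once the goal has converted.
  return function completeStep(step) {
    remaining.delete(step);
    if (!converted && remaining.size === 0) {
      converted = true; // fire at most once per visitor
      onConvert();
    }
    return converted;
  };
}
```

For example, a goal built with the steps `'scrolled'`, `'timer'`, and `'form'` converts only after all three have been reported, and repeating a step afterwards does not fire it again.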
TIP: Additional logic requires additional code, increasing risk of technical and logical errors. Be careful about making your primary goal very complex. Moreover, elaborate goals that are harder to achieve will have a lower conversion rate and will be harder to track. However, they may be more informative – track them but have a fall-back.
TIP: You can set up goals to detect errors on complex screens with lots of dynamic components. For example, part way through the test you might find that clicks on your new button are low. So you might set up a goal to check that the button actually exists on the page for all visitors. In one case, we wanted to check if any visitors to a split-URL test were changing the URL to enter a different variation.
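A diagnostic goal like the button check could look roughly like this in JavaScript. `checkElementPresent` and the goal names are hypothetical, `track` stands in for whatever goal-tracking call your tool provides, and `doc` is expected to behave like the browser’s `document`.

```javascript
// Sketch of a diagnostic goal: report whether the tested element is present.
function checkElementPresent(doc, selector, track) {
  const present = doc.querySelector(selector) !== null;
  track(present ? 'element-present' : 'element-missing');
  return present;
}

// In a browser you might call, once the page has rendered:
//   checkElementPresent(document, '#new-button', (goal) => { /* send goal */ });
```

Comparing the count of 'element-missing' goals across variations can then show whether low clicks reflect visitor behavior or a rendering problem.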
Single Page vs. Template
Your test gate is typically a single page. The gate can also be multiple product pages that use the same template. It can also be multiple pages that are completely different, or the test can even be site-wide. For example, if you’re testing a change to your navigation or a sidebar, you’ll want to modify all pages with that element for consistency. Advantages: potentially much higher traffic to your test, and testing the change in a broader context. Disadvantage: you may obscure different performance on each page due to different traffic sources, different previous pages seen, etc.
TIP: If you can, track that your multi-page A and B samples contain similar ratios of visitors to each page. For example, if your test is running site-wide, you might want to know that your A and B samples contain roughly the same % of people who came from the home page, pricing page, blog, etc. What if, for instance, product A has mostly traffic from your internal search, while product B was mentioned in a blog and has lots of referral traffic?
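One simple way to check the page mix is to compare each page’s share of visitors in sample A against its share in sample B. The sketch below assumes you can export visitor counts per page for each variation; `pageShares` and `maxShareGap` are hypothetical helper names, and how large a gap is worth worrying about is left to your judgment.

```javascript
// Share of each page within a sample, e.g. { home: 0.5, blog: 0.5 }.
function pageShares(counts) {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const shares = {};
  for (const [page, n] of Object.entries(counts)) {
    shares[page] = total === 0 ? 0 : n / total;
  }
  return shares;
}

// Largest absolute difference in page share between samples A and B.
function maxShareGap(countsA, countsB) {
  const a = pageShares(countsA);
  const b = pageShares(countsB);
  const pages = new Set([...Object.keys(a), ...Object.keys(b)]);
  let gap = 0;
  for (const page of pages) {
    gap = Math.max(gap, Math.abs((a[page] || 0) - (b[page] || 0)));
  }
  return gap;
}
```

With counts of 50, 30, and 20 visitors from the home, pricing, and blog pages in A and 48, 32, and 20 in B, the largest gap is about two percentage points, which suggests the two samples have a similar mix.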
TIP: You can run separate tests to isolate the data set for each page as long as there is no visitor overlap.
Sequences and Funnels
If you have a unique link to a page, you control the traffic to that page. Other times, you might have multiple links, which means multiple entry points into a page. You might also have steps that can be bypassed. The wavy line means that the page transition is flexible or unknown (e.g., Home page to Pricing page), while a straight line means it’s a direct relationship (e.g., Payment Step 2 to Step 3).
TIP: Verify paths; don’t assume them. Are you getting lots of direct visits from Google into the middle step? Can visitors bypass your link towards step 3?
TIP: Set up tiered metrics, so you can track the user’s progress at each step of the funnel.
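Tiered metrics can be summarized as step-by-step rates. The JavaScript sketch below assumes you already have a visitor count for each funnel step per variation; `funnelRates` and the field names are hypothetical.

```javascript
// Step-by-step funnel rates from visitor counts at each tier.
// `steps` is an ordered list such as
//   [{ name: 'cart', visitors: 1000 }, { name: 'checkout', visitors: 400 }]
function funnelRates(steps) {
  return steps.map((step, i) => {
    const prev = i === 0 ? null : steps[i - 1].visitors;
    return {
      name: step.name,
      visitors: step.visitors,
      // Share of the previous step's visitors who reached this step.
      fromPrevious: prev ? step.visitors / prev : null,
      // Share of the first step's visitors who reached this step.
      fromStart: steps[0].visitors ? step.visitors / steps[0].visitors : null,
    };
  });
}
```

Running this once per variation shows where in the funnel the variations diverge. If a step’s `fromPrevious` rate is above 1, visitors are entering the funnel at that step from somewhere else, which is a sign the path is not as direct as assumed.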
If your traffic is too low to detect a change in your end goal, you should make a shallower goal primary. It’s also useful to track whether users step outside the main funnel. For example, if you find that a losing variation is increasing traffic to the pricing page, you might have a hypothesis to explain the loss.
User behaviors other than what you want are also good to track (Distractions).
TIP: Track visits to all main pages, like pricing, about us, blog, etc. Most patterns will be meaningless, but sometimes they are informative. For example, a huge increase in visits to the Pricing page might suggest something you said made people think of cost, which may or may not be a good thing.
Visual Scope
You can test just one page (Page Test) or you can test an entire funnel as a sequence of pages (Funnel Test) – visitors see version A of the complete funnel or version B of the complete funnel.
On rare occasions, you might start your test on a page other than the one you’re testing. For example, a page may not be accessible directly or may depend on an interaction with the preceding page, so you have to start your test a step earlier (Premature Start). You might also have visual changes on different pages that go together conceptually, such as a discount offer on the home page vs. on a deeper page. So on one variation, the user will enter prematurely on the page without visual changes.
TIP: Avoid testing related pages in separate simultaneous tests, because you’ll have to account for visitors coming from and seeing different versions of each page.
TIP: When running tests simultaneously on the same site, use cookies to ensure visitors can only join 1 test at a time. If you do risk running overlapping tests, add metrics to each test to track which variation of each test the users saw. That way you can at least check that the assignment is roughly equal and even split users into non-overlapping segments.
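The cookie approach could be sketched like this in JavaScript. The cookie name `active_test`, the 30-day lifetime, and both function names are assumptions of mine, not a convention of any tool; `random` is injectable only so the logic can be checked deterministically.

```javascript
// Parse a `document.cookie`-style string into a plain object.
function parseCookies(cookieString) {
  const cookies = {};
  for (const part of cookieString.split(';')) {
    const [name, ...rest] = part.trim().split('=');
    if (name) cookies[name] = decodeURIComponent(rest.join('='));
  }
  return cookies;
}

// Decide which single test a visitor belongs to.
function assignExclusiveTest(cookieString, testNames, random = Math.random) {
  const existing = parseCookies(cookieString).active_test;
  if (existing && testNames.includes(existing)) {
    return { test: existing, setCookie: null }; // already enrolled
  }
  const test = testNames[Math.floor(random() * testNames.length)];
  return {
    test,
    // The caller would write this via `document.cookie = ...` in a browser.
    // max-age is 30 days expressed in seconds.
    setCookie: `active_test=${encodeURIComponent(test)}; path=/; max-age=2592000`,
  };
}
```

Each test’s gate would then admit only visitors whose `active_test` value matches that test, so a visitor enrolled in one test never enters another.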
Data Collection
A tracking or blank test simply collects data about your site. You might run such a test to QA your metrics or estimate your traffic and conversion rates to enable you to do a power analysis.
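Once a tracking test has given you a baseline conversion rate and daily traffic, a rough power analysis might look like the JavaScript sketch below. It uses the common normal-approximation formula for comparing two proportions; the default z values assume a two-sided 5% significance level (1.96) and 80% power (0.8416). The function names are hypothetical, and a dedicated statistics tool may give somewhat different numbers.

```javascript
// Approximate visitors needed per variation to detect a change from
// baseline rate p1 to rate p2.
function sampleSizePerVariation(p1, p2, zAlpha = 1.96, zBeta = 0.8416) {
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const effect = p2 - p1;
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / effect ** 2);
}

// Days needed, given daily gate traffic split evenly across variations.
function daysNeeded(p1, p2, visitorsPerDay, variations = 2) {
  return Math.ceil(
    (sampleSizePerVariation(p1, p2) * variations) / visitorsPerDay
  );
}
```

Under these assumptions, detecting a lift from 10% to 12% takes a few thousand visitors per variation, while the same relative lift on a 1% goal (1% to 1.2%) takes roughly ten times as many. That is the arithmetic behind deeper goals being harder to detect.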
An A/A test is less a way to test the tool than a way to understand the properties of your traffic, e.g., are all users relatively similar, producing less variation in the conversion rate over time?
Visual changes can be tested by injecting CSS and JavaScript into an existing page or creating a separate page URL for each variation.
TIP: Dynamic A/B tests can be faster to deploy but can have a flickering problem or take slightly longer to load. Split URL tests don’t have these problems but require you to create a duplicate page, which is not always possible. They also require a separate URL for each variation, so you then need a process to reuse or expire the old URL variants. The redirection itself and the difference in URL might be noticed by some users.
TIP: A hybrid approach is redirecting to a URL parameter on the main URL. Changes are then applied on the back end or front end based on the parameter, e.g., example.com/?v=a vs. example.com/?v=b.
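On the front end, reading the parameter could be as simple as the JavaScript sketch below. The function name `variationFromUrl` is hypothetical, and falling back to variation `a` for missing or unknown values is my assumption about how you would want to treat the control.

```javascript
// Read the variation from a `v` query parameter (as in example.com/?v=b).
// Unknown or missing values fall back to the control, 'a'.
function variationFromUrl(href, allowed = ['a', 'b']) {
  const value = new URL(href).searchParams.get('v');
  return allowed.includes(value) ? value : 'a';
}

// In a browser: const variation = variationFromUrl(window.location.href);
```

Validating against a list of allowed values also guards against visitors editing the URL by hand to invent a variation that doesn’t exist.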
Single or Multiple Variables
A test can involve a single change or multiple changes, as in a full redesign. Micro tests allow you to connect the observed effect to a specific visual change, which makes this the most satisfying type of test. A macro test can allow you to test a coordinated set of changes, but you won’t know the effect of each individual change. This can put you into a tough spot if the test loses: do you scrap the whole thing or try to retest specific elements of it?
A multivariate test is used to test multiple changes by essentially running a separate test for each combination of variables. The risk with this type of test is that each combination gets a smaller sub-sample, which lowers power, and comparing many combinations raises the chance of false positives. This is not the same as running multiple tests simultaneously on the same page, since in that case you won’t know which version of which test each user saw.
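To see how quickly the sub-samples shrink, here is a small JavaScript sketch that lists every combination in a multivariate test and the traffic each would get under an even split. The function names and the example variables are hypothetical.

```javascript
// Build every combination of the given variables (a full-factorial design).
// `variables` maps a variable name to its list of options.
function allCombinations(variables) {
  return Object.entries(variables).reduce(
    (combos, [name, options]) =>
      combos.flatMap((combo) =>
        options.map((option) => ({ ...combo, [name]: option }))
      ),
    [{}]
  );
}

// Visitors each combination receives if traffic is split evenly.
function visitorsPerCombination(totalVisitors, variables) {
  return Math.floor(totalVisitors / allCombinations(variables).length);
}
```

Three variables with two options each already produce eight combinations, so 8,000 visitors leave only 1,000 per combination, compared with 4,000 per variation in a simple A/B test.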
In closing
Use your awareness of page and goal patterns to collect richer data and avoid common mistakes.