20+ User Research Techniques

Customer Interviews & Mental Modeling

The more I spoke to customers, the more I could speak their language and understand how they think and what might be holding them back. Here’s an example of something I learned from a customer interview, when I realized that what we were calling “indications” (rough insurance quotes) wasn’t the same as what customers understood “indications” to be. There was apparent alignment, because both they and we meant “rough quotes”, but it turned out we used the concept differently. As a result, customers were approaching what we were offering with the wrong preconceptions:

Mental Concept of “Indications” | Customer’s Mental Model | How It Worked In Our Product
Effort | Takes some effort – broker attempts to tailor assumptions or get preliminary info from client | Focus on high-volume, low-margins means initially putting in minimal effort
Pricing | Fairly close to final quote; client can use it to weed out some options | Ballpark to start conversation; no point comparing options yet
Application Requirements | Enter as much as you know | AI-driven or generic assumptions for faster response
Time Frame | Usually 14-30 days out from when you need final pricing | Allow to start conversation much earlier, 60+ days out
A thematic analysis of our customer’s mental model vs. how indications worked in our product

Conflicting Mental Models: Learning how brokers think and make trade-offs is a key part of my job. For example, brokers traditionally want to send insurance providers strong deals to gain preferential treatment in the future. If they send weak deals, it might damage their reputation. With APIs, they didn’t need to worry about this, because no humans are involved on the receiving end. But old habits may have caused people to avoid sending greater volumes through the email bot. Empathy means going beneath the surface to understand the often unspoken principles and values guiding people’s work.

Push/Pull or Forces Analysis

This is a great framework from the Jobs to Be Done field, which I use often to convey the motivations and obstacles of a decision gathered from interviews. It’s a type of mental modeling. I can’t share a real example, but here’s a great illustration (Source):

Customer Profile

I ran a workshop with Sales and CX to understand how we can share and divide the responsibilities for customer discovery among the team. We came to better understand what CX requires from Sales and what Sales is able to get from customers they speak to. I developed a framework for a “Customer Profile” with itemized criteria rated by importance and who is responsible. This eventually became a Google Doc form, and then later was merged into our CRM system. This profile goes beyond being a research technique. It determines how the company triages prospects, where we invest our efforts, and how we close gaps in understanding that impact everyone, from sales to product. Here’s a key gap that was identified: when brokers do not have enough experience with cyber, they are less ideal customers for us – this framework I helped craft reminds the team what to ask them, what to train them on, and whether they are a good fit in the first place:

Shipping To Learn / MVP

One needs to get the smallest viable solution into the hands of customers. Feedback based on real-world usage of a basic solution or a prototype provides more clarity than rigorous user testing of a complex solution in hypothetical situations. Shipping to learn helps tame the complexity that often bogs down teams trying to find their product-market fit. This involved breaking features down to understand what is absolutely essential:

Wizard of Oz & Other Lean Prototyping

“The Wizard of Oz test essentially tries to fool potential customers into believing they are using a finished, automated offering, while it is still being run manually.” – BMI Lab

I used this technique successfully to test a Gen AI solution. This had several advantages:

  • AI was rapidly changing but not yet ready; this technique allowed us to get ahead of the technology limitations and envision the future
  • We could collect more real-world data from customers (who submitted real documents, made real queries), which we could then use to evaluate the AI internally, safely
  • Building AI tools was a new skill for the team; this technique kept us from pulling development resources off other critical projects
  • The customer actually benefited from the test – when we ended the experiment after 2 weeks, they were eager to get it back. This was great validation for us.

Remote Testing: Clickable Prototypes, A/B comparisons, and 5-Second Tests (via Maze)

There are great tools now that combine sharable prototypes with slides and surveys. This is especially useful for hallway testing remotely. I set up an experiment, present the context for the user, and then have them perform a set of tasks and answer some questions. The advantage of this over moderated testing is that they can do it on their own schedule. I would typically follow up with questions over email. The biggest thing I learned about running this kind of test is to test the test – roll out v1 to one or two people to catch any issues. You often need to add clarification, reword or reorder steps before it’s ready to scale. This was particularly useful for quick Hallway Tests with SMEs.

Job Mapping Based on Strategyn’s “Outcome-Driven Innovation”

ODI breaks all processes down into basic phases, whether it’s surgery or selling a home. The activity’s outcome is defined in a standardized way like: “Minimize the time it takes to assemble a proposal to the insured”. This leads to a deep understanding of how the user completes their task. Customers can then be surveyed about their success with very well defined tasks, which leads to a quantified understanding of wider market needs. Here’s a sample activity:

Activity: Locate all the inputs needed for an application. The diagram below shows the tasks to accomplish that (based on customer interviews).

 

Situational Personas

I introduced the personas concept to a Fintech client to help them shift away from a focus on features and technical capabilities. I started by asking the client to itemize their audiences in a spreadsheet. Then we identified some “top needs”, based on their calls with customers and industry subject matter experts. Later, on another project, I asked them to write first-person user stories.

Over time, the requirements started to include more emotional and situational context. At that point, I started to distill insights into personas that focus on the user’s situation:

Context builds actionable empathy and elicits tons of new product ideas and marketing strategies. What seemed like one type of user splits into 3 different types of users with distinct, nuanced needs.

Voice of The Customer

This is a type of research that focuses on capturing and sharing the words that customers actually say. On a more recent project, I defined target audiences through storytelling rather than demographics, e.g., a team large enough that poor intake clarity starts to cause task assignment confusion. I base the quotes on real interviews, so they resonate in our marketing. I create a specific contrast between before and after, so it’s clear what our product solves (a UI pattern that performed well in A/B tests in the past). The goal is to write copy that resonates with customers, because it’s what they themselves would say.

This perspective can be applied to other content, like this feature matrix:


Pre-2019 examples:


Mining Customer Reviews

For one client, I distilled dozens of pages of user reviews into a concise report with 18 themes (Affinity Mapping). The themes described in plain English what customers thought, supported by 200+ actual customer quotes. Here are some themes for their Error Tracking software:

Mapping User Flow / User Journey

I have experience mapping complex processes. In this example, I facilitated a session with SMEs to understand the key steps of a health care process, including the role of the existing IT system (SMEs are usually senior/former users or people intimately acquainted with the process). I broke the process down into higher-level phases:

The next step would be to label this process with assumptions, problems, and opportunities (not shown).

Contextual Inquiry / Job Shadowing

I shadowed a health inspector and documented their process as multi-participant user flows:

I then analyzed these flows to figure out how the process could be improved through efficiencies and the use of technology like tablets.

Here’s me touring a coal mine client to gain a deeper appreciation for the business and user context:

Funnel Analysis

For an online experience, I often started with a rough funnel, showing the customer’s journey backed up with visit & conversion metrics. I could identify points in the process where users give up or digress from the path to their goal.

Why isn’t this chart prettier? Because it’s not going into a magazine. It gets the job done, and then it’s not needed anymore.

This chart hinted at steps where the problem manifests. Similar to a sitemap, this could also help identify problems with the overall IA of a site (e.g., if there are lots of loop-backs or some pages aren’t being visited). Here, for example, I noticed that 50%+ of people dropped off at the Tour, which suggested removing or improving the Tour. I also saw there were many steps before actual enrollment. Some of my A/B tests tried different ways to move the Enrollment call-to-action higher in the process.
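As a rough sketch of the arithmetic behind a funnel like this (the step names and visit counts below are made up for illustration):

```ts
// Hypothetical funnel: unique visits per step, ordered from entry to goal.
const funnel = [
  { step: "Landing page", visits: 10000 },
  { step: "Tour", visits: 6200 },
  { step: "Plan selection", visits: 2900 },
  { step: "Enrollment form", visits: 1400 },
  { step: "Enrollment complete", visits: 450 },
];

// For each transition, compute how many continued and how many dropped off,
// to spot the leakiest steps in the journey.
for (let i = 1; i < funnel.length; i++) {
  const prev = funnel[i - 1];
  const curr = funnel[i];
  const continued = (curr.visits / prev.visits) * 100;
  console.log(
    `${prev.step} -> ${curr.step}: ${continued.toFixed(1)}% continued, ` +
      `${(100 - continued).toFixed(1)}% dropped off`
  );
}
```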

Listening In On Calls & Reading Emails / Chat Logs

There’s nothing like getting the user’s actual words, hearing the tone in their voice, reading between the lines. I’ve listened in on call center conversations to identify common customer questions and persuasive techniques used by CSRs. I’ve also read chat logs and emails to better understand what customers care about.

Remote Monitoring

I’ve used a number of screen replay tools to observe users and identify potential usability issues. To help sort out useful videos with this technique, I tagged various user events using URLs and custom JavaScript. That way I could filter for and observe specific problems – say, only videos of users who completed a form but didn’t click submit:

Linking screen replays and analytics (e.g., Google Analytics) is a useful and cheap way to do usability testing, because you can define a behavior KPI and then filter videos showing that behavior e.g., someone going back and forth between screens or hesitating to click something.
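To illustrate the kind of tagging involved, here is a minimal sketch; the tagSession function is a stand-in for whatever custom-event API your replay tool exposes, and the form selector is hypothetical:

```ts
// Stand-in for the session replay tool's tagging/custom-event API.
declare function tagSession(eventName: string): void;

const form = document.querySelector<HTMLFormElement>("#signup-form");
let completed = false;
let submitted = false;

if (form) {
  // Tag the session once every required field has been filled in.
  form.addEventListener("input", () => {
    const allFilled = Array.from(
      form.querySelectorAll<HTMLInputElement>("input[required]")
    ).every((field) => field.value.trim() !== "");
    if (allFilled && !completed) {
      completed = true;
      tagSession("form_completed");
    }
  });

  form.addEventListener("submit", () => {
    submitted = true;
    tagSession("form_submitted");
  });

  // On exit, tag sessions where the form was completed but never submitted,
  // so those replays can be filtered and reviewed later.
  window.addEventListener("beforeunload", () => {
    if (completed && !submitted) {
      tagSession("form_completed_no_submit");
    }
  });
}
```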

Heatmaps, Clicks, Attention Tracking

In one test, I used the heat map to corroborate our statistically strong finding. I tested a new home page with a simple gradual engagement element: a clear question with several buttons to choose from:

Our hypothesis was that gradual engagement would guide visitors better toward signing up. The heatmaps for the other variations showed clicks on top menu links and all over the page. In contrast, our winner showed clicks on just the buttons I added. There was virtually no distracted clicking elsewhere. This was reassuring. I also saw a pattern in the choices visitors were clicking.

Sometimes I wanted to test if visitors were paying attention to non-clickable content, like sales copy. One of the tools I’ve used was a script I created to track scroll position. By making some assumptions, I could infer how much time users spent looking at specific content on the site. You can see a demo here.
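A minimal sketch of that idea, assuming attention can be approximated by how long a block of content stays in the viewport (the element selector and the reporting endpoint are placeholders):

```ts
// Rough attention tracking: accumulate how long a content block is visible
// in the viewport, then report the total before the visitor leaves.
const target = document.querySelector("#sales-copy"); // placeholder element
let visibleSince: number | null = null;
let totalVisibleMs = 0;

function isInViewport(el: Element): boolean {
  const rect = el.getBoundingClientRect();
  return rect.top < window.innerHeight && rect.bottom > 0;
}

function update(): void {
  if (!target) return;
  const now = performance.now();
  if (isInViewport(target)) {
    if (visibleSince === null) visibleSince = now; // scrolled into view
  } else if (visibleSince !== null) {
    totalVisibleMs += now - visibleSince; // scrolled out of view
    visibleSince = null;
  }
}

window.addEventListener("scroll", update);
window.addEventListener("beforeunload", () => {
  if (visibleSince !== null) totalVisibleMs += performance.now() - visibleSince;
  // Placeholder reporting call; swap in your own analytics endpoint.
  navigator.sendBeacon("/track", JSON.stringify({ salesCopyMs: Math.round(totalVisibleMs) }));
});
```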

Self-Identification

Sometimes designs offer an opportunity to both test a hypothesis and collect data. For one project, I decided that, instead of emphasizing lowest prices (as is the norm in that saturated market), I would emphasize our client’s experience dealing with specific real-life scenarios (Authority principle). So I interviewed the client further to understand specific scenarios their customers might be facing (based on their own customer conversations and industry knowledge). Then I wrote copy to address customers in each situation:

Now, by giving clickable options, I could track which options users were clicking. Over time, I could learn which scenarios are most common and then tailor the site more to those users.
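The click tracking itself can be as simple as the sketch below; the data attribute and the trackEvent call are placeholders for however the options are marked up and reported:

```ts
// Stand-in for the analytics event call in use on the site.
declare function trackEvent(category: string, label: string): void;

// Each scenario option carries its label in a data attribute (hypothetical markup),
// e.g. <button data-scenario="first-time-buyer">...</button>
document.querySelectorAll<HTMLElement>("[data-scenario]").forEach((option) => {
  option.addEventListener("click", () => {
    trackEvent("self-identification", option.dataset.scenario ?? "unknown");
  });
});
```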

Use Cases

I interviewed subject matter experts to better understand the end users’ requirements. I summarized the interests of each user segment using a User-Goal Matrix, such as:

Here’s another example:

User Narratives & Storyboards

The key here is again to build empathy. “Need to do X” may be a requirement, but do X where, why?

Thinking through the plot of a user story helped put me in the shoes of a user and imagine potential requirements to propose to the client (like using Street View, above, or automatically finding similar restaurants nearby). You think “If that actually were me, I’d want to…”

On another project, I documented the user story as a comic book with 2 personas. Here a nurse visits an at-risk new mom and searches for clues she may be depressed and a danger to herself or her child:

Instead of a table of things to look for, the comic shows clues in context (curtains drawn, signs of crying, etc). This kind of deliverable is a good way to build empathy for the project’s users (nurses finding themselves in these situations) and the end recipients of the service (their at-risk clients).

Customer Interviews & Jobs Theory

I’ve interviewed software users and SMEs. I’m also familiar with consumer interview techniques, including  Jobs To Be Done.

I use JTBD insights to shape how I define requirements with clients. JTBD theory argues that the user’s situation is a better predictor of behavior than demographic details. Its emphasis is on capturing the customer’s thinking in that precise moment when they decide to buy a product, especially when they switch products.

See my article 15 Jobs-To-Be-Done Interview Techniques

A/B Testing

I’ve run A/B tests to compare large redesigns as well as smaller changes. Large redesigns only tell us which version is better, while smaller changes help pinpoint the specific cause (which tells us more about users):

As part of A/B testing, I tracked multiple metrics. That way I could say, for example, “the new page increased engagement but didn’t lead to more sales” or “we didn’t increase the volume of sales but order value went up”.

User Analytics & Segmentation

I used Google Analytics and A/B testing tools to segment visitor data. A classic case is Mobile vs. Desktop segments:

Another useful segmentation is New Visitors vs. Existing Customers, which I tracked by setting/reading cookies. I also segmented users by behavior, e.g., users who clicked a specific webpage element or visited a specific page.
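Here’s a rough sketch of the cookie approach; the cookie name and the setAnalyticsSegment call are placeholders for whatever your analytics tool uses (e.g., a custom dimension):

```ts
// Stand-in for reporting a segment to the analytics tool (e.g., a custom dimension).
declare function setAnalyticsSegment(name: string, value: string): void;

// Call this on the order confirmation page to mark the visitor as an existing customer.
function markAsCustomer(): void {
  const oneYear = 60 * 60 * 24 * 365; // max-age in seconds
  document.cookie = `is_customer=1; max-age=${oneYear}; path=/`;
}

// On every page: read the cookie and report the segment, so reports and
// A/B test results can be split into New Visitors vs. Existing Customers.
const isCustomer = document.cookie
  .split("; ")
  .some((c) => c.startsWith("is_customer="));
setAnalyticsSegment("visitor_type", isCustomer ? "existing-customer" : "new-visitor");
```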

I’ve done statistical and qualitative analysis of the data collected, teasing out relationships between various user behaviors:

User Testing

I’ve created test cases for moderated testing workshops. One type of user test provided detailed instructions for a user to follow.

“Log into the case dashboard. Find out if there are any cases that need to be escalated asap and flag them for the supervisor.”

Another type of test case posed open-ended goals to see if the user could figure out how to do it.

“You’re a data-entry clerk. You’ve just received a report from John in damage claims. What would you do next? Let the moderator know if you encounter problems or have questions.”

Or

“A cargo ship arrived at the East Port carrying 100 tons of fuel. When is the next train arriving and does it have enough capacity to transport this fuel to the Main Hub?”

I’ve observed users going about their task. The acceptance criteria included usability (is the user struggling to perform the task) and completion (is the user able to complete their job task). Users provided feedback verbally and using a template like this:

I’ve also created detailed training/test scenarios that closely mimicked real job conditions. Users had to successfully complete the tasks and confirm they matched reality.

Hands-On Hardware Research

On one project, I had to understand how to deploy touch-screens and new software across dental offices. I tested the touch-screens on site and interfaced them with the digital X-Ray equipment. I’ve also used:

  • dive computers
  • VR headsets
  • synthesizers

I Have An A/B Test Winner, So Why Can’t I See The Lift?

In the town of Perfectville, a company ran a winning A/B test with a 20% lift. A few weeks after implementing the winner, they checked their daily conversions data:

 

97 days of daily conversion rates in Perfectville showing 20% lift

The graph perfectly captured what happened: the conversion rate increased by 10% during the test, when half the traffic was exposed to the winning variation. Then came the week when the test was stopped, followed by a lift of 20% once the winner was implemented.

The good people in nearby Realville heard about this and ran the test on their site. When they later checked their daily conversions data, they scratched their heads (as they often do in Realville):

 

97 days of daily conversion rates in Realville showing same improvement

The data actually includes the same 10% lift during the test, a gap, and a final 20% improvement. The problem is that the improvement sits on top of natural fluctuations in daily conversion rates, so a 20% improvement doesn’t necessarily show up as a visible 20% lift.

Here are 6 reasons why people in Realville might find it difficult to see a lift and what they can do about it.

Reason 1: The effect is too small

The smaller the lift, the harder it is to see it through the noise. If the conversion rate drops for some reason unrelated to the test, the lift from your winner might not even offset that. For example, here’s 1 week of simulated daily conversion rates followed by a week with a 20% lift compared to a 5% lift. If the lift were 5%, it would look as though the test actually did worse in the second half:

 

7 days at baseline followed by 7 days with 5% vs. 20% lift

Have you just run a test and are looking at during-test data? You likely won’t see any effect. Typically only 70-80% of visitors will join the test (more on this below), and these are split among your variations. If 80% of your traffic actually participated in an ABC test, only a third of that is exposed to the winning variation. So, a 20% lift would manifest as roughly a 5% lift overall.
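To make that arithmetic concrete, here’s the same calculation spelled out (using the example numbers above):

```ts
// How a 20% lift gets diluted in overall during-test numbers.
const participation = 0.8; // share of visitors who actually entered the test
const variations = 3;      // A/B/C test: participating traffic split three ways
const winnerLift = 0.2;    // 20% lift for visitors who saw the winning variation

// Only participation / variations of all visitors see the winner,
// so the overall conversion rate only moves by that fraction of the lift.
const overallLift = (participation / variations) * winnerLift;
console.log(`${(overallLift * 100).toFixed(1)}% observed lift overall`); // ~5.3%
```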

What you can do:

  • Look for a larger cumulative upward trend after several tests.
  • Compare longer timescales for baseline and post-implementation data.

 

Reason 2: Your baseline is too variable or you’re not looking at enough data

In Perfectville, conversions are constant each day, each week, each month. This means a 20% improvement produces a clean 20% lift. Not so in Realville. In Realville, daily conversions naturally fluctuate, so the improvement can be masked by the noise. The more your conversions fluctuate, the harder it is to see the lift in the data.

Here are two similar simulated data sets with low and high variability, both showing a 20% lift. The lift is more obvious when variability is lower:

 

A similar 20% lift with low and high variability
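For reference, synthetic data like this can be generated with a few lines of code; the baseline rate, noise levels, and lift below are arbitrary assumptions, not the values behind the charts:

```ts
// Simulate daily conversion rates: a baseline period followed by a lifted period,
// with adjustable day-to-day noise.
function simulateDailyRates(days: number, rate: number, noise: number): number[] {
  return Array.from({ length: days }, () => {
    const wiggle = (Math.random() * 2 - 1) * noise; // uniform noise in [-noise, +noise]
    return Math.max(0, rate * (1 + wiggle));
  });
}

const baselineRate = 0.01; // assumed 1% baseline conversion rate
const lift = 0.2;          // 20% improvement

// Low variability: the jump between the two halves is easy to see.
const calm = [
  ...simulateDailyRates(14, baselineRate, 0.05),
  ...simulateDailyRates(14, baselineRate * (1 + lift), 0.05),
];

// High variability: the same 20% lift is buried in the noise.
const noisy = [
  ...simulateDailyRates(14, baselineRate, 0.4),
  ...simulateDailyRates(14, baselineRate * (1 + lift), 0.4),
];

console.log(calm.map((r) => (r * 100).toFixed(2)).join(" "));
console.log(noisy.map((r) => (r * 100).toFixed(2)).join(" "));
```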

 

Sales may fluctuate for a lot of reasons (weekly, seasonally, in response to your marketing activities, unexpected traffic). The smaller your sample, the higher the chance that the pattern you’re looking for simply won’t show up. For example, if you just saw the middle segment of the full graph below, you’d never know that the right, orange side of the graph shows a 20% improvement:

 

14 days of simulated daily conversion rates (blue), then 14 days with a 20% improvement (orange)

 

What you can do:

  • Zoom out to reduce variability. If daily data is too variable, look at multi-day or weekly rates
  • Look at more data to cover the full cycle of ups and downs e.g., a week (note that the lower your conversion rate, the more data you need to see an effect)
  • Check your site analytics to see what might have been different that week. Check if dips have happened before. Might one have coincided with the test?
  • If the data has a lot of variation, it is hard to estimate visually. Compare what I’ll call the “clipping rates”. In this graph, you see higher peaks as well as a higher frequency of peaks in the second half:

 

20% lift manifests in more frequent and higher peaks

Reason 3: Not everyone was part of your test

Even if you didn’t put exclusion conditions on the test, some visitors were excluded.

For example, mobile visitors are excluded by default. Another 10-20% of visitors normally get excluded when the A/B testing tool times out. Further technical implementation issues can cause another 10-20% of visitors to be excluded, such as JavaScript-heavy sites or the tracking code not being implemented in the right place.

Moreover, gaps in test design can create a discrepancy between test and sales data. For example, we ran a test on the home page of a basic single-product site and noticed that our test data was missing many sales. After investigating, it turned out that about 50% of purchases came from people who never visited the home page at all, as well as from existing customers buying through a special upgrade page that we hadn’t considered.

As a result of these exclusions, when you implement your winner, you may be exposing it to segments you didn’t test it on. For example, although you tested on desktop and saw a 20% lift, the same design on mobile might cause a 30% drop. So, if you made the winner your new home page for all traffic, the drop in mobile could counteract some of the lift (say, if you had lots of mobile traffic).
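A quick worked example of how that blend can play out (the traffic mix and the mobile effect are hypothetical):

```ts
// Blending a measured desktop lift with an untested mobile effect.
const desktopShare = 0.6, mobileShare = 0.4; // hypothetical traffic mix
const desktopLift = 0.20;   // +20% measured in the desktop-only test
const mobileEffect = -0.30; // assumed -30% for the untested mobile experience

const netChange = desktopShare * desktopLift + mobileShare * mobileEffect;
console.log(`${(netChange * 100).toFixed(0)}% net change sitewide`); // 0% – the lift disappears
```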

What you can do:

  • Factor in 20% exclusions due to technical issues, like timeouts
  • Set up an inverse test to see how many sales are by-passing your main test (target pages and visitors who are excluded from your main test)
  • When looking at sales or conversion data, keep in mind it probably includes segments you didn’t test on. Test the design on all segments that will be exposed to it e.g., new customers and existing customers. For mobile, build and test a dedicated mobile version

 

Reason 4: You are eyeing it instead of using math

Sometimes a lift is obvious. Other times you need to use math. Here’s a sample of real conversion data with about 20 days of baseline followed by 20 days of the improved version:

 

Just over a month of real conversion data with winner on the right

The lift is not visually obvious. Nonetheless, the average for the first 20 days is 0.71%, whereas the average for the last 20 days is 0.85%, which is a 20% lift. However, if the standard deviation of the data is high, the difference in averages may be coincidental.
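Here’s a sketch of the kind of check that helps; it applies Welch’s t-test to daily rates, and the two arrays are placeholders standing in for your own before/after daily conversion data:

```ts
// Compare two sets of daily conversion rates and gauge whether the difference
// in their averages is bigger than the day-to-day noise would produce by chance.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleVariance(xs: number[]): number {
  const m = mean(xs);
  return xs.reduce((a, b) => a + (b - m) ** 2, 0) / (xs.length - 1);
}

// Welch's t statistic: difference in means divided by its standard error.
function welchT(before: number[], after: number[]): number {
  const se = Math.sqrt(
    sampleVariance(before) / before.length + sampleVariance(after) / after.length
  );
  return (mean(after) - mean(before)) / se;
}

// Placeholder data: daily conversion rates before and after implementing the winner.
const before = [0.0068, 0.0075, 0.0071, 0.0069, 0.0074, 0.007, 0.0073];
const after = [0.0082, 0.0087, 0.0084, 0.0086, 0.0083, 0.0088, 0.0085];

const liftPercent = (mean(after) / mean(before) - 1) * 100;
const t = welchT(before, after);
// Rule of thumb: |t| well above ~2 suggests the gap is unlikely to be coincidence.
console.log(`lift: ${liftPercent.toFixed(1)}%, t = ${t.toFixed(2)}`);
```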

 

Reason 5: Your design or conditions are not the same

This happens all the time. You run a winning test, then you tweak the winner before pushing it to your site. It’s entirely possible that those visual tweaks reduced the effectiveness of your variation.

It’s also possible the conditions when you launched the winner were different from the conditions during the test. Did you test during the holidays, or launch during the holidays?

Are you applying it to a different page? You might have several pages that look similar, so you tested something on one page and then decided to apply it to all of them in one go. If so, there is no guarantee that the same concept will work equally well on the other pages.

What you can do:

  • Check your site analytics to see what conditions might be different now and retest if necessary
  • If you know you will be changing something, apply changes to the variation and test it with the changes
  • Implement the winner as the new control and then test the new changes against it
  • Retest on each site if you have reason to believe the outcome may be different

 

Reason 6: It was a false positive

Yes, it happens all the time. There are many reasons you might have gotten a false positive, including improper test design and not running your test long enough. The most common scenario is running your test only until you see a winner and then stopping. I’ve seen results that looked very exciting flatten out after 3-4 weeks.

What you can do:

  • Follow the great tips on http://goodui.org/betterdata to ensure you get good data

 

Back To Realville

Let’s say Realville decided to retest the Perfectville winner 8 more times (it took years!). They found that, indeed, the overall tendency of the variation was toward an increase, following the same pattern as Perfectville’s test: a small lift during the test, a slight dip when the test was stopped, and then a larger lift after the final launch. However, despite the overall trend, individual outcomes showed that chance is a factor in this imaginary scenario:

Let me know if you apply some of these concepts and find them useful.