

The complete guide to test methods

18 methods by time-to-execute, budget and what they can (and can't) do for you



Time and time again, I see teams expecting too much from their research.

They’ll run an A/B test to find out the ‘whys.’ They’ll run surveys when they need depth, or spend weeks on interviews when a quick prototype test would do.

Every test method has its strengths, but also situations where it’s simply a waste of time.

This is a guide to picking the right method for the stage you’re at, the time you have, and the level of certainty you need.

I looooved writing this, as I learned a few new methods (e.g., tree testing, card sorting) in the process!

Most teams pick a method before they know what decision they need. So before we get into it, let’s take a step back.

First, decide what you need to learn.

Before picking any method, get clear on your research question:

🧠 “Do people have this problem?” → Discovery methods

💡 “Would people want this solution?” → Demand validation

🎨 “Does this design work?” → Usability testing

📊 “Which version performs better?” → Performance testing

Most product decisions follow this rough sequence: discover problems → validate demand → test usability → optimise performance.

But most people skip steps, loop back, or run methods in parallel.

A common one is going straight to A/B testing, running lots of tests with a high failure rate, then looping back to discovery to get better at solving real problems.

As a general rule, if you’re completely new to a problem space, start with discovery. If you’ve got something live but it’s not converting? Skip straight to live testing.

Now, without further ado, let’s run through the 18 methods one by one. First up: discovery.

🔍 Discovery: “Do people have this problem?”

Use when: you’re exploring a new problem space or trying to understand your users better

1. Support Ticket Analysis

This is simply not done enough. Review your support tickets, chat logs, or reviews to identify recurring issues, confusion, or unmet expectations. Ideally, cluster and group them if you have the capability (a rough clustering sketch follows the list below).

  • Best for: Finding recurring problems you didn’t know existed
  • Sample size: 50–100 recent tickets
  • Time: 1–2 days, depending on whether you can cluster by hand or need an analyst
  • Budget: Free (just your time)
  • Confidence: Brilliant for existing-user problems
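
If you fancy going beyond hand-sorting, here’s a rough Python sketch of the clustering step. It assumes scikit-learn is installed and that you can export tickets as plain strings, both assumptions about your stack:

```python
# A rough sketch, not a pipeline: cluster ticket text with TF-IDF + k-means.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

tickets = [
    "Can't reset my password",
    "Password reset email never arrives",
    "How do I export my data as CSV?",
    "Export to CSV is greyed out",
    # ... paste in your 50-100 recent tickets
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(tickets)

k = 2  # start small and increase until the clusters stop making sense
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)

for cluster in range(k):
    print(f"\nCluster {cluster}:")
    for ticket, label in zip(tickets, labels):
        if label == cluster:
            print(" -", ticket)
```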

2. User Interviews

One-on-one conversations to uncover problems, motivations, and the ‘why’ context you can’t get from analytics.

  • Best for: Understanding the reasons behind user behaviour
  • Sample size: 5–8 users (patterns emerge quickly after 5, unless your interviewees are really diverse)
  • Time: Medium-slow. If you ran 5–10 thirty-minute calls, you might be looking at 2–5 days full time (for prep, calls, synthesis, follow-ups)
  • Budget: £0–500, low-to-medium (recruiting for free is sometimes an option, but you may need to compensate people for their time, roughly £20–£100 per person)
  • Confidence: High for qualitative insights, low for quantification

3. Surveys

Sending structured questions to lots of people to get a broad view. Quick, low-cost, and good for broad patterns, but less useful for deep qualitative insight.

  • Best for: Quantifying how common a problem is, discovering opportunities
  • Sample size: 100+ responses for basic patterns, 400+ for segmentation (see the quick calculation after this list)
  • Time: 1–3 days
  • Budget: £0–200 (depends if you offer a reward for completing)
  • Confidence: High for broad patterns, low for nuanced understanding
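
Where do those sample sizes come from? The usual back-of-envelope is the margin-of-error formula. Here’s a quick sketch in plain Python (standard library only) showing that ±10% needs roughly 100 responses and ±5% needs roughly 400:

```python
# n = z^2 * p(1-p) / e^2, using the worst-case p = 0.5
from math import ceil
from statistics import NormalDist

def survey_sample_size(margin_of_error: float, confidence: float = 0.95) -> int:
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # 1.96 for 95% confidence
    p = 0.5  # worst-case variance, the conservative default
    return ceil(z**2 * p * (1 - p) / margin_of_error**2)

print(survey_sample_size(0.10))  # 97  -> the "100+ for basic patterns" rule
print(survey_sample_size(0.05))  # 385 -> the "400+ for segmentation" rule
```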

4. Contextual Inquiry or Field Study

Watching users in their natural environment, usually at work or in their daily routine. Essential for B2B tools or complex consumer behaviours.

  • Best for: Understanding complex workflows or environmental factors
  • Sample size: 3–5 sessions
  • Time: 1–2 weeks
  • Budget: £500–2000
  • Confidence: Very high for specific contexts, low for generalisation

💹 Demand Validation: “Would people want this?”

Use when: You have a solution idea but haven’t built it yet

5. Concept Testing

This sits between discovery and demand testing, really: you show a mocked-up concept and ask targeted questions. It’s very low fidelity, often text descriptions, rough wireframes, or slide mockups.

  • Best for: Early-stage when you’re deciding what to build
  • Sample size: 5–10 people for qualitative feedback, or 100+ if running as a survey
  • Time: 2–4 days
  • Budget: £0–£100
  • Confidence: Medium for actual intent, but very easy to run

6. Fake Door Tests

Add a button or link for a feature that doesn’t exist yet in the product and see if people click it. When they do, show a prompt along the lines of ‘Thanks for your interest, we’re working on this feature. You’ll be the first to know!’

  • Best for: Testing specific feature demand
  • Sample size: Run until you have 100+ interactions
  • Time: 2–3 days to set up, 1–2 weeks to collect data
  • Budget: £0–100
  • Confidence: High for behavioural intent, medium for actual usage

7. Landing Page Waitlists

A landing page to test positioning, pricing, or a new startup idea — often paired with a signup form or waitlist. Similar to the fake door, but capturing the email is stronger in terms of intent than just a button click.

  • Best for: Testing positioning, pricing, or new startup concepts
  • Sample size: 200+ visitors for meaningful patterns
  • Time: 3–5 days
  • Budget: £100–500
  • Confidence: Medium — people lie more easily about future behaviour

8. Wizard of Oz / White Glove Testing

Manually deliver the service behind the scenes while users believe the product is doing the work. Great for technically complex features, complex algorithms, or service businesses.

  • Best for: Testing complex workflows before building the tech
  • Sample size: 10–20 users
  • Time: 1–2 weeks
  • Budget: £200–1000
  • Confidence: Very high for workflow validation

9. Reaction Cards

Ask users to pick from a predefined list of words (e.g. “trustworthy,” “confusing”) to describe how they felt about a design. Arguably, this could sit in discovery or usability depending on how it’s used.

  • Best for: Quickly gauging people’s initial emotional responses to a product, design, or concept
  • Sample size: 10–20 users
  • Time: 1–2 hours
  • Budget: £0–50 (can be DIY with a slide or survey tool)
  • Confidence: High for emotional response, low for usability

🏗️ Usability Testing: “Does this design work?”

Use when: You have designs but need to validate that they’re usable

10. Prototype Testing

Ask users to complete tasks using a clickable prototype. Watch for hesitation, confusion or misclicks as these signal friction. Keep context minimal, and prompt lightly with questions like “What do you expect here?” or “What’s on your mind?” to find hidden assumptions.

11. 5-Second Tests

Show someone a screen or image for 5 seconds, then ask what stood out or what they think it’s for. Useful for headlines, hero sections, or pricing pages.

  • Best for: Testing whether your copy is clear at a glance; gauging first impressions, headlines, or value props
  • Sample size: 15–30 responses
  • Time: Same day
  • Budget: £0
  • Confidence: High for clarity issues, low for detailed feedback

12. Tree Testing

You give people a list of words (like a menu) and ask them to find something. There are no pictures or colours — just words. This helps you see if the labels and order make sense before you start designing the navigation.

  • Best for: Testing information architecture and navigation
  • Sample size: 30+ users
  • Time: 1–2 days
  • Budget: £50–200
  • Confidence: High for discoverability issues

13. First-Click Testing

Like a prototype test, but you give users a task and track where they click first. First clicks often predict success, so this is a great tool for testing button placement, menu structure, or onboarding.

  • Best for: Testing button placement and initial navigation
  • Sample size: 15–30 users
  • Time: 1–2 days
  • Budget: £50–200
  • Confidence: High — first clicks predict success 87% of the time

14. Card Sorting

Users sort content/features into categories that make sense to them. Helps you design intuitive navigation (a quick way to synthesise the results is sketched after the list below).

  • Best for: Seeing how users perceive and categorise information
  • Sample size: ~15–30 users
  • Time: 1–3 days (including synthesis)
  • Budget: £0–100 (OptimalSort, Maze, or manual)
  • Confidence: High for IA and content groupings
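
If you run card sorts manually, synthesis is mostly counting how often pairs of cards land in the same group. A minimal sketch (the participant data here is made up for illustration):

```python
# Count pairwise co-occurrence across participants' card groupings.
from collections import Counter
from itertools import combinations

# One dict per participant: their group name -> cards they placed in it
results = [
    {"account": ["profile", "billing"], "help": ["faq", "contact"]},
    {"settings": ["profile", "billing", "faq"], "support": ["contact"]},
]

pair_counts = Counter()
for participant in results:
    for group in participant.values():
        for pair in combinations(sorted(group), 2):
            pair_counts[pair] += 1

# Pairs grouped together most often probably belong near each other in your IA
for pair, count in pair_counts.most_common():
    print(pair, count)
```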

🏃‍♀️ Performance Testing: “Which version works better?”

Use when: You have something live and want to optimise it

15. A/B Testing

One of the most well-known methods: you show two versions (A vs B — usually control vs variant) to two random groups of real users. After enough people have seen each version, you compare results to see which performed better. Best for testing one change at a time, like whether a new button or headline gets more clicks or signups.

  • Best for: Testing single changes to optimise conversion
  • Sample size: Depends on current conversion rate — use an A/B calculator (or the quick sketch after this list)
  • Time: 1–4 weeks, depending on traffic (sometimes longer…)
  • Budget: Mostly engineering time
  • Confidence: Very high for causal relationships
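
No calculator to hand? The standard two-proportion sample size formula is easy to sketch in plain Python. This assumes the usual defaults of a two-sided test at 95% confidence and 80% power; most online A/B calculators implement something close to this:

```python
# Back-of-envelope visitors-per-variant for an A/B test.
from math import ceil
from statistics import NormalDist

def ab_sample_size(baseline: float, lift: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors per variant to detect baseline -> baseline + lift."""
    p1, p2 = baseline, baseline + lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96
    z_power = NormalDist().inv_cdf(power)          # 0.84
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_power * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Detecting a 5% -> 6% conversion lift needs ~8,158 visitors per variant,
# which is why low-traffic sites wait weeks for a result.
print(ab_sample_size(baseline=0.05, lift=0.01))
```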

16. ABC / Multivariate Testing

Similar to A/B but tests multiple changes at once (e.g. button + headline + layout).

  • Best for: Isolating which combination of changes works best
  • Sample size: Larger than A/B, and you’ll need the ability to analyse the extra complexity
  • Time: Longer than A/B — often 2+ weeks, depending on complexity and traffic volume.
  • Budget: Mostly engineering time
  • Confidence: Very high for causal relationships

17. Session Recordings

Watch real user sessions to spot friction points and where their mouse hovers the most.

  • Best for: Understanding how users interact with live features and how far they actually get through your pages
  • Sample size: 50–100 sessions for patterns
  • Time: Ongoing
  • Budget: £40–300/month (for the tool)
  • Confidence: High for behavioural patterns, medium for root causes

18. Diary Studies

Asking people to log what they’re doing or thinking over time. For instance, notifications with an emotion check-in, or research studies of whether a product impacts longer-term habits.

  • Best for: Spotting habits or how people feel over time.
  • Sample size: 10–20 participants
  • Time: 3 weeks+
  • Budget: Low-to-medium — you may need incentives or a way to gamify it to keep people engaged.
  • Confidence: Good confidence for perceived feedback over time. However, feedback is self-reported, which limits it.

Arguably, this could be a discovery method, but I’ve put it here as I think it’s most helpful for habit-forming products, like therapy apps, where people self-report on outcomes.

How are you feeling at this point?

Overwhelmed? Don’t be.

There are lots of ways to learn — just don’t overthink it.

This isn’t a checklist to work through; instead, think of it as a toolbox.

Each method has a job. Some are quick and scrappy, others slow and robust. All of them can be misused if you’re unclear about what decision you’re trying to make.

Even “tiny” methods, like reviewing support tickets or running a 5-second test, can be enough.

The most important lesson here: don’t just go with your gut. At least not all the time.

If you’re tight on time or budget, don’t aim for rigour — aim for direction.

  • Today: 5-second test, support ticket sweep
  • 2–3 days: survey, fake-door test, lo-fi prototype
  • A week: interviews, tree test, A/B setup
  • A month: contextual inquiry, beta, multivariate

Ask five users what confused them. Look at the last 10 users who churned and ask why. Run a lo-fi prototype test with a colleague.

Got something you use that’s not here? Or something you think’s overrated?

I’d love to hear it 👇

See you next week!!!

Rosie

🚲 in the Alps 🚲


Growth Dives

Each week I reverse engineer the products of leading tech companies. Get one annotated teardown every Friday.
