A/B testing for beginners — how to test what works on your website
You changed your headline last week. Signups went up. Was it the headline? Was it the weekend traffic spike? Was it the fact that someone shared your site on a forum the same day? You have no idea. And that is the problem with making changes based on gut feeling — you never actually know what worked.
A/B testing fixes this. Instead of guessing, you show two versions of a page to different visitors at the same time and measure which one performs better. It is the simplest way to make decisions based on evidence instead of opinions. And once you start, you will wonder how you ever made website decisions without it.
This guide covers everything you need to run your first A/B test — what to test, how the mechanics work, when testing makes sense, and the mistakes that trip up most beginners. No statistics degree required.
What A/B testing actually is
A/B testing — also called split testing — is the practice of showing two different versions of a page (or a single element on a page) to different visitors at the same time and measuring which version leads to more conversions. Version A is your current page, often called the control. Version B is the variant — the page with one thing changed.
Half your visitors see version A. The other half see version B. After enough people have visited both versions, you compare the results. Did version B get more signups? More purchases? More clicks on the CTA? If yes, and the difference is statistically meaningful, you have a winner. You roll out the winner to everyone and move on to your next test.
The key insight is that both versions run simultaneously. This eliminates the problem of sequential changes — where you change something on Monday, see better results on Tuesday, and cannot tell whether the improvement came from your change or from normal traffic fluctuations. With A/B testing, both versions experience the same traffic conditions, the same day of the week, the same external factors. The only difference is the thing you changed.
This is what makes A/B testing fundamentally different from just "trying stuff and seeing what happens." It isolates the variable. It gives you a controlled experiment instead of an anecdote.
Why opinions are unreliable
Everyone has opinions about what will work on a website. The designer thinks a bigger hero image will increase engagement. The founder thinks the headline should mention the technology stack. The marketing person thinks a longer page with more social proof will convert better. They are all guessing — and the track record of expert guesses in conversion optimization is surprisingly bad.
Studies from companies that run thousands of A/B tests consistently show that experts predict the winning variant correctly only about 50% of the time. That is the same as flipping a coin. Your intuition about what visitors want is shaped by your own preferences, your familiarity with the product, and your assumptions about the audience. None of those things make you a reliable proxy for the actual visitors landing on your page.
This is not a knock on anyone's intelligence. It is a fundamental limitation of trying to predict the behavior of thousands of strangers from the perspective of someone who already knows and cares about the product. The visitor does not know what you know. They do not care about what you care about. The only way to find out what actually moves them is to test it.
Data beats guessing. Every time. The most successful growth teams are not the ones with the best instincts — they are the ones that test the most and let the data decide.
What you can A/B test
Almost anything on a web page can be tested, but some elements have far more impact than others. Here is a practical list of what is worth testing, roughly ordered by potential impact on conversion rates.
Headlines. The single highest-leverage element on any page. Changing a headline can swing conversion rates by 20% to 50%. Test different angles — benefit-focused vs. pain-point-focused, specific numbers vs. general promises, short and punchy vs. descriptive.
Calls to action. The text on your buttons, their color, their size, their placement, and how many you have on the page. "Start Free Trial" vs. "Get Started Free" vs. "Try It Now" — these sound similar but can produce meaningfully different results.
Page layouts. Long page vs. short page. Single column vs. two columns. Social proof above the fold vs. below the fold. Video hero vs. static image. The overall structure of the page shapes how visitors engage with it.
Pricing presentation. How you display prices matters enormously. Monthly vs. annual pricing shown first. Three tiers vs. two tiers. Highlighting a "recommended" plan. Showing a comparison table vs. individual cards. Free trial vs. freemium vs. money-back guarantee.
Images and visuals. Product screenshots vs. lifestyle photos vs. illustrations. Real people vs. no people. A demo video vs. a static hero image. Visuals set the emotional tone of the page and influence trust more than most founders realize.
Copy and messaging. The tone, length, and framing of your body copy. Technical language vs. plain language. Feature-focused vs. benefit-focused. Formal vs. conversational. Sometimes a single paragraph rewrite in the hero section changes how visitors perceive your entire product.
Form length. Every field you add to a signup form reduces completion rates. Test asking for just an email vs. email plus name vs. a full profile form. The fewer fields, the higher the conversion rate — but shorter forms sometimes produce lower-quality leads. Testing helps you find the right balance for your specific business.
How A/B testing works technically
The mechanics of A/B testing are simpler than they sound. Here is what happens behind the scenes when you run a test.
Traffic splitting. When a visitor arrives on the page being tested, the testing tool randomly assigns them to either version A or version B. The assignment is usually stored in a cookie or local storage so that the same visitor sees the same version if they return. A 50/50 split is standard, though some tools let you adjust the ratio — for example, sending 90% to the control and 10% to a risky variant.
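To make the assignment step concrete, here is a minimal sketch in Python of one common approach: hashing a visitor ID into a bucket, so the same visitor always lands in the same group (a cookie achieves the same persistence on the client side). The experiment name, visitor ID format, and 50/50 threshold are illustrative, not any particular tool's API.

```python
import hashlib

def assign_variant(visitor_id: str, experiment: str = "headline-test") -> str:
    """Deterministically bucket a visitor into version A or B.

    Hashing the visitor ID means the same person always sees the same
    version on every visit, without any server-side state.
    """
    key = f"{experiment}:{visitor_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "A" if bucket < 50 else "B"  # 50/50 split; change the threshold for 90/10 etc.

print(assign_variant("visitor-123"))  # always returns the same letter for this visitor
```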
Measuring conversions. The tool tracks how many visitors in each group complete the desired action — signing up, purchasing, clicking a button, filling out a form. It calculates the conversion rate for each version: the number of conversions divided by the number of visitors.
Statistical significance. This is where most beginners get confused, but the concept is straightforward. When version B has a higher conversion rate than version A, the question is whether that difference is real or just random noise. Statistical significance is the measure of confidence that the difference is real. The industry standard is 95% significance, meaning that if there were truly no difference between the versions, a gap this large would show up by chance less than 5% of the time.
Most A/B testing tools calculate significance for you automatically. You do not need to do any math. You just need to wait until the tool tells you the result is statistically significant before acting on it. Running the test longer collects more data, which increases your confidence in the result.
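You do not need to understand the math, but if you are curious what the tool is computing, it is usually some variant of a two-proportion test (some tools use Bayesian methods instead). A minimal sketch in plain Python, with made-up visitor and conversion numbers:

```python
import math

def ab_significance(conversions_a: int, visitors_a: int,
                    conversions_b: int, visitors_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates
    (pooled two-proportion z-test)."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

# Made-up example: 500 of 10,000 visitors converted on A, 570 of 10,000 on B.
p = ab_significance(500, 10_000, 570, 10_000)
print(f"p-value: {p:.3f}  ->  significant at 95%? {p < 0.05}")
```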
When A/B testing makes sense
A/B testing is powerful, but it is not always the right tool. It requires a minimum amount of traffic to produce useful results within a reasonable timeframe. The general threshold is around 1,000 unique visitors per month to the page you want to test — and that is the lower end. With 1,000 visitors, a test might need to run for four to six weeks to reach significance, which is a long time to wait for one answer.
With 5,000 or more visitors per month, you can typically get results in one to two weeks. With 10,000 or more, you can run multiple tests per month and build real momentum. The more traffic you have, the faster you learn, and the more valuable A/B testing becomes.
A/B testing makes the most sense when you have a page that already receives consistent traffic, a clear conversion goal you can measure, and a specific hypothesis about what change will improve performance. "I think a shorter headline will increase signups because visitors are not reading the current long headline" is a good hypothesis. "Let us try some random stuff and see what happens" is not.
When A/B testing does not make sense
Too little traffic. If your page gets 200 visitors a month, a standard A/B test would need to run for months to produce a reliable result. By the time you have an answer, the context may have changed entirely. For low-traffic sites, sequential testing (covered later in this guide) is a better approach.
Too many changes at once. If you redesign the entire page and test the new version against the old one, you will learn whether the new page is better — but you will not learn why. Was it the new headline? The different layout? The shorter form? The new color scheme? You have no idea. Full-page redesign tests tell you which version won but give you no actionable insight about what to do next. Test one element at a time.
Testing trivial things. Changing button color from blue to green is the classic example of a low-impact test. Will it produce a measurable difference? Almost never. The impact of button color on conversions is so small that you would need hundreds of thousands of visitors to detect it reliably. Meanwhile, the headline — which could swing conversions by 30% — goes untested. Focus your testing effort on elements that have the potential to produce large differences.
Before you have product-market fit. If you are still figuring out who your customer is and what they want, optimizing conversion rate on your landing page is premature. The page might be perfectly optimized for the wrong audience or the wrong value proposition. Get the fundamentals right first — then optimize.
How to run your first A/B test
Here is a step-by-step process for running your first test. It is simpler than most guides make it seem.
Step 1: Pick one thing to test. Start with the element most likely to have a big impact. For most pages, that is the headline. If your headline is already strong, test the CTA — the text, the placement, or both (in separate tests). Choose one element and leave everything else identical.
Step 2: Create your variant. Write the alternative version of that element. Make the change meaningfully different — not a subtle word swap but a genuinely different approach. If your current headline is benefit-focused, try a pain-point headline. If it is long and descriptive, try short and punchy. The bigger the difference between variants, the easier it is to detect a winner with less traffic.
Step 3: Set up the split. Use an A/B testing tool to split traffic evenly between the two versions. Most tools handle this with a snippet of code you add to your page. Configure the tool to track your primary conversion metric — signups, purchases, form submissions, or whatever action you care about most.
Step 4: Wait for significance. This is the hardest part — not because it is complicated but because it requires patience. Let the test run until your tool reports statistical significance at the 95% level. Do not peek at the results daily and get excited by early trends. Early results are unreliable. Let the data accumulate.
Step 5: Implement the winner. Once you have a statistically significant result, roll out the winning version to all visitors. Document what you tested, what won, and by how much. Then pick your next test.
Statistical significance explained without math
Imagine you flip a coin ten times and get seven heads. Does that mean the coin is biased? Probably not — with only ten flips, seven heads is well within the range of normal randomness. But if you flip the coin ten thousand times and get seven thousand heads, you can be extremely confident the coin is biased. The difference is sample size.
A/B testing works the same way. When version B has a higher conversion rate than version A, the question is whether that difference is real or just the equivalent of getting a few extra heads in a small number of coin flips. Statistical significance is the answer to that question. A result is statistically significant when you have collected enough data to be confident the difference is not random noise.
The standard threshold is 95% confidence, which means that if there were actually no difference between the versions, a gap this large would show up by chance less than 5% of the time. This is not arbitrary — it is the same standard used in scientific research and has proven to be a reliable threshold for making practical decisions.
Why you need to wait: early in a test, small amounts of data produce wildly fluctuating results. You might see a 40% improvement after the first 100 visitors, which then shrinks to a 5% improvement after 1,000 visitors, and finally settles at a 12% improvement after 5,000 visitors. This is completely normal. The early numbers are noisy because the sample is too small for the true pattern to emerge. If you stop the test after seeing that initial 40% lift, you are making a decision based on noise.
Why early results lie: there is a well-documented phenomenon called "peeking bias." If you check your test results repeatedly and stop the test as soon as one version looks like it is winning, you will systematically overestimate the size of improvements and sometimes declare winners that are not actually better. The solution is simple: decide your sample size in advance and do not stop early.
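If you are comfortable with a little code, you can see peeking bias for yourself. The sketch below simulates an A/A test: both versions convert at exactly the same rate, so any declared winner is pure noise. It checks significance every 500 visitors and stops at the first "win", using the same two-proportion test as the earlier sketch; the traffic numbers and peek interval are arbitrary. The share of false winners comes out well above the 5% a single, pre-planned check would allow.

```python
import math
import random

def p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
TRUE_RATE = 0.05             # both versions convert identically: any "winner" is noise
PEEK_EVERY, MAX_VISITORS = 500, 5_000
EXPERIMENTS = 1_000
false_positives = 0

for _ in range(EXPERIMENTS):
    conv_a = conv_b = 0
    for n in range(1, MAX_VISITORS + 1):
        conv_a += random.random() < TRUE_RATE
        conv_b += random.random() < TRUE_RATE
        if n % PEEK_EVERY == 0 and p_value(conv_a, n, conv_b, n) < 0.05:
            false_positives += 1   # peeked, saw "significance", stopped the test
            break

print(f"Winners declared with no real difference: {false_positives / EXPERIMENTS:.0%}")
# A single fixed-size check would be wrong about 5% of the time; peeking is far worse.
```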
Sample size: how many visitors you need
The number of visitors you need depends on two things: your current conversion rate and the minimum improvement you want to be able to detect. Smaller improvements require more data to detect reliably.
As a rough guide: if your current conversion rate is around 5% and you want to detect a 20% relative improvement (meaning an increase from 5% to 6%), you need approximately 15,000 visitors per variation — so 30,000 total. If you are trying to detect a larger improvement, say 50% relative (from 5% to 7.5%), you need around 2,500 visitors per variation.
These numbers surprise most beginners. They expect to need a few hundred visitors, not thousands. But the math is unforgiving — small differences require large samples to distinguish from random noise. This is why it is important to test things that can produce large differences. A new headline might swing conversions by 30%. A different shade of blue on a button might move the needle by 1%. The headline test will give you a clear answer in weeks. The button color test might never give you a clear answer at all.
Most A/B testing tools include a sample size calculator. Use it before starting your test. Enter your current conversion rate and the minimum detectable effect you care about, and it will tell you how many visitors you need. Then divide that number by your daily traffic to estimate how long the test will take. If the answer is "three months," that test is not practical — either test something with a bigger expected impact or use sequential testing instead.
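If you want to see roughly what those calculators do, here is a sketch of the standard two-proportion sample size formula in Python, assuming 95% significance and 80% statistical power. Different tools make different power and correction assumptions, which is why their outputs (and the rough figures above) will not match this exactly.

```python
import math

def sample_size_per_variant(baseline: float, relative_lift: float,
                            z_alpha: float = 1.96,      # 95% significance, two-sided
                            z_power: float = 0.84) -> int:  # 80% power
    """Approximate visitors needed per variation to detect a relative lift
    over a baseline conversion rate (standard two-proportion formula)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# 5% baseline, hoping to detect a 20% relative lift (5% -> 6%)
print(sample_size_per_variant(0.05, 0.20))  # on the order of 8,000+ per variation
# A 50% relative lift (5% -> 7.5%) needs far fewer visitors
print(sample_size_per_variant(0.05, 0.50))  # on the order of 1,500 per variation
```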
Common mistakes that invalidate your tests
Stopping too early. This is the most common mistake by far. You see version B winning after two days and you stop the test. But those two days might have been a weekend when traffic behaves differently. Or the first few hundred visitors skewed heavily toward one segment. Wait for the predetermined sample size. Always.
Testing too many things simultaneously. Running five tests at the same time on the same page creates interactions between the tests that make the results unreliable. If you change the headline in test one and the CTA in test two, a visitor might see new headline plus old CTA, old headline plus new CTA, or both new elements together. These combinations were not part of your original hypothesis and they muddy the results. Run one test per page at a time.
Ignoring segments. Your overall test might show no winner, but version B might be dramatically better for mobile visitors while version A is better for desktop visitors. If you only look at the aggregate number, you miss this insight entirely. After a test concludes, segment the results by device type, traffic source, and geography. You might find wins hiding inside an inconclusive overall result.
Testing the wrong thing. If your conversion rate is 0.5%, the problem is almost certainly not the button color or the font size. It is the offer, the audience targeting, or the core value proposition. Our guide on how to improve your website conversion rate covers the fundamentals to get right before you start testing. Micro-optimizations only matter when the fundamentals are already working. If your page converts below 2%, step back and ask whether the right people are seeing the right offer with the right messaging. Then test.
Not accounting for seasonality. If you run a test that starts on a Monday and ends on a Thursday, you have missed the weekend entirely. Traffic patterns, visitor intent, and conversion rates often vary significantly by day of the week. Run tests in full-week increments — at minimum one full week, ideally two — to ensure both versions experience the same mix of days.
Declaring a winner based on the wrong metric. If your test increased clicks on the CTA by 15% but decreased actual completed purchases by 5%, the test was a failure — even though one metric improved. Always measure the metric that matters most to your business. For most businesses, that is revenue or qualified signups, not clicks or page views.
Tools for A/B testing in 2026
Google Optimize was discontinued in 2023, and since then the landscape has shifted. Here are the options worth considering depending on your budget and technical ability.
Free and open-source. GrowthBook is the standout option — it is open-source, self-hostable, and supports both A/B testing and feature flags. It requires some technical setup but gives you full control over your data. PostHog also includes A/B testing as part of its analytics suite and has a generous free tier.
Paid tools. VWO and Convert are solid mid-market options with visual editors that let non-developers create tests without writing code. LaunchDarkly focuses on feature flags but supports experimentation. Optimizely remains the enterprise standard but is expensive and overkill for most small teams.
Simple approaches. If you do not want to set up a dedicated testing tool, you can run basic A/B tests manually. Create two versions of a page at different URLs. Use your analytics tool — sourcebeam works well for this — to track conversion rates on each URL separately. Split traffic between the URLs using your ad platform, your email tool, or a simple server-side redirect. It is less elegant than a proper testing platform, but it works. The important thing is that you are testing at all.
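For the server-side redirect option, the split itself is only a few lines. The sketch below assumes a Flask app and two hypothetical page URLs, /landing-a and /landing-b; the cookie keeps returning visitors on the version they saw first, and your analytics tool measures conversions for each URL separately.

```python
import random
from flask import Flask, redirect, request, make_response

app = Flask(__name__)

@app.route("/landing")
def split_traffic():
    # Returning visitors keep the version they were assigned on their first visit.
    variant = request.cookies.get("ab_variant")
    if variant not in ("a", "b"):
        variant = random.choice(["a", "b"])   # 50/50 split
    response = make_response(redirect(f"/landing-{variant}"))
    response.set_cookie("ab_variant", variant, max_age=60 * 60 * 24 * 30)  # 30 days
    return response
```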
For most small teams getting started, the best tool is whichever one you will actually use consistently. A free tool you use every month beats an expensive tool you set up once and forget.
What to test first
Not all tests are created equal. The highest-value tests share two characteristics: they target high-traffic pages and they target high-impact elements. Focus your first tests where these two qualities overlap.
Start with your highest-traffic page. For most websites, this is the homepage or the primary landing page for paid campaigns. A 10% improvement on a page that receives 10,000 visitors per month has 10x more business impact than a 10% improvement on a page that receives 1,000 visitors per month. The math is simple but often overlooked — people test whatever page they happen to be working on rather than the page where improvements matter most.
On that page, test the headline first. It is the highest-leverage element and the easiest to change — especially on landing pages built for a single campaign. Then test the CTA — the text, the placement, and whether there is one or several. Then test the social proof — its presence, placement, and format. These three elements, tested in sequence on your highest-traffic page, will teach you more about your visitors in three months than a year of guessing.
After your primary landing page, move to the next page in your conversion funnel. If visitors go from landing page to pricing page to signup, test the pricing page next. Then the signup flow. Work your way through the funnel from top to bottom, optimizing each stage in order of traffic volume.
Sequential testing for low-traffic sites
If your site gets fewer than 1,000 visitors per month to the page you want to optimize, traditional A/B testing is impractical. The sample size requirements mean tests would take months to complete. But that does not mean you should give up on testing entirely. You just need a different approach.
Sequential testing — also called before-and-after testing — works like this: measure your current conversion rate over a fixed period (two weeks is a good minimum), make one change, then measure the conversion rate over the same length of time. Compare the two periods.
This is less rigorous than a proper A/B test because the two periods might have different traffic conditions. But for low-traffic sites, it is far better than not testing at all. To improve reliability, use two-week periods that include the same days of the week, avoid making changes during unusual traffic periods (holidays, product launches, press coverage), and only draw conclusions from large differences. If your conversion rate went from 3% to 3.5%, that could easily be noise. If it went from 3% to 5%, something real probably happened.
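In practice, a before-and-after comparison is just two conversion rates and a judgment call about whether the gap is big enough to trust. A small sketch with made-up numbers for the two periods:

```python
def period_rate(conversions: int, visitors: int) -> float:
    """Conversion rate for one measurement period."""
    return conversions / visitors

# Two weeks before the change vs. two weeks after (made-up numbers)
before = period_rate(conversions=18, visitors=600)   # 3.0%
after = period_rate(conversions=30, visitors=620)    # ~4.8%

relative_change = (after - before) / before
print(f"Before: {before:.1%}  After: {after:.1%}  Change: {relative_change:+.0%}")
# With samples this small, treat modest relative changes as plausible noise;
# only large swings (like this one) are worth acting on.
```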
A tool like sourcebeam that tracks conversions over specific date ranges makes sequential testing straightforward. You can pull the exact conversion numbers for each two-week period and compare them cleanly without having to set up a formal A/B testing platform.
The key discipline with sequential testing is changing only one thing between periods. If you change the headline and the CTA and the hero image all at once, you are back to guessing. One change per period. Document what you changed. Measure the result. Move on.
Building a testing culture
The biggest wins from A/B testing come not from any single test but from the compound effect of testing consistently over time. A 5% improvement from one test is modest. But twelve tests over a year, each producing a 5% improvement, compounds to an 80% total improvement. That is transformative. And it starts with treating testing as an ongoing practice rather than a one-time project.
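The 80% figure comes from compounding rather than adding: each win multiplies the result of the previous one. A quick check, assuming a hypothetical 2% starting conversion rate:

```python
rate = 0.02            # hypothetical starting conversion rate of 2%
for _ in range(12):    # twelve tests, each a 5% relative improvement
    rate *= 1.05
print(f"{rate:.2%}")   # about 3.6%, roughly 80% above where you started
```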
Document everything. Keep a testing log — a simple spreadsheet with columns for the date, the page tested, the hypothesis, the variant description, the sample size, the result, and the improvement percentage. This log becomes your institutional memory. It prevents you from re-testing things you have already tried, helps you spot patterns across tests, and provides ammunition when someone wants to make a change based on opinion instead of data.
Celebrate learning, not just wins. Not every test will produce a winner. In fact, most tests are inconclusive or show no significant difference. That is fine. An inconclusive test still taught you something — that the element you tested is not a major conversion lever for your audience. Now you can focus your energy elsewhere. The only wasted test is one you do not learn from because you did not document the result.
Set a testing cadence. Commit to running at least one test per month. Block time on your calendar to review results from the current test, identify the next hypothesis, and set up the next experiment. Without a cadence, testing becomes something you do when you remember — which is rarely.
Compound small wins. Do not wait for big, dramatic ideas. Most improvements come from small, incremental changes that individually seem modest but add up to significant gains over time. A slightly better headline here, a clearer CTA there, a more relevant testimonial somewhere else. None of these alone will double your conversion rate. All of them together might.
Share results with your team. If you work with others, make test results visible. Post them in Slack, discuss them in team meetings, add them to your project management tool. When everyone on the team sees real data about what works and what does not, it shifts the entire culture from opinion-driven decision-making to evidence-driven decision-making. That shift is worth more than any single test result.
A/B testing is not complicated. It is not expensive. It does not require a data science team. It requires patience, discipline, and the willingness to let data overrule your assumptions. Start with one test on your highest-traffic page. Measure the result. Learn from it. Run the next test. That is the entire playbook — and it works.
sourcebeam makes it easy to track conversions and compare performance across pages — so you always know what your tests are actually doing. Try it free