A/B Test Significance Calculator

Drop in two conversion rates and sample sizes to get statistical confidence

A/B Test Result

Not significant (p = 0.153)

Variant A: 10.00% (95% CI: 8.14% – 11.86%)
Variant B: 12.00% (95% CI: 9.99% – 14.01%)
Relative uplift: +20.00%
z-score: 1.4293
Absolute difference: +2.00pp
95% CI of difference: -0.74 to 4.74pp

How it works

This calculator runs a pooled two-proportion z-test on your variant data. It pools the conversions from both variants to estimate a shared rate under the null hypothesis, computes the z-score for the observed difference, and converts it to a p-value using the standard normal CDF (via the Abramowitz–Stegun erf approximation, accurate to about 1.5×10⁻⁷). The confidence interval for the difference uses the unpooled standard error. For one-tailed tests, the p-value is reported in the direction of the observed effect. All math runs in your browser — no data leaves your device.
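The calculation described above can be sketched in a few lines of Python. This is an illustration, not the calculator's actual source: the function names are made up for the example, and the inputs (100 and 120 conversions out of 1,000 visitors each) are an assumption chosen because they are consistent with the rates and confidence intervals in the sample result, which does not state its sample sizes.

```python
import math

def erf_as(x: float) -> float:
    """Abramowitz-Stegun formula 7.1.26 for erf(x), max error about 1.5e-7."""
    sign = -1.0 if x < 0 else 1.0
    x = abs(x)
    t = 1.0 / (1.0 + 0.3275911 * x)
    # Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5
    poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
             - 0.284496736) * t + 0.254829592) * t
    return sign * (1.0 - poly * math.exp(-x * x))

def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through erf."""
    return 0.5 * (1.0 + erf_as(z / math.sqrt(2.0)))

def pooled_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-tailed p) for a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Shared rate under the null hypothesis of no difference
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled
    p_two_tailed = 2 * (1 - normal_cdf(abs(z)))
    return z, p_two_tailed

# Assumed inputs: 100/1000 for Variant A, 120/1000 for Variant B
z, p = pooled_z_test(100, 1000, 120, 1000)
print(round(z, 4), round(p, 3))  # prints 1.4293 0.153
```

With those assumed inputs the sketch reproduces the z-score and p-value shown in the sample result.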

Free A/B Test Significance Calculator

This A/B test calculator tells you whether the difference between two variants is statistically significant or just random noise. Enter the number of visitors and conversions for your control (Variant A) and your challenger (Variant B), pick a confidence level, and you instantly get a clear verdict, a p-value, the relative uplift, and a confidence interval for the difference. Everything runs in your browser — no signup, no data sent to a server.

How the A/B Test Calculator Works

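Under the hood, this tool runs a two-proportion z-test, the standard method for comparing conversion rates. In plain language, it works like this: it pools the conversions from both variants to estimate the rate you would expect if there were no real difference, measures how far apart the observed rates are relative to the expected noise (the z-score), and converts that distance into a p-value.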

If the p-value falls below your threshold (5% for a 95% confidence level), the result is statistically significant. The calculator also reports a confidence interval for the absolute difference, which shows the plausible range of the true effect — if that range includes zero, you can't rule out "no difference at all."
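As a rough sketch of the interval check, the snippet below computes a confidence interval for the absolute difference from the unpooled standard error and tests whether it includes zero. The function name is hypothetical, the critical value comes from Python's standard-library NormalDist rather than whatever the calculator uses internally, and the inputs (100/1,000 vs 120/1,000) are assumed values that match the sample result.

```python
import math
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Confidence interval for (rate B - rate A) using the unpooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z_crit * se, diff + z_crit * se

low, high = diff_confidence_interval(100, 1000, 120, 1000)
includes_zero = low <= 0 <= high
print(f"{low * 100:.2f}pp to {high * 100:.2f}pp, includes zero: {includes_zero}")
# prints -0.74pp to 4.74pp, includes zero: True
```

Because the interval spans zero here, "no difference at all" stays on the table, which agrees with the "Not significant" verdict.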

One-Tailed vs Two-Tailed Tests

A two-tailed test checks for a difference in either direction and is the recommended default, because a variant can genuinely perform worse. A one-tailed test only looks in one direction and produces smaller p-values, so it should be reserved for cases where a negative result would be treated the same as no result.
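To see how much the choice matters, here is a small sketch that derives both p-values from the same z-score. It assumes the one-tailed value is taken in the direction of the observed effect, and the z-score of 1.80 is a made-up example, not output from the calculator.

```python
from statistics import NormalDist

def p_values(z: float) -> dict:
    """One- and two-tailed p-values for a z-score (one-tailed in the observed direction)."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return {"one_tailed": upper_tail, "two_tailed": 2 * upper_tail}

# Hypothetical z-score chosen to sit between the two thresholds
result = p_values(1.80)
print({name: round(p, 4) for name, p in result.items()})
```

For z = 1.80 the two-tailed p-value is about 0.072 and the one-tailed p-value about 0.036, so the same data would pass a 95% threshold only under the one-tailed test.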

Choosing a Confidence Level

95% is the most common choice and a sensible default for most experiments. Use 99% when a wrong decision is expensive (pricing changes, checkout flows), and 90% only for low-risk tests where you accept a higher chance of a false positive in exchange for faster decisions.
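Each confidence level maps to a p-value threshold and a critical z-score that the result has to clear. The sketch below shows that mapping for a two-tailed test; the helper name is illustrative only.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Two-tailed critical z-score for a given confidence level."""
    alpha = 1 - confidence  # significance threshold for the p-value
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: p < {1 - confidence:.2f}, |z| > {critical_z(confidence):.3f}")
# prints:
# 90%: p < 0.10, |z| > 1.645
# 95%: p < 0.05, |z| > 1.960
# 99%: p < 0.01, |z| > 2.576
```

Moving from 95% to 99% raises the bar from roughly 1.96 to 2.58 standard errors, which is why stricter levels need more traffic to reach a verdict.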

Tips for Trustworthy A/B Test Results

Whether you're testing landing pages, email subject lines, or pricing, this A/B test significance calculator gives you a fast, honest answer about whether your winner is real.

Add this tool to your site

Free to embed anywhere — paste this snippet into your HTML and the tool appears right on your page. It resizes itself automatically. Add data-theme="dark" or data-theme="auto" to match your site.

<script async src="https://whatsmytools.com/embed.js" data-tool="ab-test-calculator"></script>

Frequently Asked Questions

What does statistical significance mean in an A/B test?
Statistical significance means the difference between your variants is unlikely to be explained by random chance alone. At a 95% confidence level, a significant result means that if there were truly no difference, you would see data this extreme less than 5% of the time. It does not guarantee the effect is large or important, only that random noise alone is an unlikely explanation for it.
What is a p-value?
The p-value is the probability of observing a difference at least as extreme as the one in your data, assuming the two variants actually convert at the same rate. A p-value of 0.03 means that, if there were no real difference, random noise alone would produce a gap this large about 3% of the time. It is not the probability that the variants are the same. If the p-value is below your significance threshold, the result is called significant.
Should I use a one-tailed or two-tailed test?
A two-tailed test checks whether B is different from A in either direction, while a one-tailed test only checks one direction. Two-tailed is the safer default because B can plausibly perform worse, and a one-tailed test halves the p-value, making it easier to declare a winner prematurely. Use one-tailed only when you genuinely don't care about detecting a negative effect.
How much traffic do I need for a reliable A/B test?
It depends on your baseline conversion rate and the smallest uplift you want to detect: smaller effects need much more traffic. As a rough rule, detecting a 20% relative improvement on a 5% conversion rate (5% to 6%) requires around 8,000-10,000 visitors per variant at 95% confidence and 80% power. Each variant should also have at least 5 expected conversions and at least 5 expected non-conversions, so that the normal approximation behind the z-test holds.
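A back-of-the-envelope version of that estimate can be sketched with the standard normal-approximation formula for comparing two proportions. This is one common formula among several, not necessarily the one behind the figures above, and the function name and defaults are assumptions for the example.

```python
import math
from statistics import NormalDist

def visitors_per_variant(baseline, relative_uplift, confidence=0.95, power=0.80):
    """Approximate visitors needed per variant for a two-tailed two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed threshold
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)  # unpooled variance of the difference
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# 5% baseline, 20% relative uplift (5% -> 6%)
print(visitors_per_variant(0.05, 0.20))
# Halving the uplift to 10% roughly quadruples the requirement
print(visitors_per_variant(0.05, 0.10))
```

For a 5% baseline and a 20% relative uplift this lands at roughly 8,150 visitors per variant, inside the range quoted above.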
Why shouldn't I stop my A/B test as soon as it shows significance?
Checking results repeatedly and stopping the moment p dips below 0.05 — known as peeking — dramatically inflates your false-positive rate, because random fluctuations will cross the threshold temporarily even when there is no real effect. Decide your sample size or test duration in advance and only evaluate significance at the end.
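The effect of peeking is easy to see in a simulation. The sketch below runs A/A tests, where both variants share the same true rate so every "significant" result is a false positive, and compares stopping at the first significant look against evaluating only once at the end. All parameters (400 simulated tests, 10 looks, 100 visitors per variant per look, a 10% conversion rate) are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

def two_tailed_p(conv_a, n_a, conv_b, n_b):
    """Two-tailed p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed yet, nothing to test
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_peeking(n_tests=400, looks=10, visitors_per_look=100,
                     rate=0.10, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, false-positive rate at the end only)."""
    rng = random.Random(seed)
    crossed_any_look = 0
    significant_at_end = 0
    for _test in range(n_tests):
        conv_a = conv_b = n = 0
        crossed = False
        p = 1.0
        for _look in range(looks):
            # Both variants draw from the same true conversion rate
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            p = two_tailed_p(conv_a, n, conv_b, n)
            if p < alpha:
                crossed = True  # a peeker would have stopped and declared a winner here
        crossed_any_look += crossed
        significant_at_end += p < alpha
    return crossed_any_look / n_tests, significant_at_end / n_tests

peeking_rate, final_rate = simulate_peeking()
print(f"stop at first significant look: {peeking_rate:.1%}")
print(f"evaluate once at the end:       {final_rate:.1%}")
```

In a setup like this, the end-only rate should stay near the nominal 5%, while the peeking rate tends to come out several times higher.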