A/B Testing

I had the pleasure of hearing Vincent Dirks present on A/B Testing at the Auckland WeTest meetup last week.
I always enjoy WeTest events, and this was no different - here’s a quick summary of what I learned!

A/B Testing

A/B Testing is a method of testing in production that is becoming more and more common.
The premise is simple.
It means deploying two different variations of a product at the same time (version A and version B) and monitoring the results.

When users browse to the screen in question, one of the two versions will appear - chosen either at random, or according to some criteria (location, or some user-specific data, like age or gender).
Then, behind the scenes, we use logging or monitoring tools to determine which was more successful.
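
For the curious, here’s a minimal sketch of one way the "at random" split could work. It isn't from Vincent’s talk - the function name `assign_variant` and the experiment name `signup-button` are made up for illustration. It hashes the user id so that the split is roughly 50/50, but the same user always lands on the same version.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "signup-button") -> str:
    """Deterministically place a user in variant 'A' or 'B' (roughly 50/50)."""
    # Hash the (hypothetical) experiment name together with the user id, so
    # the same user always sees the same variant and different experiments
    # split users independently of each other.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    return "A" if bucket < 50 else "B"


# The same user gets the same answer on every visit.
print(assign_variant("user-42"))
```

A plain coin flip would also work, but then a returning user might see a different version on each visit, which muddies the results.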

An example might help: Your company has a red signup button. But you have a theory (hypothesis) that you’ll get more signups if the button is green.

So you try it - by deploying both versions, and sending half of your traffic to each.

Then, you measure how many visitors complete the signup on each version of the page.

If the number of signups on the green button is significantly higher - you’ve learned something!

You can then take action by changing the button for all users to green - increasing your signups overall.

Of course, if the number of signups on the red button is higher - you’ve still learned something, but the end result will be to keep the button red!
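
The measuring step in the button example can be sketched in a few lines. This assumes your logging gives you one record per visitor saying which version they saw and whether they signed up - the `(variant, signed_up)` format and the data below are invented for illustration.

```python
from collections import Counter


def conversion_rates(events):
    """events: iterable of (variant, signed_up) pairs, one per visitor."""
    visitors = Counter()
    signups = Counter()
    for variant, signed_up in events:
        visitors[variant] += 1
        if signed_up:
            signups[variant] += 1
    # Conversion rate = signups divided by visitors, per variant.
    return {variant: signups[variant] / visitors[variant] for variant in visitors}


# Made-up data, purely for illustration (A = red button, B = green button).
events = [
    ("A", True), ("A", False), ("A", False), ("A", False),
    ("B", True), ("B", True), ("B", False), ("B", False),
]
rates = conversion_rates(events)
print(rates)
```

With only eight visitors these numbers mean very little, of course.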

So you’ve got some metrics - then what?

How do you know that the numbers you’re looking at are significant?
How do you know they’re not just the result of random chance, or some external factor?

This is the tough part. Any time I’ve done A/B testing in practice, I’ve really not had exposure to this side of the process.

In fact, I suspect in some cases we haven’t even analysed the numbers - we’ve just looked at the higher number, and declared it the best solution!

But - to do this properly, there’s math involved. A statistical analysis needs to be done.

Unfortunately, this sort of math brings back unpleasant memories of seventh form statistics class - and it’s much too involved for this little blog post :)

Basically, a statistical analysis can tell you, with a certain level of confidence, whether one of the variations in your A/B test has made a real difference - or whether the gap could easily be down to chance.

You can look into the statistics behind this further if you like, or even find calculators online that will tell you whether your results are significant (like this one from kissmetrics).
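
To give a flavour of what such a calculator might be doing, here’s a minimal sketch of a two-proportion z-test, which is one common way to compare two conversion rates. I’m assuming this approach for illustration - it isn't necessarily what the talk or any particular calculator uses, and the visitor numbers are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 200 of 5000 visitors signed up on A, 250 of 5000 on B.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The p-value is the chance of seeing a gap at least this big if the two versions were really the same. For these made-up numbers it comes out under 0.05, which is the threshold people commonly use for calling a result significant.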

One important thing to note is that sample size matters - you need a large data set for your A/B test to be of any value. A set that is too small simply won’t be reliable enough.
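
How large is large enough? Here’s a rough sketch using a standard textbook approximation for comparing two proportions. The defaults (a 5% significance level and 80% power) are conventional choices I’m assuming here, not figures from the talk, and the conversion rates are made up.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(base_rate, expected_rate, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect the given change."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (base_rate + expected_rate) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(
            base_rate * (1 - base_rate) + expected_rate * (1 - expected_rate)
        )
    ) ** 2
    return math.ceil(numerator / (expected_rate - base_rate) ** 2)


# Made-up scenario: signups currently at 4%, hoping green lifts them to 5%.
needed = sample_size_per_variant(0.04, 0.05)
print(needed)
```

For this scenario the answer is several thousand visitors per version, and the smaller the change you’re hoping to detect, the more visitors you need.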

What did I actually learn?

My takeaway from this is that when my team decides to run an A/B test, there are questions that need to be asked.

  • How are we measuring it?
  • How will we decide whether one of our options is more impactful?
  • How long should we test for, and what number is going to be big enough? When do we stop testing?

I now know that if I can get the answers to these ahead of time, our A/B test is going to be more useful.

By getting a statistical analysis done, A/B test feedback can go from “option B gave better results” to “we can be 95% confident that option B will give us a 10% increase in conversions”.

Or something to that effect!
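
As a sketch of where a "95% confident" statement could come from, here’s one simple way to put a confidence interval around the difference between two conversion rates. It uses a normal approximation, which I’m assuming is good enough for large samples, and the numbers are invented.

```python
import math
from statistics import NormalDist


def difference_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return (low, high) bounds for the difference in rates, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference, without pooling the two rates.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Made-up numbers: 200 of 5000 signups on option A, 250 of 5000 on option B.
low, high = difference_confidence_interval(200, 5000, 250, 5000)
print(f"B beats A by between {low:.2%} and {high:.2%} (percentage points)")
```

If the whole interval sits above zero, as it does for these numbers, you have reasonable grounds to say option B really is better, and a range for how much better.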

Special thanks to Vincent for taking the time to give his presentation last week. I learned a bunch - you can find Vincent on LinkedIn!

  • JE