A/B Testing

I had the pleasure of hearing Vincent Dirks present on A/B Testing at the Auckland WeTest meetup last week.
I always enjoy WeTest events, and this was no different - here’s a quick summary of what I learned!

A/B Testing

A/B Testing is a method of testing in production that is becoming more and more common.
The premise is simple.
It means deploying two different variations of a product at the same time (version A and version B) and monitoring the results.

When users browse to the screen in question, one of the two will appear - either at random, or according to some criteria (location, or some user-specific data, like age or gender).
Then, behind the scenes, we use logging or monitoring tools to determine which was more successful.
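As an aside, that routing step usually isn’t a fresh coin flip on every visit. A common approach (my own sketch, not something from the talk) is to hash a stable user identifier into a bucket, so a returning user always sees the same variant:

```python
import hashlib

def assign_variant(user_id: str, percent_in_a: int = 50) -> str:
    """Deterministically assign a user to variant A or B.

    Hashing a stable identifier (rather than picking at random on each
    visit) means the same user always sees the same variant.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # a stable number from 0 to 99
    return "A" if bucket < percent_in_a else "B"

# A returning user lands in the same bucket every time:
assert assign_variant("user-42") == assign_variant("user-42")
```

The same function also lets you do uneven splits (say, 10% to the risky variant) just by changing `percent_in_a`.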


An example might help: Your company has a red signup button. But you have a theory (hypothesis) that you’ll get more signups if the button is green.

So you try it - by deploying both versions, and redirecting half of your traffic to one or the other.

Then, you measure how many visitors complete the signup on each version of the page.

If the number of signups on the green button is significantly higher - you’ve learned something!

You can then take action by changing the button for all users to green - increasing your signups overall.

Of course, if the number of signups on the red button is higher - you’ve still learned something, but the end result will be to keep the button red!

So you’ve got some metrics - then what?


How do you know that the numbers you’re looking at are significant?
How do you know they’re not just the result of random chance, or some external factor?

This is the tough part. Any time I’ve done A/B testing in practice, I’ve really not had exposure to this side of the process.

In fact, I suspect in some cases we haven’t even analysed the numbers - we’ve just looked at the higher number, and declared it the best solution!

But - to do this properly, there’s math involved. A statistical analysis needs to be done.

Unfortunately, this sort of math brings back unpleasant memories of seventh form statistics class - and it’s much too involved for this little blog post :)

Basically, a statistical analysis can prove, with a certain level of confidence, that one of the variations in your A/B test has made a difference.

You can look into the statistics behind this further if you like, or even find calculators online that will tell you whether your results are significant (like this one from Kissmetrics).

One important thing to note is that sample size matters - you need a large data set for your A/B test to be of any value. A set that is too small simply won’t be reliable enough.
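To make the math a little more concrete, here’s a minimal sketch (with made-up numbers) of the two-proportion z-test that those online significance calculators perform under the hood:

```python
from math import sqrt, erf

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))       # standard normal CDF
    return 2 * (1 - phi(abs(z)))

# Made-up numbers: 200/5000 signups on red vs 260/5000 on green.
p = two_proportion_p_value(200, 5000, 260, 5000)
print(f"p-value: {p:.4f}")  # below 0.05 -> significant at 95% confidence
```

With the same conversion rates but only 500 visitors per variant, the p-value climbs well above 0.05 - which is exactly why sample size matters.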

What did I actually learn?

My takeaway from this is, that when my team decides to run an A/B test, there are questions that need to be asked.

  • How are we measuring it?
  • How will we decide whether one of our options is more impactful?
  • How long should we test for, and what number is going to be big enough? When do we stop testing?

I now know that if I can get the answers to these ahead of time, our A/B test is going to be more useful.

By getting a statistical analysis done, A/B test feedback can go from “option B gave better results” to “we can be 95% confident that option B will give us a 10% increase in conversions”.

Or something to that effect!

Special thanks to Vincent for taking time to give his presentation last week, I learned a bunch - you can find Vincent on LinkedIn!

  • JE

Some notes on this week’s OWASP update


Important security testing news:

There’s a new OWASP top ten out this week!

The OWASP top ten are the ten most critical security risks to software applications as defined by the OWASP organization.

Disclaimer! I’m not a security tester or expert of any kind. But all testers need at least a little awareness of security. Learning about the OWASP top ten is a great entry point for testers into the world of security.

Anyway, here’s the new top ten:

  1. Injection

  2. Broken Authentication

  3. Sensitive Data Exposure

  4. XML External Entities (this one’s new!)

  5. Broken Access Control (this one’s kinda new!)

  6. Security Misconfiguration

  7. Cross-Site Scripting

  8. Insecure Deserialization (this one’s new too!)

  9. Using Components with Known Vulnerabilities

  10. Insufficient Logging & Monitoring (also new!)

Shiny new stuff!!

“Broken Access Control” isn’t actually new. It’s a combination of two items from the old list: “Insecure Direct Object References” and “Missing Function Level Access Control”.

The items that have dropped off the top ten are:

  • Cross-Site Request Forgery

  • Unvalidated Redirects and Forwards

Of course, that doesn’t mean they’re no longer security risks - they’re just not as critical as some of the others.

I wanted to do some research into these new items, because that’s important - here’s what I learnt!

XML External Entities:

This one’s a vulnerability when using XML files.

An XML file can contain a header section called a DTD (Document Type Definition).

This in turn can contain an ENTITY element. The ENTITY element serves as a placeholder for a special character or string you want to use later in the XML.

This ENTITY can even be a link to an external source.

Something like:

<!DOCTYPE doc [
    <!ENTITY name SYSTEM "http://example.com/entity.txt">
]>

This entity element can then be called upon later in the file, like so:

<doc>&name;</doc>
The vulnerability arises when an attacker uploads an XML file: they could put something sinister in this external ENTITY element.

For example, it could link to a file on the host’s machine - try “file:///etc/hosts” - which would gain the attacker access to the contents of that file.

The simplest way to prevent this vulnerability is to disable DTD external entities completely. This can be done at the code level.
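As one sketch of what “disable external entities at the code level” can look like, here’s Python’s standard-library xml.sax parser with external entity resolution switched off (my own illustration - your project’s XML stack will differ). The entity is simply skipped, so the file contents never leak into the parsed output:

```python
import io
import xml.sax
from xml.sax.handler import feature_external_ges

# An attacker-supplied document: the entity points at a local file.
EVIL_XML = """<!DOCTYPE doc [
  <!ENTITY name SYSTEM "file:///etc/hosts">
]>
<doc>&name;</doc>"""

class TextCollector(xml.sax.ContentHandler):
    """Gathers the character data the parser actually emits."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def characters(self, content):
        self.chunks.append(content)

def parse_without_external_entities(xml_text: str) -> str:
    parser = xml.sax.make_parser()
    # The mitigation: tell the parser never to resolve external entities.
    parser.setFeature(feature_external_ges, False)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return "".join(handler.chunks)

# The entity is skipped, so /etc/hosts is never read.
print(repr(parse_without_external_entities(EVIL_XML)))
```

(Recent Python versions disable external entities by default, but setting the feature explicitly documents the intent.)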

If that can’t be done, OWASP have a range of other ways to mitigate this risk.

What does this mean for testing?
It means being aware of this vulnerability if your software uses XML files.
Talk to your development team and check that steps have been taken to prevent it!

For a much more detailed explanation, check out this post by Ian Muscat.

Insecure Deserialization:

Insecure Deserialization is something I find a little more difficult to understand. My interpretation could be iffy, so take this with a grain of salt.

Serialisation is when an object is translated into a format that can be stored somewhere, and then ‘deserialised’ later. (JSON is an example of a serialised format.)

Some deserialisation mechanisms offer the ability to execute code. This is a problem! When the object being deserialised comes from an untrusted source, an attacker could send through malicious code.
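Here’s my own toy illustration of the point (not from the OWASP material), using Python’s pickle - a serialisation format that can run code on load. Whoever crafts the bytes gets to choose a callable that runs when the victim deserialises them:

```python
import pickle

class Malicious:
    """An object whose pickled bytes run attacker-chosen code when loaded."""
    def __reduce__(self):
        # pickle asks __reduce__ how to rebuild the object; an attacker
        # can answer "call this function" - imagine os.system here
        # instead of a harmless print.
        return (print, ("!!! attacker code ran during deserialisation !!!",))

payload = pickle.dumps(Malicious())

# The victim only calls pickle.loads - yet the attacker's callable runs.
pickle.loads(payload)
```

This is exactly why the pickle documentation warns you never to unpickle data from untrusted sources.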

The simplest way to prevent it is to disable deserialisation of objects from external parties.

If this is not possible, again, OWASP have a list of technical ways of mitigating the risk.

As far as testing goes, one of my team likened it to testing for Injection, which I thought was a good analogy. You might test for Injection by inserting ‘junky’ data into a field. Similarly, you could test for Insecure Deserialisation by adding ‘junky’ data to a file you’re importing into your system.

This set of slides by Apostolos Giannakidis for OWASP is useful if you’d like to dig further.

Insufficient Logging & Monitoring:

This one is pretty easy to understand (yay!).

Attackers take advantage of companies that aren’t monitoring security events.

Things like failed logins, validation failures and access control failures should all be logged in a clear, easily accessible format. Any suspicious activity should raise an alarm.
If this isn’t done, an attacker has much more time and freedom to do damage before an organisation even realises it’s under attack.
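As a sketch of the idea (the names and thresholds here are my own invention), even a plain standard-library logger plus a counter goes a long way towards “log every failure and raise an alarm on suspicious activity”:

```python
import logging
from collections import Counter

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("auth")

ALERT_THRESHOLD = 5        # hypothetical: alarm after 5 failures per user
failed_attempts = Counter()

def record_failed_login(username: str, source_ip: str) -> bool:
    """Log a failed login; return True if it looks like an attack."""
    failed_attempts[username] += 1
    log.warning("failed login user=%s ip=%s attempt=%d",
                username, source_ip, failed_attempts[username])
    if failed_attempts[username] >= ALERT_THRESHOLD:
        # In a real system this would page someone, not just log.
        log.error("ALERT: possible brute force against user=%s", username)
        return True
    return False
```

The important part isn’t the code, it’s the habit: every security-relevant failure leaves a clear record, and repeated failures cross a line that somebody gets told about.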

When testing, check that auditing and monitoring of your features are done in a sensible way. The cause of problems like the above should be easy to identify, and any bad behaviour should raise an alarm.

One suggestion is to go over your logs after a penetration test has been completed - can you tell what has happened? At what point did you get alerted that something was going on?

You can read a whole lot more on logging, again at OWASP.

So - there’s a brief overview of the new entries to the OWASP top ten. The exercise of tracking down what these things mean has been useful to me, so I hope the write-up has been of use to you!

Good luck with your security testing!

- JE

WeTest 2017

Sometime last month, James and I went to WeTest 2017 in Auckland. As James has been doing so much writing lately, I took a crack at writing up how I found the day!

Slides and photos from the event can be found online.

This year WeTest was bigger than ever, moving into the ANZ Viaduct Events Centre with ~250 people attending, including most of our QA team. It was my first experience of WeTest and the WeTest community, so I was excited and expecting a great day, and it didn’t disappoint. I made sure I was there a bit early and was stoked to bump into a couple of my old colleagues, Franklin & John.
The venue had a really nice setup with groups of ~8 around tables; everyone had a great view of the speakers and the two large screens. This year’s theme, ‘what now, what next’, brought together New Zealand and international star speakers to discuss where they are at and where the industry is heading.

Bonus points for spotting both the Super Testing Bros in the crowd.

Keynote - Dan Billing @thetestdoctor

How to be a Redshirt, and survive (or managing bias and negativity when security testing)
Dan Billing, over from the UK, kicked us off with a really insightful talk about life as a security tester. Themed around old-school Star Trek, Dan’s talk centred on the role of the security officer, or ‘red shirt’ - the first people to reach the surface of a new planet, putting their lives on the line and dying gruesome deaths to protect the crew. Without dying quite as often, Dan recounted some of the challenges of being responsible for a project’s security, and the mental obstacles that need to be overcome.

Key Takeaways
* Diverse teams - diverse teams may need longer to complete testing across a piece of code, but will return a broader range of issues, in turn improving your security.
* Modelling - use flow charts, mind maps and threat diagrams to model your application. (courtesy of @eviltester)
* Observation - observe how your application behaves when used by different groups, e.g. experienced, inexperienced and malicious users.
* Reflection - step back and reflect before deciding the best way forward.
* Interrogation - dive deep into selected, risky scenarios - what happens if?
* Manipulation - manipulate data and use scripts to try to manipulate the application.
* OWASP Juice Shop - a fantastic, fun and gamified application for learning how attacks occur, through both technical and social engineering means.

Ardian Silvandianto

Testing as a team: the sequel


Ardian shared his story of how he transformed his testing approach and that of his team. The two QAs were regularly suffering burnout dealing with the high in-tray of tickets from the eight developers. With the other QA going on leave for some time, Ardian used this as a chance to try something new. Through his presentation we saw the planning, application and development of his team’s new working method - evolving from a traditional sprint/scrum methodology and shifting left to have QA work much more closely with the developers, much like we do here at Pushpay.
It’s great to see the same ideas being successful elsewhere and Ardian is still very early in his career, I look forward to hearing more from him in the future.

Sunil Kumar @sunilkumar56

Mobile App Testing for Internet of Things and Wearables

I first met Sunil in Sydney last year at CASTx17. He wasn’t speaking then, but it was great to see him up on stage sharing his experiences exploring IoT and the different challenges in testing this vertical. With many objects not having interfaces, testing is very different and requires keen observation and many new techniques. Sunil shared experiences of past endeavours, the tools he used and the importance of security in this new domain. In an anecdote about a seemingly harmless though embarrassing leak, the voices of two million parents recorded by a talking bear were taken in an attack; stark consequences followed when it was discovered these recordings could be used to bypass voice-recognition security on bank accounts.

Daniel McClelland @DeeMickSee

Paint It Grey: Using debugging proxies to move beyond black box testing

Dan, technical lead at Trade Me, treated us to an engaging talk on the concepts and benefits of using debugging proxies such as Fiddler or Charles. This is something that I do on a day-to-day basis, and while personally there was little new for me, there was plenty for the aspiring tester to take on board. Dan demonstrated how simple and beneficial it is to add these tools to your testing repertoire.
Dan is also a talented musician and artist - I looked up some of his stuff and liked it, so I’ll give him a little plug for that too. You can find a review of his debut album online.


Emma Barnes

Management, ethics and inclusion in testing

Emma gave us an open and honest talk about a topic that is really relevant to any industry. I’m super proud to work in an industry that I feel is more inclusive and ethical than many others, but we should never rest on our laurels. Just because we’re comparatively good at this does not mean it’s solved, and talks like this help keep us aware and motivated to provide safe and inclusive work environments for all.
The big takeaway I got from this talk was to accept the inevitable: you/I will screw up; there will be times we get it wrong and hurt people. We know that’s not our intention, but how we apologise, recover and grow from these mistakes is key. As a tall white English male, I’m aware that I am experiencing the world from a privileged perspective, and very paranoid about hurting people. I’m not going to be any less self-aware about this, but I can make peace with myself over mistakes now I know how to apologise properly.
* Just say ‘I’m sorry’ and thank them for the feedback.
* Make a plan to change your actions and talk about it with the person involved.
* Change your actions. (This bit is super important, do not apologise just for the show.)


Kathryn Hempstalk

Machine Learning for Testers

As a data scientist, Kathryn spoke about the need for testing to extend over the work of her domain: adding quality to the data we analyse is critically important to making fully informed business decisions. When building a machine learning algorithm, data analysts use small data sets to build the rules. The integrity of the data behind these rules is vitally important, as it will shape the rules that are applied to the rest of the data.
Another fascinating example of the use of data science and machine learning was login services - analysing a user’s typing style to help determine if a user’s credentials have been compromised. They do this without having sample data from other people. As a tester I can see so many potential issues in this solution: what if the user has an injury and can’t type normally? How can you test other typing styles to tell if the attack recognition works?
It is always fascinating to hear from other business areas at conferences; it broadens our horizons and lets us peek into their world. I wonder if we as testers appear at conferences for other professions in the same way?

Lean Coffee and Games

In between lunch and afternoon tea we had a lean coffee session - for those who don’t know the format, it’s well worth looking up.
I was sitting at a leadership table; we covered a number of topics, but something that really stuck with me was that everyone wanted to empower and enable their teams to learn and develop. For myself, I feel privileged and fortunate to work in a job I enjoy. However, I was left frustrated by how some attitudes around the table seemed negative and archaic - in particular, the view that learning should be done outside of work hours, and that when there is time in the office it cannot be spent on project work. I feel quite strongly against this view, and I think it’s worth raising in a podcast, so look out for that!

Yvonne Tse @1yvonnetse & Rupert Burton @rupert_burton

Testing is a mindset, not a role - from testing to user research

Another great non-testing talk; this time we gained a valuable insight into the user research role, courtesy of Rupert and Yvonne. We’re lucky here to work closely with our UX team - it certainly hasn’t been the case in previous roles - and even so there were still a lot of new insights here. The talk really highlighted the distinct skills of testing and user research, and I was surprised to see how many we share. Another fascinating stat was around usability studies: at ~5 responses you tend to cover 85% of the issues that user research can find. I was really surprised by how small this number was - though not surprised that many companies refuse to trust this observation and go on spending money on more testing.
It was also great to see how they use AR/VR and other new technologies to power up the UX process.

Samantha Connelly @sammy_lee12

Using robots to test mobile apps

Sammy gave us a talk in two parts. First up we got to see Tappy McTapface in action - a 3D-printed bot using Arduino and the Johnny-Five JavaScript API. It was awesome to see Tappy strut its stuff and play us some music, and I can definitely see some useful applications for it in device testing, although I think it’s a couple of years away from the mainstream yet. I’d love to see a company really throw some resource into this tech and see the benefits it brings.
In the second half of her talk, Sammy showed how she organises the outstanding work or bugs on a project. It’s a concept I’ve used myself before, but with an extra twist I really liked. The concept relies on plotting out the impact vs likelihood of a particular bug happening to end users.
Where the tasks fall determines the priority - fixing the ones with the highest likelihood and severity first. At the bottom of the ladder, you might even end up removing functionality if it is so unlikely to be used.
The extra twist I really liked was the addition of colour spots to denote additional context, such as security or compliance issues, which might bump up the priority of the task.
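A toy sketch of that prioritisation scheme (the bug names, scores and tag bonus are all invented for illustration):

```python
# Each bug: (name, likelihood 1-5, impact 1-5, context tags)
bugs = [
    ("crash on login", 5, 5, set()),
    ("typo in footer", 4, 1, set()),
    ("weak password rule", 2, 3, {"security"}),
    ("misaligned icon", 1, 1, set()),
]

def priority(bug) -> int:
    """Likelihood x impact, bumped when a 'colour spot' tag applies."""
    name, likelihood, impact, tags = bug
    score = likelihood * impact
    if tags & {"security", "compliance"}:
        score += 10  # the colour-spot twist: extra context raises priority
    return score

# Highest-priority bugs first:
for bug in sorted(bugs, key=priority, reverse=True):
    print(f"{priority(bug):>2}  {bug[0]}")
```

Notice how the security tag pushes a modest likelihood-times-impact score above an otherwise higher-scoring cosmetic bug - which is exactly the behaviour the colour spots are for.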


Lock Note - Angie Jones @techgirl1908

Owning our Narrative

We ended the day with a fantastic, energetic talk from Angie Jones. She recounted the history of recorded music - how the invention of the gramophone and subsequent technologies affected how musicians entertained: shortening tracks, changing instruments and adapting to uncertain times. The role of the musician never changed, but the way they performed it did. As testers, our methodologies, our tools and the subject of our testing all change, so we must adapt and embrace change to succeed - and so must our tools and processes!
Our community, particularly in Auckland, has a reputation for bemoaning and fighting change; this talk was a fantastic eye-opener, encouraging us to welcome change and grow with it.
Key takeaways
* Don’t fight the change, you’ll just be playing catchup
* Adapt your tools and processes to change with you
* Embrace change

The conference was a fantastic day all round. I would have loved to speak to more people outside of those I already knew and swap more ideas - please comment below with your thoughts and experiences of the conference!

30 Days of Agile Testing! Day thirty.

Day 30:
What action does your team take on a red build?

I had to google the term ‘red build’ - but I couldn’t find anything relevant!
I guess it means automated tests have failed (i.e. they’re red, not green).

Every time we release, a bunch of automated tests run. It’s all driven by Slack.
If everything’s good, there’s a whole lot of green ticks for each set of tests, and some really happy green emoji!


Each letter in that block stands for a suite of tests. Some are obvious - UT = Unit Tests, AT = Acceptance tests - et cetera. I’ll let the others remain a mystery :)

If something fails, then you don’t get the green emoji - and you’ll see some red X’s too. Here’s an example:


In this case the ‘B’ suite of tests has failed (a visual regression). 

At this point, here’s what could happen:

A developer will chime in and say “that was me, I expected that to fail!”
Which is great - they can then go and update the test, or do whatever they need to do to make it green again.

If that doesn’t happen, one or more (most likely all) of the developers will look at the results of the particular test that failed.
At this point, it’s usually pretty clear which piece of work made the test in question fail. Once that’s been figured out, the developer can take corrective action.
Either ship a fix on top of what they’ve already done to correct the behaviour, or revert their changes and go back and fix it before trying again.

The alternative is that the test failed for some other reason - a resourcing issue, a timeout - these things happen sometimes. If that’s determined to be the case, the test can just be restarted.

What does the tester do while this is happening?
Well, from my own experience:

  • Sometimes I’m first on the scene - I can investigate the failure, alert the right people, or restart the tests as needed.
  • Help assess the risk - can we ship something quickly to correct this? Or is that too risky - should we pull out?
  • Analysis - what was the cause of this failure? Why didn’t it get picked up earlier in our pipeline?
  • Finally, testing - if we decide to fix it on the spot, I need to make sure it’s actually fixed, and that we didn’t undo the work we were shipping originally!

And that, I think, is what our team does on a red build.

I hope that’s been useful!

- JE

30 Days of Agile Testing! Day twenty-nine.

Day 29:
What columns do you have on your work tracker or kanban board?


Prompted by this day’s exercise, I looked at our Jira kanban board for the first time in months.
To answer the question, these are the columns on it:

  • To Do
  • Requirements & UX
  • In Progress
  • Ready To Test
  • Verifying
  • Done

But it says a lot, to be honest, that I’ve not looked at it for so long. In fact, I don't think anybody has - I notice our last two projects are completely missing from the board.

Simply: I don’t think you need a kanban board or scrum board to be agile, and having one certainly doesn’t make you agile.

I already know what our implementation plan is - even just a rough idea in my head is enough. I know what my team are working on currently, and I’m pretty sure they know what I’m working on.

The only reason we might need an up to date tracking board is if an external stakeholder wants some visual indication of what we’re doing - but - they seem happy with the feedback they’re getting.

So - 30 days of testing, my question in return is - do you need your work tracker or kanban board?

- JE