A senior Google employee told me that the search engine kept ahead of the competition via a process of rigorous prototype testing. At the time we spoke, prototypes were tested “offline” by measuring the reactions of hired test subjects to particular features and designs. But soon testing moved “online” and we all became the subjects of A/B tests.
What is A/B testing?
An A/B test is when a company gives a user access to one of two versions of a website or app:
A) the current version
B) the prototype.
The way users interact with the product is measured during testing. Subtle differences in these interactions can illustrate which version is more effective, according to particular criteria. If the prototype is proven superior, it replaces the existing version as the default product.
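The decision rule behind "proven superior" is usually a standard significance test on the measured interactions. As a minimal sketch, here is a two-proportion z-test comparing conversion rates for versions A and B; the test type, function names, and all numbers are illustrative assumptions, not details from any company's actual system:

```python
import math

def ab_test(conversions_a, visitors_a, conversions_b, visitors_b, alpha=0.05):
    """Two-proportion z-test: is B's conversion rate reliably higher than A's?"""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that A and B convert equally well
    p = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(p * (1 - p) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_value, p_value < alpha and p_b > p_a

# Hypothetical experiment: 5,000 users per arm, A converts 4.0%, B converts 5.0%
p_value, launch_b = ab_test(200, 5000, 250, 5000)
```

With these made-up figures the difference is statistically significant, so the prototype B would replace A as the default; with a smaller gap the same code would keep A.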
Google engineers ran their first A/B test in 2000 to determine the optimum number of search results that should be shown per page.
What A/B testing looks like.
Statistics decide, not managers
Websites and apps have become a constellation of comparisons that collectively evolve systems to an improved state. Every change to an interface or alteration to an algorithm is A/B tested.
Web companies run an astonishing number of tests. In a talk, Microsoft stated that its Bing search engine runs over 1,000 a month. Across the industry, the volume is so high that every time we use a website or app, we are likely unwitting subjects of an A/B test. We rarely notice, because the variations are often subtle.
Companies are able to run so many tests that they have moved to a process known as hill climbing: taking small steps, getting gradually better. This approach has been so successful that it drives the way many companies innovate today.
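The hill-climbing process described above can be sketched in a few lines: propose a small tweak, measure it, keep it only if the metric improves. The metric here is a made-up click-through-rate function, and all names and parameters are hypothetical:

```python
import random

def hill_climb(metric, start, step=0.1, iterations=200, seed=42):
    """Keep each small random tweak only when it improves the measured metric."""
    rng = random.Random(seed)
    current = start
    best = metric(current)
    for _ in range(iterations):
        candidate = current + rng.uniform(-step, step)  # a small tweak
        if metric(candidate) > best:
            # The tweak "triumphs": launch it as the new default
            current, best = candidate, metric(candidate)
        # Otherwise the tweak "tanks": drop it and keep the current version
    return current, best

# Hypothetical metric: click-through rate peaks when a layout parameter is 3.0
ctr = lambda x: 1 - (x - 3.0) ** 2
param, score = hill_climb(ctr, start=0.0)
```

Each accepted step moves the parameter a little closer to the peak, which is why many small, individually unremarkable wins can add up to substantial improvement over time.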
Teams are charged with the goal of improving the user metrics. If a small tweak tanks, it’s dropped. If it triumphs, it’s launched. The decisions are made by statistics, not managers.
Indeed, advocates of A/B testing stress the importance of ignoring the views of managers, which they call HiPPOs – Highest Paid Person’s Opinions. The acronym’s force is illustrated by tales such as that of Greg Linden, an early Amazon employee. Linden suggested that, just as supermarkets put magazines and snacks by the checkout queue, Amazon should adopt the same approach with its online shopping carts.
He recalls that a “senior vice president was dead set against” the idea, fearing it would discourage people from checking out.
Linden ignored the HiPPO and ran an A/B test. The results showed that Amazon would make more money and not lose customers, so Linden’s idea was launched. A/B tests have proved to be more accurate, faster and less biased than any HiPPO.
A/B testing can’t solve everything
The complicated part of A/B testing is figuring out how to measure users in a way that will yield the insights you need. Tests need to be carefully designed, and continually reviewed.
Do it wrong and you could end up with short-term success but long-term failure. A news site that promotes celebrity tidbits might get the immediate gratification of clicks, but lose loyal readers over time.
There are also limits to what A/B testing can observe. It relies on measuring user input: mouse clicks, typing, spoken commands, or taps on a mobile screen. Spotify recently posed the problem: if someone has a playlist playing in the background and isn’t interacting with their phone, how can Spotify measure whether that user is satisfied? No one currently has an answer.
Taking A/B testing offline
Despite these risks and limitations, A/B testing has spread throughout companies with an internet presence. And now it is being trialled in the physical world.
A couple of years ago, I met with a company that prints and sends utility bills to customers. They A/B tested different formats of the bill, learning which formats improved the rates of customers paying on time.
Restaurants and bars are reportedly using data from sensors to learn which restaurant layout encourages the most sales. For example, if an intimate seating arrangement in the back of a bar attracts people to stay longer, customers in that space are likely to spend more on drinks.
A/B testing could even extend to manufacturing. Slightly different versions of a product could be made on flexible production lines. Production could then be altered if one version of the product was found to sell better than another.
It’s not always a smooth ride, but the power of A/B testing is here to stay.
Mark Sanderson, Professor of Information Retrieval, RMIT University
This article is republished from The Conversation under a Creative Commons license. Read the original article.