
3 A/B Testing Tips From a Wharton Professor

A marketing professor at University of Pennsylvania’s Wharton School dug into data from 2,500+ Optimizely A/B tests for a recent paper — and shared some best practices with us.

This is an excerpt from MarketerHire's weekly newsletter, Raisin Bread. To get a tasty marketing snack in your inbox every week, subscribe here.

Have you ever gotten an A/B test result that looks statistically significant, but doesn't actually boost performance IRL? 

That’s a false discovery, and it’s a pretty common occurrence. 

Up to 25% of test results significant at the 5% level are false discoveries, according to a new paper co-authored by two marketing professors at University of Pennsylvania’s Wharton School, Ron Berman and Christophe Van den Bulte. 

Berman came away from the research — based on about 5,000 effects from 2,700 experiments run on Optimizely’s platform in 2014 — with some testing tips for growth marketers. 

He broke three of them down for us below. 

Run two-stage A/B tests. 

The best way to avoid false discoveries, according to Berman, is a two-stage A/B test.

Let’s say you’re testing four options on 10,000 people. Instead of showing each version to a quarter of your sample, do this: 

  • Step 1: Show each of the four options to 2,000 people. Pick the top performer. 
  • Step 2: Show the winner to the final 2,000 people, and compare its performance to whatever’s live on your site. This second stage acts as “a defense against the false discovery,” Berman said.  

The more (unique!) options you try in Step 1, Berman and his co-author found, the more robust your test. 

“The smaller sample per variation makes the statistics worse, but picking the best out of [more options] makes it better.”
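
Want to see what that looks like in practice? Here’s a minimal Python sketch of the two-stage setup Berman describes (the conversion rates, the way the live page is sampled in Stage 2, and the significance test are illustrative assumptions on our part, not details from the paper):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)

# Illustrative conversion rates: made-up numbers, not figures from Berman's
# research. live_rate is whatever's currently on your site; the four
# challenger rates are the options you're testing.
live_rate = 0.100
challenger_rates = [0.101, 0.115, 0.102, 0.099]

# Stage 1: show each of the four options to 2,000 people, pick the top performer.
n_stage1 = 2_000
stage1_conversions = [rng.binomial(n_stage1, p) for p in challenger_rates]
winner = int(np.argmax(stage1_conversions))

# Stage 2: show the winner to the final 2,000 people and compare it to the
# live page (modeled here as 2,000 regular visitors over the same window).
n_stage2 = 2_000
winner_conversions = rng.binomial(n_stage2, challenger_rates[winner])
live_conversions = rng.binomial(n_stage2, live_rate)

# One-sided two-proportion z-test: is the Stage 1 winner really better than live?
pooled = (winner_conversions + live_conversions) / (2 * n_stage2)
std_err = np.sqrt(pooled * (1 - pooled) * (2 / n_stage2))
z = ((winner_conversions - live_conversions) / n_stage2) / std_err
p_value = norm.sf(z)

print(f"Stage 1 winner: option {winner + 1}")
print(f"Stage 2 lift vs. live: {(winner_conversions - live_conversions) / n_stage2:.2%}")
print(f"Stage 2 one-sided p-value: {p_value:.3f}")
# Ship the winner only if Stage 2 also clears your significance bar.
```

If the Stage 2 comparison doesn’t hold up, the Stage 1 “win” was probably noise, which is exactly the false discovery this design is meant to catch.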

Worry more about small wins than big flops. 

“If you see very, very, very small improvements,” Berman said, “it's potentially just because you're not courageous enough in the experiments you're doing.”

The distribution of the Optimizely data suggested that the vast majority of A/B test results — 70%! — are true nulls. (In other words, A and B don’t perform differently.)

Berman suspects that’s because “people are too afraid” of big flops, so they test tiny tweaks and cut themselves off from big wins. 
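
As a side note, that 70% true-null figure lines up with the “up to 25% false discoveries” estimate from earlier under some rough back-of-the-envelope math. A quick sketch, assuming (purely for illustration) that real effects get detected about 35% of the time, a power number that isn’t from the paper:

```python
# Rough back-of-the-envelope math, not a calculation from the paper. The 35%
# "power" figure is an illustrative assumption; the other two numbers come
# from the article above.
share_true_nulls = 0.70   # ~70% of tests have no real effect
alpha = 0.05              # the 5% significance level
assumed_power = 0.35      # assumed chance a real effect gets detected

false_positives = share_true_nulls * alpha               # nulls that look significant
true_positives = (1 - share_true_nulls) * assumed_power  # real effects that look significant

false_discovery_rate = false_positives / (false_positives + true_positives)
print(f"Implied share of significant results that are false: {false_discovery_rate:.0%}")  # ~25%
```

The more of your tests that chase real, sizable effects, the smaller that false share gets.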

Keep trying. 

Companies with more A/B testing experience see a lower rate of false discoveries, Berman and his co-author found. 

It’s hard to know why this is — are they testing differently? Hiring differently? — but it means that “experience helps.”

Our takeaway? 

When you’re A/B testing, you can’t avoid false discoveries entirely. But you can minimize the damage they do to performance with two-part tests, big creative swings and persistence. 

Mae Rice
Mae Rice is editor in chief at MarketerHire. A long-time content marketer, she loves learning about the weird and wonderful feedback loops that connect marketing and culture.