
Data Deceptions: Simpson’s Paradox—When the Numbers Trick You

Feb 12

2 min read


"Oh, people can come up with statistics to prove anything, Kent. Forty percent of all people know that." – Homer Simpson





Small business owners make decisions based on data every day, whether it’s analyzing marketing results, tracking employee performance, or setting prices. But sometimes, data can be deceptive. Simpson’s Paradox is one of the most surprising statistical illusions: a trend that appears in every group can weaken, vanish, or even reverse when those groups are combined.


What is Simpson’s Paradox?

Imagine you own a coffee shop chain and want to test two different promotional strategies:

  • Loyalty Card Program – Customers get a free coffee after 10 purchases.

  • Discount Days – Customers get 25% off every Monday and Tuesday.

You run the test at two locations, one in a busy downtown area and the other in a quiet suburban neighborhood, and track how many of the customers offered each promotion go on to convert.


Here’s the data:

Location    Loyalty Card    Customers    Conversion Rate    Discount Days    Customers    Conversion Rate
Downtown    20/100          100 total    20.0%              180/600          600 total    30.0%
Suburbs     160/400         400 total    40.0%              90/200           200 total    45.0%

  • In the Downtown location, Discount Days performed better (30% vs. 20%).

  • In the Suburbs, Discount Days also performed better (45% vs. 40%).

Looking at both locations individually, Discount Days is the better promotion.


But when we combine the data:

Promotion        Total Conversions    Total Customers    Overall Conversion Rate
Loyalty Card     180                  500                36.0%
Discount Days    270                  800                33.75%

Suddenly, Loyalty Card (36.0%) looks better than Discount Days (33.75%), even though Discount Days won in both locations individually.
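If you want to check this kind of arithmetic yourself, here is a minimal Python sketch. The counts are illustrative assumptions, not real sales data: the per-location rates are 20%, 30%, 40%, and 45%, with Loyalty Card offered to 100 customers Downtown and 400 in the Suburbs, and Discount Days offered to 600 and 200. The function names are made up for this example.

```python
# Illustrative counts (assumed for this sketch): (converted, offered)
# for each promotion at each location.
counts = {
    "Loyalty Card": {"Downtown": (20, 100), "Suburbs": (160, 400)},
    "Discount Days": {"Downtown": (180, 600), "Suburbs": (90, 200)},
}


def rate(converted, offered):
    """Conversion rate as a fraction between 0 and 1."""
    return converted / offered


def pooled_rate(by_location):
    """Combine locations: total converted divided by total offered."""
    converted = sum(c for c, _ in by_location.values())
    offered = sum(n for _, n in by_location.values())
    return converted / offered


for promo, by_location in counts.items():
    for location, (converted, offered) in by_location.items():
        print(f"{promo:13} {location:8} {rate(converted, offered):.2%}")
    print(f"{promo:13} {'Combined':8} {pooled_rate(by_location):.2%}")
```

With these assumed counts, Discount Days has the higher rate on both location rows, yet the Combined row is higher for Loyalty Card.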


Why Does This Happen?

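This paradox occurs when the groups are unevenly sized across the options being compared.

  • Each promotion’s combined rate is a weighted average of its location rates, weighted by how many of its customers came from each location. Suburban customers converted at a higher rate than Downtown customers under either promotion, so a promotion whose customers come mostly from the Suburbs can post a higher combined rate even if it loses at both locations.

  • The imbalance in sample sizes, not the promotions themselves, creates the misleading overall trend.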
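The weighting effect can be sketched in a few lines of Python. The rates and customer counts below are assumptions chosen for illustration (Loyalty Card seen mostly in the Suburbs, Discount Days mostly Downtown), and `weighted_rate` is a hypothetical helper name. The sketch first weights each promotion by its own customer mix, then by one shared 50/50 mix.

```python
# Assumed per-location conversion rates (illustrative only).
rates = {
    "Loyalty Card": {"Downtown": 0.20, "Suburbs": 0.40},
    "Discount Days": {"Downtown": 0.30, "Suburbs": 0.45},
}

# Assumed number of customers offered each promotion at each location.
offered = {
    "Loyalty Card": {"Downtown": 100, "Suburbs": 400},
    "Discount Days": {"Downtown": 600, "Suburbs": 200},
}


def weighted_rate(promo_rates, weights):
    """Average the location rates, weighted by customers per location."""
    total = sum(weights.values())
    return sum(promo_rates[loc] * weights[loc] / total for loc in promo_rates)


# Each promotion weighted by its own customer mix: the combined rates.
own_mix = {p: weighted_rate(rates[p], offered[p]) for p in rates}

# Both promotions weighted by the same 50/50 mix, so location mix
# no longer differs between them.
shared = {"Downtown": 1, "Suburbs": 1}
same_mix = {p: weighted_rate(rates[p], shared) for p in rates}

for promo in rates:
    print(f"{promo:13} own mix {own_mix[promo]:.2%}  same mix {same_mix[promo]:.2%}")
```

Under these assumptions, Loyalty Card leads when each promotion keeps its own mix, and Discount Days leads once both share the same mix. Only the weights changed between the two comparisons, not the rates.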


How to Avoid This Trap?

  1. Ensure Proper Randomization – External factors (like location differences, time of year, or customer demographics) can skew test results. The best way to minimize these effects is to randomly assign customers to different promotions rather than testing in separate locations.

  2. Always Segment Your Data – Never trust only the overall numbers—analyze different groups separately.

  3. Identify Hidden Variables – Look for external factors (location, pricing, demographics) that may be skewing results.

  4. Test with Different Groupings – If a trend reverses depending on how data is grouped, you may be seeing Simpson’s Paradox in action.
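As a rough sketch of steps 2 and 4, the Python helper below compares the winner in each segment with the winner after pooling and flags a reversal. The name `check_reversal`, the data layout, and the example counts are all assumptions made for this illustration. It compares raw rates only and says nothing about statistical significance.

```python
def pooled(by_segment):
    """Total converted over total offered across the given segments."""
    converted = sum(c for c, _ in by_segment.values())
    offered = sum(n for _, n in by_segment.values())
    return converted / offered


def better(rate_a, rate_b, name_a, name_b):
    """Name of the option with the higher rate, or 'tie'."""
    if rate_a == rate_b:
        return "tie"
    return name_a if rate_a > rate_b else name_b


def check_reversal(counts, name_a, name_b):
    """Return (segment_winners, pooled_winner, flipped).

    counts maps option -> segment -> (converted, offered).
    flipped is True when one option wins every shared segment
    but the other option wins after pooling those segments.
    """
    shared = [seg for seg in counts[name_a] if seg in counts[name_b]]
    a = {seg: counts[name_a][seg] for seg in shared}
    b = {seg: counts[name_b][seg] for seg in shared}

    segment_winners = {
        seg: better(a[seg][0] / a[seg][1], b[seg][0] / b[seg][1], name_a, name_b)
        for seg in shared
    }
    pooled_winner = better(pooled(a), pooled(b), name_a, name_b)

    winners = set(segment_winners.values())
    flipped = False
    if len(winners) == 1:
        (segment_winner,) = winners
        flipped = segment_winner != "tie" and pooled_winner not in (segment_winner, "tie")
    return segment_winners, pooled_winner, flipped


# Illustrative counts (assumed): (converted, offered) per location.
example = {
    "Loyalty Card": {"Downtown": (20, 100), "Suburbs": (160, 400)},
    "Discount Days": {"Downtown": (180, 600), "Suburbs": (90, 200)},
}
print(check_reversal(example, "Loyalty Card", "Discount Days"))
```

You can rerun the same check with other groupings (weekday vs. weekend, new vs. returning customers) by changing the segment keys.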


Numbers don’t lie, but they can mislead. The next time the data seems too good (or bad) to be true, dig deeper—you might be seeing Simpson’s Paradox at work.
