
Data Deceptions: Simpson’s Paradox—When the Numbers Trick You
Feb 12
"Oh, people can come up with statistics to prove anything, Kent. Forty percent of all people know that." – Homer Simpson

Small business owners make decisions based on data every day, whether it’s analyzing marketing results, tracking employee performance, or setting prices. But data can be deceptive. Simpson’s Paradox is one of the most surprising statistical illusions: a trend that holds within each of several groups reverses when those groups are combined.
What is Simpson’s Paradox?
Imagine you own a coffee shop chain and want to test two different promotional strategies:
Loyalty Card Program – Customers get a free coffee after 10 purchases.
Discount Days – Customers get 25% off every Monday and Tuesday.
You run the test at two locations, one in a busy downtown area and the other in a quiet suburban neighborhood, and track how many of the customers in each promotion convert.
Here’s the data:
Location | Loyalty Card (converted/participants) | Loyalty Card Rate | Discount Days (converted/participants) | Discount Days Rate |
Downtown | 20/100 | 20.0% | 180/600 | 30.0% |
Suburbs | 160/400 | 40.0% | 90/200 | 45.0% |
In the Downtown location, Discount Days performed better (30% vs. 20%).
In the Suburbs, Discount Days also performed better (45% vs. 40%).
Looking at both locations individually, Discount Days is the better promotion.
But when we combine the data:
Promotion | Customers Converted | Total Participants | Overall Conversion Rate |
Loyalty Card | 180 | 500 | 36.0% |
Discount Days | 270 | 800 | 33.75% |
Suddenly, Loyalty Card (36%) looks better than Discount Days (33.75%), even though Discount Days won in both locations individually.
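Checking for a reversal like this is easy to automate. Below is a minimal Python sketch; the function names (`rate`, `pooled_rate`, `segment_winners`, `pooled_winner`, `is_reversal`) are hypothetical, and the counts are illustrative, shaped so that Loyalty Card is offered mostly in the Suburbs and Discount Days mostly Downtown. Replace them with your own numbers.

```python
from __future__ import annotations

# Illustrative counts: (converted, participants) for each promotion and location.
# Hypothetical data; replace with your own results.
results = {
    "Loyalty Card": {"Downtown": (20, 100), "Suburbs": (160, 400)},
    "Discount Days": {"Downtown": (180, 600), "Suburbs": (90, 200)},
}


def rate(converted: int, participants: int) -> float:
    """Conversion rate as a fraction between 0 and 1."""
    return converted / participants


def pooled_rate(by_location: dict[str, tuple[int, int]]) -> float:
    """Combine locations by summing raw counts first, then dividing."""
    converted = sum(c for c, _ in by_location.values())
    participants = sum(n for _, n in by_location.values())
    return converted / participants


def segment_winners(data: dict) -> dict[str, str]:
    """Best-performing promotion at each location."""
    locations = next(iter(data.values())).keys()
    return {
        loc: max(data, key=lambda promo: rate(*data[promo][loc]))
        for loc in locations
    }


def pooled_winner(data: dict) -> str:
    """Best-performing promotion once all locations are combined."""
    return max(data, key=lambda promo: pooled_rate(data[promo]))


def is_reversal(data: dict) -> bool:
    """True if one promotion wins every location but loses after pooling."""
    winners = set(segment_winners(data).values())
    return len(winners) == 1 and pooled_winner(data) not in winners


if __name__ == "__main__":
    for promo, by_location in results.items():
        for loc, (c, n) in by_location.items():
            print(f"{promo:14} {loc:9} {rate(c, n):.2%}")
        print(f"{promo:14} {'Combined':9} {pooled_rate(by_location):.2%}")
    print("Winner by location:", segment_winners(results))
    print("Winner when pooled:", pooled_winner(results))
    print("Simpson's Paradox?", is_reversal(results))
```

With these counts, the sketch picks Discount Days at both locations but Loyalty Card once the counts are pooled, so `is_reversal` returns True. Pooling sums the raw counts before dividing, matching how the combined table is built.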
Why Does This Happen?
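This paradox occurs when the groups being compared are spread unevenly across a hidden factor, here the location.
Each promotion’s overall rate is a weighted average of its Downtown and Suburban rates, weighted by how many of its participants came from each location. Downtown customers converted less often than Suburban customers under both promotions, so the promotion that draws a larger share of its participants from Downtown has its overall rate pulled down. If that imbalance is large enough, a promotion can win at every location and still lose overall.
The uneven mix of customers behind each promotion, not the promotions themselves, creates the misleading overall trend.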
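The weighted-average point can be written out explicitly. For a single promotion, let n_D and n_S be its number of participants Downtown and in the Suburbs, and p_D and p_S its conversion rates there (notation introduced here for illustration). The last line works through illustrative numbers: 100 Downtown participants converting at 20% and 400 Suburban participants converting at 40%.

```latex
\[
p_{\text{combined}}
  = \frac{p_D\, n_D + p_S\, n_S}{n_D + n_S}
  = \underbrace{\frac{n_D}{n_D + n_S}}_{w_D}\, p_D
  + \underbrace{\frac{n_S}{n_D + n_S}}_{w_S}\, p_S,
\qquad w_D + w_S = 1
\]
\[
w_D = \tfrac{100}{500} = 0.2, \quad w_S = \tfrac{400}{500} = 0.8, \quad
p_{\text{combined}} = 0.2 \times 20\% + 0.8 \times 40\% = 4\% + 32\% = 36\%
\]
```

Because the weights w_D and w_S can differ from one promotion to the next, comparing combined rates mixes together how well each promotion works and where it happened to be tested.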
How Can You Avoid This Trap?
Ensure Proper Randomization – External factors (like location differences, time of year, or customer demographics) can skew test results. The best way to minimize these effects is to randomly assign customers to promotions, ideally within each location, so that every promotion is tested on a similar mix of customers instead of running mostly in one place.
Always Segment Your Data – Never trust only the overall numbers—analyze different groups separately.
Identify Hidden Variables – Look for external factors (location, pricing, demographics) that may be skewing results.
Test with Different Groupings – If a trend reverses depending on how data is grouped, you may be seeing Simpson’s Paradox in action.
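Randomizing within each location (often called stratified or blocked randomization) is one way to make sure every promotion sees the same mix of Downtown and Suburban customers. The sketch below assumes you know in advance which location each customer belongs to and can choose which promotion each one is offered; the input format and the function name `assign_within_locations` are hypothetical.

```python
import random
from collections import Counter

PROMOTIONS = ("Loyalty Card", "Discount Days")


def assign_within_locations(customers, promotions=PROMOTIONS, seed=None):
    """Randomly assign promotions so each location is split evenly.

    customers: iterable of (customer_id, location) pairs (hypothetical format).
    Returns a dict mapping customer_id -> promotion.
    """
    rng = random.Random(seed)

    # Group customers by location (the stratum).
    by_location = {}
    for customer_id, location in customers:
        by_location.setdefault(location, []).append(customer_id)

    assignment = {}
    for ids in by_location.values():
        rng.shuffle(ids)  # random order within this location
        # Deal the shuffled customers out in turn, so each promotion gets
        # an equal share (within one customer) of this location.
        for i, customer_id in enumerate(ids):
            assignment[customer_id] = promotions[i % len(promotions)]
    return assignment


if __name__ == "__main__":
    # Hypothetical customer list: 700 Downtown, 300 Suburbs.
    customers = [(f"D{i}", "Downtown") for i in range(700)]
    customers += [(f"S{i}", "Suburbs") for i in range(300)]

    plan = assign_within_locations(customers, seed=42)
    location_of = dict(customers)
    counts = Counter((location_of[cid], promo) for cid, promo in plan.items())
    for (location, promo), n in sorted(counts.items()):
        print(f"{location:9} {promo:14} {n}")
```

Because customers are shuffled and then dealt out in turn, each promotion receives half of every location's customers (350 Downtown and 150 Suburban each in this example). With the same location mix behind both promotions, location can no longer tilt the combined numbers toward one of them.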
Numbers don’t lie, but they can mislead. The next time the data seems too good (or bad) to be true, dig deeper—you might be seeing Simpson’s Paradox at work.