I cannot tell a lie, I did it with my little hatchet.
One sunny day, as the story goes1, a 6-year-old George Washington ventured into his father’s garden with a brand new hatchet, feeling industrious (if you want to be generous) or mischievous (if you don’t).
Lo and behold, what did the boy spy in the center of the garden but a majestic cherry tree, standing tall and proud, the prize of his father’s orchard. Finding no other outlet for his boyish ambition (I’ll be generous), young George set about testing his mettle, and the blade of his hatchet, against the trunk of that cherry tree.
George eventually triumphed, and the tree was cut down.
(This wouldn’t be a very good story if his father didn’t later discover the tree and confront George.) “Son,” said George’s father, “tell me you did not have a hand in the demise of this noble cherry tree.”
In perhaps the most famous moment of Colonial-American folklore, young George Washington gallantly replied, “I cannot tell a lie, I did it with my little hatchet.”
I wonder what ever became of that hatchet.
Cherry picking (or if you prefer, reporting bias) has a deservedly negative connotation. When you accuse someone of cherry picking, you accuse them of intentionally, unscrupulously discarding information that does not support their argument. I’m still feeling generous, so I’m going to propose that almost all instances of cherry picking are unintentional and innocent: When you observe something interesting, you tell people about it. When you don’t, you don’t.
When it comes to statistics, “something interesting” means “an event with a 1-in-20 or smaller chance of happening by chance alone”. So flipping a coin six times and seeing the same side (heads or tails) every time, on your first try, is “something interesting”.
- one consecutive → 20:20
- two consecutive → 10:20
- three consecutive → 5:20
- four consecutive → 2.5:20
- five consecutive → 1.25:20
- six consecutive → 0.625:20
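If you’d rather check the arithmetic than take my word for it, here’s a small Python sketch (mine, not drawn from any particular source) that reproduces the table above: the first flip picks the side, and each of the remaining n - 1 flips has to match it, so the probability is (1/2)^(n-1), rescaled here to the “out of 20” odds used in the list.

```python
# Chance that n coin flips all land on the same side: the first flip picks the
# side, and each of the remaining n - 1 flips matches it with probability 1/2.
for n in range(1, 7):
    p = 0.5 ** (n - 1)
    print(f"{n} consecutive -> {p * 20:g}:20")
```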
If 20 people in your company each pull a coin from their pocket and flip it 6 times, 19 of them won’t have anything to talk about. That’s reporting bias (cherry picking), and in this case it’s completely innocent. Unfortunately, that last line in the list above, the 0.625:20 one, is still going to end up in your Power BI report, while the rest won’t. This innocent cherry picking leads to the same problems as malicious cherry picking. I’ve covered that topic before.
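And a rough simulation of that office full of coin flippers, under my own assumption that “something to talk about” means all six flips matching:

```python
import random

def has_story(flips=6):
    """True when all flips land on the same side, the only result worth reporting."""
    results = [random.choice("HT") for _ in range(flips)]
    return len(set(results)) == 1

# 20 colleagues, 6 flips each; on average only about 0.6 of them get a story.
reporters = sum(has_story() for _ in range(20))
print(f"{reporters} of 20 people have something interesting to report")
```

Most runs print 0 or 1, and only that 0 or 1 ever makes it into the report.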
This article is about a way to spot cherry picking (innocent or otherwise) on the net or in media, because I know a lot of extremely smart people who still believe that if the scientific method is followed, the result must be valid.
Confluence
We frequently hear how some “bellwether demographic”2 is moving out of major cities, shifting politically, dying younger, buying pets, retiring earlier, etc. Friends who repeat these reports often offer to show us “the data”, and if we examine the data they offer, we’ll see the same result they did. You’ve already guessed the problem: we’re only seeing the reported data. To see the data, we’d have to see all the data no one bothered to report.
Confluence is an intersection of vectors, and the larger the confluence (the more vectors), the greater the difference between the reported data and the data. If you want a rule of thumb, be suspicious of any group name with a comma.
Let’s see an example.
Next week, you might hear that “Asians between 18 and 24 with a household income between $41,776 and $89,075 are buying blue cars in record numbers.” I didn’t use commas to spell that out, but you can see where they’d go. The group in this example has three vectors:
- Ethnic group: Asian
- Age bracket: 18 to 24
- Income bracket: $41,776 to $89,075
The small problem with confluence
The last job application I read offered 7 choices for ethnic group:
- Hispanic or Latino
- American Indian/Alaskan Native
- Asian
- Black (Not of Hispanic Origin)
- Native Hawaiian/Other Pacific Islander
- White (not of Hispanic Origin)
- Two or More Races
I did a quick Internet search and found a survey-supply source offering 7 typical age brackets:
- Under 18
- 18 to 24
- 25 to 34
- 35 to 44
- 45 to 54
- 55 to 64
- 65 and above
There are 7 income brackets in the US tax code:
- $0 to $10,275
- $10,276 to $41,775
- $41,776 to $89,075
- $89,076 to $170,050
- $170,051 to $215,950
- $215,951 to $539,900
- $539,901 or more
That’s 7 x 7 x 7 = 343 chances at a 1:20 result. If you’re keeping score, that’s a greater than 99.999% chance to find “something interesting”.
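That figure isn’t hand-waving; it’s the complement rule. If each of the 343 intersections independently has a 1-in-20 chance of looking interesting, the chance that none of them does is 0.95^343. A quick sketch, assuming the cells are independent:

```python
cells = 7 * 7 * 7            # ethnicity x age x income intersections
p_nothing = 0.95 ** cells    # chance that every single cell looks boring
p_something = 1 - p_nothing  # chance that at least one cell looks "interesting"
print(f"{cells} cells -> {p_something:.7%} chance of something interesting")
# prints roughly: 343 cells -> 99.9999977% chance of something interesting
```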
The big problem with confluence
I reached for the three most convenient bracketings I could find and by coincidence each had seven brackets. As above, that led to 343 potential intersections. That’s 343 examinations of blue-car-buying habits that might have been performed. We have no way of knowing, so we have no way of knowing whether the reported trend is significant or just a result of the near-certain chance to find “something interesting” in 343 tries.
How about red cars?
Blue-car buying is yet another vector. The cars could have been red or green or silver or white or blue or black or some other color. That’s another exponential increase: 7 x 7 x 7 x 7 = 2401 chances at a 1:20 result. Still keeping score? That’s so close to a 100% chance of “something interesting” that the difference cannot be expressed with 64-bit floating point precision.
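You can verify the floating-point claim yourself. The chance that none of the 2,401 cells looks interesting is 0.95^2401, roughly 3e-54, which is a perfectly representable double, yet subtracting it from 1.0 changes nothing at 64-bit precision:

```python
p_nothing = 0.95 ** (7 ** 4)   # chance that no cell looks interesting, roughly 3e-54
print(p_nothing)               # tiny, but a perfectly valid 64-bit float
print(1 - p_nothing == 1.0)    # True: the gap below 100% vanishes at this precision
```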
Each vector can also be stretched
The age brackets above seem fair, don’t they? They don’t feel manipulated or “gamed”. If I had used these instead, would you have been suspicious?:
- 0 to 10
- 10 to 20
- 20 to 30
- 30 to 40
- 40 to 50
- 50 to 60
- 60 to 70
- 70 to 80
- 80 to 90
- 90 and above
How about these?:
- The Silent Generation: Born 1928-1945
- Baby Boomers: Born 1946-1964
- Gen X: Born 1965-1980
- Millennials: Born 1981-1996
- Gen Z: Born 1997-2012
- Gen Alpha: Born 2013-2025
For all we know, the same experiment was performed on all of those 23 age brackets. When only one result is published, the fact that experiments overlap is hidden.
That’s still not the worst of it
There are also hidden vectors. In addition to
ethnic group | age bracket | income bracket | car color
other experimenters might have examined
number of children | country of residence | astrological sign | profession
and additional experiments might have been done with
weight at birth | eye color | last digit of zipcode | level of education
or any of myriad combinations or additions. “Something interesting” is always taking place, and not even replication keeps us safe. The chance of 2 consecutive interesting events is 1:400. Sounds small, but 7 x 7 x 7 x 7 chances at 1:400 still adds up to better than a 99.7% chance of “something interesting” somewhere.
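The same complement-rule sketch as before, if you want to check that figure; the per-cell chance of two interesting results in a row is (1/20)^2 = 1/400:

```python
cells = 7 ** 4                      # the same 2,401 intersections as above
p_repeat = (1 / 20) ** 2            # one cell coming up "interesting" twice in a row
p_somewhere = 1 - (1 - p_repeat) ** cells
print(f"{p_somewhere:.2%}")         # about 99.75%: replication barely slows this down
```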
Is this really a problem?
Sure, conceivably, someone could run millions of experiments, report only interesting findings, and accomplish nothing but a close look at random chance. But is that even possible, much less likely?
Traditionally, no. But now we have big data, meta-analyses, and “Support Vector Machines”. Experiments can be performed at hundreds per second, and an entire industry has been built around selling findings. Don’t dismiss everything you read, but common sense may be more important now than at any time in history.
Statistics work, but it’s reasonable to dismiss all confluences that don’t stand up to common sense or multiple replications.
1. Weems. 1806. The Life of Washington, 5th edition. ↩
2. The term “bellwether” originally comes from the practice of placing a bell on a wether (a castrated ram) within a flock. The bellwether would lead the flock, and the behavior of this leading sheep was believed to reflect the behavior of the entire flock. ↩