I recently met a guy at a party who used to work for MI5: not a Tom Cruise film, though similar, but the British internal security service. Spies! How exciting!
Now, whilst I’m not seriously suggesting the US presidential elections might be rigged, the fact that somebody thinks so gives me an excuse for enthusing about a simple but powerful piece of maths that can help spot faked election results.
As part of our holistic innovation approach, we often do some upfront scientific analysis, including mathematical analysis, to understand the problem. This deep scientific understanding often reveals powerful and unexpected insights. The maths of election results described here is no exception.
Please don’t be put off by the mention of maths; I’ll keep it real! What I’ll describe is surprising but simple. It tells us how likely specific real-world outcomes (like vote counts) are, says something about what makes things popular on the internet, explains the 80:20 principle used in business, and more. And amazingly, it all arose from noticing which pages in a book got dirty – more about that later…
But first, a quick quiz: if you listed all the numbers that appear in today’s copy of the Wall Street Journal, what fraction of them do you think would begin with the digit “1”? And with a “9”?
I suspect you’d tell me 1/9 for both, right? Surprisingly, you’d be wrong. About 30% of them will begin with a 1, and fewer than 5% with a 9! What??? Go check, if you like!
This is called Benford’s Law. It predicts that in sets of numbers representing real things (prices, lengths, populations, distances, electoral counts), the leading digit is 1 more often than any other (e.g. 19, 176, 1256), with 2, 3, 4, etc. becoming progressively less “popular”. It sounds crazy at first (it did to me too). By contrast, if you roll a die, every number is equally likely to come up (each with probability 1/6). The key thing about numbers representing real things is that they aren’t random like a dice roll; they “get there” from smaller numbers. Populations grow, prices increase, votes trickle in, etc.
Here’s a simplified way to make sense of this: if I start from one unit of something and grow it by 10% each time, it takes about 7 “grows” to get to 2 units, but only about 17 more to get all the way to 10 units. So roughly 30% of the journey from 1 to 10 is spent in the 1s. Growing things “spend more of their life” in the 1s than anywhere else, then less in the 2s, and so on through the 9s.
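For the curious, here’s a minimal Python sketch of that argument (the function names are my own, purely for illustration). `steps_to_reach` counts the 10% “grows” needed to get from 1 unit to a target, and `growth_digit_shares` follows a steadily growing quantity for many steps, tallies its leading digit, and compares the result with the usual statement of Benford’s Law, P(first digit = d) = log10(1 + 1/d). It tracks the logarithm of the value rather than the value itself, so the numbers never overflow.

```python
import math
from collections import Counter


def steps_to_reach(target: float, rate: float = 0.10) -> float:
    """How many growth steps (each multiplying by 1 + rate) take 1 unit to `target`."""
    return math.log(target) / math.log(1 + rate)


def benford(d: int) -> float:
    """Benford's Law: probability that the first significant digit is d."""
    return math.log10(1 + 1 / d)


def growth_digit_shares(rate: float = 0.10, steps: int = 10_000) -> dict:
    """Fraction of steps a steadily growing quantity spends at each leading digit.

    Tracks log10 of the value instead of the value itself, so thousands of
    steps cannot overflow a float.
    """
    step = math.log10(1 + rate)
    counts = Counter()
    for n in range(steps):
        frac = (n * step) % 1.0          # fractional part of log10(value after n steps)
        digit = min(int(10 ** frac), 9)  # leading digit; clamp guards against rounding up to 10
        counts[digit] += 1
    return {d: counts[d] / steps for d in range(1, 10)}


if __name__ == "__main__":
    to_two, to_ten = steps_to_reach(2), steps_to_reach(10)
    print(f"steps from 1 to 2: {to_two:.2f}, from 1 to 10: {to_ten:.2f}")
    print(f"share of the journey spent in the 1s: {to_two / to_ten:.3f}")
    shares = growth_digit_shares()
    for d in range(1, 10):
        print(f"{d}: simulated {shares[d]:.3f}, Benford {benford(d):.3f}")
```

It takes about 7.27 steps to reach 2 and about 24.16 to reach 10, so the 1s take up log10(2) ≈ 30% of the journey whatever growth rate you pick, and the simulated shares for each digit land close to Benford’s figures.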
And here’s some more mind-warp: it doesn’t matter what units you measure the things in (inches, kilometres, microns); Benford’s Law still applies! Systems of numbers like these are called “scale invariant”. For purists, I should mention that Benford’s Law only works well for quantities that span several orders of magnitude (e.g. from tens to millions), so it won’t apply to the ages of people in your office, unless, unusually, you have lots of teenagers!
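To see scale invariance (and the purists’ caveat) in action, here’s another small sketch. The “lengths” are made-up data, spread evenly on a logarithmic scale across six orders of magnitude, which is the textbook Benford case, so this only illustrates that a change of units preserves the pattern rather than proving it for real data. The “office ages” are also invented, ranging from 20 to 65.

```python
import math
import random
from collections import Counter


def benford(d: int) -> float:
    """Benford's Law: probability that the first significant digit is d."""
    return math.log10(1 + 1 / d)


def leading_digit(x: float) -> int:
    """First significant digit of a positive number (0.0372 -> 3, 1256 -> 1)."""
    # Scientific notation always starts with the first significant digit.
    return int(f"{x:.14e}"[0])


def digit_shares(values) -> dict:
    """Fraction of the positive values whose first significant digit is 1..9."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Made-up lengths in inches, spread evenly on a log scale over six orders of magnitude.
lengths_in = [10 ** rng.uniform(0, 6) for _ in range(100_000)]
lengths_cm = [x * 2.54 for x in lengths_in]  # the same lengths in a different unit

# Made-up office ages: a narrow range, so Benford should not apply.
ages = [rng.randint(20, 65) for _ in range(1_000)]

print("Benford", " ".join(f"{d}:{benford(d):.2f}" for d in range(1, 10)))
for name, data in [("inches", lengths_in), ("cm", lengths_cm), ("ages", ages)]:
    shares = digit_shares(data)
    print(name, " ".join(f"{d}:{shares[d]:.2f}" for d in range(1, 10)))
```

The inch and centimetre rows both sit close to the Benford row, while the ages never begin with a 1 (or a 9) at all.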
All very strange, but so what? Well, people faking election counts generally choose what they think of as random numbers, and they tend to think numbers beginning with 4, 5 or 6 look the most random. Faked results therefore tend not to follow Benford’s Law; some analysts argued this was visible in the 2009 Iranian election results. So, no matter which candidate you’re rooting for, come November 8, you now know that maths can help you feel confident in the outcome of the election.
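Here’s a hedged sketch of the kind of first-digit check an analyst might run: a standard chi-square goodness-of-fit statistic comparing observed first digits with Benford’s proportions. Both data sets are invented for illustration, and 15.507 is the usual 5% critical value for 8 degrees of freedom. A high score is a reason to look more closely, not proof of fraud, and, per the caveat above, the test only makes sense for counts spanning several orders of magnitude.

```python
import math
import random
from collections import Counter

# Chi-square critical value for 8 degrees of freedom (9 digits minus 1) at the 5% level.
CRITICAL_5PCT_8DF = 15.507


def leading_digit(n: int) -> int:
    """First digit of a positive whole-number count."""
    return int(str(n)[0])


def benford_chi_square(values) -> float:
    """Pearson chi-square statistic comparing observed first digits with Benford's Law."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)  # Benford count for digit d
        stat += (counts[d] - expected) ** 2 / expected
    return stat


def looks_suspicious(values) -> bool:
    """True when the first digits stray further from Benford than chance usually allows."""
    return benford_chi_square(values) > CRITICAL_5PCT_8DF


rng = random.Random(2016)
# Invented "faked" tallies: someone picking numbers that start with 4, 5 or 6.
faked = [rng.randint(4_000, 6_999) for _ in range(300)]
# Invented "natural" tallies spread over several orders of magnitude.
natural = [int(10 ** rng.uniform(1, 6)) for _ in range(300)]

for name, data in [("faked", faked), ("natural", natural)]:
    print(f"{name}: chi-square = {benford_chi_square(data):.1f}, "
          f"suspicious = {looks_suspicious(data)}")
```

Note that at the 5% level, genuine data will also be flagged about one time in twenty, which is why auditors treat the test as a filter rather than a verdict.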
Well, maybe: more sophisticated fakers may know about Benford too. I asked our auditors, and they do sometimes check figures with Benford’s Law, but thankfully didn’t admit to using it to fake them credibly! Yes, Benford’s Law is useful in business, because it also applies to company accounts. Are you tempted to try it on yours to check them for faking?
Let’s see how this plays out in a seemingly unrelated domain: language. Related to Benford’s Law of Numbers is Zipf’s Law of Language. Zipf’s Law says that the most popular word in a language is used about 50 times as often as the 50th most popular, and about 2,000 times as often as the 2,000th. It still blows my mind that a human construct like language obeys such a mathematical rule. Zipf’s Law has even been used to argue that some ancient coded manuscripts contain “real” codes, not just gobbledygook.
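A quick way to see Zipf for yourself, sketched under the assumption that a crude regular expression is good enough to split text into words: rank the words by how often they appear and multiply each rank by its count. If frequency falls off as 1/rank, that product stays roughly constant, and the predicted frequency ratio between two ranks is simply the ratio of the ranks.

```python
import re
from collections import Counter


def rank_frequency(text: str, top: int = 10):
    """List (rank, word, count, rank * count) for the `top` most common words.

    If word frequency falls off roughly as 1 / rank (Zipf's Law),
    the last column stays roughly constant.
    """
    words = re.findall(r"[a-z']+", text.lower())  # crude tokeniser: letters and apostrophes
    ranked = Counter(words).most_common(top)
    return [(rank, word, count, rank * count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_ratio(rank_a: int, rank_b: int) -> float:
    """Predicted frequency ratio f(rank_a) / f(rank_b) under an ideal 1 / rank law."""
    return rank_b / rank_a


print(zipf_ratio(1, 50), zipf_ratio(1, 2000))  # the two ratios quoted above
```

Feed `rank_frequency` a long plain-text book rather than a paragraph; short samples are too small for the pattern to show clearly.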
Back to business: the 80:20 Pareto Principle is another example of scale invariance. It states that roughly 80% of effects come from only 20% of causes. If we read “effects” as real outcomes like sales or wealth, and “causes” as the number of entities owning those outcomes, we can see the parallel with Benford.
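One way to make the 80:20 link concrete is to assume (and it is an assumption) that sales or wealth follow a Pareto, or power-law, distribution with tail index alpha. The share held by the top fraction p is then p^(1 - 1/alpha), and alpha = log 5 / log 4 ≈ 1.16 gives exactly 80:20. A minimal sketch:

```python
import math


def pareto_top_share(p: float, alpha: float) -> float:
    """Share of the total held by the top fraction p of a Pareto(alpha) population.

    Follows from the Pareto Lorenz curve; needs alpha > 1 so the mean is finite.
    """
    if not (0 < p <= 1) or alpha <= 1:
        raise ValueError("need 0 < p <= 1 and alpha > 1")
    return p ** (1 - 1 / alpha)


# The tail index that gives an exact 80:20 split.
ALPHA_80_20 = math.log(5) / math.log(4)

print(f"alpha = {ALPHA_80_20:.3f}")
print(f"top 20% hold {pareto_top_share(0.20, ALPHA_80_20):.0%}")
print(f"top 4% (the top 20% of the top 20%) hold {pareto_top_share(0.04, ALPHA_80_20):.0%}")
```

Applying the rule again to the top 20% gives 64% (80% of 80%) held by the top 4%. The same split repeats at every scale, which is exactly the scale invariance mentioned above.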
And it relates to internet popularity. I mentioned earlier in my Benford explanation the idea of real things “getting there” from smaller things. Internet popularity is created by chains of “likes” and hyperlinks. You can see that popular (linked) things are more likely to be found (via any link in their chain), making them yet more popular, and so on. It’s a positive feedback effect, so a tiny number of memes, videos, songs, blogs, and websites account for the vast majority of web traffic.
How does one of them get far enough above the others to start that climb to stratospheric popularity? Sometimes through your company’s clever marketing campaign, but very often through a random burst of views that lifts it up. NOW you know why Gangnam Style was such a hit: chance plus linked feedback!
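Here’s a toy simulation of “chance plus linked feedback”: a simple “rich get richer” (preferential attachment) model, not a model of any real platform. Every item is equally good; each new view either goes to a brand-new item (10% of the time, an assumed rate) or to an existing item with probability proportional to the views it already has. The parameter values and the seed are arbitrary.

```python
import random


def simulate_views(n_views: int = 20_000, p_new: float = 0.1, seed: int = 8):
    """Simulate views with positive feedback ("rich get richer").

    Each new view goes to a brand-new item with probability p_new; otherwise it
    goes to an existing item with probability proportional to the views it
    already has (done by copying the item of a randomly chosen earlier view).
    Returns view counts per item, largest first.
    """
    rng = random.Random(seed)
    history = [0]   # item id of every view so far; item 0 gets the first view
    n_items = 1
    for _ in range(n_views - 1):
        if rng.random() < p_new:
            item = n_items              # a new meme, video or song appears
            n_items += 1
        else:
            item = rng.choice(history)  # already-popular items are more likely to be picked
        history.append(item)
    views = [0] * n_items
    for item in history:
        views[item] += 1
    return sorted(views, reverse=True)


def top_share(views, fraction: float = 0.01) -> float:
    """Share of all views captured by the top `fraction` of items."""
    k = max(1, int(len(views) * fraction))
    return sum(views[:k]) / sum(views)


views = simulate_views()
print(f"{len(views)} items; top 1% get {top_share(views, 0.01):.0%} of views, "
      f"top 10% get {top_share(views, 0.10):.0%}")
```

Even though no item is better than any other, a small handful end up with a large share of all views. Try a different `seed` and the exact winners and shares change, but the lopsided pattern stays.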
And what about the dirty pages? You may remember logarithms from school maths lessons. If you’re my age, you’ll even remember the books of log tables. Back in 1881, the American astronomer and mathematician Simon Newcomb noticed that the early pages of log tables, covering numbers beginning with 1, were much more worn than the pages at the back (the 9s), and from this the law was born.
I like irony, so here’s one last example, this time from real life. The physicist Frank Benford’s work came much later, in the 1930s, so he wasn’t actually the first to spot the pattern. This is an example of Stigler’s Law, which says that (most) scientific laws are not named after their original discoverer; I won’t dwell on possible reasons for this. And yes, Stigler was not the first to notice it!
Written by G, with thanks to my son’s ace maths teacher, Mr Whelan, for the inspiration.