“Data is like garbage. You'd better know what you are going to do with it before you collect it.”
- Mark Twain
“In God we trust, all others bring data.”
- W. Edwards Deming
The perpetual maelstrom of chaos that is YouTube seemingly has no idea what I want to watch next. It has a broad idea that I like motor racing, so it wants to provide me with AI-generated motorsport rumour slop; it knows that I like music theory, so it tries to give me saccharine pop garbage; occasionally, though, it brings me an absolute gem, such as it did here:
Link: https://www.youtube.com/watch?v=wzR68Zq4-Rs
The description of this video is that it is "the entire Beatles album discography but it's just numbers".
The thing is that if you're going to give me a giant string of numbers, then because I work in an accountancy office, my first inclination is to look at that string of numbers and realise exactly what else you have inadvertently given me. A string of numbers is in fact a data set.
In this case we have a data set which is not random but which is generated from the lyrics of Beatles songs. This means that before we've even begun, we know that every number in the string exists because of some context. People who are familiar with The Beatles' discography are going to know instantly that "she was just 17", that there are "4000 holes in Blackburn, Lancashire", and that "31" is the collar number of the policeman who arrests Maxwell, who is a serial killer.
Since I was given this lovely set of data, what else was I going to do but analyse it?
If we take a list of the frequencies of the various numbers, we find the following results (each number, then how many times it appears).
1 - 107
2 - 33
3 - 15
4 - 12
5 - 8
6 - 6
7 - 6
8 - 13
9 - 39
10 - 4
12 - 2
17 - 1
20 - 2
31 - 1
50 - 2
64 - 3
90 - 1
909 - 10
1000 - 1
4000 - 1
1000000 - 2
2000000 - 1
The first thing to notice about this is that, apart from the strange spike of 9 at the end, these numbers loosely follow Benford's Law. That is, in many sets of numerical data the leading digit is most likely to be small. In this case, it is 1.
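For anyone who would rather check the Benford claim than take my word for it, here is a minimal Python sketch. It takes the frequency table exactly as printed above, tallies the leading digit of each number, and sets that against the count Benford's Law predicts, which is the total multiplied by log10(1 + 1/d) for each digit d. The names TALLY, leading_digit_counts and benford_expected are made up for illustration.

```python
import math
from collections import Counter

# Frequency table from the post: number -> how many times it appears.
TALLY = {
    1: 107, 2: 33, 3: 15, 4: 12, 5: 8, 6: 6, 7: 6, 8: 13, 9: 39,
    10: 4, 12: 2, 17: 1, 20: 2, 31: 1, 50: 2, 64: 3, 90: 1,
    909: 10, 1000: 1, 4000: 1, 1_000_000: 2, 2_000_000: 1,
}


def leading_digit_counts(tally):
    """Count how often each digit 1-9 appears as the first digit."""
    counts = Counter()
    for number, freq in tally.items():
        counts[int(str(number)[0])] += freq
    return counts


def benford_expected(total):
    """Expected count per leading digit under Benford's Law."""
    return {d: total * math.log10(1 + 1 / d) for d in range(1, 10)}


observed = leading_digit_counts(TALLY)
expected = benford_expected(sum(TALLY.values()))

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:3d}, Benford expects {expected[d]:5.1f}")
```

On the table as printed, the fit is loose rather than tight: a leading 1 turns up 117 times against an expected 81 or so, and a leading 9 turns up 50 times against an expected 12 or so. The general shape (small digits dominate) is there, with the 9 spike sitting well outside it.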
In fact as far as the measures of central tendency go, the Mode of this set is 1, and the Median of this set is 2.
Things get weird when you look at the other measure of central tendency - the Mean.
If you take the mean of the whole data set, then you get: 14762.871
This is purely because of just three outliers in the data, which are around six orders of magnitude larger than the majority of the data set.
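The three measures can be checked with Python's standard statistics module. This is only a sketch: TALLY, expand and with_19 are names invented here. One assumption needs flagging loudly. The table as printed has 270 entries and gives a mean of about 14872.085; the 14762.871 quoted above comes out if the number 19 (from the "1 for you and 19 for me" lyric) is counted twice, so the second half of the sketch adds those two 19s as a guess rather than as a known fact.

```python
from statistics import mean, median, mode

# Frequency table from the post: number -> how many times it appears.
TALLY = {
    1: 107, 2: 33, 3: 15, 4: 12, 5: 8, 6: 6, 7: 6, 8: 13, 9: 39,
    10: 4, 12: 2, 17: 1, 20: 2, 31: 1, 50: 2, 64: 3, 90: 1,
    909: 10, 1000: 1, 4000: 1, 1_000_000: 2, 2_000_000: 1,
}


def expand(tally):
    """Turn the frequency table back into a sorted flat list of numbers."""
    return [n for n, freq in sorted(tally.items()) for _ in range(freq)]


values = expand(TALLY)
print("mode:  ", mode(values))
print("median:", median(values))
print("mean:  ", round(mean(values), 3))

# ASSUMPTION (not in the printed table): 19 appears twice.
with_19 = expand({**TALLY, 19: 2})
print("entries with two 19s:", len(with_19))
print("mean with two 19s:   ", round(mean(with_19), 3))
```

Either way the mode stays at 1 and the median stays at 2; only the mean is dragged around, which is the whole point about the outliers.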
If you take out just those three pieces of data and also take out three 1s then the mean drops to just 58.
If you then remove the three- and four-digit numbers from the data set, the mean drops to just 5.8.
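Taking values off both ends like this is a trimmed mean, and it is easy to sketch. Two assumptions are baked in and should be read as guesses: first, that 19 appears twice in the data (it is not in the printed table, but it is what makes the quoted mean work out); second, that the second step also removes as many 1s from the bottom as it removes large numbers from the top, just as the first step does. The function name trimmed_mean is invented for illustration.

```python
from statistics import mean

# Frequency table from the post: number -> how many times it appears.
# ASSUMPTION: the entry 19: 2 is NOT in the printed table; it is added here
# because it reproduces the mean quoted in the text.
TALLY = {
    1: 107, 2: 33, 3: 15, 4: 12, 5: 8, 6: 6, 7: 6, 8: 13, 9: 39,
    10: 4, 12: 2, 17: 1, 19: 2, 20: 2, 31: 1, 50: 2, 64: 3, 90: 1,
    909: 10, 1000: 1, 4000: 1, 1_000_000: 2, 2_000_000: 1,
}

values = [n for n, freq in sorted(TALLY.items()) for _ in range(freq)]


def trimmed_mean(data, k):
    """Mean after dropping the k smallest and the k largest values."""
    ordered = sorted(data)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this data set")
    return mean(ordered[k:len(ordered) - k])


# k=3 drops the three values in the millions and three 1s.
print(round(trimmed_mean(values, 3), 2))
# k=15 also drops the ten 909s, 1000 and 4000, plus twelve more 1s.
print(round(trimmed_mean(values, 15), 2))
```

Under those assumptions the two trimmed means come out at roughly 58.26 and 5.77, which line up with the 58 and 5.8 above.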
The two anomalies which turn this from an ordinary data set into whatever this is are "8 Days a Week" and "Revolution 9"; one of the songs makes use of hyperbole and the other... is bonkers to the point where I do not know if it even fits the definition of "music".
That the most common number is 1 is to be expected: most songs written by humans, for humans, about their relationships with other humans, are going to classify those humans as special. The "one" in relation to someone is very obviously going to be about a relationship which is intimate and/or romantically entwined. The corollary is that if there is the "one", then as a couple the song is going to be talking about us "two".
In fact, so incredibly obvious is it that songs for humans describe relationships with other humans, that we should naturally expect more than half of the data set to result from those terms. The numbers 1 and 2 account for 140 of 270 numbers, or just over 51% of the data.
Also embedded in this data set are the unavoidable certainties of life: Death and Taxes. "1 for you and 19 for me" refers to the marginal rate of taxation of 95% which George Harrison found himself facing as a result of being part of the biggest band in the world, and Paul's question of "Will you still need me, will you still feed me, when I'm 64?" stares at mortality and the ceaseless foot of time silently stealing swiftly by.
Given that this is The Beatles, I can not help but look at the comments, where one commenter, who I believe is a spokesperson for all of us, said:
"We’re all here for that epic 9 solo"
- Mapoleo (user)
Indeed.