Biased Stats in the NBA

One of my favorite NBA-related articles is Tommy Craggs’ “The Confessions Of An NBA Scorekeeper”, which recounts the experiences of a scorekeeper named Alex in the 1990s. The article highlights the common practice of “stat-padding,” i.e., inflating the stats (e.g., assists, steals, blocks, and rebounds) of home-team players. As Craggs writes:

Alex quickly found that a scorekeeper is given broad discretion over two categories: assists and blocks (steals and rebounds are also open to some interpretation, though not a lot). “In the NBA, an assist is a pass leading directly to a basket,” he says. “That’s inherently subjective. What does that really mean in practice? The definition is massively variable according to who you talk to. The Jazz guys were pretty open about their liberalities. … John Stockton averaged 10 assists. Is that legit? It’s legit because they entered it. If he’s another guy, would he get 10? Probably not.”

“The Confessions Of An NBA Scorekeeper”

Alex’s comment on Stockton caught my attention. While I was pretty certain stat-padding existed 20 years ago and still does to this day, I was curious as to what degree the NBA’s all-time career leaders benefited from this bias.

Methodology

I pulled the top 25 all-time career leaders for each of the following categories from Basketball Reference: points, assists, steals, blocks, and rebounds. This yielded a total of 78 unique players, as some players were ranked on the all-time list in multiple categories. I then pulled the stats for each player, split by home vs. road games.

Note that steals and blocks were not officially recorded in the NBA until the 1973–74 season. Furthermore, not all statistics were broken out by home vs. road splits until more recently, which means the analysis of bias could not be completed for many of the older stars, including Bill Russell, Kareem Abdul-Jabbar, Magic Johnson, Moses Malone, Oscar Robertson, and Wilt Chamberlain.

Setting the Benchmark (Points)

It’s fairly well-established that teams play better at home than on the road. To confirm this, I measured each player’s points per home game and compared it to his points per road game. On average, players scored 2.8% more points per game at home than on the road, with a standard deviation of 5.1%.

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Strong positive outliers included Tree Rollins, Shawn Bradley, and Mookie Blaylock (two of these players also belong on the all-time greatest names list; I’ll let you guess which), who were all more than two standard deviations above the mean. Jermaine O’Neal was the only negative outlier more than two SDs below the mean. Notably, the top six career point leaders were all below average.

I then compared each player’s home vs. road performance for assists, steals, blocks, and rebounds relative to his home vs. road scoring performance. For example, if a player scored, on average, 5% more points per game at home than on the road and grabbed 10% more rebounds per game at home than on the road, then the relative home bias of his rebounding performance would be \frac{1.10}{1.05}-1=4.76\%.
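For concreteness, here’s a minimal Python sketch of that calculation (a hypothetical helper of my own, not part of the original Excel workbook):

```python
def relative_home_bias(stat_home, stat_road, pts_home, pts_road):
    """Relative home bias of a stat, using scoring as the baseline."""
    return (stat_home / stat_road) / (pts_home / pts_road) - 1

# The worked example above: +10% rebounds at home vs. a +5% scoring baseline
print(relative_home_bias(1.10, 1.00, 1.05, 1.00))  # 0.0476 -> 4.76%
```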

The underlying assumption here is that in the absence of any stat-padding, there should not be significant relative home bias in any of the statistical categories. However, given Alex’s scorekeeping experiences, we would expect to see some degree of bias in all four categories, especially assists and blocks.

My analysis revealed the following results:

Assists

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Relative to the baseline (i.e. points), assists showed a relative home bias of 6.4%, with a standard deviation of 9.6%.

Almost everyone fell within two SDs of the mean, although Theo Ratliff was an extreme positive outlier, albeit on small volume. Note that John Stockton, the all-time assists leader by a long shot, had a relative home bias of only 3.6%, indicating a very low likelihood of stat-padding. On the other hand, Jason Kidd, the second all-time assists leader, had a relative home bias of 16.5%.

Of course, a high relative home bias doesn’t necessarily mean that there was stat-padding going on. Kidd actually scored 4.5% fewer points per game at home than on the road. One explanation is that he played more as a facilitator at home while shouldering more of the scoring burden on the road.

Steals

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Relative to the baseline (i.e. points), steals showed a relative home bias of 3.2% (half that of assists), with a standard deviation of 9.3% (roughly the same as that of assists).

Again, almost everyone fell within two SDs of the mean, although Manute Bol was an extreme positive outlier on small volume. Alvin Robertson was also more than two SDs higher than the mean, on much higher volume. Remarkably, John Stockton, also the all-time steals leader by a decent margin, had a relative home bias of only 2.0%, indicating once again that he was the real deal.

By now, you may have noticed that Dikembe Mutombo was more than two SDs below the mean for both assists and steals. It doesn’t really make sense for stat-padding to go the other way, so the likely explanation for negative bias is simply underperformance. The numbers look so extreme here because of small sample sizes. Mutombo was an all-time rebounding great who averaged 10.7 boards at home and 10.0 on the road. However, he also scored only 10.3 points at home and 9.4 on the road (a benchmark of 9.8%). He had so few assists (1.0 home vs. 1.1 road) and steals (0.4 vs. 0.5) that very small absolute differences between home and road performance led to large percentage biases (-15.4% and -25.4%) relative to his baseline.
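To see the small-sample effect in code, here’s a quick sketch using Mutombo’s rounded per-game splits (the article’s percentages come from unrounded data, so these outputs only approximate them):

```python
def rel_bias(home, road, pts_home=10.3, pts_road=9.4):
    # Relative home bias vs. Mutombo's scoring baseline (~9.8% with unrounded data)
    return (home / road) / (pts_home / pts_road) - 1

print(rel_bias(10.7, 10.0))  # rebounds: ~ -2%, a modest negative bias
print(rel_bias(1.0, 1.1))    # assists:  ~ -17%, a huge swing from a 0.1 gap
print(rel_bias(0.4, 0.5))    # steals:   ~ -27%, ditto
```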

Blocks

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Relative to the baseline (i.e. points), blocks showed a relative home bias of 12.3% (nearly double that of assists), with a standard deviation of 19.7% (also nearly double that of assists). Blocks were by far the most biased statistic, as well as the most variable.

There were a handful of players who fell more than two SDs below the mean, while Fat Lever (another all-time great name), Alvin Robertson, and John Stockton were all more than two SDs above the mean (on low volume). Both Robertson and Stockton had a relative home bias of nearly 80%, or almost 3.5 SDs above the average! So while the Utah Jazz scorekeepers may not have been padding Stockton’s assists and steals, they almost certainly were boosting his blocks… (Take that, Stockton! I finally got you 🙂)

Interestingly, David Robinson and Tim Duncan, who both played for the San Antonio Spurs for the entirety of their careers, were between one and two SDs above the mean on relatively high volumes! (Alvin Robertson also played five seasons for the Spurs at the beginning of his career.)

Rebounds

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Relative to the baseline (i.e. points), rebounds showed a relative home bias of 1.4% (one-fifth that of assists), with a standard deviation of 4.9% (half that of assists). In contrast to blocks, rebounds were by far the least biased statistic, as well as the least variable.

Given the lower variability, it’s not too surprising that almost all players fell within two SDs of the mean, with no positive outliers and only three negative outliers (on low volume).

Closing Thoughts

In short, the results confirmed our initial expectations. Blocks (12.3% average relative home bias, 19.7% standard deviation) and assists (6.4% Avg, 9.6% SD) showed the most evidence of bias, whereas steals (3.2% Avg, 9.3% SD) and rebounds (1.4% Avg, 4.9% SD) showed the least.

At first, I was surprised that blocks showed significantly more bias than assists. Conceptually, assists felt like a much more subjective stat to record, but the data seemed to suggest the opposite. However, I soon realized this was because of the “Mutombo problem” of small sample size. Simply put, assists occur far more frequently than blocks in the NBA. While many great players average more than five assists a game over the course of their careers (the truly elite average over eight!), very few ever manage to block more than three shots a game.

It’s not uncommon for point guards like Stockton and Kidd to average fewer than 0.5 blocks per game, and in certain cases, significantly fewer than that (e.g., Steve Nash and Tony Parker averaged fewer than 0.1 blocks per game). Therefore, even if there were the same amount of absolute stat-padding for assists and blocks, the relative impact would be much greater for blocks. That is to say, a scorekeeper giving a player an “extra” assist or two every home game when the player is averaging eight or ten assists is going to have a much smaller impact than gifting an “extra” block every few home games if that player is averaging a measly 0.1 blocks a game.
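To make the asymmetry concrete, here’s a tiny back-of-the-envelope calculation in Python (the per-game averages are hypothetical round numbers):

```python
# One "extra" assist per home game for a 10-assist-per-game player
assists_bias = (10 + 1) / 10 - 1        # +10% relative boost at home
# One "extra" block every 5 home games for a 0.1-block-per-game player
blocks_bias = (0.1 + 1 / 5) / 0.1 - 1   # +200% relative boost at home
print(assists_bias, blocks_bias)
```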

As always, you can find my work here.

This is Post #21 of the “Fun with Excel” series. For more content like this, please click here.

Rigging Live Draws Part II: The CNN Democratic Debate Draw

A throwback to Part I and why live draws can absolutely be rigged.

When I heard the news that the first day of the Democratic debates on July 30 featured only white candidates and that all of the non-white candidates were scheduled for the second day, I knew something was off (I wasn’t the only one who had suspicions). Admittedly, I hadn’t been following the debates very closely, but my gut told me that even in such a large primary field, there were enough minority candidates that the likelihood of such an outcome happening by pure chance was quite slim.

I decided to get to the bottom of things, despite combinatorics being one of my weakest areas in math growing up. To start, I had to confirm the number of white vs. non-white candidates in the debate field. I quickly found that there were only 5 non-white candidates: Kamala Harris (black), Cory Booker (black), Julián Castro (Latino), Andrew Yang (Asian), and Tulsi Gabbard (Pacific Islander).

A First Pass

If CNN randomly selected each candidate and their debate day, then we can calculate the total number of ways that 20 candidates can be divided into two groups. Assuming that order matters (i.e. having only white candidates on the first day of debates is different from having only white candidates on the second day), then there are a total of \binom{20}{10}=184,756 possible combinations. Out of those, there are \binom{15}{10} \times \binom{5}{0}=3,003 ways to choose only white candidates on the first day. Therefore, the probability of featuring only white candidates on the first day is \frac{3,003}{184,756}=1.63\%. Not very likely, eh?
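If you’d like to check the arithmetic, here’s a quick Python verification (my own sketch, not part of the original workbook):

```python
from math import comb

total = comb(20, 10)                    # 184,756 ways to pick the first night
all_white = comb(15, 10) * comb(5, 0)   # 3,003 all-white first nights
print(all_white / total)                # 0.01625... -> 1.63%
```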

CNN, What Were You Thinking?

Interestingly enough, CNN did NOT use a purely random selection process, instead electing to use a somewhat convoluted three-part draw “to ensure support for the candidates [was] evenly spread across both nights.” The 20 candidates were first ordered based on their rankings in the latest public polling, and then divided into three groups: Top 4 (Biden, Harris, Sanders, Warren), Middle 6 (Booker, Buttigieg, Castro, Klobuchar, O’Rourke, Yang), and Bottom 10 (Bennet, Bullock, de Blasio, Delaney, Gabbard, Gillibrand, Hickenlooper, Inslee, Ryan, Williamson).

The 3 Initial Groups and Final Debate Lineups, in Alphabetical Order

“During each draw, cards with a candidate’s name [were] placed into a dedicated box, while a second box [held] cards printed with the date of each night. For each draw, the anchor [retrieved] a name card from the first box and then [matched] it with a date card from the second box.”

CNN

In other words, CNN performed a random selection within each of the three groups, and the three draws were independent events.

A New Methodology

To calculate our desired probability under the actual CNN methodology, we need to figure out the likelihood of having only white candidates on the first day for each of the three groups. We can then multiply these probabilities together since the events are independent. For the Top 4 (where Harris is the only non-white candidate), there are \binom{4}{2}=6 total combinations, and \binom{3}{2} \times \binom{1}{0}=3 ways to choose only white candidates on the first day. Therefore, the probability of featuring only white candidates on the first day is \frac{3}{6}=50\%.

For the Bottom 10 (where Gabbard is the only non-white candidate), there are \binom{10}{5}=252 total combinations, and \binom{9}{5} \times \binom{1}{0}=126 ways to choose only white candidates on the first day. Therefore, our desired probability is \frac{126}{252}=50\%.

It should make sense that the probability is 50% for both the Top 4 and Bottom 10, precisely because there is exactly one candidate of color in each group. Think about it for a second: in both scenarios, the non-white candidate either ends up debating on the first day or the second day, hence 50%.

The Middle 6 is where it gets interesting. There are exactly 3 white candidates and 3 non-white candidates. This yields \binom{6}{3}=20 total combinations, but only \binom{3}{3} \times \binom{3}{0}=1 way to choose only white candidates on the first day, or a probability of just \frac{1}{20}=5\%.

Since the three draws are independent events, we can simply multiply the probabilities to get to our desired answer: 50\% \times 50\% \times 5\% = 1.25\%. Even lower than the 1.63% from our first calculation!
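The grouped calculation is also easy to verify in Python (a sketch of the math above, not CNN’s actual procedure):

```python
from math import comb

top4 = comb(3, 2) / comb(4, 2)        # Harris is the lone non-white candidate -> 50%
middle6 = comb(3, 3) / comb(6, 3)     # 3 white vs. 3 non-white -> 5%
bottom10 = comb(9, 5) / comb(10, 5)   # Gabbard is the lone non-white candidate -> 50%
print(top4 * middle6 * bottom10)      # 0.0125 -> 1.25%
```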

One More Twist

Even a casual observer may have noticed that although the first day of debates featured an all-white field, Democratic front-runner Joe Biden was drawn on the second day. This conveniently set up what many media outlets touted as a “rematch” with Senator Kamala Harris, with CNN going so far as to compare the match-up to the “Thrilla in Manila” (I wish I were joking).

The probability of having only white candidates on the first day AND Joe Biden on the second day is 16.67\% \times 50\% \times 5\% = 0.42\%. The only difference between this scenario and the previous one is that within the Top 4, there is only one way to draw both Biden and Harris on the second day out of a total of six possible combinations: \frac{1}{6}=16.67\%.

Validating with Monte Carlo

I wasn’t 100% certain about my mathematical calculations at this point, so I decided to verify them using Monte Carlo simulations. Plus, this wouldn’t be a “Fun with Excel” post if we didn’t let Excel do some of the heavy lifting 🙂

I set up a series of random number generators to simulate CNN’s drawing procedure, keeping track of whether Scenario 1 (only white candidates on the first day) or Scenario 2 (only white candidates on the first day AND Joe Biden on the second day) was fulfilled in each case. Excel’s row limit only let me run 45,000 draws simultaneously, which I then repeated 100 times and graphed as box and whisker plots below:

Min: 1.14%, Max: 1.36%, Average: 1.26%
Min: 0.34%, Max: 0.52%, Average: 0.42%

The simulations yielded an average of 1.26% for Scenario 1 and 0.42% for Scenario 2, thus corroborating the previously calculated theoretical probabilities of 1.25% and 0.42%.
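For readers who prefer code to spreadsheets, here’s a rough Python equivalent of the simulation (my reconstruction of the draw logic, not the original Excel model):

```python
import random

def draw():
    # Candidates reduced to labels; only race and the Biden/Harris identities matter
    top4 = ['Biden', 'Harris', 'w', 'w']
    middle6 = ['nw'] * 3 + ['w'] * 3
    bottom10 = ['nw'] + ['w'] * 9
    # Independent random draws for night 1 within each polling group
    night1 = random.sample(top4, 2) + random.sample(middle6, 3) + random.sample(bottom10, 5)
    scenario1 = 'Harris' not in night1 and 'nw' not in night1
    scenario2 = scenario1 and 'Biden' not in night1
    return scenario1, scenario2

N = 100_000
results = [draw() for _ in range(N)]
print(sum(s1 for s1, _ in results) / N)  # ~0.0125 (Scenario 1)
print(sum(s2 for _, s2ошибка in results) / N if False else sum(s2 for _, s2 in results) / N)  # ~0.0042 (Scenario 2)
```

(The two printed frequencies should land near the theoretical 1.25% and 0.42%.)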

Accurate Portrayal of My Reaction Whenever One of My Crazy Excel Experiments Ends up Actually Working

Concluding Thoughts

Numbers don’t lie, and they lead me to conclude that the CNN Democratic Debate Draw was not truly random. The million-dollar question, of course, is why? What does CNN gain from having only white candidates on the first day and Joe Biden on the second day (along with all the minority candidates)? As I don’t intend for my blog to be an outlet for my personal political views, I’ll omit any “conspiracy” theories and leave them as an exercise for you, the reader.

As always, you can find my work here.

This is Post #20 of the “Fun with Excel” series. For more content like this, please click here.

Fun with Excel #19 – Defending the World Cup

The World Cup is undoubtedly one of the most prestigious tournaments in all of sports. Although the competition has been held 21 times since its debut in 1930, only eight national teams have won it: Brazil (5 times), Germany (4), Italy (4), Argentina (2), France (2), Uruguay (2), England (1), and Spain (1). Only twice has a World Cup champion successfully defended the title (Italy in 1938 and Brazil in 1962). This is not too surprising, given that the tournament is held once every four years, which can be a lifetime in professional sports.

Summary of World Cup Results, 1930–2018
Points Per Game for Defending Champions (Red Bars = Eliminated in the First Round / Group Stage)

The above charts show the performance of every defending champion since 1930, as well as their average points per game (Win = 3 points, Draw = 1 point, Loss = 0 points). Interestingly, since the World Cup expanded to 32 teams in 1998, the defending champion has lost in the group stage (i.e. failed to reach the knockout stage) in four out of the last six World Cups, including the last three tournaments.

One potential explanation for these early exits is the increase in competition over the last 20 years, both from the higher number of teams participating in the World Cup, as well as the rise in overall skill levels which has led to more parity among nations. Even so, out of the four most recent instances where the defending champions were eliminated in the group stage (France in 2002, Italy in 2010, Spain in 2014, and Germany in 2018), all four countries entered their respective World Cups ranked in the top 20% of all teams. On top of that, all of them had favorable groups from which they were expected to advance. So who suffered the worst group stage exit from a defending champion?

A Slight Detour on Methodology

To analyze each team’s performance, I not only examined their win/loss records, but also how they played relative to expectations. I accomplished the latter by comparing each defending champion’s Elo rating to the ratings of all the nations competing in the World Cup. I also compared each team’s Elo to the ratings of the other three nations in its group to determine how difficult it would be for each team to advance from the group stage.

Used widely across sports, board games, and video games, the Elo rating system calculates the relative skill of players (or teams) based on match outcomes.

After every game, the winning player takes points from the losing one. The difference between the ratings of the winner and loser determines the total number of points gained or lost after a game. In a series of games between a high-rated player and a low-rated player, the high-rated player is expected to score more wins. If the high-rated player wins, then only a few rating points will be taken from the low-rated player. However, if the lower-rated player scores an upset win, many rating points will be transferred.

Wikipedia

In soccer, the rating system is further modified to account for the goal difference of the match, such that a 7–1 victory will net more rating points than a 2–1 win. Thus, we expect nations with higher pre-World Cup Elo ratings to perform better than those with lower ratings, which the chart below illustrates.

The relationship isn’t perfect, but we can see that teams with higher Pre-World Cup Elo ratings tend to perform better during the tournament
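For the curious, here’s a simplified sketch of an Elo update with a goal-difference multiplier, loosely modeled on the World Football Elo Ratings conventions (the K-factor and the opponent’s rating below are illustrative assumptions, not figures from my analysis):

```python
def elo_update(r_a, r_b, goals_a, goals_b, k=60):
    """Return the rating change for team A after one match."""
    expected = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # expected score for team A
    result = 1.0 if goals_a > goals_b else 0.0 if goals_a < goals_b else 0.5
    diff = abs(goals_a - goals_b)
    # Bigger wins transfer more points (eloratings.net-style multiplier)
    g = 1.0 if diff <= 1 else 1.5 if diff == 2 else (11 + diff) / 8
    return k * g * (result - expected)

# France (2096) losing 0-1 to a hypothetical ~1700-rated opponent costs ~54 points,
# roughly in line with the Senegal match discussed below
print(elo_update(2096, 1700, 0, 1))
```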

We’re more interested in the outliers on the right side of the chart, so without further ado, here is my ranking of the “worst of the worst” World Cup defenses.

The Hall of Shame

4. Italy (2010): 0W/2D/1L, -1GD

Italy entered the tournament with the sixth highest Elo (1938), 142 above the average of 1796
Italy had the fifth easiest group (out of eight) in the first stage of the tournament

In 2010, Italy (1938 Pre-WC Elo) drew Paraguay 1–1 (-14 Elo), drew New Zealand 1–1 (-24 Elo), and lost to Slovakia 2–3 (-50 Elo) in Group F, for a cumulative loss of 88 Elo. In doing so, it gained the dubious honor of being the only defending champion to be eliminated in the first round twice (1950 was the first time). That said, compared to other early exits, this one was slightly more forgivable. For one, Italy entered the World Cup ranked sixth by Elo, by far the weakest of the four most recent defending champions that failed to advance past the group stage. Italy also had the fifth easiest group (out of the initial eight), the only defending champion to start off in the bottom half of group difficulty.

3. Spain (2014): 1W/0D/2L, -3GD

Spain entered the tournament with the second highest Elo (2109), 267 above the average of 1842
Spain had the third easiest group (out of eight) in the first stage of the tournament

In 2014, Spain (2109 Pre-WC Elo) lost to the Netherlands 1–5 (-75 Elo) in a rematch of the 2010 final, lost to Chile 0–2 (-57 Elo), and beat Australia 3–0 (+16 Elo) in Group B, for a cumulative loss of 116 Elo. Spain entered the World Cup with the second highest Elo overall and played in the third easiest group, but still found themselves mathematically eliminated after only two games, the quickest exit for a defending champion since Italy in the 1950 tournament. Pretty embarrassing, but still not enough to make our Top 2…

2. Germany (2018): 1W/0D/2L, -2GD

Germany entered the tournament with the second highest Elo (2077), 249 above the average of 1828
Germany had the second easiest group (out of eight) in the first stage of the tournament

In 2018, Germany (2077 Pre-WC Elo) lost to Mexico 0–1 (-47 Elo), beat Sweden 2–1 (+14 Elo), and lost to South Korea 2–3 (-80 Elo) in Group F, for a cumulative loss of 113 Elo. For the first time since 1938, Germany did not advance past the first round. Although this remarkable streak was bound to end at some point, almost no one would have thought that 2018 would be the year. After all, Germany entered the World Cup with the second highest Elo and also played in the second easiest group.

Unlike the Spanish team in 2014, which appeared to be on its last legs after a remarkable run from 2008 to 2012 during which it won back-to-back European titles and a World Cup, the German team was seemingly still near the height of its powers. Indeed, their early “exit at group stage was greeted with shock in newspapers around the world,” according to The Guardian.

1. France (2002): 0W/1D/2L, -3GD

France entered the tournament with the highest Elo (2096), 274 above the average of 1822
France had the easiest group (out of eight) in the first stage of the tournament

In 2002, France (2096 Pre-WC Elo) lost to Senegal 0–1 (-54 Elo), drew Uruguay 0–0 (-19 Elo), and lost to Denmark 0–2 (-61 Elo) in Group A, for a cumulative loss of 134 Elo. Shockingly, the French failed to win a single match despite starting the World Cup with the highest Elo and playing in the easiest group. Perhaps more embarrassing, the team bowed out without scoring a single goal, good enough for the worst performance ever by a defending champion.

An Important Caveat

Of course, one should never draw conclusions solely from data; knowing the context surrounding the data is just as crucial. As Gareth Bland rightly points out in his article detailing the story behind France’s failure in 2002, several factors contributed to the team’s early exit besides mere under-performance:

  1. France’s star player, Zinedine Zidane, regarded as one of the greatest players of all time, injured himself in a friendly less than a week before the team’s first match against Senegal. He returned for France’s third match against Denmark, but was clearly not 100%.
  2. Thierry Henry, considered one of the best strikers to ever play the game, committed a poor challenge in the second match against Uruguay and received a red card. Although France managed to scrape a tie while down one man, Henry was forced to miss the third match because of the red card.
  3. Many members of France’s old guard like Marcel Desailly, Frank Leboeuf, and Youri Djorkaeff were pushing their mid-thirties. Although not old by any stretch of the imagination, they were undoubtedly past their prime as players.
  4. On the other hand, the team’s younger players like Patrick Vieira, Sylvain Wiltord, and Henry found themselves mentally and physically exhausted after a successful but grueling campaign with their domestic club Arsenal.

While none of these reasons should pass as excuses (after all, other teams had to deal with injuries and fatigue as well), this perfect storm of events helps to explain why France so drastically under-performed relative to their Elo rating. As Bland writes, the team’s “return home was not met with disgrace…Rather, it was an acknowledgement that some legs had got tired, while some needed to be moved on, while those of the maestro must just be left to heal.”

Lessons Learned?

One last observation is that none of the four defending champions won their opening matches (Italy drew, and the other three lost). With every match being so critical to advancing, a poor start likely put a tremendous amount of pressure on the defending champions and affected their remaining two matches. Perhaps the defending champions failed because of their relatively easy groups, which led them to become complacent going into the first match. In that case, the biggest takeaway is to not be overconfident, advice that I hope team France will heed going into Qatar 2022.

As always, you can find my work here.

Fun with Excel #18 – The Birthday Problem

Meeting someone with the same birthday as you always seems like a happy coincidence. After all, with 365 (366 including February 29th) unique birthdays, the chances of any two people being born on the same day appear to be small. While this is indeed true for any two individuals picked at random, what happens when we add a third, a fourth, or a fifth person into the fray? At what point does it become inevitable that some pair of people will share a birthday?

Of course, the only way to guarantee a shared birthday among a group of people is to have at least 367 people in a room. Take a moment to think about that statement, and if you’re still stumped, read this. Now that we know 100% probability is reached with 367 people, how many people would it take to reach 50%, 75%, or 90% probability? If you think the answer is 184, 275, and 330, then you would be quite wrong. Here’s why:

Let’s assume that all birthdays are equally likely to occur in a given population and that leap years are ignored. To paint a more vivid picture, let’s further assume that we have a large room and that people enter one at a time while announcing their birthdays for everyone to hear. The first person enters the room and announces that his/her birthday is January 1st (we can choose any date without loss of generality). The second person has a 364/365 probability of having a different birthday from the first person, and therefore a 1 - 364/365 probability of sharing it. For the first three people to all have different birthdays, the second person must avoid the first’s birthday (probability 364/365) and the third must avoid both of theirs (probability 363/365), so the probability of at least one shared birthday among three people is 1 - (364/365) \times (363/365). Similarly, the probability of at least one shared birthday among four people is 1 - (364/365) \times (363/365) \times (362/365). To generalize, the probability that at least two of n people share a birthday is:

P(n) = 1- \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{365-n+1}{365}

Note that the yellow series in the above graph grows much faster than linearly, with the probability reaching 50% at just 23 people. 75% and 90% probability are reached at 32 and 41 people, respectively. By the time 70 people are in the room, there is a greater than 99.9% chance that two individuals will have the same birthday!
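Here’s a compact Python check of those thresholds (a sketch under the uniform-birthday assumption):

```python
def p_shared(n):
    """P(at least two of n people share a birthday), 365 equally likely days."""
    prob_distinct = 1.0
    for i in range(n):
        prob_distinct *= (365 - i) / 365
    return 1 - prob_distinct

for target in (0.50, 0.75, 0.90, 0.999):
    n = next(k for k in range(1, 400) if p_shared(k) >= target)
    print(f"{target:.1%} first reached at n = {n}")  # 23, 32, 41, 70
```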

As the number of people increases, the growth of P(n) eventually decelerates, with each additional person providing less incremental probability than the previous one. Interestingly, the 20th person provides the greatest incremental probability, as seen in the above table.

In contrast, the probability that at least one of n people has one specific birthday (say, yours) is given by the much simpler equation:

P_1(n) = 1 - \left( \frac{364}{365} \right)^n

This relationship, which is highlighted by the green series in the graph, grows at a much slower rate than the yellow series. In comparison, it takes 253 people for P_1(n) to exceed 50%.
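The corresponding check for the specific-birthday case (same assumptions as the sketch above):

```python
def p_specific(n):
    """P(at least one of n people has one particular birthday)."""
    return 1 - (364 / 365) ** n

print(next(n for n in range(1, 1000) if p_specific(n) > 0.5))  # 253
```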

Testing Our Assumptions

One key assumption we made in the above exercise was that all birthdays (aside from February 29th) occur with equal probability. But how correct is that assumption? Luckily, Roy Murphy has run the analysis based on birthdays retrieved from over 480,000 life insurance applications. I won’t repeat verbatim the contents of his short and excellent article, but I did re-create some charts showing the expected and actual distribution of birthdays. The bottom line is that the actual data show more variation (including very apparent seasonal variation by month) than what is expected through chance.

Implications for Birthday Matching

Now that we know that birthdays in reality are unevenly distributed, it follows that matches should occur more frequently than we expect. To test this hypothesis, I ran two Monte Carlo Simulations with 1,000 trials each to test the minimum number of people required to get to a matching birthday: the first based on expected probabilities (each birthday with equal likelihood of 1/365.25 and February 29th with likelihood of 1/1461) and the second based on actual probabilities (sourced from the Murphy data set).
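In Python, the first simulation might look like the following sketch (the second would simply swap in Murphy’s empirical frequencies for the weights):

```python
import random

# Expected probabilities: each regular day 4/1461, February 29th 1/1461
days = list(range(366))
weights = [4 / 1461] * 365 + [1 / 1461]

def people_until_match():
    """Add people one at a time; return the head count at the first shared birthday."""
    seen = set()
    while True:
        day = random.choices(days, weights=weights)[0]
        if day in seen:
            return len(seen) + 1
        seen.add(day)

trials = [people_until_match() for _ in range(1000)]
print(sum(trials) / len(trials))  # ~25 people on average
```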

Note that the distributions of both simulations are skewed positively (i.e. to the right). The results appear to corroborate our hypothesis, as evidenced by the gray line growing at a faster rate than the yellow line in the above graph. Indeed, the average number of people required for a birthday match is 24.83 under the simulation using actual probabilities, slightly lower than the 25.06 using expected probabilities. However, the difference is not very significant; therefore, our assumption of uniformly distributed birthdays works just fine.

As always, you can find my work here.

Fun with Excel #16 – Rigging Live Draws: The Emirates FA Cup

The Fifth Round Draw of the 2016/17 Emirates FA Cup was rigged.

Bold statement (literally), although that sentence probably meant nothing to anyone who doesn’t follow English football (read: soccer) and the FA Cup in particular.

A quick introduction to the FA Cup competition, courtesy of Wikipedia (emphasis mine):

The FA Cup, known officially as The Football Association Challenge Cup, is an annual knockout association football competition in men’s domestic English football. First played during the 1871–72 season, it is the oldest association football competition in the world. For sponsorship reasons, from 2015 through to 2018 it is also known as The Emirates FA Cup.

The competition is open to any eligible club down to Level 10 of the English football league system – all 92 professional clubs in the Premier League and the English Football League (Levels 1 to 4), and several hundred “non-league” teams in Steps 1 to 6 of the National League System (Levels 5 to 10). The tournament consists of 12 randomly drawn rounds followed by the semi-finals and the final. Entrants are not seeded, although a system of byes based on league level ensures higher ranked teams enter in later rounds – the minimum number of games needed to win the competition ranges from six to fourteen.

In the modern era, only one non-league team has ever reached the quarter finals, and teams below Level 2 have never reached the final. As a result, as well as who wins, significant focus is given to those “minnows” (smaller teams) who progress furthest, especially if they achieve an unlikely “giant-killing” victory.

It’s no secret that when it comes to the FA Cup, “giant-killing” victories are more exciting to the average viewer, and therefore better for TV ratings. As a result, the tournament organizers are incentivized to create as many “minnow-giant” match-ups as possible. Specifically, this means matching up teams from the top level of the English football league system (more commonly known as the English Premier League, or EPL) with teams from lower levels (2nd Tier = Championship, 3rd Tier = League One, 4th Tier = League Two, 5th Tier = National League, etc.). While match-ups in the first 12 rounds of the tournament are determined using “randomly drawn” balls, it has been shown that such live draw events can be effectively rigged by cooling or freezing certain balls.

This year’s FA Cup Fifth Round Draw provided an interesting case study to test the rigging hypothesis, because out of the 16 teams going into the Fifth Round, 8 of them were from the EPL (Tier 1), while the remaining 8 were all from lower divisions. Coincidentally, the 8 EPL teams just happened to get drawn against the 8 non-EPL teams, conveniently leading to the maximum number of 8 “minnow-giant” match-ups. This result should seem suspicious even if you are not familiar with probability theory, but to illustrate just how unlikely such a result is, I will walk through the math.

In order to calculate the probability of the aforementioned result, we first need to figure out the total number of complete sets of match-ups (i.e. pairings) that can be arranged among a group of 16 teams. As with most problems in mathematics, there is more than one solution, but perhaps the most intuitive one is this: Take one of the 16 teams at random. That first team can be paired up with 15 possible other teams. After a pair is made, 14 teams will remain. Again, we take one of the 14 teams at random. This team can be paired up with 13 possible other teams. By repeating this logic, we see that there are a total of 15x13x11x9x7x5x3x1=2,027,025 unique pairings. It turns out that mathematicians already have a function that expresses this exact product: the double factorial (written as n!!). Therefore, we can generalize that for any group of n objects (with n even), the number of unique pairings is equal to (n-1)!!

To calculate the total number of ways to draw exactly 8 “minnow-giant” match-ups, we can imagine putting all 8 of the EPL teams in a line. Since we are looking to match the EPL teams one-to-one with the non-EPL teams, the question becomes: how many different ways can we line up the non-EPL teams so that they are paired up with the EPL teams? The answer to that is simply 8x7x6x5x4x3x2x1=8!=40,320. It is important to understand why we keep the order of the EPL teams unchanged while we only change the order of the non-EPL teams; otherwise, we would be grossly over-counting!
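Both counts are easy to verify in Python (a quick sketch of my own):

```python
import math

def pairings(n):
    """(n-1)!!: the number of ways to pair off n teams (n even)."""
    total = 1
    for k in range(n - 1, 0, -2):
        total *= k
    return total

print(pairings(16))       # 2,027,025 possible sets of match-ups
print(math.factorial(8))  # 40,320 ways to pair the EPL teams one-to-one with the rest
```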

The probability of drawing exactly 8 “minnow-giant” match-ups is therefore 40,320/2,027,025=1.99%, or just a tad under 2%! To verify this, I ran a Monte Carlo simulation involving 50,000 trials, of which 961 trials ended up with exactly 8 “minnow-giant” match-ups, or 1.92%. The below table and chart also show the theoretical probabilities of drawing “minnow-giant” match-ups, for 0 ≤ n ≤ 8. (Bonus Question: Can you convince yourself why it’s impossible to draw an odd number of “minnow-giant” pairs among a group of 16 teams?)


But wait, it gets even better. Out of the 8 non-EPL teams, 4 teams were from the Championship (2nd Tier league), 2 teams were from League One (3rd Tier), and 2 teams were from the National League (5th Tier). Arsenal, which has been sponsored by Emirates since 2006, ended up drawing Sutton United, one of only two teams (the other being Lincoln City) from the National League (5th Tier). Now, what are the chances that the team that shares a sponsor with the competition itself ends up drawing one of the two easiest (in theory) match-ups available?

The number of ways for Arsenal to draw a National League (5th Tier) team (i.e. either Sutton United or Lincoln City), without any restrictions on how the other match-ups are drawn, is 270,270. We arrive at this number by first assuming Arsenal and Sutton United are already paired off, thus leaving 14 teams remaining. The 14 teams can be paired off in 13!!=135,135 ways without restriction. We can repeat the same reasoning for an Arsenal/Lincoln City pair. Therefore, we double 135,135 to arrive at 270,270. This yields a theoretical probability of 270,270/2,027,025=13.33% (Monte Carlo resulted in 6,620/50,000=13.24%), which is almost 1 in 6. However, this is only the probability of Arsenal drawing a 5th Tier team with no other match-up restrictions. In reality, there were already 8 “minnow-giant” match-ups drawn in the first place.

Therefore, the question becomes: what is the probability that 8 “minnow-giant” match-ups are drawn AND Arsenal draws a 5th Tier team? We already know there are 40,320 possible match-ups for the first part of the requirement. Satisfying both parts of the requirement must result in a number smaller than 40,320. Think of it like this: we start off with the fact that the 8 EPL teams are matched up one-to-one with the 8 non-EPL teams. There are 2 different ways to pair Arsenal with a 5th Tier team (since there are only 2 such teams). Of the remaining teams, there are 7!=5,040 ways to pair them off such that the EPL and non-EPL teams are still matched one-to-one. Therefore, the total number of match-ups satisfying both requirements is 2×7!=10,080. This yields a theoretical probability of 10,080/2,027,025=0.50% (Monte Carlo resulted in 250/50,000=0.50%).
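As a final check, here’s a compact Python Monte Carlo covering all three probabilities at once (my reconstruction of the draw, not the original Excel model):

```python
import random

# 'ARS' = Arsenal, 'EPL' = the other 7 Premier League sides,
# 'NL' = the two National League sides, 'other' = the remaining 6 lower-division teams
teams = ['ARS'] + ['EPL'] * 7 + ['NL'] * 2 + ['other'] * 6

def draw():
    order = random.sample(teams, len(teams))    # shuffle the 16 balls
    pairs = list(zip(order[::2], order[1::2]))  # consecutive teams form a match-up
    is_epl = lambda t: t in ('ARS', 'EPL')
    eight_mg = all(is_epl(a) != is_epl(b) for a, b in pairs)  # all 8 ties are minnow-giant
    ars_nl = any('ARS' in p and 'NL' in p for p in pairs)     # Arsenal draws a 5th-tier side
    return eight_mg, ars_nl

N = 200_000
results = [draw() for _ in range(N)]
print(sum(a for a, _ in results) / N)        # ~0.0199: exactly 8 minnow-giant ties
print(sum(b for _, b in results) / N)        # ~0.1333: Arsenal vs. a National League team
print(sum(a and b for a, b in results) / N)  # ~0.0050: both conditions at once
```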

In conclusion, there was only a 0.50% chance that the 2016/17 Emirates FA Cup Fifth Round Draw would lead to exactly 8 “minnow-giant” match-ups AND Arsenal drawing 1 of the 2 National League (5th Tier) teams. The fact that it happened anyway suggests that the drawing process may not have been 100% random.

As always, you can find my backup here. Please note, however, that I had to change all of the Monte Carlo formulas to values and save the file as .xlsb instead of .xlsx, as the file was far too large before (71 MB).

I would also like to give credit to the Chelsea subreddit for inspiring me to explore this topic.