
A throwback to Part I and why live draws can absolutely be rigged.

When I heard the news that the first day of the Democratic debates on July 30 featured only white candidates and that all of the non-white candidates were scheduled for the second day, I knew something was off (I wasn’t the only one who had suspicions). Admittedly, I hadn’t been following the debates very closely, but my gut told me that even in such a large primary field, there were enough minority candidates that the likelihood of such an outcome happening by pure chance was quite slim.

I decided to get to the bottom of things, despite combinatorics being one of my weakest areas in math growing up. To start, I had to confirm the number of white vs. non-white candidates in the debate field. I quickly found out that there were only 5 non-white candidates: Kamala Harris (black), Cory Booker (black), Julián Castro (Latino), Andrew Yang (Asian), and Tulsi Gabbard (Pacific Islander).

A First Pass

If CNN randomly selected each candidate and their debate day, then we can calculate the total number of ways that 20 candidates can be divided into two groups. Assuming that order matters (i.e. having only white candidates on the first day of debates is different from having only white candidates on the second day), then there are a total of $\binom{20}{10}=184,756$ possible combinations. Out of those, there are $\binom{15}{10} \times \binom{5}{0}=3,003$ ways to choose only white candidates on the first day. Therefore, the probability of featuring only white candidates on the first day is $\frac{3,003}{184,756}=1.63\%$ . Not very likely, eh?
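For anyone who would rather check the arithmetic outside of Excel, here is a minimal Python sketch of the same first-pass count. It assumes, as the paragraph above does, a single purely random draw of 10 first-night candidates out of 20; the variable names are mine.

```python
from math import comb

# Ways to choose which 10 of the 20 candidates debate on the first night.
total = comb(20, 10)

# Ways to fill the first night using only the 15 white candidates
# (and therefore none of the 5 non-white candidates).
all_white = comb(15, 10) * comb(5, 0)

p_first_pass = all_white / total
print(total, all_white, f"{p_first_pass:.2%}")  # 184756 3003 1.63%
```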

CNN, What Were You Thinking?

Interestingly enough, CNN did NOT use a purely random selection process, instead electing to use a somewhat convoluted three-part draw “to ensure support for the candidates [was] evenly spread across both nights.” The 20 candidates were first ordered based on their rankings in the latest public polling, and then divided into three groups: Top 4 (Biden, Harris, Sanders, Warren), Middle 6 (Booker, Buttigieg, Castro, Klobuchar, O’Rourke, Yang), and Bottom 10 (Bennet, Bullock, de Blasio, Delaney, Gabbard, Gillibrand, Hickenlooper, Inslee, Ryan, Williamson).

The 3 Initial Groups and Final Debate Lineups, in Alphabetical Order

“During each draw, cards with a candidate’s name [were] placed into a dedicated box, while a second box [held] cards printed with the date of each night. For each draw, the anchor [retrieved] a name card from the first box and then [matched] it with a date card from the second box.”
CNN

In other words, CNN performed a random selection within each of the three groups, and the three draws were independent events.

A New Methodology

To calculate our desired probability under the actual CNN methodology, we need to figure out the likelihood of having only white candidates on the first day for each of the three groups. We can then multiply these probabilities together since the events are independent. For the Top 4 (where Harris is the only non-white candidate), there are $\binom{4}{2}=6$ total combinations, and $\binom{3}{2} \times \binom{1}{0}=3$ ways to choose only white candidates on the first day. Therefore, the probability of featuring only white candidates on the first day is $\frac{3}{6}=50\%$ .

For the Bottom 10 (where Gabbard is the only non-white candidate), there are $\binom{10}{5}=252$ total combinations, and $\binom{9}{5} \times \binom{1}{0}=126$ ways to choose only white candidates on the first day. Therefore, our desired probability is $\frac{126}{252}=50\%$ .

It should make sense that the probability is 50% for both the Top 4 and Bottom 10, precisely because there is exactly one candidate of color in each group. Think about it for a second: in both scenarios, the non-white candidate either ends up debating on the first day or the second day, hence 50%.

The Middle 6 is where it gets interesting. There are exactly 3 white candidates and 3 non-white candidates. This yields $\binom{6}{3}=20$ total combinations, but only $\binom{3}{3} \times \binom{3}{0}=1$ way to choose only white candidates on the first day, or a probability of just $\frac{1}{20}=5\%$ .

Since the three draws are independent events, we can simply multiply the probabilities to get to our desired answer: $50\% \times 50\% \times 5\% = 1.25\%$ . Even lower than the 1.63% from our first calculation!
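The per-group calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic above: the helper name `p_all_white_first_night` is hypothetical, and it assumes each group is split evenly between the two nights, as in the binomial counts used in this section.

```python
from math import comb

def p_all_white_first_night(group_size, non_white):
    """Chance that a random half of the group contains no non-white candidate."""
    half = group_size // 2
    white = group_size - non_white
    return comb(white, half) / comb(group_size, half)

p_top4 = p_all_white_first_night(4, 1)     # Harris is the only non-white candidate
p_mid6 = p_all_white_first_night(6, 3)     # Booker, Castro, Yang
p_bot10 = p_all_white_first_night(10, 1)   # Gabbard is the only non-white candidate

# The three draws are independent, so the probabilities multiply.
p_scenario1 = p_top4 * p_mid6 * p_bot10
print(f"{p_top4:.0%} x {p_mid6:.0%} x {p_bot10:.0%} = {p_scenario1:.2%}")
```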

One More Twist

Even a casual observer may have noticed that although the first day of debates featured an all-white field, Democratic front-runner Joe Biden was drawn on the second day. This conveniently set up what many media outlets touted as a “rematch” with Senator Kamala Harris, with CNN going so far as comparing the match-up to the “Thrilla in Manila” (I wish I were joking).

The probability of having only white candidates on the first day AND Joe Biden on the second day is $16.67\% \times 50\% \times 5\% = 0.42\%$ . The only difference between this scenario and the previous one is that within the Top 4, there is only one way to draw both Biden and Harris on the second day out of a total of six possible combinations: $\frac{1}{6}=16.67\%$ .
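Because the three groups are small, both scenarios can also be checked by brute force. The sketch below enumerates every possible first-night lineup (6 × 20 × 252 = 30,240 in total) and counts the ones that satisfy each scenario. It assumes every even split within a group is equally likely, which is how the calculations above treat the draw; it is not a model of CNN's physical card procedure.

```python
from fractions import Fraction
from itertools import combinations, product

TOP4 = ["Biden", "Harris", "Sanders", "Warren"]
MID6 = ["Booker", "Buttigieg", "Castro", "Klobuchar", "O'Rourke", "Yang"]
BOT10 = ["Bennet", "Bullock", "de Blasio", "Delaney", "Gabbard",
         "Gillibrand", "Hickenlooper", "Inslee", "Ryan", "Williamson"]
NON_WHITE = {"Harris", "Booker", "Castro", "Yang", "Gabbard"}

def first_night_options(group):
    """Every way to send half of a group to the first night."""
    return list(combinations(group, len(group) // 2))

total = scenario1 = scenario2 = 0
for picks in product(*(first_night_options(g) for g in (TOP4, MID6, BOT10))):
    night1 = set().union(*picks)
    total += 1
    if night1.isdisjoint(NON_WHITE):      # only white candidates on night one
        scenario1 += 1
        if "Biden" not in night1:         # ...and Biden lands on night two
            scenario2 += 1

p1 = Fraction(scenario1, total)
p2 = Fraction(scenario2, total)
print(total, p1, p2)  # 30240 1/80 1/240
```

The exact fractions 1/80 and 1/240 correspond to the 1.25% and 0.42% figures.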

Validating with Monte Carlo

I wasn’t 100% certain about my mathematical calculations at this point, so I decided to verify them using Monte Carlo simulations. Plus, this wouldn’t be a “Fun with Excel” post if we didn’t let Excel do some of the heavy lifting 🙂

I set up a series of random number generators to simulate CNN’s drawing procedure, keeping track of whether Scenario 1 (only white candidates on the first day) or Scenario 2 (only white candidates on the first day AND Joe Biden on the second day) was fulfilled in each case. Excel’s row limit only let me run 45,000 draws simultaneously, which I then repeated 100 times and graphed as box and whisker plots below:

The simulations yielded an average of 1.26% for Scenario 1 and 0.42% for Scenario 2, thus corroborating the previously calculated theoretical probabilities of 1.25% and 0.42%.
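The same kind of simulation is easy to reproduce outside Excel. Below is a minimal Python sketch, not a port of my workbook: the function name `simulate_debate_draw`, the seed, and the trial count are arbitrary choices for illustration, and each group is assumed to be split evenly and uniformly at random.

```python
import random

GROUPS = [
    ["Biden", "Harris", "Sanders", "Warren"],
    ["Booker", "Buttigieg", "Castro", "Klobuchar", "O'Rourke", "Yang"],
    ["Bennet", "Bullock", "de Blasio", "Delaney", "Gabbard",
     "Gillibrand", "Hickenlooper", "Inslee", "Ryan", "Williamson"],
]
NON_WHITE = {"Harris", "Booker", "Castro", "Yang", "Gabbard"}

def simulate_debate_draw(trials, seed=0):
    """Return the observed frequencies of Scenario 1 and Scenario 2."""
    rng = random.Random(seed)
    scenario1 = scenario2 = 0
    for _ in range(trials):
        night1 = set()
        for group in GROUPS:
            # Half of each group is drawn for the first night.
            night1.update(rng.sample(group, len(group) // 2))
        if night1.isdisjoint(NON_WHITE):
            scenario1 += 1
            if "Biden" not in night1:
                scenario2 += 1
    return scenario1 / trials, scenario2 / trials

freq1, freq2 = simulate_debate_draw(100_000)
print(f"Scenario 1: {freq1:.2%}, Scenario 2: {freq2:.2%}")
```

With enough trials the two frequencies should settle near 1.25% and 0.42%; the exact values depend on the seed.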

Accurate Portrayal of My Reaction Whenever One of My Crazy Excel Experiments Ends up Actually Working

Concluding Thoughts

Numbers don’t lie, and they lead me to conclude that the CNN Democratic Debate Draw was not truly random. The million dollar question, of course, is why? What does CNN gain from having only white candidates on the first day and Joe Biden on the second day (along with all the minority candidates)? As I don’t intend for my blog to be an outlet for my personal political views, I’ll leave out any “conspiracy” theories and leave them as an exercise for you, the reader.

As always, you can find my work here.

This is Post #20 of the “Fun with Excel” series. For more content like this, please click here.

The Fifth Round Draw of the 2016/17 Emirates FA Cup was rigged.

Bold statement (literally), although that sentence probably meant nothing to anyone who doesn’t follow English Football (i.e. soccer) and the FA Cup in particular.

A quick introduction to the FA Cup competition, courtesy of Wikipedia (emphasis mine):

The FA Cup, known officially as The Football Association Challenge Cup, is an annual knockout association football competition in men’s domestic English football. First played during the 1871–72 season, it is the oldest association football competition in the world. For sponsorship reasons, from 2015 through to 2018 it is also known as The Emirates FA Cup.

The competition is open to any eligible club down to Level 10 of the English football league system – all 92 professional clubs in the Premier League and the English Football League (Levels 1 to 4), and several hundred “non-league” teams in Steps 1 to 6 of the National League System (Levels 5 to 10). The tournament consists of 12 randomly drawn rounds followed by the semi-finals and the final. Entrants are not seeded, although a system of byes based on league level ensures higher ranked teams enter in later rounds – the minimum number of games needed to win the competition ranges from six to fourteen.

In the modern era, only one non-league team has ever reached the quarter finals, and teams below Level 2 have never reached the final. As a result, as well as who wins, significant focus is given to those “minnows” (smaller teams) who progress furthest, especially if they achieve an unlikely “giant-killing” victory.

It’s no secret that when it comes to the FA Cup, “giant-killing” victories are more exciting to the average viewer, and therefore better for TV ratings. The tournament organizers are thus incentivized to create as many “minnow-giant” match-ups as possible. Specifically, this means matching up teams from the top level of the English football league system (more commonly known as the English Premier League, or EPL) with teams from lower levels (2nd Tier = Championship, 3rd Tier = League One, 4th Tier = League Two, 5th Tier = National League, etc.). While match-ups in the first 12 rounds of the tournament are determined using “randomly drawn” balls, it has been shown that such live draw events can be effectively rigged by cooling or freezing certain balls.

This year’s FA Cup Fifth Round Draw provided an interesting case study to test the rigging hypothesis, because out of the 16 teams going into the Fifth Round, 8 of them were from the EPL (Tier 1), while the remaining 8 were all from lower divisions. Coincidentally, the 8 EPL teams just happened to get drawn against the 8 non-EPL teams, conveniently leading to the maximum number of 8 “minnow-giant” match-ups. This result should seem suspicious even if you are not familiar with probability theory, but to illustrate just how unlikely such a result is, I will walk through the math.

In order to calculate the probability of the aforementioned result, we first need to figure out the total number of ways that a group of 16 teams can be split into match-ups (i.e. pairs). As with most problems in mathematics, there is more than one solution, but perhaps the most intuitive one is this: Take one of the 16 teams at random. That first team can be paired up with 15 possible other teams. After a pair is made, 14 teams will remain. Again, we take one of the 14 teams at random. This team can be paired up with 13 possible other teams. By repeating this logic, we see that there are a total of 15x13x11x9x7x5x3x1=2,027,025 unique pairings. It turns out that mathematicians already have a function that simplifies this exact result: the double factorial (expressed as n!!). Therefore, we can generalize that for any group of n objects (with n even), the number of unique pairings is equal to (n-1)!!
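Here is a small Python sketch of that counting argument. The helper names `double_factorial` and `count_pairings` are hypothetical; the second one follows the "pair the first team with each remaining team" logic literally, so it is only practical for small groups and serves as a sanity check on the formula.

```python
def double_factorial(n):
    """n!! = n * (n-2) * (n-4) * ... down to 1 or 2 (defined as 1 for n <= 1)."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def count_pairings(teams):
    """Count pairings by matching the first team with each other team in turn."""
    if not teams:
        return 1
    rest = teams[1:]
    return sum(count_pairings(rest[:i] + rest[i + 1:]) for i in range(len(rest)))

# The brute-force count agrees with (n-1)!! for small even n.
for n in (2, 4, 6, 8):
    assert count_pairings(tuple(range(n))) == double_factorial(n - 1)

print(double_factorial(15))  # 2027025 pairings for 16 teams
```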

To calculate the total number of ways to draw exactly 8 “minnow-giant” match-ups, we can imagine putting all 8 of the EPL teams in a line. Since we are looking to match the EPL teams one-to-one with the non-EPL teams, the question becomes: how many different ways can we line up the non-EPL teams so that they are paired up with the EPL teams? The answer to that is simply 8x7x6x5x4x3x2x1=8!=40,320. It is important to understand why we keep the order of the EPL teams unchanged while we only change the order of the non-EPL teams; otherwise, we would be grossly over-counting!

The probability of drawing exactly 8 “minnow-giant” match-ups is therefore 40,320/2,027,025=1.99%, or just a tad under 2%! To verify this, I ran a Monte Carlo simulation involving 50,000 trials, of which 961 trials ended up with exactly 8 “minnow-giant” match-ups, or 1.92%. The below table and chart also show the theoretical probabilities of drawing n “minnow-giant” match-ups, for 0 ≤ n ≤ 8. (Bonus Question: Can you convince yourself why it’s impossible to draw an odd number of “minnow-giant” pairs among a group of 16 teams?)
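The theoretical distribution can be reproduced with a short Python sketch. The counting formula in `pairings_with_k_mixed` is my own framing rather than something taken from the spreadsheet: choose which k EPL teams and which k non-EPL teams end up in mixed pairs, match them in k! ways, then pair off the leftover teams within each side. All names here are hypothetical.

```python
from math import comb, factorial

def double_factorial(n):
    """n!! with the convention that n!! = 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def pairings_with_k_mixed(k, per_side=8):
    """Number of full draws containing exactly k EPL vs. non-EPL pairs."""
    leftover = per_side - k
    if leftover % 2:
        # An odd number of leftover teams on a side cannot be paired among themselves.
        return 0
    choose_teams = comb(per_side, k) ** 2            # which teams are in mixed pairs
    match_mixed = factorial(k)                       # how those teams are matched
    pair_rest = double_factorial(leftover - 1) ** 2  # same-side pairings of the rest
    return choose_teams * match_mixed * pair_rest

total = double_factorial(15)
counts = {k: pairings_with_k_mixed(k) for k in range(9)}
for k, count in counts.items():
    print(f"{k} mixed pairs: {count:>9,} draws ({count / total:.2%})")
```

As a consistency check, the nine counts add up to 2,027,025, the total number of pairings.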

But wait, it gets even better. Out of the 8 non-EPL teams, 4 teams were from the Championship (2nd Tier league), 2 teams were from League One (3rd Tier), and 2 teams were from the National League (5th Tier). Arsenal, which has been sponsored by Emirates since 2006, ended up drawing Sutton United, one of only two teams (the other being Lincoln City) from the National League (5th Tier). Now, what are the chances that the team that shares a sponsor with the competition itself ends up drawing one of the two easiest (in theory) match-ups available?

The number of ways for Arsenal to draw a National League (5th Tier) team (i.e. either Sutton United or Lincoln City), without any restrictions on how the other match-ups are drawn, is 270,270. We arrive at this number by first assuming Arsenal and Sutton United are already paired off, thus leaving 14 teams remaining. The 14 teams can be paired off in 13!!=135,135 ways without restriction. We can repeat the same reasoning for an Arsenal/Lincoln City pair. Therefore, we double 135,135 to arrive at 270,270. This yields a theoretical probability of 270,270/2,027,025=13.33% (Monte Carlo resulted in 6,620/50,000=13.24%), which is exactly 2 in 15. This makes sense, since Arsenal has 15 possible opponents and 2 of them are 5th Tier teams. However, this is only the probability of Arsenal drawing a 5th Tier team with no other match-up restrictions. In reality, there were already 8 “minnow-giant” match-ups drawn in the first place.
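The unrestricted Arsenal calculation takes only a few lines of Python. This is a sketch of the arithmetic in the paragraph above, with hypothetical variable names.

```python
from fractions import Fraction

def double_factorial(n):
    """n!! with the convention that n!! = 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

total_pairings = double_factorial(15)   # all ways to pair 16 teams
per_opponent = double_factorial(13)     # pairings of the other 14 teams
arsenal_vs_tier5 = 2 * per_opponent     # Sutton United or Lincoln City

p_arsenal_tier5 = Fraction(arsenal_vs_tier5, total_pairings)
print(arsenal_vs_tier5, p_arsenal_tier5, f"{float(p_arsenal_tier5):.2%}")
# 270270 2/15 13.33%
```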

Therefore, the question becomes: what is the probability that 8 “minnow-giant” match-ups are drawn AND Arsenal draws a 5th Tier team? We already know there are 40,320 possible draws that satisfy the first part of the requirement. Satisfying both parts of the requirement must result in a number smaller than 40,320. Think of it like this: we start off with the fact that the 8 EPL teams are matched up one-to-one with the 8 non-EPL teams. There are 2 different ways to pair Arsenal with a 5th Tier team (since there are only 2 such teams). Of the remaining teams, there are 7!=5,040 ways to pair them off such that the EPL and non-EPL teams are still matched one-to-one. Therefore, the total number of draws satisfying both requirements is 2×7!=10,080. This yields a theoretical probability of 10,080/2,027,025=0.50% (Monte Carlo resulted in 250/50,000=0.50%).
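For anyone who wants to rerun the experiment without the 71 MB workbook, here is a minimal Python sketch of a Monte Carlo draw. It is not the Excel model: it assumes a uniformly random draw (shuffle the 16 teams and pair them off two at a time), and apart from Arsenal, Sutton United and Lincoln City the team labels are placeholders. The function name, seed and trial count are arbitrary.

```python
import random

EPL = ["Arsenal"] + [f"EPL_{i}" for i in range(1, 8)]       # placeholder labels
TIER5 = ["Sutton United", "Lincoln City"]
NON_EPL = TIER5 + [f"Lower_{i}" for i in range(1, 7)]        # placeholder labels

def simulate_fa_cup_draw(trials, seed=0):
    """Return frequencies of: 8 mixed pairs, Arsenal vs. Tier 5, and both."""
    rng = random.Random(seed)
    teams = EPL + NON_EPL
    epl, tier5 = set(EPL), set(TIER5)
    all_mixed = arsenal_t5 = both = 0
    for _ in range(trials):
        rng.shuffle(teams)
        mixed = True
        opponent = None
        for i in range(0, 16, 2):           # consecutive teams form a match-up
            a, b = teams[i], teams[i + 1]
            if (a in epl) == (b in epl):    # both EPL or both non-EPL
                mixed = False
            if a == "Arsenal":
                opponent = b
            elif b == "Arsenal":
                opponent = a
        got_t5 = opponent in tier5
        all_mixed += mixed
        arsenal_t5 += got_t5
        both += mixed and got_t5
    return all_mixed / trials, arsenal_t5 / trials, both / trials

f_mixed, f_t5, f_both = simulate_fa_cup_draw(50_000)
print(f"8 mixed: {f_mixed:.2%}, Arsenal vs Tier 5: {f_t5:.2%}, both: {f_both:.2%}")
```

The three frequencies should land close to the theoretical 1.99%, 13.33% and 0.50%, with the usual sampling noise.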

In conclusion, there was only a 0.50% chance that the 2016/17 Emirates FA Cup Fifth Round Draw would lead to exactly 8 “minnow-giant” match-ups AND Arsenal drawing 1 of the 2 National League (5th Tier) teams. The fact that it happened anyway suggests that the drawing process may not have been 100% random.

As always, you can find my back up here. Please note, however, that I had to change all of the Monte Carlo formulas to values and save the file as .xlsb instead of .xlsx, as the file was way too large before (71 MB).

I would also like to give credit to the Chelsea subreddit for inspiring me to explore this topic.

Jeffrey Fan

Random Musings of an Amateur Data Scientist

Rigging Live Draws Part II: The CNN Democratic Debate Draw