Rigging Live Draws Part II: The CNN Democratic Debate Draw

A throwback to Part I and why live draws can absolutely be rigged.

When I heard the news that the first day of the Democratic debates on July 30 featured only white candidates and that all of the non-white candidates were scheduled for the second day, I knew something was off (I wasn’t the only one who had suspicions). Admittedly, I hadn’t been following the debates very closely, but my gut told me that even in such a large primary field, there were enough minority candidates that the likelihood of such an outcome happening by pure chance was quite slim.

I decided to get to the bottom of things, despite combinatorics being one of my weakest areas in math growing up. To start, I had to confirm the number of white vs. non-white candidates in the debate field. I quickly found out that there were only 5 non-white candidates: Kamala Harris (black), Cory Booker (black), Julián Castro (Latino), Andrew Yang (Asian), and Tulsi Gabbard (Pacific Islander).

A First Pass

If CNN had randomly assigned each candidate to a debate day, we could calculate the total number of ways that 20 candidates can be divided into two groups of 10. Since the two days are distinguishable (i.e. having only white candidates on the first day of debates is different from having only white candidates on the second day), there are a total of \binom{20}{10}=184,756 possible combinations. Out of those, there are \binom{15}{10} \times \binom{5}{0}=3,003 ways to choose only white candidates on the first day. Therefore, the probability of featuring only white candidates on the first day is \frac{3,003}{184,756}=1.63\%. Not very likely, eh?
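For anyone who wants to double-check the arithmetic, here is a minimal Python sketch of the same calculation (the heavy lifting in this post was done in Excel, so consider this purely a sanity check):

```python
from math import comb

total = comb(20, 10)       # ways to choose the 10 candidates for the first day: 184,756
all_white = comb(15, 10)   # ways to choose 10 of the 15 white candidates: 3,003
print(all_white / total)   # 0.01625 -> ~1.63%
```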

CNN, What Were You Thinking?

Interestingly enough, CNN did NOT use a purely random selection process, instead electing to use a somewhat convoluted three-part draw “to ensure support for the candidates [was] evenly spread across both nights.” The 20 candidates were first ordered based on their rankings in the latest public polling, and then divided into three groups: Top 4 (Biden, Harris, Sanders, Warren), Middle 6 (Booker, Buttigieg, Castro, Klobuchar, O’Rourke, Yang), and Bottom 10 (Bennet, Bullock, de Blasio, Delaney, Gabbard, Gillibrand, Hickenlooper, Inslee, Ryan, Williamson).

The 3 Initial Groups and Final Debate Lineups, in Alphabetical Order

“During each draw, cards with a candidate’s name [were] placed into a dedicated box, while a second box [held] cards printed with the date of each night. For each draw, the anchor [retrieved] a name card from the first box and then [matched] it with a date card from the second box.”

CNN

In other words, CNN performed a random selection within each of the three groups, and the three draws were independent events.

A New Methodology

To calculate our desired probability under the actual CNN methodology, we need to figure out the likelihood of having only white candidates on the first day for each of the three groups. We can then multiply these probabilities together since the events are independent. For the Top 4 (where Harris is the only non-white candidate), there are \binom{4}{2}=6 total combinations, and \binom{3}{2} \times \binom{1}{0}=3 ways to choose only white candidates on the first day. Therefore, the probability of featuring only white candidates on the first day is \frac{3}{6}=50\%.

For the Bottom 10 (where Gabbard is the only non-white candidate), there are \binom{10}{5}=252 total combinations, and \binom{9}{5} \times \binom{1}{0}=126 ways to choose only white candidates on the first day. Therefore, our desired probability is \frac{126}{252}=50\%.

It should make sense that the probability is 50% for both the Top 4 and Bottom 10, precisely because there is exactly one candidate of color in each group. Think about it for a second: in both scenarios, the non-white candidate either ends up debating on the first day or the second day, hence 50%.

The Middle 6 is where it gets interesting. There are exactly 3 white candidates and 3 non-white candidates. This yields \binom{6}{3}=20 total combinations, but only \binom{3}{3} \times \binom{3}{0}=1 way to choose only white candidates on the first day, or a probability of just \frac{1}{20}=5\%.

Since the three draws are independent events, we can simply multiply the probabilities to get to our desired answer: 50\% \times 50\% \times 5\% = 1.25\%. Even lower than the 1.63% from our first calculation!
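The same calculation, expressed as a short Python sketch (group sizes and non-white counts are from CNN's draw as described above):

```python
from math import comb

def p_all_white(group_size, non_white, drawn_first_day):
    """Probability that a group's first-day draw contains no non-white candidates."""
    white = group_size - non_white
    return comb(white, drawn_first_day) / comb(group_size, drawn_first_day)

p_top    = p_all_white(4, 1, 2)     # 0.50
p_middle = p_all_white(6, 3, 3)     # 0.05
p_bottom = p_all_white(10, 1, 5)    # 0.50
print(p_top * p_middle * p_bottom)  # 0.0125 -> 1.25%
```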

One More Twist

Even a casual observer may have noticed that although the first day of debates featured an all-white field, Democratic front-runner Joe Biden was drawn on the second day. This conveniently set up what many media outlets touted as a “rematch” with Senator Kamala Harris, with CNN going so far as to compare the match-up to the “Thrilla in Manila” (I wish I were joking).

The probability of having only white candidates on the first day AND Joe Biden on the second day is 16.67\% \times 50\% \times 5\% = 0.42\%. The only difference between this scenario and the previous one is that within the Top 4, there is only one way to draw both Biden and Harris on the second day out of a total of six possible combinations: \frac{1}{6}=16.67\%.

Validating with Monte Carlo

I wasn’t 100% certain about my mathematical calculations at this point, so I decided to verify them using Monte Carlo simulations. Plus, this wouldn’t be a “Fun with Excel” post if we didn’t let Excel do some of the heavy lifting 🙂

I set up a series of random number generators to simulate CNN’s drawing procedure, keeping track of whether Scenario 1 (only white candidates on the first day) or Scenario 2 (only white candidates on the first day AND Joe Biden on the second day) was fulfilled in each case. Excel could only handle 45,000 draws at a time, so I repeated the exercise 100 times and graphed the results as box-and-whisker plots below:

Scenario 1: Min 1.14%, Max 1.36%, Average 1.26%
Scenario 2: Min 0.34%, Max 0.52%, Average 0.42%

The simulations yielded an average of 1.26% for Scenario 1 and 0.42% for Scenario 2, thus corroborating the previously calculated theoretical probabilities of 1.25% and 0.42%.
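For readers who prefer code to spreadsheets, here is a rough Python equivalent of the simulation (the candidate groupings are CNN's; the trial count is arbitrary):

```python
import random

TRIALS = 100_000
top    = ["Biden", "Harris", "Sanders", "Warren"]
middle = ["Booker", "Buttigieg", "Castro", "Klobuchar", "O'Rourke", "Yang"]
bottom = ["Bennet", "Bullock", "de Blasio", "Delaney", "Gabbard",
          "Gillibrand", "Hickenlooper", "Inslee", "Ryan", "Williamson"]
NON_WHITE = {"Harris", "Booker", "Castro", "Yang", "Gabbard"}

scenario1 = scenario2 = 0
for _ in range(TRIALS):
    # Independent draws within each group, mirroring CNN's three-part procedure
    first_day = (random.sample(top, 2) + random.sample(middle, 3)
                 + random.sample(bottom, 5))
    if not NON_WHITE.intersection(first_day):
        scenario1 += 1                # only white candidates on the first day
        if "Biden" not in first_day:
            scenario2 += 1            # ...AND Biden on the second day
print(scenario1 / TRIALS, scenario2 / TRIALS)  # ~0.0125 and ~0.0042
```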

Accurate Portrayal of My Reaction Whenever One of My Crazy Excel Experiments Ends up Actually Working

Concluding Thoughts

Numbers don’t lie, and they lead me to conclude that the CNN Democratic Debate Draw was not truly random. The million-dollar question, of course, is why? What does CNN gain from having only white candidates on the first day and Joe Biden on the second day (along with all the minority candidates)? As I don’t intend for my blog to be an outlet for my personal political views, I’ll omit any “conspiracy” theories and leave them as an exercise for you, the reader.

As always, you can find my work here.

This is Post #20 of the “Fun with Excel” series. For more content like this, please click here.

Fun with Excel #18 – The Birthday Problem

Meeting someone with the same birthday as you always seems like a happy coincidence. After all, with 365 (366 including February 29th) unique birthdays, the chances of any two people being born on the same day appear to be small. While this is indeed true for any two individuals picked at random, what happens when we add a third, a fourth, or a fifth person into the fray? At what point does it become inevitable that some pair of people will share a birthday?

Of course, the only way to guarantee a shared birthday among a group of people is to have at least 367 people in a room. Take a moment to think about that statement, and if you’re still stumped, read this. Now that we know 100% probability is reached with 367 people, how many people would it take to reach 50%, 75%, or 90% probability? If you think the answer is 184, 275, and 330, then you would be quite wrong. Here’s why:

Let’s assume that all birthdays are equally likely to occur in a given population and that leap years are ignored. To paint a more vivid picture in our minds, let’s further assume that we have a large room and that people are entering the room one at a time while announcing their birthdays for everyone to hear. The first person enters the room and announces that his/her birthday is January 1st (we can choose any date we want without loss of generality). The second person has a 364/365 probability of having a different birthday from the first person and therefore a 1 - 364/365 probability of having the same birthday. The first three people have a (364/365) \times (363/365) probability of all having different birthdays, and therefore a 1 - (364/365) \times (363/365) probability of including at least one shared birthday. Likewise, the first four people have a (364/365) \times (363/365) \times (362/365) probability of all having different birthdays, and therefore a 1 - (364/365) \times (363/365) \times (362/365) probability of including at least one shared birthday. To generalize, the probability that at least one pair among n people shares a birthday is:

P(n) = 1- \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \cdots \times \frac{365-n+1}{365}
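Here is a short Python sketch of this formula, which also confirms the head-counts quoted below:

```python
def p_shared_birthday(n):
    """Probability that at least two of n people share a birthday (uniform, no leap years)."""
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (365 - k) / 365
    return 1 - p_all_distinct

for target in (0.50, 0.75, 0.90):
    n = next(n for n in range(1, 366) if p_shared_birthday(n) >= target)
    print(f"{target:.0%} reached at {n} people")  # 23, 32, and 41
```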

Note that the yellow series in the above graph grows much faster than linearly, with the probability reaching 50% at just 23 people. 75% and 90% probability are reached at 32 and 41 people, respectively. By the time 70 people are in the room, there is a greater than 99.9% chance that two individuals will have the same birthday!

As the number of people increases further, the growth of P(n) tapers off, with each additional person providing less incremental probability than the one before. Interestingly, the 20th person provides the greatest incremental probability, as seen in the above table.

In contrast, the probability that at least one of n people has a specific birthday (say, yours) is given by the much simpler equation:

P_1(n) = 1 - \left( \frac{364}{365} \right)^n

This relationship, which is highlighted by the green series in the graph, grows at a much slower rate than the yellow series. In comparison, it takes 253 people for P_1(n) to exceed 50%.
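A two-line Python check of this second formula, under the same assumptions:

```python
p1 = lambda n: 1 - (364 / 365) ** n                     # probability of matching a specific date
print(next(n for n in range(1, 1000) if p1(n) > 0.5))   # 253
```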

Testing Our Assumptions

One key assumption we made in the above exercise was that all birthdays (aside from February 29th) occur with equal probability. But how correct is that assumption? Luckily, Roy Murphy has run the analysis based on birthdays retrieved from over 480,000 life insurance applications. I won’t repeat verbatim the contents of his short and excellent article, but I did re-create some charts showing the expected and actual distribution of birthdays. The bottom line is that the actual data show more variation (including very apparent seasonal variation by month) than what is expected through chance.

Implications for Birthday Matching

Now that we know that birthdays in reality are unevenly distributed, it follows that matches should occur more frequently than we would expect under the uniform assumption. To test this hypothesis, I ran two Monte Carlo simulations with 1,000 trials each to find the minimum number of people required for a birthday match: the first based on expected probabilities (each birthday with equal likelihood of 1/365.25 and February 29th with likelihood of 1/1461) and the second based on actual probabilities (sourced from the Murphy data set).
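Here is a minimal Python sketch of the first simulation (the expected-probabilities case); re-running it with Murphy's observed frequencies as the weights would give the second:

```python
import random

DAYS = list(range(366))        # index 365 represents February 29th
WEIGHTS = [4] * 365 + [1]      # regular days: 4/1461 = 1/365.25; Feb 29th: 1/1461

def people_until_match():
    """Add people one at a time; return the head-count when a birthday first repeats."""
    seen, count = set(), 0
    while True:
        count += 1
        day = random.choices(DAYS, weights=WEIGHTS)[0]
        if day in seen:
            return count
        seen.add(day)

trials = [people_until_match() for _ in range(1000)]
print(sum(trials) / len(trials))  # ~25 on average
```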

Note that the distributions of both simulations are skewed positively (i.e. to the right). The results appear to corroborate our hypothesis, as evidenced by the gray line growing at a faster rate than the yellow line in the above graph. Indeed, the average number of people required for a birthday match is 24.83 under the simulation using actual probabilities, slightly lower than the 25.06 using expected probabilities. However, the difference is not very significant; therefore, our assumption of uniformly distributed birthdays works just fine.

As always, you can find my work here.

Fun with Excel #16 – Rigging Live Draws: The Emirates FA Cup

The Fifth Round Draw of the 2016/17 Emirates FA Cup was rigged.

Bold statement (literally), although that sentence probably meant nothing to anyone who doesn’t follow English football (read: soccer) and the FA Cup in particular.

A quick introduction to the FA Cup competition, courtesy of Wikipedia (emphasis mine):

The FA Cup, known officially as The Football Association Challenge Cup, is an annual knockout association football competition in men’s domestic English football. First played during the 1871–72 season, it is the oldest association football competition in the world. For sponsorship reasons, from 2015 through to 2018 it is also known as The Emirates FA Cup.

The competition is open to any eligible club down to Level 10 of the English football league system – all 92 professional clubs in the Premier League and the English Football League (Levels 1 to 4), and several hundred “non-league” teams in Steps 1 to 6 of the National League System (Levels 5 to 10). The tournament consists of 12 randomly drawn rounds followed by the semi-finals and the final. Entrants are not seeded, although a system of byes based on league level ensures higher ranked teams enter in later rounds – the minimum number of games needed to win the competition ranges from six to fourteen.

In the modern era, only one non-league team has ever reached the quarter finals, and teams below Level 2 have never reached the final. As a result, as well as who wins, significant focus is given to those “minnows” (smaller teams) who progress furthest, especially if they achieve an unlikely “giant-killing” victory.

It’s no secret that when it comes to the FA Cup, “giant-killing” victories are more exciting to the average viewer, and therefore better for TV ratings, so the tournament organizers are incentivized to create as many “minnow-giant” match-ups as possible. Specifically, this means matching up teams from the top level of the English football league system (more commonly known as the English Premier League, or EPL) with teams from lower levels (2nd Tier = Championship, 3rd Tier = League One, 4th Tier = League Two, 5th Tier = National League, etc.). While match-ups in the first 12 rounds of the tournament are determined using “randomly drawn” balls, it has been shown that such live draw events can be effectively rigged by cooling or freezing certain balls.

This year’s FA Cup Fifth Round Draw provided an interesting case study to test the rigging hypothesis, because out of the 16 teams going into the Fifth Round, 8 of them were from the EPL (Tier 1), while the remaining 8 were all from lower divisions. Coincidentally, the 8 EPL teams just happened to get drawn against the 8 non-EPL teams, conveniently leading to the maximum number of 8 “minnow-giant” match-ups. This result should seem suspicious even if you are not familiar with probability theory, but to illustrate just how unlikely such a result is, I will walk through the math.

In order to calculate the probability of the aforementioned result, we first need to figure out the total number of ways that a group of 16 teams can be paired off. As with most problems in mathematics, there is more than one solution, but perhaps the most intuitive one is this: Take one of the 16 teams at random. That first team can be paired up with 15 possible other teams. After a pair is made, 14 teams will remain. Again, we take one of the 14 teams at random. This team can be paired up with 13 possible other teams. By repeating this logic, we see that there are a total of 15x13x11x9x7x5x3x1=2,027,025 unique pairings. It turns out that mathematicians already have a function that simplifies this exact result: the double factorial (expressed as n!!). Therefore, we can generalize that for any group of n objects (where n is even), the number of unique pairings is equal to (n-1)!!
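Here is the double factorial as a quick Python sketch, confirming the count above:

```python
def pairings(n):
    """Number of ways to split n teams (n even) into unordered pairs: (n-1)!!"""
    result = 1
    for k in range(n - 1, 0, -2):  # (n-1) x (n-3) x ... x 3 x 1
        result *= k
    return result

print(pairings(16))  # 2,027,025
```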

To calculate the total number of ways to draw exactly 8 “minnow-giant” match-ups, we can imagine putting all 8 of the EPL teams in a line. Since we are looking to match the EPL teams one-to-one with the non-EPL teams, the question becomes: how many different ways can we line up the non-EPL teams so that they are paired up with the EPL teams? The answer to that is simply 8x7x6x5x4x3x2x1=8!=40,320. It is important to understand why we keep the order of the EPL teams unchanged while we only change the order of the non-EPL teams; otherwise, we would be grossly over-counting!

The probability of drawing exactly 8 “minnow-giant” match-ups is therefore 40,320/2,027,025=1.99%, or just a tad under 2%! To verify this, I ran a Monte Carlo simulation involving 50,000 trials, of which 961 trials ended up with exactly 8 “minnow-giant” match-ups, or 1.92%. The below table and chart also show the theoretical probabilities of drawing “minnow-giant” match-ups, for 0 ≤ n ≤ 8. (Bonus Question: Can you convince yourself why it’s impossible to draw an odd number of “minnow-giant” pairs among a group of 16 teams?)
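Here is a rough Python version of that Monte Carlo experiment (the original was built in Excel; tier labels stand in for the actual clubs):

```python
import random

TRIALS = 50_000
teams = ["EPL"] * 8 + ["non-EPL"] * 8

def minnow_giant_count(draw):
    """Pair off adjacent entries of a shuffled list; count mixed (EPL vs. non-EPL) pairs."""
    return sum(draw[i] != draw[i + 1] for i in range(0, 16, 2))

hits = 0
for _ in range(TRIALS):
    random.shuffle(teams)  # shuffling then pairing adjacent entries is a uniform random pairing
    if minnow_giant_count(teams) == 8:
        hits += 1
print(hits / TRIALS)  # ~0.0199
```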


But wait, it gets even better. Out of the 8 non-EPL teams, 4 teams were from the Championship (2nd Tier league), 2 teams were from League One (3rd Tier), and 2 teams were from the National League (5th Tier). Arsenal, which has been sponsored by Emirates since 2006, ended up drawing Sutton United, one of only two teams (the other being Lincoln City) from the National League (5th Tier). Now, what are the chances that the team that shares a sponsor with the competition itself ends up drawing one of the two easiest (in theory) match-ups available?

The number of ways for Arsenal to draw a National League (5th Tier) team (i.e. either Sutton United or Lincoln City), without any restrictions on how the other match-ups are drawn, is 270,270. We arrive at this number by first assuming Arsenal and Sutton United are already paired off, thus leaving 14 teams remaining. The 14 teams can be paired off in 13!!=135,135 ways without restriction. We can repeat the same reasoning for an Arsenal/Lincoln City pair. Therefore, we double 135,135 to arrive at 270,270. This yields a theoretical probability of 270,270/2,027,025=13.33% (Monte Carlo resulted in 6,620/50,000=13.24%), which is almost 1 in 6. However, this is only the probability of Arsenal drawing a 5th Tier team with no other match-up restrictions. In reality, there were already 8 “minnow-giant” match-ups drawn in the first place.

Therefore, the question becomes: what is the probability that 8 “minnow-giant” match-ups are drawn AND Arsenal draws a 5th Tier team? We already know there are 40,320 possible match-ups for the first part of the requirement. Satisfying both parts of the requirement must result in a number smaller than 40,320. Think of it like this: we start off with the fact that the 8 EPL teams are matched up one-to-one with the 8 non-EPL teams. There are 2 different ways to pair Arsenal with a 5th Tier team (since there are only 2 such teams). Of the remaining teams, there are 7!=5,040 ways to pair them off such that the EPL and non-EPL teams are still matched one-to-one. Therefore, the total number of match-ups satisfying both requirements is 2×7!=10,080. This yields a theoretical probability of 10,080/2,027,025=0.50% (Monte Carlo resulted in 250/50,000=0.50%).
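And a sketch of the combined scenario; every club except Arsenal, Sutton United, and Lincoln City is replaced by a placeholder name:

```python
import random

TRIALS = 50_000
epl     = ["Arsenal"] + [f"EPL-{i}" for i in range(7)]              # placeholder names
non_epl = ["Sutton", "Lincoln"] + [f"Lower-{i}" for i in range(6)]  # placeholder names

both = 0
for _ in range(TRIALS):
    draw = epl + non_epl
    random.shuffle(draw)
    pairs = [set(draw[i:i + 2]) for i in range(0, 16, 2)]
    all_mixed = all(len(p & set(epl)) == 1 for p in pairs)  # 8 "minnow-giant" match-ups
    arsenal_v_tier5 = any("Arsenal" in p and p & {"Sutton", "Lincoln"} for p in pairs)
    if all_mixed and arsenal_v_tier5:
        both += 1
print(both / TRIALS)  # ~0.0050, matching 10,080 / 2,027,025
```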

In conclusion, there was only a 0.50% chance that the 2016/17 Emirates FA Cup Fifth Round Draw would lead to exactly 8 “minnow-giant” match-ups AND Arsenal drawing 1 of the 2 National League (5th Tier) teams. The fact that it happened anyway suggests that the drawing process may not have been 100% random.

As always, you can find my backup here. Please note, however, that I had to change all of the Monte Carlo formulas to values and save the file as .xlsb instead of .xlsx, as the file was way too large before (71 MB).

I would also like to give credit to the Chelsea subreddit for inspiring me to explore this topic.

Fun with Excel #14 – Fantasy Football Match-ups Matter!

The vast majority of Fantasy Football leagues use head-to-head (H2H) scoring, where a (random) schedule is determined at the beginning of the season and mirrors how the sport is played in real life. During the Regular Season (typically Weeks 1-13), each team will usually play every other team at least once, with each team’s win-loss record determining its Playoff seeding. Therefore, the objective for each team is to simply maximize its points total every week. While this is a straightforward task, the variability of individual player scoring can get frustrating, especially when coupled with the randomness of H2H match-ups. Everyone has had that one week where they score 150 points, only for their opponent to somehow put up 160. Other times, it feels like the Fantasy Gods are conspiring against you as you finish the season top 3 in Points For, but first in Points Against by a long shot.

So how much additional variability does H2H scheduling really introduce, and are there more equitable scoring formats? To explore this, I looked no further than my previous Fantasy season (shameless plug about winning the Championship in my first year here).

To start off, here are the 14 teams from my league last year and their scoring through the Regular Season (first 13 weeks, ESPN PPR scoring):

[Chart: weekly scores for all 14 teams over the 13-week Regular Season]

Here is the corresponding standings table (yes, there was a tie game…our league has since switched to fractional scoring):

[Table: Regular Season standings]

The correlation between final ranking (i.e. Playoff seeding) and total points scored over the Regular Season was 87.6% (technically negative 87.6%, since smaller numbers correspond to higher/better rankings). To further complicate things, my league was structured with two divisions (2D) last season (Division A and Division B with 7 teams each). So rather than teams being ranked solely based on win-loss record, the top team (by record) from Division A and the top team from Division B were awarded the #1 and #2 seeds, while the top 3 teams from each division were given the top 6 seeds (and the only spots in the Playoffs). This resulted in Team 10 (9-4) finishing as the #2 team despite having a worse Regular Season record than Team 11 (10-3), which finished #3. As you can see in the table below, switching to a more standard single division (1D) scoring format led to the swap of Teams 10 and 11, which is arguably more “fair.” The correlation between the rankings under the 1D case and total points scored was 86.1%, very similar to that of the 2D case.

[Table: rankings under the 2D, 1D, and PF scoring formats]

The third scoring format I explored was simply total Points For (PF). As you can see in the table above, this led to a fairly decent shakeup in the overall rankings. Under the PF scoring format, Teams 1, 8, and 11 would have ranked 3 places lower than under the 1D scoring format, suggesting that these teams benefited from “lucky” H2H match-ups (under 1D). On the flip side, Teams 4 and 12 would have ranked 4 and 3 places higher, respectively, than under 1D, suggesting that these two teams were hurt by “unlucky” H2H match-ups. Notably, Team 4 finished 10th under 1D scoring (well outside the Playoffs) but would have finished 6th under PF scoring (securing the last Playoff spot).

Lastly, I ran a Monte Carlo simulation consisting of 1,000 trials, randomizing the schedules of every team over the entire Regular Season. Each individual trial was scored under the 1D format, but my goal was to measure the average ranking of each team over a large number of repetitions, and to compare the results with both the 1D (Base Case) and PF formats.
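For the curious, here is a rough Python sketch of that kind of simulation. The score matrix is randomly generated here as a stand-in (the real one came from my league's 13 weeks of scoring), and for simplicity teams are re-paired at random every week rather than following a proper round-robin:

```python
import random

random.seed(1)
# Hypothetical stand-in data: scores[t][w] = points for team t in week w
scores = [[random.gauss(110, 20) for _ in range(13)] for _ in range(14)]

def simulate_season(scores):
    """Play 13 weeks with a randomized H2H schedule; return each team's win count."""
    wins = [0] * 14
    for week in range(13):
        order = list(range(14))
        random.shuffle(order)  # random match-ups this week
        for i in range(0, 14, 2):
            a, b = order[i], order[i + 1]
            if scores[a][week] > scores[b][week]:
                wins[a] += 1
            else:
                wins[b] += 1
    return wins

# Average final ranking of each team over 1,000 randomized schedules,
# ranking by record with total points as the tiebreaker (the 1D format)
totals = [0.0] * 14
for _ in range(1000):
    wins = simulate_season(scores)
    ranking = sorted(range(14), key=lambda t: (-wins[t], -sum(scores[t])))
    for place, team in enumerate(ranking, start=1):
        totals[team] += place
print([round(t / 1000, 2) for t in totals])
```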

The results of the simulation were similar to those under the PF scoring format. Once again, Teams 1, 8, and 11 would have ranked 3 places lower than under the 1D scoring format, suggesting that these teams benefited from “lucky” H2H match-ups. In contrast, Teams 4 and 5 would have ranked 3 and 4 places higher, respectively, than under 1D, suggesting that these two teams were hurt by “unlucky” H2H match-ups. The correlation between the rankings under the PF and MC cases and total points scored was 97.4% and 96.2%, respectively. This makes intuitive sense because the MC case minimizes the impact of additional variance introduced by H2H scheduling, while the PF case eliminates such variance completely.

In addition to looking at the correlation between Playoff seeding and total points scored, I also explored the impact of team volatility (i.e. the standard deviation of each team’s weekly score over the course of the 13 Regular Season games) on the final rankings of the teams. I came up with a “Sharpe Ratio”, which took each team’s average points scored per week and divided it by the standard deviation of each team’s weekly score. I hypothesized that teams with higher Sharpe Ratios would generally be more successful, although I was curious whether this would be a stronger indicator of success than simply looking at total points scored. As you can see in the table below, the correlation between ranking and Sharpe Ratio was in fact significantly lower than the correlation between ranking and total points scored, coming in at roughly 41% under the 2D, 1D, and PF cases and 52.4% under the MC case.

[Table: team Sharpe Ratios and their correlation with final rankings]
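To be concrete about the definition: the “Sharpe Ratio” here is just a team's average weekly score divided by the standard deviation of its weekly scores. In Python, with hypothetical weekly scores:

```python
from statistics import mean, stdev

weekly = [112.4, 98.6, 131.2, 104.9, 88.1, 120.5, 95.3,   # made-up 13-week sample
          141.8, 109.0, 92.7, 118.3, 103.4, 126.6]
print(round(mean(weekly) / stdev(weekly), 2))             # the team's "Sharpe Ratio"
```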

So what does all of this mean for Fantasy managers? The name of the game has always been points maximization, and the work that we’ve done in this post confirms that. In the case of my Fantasy league last season, ranking teams based on one particular (i.e. random) H2H schedule reduced the correlation between overall ranking and total points scored by roughly 10%, thus introducing additional “randomness” to the game.

While simply awarding Playoff seeding based on total points scored over the Regular Season may be the fairest scoring format, it certainly takes away from the drama of H2H match-ups that makes Fantasy Football so fun in the first place. One potential compromise is to let the Regular Season run its usual course, but then re-seed the Playoffs according to a comprehensive Monte Carlo simulation. This would minimize the variability introduced by H2H scheduling and ensure that teams are not being helped or hurt by “lucky” or “unlucky” schedules. Another alternative would be to reserve the last seed in the Playoffs for the team that made the top 6 in points scored during the Regular Season (assuming a 6-team Playoff format) but did not make the top 6 in win-loss record. Under this format, Team 4 would have made the Playoffs as the #6 seed last season, displacing Team 13.

However, it is worth noting that while different scoring formats would have led to different rankings, the variations on average were still relatively minor. Indeed, the top 5 teams in my league last season (Teams 1, 10, 11, 12, and 5) would have finished in the top 5 regardless of the scoring format used.

[Table: final rankings under all four scoring formats]

Before I sign off, I will leave you with one final chart, which is a box-and-whisker plot of the Monte Carlo simulation results. As you can see, the combination of volatility at the team level and variance introduced by H2H scheduling results in a fairly wide range of potential outcomes for every team (with some interesting results, such as Teams 2, 4, and 13 all potentially finishing anywhere between #1 and #14, inclusive). In general, however, the chart still provides an effective visualization of the relative ranking of each team, which I found quite elegant.

[Chart: box-and-whisker plot of Monte Carlo ranking outcomes by team]

As always, you can find my backup data here.

Fun with Excel #5 – Monty Hall Meets Monte Carlo

The famous Monty Hall Problem poses the following question: “Suppose you’re on a game show, and you’re given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say #1, and the host, who knows what’s behind the doors, opens another door, say #3, which has a goat. He says to you, ‘Do you want to pick door #2?’ Is it to your advantage to switch your choice of doors?”

While the intuitive answer seems to be “no,” as one might argue that the two remaining doors are equally likely to contain the car, the correct answer is actually “yes.” As vos Savant points out later on in the above link, the probability of winning if you switch is actually 2/3.

But what if you wanted to find the solution without using probability theory directly? One way is through a Monte Carlo simulation, which involves playing the game numerous times and estimating the probability of winning empirically. The idea is that as the number of observations increases, the average of the results will converge to the expected value.

For instance, if we run the simulation 1,000 times, we see a fair amount of volatility in the results over the first 250 and even 500 trials. As we add more trials, however, the average of the results begins to converge to the true expected values: a 2/3 chance of winning if we switch doors, and 1/3 if we don’t.
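Here is a compact Python version of the experiment (a sketch; the original, of course, lived in Excel):

```python
import random

def play(switch, trials=10_000):
    wins = 0
    for _ in range(trials):
        car, pick = random.randrange(3), random.randrange(3)
        # The host opens a goat door that is neither the car nor the contestant's pick
        opened = next(d for d in range(3) if d != car and d != pick)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(play(switch=True))   # ~0.667
print(play(switch=False))  # ~0.333
```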

The results are even more concrete if we consider 10,000 trials:

And there you have it, a simple application of Monte Carlo Simulation to support one of the more counter-intuitive results in probability theory.