Fun with Excel #16 – Rigging Live Draws: The Emirates FA Cup

February 25, 2017 | Jeff | cheating, corruption, Emirates FA Cup, England, football, fun with excel, live draw, math, monte carlo simulation, probability, rigged, soccer

The Fifth Round Draw of the 2016/17 Emirates FA Cup was rigged.

Bold statement (literally), although that sentence probably meant nothing to anyone who doesn’t follow English Football (a.k.a. soccer) and the FA Cup in particular.

A quick introduction to the FA Cup competition, courtesy of Wikipedia (emphasis mine):

The FA Cup, known officially as The Football Association Challenge Cup, is an annual knockout association football competition in men’s domestic English football. First played during the 1871–72 season, it is the oldest association football competition in the world. For sponsorship reasons, from 2015 through to 2018 it is also known as The Emirates FA Cup.

The competition is open to any eligible club down to Level 10 of the English football league system – all 92 professional clubs in the Premier League and the English Football League (Levels 1 to 4), and several hundred “non-league” teams in Steps 1 to 6 of the National League System (Levels 5 to 10). The tournament consists of 12 randomly drawn rounds followed by the semi-finals and the final. Entrants are not seeded, although a system of byes based on league level ensures higher ranked teams enter in later rounds – the minimum number of games needed to win the competition ranges from six to fourteen.

In the modern era, only one non-league team has ever reached the quarter finals, and teams below Level 2 have never reached the final. As a result, as well as who wins, significant focus is given to those “minnows” (smaller teams) who progress furthest, especially if they achieve an unlikely “giant-killing” victory.

It’s no secret that when it comes to the FA Cup, “giant-killing” victories are more exciting to the average viewer, and therefore better for TV ratings. As a result, the tournament organizers are incentivized to create as many “minnow-giant” match-ups as possible. Specifically, this means matching up teams from the top level of the English football league system (more commonly known as the English Premier League, or EPL) with teams from lower levels (2nd Tier = Championship, 3rd Tier = League One, 4th Tier = League Two, 5th Tier = National League, etc.). While match-ups in the first 12 rounds of the tournament are determined using “randomly drawn” balls, it has been shown that such live draw events can be effectively rigged by cooling or freezing certain balls.

This year’s FA Cup Fifth Round Draw provided an interesting case study to test the rigging hypothesis, because out of the 16 teams going into the Fifth Round, 8 of them were from the EPL (Tier 1), while the remaining 8 were all from lower divisions. Coincidentally, the 8 EPL teams just happened to get drawn against the 8 non-EPL teams, conveniently leading to the maximum number of 8 “minnow-giant” match-ups. This result should seem suspicious even if you are not familiar with probability theory, but to illustrate just how unlikely such a result is, I will walk through the math.

In order to calculate the probability of the aforementioned result, we first need to figure out the total number of ways that a group of 16 teams can be split into 8 match-ups (i.e. pairs). As with most problems in mathematics, there is more than one solution, but perhaps the most intuitive one is this: Take one of the 16 teams at random. That first team can be paired up with 15 possible other teams. After a pair is made, 14 teams will remain. Again, we take one of the 14 teams at random. This team can be paired up with 13 possible other teams. By repeating this logic until only the final two teams remain (and must play each other), we see that there are a total of 15x13x11x9x7x5x3x1=2,027,025 unique pairings. It turns out that mathematicians already have a function that simplifies this exact result: the double factorial (expressed as n!!), defined as n x (n-2) x (n-4) x ..., stopping at 1 or 2. Therefore, we can generalize that for any group of n objects (with n even), the number of unique pairings is equal to (n-1)!!.
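The pairing count above can be checked with a few lines of Python. This is a minimal sketch; `double_factorial` and `num_pairings` are my own helper names (the standard library has no built-in double factorial), and it relies on the usual convention that (-1)!! = 0!! = 1.

```python
from math import prod

def double_factorial(n: int) -> int:
    """n!! = n x (n - 2) x (n - 4) x ..., stopping at 1 or 2; (-1)!! = 0!! = 1 (empty product)."""
    return prod(range(n, 0, -2))

def num_pairings(n_teams: int) -> int:
    """Number of ways to split an even number of teams into unordered pairs: (n - 1)!!."""
    if n_teams % 2:
        raise ValueError("need an even number of teams")
    return double_factorial(n_teams - 1)

print(num_pairings(16))  # 15 x 13 x 11 x 9 x 7 x 5 x 3 x 1
```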

To calculate the total number of ways to draw exactly 8 “minnow-giant” match-ups, we can imagine putting all 8 of the EPL teams in a line. Since we are looking to match the EPL teams one-to-one with the non-EPL teams, the question becomes: how many different ways can we line up the non-EPL teams so that they are paired up with the EPL teams? The answer to that is simply 8x7x6x5x4x3x2x1=8!=40,320. It is important to understand why we keep the order of the EPL teams unchanged while we only change the order of the non-EPL teams; otherwise, we would be grossly over-counting!

The probability of drawing exactly 8 “minnow-giant” match-ups is therefore 40,320/2,027,025=1.99%, or just a tad under 2%! To verify this, I ran a Monte Carlo simulation involving 50,000 trials, of which 961 trials ended up with exactly 8 “minnow-giant” match-ups, or 1.92%. The below table and chart also show the theoretical probabilities of drawing n “minnow-giant” match-ups, for 0 ≤ n ≤ 8. (Bonus Question: Can you convince yourself why it’s impossible to draw an odd number of “minnow-giant” pairs among a group of 16 teams?)
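One way to reproduce the theoretical distribution (my own derivation, not necessarily how the spreadsheet computes it): for exactly k “minnow-giant” match-ups, choose which k EPL teams and which k non-EPL teams cross over, match them one-to-one in k! ways, then pair off the 8-k leftover teams within each group in (7-k)!! ways each. This also answers the Bonus Question: if k is odd, each group is left with an odd number of teams, which cannot be paired among themselves.

```python
from math import comb, factorial, prod

def double_factorial(n):
    return prod(range(n, 0, -2))  # empty product (n <= 0) is 1

def ways_exactly_k_cross(k, group=8):
    """Pairings of 2*group teams (group EPL + group non-EPL) with exactly k EPL/non-EPL pairs."""
    leftover = group - k
    if leftover % 2:  # an odd number of leftover teams in a group can't be paired among themselves
        return 0
    return comb(group, k) ** 2 * factorial(k) * double_factorial(leftover - 1) ** 2

total = double_factorial(15)  # all pairings of 16 teams
for k in range(9):
    ways = ways_exactly_k_cross(k)
    print(k, ways, f"{ways / total:.2%}")
```

Summing the counts over all k recovers the 2,027,025 total, which is a quick sanity check on the formula.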

But wait, it gets even better. Out of the 8 non-EPL teams, 4 teams were from the Championship (2nd Tier league), 2 teams were from League One (3rd Tier), and 2 teams were from the National League (5th Tier). Arsenal, which has been sponsored by Emirates since 2006, ended up drawing Sutton United, one of only two teams (the other being Lincoln City) from the National League (5th Tier). Now, what are the chances that the team that shares a sponsor with the competition itself ends up drawing one of the two easiest (in theory) match-ups available?

The number of ways for Arsenal to draw a National League (5th Tier) team (i.e. either Sutton United or Lincoln City), without any restrictions on how the other match-ups are drawn, is 270,270. We arrive at this number by first assuming Arsenal and Sutton United are already paired off, thus leaving 14 teams remaining. The 14 teams can be paired off in 13!!=135,135 ways without restriction. We can repeat the same reasoning for an Arsenal/Lincoln City pair. Therefore, we double 135,135 to arrive at 270,270. This yields a theoretical probability of 270,270/2,027,025=13.33% (Monte Carlo resulted in 6,620/50,000=13.24%), or exactly 2 in 15 (roughly 1 in 7.5). This also makes intuitive sense: Arsenal is equally likely to draw any of the other 15 teams, and 2 of them are from the 5th Tier. However, this is only the probability of Arsenal drawing a 5th Tier team with no other match-up restrictions. In reality, there were already 8 “minnow-giant” match-ups drawn in the first place.
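The same count in code, as a minimal sketch reusing a hand-rolled double factorial helper (my own name, not a library function): fix Arsenal's opponent (2 choices), then pair the remaining 14 teams freely.

```python
from math import prod

def double_factorial(n):
    return prod(range(n, 0, -2))

total = double_factorial(15)                     # 2,027,025 pairings of 16 teams
arsenal_vs_5th_tier = 2 * double_factorial(13)   # 2 possible opponents x 13!! pairings of the other 14
p = arsenal_vs_5th_tier / total
print(arsenal_vs_5th_tier, f"{p:.2%}")
```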

Therefore, the question becomes: what is the probability that 8 “minnow-giant” match-ups are drawn AND Arsenal draws a 5th Tier team? We already know there are 40,320 possible match-ups for the first part of the requirement. Satisfying both parts of the requirement must result in a number smaller than 40,320. Think of it like this: we start off with the fact that the 8 EPL teams are matched up one-to-one with the 8 non-EPL teams. There are 2 different ways to pair Arsenal with a 5th Tier team (since there are only 2 such teams). Of the remaining teams, there are 7!=5,040 ways to pair them off such that the EPL and non-EPL teams are still matched one-to-one. Therefore, the total number of match-ups satisfying both requirements is 2×7!=10,080. This yields a theoretical probability of 10,080/2,027,025=0.50% (Monte Carlo resulted in 250/50,000=0.50%).
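As a cross-check, once the 8 EPL teams are matched one-to-one with the 8 non-EPL teams, Arsenal's opponent is equally likely to be any of the 8 non-EPL teams, so the conditional chance of a 5th Tier opponent is 2/8 = 25%, and 40,320 x 25% = 10,080. The sketch below is a Python stand-in for the spreadsheet's Monte Carlo (not the original formulas): shuffling all 16 balls and pairing them off in order makes every pairing equally likely. Team labels other than Arsenal, Sutton United, and Lincoln City are hypothetical placeholders, and the seed is arbitrary.

```python
import random

EPL = [f"EPL{i}" for i in range(1, 8)] + ["Arsenal"]  # hypothetical labels for the 8 Tier 1 teams
FIFTH_TIER = ["Sutton United", "Lincoln City"]
OTHERS = [f"Lower{i}" for i in range(1, 7)]           # hypothetical labels for the other 6 non-EPL teams
TEAMS = EPL + FIFTH_TIER + OTHERS

def random_draw(rng):
    """Shuffle all 16 balls and pair them off in order."""
    balls = TEAMS[:]
    rng.shuffle(balls)
    return [(balls[i], balls[i + 1]) for i in range(0, len(balls), 2)]

def simulate(trials=50_000, seed=2017):
    """Estimate P(8 minnow-giant pairs), P(Arsenal draws 5th Tier), and P(both)."""
    rng = random.Random(seed)
    epl = set(EPL)
    all_eight = arsenal_5th = both = 0
    for _ in range(trials):
        pairs = random_draw(rng)
        cross = sum((a in epl) != (b in epl) for a, b in pairs)
        opponent = next(b if a == "Arsenal" else a for a, b in pairs if "Arsenal" in (a, b))
        eight = cross == 8
        fifth = opponent in FIFTH_TIER
        all_eight += eight
        arsenal_5th += fifth
        both += eight and fifth
    return all_eight / trials, arsenal_5th / trials, both / trials

p_eight, p_arsenal, p_both = simulate()
print(f"{p_eight:.2%} {p_arsenal:.2%} {p_both:.2%}")  # theory: 1.99%, 13.33%, 0.50%
```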

In conclusion, there was only a 0.50% chance that the 2016/17 Emirates FA Cup Fifth Round Draw would lead to exactly 8 “minnow-giant” match-ups AND Arsenal drawing 1 of the 2 National League (5th Tier) teams. The fact that it happened anyway suggests that the drawing process may not have been 100% random.

As always, you can find my backup here. Please note, however, that I had to change all of the Monte Carlo formulas to values and save the file as .xlsb instead of .xlsx, as the file was way too large before (71 MB).

I would also like to give credit to the Chelsea subreddit for inspiring me to explore this topic.

Fun with Excel #15 – Fantasy Football: The Value of Optimal Play

January 14, 2017 (updated January 21, 2017) | Jeff | Fantasy Football, FF, football, fun with excel, optimal play, optimizing

Another year, another season, another ~~Fantasy Football championship~~. Well, almost. We made it to the Finals for the second year in a row, but sadly lost to the 12-1 team in our league by a score of 144.7 to 156.6. Suddenly, my last post on match-ups now seems very perspicacious:

Everyone has had that one week where they score 150 points, only for their opponent to somehow put up 160.

Did I just quote myself so that I could use “perspicacious” in a sentence? Yes, yes I did. Interestingly enough, the eventual winner also won our league two years ago, meaning that the same two teams have combined for 3 out of the last 3 Championships and 4 out of the last 6 Finals appearances. Coincidence?

In this post, I will examine the importance of optimal play in Fantasy Football.

Defining Optimal Play

In the strictest sense, playing optimally means that a manager is maximizing his or her total points scored on a weekly basis. In a perfect world, this would entail picking up the best free agents by position and starting them throughout the course of the season (e.g. acquiring Jack Doyle (TE) from the waivers and starting him in Week 1 when he got 18.5 points, the highest point total for all TEs that week). While such a strategy is theoretically possible, the large amount of roster churn required as well as the significant amount of risk involved make this strategy almost impossible to implement in any sort of standard league.

A more practical interpretation of optimal play is a manager’s ability to choose the best starting lineup given his or her current roster for that week. By defining optimal play like this, we split the manager’s abilities into two distinct buckets: (1) all the actions required to arrive at his or her current roster (i.e. drafting, waiver wire acquisitions, trading) and (2) setting the best starting lineup. Isolating the impact of (2) is a relatively easy exercise to do in retrospect (since all the data is available), and that’s exactly what I did over the 13-week Regular Season.

Of course, it should go without saying that the subsequent analysis does not account for the inherent randomness in Fantasy Football (i.e. variability in player performance, match-ups/schedule, etc.).

Results

I reviewed the Regular Season performance (i.e. Weeks 1-13) for all 14 teams in our league and calculated each team’s hypothetical optimal score every week, as well as the number of points below optimal (PBO) that each team actually scored. The smaller the PBO, the better.
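The weekly optimal score and PBO can be sketched as follows. This assumes the lineup is QB/2RB/2WR/1TE/1FLEX(RB/WR/TE)/DST/K (the format listed for my 2015 league; this post doesn't restate it), and the roster below is hypothetical. Because each dedicated slot accepts only one position, filling those slots with the top scorers and giving FLEX the best leftover RB/WR/TE yields the best possible lineup.

```python
SLOTS = {"QB": 1, "RB": 2, "WR": 2, "TE": 1, "DST": 1, "K": 1}  # assumed lineup format
FLEX_POSITIONS = ("RB", "WR", "TE")

def optimal_score(roster):
    """roster: list of (player, position, points) for one week. Returns the best lineup score."""
    by_pos = {}
    for _, pos, pts in roster:
        by_pos.setdefault(pos, []).append(pts)
    for pts_list in by_pos.values():
        pts_list.sort(reverse=True)
    total, flex_pool = 0.0, []
    for pos, n in SLOTS.items():
        best = by_pos.get(pos, [])
        total += sum(best[:n])          # dedicated slots get the top scorers at that position
        if pos in FLEX_POSITIONS:
            flex_pool.extend(best[n:])  # leftover RB/WR/TE compete for the FLEX slot
    return total + max(flex_pool, default=0.0)

def points_below_optimal(roster, actual_score):
    return optimal_score(roster) - actual_score

roster = [("QB1", "QB", 20), ("RB1", "RB", 15), ("RB2", "RB", 12), ("RB3", "RB", 9),
          ("WR1", "WR", 14), ("WR2", "WR", 10), ("WR3", "WR", 8), ("TE1", "TE", 7),
          ("TE2", "TE", 5), ("DST1", "DST", 6), ("K1", "K", 8)]
print(optimal_score(roster), points_below_optimal(roster, 95))
```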

Here’s a sample of what that looked like in Excel:

The following tables show a summary of the actual, optimal, and PBO results for each team over the Regular Season:

The average PBO per week was 13.0 in our league, with a standard deviation of 3.4. Out of the 14 teams, 11 teams performed within one standard deviation of the mean. Team 8 had an average PBO per week of only 7.3, while Teams 7 and 9 were on the other end of the spectrum, boasting PBO values of 17.9 and 21.0, respectively (the latter more than two SDs out!). But did these outliers perform significantly better or worse in terms of Regular Season record?
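To see how far each outlier sits from the league average, the standard score is (x - mean) / SD. A minimal sketch using the league figures quoted above:

```python
MEAN_PBO, SD_PBO = 13.0, 3.4  # league average weekly PBO and its standard deviation (from the text)

def z_score(x, mean=MEAN_PBO, sd=SD_PBO):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

for team, pbo in [("Team 8", 7.3), ("Team 7", 17.9), ("Team 9", 21.0)]:
    print(team, round(z_score(pbo), 2))
```

All three fall outside one SD, but only Team 9 lies beyond two.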

Although the sample size is quite small, the answer appears to be “not really.” As we learned in my last post, Regular Season rank has a much stronger correlation with Total Points For than any other factor, and that is apparent in the scatter plot below. There was a -69.4% correlation between Regular Season rank and Total Points For (remember, it’s negative because a smaller rank denotes a better overall record), compared to only a 2.7% correlation between Regular Season rank and Total Points Below Optimal.

To drive the point home, the league average Total Points For was 1,456.0. The six teams that made the Playoffs had an average of 1,512.3, while the eight teams that didn’t make the Playoffs had an average of 1,413.7. When it came to Total Points Below Optimal, however, there was virtually no difference between teams that made the Playoffs and those that didn’t: the league average Total PBO was 168.7, while Playoff teams averaged 168.5 and non-Playoff teams averaged 168.9.

The below table summarizes our league’s Regular Season rankings. In addition to each team’s Total PBO, I also calculated the Total PBO for each team’s opponents throughout the Regular Season. Lastly, I ran an alternate scenario where I assumed each team had the same schedule but played optimally throughout the entire Regular Season.

Concluding Remarks

While playing optimally may not be directly correlated with each team’s Regular Season record, it turns out that Total PBO for each team’s opponents was correlated -37.2% with that team’s Regular Season rank. Playoff teams’ average Opponent PBO was 187.8, compared to non-Playoff teams’ average of 154.5 (remember, overall average of 168.7). In other words, teams with better records “capitalized” on their opponents’ mistakes, and this had a non-trivial impact on the final Regular Season standings.

In addition, the optimal Regular Season scenario (last three columns of the table above) yielded some interesting results. If every team played optimally every week, then the top 10 teams over the Regular Season would have remained unchanged. However, due to the overall competitiveness of the league (six teams were either 8-5 or 7-6 in the Regular Season), an extra game won or lost in this scenario led to Teams 12 and 1 dropping out of the Playoffs and Teams 4 and 7 making it instead.

The “all optimal” scenario also indirectly highlights (once again) the painfulness of unlucky match-ups. Team 6, for example, had the second highest Total Points For but only ended up 5-8 and in 10th place. In the “all optimal” scenario, Team 6 once again boasted the second highest Total Points For but remained unchanged at 5-8 and 10th place. This is yet another argument for having more flexible Playoff seeding, as mentioned in my previous post:

Another alternative would be to reserve the last seed in the Playoffs for the team that made the top 6 in points scored during the Regular Season (assuming a 6 team Playoff format) but did not make the top 6 in win-loss record.

Interestingly, Team 8, which had the lowest PBO in the league, also benefited from having the second lowest Opponent PBO. Under the “all optimal” scenario, Team 8 would have dropped from having a 4-9 record and holding 11th place to just winning one game and being in dead last.

Before I sign off, let’s revisit our earlier decision to

split the manager’s abilities into two distinct buckets: (1) all the actions required to arrive at his or her current roster (i.e. drafting, waiver wire acquisitions, trading) and (2) setting the best starting lineup.

Stated simply, Bucket (1) measures a manager’s ability to maximize his or her Total Points For under an “all optimal” scenario. Inherent randomness aside, I would argue that this is largely a test of skill. Bucket (2) can be further split into two aspects: (A) the manager’s ability to set the optimal lineup every week (largely a test of skill) and (B) the ability of the manager’s opponents to play optimally (more a test of luck, as schedules/match-ups are in play).

As always, you can find my work here.

Fun with Excel #14 – Fantasy Football Match-ups Matter!

October 27, 2016 (updated January 14, 2017) | Jeff | Fantasy Football, FF, football, fun with excel, monte carlo simulation

The vast majority of Fantasy Football leagues use head-to-head (H2H) scoring, where a (random) schedule is determined at the beginning of the season and mirrors how the sport is played in real life. During the Regular Season (typically Weeks 1-13), each team will usually play every other team at least once, with each team’s win-loss record determining its Playoff seeding. Therefore, the objective for each team is to simply maximize its points total every week. While this is a straightforward task, the variability of individual player scoring can get frustrating, especially when coupled with the randomness of H2H match-ups. Everyone has had that one week where they score 150 points, only for their opponent to somehow put up 160. Other times, it feels like the Fantasy Gods are conspiring against you as you finish the season top 3 in Points For, but first in Points Against by a long shot.

So how much additional variability does H2H scheduling really introduce, and are there more equitable scoring formats? To explore this, I looked no further than my previous Fantasy season (shameless plug about winning the Championship in my first year here).

To start off, here are the 14 teams from my league last year and their scoring through the Regular Season (first 13 weeks, ESPN PPR scoring):

Here is the corresponding standings table (yes, there was a tie game…our league has since switched to fractional scoring):

The correlation between final ranking (i.e. Playoff seeding) and total points scored over the Regular Season was 87.6% (technically negative 87.6%, since smaller numbers correspond to higher/better rankings). To further complicate things, my league was structured with two divisions (2D) last season (Division A and Division B with 7 teams each). So rather than teams being ranked solely based on win-loss record, the top team (by record) from Division A and the top team from Division B were awarded the #1 and #2 seeds, while the top 3 teams from each division were given the top 6 seeds (and the only spots in the Playoffs). This resulted in Team 10 (9-4) finishing as the #2 team despite having a worse Regular Season record than Team 11 (10-3), which finished #3. As you can see in the table below, switching to a more standard single division (1D) scoring format led to the swap of Teams 10 and 11, which is arguably more “fair.” The correlation between the rankings under the 1D case and total points scored was 86.1%, very similar to that of the 2D case.
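The negative sign is easy to see with a small example. Below is a minimal Pearson correlation (the same statistic Excel's CORREL computes) applied to a hypothetical four-team league: because rank 1 is best, higher point totals line up with smaller rank numbers and the coefficient comes out negative.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# hypothetical league: rank 1 = best record
ranks = [1, 2, 3, 4]
points = [1600, 1450, 1480, 1300]
print(round(pearson(ranks, points), 3))
```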

The third scoring format I explored was simply total Points For (PF). As you can see in the table above, this led to a fairly decent shakeup in the overall rankings. Under the PF scoring format, Teams 1, 8, and 11 would have ranked 3 places lower than under the 1D scoring format, suggesting that these teams benefited from “lucky” H2H match-ups (under 1D). On the flip side, Teams 4 and 12 would have ranked 4 and 3 places higher, respectively, than under 1D, suggesting that these two teams were hurt by “unlucky” H2H match-ups. Notably, Team 4 finished 10th under 1D scoring (well outside the Playoffs) but would have finished 6th under PF scoring (securing the last Playoff spot).

Lastly, I ran a Monte Carlo simulation consisting of 1,000 trials, randomizing the schedules of every team over the entire Regular Season. Each individual trial was scored under the 1D format, but my goal was to measure the average ranking of each team over a large number of repetitions, and to compare the results with both the 1D (Base Case) and PF formats.
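A minimal Python sketch of the schedule-randomization idea, under assumptions the post doesn't spell out: each week's match-ups are drawn independently at random (so, unlike a real schedule, a team is not guaranteed to face every opponent), a tie game counts as half a win, and standings ties are broken by total points. `scores` is a hypothetical teams-by-weeks table of weekly points.

```python
import random

def random_week_pairings(n_teams, rng):
    """One week of H2H match-ups: shuffle the teams and pair them off in order."""
    order = list(range(n_teams))
    rng.shuffle(order)
    return [(order[i], order[i + 1]) for i in range(0, n_teams, 2)]

def rank_one_season(scores, rng):
    """scores[t][w] = points by team t in week w. Returns each team's final rank (1 = best)."""
    n_teams, n_weeks = len(scores), len(scores[0])
    wins = [0.0] * n_teams
    for w in range(n_weeks):
        for a, b in random_week_pairings(n_teams, rng):
            if scores[a][w] > scores[b][w]:
                wins[a] += 1
            elif scores[b][w] > scores[a][w]:
                wins[b] += 1
            else:  # a tie game counts as half a win for each side
                wins[a] += 0.5
                wins[b] += 0.5
    points_for = [sum(row) for row in scores]
    # single-division (1D) standings: most wins first, ties broken by total points scored
    order = sorted(range(n_teams), key=lambda t: (wins[t], points_for[t]), reverse=True)
    ranks = [0] * n_teams
    for position, team in enumerate(order, start=1):
        ranks[team] = position
    return ranks

def average_ranks(scores, trials=1000, seed=0):
    """Average final rank of every team over many randomly drawn schedules."""
    if len(scores) % 2:
        raise ValueError("need an even number of teams")
    rng = random.Random(seed)
    totals = [0] * len(scores)
    for _ in range(trials):
        for team, rank in enumerate(rank_one_season(scores, rng)):
            totals[team] += rank
    return [total / trials for total in totals]

# hypothetical 4-team, 2-week league: team 0 always scores the most, team 3 the least
scores = [[120, 130], [110, 115], [100, 105], [90, 95]]
print(average_ranks(scores))
```

Averaging ranks over many schedules is what dampens the luck of any single H2H draw.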

The results of the simulation were similar to those under the PF scoring format. Once again, Teams 1, 8, and 11 would have ranked 3 places lower than under the 1D scoring format, suggesting that these teams benefited from “lucky” H2H match-ups. In contrast, Teams 4 and 5 would have ranked 3 and 4 places higher, respectively, than under 1D, suggesting that these two teams were hurt by “unlucky” H2H match-ups. The correlation between the rankings under the PF and MC cases and total points scored was 97.4% and 96.2%, respectively. This makes intuitive sense because the MC case minimizes the impact of additional variance introduced by H2H scheduling, while the PF case eliminates such variance completely.

In addition to looking at the correlation between Playoff seeding and total points scored, I also explored the impact of team volatility (i.e. the standard deviation of each team’s weekly score over the course of the 13 Regular Season games) on the final rankings of the teams. I came up with a “Sharpe Ratio”, which took each team’s average points scored per week and divided it by the standard deviation of each team’s weekly score. I hypothesized that teams with higher Sharpe Ratios would generally be more successful, although I was curious whether this would be a stronger indicator of success than simply looking at total points scored. As you can see in the table below, the correlation between ranking and Sharpe Ratio was in fact significantly lower than the correlation between ranking and total points scored, coming in at roughly 41% under the 2D, 1D, and PF cases and 52.4% under the MC case.
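The ratio described above is a one-liner. This sketch assumes the sample standard deviation (as in Excel's STDEV.S); the post doesn't say whether sample or population SD was used. Unlike a financial Sharpe Ratio, no risk-free rate is subtracted.

```python
from statistics import mean, stdev

def fantasy_sharpe(weekly_scores):
    """Average weekly points divided by the sample standard deviation of weekly points."""
    return mean(weekly_scores) / stdev(weekly_scores)

print(fantasy_sharpe([100, 120, 110]))  # hypothetical three-week team
```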

So what does all of this mean for Fantasy managers? The name of the game has always been points maximization, and the work that we’ve done in this post confirms that. In the case of my Fantasy league last season, ranking teams based on one particular (i.e. random) H2H schedule reduced the correlation between overall ranking and total points scored by roughly 10 percentage points, thus introducing additional “randomness” to the game. While simply awarding Playoff seeding based on total points scored over the Regular Season may be the fairest scoring format, it certainly takes away from the drama of H2H match-ups that makes Fantasy Football so fun in the first place. One potential compromise is to let the Regular Season run its usual course, but then re-seed the Playoffs according to a comprehensive Monte Carlo simulation. This would minimize the variability introduced by H2H scheduling and ensure that teams are not being helped or hurt by “lucky” or “unlucky” schedules. Another alternative would be to reserve the last seed in the Playoffs for the team that made the top 6 in points scored during the Regular Season (assuming a 6 team Playoff format) but did not make the top 6 in win-loss record. Under this format, Team 4 would have made the Playoffs as the #6 seed last season, displacing Team 13. However, it is worth noting that while different scoring formats would have led to different rankings, the variations on average were still relatively minor. Indeed, the top 5 teams in my league last season (Teams 1, 10, 11, 12, and 5) would have finished in the top 5 regardless of the scoring format used.
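A sketch of the reserved-seed rule, assuming seeds 1-5 go to the top five by record and that, if more than one top-6 scorer misses the top 6 by record, the highest scorer among them takes the last seed (the post doesn't say how that case would be handled). Team labels and point totals are hypothetical.

```python
def playoff_seeds(record_order, points_for, n_seeds=6):
    """record_order: team ids sorted best-to-worst by win-loss record.
    points_for: dict mapping team id -> total Regular Season points."""
    by_points = sorted(points_for, key=points_for.get, reverse=True)
    top_by_record = record_order[:n_seeds]
    # top-n scorers who missed the top n by record, best scorer first
    snubbed = [t for t in by_points[:n_seeds] if t not in top_by_record]
    seeds = record_order[:n_seeds - 1]
    seeds.append(snubbed[0] if snubbed else record_order[n_seeds - 1])
    return seeds

record_order = ["A", "B", "C", "D", "E", "F", "G", "H"]
points_for = {"A": 1500, "B": 1480, "G": 1470, "C": 1460,
              "D": 1450, "E": 1440, "F": 1430, "H": 1400}
print(playoff_seeds(record_order, points_for))  # G takes the last seed instead of F
```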

Before I sign off, I will leave you with one final chart, which is a box-and-whisker plot of the Monte Carlo simulation results. As you can see, the combination of volatility at the team level and variance introduced by H2H scheduling results in a fairly wide range of potential outcomes for every team (with some interesting results, such as Teams 2, 4, and 13 all potentially finishing anywhere between #1 and #14, inclusive). In general, however, the chart still provides an effective visualization of the relative ranking of each team, which I found quite elegant.

As always, you can find my backup data here.

Fun with Excel #11 – Who’s My 2015 Fantasy Football MVP?

January 6, 2016 (updated October 27, 2016) | Jeff | Cam Newton, Fantasy Football, FF, football, fun with excel, Jordan Reed, MVP, Tim Hightower

Four months ago, I had never watched a full game of American Football, nor was I remotely interested in doing so. Now, after participating in my first ever season of Fantasy Football (and winning the Championship!), I find myself an enthusiastic follower of the sport. For those of you not familiar with Fantasy Sports, here’s a definition courtesy of Wikipedia:

A fantasy sport is a type of online game where participants assemble imaginary or virtual teams of real players of a professional sport. These teams compete based on the statistical performance of those players’ players in actual games. This performance is converted into points that are compiled and totaled according to a roster selected by each fantasy team’s manager. These point systems can be simple enough to be manually calculated by a “league commissioner” who coordinates and manages the overall league, or points can be compiled and calculated using computers tracking actual results of the professional sport. In fantasy sports, team owners draft, trade and cut (drop) players, analogously to real sports.

Online fantasy sports are a multibillion-dollar industry, and fantasy NFL football is by far the most popular fantasy sport.

Being the highly competitive stats nerd that I am, Fantasy Football caught my interest the moment the draft began and gradually became a part of my daily routine for the better part of 16 weeks. While there is inherently a lot of randomness and luck involved in the game, Fantasy Football at the end of the day is a game of skill – managers who take the time to draft wisely and follow the latest player and team developments put themselves in a good position to make savvy moves for their fantasy team on a week-to-week basis (e.g. picking up valuable free agents off the waiver wire, making trades with other managers, optimizing the starting lineup), which in turn leads to better performance over the long run.

After taking home the Championship in Week 16 (I guess it’s all downhill from here on out…), I became curious as to who the Most Valuable Player (MVP) of my Fantasy Team was. MVPs are handed out at the end of the season in real sports, so I wanted to come up with a methodology to do the same with my Fantasy Team.

League Details

14 Teams, ESPN, 1 PPR Scoring, QB/2RB/2WR/1TE/1FLEX(RB/WR/TE)/DST/K, Auction Draft ($200), FAAB Waiver Wire ($1,000)

Thought Process

I wanted to not only quantify each player’s contribution to the Team’s performance throughout the course of the season (Weeks 1-13 Regular Season, Weeks 15-16 Playoffs due to First Round Bye), but also measure each player’s performance during key games — their “clutch” ability, if you will. In the end, I came up with 3 ways to rank players by their points contribution and 2 ways to assess their “clutchness.”

Results and Rankings

First, a table summarizing my Team’s season as well as every player (27!) who contributed points during the campaign:

In addition to tables, I’ll be providing some charts throughout this post that will help to better visualize player performance and rankings.

Ranking Methodology #1 – Total Points Over Season – Fairly straightforward. Who contributed the most points over the course of the season?

Ranking Methodology #2 – Average Points Per Game Started – Unlike the first ranking, this measurement normalizes player contribution by the number of games started and highlights productive players who were picked up later on in the season (or, conversely, high-performing players who unfortunately had their seasons cut short due to injury). Under this simple points-per-game scheme, both Cam Newton (7 games) and Jamaal Charles (5 games) rank higher than Demaryius Thomas, despite Thomas (15 games) starting every game in the season (and interestingly enough the only player to do so this season!).

Ranking Methodology #3 – Average Contribution % Per Game Started – In this measurement, I assessed each player’s points scored as a percentage of my Fantasy Team’s total points scored for a particular week/game. I then averaged this percentage over every game the player started over the course of the season. In the end, this was an alternate measure of a player’s relative importance that resulted (unsurprisingly) in a ranking similar to the simple average method (#2).
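Ranking Methodologies #2 and #3 differ only in what gets averaged over the games a player started: raw points, or points as a share of the team's total that week. A minimal sketch; the dict-based inputs are my own hypothetical layout, not the spreadsheet's.

```python
from statistics import mean

def points_per_game_started(player_points):
    """player_points: {week: points} for the weeks the player was in the starting lineup."""
    return mean(player_points.values())

def avg_contribution_pct(player_points, team_points):
    """Average share of the team's weekly total, over the weeks the player started."""
    return mean(player_points[w] / team_points[w] for w in player_points)

player = {1: 20, 2: 30}              # hypothetical: started Weeks 1-2 only
team = {1: 100, 2: 150, 3: 120}      # hypothetical team totals
print(points_per_game_started(player), avg_contribution_pct(player, team))
```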

Clutch Factor #1 – Over/Underperformance in Close Games (<= 10 Points) – I was curious to see how players performed in close games, which I defined somewhat arbitrarily as any game with a margin of victory/defeat of 10 points or fewer. There were 4 such games during the season, in Weeks 4 (Win), 5 (Win), 8 (Loss), and 12 (Win). To actually measure “clutchness,” I employed a calculation akin to the Sharpe Ratio. I first calculated each player’s average PPG over the full season (Weeks 1-16), as well as the standard deviation of their performances. Then, for every Close Game, I took each player’s points scored, subtracted his average PPG, and divided this difference by the player’s standard deviation. A positive number would indicate over-performance while a negative number would indicate under-performance (the bigger the absolute value of the number, the bigger the over/underperformance). Finally, for each player, I took an average of his over/underperformance figures over all the Close Games that he played in.
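The calculation above is an average standard score (z-score). A minimal sketch, assuming the sample standard deviation (the post doesn't specify sample or population) and that `season_points` holds the player's games over Weeks 1-16:

```python
from statistics import mean, stdev

def clutch_factor(season_points, key_game_points):
    """Average z-score of a player's points in key games relative to his full-season distribution."""
    mu, sigma = mean(season_points), stdev(season_points)
    return mean((p - mu) / sigma for p in key_game_points)

print(clutch_factor([10, 20, 30], [30]))  # hypothetical player: one close game, one SD above average
```

A positive result means over-performance in the selected games; a negative result means under-performance.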

Clutch Factor #2 – Over/Underperformance in Playoff Games (Weeks 15-16) – While there is nothing that makes a Playoff Game inherently different from a Regular Season Game, the vast majority of leagues structure the Playoffs as a single-elimination bracket where every game becomes a must-win. Therefore, I performed the same calculations of “clutchness” for the two Playoff Games that my Team played in this season.

Final MVP Calculation

With the above measurements complete, the only thing remaining was to come up with a methodology to calculate my MVP of the 2015 Season.

First, I assigned points to the Top 10 performers in each of the 3 Ranking Methodologies. Rather than assign 10 points to the top ranked player, 9 points to #2, 8 points to #3, 7 points to #4, etc., I used the current Formula One World Championship points scoring system, which assigns 25 points to #1, 18 points to #2, 15 points to #3, 12 points to #4, 10 points to #5, 8 points to #6, 6 points to #7, 4 points to #8, 2 points to #9, and 1 point to #10. This gives more value to the Top 5 performers (and to #1 in particular) and creates more separation between the very best players and the second tier players. I then summed up each player’s ranking points across all 3 Ranking Methodologies to get to a Composite Ranking Value. From the table below, it’s clear that Cam Newton is the runaway favorite for MVP at this point, ranking #1 across two categories and #2 across the third category for a whopping total of 68 points, almost double the next highest total.

Here’s where things get more interesting. Remember, the 2 Clutch Factors calculated earlier help to quantify each player’s performance in close games and (more importantly) playoff games. Any positive number here means a player performed better than expected, while a negative number means the opposite. Rather than simply add the 2 Clutch Factors, I first multiplied Clutch Factor #2 (over/underperformance in playoffs) by 1.5x, as I deemed Playoff Games to be more important than Close Games during the Regular Season. I then added the result to Clutch Factor #1 to come up with a Composite Clutch Factor. Lastly, I multiplied the Composite Ranking Value by 1 plus the Composite Clutch Factor to arrive at the final calculation for MVP Points.

To recap: MVP Points = (Ranking #1 Points + Ranking #2 Points + Ranking #3 Points) x (1 + Clutch Factor #1 + 1.5 x Clutch Factor #2).
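The full scoring chain can be sketched in a few lines: Formula One points for each Top 10 ranking, summed into the Composite Ranking Value, then scaled by 1 plus the Composite Clutch Factor. The clutch inputs in the example are hypothetical.

```python
F1_POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]  # Formula One points for places 1-10

def ranking_points(rank):
    """Points for a given place; anything outside the Top 10 earns nothing."""
    return F1_POINTS[rank - 1] if 1 <= rank <= len(F1_POINTS) else 0

def mvp_points(ranks, clutch_close, clutch_playoffs, playoff_weight=1.5):
    """ranks: the player's place under each of the 3 Ranking Methodologies."""
    composite_ranking = sum(ranking_points(r) for r in ranks)
    composite_clutch = clutch_close + playoff_weight * clutch_playoffs
    return composite_ranking * (1 + composite_clutch)

print(sum(ranking_points(r) for r in (1, 1, 2)))  # #1, #1, #2 -> 25 + 25 + 18
print(mvp_points((1, 1, 2), 0.1, 0.2))           # hypothetical clutch factors
```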

So there you have it! My MVP for the 2015 Fantasy Football Season was Jordan Reed, TE from the Washington Redskins. Not only did Reed end up tied for 2nd in scoring among all TEs (only 11 points behind Gronk despite missing two games to injury), he peaked at the optimal time, putting in monster performances of 27, 27, and 33 points in Weeks 14-16 and going for 5 TDs during that span. An Honorable Mention goes out to #2 Tim Hightower (RB, New Orleans Saints), whose real-life story is even more amazing than his scintillating Fantasy performance. After literally not playing NFL football for 4 years due to injuries and repeatedly getting cut and re-signed by the Saints earlier this season, Hightower was thrust into the starting position after a season-ending injury to Mark Ingram. The 29-year-old made the most of his chances by rushing for 85 yards and a TD in his first game as a starter in Week 14 (16 points). He followed this up with an 11-point performance in Week 15 and a monster 31-point game (122 rushing yards, 47 receiving yards, 2 TDs) in Week 16, no doubt single-handedly winning the Championship for many Fantasy owners. My second Honorable Mention goes out to #3 Cam Newton (QB, Carolina Panthers). Since being acquired in a mid-season trade, Cam never left my starting lineup, averaging a ridiculous 26.9 PPG from Weeks 9-16 (Week 14 excluded due to Playoff Bye). Although he put up a disappointing 13 points in the Championship, he was nevertheless instrumental in propelling many Fantasy teams into the Playoffs and beyond. Cam ended the season as the highest-scoring QB by a sizable margin of 37 points and was in fact the second-highest-scoring player in the league, only 6 points behind Antonio Brown.

As a final parting shot, I will note that none of the players in the top 5 of my MVP rankings was drafted (Reed – trade, Hightower – waivers, Newton – trade, Aiken – waivers, OBJ – trade). This just goes to show that while a bad draft can lose you the season, playing the waivers and making favorable trades are what win Championships.

Fun with Excel #10 – World Cup Goals

April 26, 2015 (updated September 26, 2016) | Jeff | football, fun with excel, goals, soccer, World Cup

While I’ve always been fascinated by sports statistics, I must admit I did not realize that gathering data for this particular project would be so time-consuming. No, it didn’t take me 9 months to tally up all the goals ever scored in the World Cup since 1930, but I did lose interest about halfway through the data mining (from Wikipedia, no less), and I never got around to tying up loose ends until now. Nevertheless, I’m proud to present my results in this post.

Raw Data — The raw data spanned 836 rows…

Let’s get straight into the pretty charts shall we?

This first chart shows the pace at which goals were scored for every World Cup in history, and while it is indeed pretty, it does a poor job of showing any trends. You might notice that more recent World Cups have seen more goals scored on an absolute basis (and that is indeed the case if you look at the following chart), but you must also keep in mind that the tournament has expanded drastically since its early days.

Starting with just 13 teams at the inaugural 1930 World Cup in Uruguay (18 matches were played), the tournament expanded to include 24 teams in 1982 (52 matches) and finally to 32 teams (the current format) in 1998 (64 matches). Therefore, it should come as no surprise to anyone that more goals are being scored in the modern era, as there are simply more matches being played.
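The match counts above follow directly from each tournament’s format. Here is a quick Python sketch of that arithmetic. It assumes single round-robin groups, where each pair of teams in a group meets once (C(n, 2) matches for a group of n), and single-match knockout ties. The group layouts come from the standard tournament formats rather than from my spreadsheet, and the helper name is hypothetical.

```python
from math import comb


def round_robin_matches(group_sizes):
    """Hypothetical helper: matches in single round-robin groups, C(n, 2) per group of n teams."""
    return sum(comb(n, 2) for n in group_sizes)


# 1930 (13 teams): one group of 4 and three groups of 3, then semifinals (2) and final (1).
matches_1930 = round_robin_matches([4, 3, 3, 3]) + 2 + 1

# 1982 (24 teams): six groups of 4, a second round of four groups of 3,
# then semifinals (2), third-place match (1) and final (1).
matches_1982 = round_robin_matches([4] * 6) + round_robin_matches([3] * 4) + 2 + 1 + 1

# 1998 onward (32 teams): eight groups of 4, then a 16-team knockout:
# round of 16 (8), quarterfinals (4), semifinals (2), third-place match (1), final (1).
matches_1998 = round_robin_matches([4] * 8) + 8 + 4 + 2 + 1 + 1

print(matches_1930, matches_1982, matches_1998)  # 18 52 64
```

Because group play grows roughly with the square of group size while knockouts grow linearly, most of the extra matches in the expanded formats come from the group stage.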

We can quickly verify this by comparing “% Matches Complete” against “% Goals Scored” rather than “Cumulative Goals Scored,” as this normalizes for the increasing number of matches played over time. As the following two charts show, the pace of goalscoring on a per-match basis has remained relatively consistent over time. This is to be expected, as the rules of the game have changed little since the inception of the World Cup.
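The normalization above amounts to dividing each running total by the tournament’s final total, both for matches and for goals. Here is a minimal Python sketch. It assumes one list of per-match goal totals per tournament, in the order the matches were played. The function name and the sample numbers are hypothetical and do not come from my spreadsheet.

```python
from itertools import accumulate


def pace_curve(goals_per_match):
    """Hypothetical helper: (% matches complete, % goals scored) after each match.

    goals_per_match: total goals in each match of one tournament, in chronological order.
    """
    n_matches = len(goals_per_match)
    total_goals = sum(goals_per_match)
    if n_matches == 0 or total_goals == 0:
        raise ValueError("need at least one match and at least one goal")
    # Running goal total after each match, e.g. [3, 1, 0, 4] -> 3, 4, 4, 8.
    cumulative = accumulate(goals_per_match)
    return [
        (100 * (i + 1) / n_matches, 100 * running / total_goals)
        for i, running in enumerate(cumulative)
    ]


# Hypothetical four-match tournament, purely to show the shape of the output.
for pct_matches, pct_goals in pace_curve([3, 1, 0, 4]):
    print(f"{pct_matches:5.1f}% of matches -> {pct_goals:5.1f}% of goals")
```

Plotting these pairs for every tournament on the same axes puts an 18-match and a 64-match World Cup on a common 0-100% scale. A curve that hugs the diagonal means goals arrived at a steady per-match pace.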

I will leave you with one more chart before I sign off, which I find very interesting:

This is a busy chart, but it shows 2 trends that should be obvious and 2 that are not. The 2 obvious trends are: (1) the absolute number of goals scored per tournament has, on average, increased over time, and (2) the number of matches per tournament has increased over time. The 2 less obvious trends are: (1) the early years of the World Cup (1930-1958) had the highest number of goals per match, and (2) that same early period also had the highest average goal differential per match.

There are a couple of explanations for this intriguing pattern. First, soccer as a game was not as developed in the early days of the World Cup. Only a few powerhouse countries in South America and Europe dominated the international scene, but since the entire purpose of the World Cup was to bring together teams from all over the globe, it was inevitable that lopsided results would arise from powerhouse countries steamrolling some of their less fortunate competitors (e.g. Argentina dismantling the USA 6-1 in the 1930 semifinals, Hungary beating the Dutch East Indies 6-0 in 1938, and Sweden overrunning Cuba 8-0 that same year).

Second, the format of the tournament was partly to blame, an issue most pronounced at the 1954 World Cup, which featured an astounding 5.38 goals per match and a 3.00 average goal differential, statistics that will likely never be matched in the modern era. From Wikipedia: “The sixteen qualifying teams were divided into four groups of four teams each. Each group contained two seeded teams and two unseeded teams. Only four matches were scheduled for each group, each pitting a seeded team against an unseeded team.” In other words, low-ranking teams were forced to play high-ranking teams, which led to such absurd scorelines as Brazil 5 – Mexico 0, Hungary 9 – South Korea 0, Turkey 7 – South Korea 0, Uruguay 7 – Scotland 0, and Austria 5 – Czechoslovakia 0.

As the game grew in popularity worldwide and more and more countries began sending their most promising players to the most competitive leagues in Europe, the playing field evened out substantially. Furthermore, the modern game places much more emphasis on defense, which is why it is now rare to see a lopsided scoreline in any competitive international match (and also why Germany’s 7-1 drubbing of Brazil last year was so unexpected). While further expansion of the tournament seems almost certain (after all, more matches played = more money to be made), I expect goals per match and average goal differential to remain in line with where they have been in the modern era.
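The two per-tournament statistics discussed above reduce to simple averages over a list of scorelines. Here is a minimal Python sketch. It assumes “average goal differential” means the absolute margin in each match (a draw counts as 0). The helper name is hypothetical. The sample uses only the three early-era results quoted in this post, so the output describes those three matches and not any full tournament.

```python
def tournament_summary(scores):
    """Hypothetical helper: goals per match and average goal differential.

    scores: list of (goals_team_a, goals_team_b) scorelines.
    Goal differential is taken as the absolute margin, so a draw counts as 0.
    """
    n = len(scores)
    goals_per_match = sum(a + b for a, b in scores) / n
    avg_goal_diff = sum(abs(a - b) for a, b in scores) / n
    return goals_per_match, avg_goal_diff


# The three lopsided early-era results quoted above (not a full tournament).
early_blowouts = [
    (6, 1),  # Argentina vs USA, 1930 semifinal
    (6, 0),  # Hungary vs Dutch East Indies, 1938
    (8, 0),  # Sweden vs Cuba, 1938
]
gpm, gd = tournament_summary(early_blowouts)
print(f"{gpm:.2f} goals per match, {gd:.2f} average goal differential")
```

Running the same function over every match in a tournament gives the per-tournament figures plotted in the chart. Using the absolute margin keeps a 6-1 win and a 1-6 loss from cancelling each other out.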

—

For those of you interested, the data I aggregated can be found here.

Jeffrey Fan

Random Musings of an Amateur Data Scientist
