Biased Stats in the NBA

One of my favorite NBA-related articles is Tommy Craggs’ “The Confessions Of An NBA Scorekeeper”, which recounts the experiences of a scorekeeper named Alex in the 1990s. The article highlights the common occurrence of “stat-padding,” or the practice of inflating the stats (e.g., assists, steals, blocks, and rebounds) of players of the home team. As Craggs writes:

Alex quickly found that a scorekeeper is given broad discretion over two categories: assists and blocks (steals and rebounds are also open to some interpretation, though not a lot). “In the NBA, an assist is a pass leading directly to a basket,” he says. “That’s inherently subjective. What does that really mean in practice? The definition is massively variable according to who you talk to. The Jazz guys were pretty open about their liberalities. … John Stockton averaged 10 assists. Is that legit? It’s legit because they entered it. If he’s another guy, would he get 10? Probably not.”

“The Confessions Of An NBA Scorekeeper”

Alex’s comment on Stockton caught my attention. While I was pretty certain stat-padding existed 20 years ago and still does to this day, I was curious as to what degree the NBA’s all-time career leaders benefited from this bias.

Methodology

I pulled the top 25 all-time career leaders for each of the following categories from Basketball Reference: points, assists, steals, blocks, and rebounds. This yielded a total of 78 unique players, as some players were ranked on the all-time list in multiple categories. I then pulled the stats for each player, split by home vs. road games.

Note that steals and blocks were not officially recorded in the NBA until the 1973–74 season. Furthermore, not all statistics were broken out by home vs. road splits until more recently, which means the analysis of bias could not be completed for many of the older stars, including Bill Russell, Kareem Abdul-Jabbar, Magic Johnson, Moses Malone, Oscar Robertson, and Wilt Chamberlain.

Setting the Benchmark (Points)

It’s fairly well-established that teams play better at home than on the road. To confirm this, I measured each player’s points per home game and compared it to his points per road game. On average, players scored 2.8% more points per game at home than on the road, with a standard deviation of 5.1%.
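A minimal Python sketch of this benchmark calculation is below. The per-game figures are made up for illustration (they are not the Basketball Reference data), and the use of the sample standard deviation is an assumption, since the post does not say which variant was used.

```python
from statistics import mean, stdev


def home_premium(home_per_game: float, road_per_game: float) -> float:
    """Fraction by which a per-game average at home exceeds the road average."""
    return home_per_game / road_per_game - 1.0


# Hypothetical (home PPG, road PPG) pairs; not real player data.
sample_splits = {
    "Player A": (21.0, 20.0),
    "Player B": (15.3, 15.0),
    "Player C": (9.5, 10.0),
}

premiums = [home_premium(home, road) for home, road in sample_splits.values()]

average_premium = mean(premiums)  # the "blue line" in the charts
spread = stdev(premiums)          # sample SD (assumed); defines the green/red bands

print(f"average home premium: {average_premium:.1%}, SD: {spread:.1%}")
```

Each player counts equally here regardless of games played, which matches the idea of averaging across players rather than across games.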

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Strong positive outliers included Tree Rollins, Shawn Bradley, and Mookie Blaylock, who were all more than two standard deviations higher than the mean. (Two of these players also belong on the all-time greatest names list. I’ll let you guess which.) Jermaine O’Neal was the only negative outlier more than two SDs lower than the mean. Notably, the top six career point leaders were all below average.

I then compared each player’s home vs. road performance for assists, steals, blocks, and rebounds relative to his home vs. road scoring performance. For example, if a player scored, on average, 5% more points per game at home than on the road and grabbed 10% more rebounds per game at home than on the road, then the relative home bias of his rebounding performance would be $\frac{1.10}{1.05} - 1 \approx 4.76\%$.
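The same calculation as a small Python function. The function name and the per-game inputs are illustrative assumptions, chosen so that rebounds are 10% higher at home and points are 5% higher at home, as in the example above.

```python
def relative_home_bias(stat_home: float, stat_road: float,
                       pts_home: float, pts_road: float) -> float:
    """Home/road ratio of a stat, divided by the home/road ratio of points, minus 1."""
    stat_ratio = stat_home / stat_road    # e.g. 1.10 for rebounds
    points_ratio = pts_home / pts_road    # e.g. 1.05 for the scoring benchmark
    return stat_ratio / points_ratio - 1.0


# Hypothetical player: 11.0 vs. 10.0 rebounds, 21.0 vs. 20.0 points (home vs. road).
bias = relative_home_bias(11.0, 10.0, 21.0, 20.0)
print(f"{bias:.2%}")  # 4.76%
```

Dividing the ratios, rather than subtracting the percentages, keeps the measure multiplicative: a player whose stat rises exactly in line with his scoring at home gets a bias of zero.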

The underlying assumption here is that in the absence of any stat-padding, there should not be significant relative home bias in any of the statistical categories. However, given Alex’s scorekeeping experiences, we would expect to see some degree of bias in all four categories, especially assists and blocks.

My analysis revealed the following results:

Assists

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Relative to the baseline (i.e. points), assists showed a relative home bias of 6.4%, with a standard deviation of 9.6%.

Almost everyone fell within two SDs of the mean, although Theo Ratliff was an extreme positive outlier, albeit on small volume. Note that John Stockton, the all-time assists leader by a long shot, had a relative home bias of only 3.6%, indicating a very low likelihood of stat-padding. On the other hand, Jason Kidd, the second all-time assists leader, had a relative home bias of 16.5%.

Of course, a high relative home bias doesn’t necessarily mean that there was stat-padding going on. Kidd also had an average home vs. road point performance of negative 4.5%. One explanation is that he played more as a facilitator at home and had to shoulder more of the scoring burden on the road.

Steals

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Relative to the baseline (i.e. points), steals showed a relative home bias of 3.2% (half that of assists), with a standard deviation of 9.3% (roughly the same as that of assists).

Again, almost everyone fell within two SDs of the mean, although Manute Bol was an extreme positive outlier on small volume. Alvin Robertson was also more than two SDs higher than the mean, on much higher volume. Remarkably, John Stockton, also the all-time steals leader by a decent margin, had a relative home bias of only 2.0%, indicating once again that he was the real deal.

By now, you may have noticed that Dikembe Mutombo was more than two SDs below the mean for both assists and steals. It doesn’t really make sense for stat-padding to go the other way, so the likely explanation for negative bias is simply underperformance. The numbers look so extreme in this case because of small sample size. Mutombo was an all-time rebounding great who averaged 10.7 boards at home and 10.0 on the road. However, he scored only 10.3 points per game at home and 9.4 on the road (a benchmark of 9.8%). He had so few assists (1.0 home vs. 1.1 road) and steals (0.4 vs. 0.5) that very small absolute differences in home and road performance led to large percentage biases (-15.4% and -25.4%) relative to his baseline.

Blocks

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Relative to the baseline (i.e. points), blocks showed a relative home bias of 12.3% (nearly double that of assists), with a standard deviation of 19.7% (also nearly double that of assists). Blocks were by far the most biased statistic, as well as the most variable.

There were a handful of players who fell more than two SDs below the mean, while Fat Lever (another all-time great name), Alvin Robertson, and John Stockton were all more than two SDs above the mean (on low volume). Both Robertson and Stockton had a relative home bias of nearly 80%, or almost 3.5 SDs above the average! So while the Utah Jazz scorekeepers may not have been padding Stockton’s assists and steals, they almost certainly were boosting his blocks… (Take that, Stockton! I finally got you 🙂)

Interestingly, David Robinson and Tim Duncan, who both played for the San Antonio Spurs for the entirety of their careers, were between one and two SDs above the mean on relatively high volumes! (Alvin Robertson also played five seasons for the Spurs at the beginning of his career.)
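The colored bands in the charts amount to a simple distance-from-the-mean check. The sketch below uses the blocks average (12.3%) and SD (19.7%) reported above; treating "nearly 80%" as exactly 0.80 is an approximation, and the function names are my own.

```python
def sds_from_mean(value: float, avg: float, sd: float) -> float:
    """Signed number of standard deviations between value and the average."""
    return (value - avg) / sd


def band(value: float, avg: float, sd: float) -> str:
    """Label a value by the chart band it falls in."""
    distance = abs(sds_from_mean(value, avg, sd))
    if distance > 2:
        return "beyond 2 SD"
    if distance > 1:
        return "between 1 and 2 SD"
    return "within 1 SD"


BLOCKS_AVG, BLOCKS_SD = 0.123, 0.197  # figures reported for blocks

z = sds_from_mean(0.80, BLOCKS_AVG, BLOCKS_SD)  # "nearly 80%" approximated as 0.80
print(f"{z:.2f} SDs above the mean -> {band(0.80, BLOCKS_AVG, BLOCKS_SD)}")
```

With these inputs the distance comes out a little under 3.5 SDs, consistent with the figure quoted for Robertson and Stockton.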

Rebounds

Blue = Average; Green = +/- 1 SD; Red = +/- 2 SD

Relative to the baseline (i.e. points), rebounds showed a relative home bias of 1.4% (roughly one-fifth that of assists), with a standard deviation of 4.9% (half that of assists). In contrast to blocks, rebounds were by far the least biased statistic, as well as the least variable.

Given the lower variability, it’s not too surprising that almost all players fell within two SDs of the mean, with no positive outliers and only three negative outliers (on low volume).

Closing Thoughts

In short, the results confirmed our initial expectations. Blocks (12.3% average relative home bias, 19.7% standard deviation) and assists (6.4% Avg, 9.6% SD) showed the most evidence of bias, whereas steals (3.2% Avg, 9.3% SD) and rebounds (1.4% Avg, 4.9% SD) showed the least.

At first, I was surprised that blocks showed significantly more bias than assists. Conceptually, assists felt like a much more subjective stat to record, but the data seemed to suggest the opposite. However, I soon realized this was because of the “Mutombo problem” of small sample size. Simply put, assists occur with a lot more frequency than blocks in the NBA. While many great players average more than five assists a game over the course of their careers (the truly elite average over eight!), very few ever manage to block more than three shots a game.

It’s not uncommon for point guards like Stockton and Kidd to average fewer than 0.5 blocks per game, and in certain cases, significantly fewer than that (e.g., Steve Nash and Tony Parker averaged fewer than 0.1 blocks per game). Therefore, even if there were the same amount of absolute stat-padding for assists and blocks, the relative impact would be much greater for blocks. That is to say, a scorekeeper giving a player an “extra” assist or two every home game when the player is averaging eight or ten assists is going to have a much smaller impact than gifting an “extra” block every few home games if that player is averaging a measly 0.1 blocks a game.
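To put rough numbers on that argument, here is a small sketch. It assumes a player's true rate is identical at home and on the road, and that the scorekeeper credits a fixed number of extra events per home game; the specific rates are illustrative, not measured.

```python
def padded_home_premium(true_per_game: float, extra_per_home_game: float) -> float:
    """Apparent home-vs-road premium created purely by padding the home total."""
    return (true_per_game + extra_per_home_game) / true_per_game - 1.0


# One extra assist every home game for a player who truly averages 9 assists.
assist_inflation = padded_home_premium(9.0, 1.0)

# One extra block every four home games for a player who truly averages 0.1 blocks.
block_inflation = padded_home_premium(0.1, 0.25)

print(f"assists: {assist_inflation:.0%}, blocks: {block_inflation:.0%}")
```

Even though the assist padding is four times larger in absolute terms, it moves the home premium by about 11%, while the block padding moves it by about 250%.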

As always, you can find my work here.

This is Post #21 of the “Fun with Excel” series. For more content like this, please click here.

Fun with Excel #3 – Corruption in the NBA?

My father was a big fan of the Chicago Bulls back in the ’80s and ’90s, so I had the good fortune of watching some of the best playoff basketball (i.e. Michael Jordan) that the NBA (and the world) has ever witnessed. Perhaps that is the same reason why the last decade or so of NBA basketball has seemed to pale noticeably in terms of excitement. It is generally agreed upon among basketball fans that the game as it is played today is (a lot) less physical (and perhaps less exciting) than it once was.

Officiating has also seemingly become a bigger determinant of results, and like virtually all professional team sports, the blame often lands on the referees. “If it weren’t for that call, they would have won the game,” is a phrase we hear all too often, and one that I am guilty of uttering as well. However, have changes in officiating really been that significant over the last few decades, and if so, how would we measure such a phenomenon? The answer, of course, lies in the numbers.

Luckily, statistics for the NBA are readily available. For the purposes of my project, I decided to look at playoff statistics from the 1983-84 season to the latest 2012-13 season. However, even when the data is easily accessible, the most time-consuming part of a project is often collecting the data and organizing it in a way that makes it easy to analyze. This was no exception. Fortunately, with a little VLOOKUP and text-parsing magic (the latter is needlessly complex in Excel), I was able to largely automate the process of converting 30 years of raw playoff data into something I could process more easily.

My first goal was to see if there were any high-level trends in the NBA playoffs through time, in particular the number of games played and the point differential in each game. Moreover, I wanted to analyze these metrics by playoff round, i.e. first round (1R), conference semifinals (2R), conference finals (3R), and finals (4R). If we were to believe that officiating actually had a measurable impact on playoff results, we might expect to find the following:

  • An overall longer playoff campaign
  • Smaller average point differentials, to convey the appearance of “closer” games

Why would the NBA want any of these things to happen? The answer is simple: profits. More games played/closer games = more tickets sold/higher TV ratings. In fact, the NBA switched from a best-of-five format to a best-of-seven format in the first round starting in the 2003 playoffs.

The Results (and the data)

I’ll make a few observations, but the data really speaks for itself here. In the first chart, we see that after adjusting for the NBA’s change in playoff format since the 2002-03 campaign, neither the average number of games played in the playoffs nor the average number of games played per round has shown any noticeable shift through time. The average points differential chart tells the same story, and in fact both charts seem to suggest some cyclical trends through time. Lastly, the average free throw attempts and fouls chart actually displays a noticeable decrease through time on a per-game adjusted basis. Perhaps this is a testament to just how physical the game was back in the 1980s and 90s, which MJ himself has suggested on many an occasion.
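The post does not spell out how the format adjustment was made. One plausible approach, sketched here purely as an assumption, is to express each series' length as a share of the maximum possible games, so that best-of-five and best-of-seven first rounds become comparable. The series records below are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (playoff year, round, games played in the series).
series = [
    (1996, "1R", 3), (1996, "1R", 5), (1996, "4R", 6),
    (2008, "1R", 4), (2008, "1R", 7), (2008, "4R", 6),
]


def max_games(year: int, rnd: str) -> int:
    """Longest possible series; the first round was best-of-five before the 2003 playoffs."""
    return 5 if rnd == "1R" and year < 2003 else 7


def length_share_by_year(records):
    """Average share of the maximum series length actually played, per playoff year."""
    shares = defaultdict(list)
    for year, rnd, games in records:
        shares[year].append(games / max_games(year, rnd))
    return {year: mean(values) for year, values in shares.items()}


adjusted = length_share_by_year(series)
for year, share in sorted(adjusted.items()):
    print(year, f"{share:.1%}")
```

A rising share over time would point to longer series after controlling for format; a flat share would match the conclusion drawn from the charts.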

 

Conclusion

The data doesn’t seem to indicate any obvious playoff trends that may have been caused by officiating. However, more granular foul data (which may not be available) may help clarify the story. In particular, even if the average number of fouls per game has trended down over the last 30 years, have the types of fouls called changed in any significant way? Perhaps more calls are coming during particularly tight stretches of games, or conversely, during blowouts, to ensure that the losing team is “still in it.” Of course, all of this is pure speculation, and without hard evidence, it is difficult to move forward. As Sir Arthur Conan Doyle once said through his most famous character Sherlock Holmes, “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Until such facts are found, our theories will remain theories.

Good writing must be appreciated…

and no, I’m not talking about my own, haha. Jason Gay, one of my favorite sports columnists, has done it again. Check out his two remarkable pieces regarding the Jeremy Lin story below:

1. http://online.wsj.com/article/SB10000872396390444464304577535081027315006.html – absolutely hilarious; a must-read if you are a basketball fan.

2. http://online.wsj.com/article/SB10000872396390444464304577537284235901286.html – a bit more subtle, but still worth a read.