By Sean Campbell
What started as an article analyzing the contract of one Knicks player eventually became CARMELO (Career-Arc Regression Model Estimator with Local Optimization), an application built by statistician-turned-journalist Nate Silver that forecasts the performance of every NBA player over the next five years.
To start, Silver measures forecasted performance in Wins Above Replacement (WAR). WAR is the estimated number of wins that a player adds to his team compared with a replacement player. A replacement player is defined as a professional-caliber athlete at the same position who is usually not good enough to make a team’s roster but can be signed at the league’s minimum allowable salary. They’re essentially filler players who neither help nor hurt.
Silver uses the formula WAR = (PM*MIN*2.18)/(48*82) to estimate how much a player contributes to his team. In this equation, PM is the player’s plus-minus in points added, MIN is the number of minutes played over the course of an observed season, 48 is the length of a regulation game in minutes (used here as an estimate of the time a team takes to play 100 possessions), 82 is the number of games played each season, and 2.18 is likely a conversion factor that translates points per game into wins. Plus-minus accounts for both the points a player scores and the points he prevents by playing defense.
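As a quick illustration, the formula above can be written as a small Python function. This is only a sketch of the arithmetic as quoted in this article; the function and constant names are made up for the example and are not taken from Silver’s code.

```python
# Constants as they appear in the formula quoted in the article.
POINTS_TO_WINS = 2.18   # the article's "likely" points-to-wins factor
MINUTES_TERM = 48       # the 48 in the formula (a regulation NBA game is 48 minutes)
GAMES_PER_SEASON = 82   # regular-season games


def wins_above_replacement(plus_minus: float, minutes: float) -> float:
    """Return WAR = (PM * MIN * 2.18) / (48 * 82).

    plus_minus: the player's plus-minus in points added (PM)
    minutes:    minutes played over the observed season (MIN)
    """
    return plus_minus * minutes * POINTS_TO_WINS / (MINUTES_TERM * GAMES_PER_SEASON)


if __name__ == "__main__":
    # Hypothetical player: +5.0 plus-minus over 2,500 minutes.
    print(round(wins_above_replacement(5.0, 2500), 2))
```

For that hypothetical player, the result is a little under seven wins. A player with a plus-minus of zero comes out at exactly zero WAR, no matter how many minutes he plays.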
To project each player’s future WAR, Silver wrote a machine-learning algorithm that uses 16 statistics to measure how similar a given player is to every other NBA player who has played since 1976.
The values for the 16 statistics are calculated from the player’s performance over the previous three seasons. Instead of a straightforward average, Silver uses a weighted average with a 60-30-10 split (i.e., the most recent season accounts for 60 percent of the average, the second most recent for 30 percent, and the third most recent for 10 percent). He uses this split because a player’s most recent seasons are likely to carry more information about his current performance than earlier seasons.
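The 60-30-10 split is easy to see in code. The sketch below assumes a player has exactly three prior seasons of data for a given statistic; how the real model handles players with fewer seasons is not covered here, and the names are invented for the example.

```python
# Weights from the article: most recent season first.
SEASON_WEIGHTS = (0.6, 0.3, 0.1)


def weighted_stat(values_recent_first):
    """Blend three seasons of one statistic using the 60-30-10 split.

    values_recent_first: [most recent, second most recent, third most recent]
    """
    if len(values_recent_first) != len(SEASON_WEIGHTS):
        raise ValueError("this sketch expects exactly three seasons of data")
    return sum(w * v for w, v in zip(SEASON_WEIGHTS, values_recent_first))


if __name__ == "__main__":
    # Hypothetical scoring averages: 20.0 last season, 25.0 and 30.0 before that.
    print(weighted_stat([20.0, 25.0, 30.0]))
```

With those made-up numbers the weighted value is about 22.5, compared with a plain average of 25.0. The weighting pulls the estimate toward the most recent season.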
The similarity is then localized by the player’s age so the projections can compare players at the same relative point in their respective careers.
From here, he averages the historical players’ WAR using another weighted average, where the weights are based on each historical player’s similarity score, which ranges from 100 down to about -300. A score of 100 means the observed player and the historical player match exactly in each of the 16 statistics. The assumption here is that a historical player who is very similar to an observed player can hold a lot of information about the current player’s possible path through the NBA; their careers are more likely to follow similar trajectories.
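Here is a rough sketch of a similarity-weighted average. The article does not say how similarity scores are turned into weights, so this example makes a loud assumption: scores at or below zero get no weight, and positive scores are used as weights directly. The function name and sample data are hypothetical.

```python
def project_war(comparables):
    """Similarity-weighted average of historical players' WAR.

    comparables: list of (similarity_score, historical_war) pairs, where
    historical_war is what the comparable player produced at the same
    relative point in his career.

    ASSUMPTION (not from the article): negative scores are clipped to a
    weight of zero; positive scores are used as weights as-is.
    """
    weights = [max(score, 0.0) for score, _ in comparables]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no comparable player has a positive similarity score")
    weighted_sum = sum(w * war for w, (_, war) in zip(weights, comparables))
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Made-up comparables: (similarity score, WAR in the following season).
    sample = [(80, 6.0), (40, 3.0), (-150, 10.0)]
    print(project_war(sample))
```

In this made-up example the very dissimilar player (score of -150) contributes nothing, and the closest match counts twice as much as the second closest, giving a projection of about 5.0 WAR.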
There are some other components and details that go into the forecast model, but those are less crucial to understanding his analysis than the concepts covered here.
Silver uses this model in a number of ways. In one article, he ranks the top 53 players based on his WAR calculations. In another, one of his writers uses the model to build a case for a team’s last chance to win a championship. Through his statistical model, Silver not only found stories in the data, but he also created a tool that his newsroom can use to generate material for as long as it remains valid, or at least plausible.
To his credit, Silver acknowledges that he thinks the model is good but probably not as good as the models used by Vegas (and by extension, NBA franchises). Teams of statisticians with advanced metrics generate those models. They also have multiple projections from various models and take the average of the values from those models to reduce the error due to noise in any single model.
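The idea of averaging several models can be shown with a few lines of Python. The numbers below are invented purely for illustration and do not come from any real projection system; the point is only that the averaged forecast can never miss by more than the typical individual model does.

```python
from statistics import mean


def ensemble_forecast(forecasts):
    """Combine several models' forecasts by taking their simple average."""
    return mean(forecasts)


# Made-up WAR forecasts from three hypothetical models, and a made-up outcome.
forecasts = [5.5, 7.0, 6.4]
actual = 6.0

ensemble_error = abs(ensemble_forecast(forecasts) - actual)
average_single_error = mean(abs(f - actual) for f in forecasts)

if __name__ == "__main__":
    print("ensemble error:", round(ensemble_error, 2))
    print("average single-model error:", round(average_single_error, 2))
```

Here the individual models miss in different directions, so part of their error cancels out when they are averaged.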
In back-testing, his model performed pretty well, but by his own admission, “back-testing is not the same thing as seeing how predictions perform in the real world against truly unknown data.”