[OC] Using machine learning to find the best and worst value contracts

This post has a lot of graphs. If you don’t want to click on each one individually, they’re all in an imgur album [here](https://imgur.com/a/PaWRWmf).

**tl;dr** We made a metric that estimates how much a player should have earned in a year given their stats. If you want to see the results, scroll down to the Google Sheets link (or click [here](https://docs.google.com/spreadsheets/d/19_g58Nzb9qv0HqmUuUqH5YSfk9Ys25E5q_e4qJIB1UI/edit?usp=sharing)). If you want to see more detailed results and compare players, scroll down to the Shiny app link (or click [here](https://dribbleanalytics.shinyapps.io/contract-value/)). If you just want to see the 10 best and worst contracts, look at [this graph](https://i.imgur.com/eOuZ2nX.png) and [this graph](https://i.imgur.com/9sMW9La.png).

***

# Introduction

As soon as free agency opened in July of 2016, the Lakers signed Timofey Mozgov to a 4 year, $64 million deal. At the time, this seemed like an enormous overpay. Today, it still seems like one. In the 2015-16 season, Mozgov averaged 6.3 PPG and 4.4 TRB in 17.4 MPG for the Cavs. The following season – his first as a Laker – his stats improved marginally. He averaged 7.4 PPG and 4.9 TRB in 20.4 MPG.

The following summer, the Lakers dumped Mozgov’s salary. They traded him along with D’Angelo Russell for Brook Lopez and the 27th pick in the draft. They used this pick to select Kyle Kuzma.

Mozgov’s contract was bad from the beginning. But, this is not always the case. Often, misfortune and other unforeseen circumstances make contracts bad. Key examples include injuries and accelerated aging. So, at the time, these contracts seem fine. But they become poison fast. In turn, the players earn much more than expected given their performance.

To find the best and worst value contracts, we’ll create 4 models to predict a player’s salary. This is not a predictive metric of what a player will earn in their next contract. This evaluates expected salary relative to real salary to see who’s overpaid and who’s underpaid.

***

# History and understanding the data

In the 1984-1985 season, the NBA instituted the salary cap. This was a bare-bones salary cap; many of the rules that influence today’s cap weren’t in place. Unrestricted free agency, rookie contracts, max contracts, etc. only came into play later.

With the salary cap, each team had to decide how to allocate their money. Early on, many teams opted for a smooth approach where they would pay lots of middling players. In the 1990-1991 season, only 9 players earned over 20% of the total salary cap. The highest-paid player was Patrick Ewing, who earned 35.8% of the salary cap. At the time, the max contract did not yet exist. This year is somewhat of an outlier. The following season, the salary distribution changed to resemble what we see today.

Larry Bird earned a staggering 56% of the Celtics’ salary cap in the 1991-1992 season. Several players started to earn more of their team’s cap, as the league shifted more towards star power. The CBA only included the max contract about a decade later. Without max contracts, Michael Jordan earned 120% of the cap for two years in his second 3-peat.

Today, due to exceptions and restrictions, teams structure their cap room to get stars. Most contending teams often have a few max players, a couple guys in the $10 million range, and lots of minimums. So, if we’re going to predict salary, we must first understand its distribution.

[Graph: distribution of player salaries as a percent of the salary cap]

We see that, historically, most players earned a small percentage of the cap. This is what we would expect. With the prevalence of tanking and salary dumps, most teams will have at least one high salary guy. For a contender, this is their main star. For a tanking team, this could be someone they took on along with a draft pick (such as Melo on the Hawks).

Next, we’ll look at how different factors correlate with salary. Theoretically, better players should always earn higher salaries. So, the correlation between something like points and salary should be positive. This is because better scorers are often better players, and, in turn, should earn more.

The CBA complicates this simple relationship with the rookie scale and minimum contracts. If we’re looking at performance relative to salary, rookie contracts are by far the best value in the league. Luka Doncic is playing at an incredible level, but only earns about $7.5 million. Minimum contracts create a similar effect of underpaid players. Many contenders have no cap room. So, if a player wants to contend, he must take a pay cut. To join the contender, he must take either a minimum or the mid-level exception (like Cousins on the Warriors). Relative to his performance, this is a steal.

The three graphs below show the relationship between different stats and salary.

[Graphs: points, win shares, and age vs. percent of the cap]

As expected, points and win shares correlate with salary. Furthermore, the age plot shows the effect of rookie contracts. No player under 20 years old earned over 20% of the cap.

Now that we understand the data, we can discuss the methods for the analysis.

***

# Methods

First, we collected all player data for every season since the 1990-1991 season. We go back to 1990-1991 for two reasons. First, the CBA only added unrestricted free agency in 1988. This changed the whole process of free agency – and in turn, contracts. So, we wouldn’t go back further than 1988 anyway. Second, our source for historical player salary data is [this Kaggle data set](https://www.kaggle.com/whitefero/nba-player-salary-19902017), which only goes back to 1990.

We combined the salary data with each player’s counting and advanced stats for that season. Note that we’re taking stats for the given season. So, this is not a predictive metric. This evaluates whether the player is overpaid or underpaid given their expected salary. As such, players like Gordon Hayward will appear overvalued. At the time, the contract was fair. Due to a devastating injury later, Hayward spent a full year rehabbing and still had to shake off rust. Last season, Hayward did not play like a max player. So, he was overpaid for that season.

We used the following factors to predict a player’s salary:

1. Age
2. Points per game
3. Rebounds per game
4. Assists per game
5. Steals per game
6. Blocks per game
7. True shooting %
8. Win shares

These factors generally paint a picture of a player’s performance and situation. So, they can predict a player’s salary.

We included age as a feature to adjust for rookie contracts. Though rookie contracts are great value, they’re not the result of any negotiation or offer by the GM; the salary is set by the player’s draft slot. So, even a superstar 18-year-old can’t earn a large part of the cap. Adding age lets the models account for this effect. Without age, we would likely have to remove rookie contracts, as they would add noise to our data set. Including age transforms the problem from salary given performance to salary given expected salary, where expected salary mixes performance and age.

We used win shares instead of VORP or BPM because win shares is a cumulative stat. So, it depends on games played, which is a key factor in how much value a player provides to his team.

To understand the relationship between these features, we created a correlation plot.

[Graph: correlation plot of the features]

With our 10,821 samples, we randomly split the data. We used 75% to train the models, and 25% to test them. We created four models:

1. K-nearest neighbors regressor (KNN)
2. Random forest regressor (RF)
3. Gradient boosting regressor (GBR)
4. Extreme gradient boosting regressor (XGB)

Note that we’re predicting the percentage of the cap a player deserves, instead of their raw salary. The salary cap has risen and inflation has occurred, so players today earn more in raw dollars than players 20 years ago. But the distribution of cap percentages has stayed consistent. Because the target (percentage of cap) is a continuous value, this is a regression problem.
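For readers who want to follow along, here’s a minimal sketch of how this setup might look in Python with scikit-learn and XGBoost. The file name, column names, and hyperparameters are assumptions, not the original code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from xgboost import XGBRegressor

# One row per player-season since 1990-91 (hypothetical file and column names).
df = pd.read_csv("player_seasons.csv")
features = ["age", "pts", "trb", "ast", "stl", "blk", "ts_pct", "ws"]
X = df[features]
y = df["pct_of_cap"]  # salary expressed as a share of that season's cap

# Random 75/25 train-test split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "KNN": KNeighborsRegressor(),
    "RF": RandomForestRegressor(random_state=0),
    "GBR": GradientBoostingRegressor(random_state=0),
    "XGB": XGBRegressor(random_state=0),
}

for model in models.values():
    model.fit(X_train, y_train)
```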

We predicted salary for the 2018-19 season using 2018-19 stats (last season’s numbers). So, contracts won’t exactly match up with what they are today.

We also only considered players who played in the 2018-19 season. This affects some teams; for example, the Heat paid Chris Bosh $26.8 million last season, but he did not play. Because Bosh didn’t play, he’s excluded from the data. Also, Bosh doesn’t count against the Heat’s cap because he retired for medical reasons.

One final note about the data is that there are small inconsistencies due to player movement. We collected contract data from Basketball Reference’s contract page from April 1, 2019 ([link](https://web.archive.org/web/20190401162630/https://www.basketball-reference.com/contracts/players.html)). This is after the trade deadline. So, traded players count against the team they played for at the end of the season (e.g. Marc Gasol is on the Raptors, not the Grizzlies). Some players had multiple entries on the page, like Carmelo Anthony. He earned $25 million from the Hawks (salary dump from the Thunder). Then, after the Hawks waived him, Melo signed with the Rockets for $2 million. The Rockets then traded him to the Bulls. So, Melo has a $25 million Hawks entry and a $2 million Bulls entry. We removed duplicates and kept the first-indexed one (the higher salary one).
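As a rough sketch, the duplicate handling might look like this in pandas (the file and column names are hypothetical):

```python
import pandas as pd

# Contract rows scraped in page order, so a player's higher-salary entry comes first.
contracts = pd.read_csv("contracts_2018_19.csv")

# Keep the first-indexed entry per player (e.g. Melo's $25M Hawks row, not his $2M Bulls row).
contracts = contracts.drop_duplicates(subset="player", keep="first")
```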

Neither of these inconsistencies affects the analysis much. Traded players still have the same contract, so on a player-by-player basis their value is the same; only the team-level sums of expected and observed salary are affected. As for players with multiple entries like Melo, there are so few of these cases that they don’t matter much.

***

# Regression analysis

In this section, we’ll check how our models perform.

## Basic goodness of fit

For regression models, we have two basic metrics of performance. First, we have r-squared. This measures the proportion of the variance in the dependent variable (percent of cap) explained by the independent variables (features). It’s between 0 and 1, with 1 being the best possible value.

Second, we have mean squared error. This measures the average squared difference between the predicted and observed values. Unlike r-squared, lower MSE is better, with a best possible value of 0. We can interpret mean squared error, as it tells us how close our predictions are to the real value on average.

The table below shows the r-squared and mean squared error for the four models.

|Model|r-squared|MSE|
|:--|--:|--:|
|KNN|0.516|0.003|
|RF|0.546|0.003|
|GBR|0.542|0.003|
|XGB|0.544|0.003|

All the models have a very low mean squared error of 0.003. We can take the square root of this (root mean squared error) to interpret the result. This differs from mean absolute error, which averages the absolute differences between predictions and observations. RMSE is better here because it penalizes large errors more, which is more valuable for this problem.

Because the mean squared error for each model is about 0.003, the RMSE is about 0.055. This means that, on average, the models are about 5.5% off in predicted percent of cap.
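A sketch of how these numbers might be computed on the held-out 25%, assuming the fitted models from the earlier sketch:

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

for name, model in models.items():
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    rmse = np.sqrt(mse)  # sqrt(0.003) is roughly 0.055, i.e. about 5.5% of the cap
    print(f"{name}: r2 = {r2_score(y_test, pred):.3f}, MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```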

In our classification posts, we create dummy classifiers to represent improvement over random guessing. It’s harder to create a random baseline here. However, we can still compare our models to simple regressions. We saw before that the correlation coefficient (r) for a points and salary regression is 0.6, so its r-squared is 0.36. Our models all have r-squared above 0.5, so they outperform simple methods of predicting salary. So, the models predict salary well.

## Cross-validation

In machine learning, we want to avoid overfitting. This occurs when the models learn the given data too well. So, they’re accurate on the given data but aren’t predictive on new data. To check for overfitting, we’ll perform cross-validation.

First, as in previous posts, we performed grid search on our hyperparameters. This means we tested lots of possible combinations for factors that determine how our models fit the data. Then, we selected the combination that resulted in the lowest MSE on different splits.

Additionally, we performed k-fold cross-validation. In k-fold cross-validation, we randomly split the data into k bins. The models receive k – 1 bins as training data, and then predict the one excluded bin. We repeat this process for every combination of bins. Then, we average the performance across the bins. This gives us an estimate of how our models perform on different splits of the data. A cross-validated score close to our initial score indicates the model performs almost the same on the different splits. So, if the two scores are close, it’s unlikely the models are overfitting.
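A minimal sketch of this process for one model; the parameter grid and fold count here are assumptions, not the original settings:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score

# Grid search: try hyperparameter combinations, keep the one with the lowest MSE across folds.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300, 500], "max_depth": [None, 5, 10]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X_train, y_train)
best_rf = grid.best_estimator_

# k-fold cross-validation on r-squared; report mean +/- 2 standard deviations.
cv_r2 = cross_val_score(best_rf, X_train, y_train, scoring="r2", cv=5)
print(f"CV r2 = {cv_r2.mean():.3f} +/- {2 * cv_r2.std():.3f}")
```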

The table below shows the cross-validation scores for r-squared and MSE, along with their 95% confidence intervals (2 standard deviations away from the mean).

|Model|r-squared|95% CI|MSE|95% CI|
|:--|--:|:--|--:|:--|
|KNN|0.507|+/- 0.071|0.004|+/- 0.001|
|RF|0.469|+/- 0.150|0.004|+/- 0.001|
|GBR|0.421|+/- 0.281|0.004|+/- 0.002|
|XGB|0.444|+/- 0.199|0.004|+/- 0.001|

Though the CV r-squared scores are lower, the actual r-squared scores are within the 95% confidence interval. Furthermore, the CV MSE is very close to the initial MSE. So, it’s unlikely the models are overfitting.

## Standardized residuals

A big part of regression analysis depends on analyzing the residuals. A residual is the difference between the predicted and observed value at a point.

Residuals in strong models have two important characteristics. First, they follow the normal distribution. Second, they have no autocorrelation or trend. Both these characteristics show that the model isn’t repeating the same mistake.

First, we’ll look at the standardized residuals test. Ideally, 95% of a model’s standardized residuals fall within 2 standard deviations of the mean. Furthermore, the standardized residuals should have no noticeable trend.
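A quick sketch of this check, assuming the test-set predictions from earlier:

```python
import numpy as np

for name, model in models.items():
    residuals = y_test - model.predict(X_test)
    standardized = (residuals - residuals.mean()) / residuals.std()
    pct_within_2sd = np.mean(np.abs(standardized) <= 2)
    print(f"{name}: {pct_within_2sd:.1%} of standardized residuals within 2 SD")
```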

The graph below shows the standardized residuals of the four models.

[Graph: standardized residuals for the four models]

We see that only the KNN has 95% of its residuals within 2 standard deviations of the mean. The others are close to 95% (they’re all over 94.6%). This 95% is not a hard boundary; it exists because, in a normal distribution, 95% of data is within 2 standard deviations of the mean. So, the fact that close to 95% of the data is within 2 standard deviations is good.

The distributions of the standardized residuals differ from a normal distribution. We see that the residuals peak close to 0, far above the expected density for a normal distribution. Furthermore, some residuals fall far away from the rest of the data. So, our residuals are probably heavy-tailed and not normal.

To analyze this assumption, we’ll perform a Shapiro-Wilk test for normality. The test returns a W statistic (not important here) and a p-value. If the p-value is less than 0.05, we can reject the null hypothesis, which is that the data is normally distributed. So, if p < 0.05, the data is not normal. The table below shows the p-values of the Shapiro-Wilk test.

|Model|p-value|
|:--|:--|
|KNN|< 0.001|
|RF|< 0.001|
|GBR|< 0.001|
|XGB|< 0.001|
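For reference, a test like this is one line with scipy (a sketch, assuming the residuals computed above):

```python
from scipy.stats import shapiro

for name, model in models.items():
    residuals = y_test - model.predict(X_test)
    w_stat, p_value = shapiro(residuals)
    print(f"{name}: W = {w_stat:.3f}, p = {p_value:.2g}")
```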

The p-value is small for all four models, so we can reject the null hypothesis. So, our standardized residuals are not normally distributed. This low p-value may be a result of a large sample size. To confirm this isn’t due to sample size, we’ll also look at a quantile-quantile (QQ) plot.

The QQ plot compares the theoretical quantiles and the ordered values of two distributions. If the two distributions are the same, their QQ plot will be a straight line. So, we will plot the residuals against a normal distribution (shown in red). The closer our points are to the red line, the better. The graph below shows each model’s QQ plot.

[Graph: QQ plots of the residuals for each model]

We see that the residuals stay close to the line for most of the middle values. At the upper and lower ends, the residuals differ a lot. This indicates the residuals have heavy tails like we thought before. So, we can confidently say the model’s residuals are not normal.
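A QQ plot like the ones above might be generated with scipy and matplotlib (a sketch for one model):

```python
import matplotlib.pyplot as plt
from scipy import stats

residuals = y_test - models["RF"].predict(X_test)
stats.probplot(residuals, dist="norm", plot=plt)  # draws the points plus a red best-fit line
plt.title("RF residuals QQ plot")
plt.show()
```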

Now, we will test for autocorrelation. To do this, we’ll perform a Durbin-Watson test. The test returns a Durbin-Watson statistic between 0 and 4. Values close to 2 indicate no autocorrelation. Values close to 0 indicate positive autocorrelation. Values close to 4 indicate negative autocorrelation. The table below shows the results of the DW test.

|Model|DW statistic|
|:--|--:|
|KNN|1.97|
|RF|1.99|
|GBR|1.99|
|XGB|1.99|

The DW statistic for each model is close to 2. So, there’s no autocorrelation in the residuals, which is promising.
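The statistic itself can be computed with statsmodels (a sketch, assuming the residuals above):

```python
from statsmodels.stats.stattools import durbin_watson

for name, model in models.items():
    residuals = y_test - model.predict(X_test)
    print(f"{name}: DW = {durbin_watson(residuals):.2f}")
```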

Though our residuals are not normal, which is a problem, the models are still useful. The residuals have no autocorrelation – meaning the models aren’t repeating the same mistake – and the overall error is low. Now that we’ve evaluated the models, we can see what they predict.

***

# Results

As mentioned earlier, rookie contracts are bargains. This is because their contracts depend on their draft position. Furthermore, they’re non-negotiable, as the rookie scale continues for 4 years.

Before diving into results, we’ll look at an example of how our models treat rookies. Last year, Luka Doncic put up 21.2 PPG, 7.8 RPG, and 6 APG on good efficiency at 19 years old. So, even though he’s a great player, his expected salary is low because he’s on a rookie contract.

Our models predicted Luka Doncic to earn 12% of the cap. This is higher than his actual percent of cap of about 6.5%. So, he’s still a bargain given his expected salary. However, a player putting up Doncic’s stats would earn far more than 12% of the cap on the open market. This shows that our models identify the effect of age.

Now, we’ll let the models predict what Luka Doncic would earn at 27 years old if he put up the same stats. Given Doncic was 19 his rookie year, he will start his rookie maximum contract when he’s 23. That contract will take up 25% of the cap with 8% annual raises.

The models predict Doncic would be worth 23% of the cap if he put up his rookie-year stats (which he’s already improved on this year) at 27 years old. This is close to what he’ll actually make, depending on how fast the cap rises. Because the predicted salary differs so much depending on age, we know the models capture the effect of age. So, they won’t identify all rookie contracts as bargains, as they know the expectation for rookies.
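The age counterfactual amounts to re-predicting the same stat line with only the age column changed. A sketch is below; the column names are the same assumptions as before, the non-scoring stat values are approximate placeholders rather than exact figures, and the four models are simply averaged.

```python
import numpy as np
import pandas as pd

# Doncic's 2018-19 line; steals, blocks, TS%, and win shares are approximate placeholders.
doncic = pd.DataFrame(
    [[19, 21.2, 7.8, 6.0, 1.1, 0.3, 0.55, 4.5]],
    columns=["age", "pts", "trb", "ast", "stl", "blk", "ts_pct", "ws"],
)
doncic_at_27 = doncic.assign(age=27)  # same stats, older player

for label, row in [("age 19", doncic), ("age 27", doncic_at_27)]:
    preds = [model.predict(row)[0] for model in models.values()]
    print(f"{label}: predicted {np.mean(preds):.1%} of cap")
```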

Now that we understand this, we can examine the results.

We’ll look at both player-by-player and team-by-team differences between expected and actual percent of cap. Higher values mean the player is great value, as he’s earning below his expected salary. We can sum these across teams to see which teams overpay players and which teams get good value.
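A sketch of that aggregation in pandas, assuming a data frame of 2018-19 player rows plus the fitted models and feature list from earlier (column names are assumptions):

```python
import numpy as np

# df_2019: one row per 2018-19 player, with the same feature columns plus team and actual pct of cap.
preds = np.mean([m.predict(df_2019[features]) for m in models.values()], axis=0)

results = df_2019[["player", "team", "pct_of_cap"]].copy()
results["predicted"] = preds
results["value"] = results["predicted"] - results["pct_of_cap"]  # positive = underpaid, good value

best_players = results.sort_values("value", ascending=False).head(10)
team_value = results.groupby("team")["value"].sum().sort_values(ascending=False)
```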

The graph below shows the top 10 best value contracts as decided by the average of our four models.

[Graph: 10 best value contracts]

This list has some players we’d expect. Last season, Kemba earned only $12 million, a great value. Several of the players here, like DeMarcus Cousins and Brook Lopez, signed a minimum contract or an exception. Despite our earlier example of how the models account for rookie contracts, there are still two rookie contract players here. But, they’re both older than typical rookie contract players. Last season, both Buddy Hield and Malcolm Brogdon were 26 years old. This is about as old as a player can get on their rookie contract unless they’re an international player. So, their age prevents their expected value from regressing to that of a typical rookie contract.

Now, let’s look at the worst value contracts.

[Graph: 10 worst value contracts]

This contains the usual suspects for the worst contracts. Before his improvement this year, Andrew Wiggins had one of the worst contracts in the league. Furthermore, Hayward’s injury recovery made him perform far below what you’d expect for a max player. We also have some less recent poison contracts, like Otto Porter, Chandler Parsons, and Ryan Anderson.

Let’s look at which teams handed out the best contracts last year. We do this by summing the difference in expected and actual salary for every player on the roster.

[Graph: teams with the best cumulative contract value]

All the teams on this list have lots of good role players or young talent. Though the Lakers, Pelicans, and Bucks all had a max player (LeBron, Davis, Giannis), a max is great value for those players. Furthermore, the rest of the team had great value contracts. We see that the teams here made big moves over the summer, often resulting in improvement. The Lakers added Davis. The Pelicans added young talent. The Nets signed Kyrie and KD. The 76ers signed Horford. The Clippers signed Kawhi and traded for Paul George.

Now, let’s look at the teams with the worst value contracts.

[Graph: teams with the worst cumulative contract value]

OKC had the worst cumulative value difference by a large margin. Westbrook’s max and Steven Adams’ large contract contribute to this. We see that a lot of the teams here ended up tanking or in limbo. For example, the Thunder traded Westbrook and George. The struggling Pistons held onto Griffin and Drummond and now look even worse.

Though it’s bad to be a team in the above graph, for tanking teams, it can be good. Tanking teams often receive salary dumps, where they take on a bad contract in exchange for picks. So, they’ll have a negative difference, but it’s worth it because of the attached assets.

Let’s look at the distribution of our predicted salary relative to the actual salary distribution.

[Graph: predicted vs. actual salary distribution]

We see that the predicted distribution is a right-shift of the observed distribution. Minimum contracts and exceptions contribute to this, as good players often take discounts to play for contenders.

## Full individual results

To see the full results for all players and teams, go to this Google Sheet:

https://docs.google.com/spreadsheets/d/19_g58Nzb9qv0HqmUuUqH5YSfk9Ys25E5q_e4qJIB1UI/edit?usp=sharing

## Interactive results

To help visualize the results, we created an R Shiny app. This lets you interactively compare players and teams. The link is:

https://dribbleanalytics.shinyapps.io/contract-value/

***

# Why do the models predict what they do?

Now that we’ve seen the performance and results of the models, we’ll look at what influences their predictions. To do this, we’ll use Shapley values. A Shapley value estimates the marginal contribution of a feature across all possible feature combinations. This lets us see which features are most important, and which values of those features create the most impact. For example, we know that a low value for age will result in a low predicted salary because of rookie contracts. But, is age a big predictor of salary in all cases? Or does it only matter for young players, after which other factors matter more?

Shapley values let us answer these questions. The four graphs below show the Shapley values for each feature in our four models. The y-axis sorts features by their importance (most important is on top). The x-axis shows the Shapley value, or the impact on model output. The color of each point shows the actual feature value. So, for example, older players have red “age” points, because their age is higher.

[Graphs: Shapley value summary plots for each of the four models]

All four models have points, age, and rebounds as their top 3 features in that order. Notice that the feature importance for the RF, GBR, and XGB follow the same order. Because all three of these models are tree-based models, they fit the data in similar ways. So, the same feature sets affect the models to a similar extent.
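A sketch of how plots like these might be produced with the shap library. TreeExplainer covers the tree-based models shown here; the KNN would need the slower, model-agnostic KernelExplainer.

```python
import shap

# Explain the random forest's predictions on the test set.
explainer = shap.TreeExplainer(models["RF"])
shap_values = explainer.shap_values(X_test)

# Summary plot: features sorted by importance, points colored by feature value.
shap.summary_plot(shap_values, X_test)
```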

We see that rebounds don’t have much of an effect on model output unless the player recorded a lot of rebounds. This makes sense, as most top bigs will rack up rebounds. Meanwhile, guards and wings won’t record lots of rebounds, but that shouldn’t affect them.

The plots give more proof to the effect of rookie contracts and age. In each model, low values of age have the largest negative impact on model output. But, there’s an interesting trend with high values of age. We’d expect high values of age not to affect output, given that older players often earn less because they’re expected to decline. However, high values of age positively impact model outputs. This explains some of the odd results of the models, such as why LaMarcus Aldridge has a higher predicted salary than Giannis.

***

# Conclusion

When offering players contracts, GMs try to give better players more money. By giving models basic indicators of player performance, we can almost pinpoint what a player should earn.

We could expand this to predict a player’s salary in year n given their performance in year n – 1. This allows us to predict how much a player should earn before they start their contract or pick their team. However, this analysis would be much less accurate than what we did here. It’s hard to predict how players will progress between seasons or how different schemes affect their production. Over one summer, so much changes. In the course of a single season, player performance is much easier to predict. So, this works because it’s a retrospective look at value, instead of a future prediction.

***

This is my newest post on my open-source basketball analytics blog, Dribble Analytics.

The GitHub for this project is [here](https://github.com/dribbleanalytics/contract-value).

# Comments

  1. I went into this thinking “I already know the worst contracts, what do I need this for?”

    Was genuinely surprised to see Otto Porter on this list and not THJ. Was also pleased to see the predicted vs. actual distribution of salaries depicted, just for the sake of knowing what the “perfect” NBA would look like, money-wise.

    Good job, OP! Very thorough and well-explained.

  2. Wow, people really do have free time to spend on entertaining us during our free time.

    Great post OP, I’ll now read further than tldr. Thank you.

  3. > We used the following factors to predict a player’s salary:
    >
    > 1. Age
    > 2. Points per game
    > 3. Rebounds per game
    > 4. Assists per game
    > 5. Steals per game
    > 6. Blocks per game
    > 7. True shooting %
    > 8. Win shares
    >
    > These factors generally paint a picture of a player’s performance and situation. So, they can predict a player’s salary.

    To judge whether a contract is “good” or “bad,” you need to determine each player’s value. These are generally poor ways to determine a player’s value, and there are much better metrics out there.

    Either use one of the existing value metrics (VORP, WS, RAPTOR Wins, PPIM Wins) or create your own.

    It also makes little sense to include both holistic metrics like WS and the individual box score stats that go into WS. This method is just combining a bad value metric with duplicative box score noise.

    >We used win shares instead of VORP or BPM because win shares is a cumulative stat. So, it depends on games played, which is a key factor in how much value a player provides to his team.

    VORP is the cumulative stat for BPM, and it’s much better than Win Shares. And RPM is better than both.

  4. I don’t know that I’m smart enough to tell if this is good or interesting, but I appreciate the effort put forth here. Especially that you aren’t just spamming your blog for promotion.

    Can you expand a bit more on your conclusion?

    >When offering players contracts, GMs try to give better players more money.

    Your key conclusion seems…obvious. The juice doesn’t seem worth the squeeze here. Did you expect to find something different? The detail on the analytics is great, but I’d love to see more thoughts or more fleshed out conclusions.

  5. Didn’t look into it too much yet, but are you not including rookies? Otherwise it seems clear that someone like Luka would be the biggest bargain in the league purely due to his rookie contract.

  6. He hasn’t been good enough to justify his contract, but he was very solid for the bulls at the end of last year. Started slow this year but started playing well right before getting injured

  7. In terms of choosing what to enter into the model, I would be careful since win shares take many of the other values as inputs.

    As an additional question, what does this do that simply predicting win shares with salary does not?
