What Attributes Cause NFL Wide Receivers to Be Targeted?
Using nflfastR's play-by-play data, this project explores the relationship between receiver targets and the other recorded variables, then asks whether any of those variables are major factors in a player being targeted.
Project Objective
What metrics in nflfastR are predictive of a wide receiver's on-field value? In other words, which metrics recorded in nflfastR’s play-by-play data can be used to predict the number of targets a wide receiver will get during an NFL season?
Research Methodology
I converted nflfastR’s play-by-play data into season totals for every player using a for loop. The players were then filtered by receiving EPA (a metric only recorded for receivers) to isolate that position group. Players with fewer than 10% of their team’s targets were removed from the dataset to reduce noise. Finally, all the data was adjusted to a per-target basis and normalized to the range 0 to 1 so that differences in scale would not skew the model.
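The filtering and scaling steps described above can be sketched as follows. This is a minimal Python illustration rather than the project's own code (nflfastR is an R package). The field names (`targets`, `team_targets`, `rec_yards`) and the sample rows are hypothetical.

```python
def filter_by_target_share(rows, min_share=0.10):
    """Keep player-seasons with at least `min_share` of their team's targets."""
    return [r for r in rows if r["targets"] / r["team_targets"] >= min_share]


def per_target(rows, stat_keys):
    """Divide each counting stat by the player's own target count."""
    return [{**r, **{k: r[k] / r["targets"] for k in stat_keys}} for r in rows]


def min_max_normalize(rows, stat_keys):
    """Rescale each stat to the 0-1 range across all rows."""
    out = [dict(r) for r in rows]
    for k in stat_keys:
        lo = min(r[k] for r in rows)
        hi = max(r[k] for r in rows)
        span = hi - lo
        for r in out:
            # A constant column has no spread; map it to 0.0 (an assumption).
            r[k] = (r[k] - lo) / span if span else 0.0
    return out


# Hypothetical season totals, for illustration only.
seasons = [
    {"player": "A", "targets": 120, "team_targets": 500, "rec_yards": 1200.0},
    {"player": "B", "targets": 60, "team_targets": 500, "rec_yards": 480.0},
    {"player": "C", "targets": 20, "team_targets": 500, "rec_yards": 300.0},
]

kept = filter_by_target_share(seasons)        # drops player C (4% share)
rates = per_target(kept, ["rec_yards"])       # yards per target
scaled = min_max_normalize(rates, ["rec_yards"])
```

Normalizing after the per-target conversion keeps every feature on the same 0 to 1 scale regardless of its original units.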
All the machine learning models used k-fold cross-validation, with the data split into 5 groups (folds). In each round, 4 folds were used for training and 1 for testing, rotating until every fold had served as the test set. This was repeated three times for each model, and the results of the iterations were averaged to get the final results. The models I ran the data through were Linear Regression, TreeBagging, Bayes GLM, and Neural Network.
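The resampling scheme described above (5 folds, repeated three times, scores averaged) can be sketched in a few lines. This is an illustrative Python version, not the project's implementation. The function names, the seed, and the `fit_and_score` callback are assumptions made for the example.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh shuffle for every repeat
        folds = [idx[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            # Train on the other k-1 folds, test on the held-out fold.
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def cross_validated_score(n_samples, fit_and_score, k=5, repeats=3, seed=0):
    """Average a caller-supplied score over every fold of every repeat."""
    scores = [
        fit_and_score(train, test)
        for train, test in repeated_kfold(n_samples, k, repeats, seed)
    ]
    return sum(scores) / len(scores)


splits = list(repeated_kfold(20))  # 5 folds x 3 repeats = 15 splits
```

In R's caret package, the same scheme is typically requested with `trainControl(method = "repeatedcv", number = 5, repeats = 3)`; whether this project used caret is an assumption based on the model names.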
To determine which model was best, I used the root mean square error (RMSE), mean absolute error (MAE), and R squared (R^2). R squared took priority in choosing the best model, followed by the error metrics (MAE and RMSE). The best model was then used to predict the number of targets a receiver should receive, based on its tuned parameters. The prediction column and the target column were stored in a data frame along with the player ID, player name, and season. To evaluate the model, the absolute value of the difference between the target and prediction columns was stored in a new column. If the prediction was within plus or minus 7.5 of the actual targets, it was scored with a one and labeled a success. If it was outside of plus or minus ten, it was labeled a failure. The model was then given a score based on its total accuracy rate.
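For reference, the three comparison metrics and the success-rate score can be written out directly. The Python sketch below uses hypothetical target counts. It assumes R squared is computed as one minus the ratio of residual to total sum of squares (some packages report the squared correlation instead), and it assumes any prediction outside the 7.5 tolerance counts against the accuracy rate.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def hit_rate(actual, predicted, tolerance=7.5):
    """Share of predictions within +/- `tolerance` targets of the actual value."""
    hits = [1 if abs(a - p) <= tolerance else 0 for a, p in zip(actual, predicted)]
    return sum(hits) / len(hits)


# Hypothetical season target counts and model predictions.
actual = [100, 80, 120, 60]
predicted = [95, 90, 118, 61]

print(rmse(actual, predicted))
print(mae(actual, predicted))        # 4.5
print(r_squared(actual, predicted))
print(hit_rate(actual, predicted))   # 0.75 (three of four within 7.5)
```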
Research Findings
The most predictive statistic is Receiving Expected Points Added (Receiving EPA). An individual player's EPA is the sum of the changes in Expected Points (EP) on that player's plays. “EPA is the difference between the EP at the start of a play and the EP at the end of the play” (advancedfootballanalytics.com). Even when adjusted for the number of receiving plays on which the player affected value by being targeted (Receiving EPA / Targets), it still accounts for 87.5% of the R squared of the best model (TreeBagging).
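The Receiving EPA / Targets calculation can be illustrated with a short Python sketch. The record layout (`receiver_id`, `ep_start`, `ep_end`) and the values are hypothetical. nflfastR's play-by-play data already carries a per-play `epa` column, so the subtraction is spelled out here only to mirror the definition quoted above.

```python
from collections import defaultdict


def receiving_epa_per_target(plays):
    """Sum play-level EPA by targeted receiver, then divide by target count."""
    total_epa = defaultdict(float)
    targets = defaultdict(int)
    for play in plays:
        receiver = play["receiver_id"]
        # EPA for one play: EP at the end of the play minus EP at the start.
        total_epa[receiver] += play["ep_end"] - play["ep_start"]
        targets[receiver] += 1
    return {r: total_epa[r] / targets[r] for r in total_epa}


# Hypothetical targeted plays, for illustration only.
plays = [
    {"receiver_id": "WR1", "ep_start": 1.0, "ep_end": 2.5},
    {"receiver_id": "WR1", "ep_start": 2.0, "ep_end": 1.5},
    {"receiver_id": "WR2", "ep_start": 0.5, "ep_end": 3.5},
]

epa_per_target = receiving_epa_per_target(plays)
print(epa_per_target)  # {'WR1': 0.5, 'WR2': 3.0}
```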
Research Weaknesses and Limitations
Targets may not be the best metric for judging WR value. From a quarterback or play-calling perspective, targets may be indicative of which player the team values over the course of a season. However, the analytics community would more likely identify value with a metric such as EPA, and using that as the dependent variable is another way to look at value. In addition, player salary reflects the team's perception of expected value and would make another solid dependent variable, since it is a direct statement of value, even if it is likely strongly tied to past performance and the free agency market. However, salary is not recorded in nflfastR, and for this analysis I wanted to keep the scope within nflfastR's data.