Last July, I used a regression analysis in an attempt to find out which NFL team attributes most contributed to victories during the 2012 NFL season. Now, I've updated the analysis for the 2013 season in order to see how much changed from 2012 to 2013, and also to see if our 2012 formula could have predicted anything for 2013. First, here are the attributes I used for each NFL team:
- Pro Football Focus Offensive Stats: Pass, Rush, Pass Block, Run Block;
- Pro Football Focus Defensive Stats: Run Defense, Pass Rush, Pass Coverage;
- Pro Football Focus Miscellaneous Stats: Penalty (combined), Special Teams;
- Salary Cap Space as of September 14, 2013, according to a Giants salary cap blog that cited the NFLPA's League Cap Report website at that point in time;
- Head Coach's Tenure at the start of the 2013 season; and
- Head Coach's Salary, according to Coaches Hot Seat. Publicly available data on coaches' salaries is admittedly not the greatest, and some of these values are likely estimates.
There are tons of other variables that I'm sure could be of interest; these were just the ones I wanted to study and for which I had data.
If you are familiar with regression analysis, you can skip the next two paragraphs. If you'd like a quick lesson, I did my best!
I used linear regression in an attempt to find out which of the above variables had the most significance in producing wins in 2013. The variables listed above are my independent variables, the "X" from algebra class. I used 2013 wins as my dependent variable, the "Y" from that same class. The goal is to use regression analysis to build a formula that predicts 2013 wins from the most significant independent variables on the list above. It would look something like this (illustrative numbers only): 2013 Wins = 6.5 + .02(Pass Offense) + .50(Run Defense).
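For readers who want to see where such a formula comes from, here is a minimal sketch of fitting one with ordinary least squares in Python, assuming NumPy is available. The post doesn't say which software produced its formulas, so this is only an illustration. The data are synthetic: wins for 32 made-up teams are generated from the illustrative formula above plus random noise, so the fitted coefficients should land near 6.5, .02 and .50. The function and variable names are hypothetical.

```python
import numpy as np


def fit_linear_formula(X, y):
    """Ordinary least squares with an intercept: returns [b0, b1, ..., bk]."""
    design = np.column_stack([np.ones(len(y)), X])  # column of 1s -> intercept
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


# Synthetic stand-ins for 32 teams (made-up numbers, not real PFF grades).
rng = np.random.default_rng(0)
pass_offense = rng.uniform(-40, 40, size=32)
run_defense = rng.uniform(-5, 5, size=32)
# Wins generated from the illustrative formula above, plus random noise.
wins = 6.5 + 0.02 * pass_offense + 0.50 * run_defense + rng.normal(0, 0.5, size=32)

coefs = fit_linear_formula(np.column_stack([pass_offense, run_defense]), wins)
print(f"Wins = {coefs[0]:.2f} + {coefs[1]:.3f}(Pass Offense) + {coefs[2]:.2f}(Run Defense)")
```

Because of the noise, the recovered coefficients won't match 6.5, .02 and .50 exactly, which is the same reason the real regression reports a standard error.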
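Okay, so how will I determine the most significant variables from the list? Each time a regression analysis is run, each X variable receives a p-value, expressed as a decimal between 0 and 1. In layman's terms, a p-value tells us how likely we would be to see a relationship at least as strong as the one in the data if there were actually no relationship between that X variable and the Y variable, that is, if the apparent link were pure chance. The smaller the p-value, the harder it is to write the relationship off as a fluke. For example, a p-value of .05 means a relationship that strong would show up by chance only 5% of the time; .05 is the usual cutoff for calling a variable statistically significant, often described as 95% confidence. So if one of the X variables above has a p-value of .07, the regression is saying a relationship that strong would appear by chance 7% of the time even if none existed, which falls just short of that cutoff.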
I imagine that all of the above variables could have some effect on producing wins (as well as other variables not included), but I wanted to find the most significant variables from the list I put together. To do this, I ran multiple regressions, eliminating the least significant variable (the one with the largest p-value) after each run, six times in total. I kept eliminating variables until every remaining variable had a p-value less than or equal to .05, the conventional 95% confidence threshold.
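The elimination procedure above is usually called backward elimination. Below is a minimal sketch of it in Python, assuming NumPy is available; it is not the author's actual workflow, which the post doesn't specify. Each p-value comes from the standard OLS t-test, and the t distribution is evaluated with the closed-form series from Abramowitz & Stegun (26.7.3 to 26.7.4), which works for whole-number degrees of freedom, so no statistics package is needed. The data and variable names are hypothetical.

```python
import math

import numpy as np


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with whole-number df (A&S 26.7.3-26.7.4)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_th, cos_th = math.sin(theta), math.cos(theta)
    odd = df % 2 == 1
    k, coef, series = (1 if odd else 0), 1.0, 0.0
    while k <= df - 2:  # odd df: powers 1..df-2; even df: powers 0..df-2
        series += coef * cos_th ** k
        coef *= (k + 1) / (k + 2)
        k += 2
    inside = (2 / math.pi) * (theta + sin_th * series) if odd else sin_th * series
    return max(0.0, 1.0 - inside)


def ols_p_values(X, y):
    """Fit y on X plus an intercept; return the p-value of each column of X."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coefs
    df = n - k - 1
    sigma2 = resid @ resid / df  # estimated residual variance
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    return [t_two_sided_p(b / se, df) for b, se in zip(coefs[1:], std_err[1:])]


def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the variable with the largest p-value until all are <= alpha."""
    keep = list(range(X.shape[1]))
    dropped = []
    while keep:
        p_values = ols_p_values(X[:, keep], y)
        worst = max(range(len(keep)), key=lambda i: p_values[i])
        if p_values[worst] <= alpha:
            break
        dropped.append(names[keep.pop(worst)])
    return [names[i] for i in keep], dropped


# Hypothetical data: wins depend on the first two columns only.
rng = np.random.default_rng(1)
names = ["Pass Offense", "Run Defense", "Noise A", "Noise B"]
X = rng.normal(size=(32, 4))
y = 8 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=32)
kept, dropped = backward_eliminate(X, y, names)
print("kept:", kept, "dropped in order:", dropped)
```

Refitting after every drop matters: removing one variable changes the remaining p-values, which is why the procedure removes only one variable per run.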
The variables were eliminated in the following order, from least significant to most significant in producing wins in 2013:
- Rush Offense
- Run Block
- Coach Salary
- Pass Block
- Pass Rush
- Special Teams
The six variables that achieved statistically significant p-values of <=.05 were*:
- Cap Space, p-value = .02523
- Coach Tenure, p-value = .04643
- Pass Offense, p-value = .00004
- Run Defense, p-value = .00016
- Pass Coverage, p-value = .00029
- Penalty (combined), p-value = .00649
*Last year's significant variables were Cap Space, Pass Offense, Pass Coverage, and Special Teams.
This regression analysis resulted in the following formula*:
- 2013 Wins = 6.80 - .12(Cap Space) - .19(Coach Tenure) + .04(Pass Offense) + .02(Run Defense) + .04(Pass Coverage) + .06(Penalty)
*R Square = .81, Adjusted R Square = .77, Standard Error = 1.50
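As a quick sanity check on those fit statistics: adjusted R Square penalizes R Square for the number of predictors, using adj R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. A minimal Python sketch, assuming n = 32 teams and k = 6 predictors for the 2013 formula:

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# 32 teams and 6 predictors; R Square is reported rounded to .81,
# so check both ends of the range that rounds to .81.
for r2 in (0.805, 0.81, 0.815):
    print(f"R^2 = {r2:.3f} -> adjusted R^2 = {adjusted_r_squared(r2, 32, 6):.3f}")
```

With R Square at exactly .81 this gives about .764, so the reported .77 suggests the unrounded R Square was slightly above .81 (roughly .8105 or higher), which still rounds to .81.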
Cap Space, Pass Offense, and Pass Coverage have carried over from last year's list, while Coach Tenure, Run Defense, and Penalty are new additions. The first thing that caught my eye was that Coach Tenure had a negative effect on a team's 2013 wins. That is, the longer a team's head coach was in place, the less likely his team was to win (all else equal). This variable was influenced by strong seasons from relatively new coaches (e.g., Broncos, Seahawks, Panthers) and totally new coaches (e.g., Chiefs, Cardinals, Eagles, and Chargers).
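To see what "all else equal" means here, below is a minimal Python sketch that plugs values into the 2013 formula above. The team values are hypothetical: the post doesn't list the underlying data or the units for Cap Space, and the coefficients are the rounded ones shown, so the numbers are illustrative only. Adding one year of coach tenure while holding everything else fixed changes expected wins by exactly the tenure coefficient.

```python
# Rounded coefficients of the 2013 formula reported above.
FORMULA_2013 = {
    "intercept": 6.80,
    "cap_space": -0.12,
    "coach_tenure": -0.19,
    "pass_offense": 0.04,
    "run_defense": 0.02,
    "pass_coverage": 0.04,
    "penalty": 0.06,
}


def expected_wins(formula, team):
    """Intercept plus coefficient * value for each variable in the formula."""
    return formula["intercept"] + sum(
        coef * team[name] for name, coef in formula.items() if name != "intercept"
    )


# Hypothetical inputs for an imaginary team (not real 2013 data).
team = {"cap_space": 10, "coach_tenure": 3, "pass_offense": 20,
        "run_defense": 10, "pass_coverage": 15, "penalty": 5}
# Same team, but with a coach one year further into his tenure.
longer_tenure = dict(team, coach_tenure=team["coach_tenure"] + 1)

base = expected_wins(FORMULA_2013, team)
change = expected_wins(FORMULA_2013, longer_tenure) - base
print(f"expected wins: {base:.2f}, effect of one more year of tenure: {change:+.2f}")
```

For these made-up inputs the sketch prints 6.93 expected wins and a change of -0.19 for the extra year of tenure.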
For 2012, the Penalty variable was the 4th least significant of all the variables tested. For 2013, it was the 4th MOST significant, which may indicate the importance of staying disciplined as a football team. Or, perhaps more likely, it indicates that 2013 was indeed a bad year for NFL officials, and teams that happened to be on the favorable side of significant referee miscues (ahem) were more likely to win games.
Here you can see the list of NFL teams with their Actual 2013 Wins alongside their Expected 2013 Wins as predicted by the formula. Keep in mind that the formula knows nothing about NFL scheduling (how many total wins there are to go around); it only estimates how many wins each team should have gotten based on its own numbers:
As you can see, the 2013 formula comes pretty close to replicating each team's win total when using its 2013 variables. If you read last year's article, you'll remember that these regression formulas can't really be used to "predict" future results, as they rely on games/variables that have already occurred and are specific to that year. With that in mind, let's compare Actual 2013 Wins to Expected 2013 Wins using the 2012 formula from last year's article. Just to be clear, this table shows each team's expected wins when its 2013 data are plugged into the 2012 formula. Make sense? Here it is:
While not as close as the 2013 formula, the 2012 formula does fairly well with 2013 data, though it's clearly not a fan of the Eagles. However, note that the 2012 formula dished out only 219 victories across the league (as opposed to the 255 actual wins in the above table), which is clearly a shortcoming of using these formulas as a "prediction source."
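Because each team's expected wins are computed independently, nothing forces the league-wide total to match the number of games actually played. A minimal Python sketch of that check; the formula and team values are hypothetical (the 2012 coefficients aren't reproduced in this post), and it assumes a 32-team league playing 16 games each.

```python
GAMES_PER_TEAM = 16
N_TEAMS = 32
# Every game produces at most one win (a tie produces none).
MAX_LEAGUE_WINS = GAMES_PER_TEAM * N_TEAMS // 2


def expected_wins(formula, team):
    """Intercept plus coefficient * value for each variable in the formula."""
    return formula["intercept"] + sum(
        coef * team[name] for name, coef in formula.items() if name != "intercept"
    )


def league_total(formula, teams):
    """Total expected wins the formula hands out across all teams."""
    return sum(expected_wins(formula, team) for team in teams)


# Hypothetical formula and league: every team grades +10 on pass offense.
toy_formula = {"intercept": 8.0, "pass_offense": 0.05}
toy_league = [{"pass_offense": 10.0} for _ in range(N_TEAMS)]
total = league_total(toy_formula, toy_league)
print(f"formula total: {total:.1f} wins; at most {MAX_LEAGUE_WINS} are available")
```

Here the toy formula hands out 272 wins when at most 256 exist; the 2012 formula's 219 is the same problem in the opposite direction.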