Hi fellow stathead! In this post, I'm going to be using statistical analysis in an attempt to find out what NFL team attributes most contributed to victories in the 2012 NFL season. First, here are the attributes I used for each NFL team:
- Pro Football Focus Offensive Stats: Pass, Rush, Pass Block, Run Block;
- Pro Football Focus Defensive Stats: Run Defense, Pass Rush, Pass Coverage;
- Pro Football Focus Miscellaneous Stats: Penalty, Special Teams;
- Salary Cap Space as of September 7, 2012 according to ProFootballTalk.com;
- Head Coach's Tenure at the start of the 2012 season; and
- Head Coach's Salary - combined from many sources including Coaches Hot Seat, Forbes, and Wikipedia. Publicly available data on coaches' salaries is admittedly not the greatest.
There are tons of other variables that I'm sure could be of interest; these were just the ones I wanted to study and for which I had data.
If you are familiar with regression analysis, you can skip the quick lesson that follows and jump ahead to where I start eliminating variables. If you'd like the lesson, I did my best!
I used linear regression in an attempt to find out which of the above variables had the most significance in producing wins in 2012. The variables listed above are my independent variables, the "X" from algebra class. I used 2012 wins as my dependent variable, the "Y" from that same class. Regression analysis then produces a formula for 2012 wins as predicted by the most significant independent variables from the above list. It looks something like this (the numbers here are only an illustration): 2012 Wins = 6.5 + .02(Pass Offense) + .50(Run Defense)
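To make the idea of "producing a formula" concrete, here is a minimal sketch of a least-squares fit with a single X variable in plain Python. It is a simplification (the analysis in this post uses many X variables at once), and the grades and win totals are made-up numbers chosen so the result is easy to check by hand, not real 2012 data.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = intercept + slope * xs for a single X variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = how much x and y move together, divided by how much x moves on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical pass-offense grades and win totals for four imaginary teams.
pass_grades = [-10.0, 0.0, 10.0, 20.0]
wins = [5.5, 6.5, 7.5, 8.5]

intercept, slope = fit_line(pass_grades, wins)
print(f"Wins = {intercept:.2f} + {slope:.2f}(Pass Offense)")
```

With several X variables the arithmetic gets heavier, but the idea is the same: pick the intercept and coefficients that make the formula's predictions miss the actual win totals by as little as possible.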
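Okay so, how will we determine the most significant variables from the list? Each time a regression analysis is run, each X variable receives a p-value. In layman's terms, a p-value tells us how likely we would be to see a relationship at least this strong between the X variable and the Y variable if, in reality, there were no relationship at all and it was pure chance. P-value is expressed as a decimal between 0 and 1. For example, a p-value of .05 means that if the variable truly had nothing to do with wins, we would see a link this strong only about 5% of the time. That is not quite the same as saying there's a 95% chance the relationship is real, though it often gets read that way. The smaller the p-value, the harder it is to write the relationship off as luck. So if one of the X variables above has a p-value of .07, the regression is saying a link that strong with 2012 wins would show up by chance alone only about 7% of the time.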
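One way to see what a p-value measures is a shuffle (permutation) test: scramble the win totals so any real link to the X variable is destroyed, and count how often the scrambled data looks at least as related as the real data did. This is a sketch of that idea only. Regression software normally gets its p-values from a t-test instead, and the function name and the numbers below are invented for illustration.

```python
import random


def permutation_p_value(xs, ys, trials=5000, seed=0):
    """Share of random shuffles whose x-y link is at least as strong as the observed one."""
    rng = random.Random(seed)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]

    def strength(values):
        # Absolute co-movement of x and y. The same numbers are only reordered,
        # so this ranks shuffles exactly as the absolute correlation would.
        return abs(sum(a * (v - mean_y) for a, v in zip(dx, values)))

    observed = strength(ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if strength(shuffled) >= observed:
            hits += 1
    return hits / trials


# Hypothetical grades and win totals for ten imaginary teams (not 2012 data).
grades = [-20, -15, -10, -5, 0, 5, 10, 15, 20, 25]
wins = [3, 4, 4, 6, 7, 8, 9, 10, 12, 13]

p = permutation_p_value(grades, wins)
print(f"p-value from shuffling: {p}")
```

Because wins rise steadily with the grade in this made-up data, almost no shuffle can match the real ordering, so the p-value comes out very small.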
I imagine that all of the above could have some effect on producing wins, but we want to find the most significant variables from the list I made up. To do this, I ran multiple regressions, eliminating the least significant variable (largest p-value) each subsequent time (9 total times). I kept eliminating variables until the p-values of the remaining variables were all approximately .05 or lower, which is the conventional cutoff for calling a variable statistically significant.
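The elimination procedure described above can be sketched in plain Python. This is not the tool used for this post, just an illustration under stated assumptions: the helper names are hypothetical, the data is randomly generated (one variable that truly drives wins plus two that are pure noise), and the loop uses a hard .05 cutoff, whereas the stopping rule here was "approximately .05".

```python
import math
import random


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_two_sided_p(t, dof, steps=2000):
    """Two-sided p-value for a t statistic, integrating the t density with Simpson's rule."""
    c = math.exp(math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)) / math.sqrt(dof * math.pi)
    b = abs(t)
    h = b / steps

    def pdf(x):
        return c * (1 + x * x / dof) ** (-(dof + 1) / 2)

    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)


def ols_p_values(rows, y):
    """Fit y on the given columns plus an intercept; return (coefficients, p-values)."""
    design = [[1.0] + list(r) for r in rows]
    n, k = len(design), len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]
    dof = n - k
    sigma2 = sum(e * e for e in resid) / dof
    p_values = [t_two_sided_p(beta[i] / math.sqrt(sigma2 * inv[i][i]), dof) for i in range(k)]
    return beta, p_values


def backward_eliminate(data, names, y, threshold=0.05):
    """Refit repeatedly, dropping the variable with the largest p-value each time."""
    keep, dropped = list(names), []
    while keep:
        rows = [[team[name] for name in keep] for team in data]
        _, p_values = ols_p_values(rows, y)
        worst_p, worst = max(zip(p_values[1:], keep))  # skip the intercept's p-value
        if worst_p <= threshold:
            break
        keep.remove(worst)
        dropped.append(worst)
    return keep, dropped


# Randomly generated stand-in data: 32 imaginary teams, one real driver of wins.
rng = random.Random(2012)
names = ["signal", "noise_a", "noise_b"]
teams, wins = [], []
for _ in range(32):
    team = {name: rng.gauss(0, 1) for name in names}
    teams.append(team)
    wins.append(8 + 3 * team["signal"] + rng.gauss(0, 0.5))

kept, dropped = backward_eliminate(teams, names, wins)
print("kept:", kept, "| dropped, in order:", dropped)
```

The `dropped` list comes back in elimination order, least significant first, which is the same way the list below is ordered.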
The variables were eliminated in the following order, from least significant to most significant in producing wins in 2012*:
- Run Block
- Pass Rush
- Run Defense
- Penalty (combined for offense and defense)
- Pass Block
- Coach's Salary
- Rush Offense
- Coach's Tenure
*I don't know exactly what goes into the Pro Football Focus stats, so I don't know if Pass Block and Run Block are incorporated into the Pass Offense and Rush Offense stats. The two passing stats (Pass Offense and Pass Block) have a correlation of .40, and the two running stats (Rush Offense and Run Block) have a correlation of .62. Do with that information what you will.
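For anyone curious how a correlation like the ones in the footnote is calculated, here is a small sketch of the standard Pearson correlation in plain Python. The grades are invented for five imaginary teams, so the printed value will not match the .40 or .62 above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: -1 is a perfect inverse link, 0 is none, +1 is a perfect link."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical grades, not real Pro Football Focus numbers.
pass_grades = [12.0, -3.5, 20.1, 5.0, -8.2]
pass_block_grades = [4.0, 1.5, 9.0, -2.0, -6.0]

r = pearson_r(pass_grades, pass_block_grades)
print(f"correlation: {r:.2f}")
```

When two X variables are strongly correlated, a regression has trouble telling their effects apart, which is one reason one of the pair can look insignificant even if it matters.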
The four variables left standing, each with a p-value at or near .05, were:
- Pass Coverage, p-value = .0657
- Pass Offense, p-value = .052
- Cap Space, p-value = .049
- Special Teams, p-value = .012
This regression analysis resulted in the following formula*:
- 2012 Wins = 6.57 + .02(Pass) + .03(Pass Coverage) + .06(Special Teams) - .16(Cap Space)
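*R Square = .59, Adjusted R Square = .53, Standard Error = 2.11. In other words, the formula explains about 59% of the team-to-team variation in 2012 wins, and its predictions are typically off by about 2 wins.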
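As a quick sanity check, the formula and its footnote can be put into a few lines of Python. The team plugged in below is hypothetical (its inputs are just round numbers in the same units as the original data, which I'm assuming here rather than restating), and the adjusted R Square check assumes 32 teams and the 4 remaining variables.

```python
def expected_wins(pass_offense, pass_coverage, special_teams, cap_space):
    """Plug one team's numbers into the fitted formula from this post."""
    return (6.57
            + 0.02 * pass_offense
            + 0.03 * pass_coverage
            + 0.06 * special_teams
            - 0.16 * cap_space)


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjustment that penalizes R Square for each extra X variable."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# A made-up team, only to show the arithmetic.
example = expected_wins(pass_offense=20.0, pass_coverage=10.0,
                        special_teams=15.0, cap_space=5.0)
print(f"expected wins: {example:.2f}")

# 32 NFL teams, 4 X variables left in the final regression.
check = adjusted_r_squared(0.59, n_obs=32, n_predictors=4)
print(f"adjusted R Square: {check:.2f}")
```

Starting from the rounded R Square of .59, the adjustment lands on .53, which agrees with the figure reported in the footnote.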
At face value, this appears to make sense. It's a passing league, as they say, and this formula is saying that the most successful teams in 2012 were the ones who passed the ball well and stopped the pass, while executing on special teams and spending more money on players (the negative sign on Cap Space means teams with less unused cap room tended to win more).
Here you can see the list of NFL teams with their Actual 2012 Wins alongside their Expected Number of Wins as predicted by the formula:
Obviously this "formula" can't predict the future, but it is fun to see the important attributes of successful teams in 2012. If we ran the same analysis on earlier seasons, though, we could use this methodology to view trends in the League over time, which could help shape our predictions about its future. But that's another post for another time, unless you hate this one. Cheers!