Variables that affect the Super Bowl Winners
By: Hailee Gunn, Kristina Markey, Kevin Mathews, Christina Mavroudisc, and Donna Moulton
Table of Contents
Executive Summary
Description of the Data
Charts and Graphs
Results of Data
Discussion of Results
References
Appendix
Executive Summary:
This report gives an overview of the winning Super Bowl teams and how different
variables affected the outcome of the games. The variables that we tested for this project are:
stadium type (indoor or outdoor), the weather (sunny, rain, snow), interval win (the margin of
victory), average age, teams that won the Super Bowl more than once, and teams that appeared in
the Super Bowl more than once. To do this, we looked through online sports databases to find our
data and put it into Excel. We then made charts to visualize our data and finally ran our
regression analysis to see which variables were significant. Below are the results of our
analysis. We ran the regression multiple times and found that the only significant
variables in our regression were interval win (through the intercept, the average interval win)
and the team variables for Pittsburgh, New England, and the NY Giants. From the beginning we
felt that interval win would be a significant indicator, and we were correct in predicting this.
However, most of our other variables were found to be insignificant. We found that the
Pittsburgh team was significant; they were one of the teams that won back to back, and they
actually did so twice. We concluded that winning a Super Bowl more than once is a significant
variable for these three teams specifically.
Description of the Data:
The main source for our Super Bowl data was the Database Football website
(http://www.databasefootball.com/leagues/superbowl.htm). The information we found in this
database was the winning teams, the year of each Super Bowl, and the winning and losing
scores. We used this to compute our Y variable, interval win. For the rest of our data we
found a PDF of the winning teams, by year, that also included the stadium name, stadium type,
and weather on the day of the game. The PDF was created by William Schmitz of the
Southeast Regional Climate Center (https://www.sercc.com/SuperBowlClimate.pdf). We also
found some data on online football websites that had articles on some of the winning teams
along with different team stats. The links to those websites can be found on our reference page.
When we first set up our Excel document, we had 51 rows, including our labels, and
24 columns of variables that we used to run the regression analysis. Our Y variable
was interval win. Our X variables were winner of the game, loser of the game, Super Bowl,
stadium type (outdoor or dome), weather (rain or snow), average age, and back to back win. Our
dummy variables were Green Bay, Miami, Pittsburgh, San Francisco, Dallas, Denver, New
England, Baltimore, and NY Giants. We chose these teams because they had either back to back
wins or won the Super Bowl more than once. After running this regression, we found that only
the outdoor variable, which indicates that the game was played in an outdoor stadium rather
than a dome, was significant. Therefore, we changed our variables and re-ran the regression.
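The team dummy variables described above can be sketched in a few lines of Python. This is only an illustration of 0/1 coding, not the Excel workbook used for the project; the names `DUMMY_TEAMS` and `team_dummies` are hypothetical.

```python
# Teams that received a dummy variable in the report.
DUMMY_TEAMS = [
    "Green Bay", "Miami", "Pittsburgh", "San Francisco", "Dallas",
    "Denver", "New England", "Baltimore", "NY Giants",
]

def team_dummies(winner):
    """Return a 0/1 indicator for each dummy team, given the winning team."""
    return {team: int(winner == team) for team in DUMMY_TEAMS}

row = team_dummies("Pittsburgh")
print(row["Pittsburgh"], row["Dallas"])  # prints: 1 0
```

A winner that is not in the list (for example Chicago) gets a 0 in every column, so those teams form the baseline that the dummy coefficients are measured against.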
In setting up our second Excel document we again had 51 rows and 24 columns. We still used
interval win as our Y variable. However, we only ran 16 of our X variables. Our X variables were
outdoor bad weather, average age, and back to back win. Our dummy variables were Green
Bay, Miami, Pittsburgh, San Francisco, Dallas, Denver, New England, Baltimore, and the NY
Giants. In running this regression we found three significant variables; however, we also found
a user error and had to run the regression one more time.
The final time that we ran the regression we had 51 rows and 12 columns. Once again
we used interval win as our Y variable. Our final X variables were Pittsburgh, New England, and
NY Giants. With this, we found that all variables were significant.
Charts and Graphs:
We made a bar graph of interval win for each Super Bowl's winning team. The interval win is
the margin of victory (the winning team's final score minus the losing team's final score),
and it was our Y variable for the regression. This chart gives us a visual representation of
not only who won each Super Bowl but also how much the winning team won by. As one can see,
the highest overall interval win was held by San Francisco and the lowest was held by the
NY Giants.
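As a small worked example of the interval win calculation, the sketch below subtracts the losing score from the winning score. The function name `interval_win` is hypothetical, and the two final scores are taken from public game records rather than from the project spreadsheet, so they should be checked against the data set before being relied on.

```python
def interval_win(winner_score, loser_score):
    """Return the margin of victory: winning score minus losing score."""
    if winner_score < loser_score:
        raise ValueError("winner_score must not be lower than loser_score")
    return winner_score - loser_score

# Final scores from public records (assumed to match the project data).
largest = interval_win(55, 10)   # San Francisco, Super Bowl XXIV
smallest = interval_win(20, 19)  # NY Giants, Super Bowl XXV
print(largest, smallest)  # prints: 45 1
```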
In this chart we used a pivot table to show the total number of times each team won the Super
Bowl. Two of our X variables were back to back win and teams that won more than once, so we
made a bar graph from the pivot table data to get a visual representation of the
totals. As you can see, Pittsburgh won the most Super Bowls, while Chicago,
the Colts, Kansas City, the LA Raiders, the Jets, the Saints, Seattle, St. Louis, and Tampa Bay
have won only once.
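The pivot-table count of wins per team can be approximated outside Excel as well. The sketch below uses Python's `collections.Counter`; the `winners` list is only a short illustrative excerpt (the winners of Super Bowls I through IV), not the full 50-game data set, and `wins_per_team` is a hypothetical helper name.

```python
from collections import Counter

# Illustrative excerpt only: winners of Super Bowls I-IV.
winners = ["Green Bay", "Green Bay", "Jets", "Kansas City"]

def wins_per_team(winners):
    """Count how many times each team appears in the list of winners."""
    return Counter(winners)

totals = wins_per_team(winners)
for team, wins in totals.most_common():
    print(team, wins)
```

Running the same function on the full list of 50 winners would give the totals that the bar graph displays.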
Results of Data Modeling with Regression:
Our first data set had 24 variables and all 50 Super Bowl games. The regression analysis
showed no significant outcomes and returned multiple number errors. We then went back through
our data set, eliminated some of the variables, replaced them with other variables, and re-ran
the regression analysis. When we re-ran the regression analysis with 13 variables, we found
three significant variables: Pittsburgh, New England, and the NY Giants. After looking at the
results of this analysis we decided to adjust our data and run the regression analysis one more
time, eliminating a few more insignificant variables and readjusting the rest. When we ran the
regression analysis this time we found we had more significant variables than we did the first
two times. We found that Pittsburgh, New England, and the NY Giants were the only significant
variables, along with our intercept, which was the average interval win. Below you will find a
more detailed outline of the results. As previously stated, the third time that we ran the
regression was when we received the best and most accurate results. The linear equation for
this set of data is y = 17.53 - 10.03x1 - 14.28x2 - 10.78x3, where x1, x2, and x3 equal 1 when
the team is Pittsburgh, New England, or the NY Giants, respectively, and 0 otherwise. This
means that if Pittsburgh were to appear in another Super Bowl, they would be predicted to win
by, or have an interval win of, 17.53 - 10.03 = 7.5 points. If New England were to appear in
the Super Bowl again, they would be predicted to have an interval win of 17.53 - 14.28 = 3.25
points. If the NY Giants were to appear in another Super Bowl, they would be predicted to have
an interval win of 17.53 - 10.78 = 6.75 points.
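The three predictions above follow directly from the fitted equation, and the arithmetic can be checked with a short sketch. The intercept and coefficients are the ones reported in the equation; the names `INTERCEPT`, `COEFFICIENTS`, and `predicted_interval_win` are hypothetical, and treating every other team as having no coefficient is an assumption based on the final model containing only these three team variables.

```python
INTERCEPT = 17.53
COEFFICIENTS = {
    "Pittsburgh": -10.03,
    "New England": -14.28,
    "NY Giants": -10.78,
}

def predicted_interval_win(team):
    """Intercept plus the team's dummy coefficient (0 for any other team)."""
    return round(INTERCEPT + COEFFICIENTS.get(team, 0.0), 2)

for team in COEFFICIENTS:
    print(team, predicted_interval_win(team))
```

For a team without its own dummy variable, the function returns the intercept alone, 17.53 points.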
Discussion of Results:
When we ran the regression for the final time, the only significant
variables in our regression were interval win, Pittsburgh, New England, and the NY Giants.
From the beginning we felt that interval win would be a significant indicator, and we were
correct in predicting this. However, most of our other variables were found to be insignificant.
One variable that we found to be significant was the Pittsburgh team, which was one of
the teams that won back to back and actually did so twice. Two other variables that we
found to be significant were New England and the NY Giants. Both of these teams had not only
been to the Super Bowl more than once but also won the Super Bowl more than once.
Therefore, we believe that winning a Super Bowl more than once is a significant variable for
these three teams specifically. After running the regression and analysing the data we found a
connection between the average interval win and the coefficients for our significant variables.
We believe that Pittsburgh was seen as a significant variable because they
have the most overall Super Bowl wins. This gives them a pretty good chance of winning
another Super Bowl if they were to make it to the Super Bowl again. Therefore, it would be easy
to predict how many points the team would win by. Their average interval win is 7.5 points
across the times that they have won the Super Bowl. Our regression predicts that if
Pittsburgh were to make it to another Super Bowl, they would win by 7.5 points, which is the
intercept plus the coefficient for Pittsburgh (17.53 - 10.03). As you can see, their average
interval win and their predicted value from the regression are the same number.
New England was another team that won the Super Bowl more than once. They won a
total of 4 times. With this, there is also a good chance that if they were to appear in another
Super Bowl they would win. The average interval win for this team is 3.25 points. With this, our
data predicts that they would win by an average of 3.25 points, which is the intercept plus the
coefficient for this variable in our regression (17.53 - 14.28). Once again, the average
interval win and the predicted value for the significant variable are the same number.
The NY Giants were, again, one of the teams that had won the Super Bowl more than
one time. Like New England, they won a total of 4 times. Their average interval win for the four
times that they won the Super Bowl was 6.75 points. With this, our data predicted that if the NY
Giants were to appear in the Super Bowl again they would win by an average of 6.75 points
(17.53 - 10.78). Yet again, the average interval win and the predicted value for this
significant variable are the same.
After realizing this, we can see that the regression we ran predicted these three teams'
interval wins, if they were to make it to the Super Bowl another time, by averaging their past
interval wins. We agree with the data that this is a good way of predicting a future interval
win for a team returning to the Super Bowl.
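The observation that each team's prediction equals its past average is a general property of a least-squares regression that contains only an intercept and team dummy variables. The sketch below illustrates this with a small hand-written least-squares fit. It rests on several assumptions: the margins for Pittsburgh, New England, and the NY Giants are filled in from public final scores (their averages agree with the 7.5, 3.25, and 6.75 reported above), while the "Other" margins are placeholder values standing in for the remaining games, so the intercept here will not equal 17.53. The helper names `solve` and `ols` are hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination using exact fractions."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

margins = {
    "Pittsburgh": [10, 4, 4, 12, 11, 4],  # from public final scores
    "New England": [3, 3, 3, 4],          # from public final scores
    "NY Giants": [19, 1, 3, 4],           # from public final scores
    "Other": [45, 17, 10, 25],            # placeholder values, not real data
}
teams = ["Pittsburgh", "New England", "NY Giants"]

# Design matrix: a column of ones (intercept) plus one 0/1 column per team.
X, y = [], []
for team, values in margins.items():
    for margin in values:
        X.append([1] + [int(team == t) for t in teams])
        y.append(margin)

coefficients = ols(X, y)
intercept = coefficients[0]
fitted = {t: intercept + coefficients[i + 1] for i, t in enumerate(teams)}
means = {t: Fraction(sum(v), len(v)) for t, v in margins.items()}

for t in teams:
    print(t, float(fitted[t]), float(means[t]))
```

In this setup the intercept is the average margin of the teams without a dummy variable, and each dummy coefficient is the difference between that team's average and the intercept, which is why the fitted value and the team average always match.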
References:
Works Cited
"Checkdowns: How the Ravens and 49ers Compare to Previous Super Bowl Teams."
FootballPerspective.com. 30 Jan. 2013. Web. 04 May 2016.
Lubinger, Bill. "Breaking down the Last Five Super Bowl Champions."
Cleveland.com. The Plain Dealer, 12 Feb. 2012. Web. 23 Apr. 2016.
"Google." Google. Web. 23 Apr. 2016.
"Ranking NFL Teams by Age: Rams Youngest, Colts Oldest, Eagles Fifth-oldest."
PhillyVoice. Web. 23 Apr. 2016.
"Super Bowl 2014 Rosters: Peyton Manning, Richard Sherman among Many Stars."
SBNation.com. 02 Feb. 2014. Web. 23 Apr. 2016.
"Super Bowl Winners." List of. Web. 23 Apr. 2016.
"Super Bowl and MVP History, Historical Odds, Stadiums, Locations and More on
DatabaseFootball.com." Super Bowl and MVP History, Historical Odds, Stadiums,
Locations and More on DatabaseFootball.com. Web. 04 May 2016.