Super Bowl XLIX: Worst Call Ever?

Posted on Feb 5, 2015
  1. Contributed by James Hedges and Malcolm Hess.
  2. James and Malcolm are part of the 12-Week Data Science Bootcamp with Vivian Zhang in the spring of 2015.
  3. This post is based on their first in-class presentation, a review of Benjamin Morris’ article on FiveThirtyEight.com related to Seattle’s final offensive play in Super Bowl XLIX.

 


Videos

1. Video of the presentation can be found here:

https://www.youtube.com/watch?v=ZZl8r_K7I5E&feature=youtu.be


 Background
We're interested in applying statistical and analytical approaches to competitive sports and in gaining surprising insights from doing so. To that end, we discussed a recent article by FiveThirtyEight.com’s Benjamin Morris in which he builds support for a contrarian position on the decision underlying what may be remembered as one of the most impactful plays in Super Bowl history. He develops a probabilistic model in support of the conclusion that Seattle’s decision to throw the ball on second down from the 1-yard line wasn’t actually a bad decision.
We wanted to learn more about his model and to see whether we could implement a version of it ourselves. We also wanted to provide some context for it and to consider other approaches to problems of this kind. Doing well with such problems may hinge on understanding and simplifying the dependencies between a context (e.g., second down, 1-yard line, down by 4 points), a decision (e.g., run the ball or throw the ball), a specific outcome (e.g., a touchdown or an interception), and a more general outcome (e.g., win or lose the game).
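
As a rough sketch of how these four pieces might fit together (this is not taken from Morris' article), the law of total probability links them. The symbols are introduced here for illustration only: c is the context, d the decision, o the specific play outcome, and W the event of winning the game. The sketch assumes the decision affects the chance of winning only through the play outcome.

```latex
P(W \mid c, d) = \sum_{o} P(o \mid c, d)\, P(W \mid o, c)
```

Under that assumption, choosing between a run and a pass amounts to comparing P(W | c, run) with P(W | c, pass).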

 


 

 

Objectives

We initially attempted to recreate a primary result from the article, in which the probabilities of sequential play outcomes and overall game outcomes are computed, and in which those estimates change based on some additional assumptions. While recreating the model is an important objective, we felt it was unrealistic to reach that point without more information on the data Morris used and on how the model was actually computed. Our attention instead turned to numerically replicating some elements of the model, such as the probability of scoring a touchdown on a run play, and to evaluating whether tweaks to the model were reasonable.


 

Situation

  1. We start with the play. This image from NFL Breakdowns shows Seattle in a shotgun formation at New England's 1-yard line in the seconds just prior to the snap. Seattle trailed by four points, so a touchdown would have put them up by three (assuming they go for and get the PAT), something many observers assumed was going to happen. Instead, Russell Wilson attempted a pass on a slant route to Ricardo Lockette, and the pass was intercepted.

 

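[Image: Seattle in a shotgun formation at New England's 1-yard line just before the snap (source: NFL Breakdowns)]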

Football in 5 rules
  1. Points: touchdown = 6 pts; field goal = 3 pts; point after touchdown (PAT, an easy kick) = 1 pt.
  2. The attacking team (offense) scores a touchdown by getting the ball into the end zone (the area beyond the goal line).
  3. The offense has four attempts to move the ball 10 yds; if the series starts inside the 10-yard line, it has just those four attempts to reach the goal line.
  4. The ball is advanced by throwing it to someone who catches it or by someone running with it (i.e., pass or run the ball).
  5. A given play ends when the person with the ball is tackled or goes out of bounds, or when the ball is passed and not caught.
source: https://usafootball.com/football-basics

In a simpler view, imagine two bowls, each holding balls of three colors. Blindfolded, you pull out one ball at a time. Pull a red ball and you win, pull a black ball and you lose, and a yellow ball lets you pull again. However, if you pull three yellow balls in a row, you also lose. There are two bowls to choose from, one called run and one called pass, and each has a different number of red, yellow, and black balls.

Using this framing, we created a probability tree that includes all possible outcomes of this decision.

[Figure: probability tree for the run/pass decision]
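
The bowl picture can also be checked by simulation. The sketch below is illustrative only: the function name `simulate.bowl` and the ball counts are made up, and it assumes each ball is returned to the bowl after it is drawn, so the chances stay the same from one draw to the next.

```r
# Hypothetical helper: play the bowl game many times and return the share of wins.
# Assumes balls are drawn WITH replacement (our assumption, not stated in the text).
simulate.bowl <- function(n.red, n.yellow, n.black, max.draws = 3, n.games = 20000) {
  colors <- c("red", "yellow", "black")
  # one row per game, one column per draw; sample() rescales the weights itself
  draws <- matrix(
    sample(colors, n.games * max.draws, replace = TRUE,
           prob = c(n.red, n.yellow, n.black)),
    ncol = max.draws
  )
  # the first non-yellow ball decides the game; NA means every draw was yellow
  decisive <- apply(draws, 1, function(game) game[game != "yellow"][1])
  mean(!is.na(decisive) & decisive == "red")
}

set.seed(49)
# made-up bowl: 5 red, 4 yellow, 1 black
win.rate <- simulate.bowl(n.red = 5, n.yellow = 4, n.black = 1)
print(win.rate)
```

For this made-up bowl the exact answer is 0.5 * (1 + 0.4 + 0.4^2) = 0.78, so the simulated share should land close to that.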

 


 

Implementation

Play-by-play results for every NFL game of the 2014 season were found here: https://nflsavant.com/about.php
# get data ----------------------------------------------------------------
library(downloader)
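fileUrl <- "https://nflsavant.com/pbp_data.php?year=2014"
# create the target folder if it does not exist yet
if (!dir.exists("./data")) dir.create("./data")
download(fileUrl, destfile="./data/data.pbp.2014.csv", mode="wb")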
list.files("./data")
data.pbp.2014 <- read.csv("./data/data.pbp.2014.csv")
# check data --------------------------------------------------------------
str(data.pbp.2014)


# 45k observations by 45 vars


# 01 - GameId - integer - example: 2014090400 - date of game and two more digits

# 02 - GameDate - factor - example: 2014-09-04 - date of game

# 03 - Quarter - integer - example: 1 - quarter in game

# 04 - Minute - integer - example: 15 - minutes left in quarter

# 05 - Second - integer - example: 0 - seconds left in quarter

# 06 - OffenseTeam - factor - example: ARI - offensive team

# 07 - DefenseTeam - factor - example: ARI - defensive team

# 08 - Down - integer - example: 1 - down; not sure about 0?

# 09 - ToGo - integer - example: 10 - distance to go; not sure about 0?

# 10 - YardLine - integer - example: 35 - yard line (field position); not sure about 0? *******

# 11 - X - logical - example: ?? - not sure

# 12 - SeriesFirstDown - integer - example: 1 - series 1st down

# 13 - X.1 - logical - example: ?? - not sure

# 14 - NextScore - integer - example: 0 - check this ***************************

# 15 - Description - factor - example: "D.CARR.." - description

# 16 - TeamWin - integer - example: 0 - unclear - ******************************

# 17 - X.2 - logical - ?? - not sure

# 18 - X.3 - logical - ?? - not sure

# 19 - SeasonYear - integer - example: 2014 - season year

# 20 - Yards - integer - example: 0 - yards from result of play? ***************

# 21 - Formation - factor - example: SHOTGUN - simple formation on play

# 22 - PlayType

# 23 - IsRush - integer - example: 0 - whether rush play or not ****************

# 24 - IsPass - integer - example: 0 - whether pass play or not ****************

# 25 - IsIncomplete

# 26 - IsTouchdown - integer - example: 0 - whether play was touchdown or not***

# 27 - PassType

# 28 - IsSack

# 29 - IsChallenge

# 30 - IsChallengeReversed

# 31 - Challenger

# 32 - IsMeasurement

# 33 - IsInterception - integer - example: 0 - whether play was interception ***

# 34 - IsFumble - integer - example: 0 - whether play was fumble ******

# 35 - IsPenalty - integer - example: 0 - whether play was penalty ****

# 36 - IsTwoPointConversion

# 37 - IsTwoPointConversionSuccessful

# 38 - RushDirection

# 39 - YardLineFixed - integer - example: 35 - 0-50 yardline

# 40 - YardLineDirection - factor - example: OPP - which side of field

# 41 - IsPenaltyAccepted - integer - example: 0 - penalty accepted or not ******

# 42 - PenaltyTeam - factor - example: ARI - why 33 levels

# 43 - IsNoPlay - integer - example: 0 - not sure what this means

# 44 - PenaltyType - factor - example: BLOCKED INTO PUNTER

# 45 - PenaltyYards - integer - example: 5 - yards from penalty

Then we count the plays that meet all of the criteria: ball on the opponent's 1-yard line, no penalty, and the given play type. For each play type, pass and run, we need the total number of attempts, the number of touchdowns (successes), and the number of turnovers (fumbles for runs, interceptions for passes).

# probability of outcomes -------------------------------------------------

n.rush <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsRush == 1,])

n.rush.td <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsRush == 1 &
data.pbp.2014$IsTouchdown == 1,])

n.rush.no.td <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsRush == 1 &
data.pbp.2014$IsTouchdown == 0,])

n.rush.fumble <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsRush == 1 &
data.pbp.2014$IsFumble == 1,])

round(n.rush.td / n.rush, digits=3)
round(n.rush.no.td / n.rush, digits=3)
round(n.rush.fumble / n.rush, digits=4)

n.pass <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsPass == 1,])

n.pass.td <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsPass == 1 &
data.pbp.2014$IsTouchdown == 1,])

n.pass.no.td <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsPass == 1 &
data.pbp.2014$IsTouchdown == 0,])

n.pass.interception <- nrow(data.pbp.2014[
data.pbp.2014$YardLineFixed == 1 &
data.pbp.2014$YardLineDirection == "OPP" &
data.pbp.2014$IsPenalty == 0 &
data.pbp.2014$IsPass == 1 &
data.pbp.2014$IsInterception == 1,])

 

Lastly, we calculate the chances of success and failure by dividing those counts by the total number of attempts.

round(n.pass.td / n.pass, digits=3)
round(n.pass.no.td / n.pass, digits=3)
round(n.pass.interception / n.pass, digits=4)

# > round(n.rush.td / n.rush, digits=3)
# [1] 0.563
# > round(n.rush.no.td / n.rush, digits=3)
# [1] 0.437
# > round(n.rush.fumble / n.rush, digits=4)
# [1] 0.0101


# > round(n.pass.td / n.pass, digits=3)
# [1] 0.579
# > round(n.pass.no.td / n.pass, digits=3)
# [1] 0.421
# > round(n.pass.interception / n.pass, digits=4)
# [1] 0
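
The repeated subsetting above could be folded into one helper. This is a minimal sketch, not the code we ran: `goal.line.rates` is a hypothetical name, the `toy` data frame holds made-up rows rather than real plays, and `which()` drops rows where a flag is NA, which differs slightly from the direct subsetting used above.

```r
# Hypothetical helper: attempts and rates for one play type at the opponent's 1-yard line.
# Column names follow the nflsavant play-by-play file.
goal.line.rates <- function(pbp, play.col, turnover.col) {
  keep <- which(
    pbp$YardLineFixed == 1 &
    pbp$YardLineDirection == "OPP" &
    pbp$IsPenalty == 0 &
    pbp[[play.col]] == 1
  )
  plays <- pbp[keep, ]
  c(attempts = nrow(plays),
    td       = mean(plays$IsTouchdown == 1),
    turnover = mean(plays[[turnover.col]] == 1))
}

# Made-up rows for illustration only (not real plays).
toy <- data.frame(
  YardLineFixed     = c(1, 1, 1, 1, 20, 1),
  YardLineDirection = c("OPP", "OPP", "OPP", "OPP", "OPP", "OWN"),
  IsPenalty         = c(0, 0, 0, 1, 0, 0),
  IsRush            = c(1, 1, 0, 1, 1, 1),
  IsPass            = c(0, 0, 1, 0, 0, 0),
  IsTouchdown       = c(1, 0, 1, 0, 0, 0),
  IsFumble          = c(0, 1, 0, 0, 0, 0),
  IsInterception    = c(0, 0, 0, 0, 0, 0)
)

rush <- goal.line.rates(toy, "IsRush", "IsFumble")
pass <- goal.line.rates(toy, "IsPass", "IsInterception")
print(round(rbind(rush, pass), 3))
```

With the real data, the same two calls on data.pbp.2014 would replace the eight separate nrow() expressions.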

 

These success rates are used to judge whether the decision made in the Super Bowl was a good one. We used league-wide rates because we felt it was unwise to use an individual team's success rate: a single team in the 2014 season does not supply a big enough sample of plays with the exact parameters of this play (ball on the 1-yard line).
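
One way to see how much a small sample matters is to put a confidence interval around a rate. The sketch below uses `binom.test` from base R; the counts are hypothetical and chosen only for illustration, since this post reports rates rather than raw counts.

```r
# Hypothetical counts for illustration; substitute n.pass.td and n.pass from the real data.
touchdowns <- 11
attempts   <- 19

# Exact (Clopper-Pearson) 95% confidence interval for a binomial proportion.
ci <- binom.test(touchdowns, attempts)$conf.int
cat(sprintf("rate = %.3f, 95%% CI = [%.3f, %.3f]\n",
            touchdowns / attempts, ci[1], ci[2]))
```

With only a handful of attempts the interval is wide, which is why a gap of a point or two between the run and pass rates should be read cautiously.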

Conclusion 
We can recreate a victory probability model using these numbers. Doing so shows us that, in the 2014 data, passing from the 1-yard line was in fact slightly more likely to succeed than running the ball (0.579 versus 0.563). Unfortunately, we cannot compare our model to the one in the FiveThirtyEight article because that model has many built-in assumptions, including a significant change in success rate that depends on whether the first play was a run or a pass.
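
A very rough version of such a model can be sketched from the rates reported above. It is a simplification under stated assumptions, not Morris' model: it assumes three remaining plays, the same rates on every play, independence between plays, and it ignores the clock and timeouts. The names `rates` and `p.td.sequence` are hypothetical.

```r
# Per-play rates reported above (2014 season, ball on the opponent's 1-yard line).
rates <- list(
  run  = c(td = 0.563, turnover = 0.0101),
  pass = c(td = 0.579, turnover = 0)
)

# Hypothetical helper: chance of a touchdown somewhere in a sequence of play calls.
p.td.sequence <- function(calls) {
  p.alive <- 1   # chance the offense still has the ball before this play
  p.td    <- 0
  for (call in calls) {
    r <- rates[[call]]
    p.td    <- p.td + p.alive * r[["td"]]
    p.alive <- p.alive * (1 - r[["td"]] - r[["turnover"]])
  }
  p.td
}

# All eight run/pass sequences over three plays.
calls <- c("run", "pass")
sequences <- expand.grid(first = calls, second = calls, third = calls,
                         stringsAsFactors = FALSE)
sequences$p.td <- apply(sequences, 1, p.td.sequence)
print(sequences[order(-sequences$p.td), ])
```

Under these assumptions every sequence scores roughly nine times in ten, and the sequences differ by only a couple of percentage points, so the choice on any single play moves the total very little.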

 

 
