Thesis Defense (all)

    Before we start, I would like to thank my committee members for the time and effort they have spent helping me reach this point in my research, and for the input and insight they have provided throughout this entire process.

    I would also like to thank my friends and family for their support, especially my wife Gabrielle and my parents John and Cathy, who are all here today.

    Lastly, I would like to take a moment to specifically thank Dr. Mongeon for serving as the chair of this committee and for being my thesis advisor. Over the past two (plus) years, I have learned more from you than I ever could have anticipated, expected or hoped, so thank you for that.

    Now, with all the touchy-feely stuff out of the way, let's get down to business.

    Good morning and thank you all for being here. For those of you who don't know me, my name is Weller Ross, and today I am presenting the research that I conducted for my master's thesis, which is titled “An Examination of Decision-Making Biases On Fourth Down in the NFL.”

    I will start by discussing the motivation for the research and will then go over the previous related literature before explaining the methodology and reviewing the data set used in this research. At that point I will talk about the results, provide some concluding remarks and open it up to questions.

    I had several ideas for this research, which all basically fell under the large umbrella of how to evaluate teams' performance.

    Breaking that down a little further, the real question was: what impacts a team's ability to win? To answer that question, I needed to think about why teams perform differently and who the people are that influence the outcome of a game. There are many possible answers. The first and most obvious answer is the players, who certainly impact the outcome of a game.

    What about the refs? The fans of a team that just lost will gladly tell you how the refs impacted the outcome of the game. Speaking of the fans, can they impact the outcome of a game? They might. Research has been done on home field advantage, but is that because of the environment that the fans help create, or is it more about the visiting team playing in an unfamiliar city, stadium, time zone, climate, and so on?

    What about the front office personnel? The owner? The general manager? Team doctors? Or, on the business side of things, the marketing department? Event management? Stadium ops? Media relations? We're getting further and further away from the center of the action here, but the point I'm trying to make is that there is more to examine within the realm of sport analytics than just player evaluation, despite what popular media would have you believe.

    The argument could be made that coaches' decisions have the largest impact on game outcomes. At the very least, most people would agree that coaches and their decisions rank pretty high on the list, especially in key moments and high-pressure situations. One example of this kind of situation in football is when a coach must decide whether to kick or go for it on fourth down. This is the decision-making process that I am evaluating.

    My research focuses on when coaches in the NFL make decisions on fourth down and whether or not they succumb to a subconscious psychological bias called the representativeness heuristic. So what is representativeness? It's part of Prospect Theory, which, in short, explains why people make the decisions that they do. Prospect Theory comprises four well-established psychological biases (or heuristics), and the representativeness heuristic describes people overweighting new information relative to prior information.

    This research will be useful not only for the coaches, but for the general managers' evaluations of coaches, because if general managers and other high-level decision-makers are more aware of this bias then it could aid them in their decisions when choosing to hire and/or fire a coach.

    Of the previous literature I read that focused on fourth down decision-making in the NFL, all of the researchers came to the same conclusion: that coaches make suboptimal decisions by acting too conservatively and opting to kick on fourth down more often than they should.

    They speculate that coaches could be profit-maximizing rather than win-maximizing, that they might just be systematically imperfect maximizers, or that they might prefer to lose as a result of playing it safe rather than lose as a result of taking a gamble.

    The problem was that none of them were able to figure out why coaches were making suboptimal decisions. Instead, the researchers had to make inferences from the contexts they analyzed rather than being able to directly test for specific causes. This is key, because this is where my research will make the largest contribution to the realm of academia and future research.

    The portion of my research used to determine whether coaches are making suboptimal decisions is simply a means to an end. Previous researchers established that coaches make suboptimal decisions, so the purpose of my research is to introduce a methodology that allows us to directly test why coaches are making suboptimal decisions, because that's something none of the other researchers were able to do.

    • I included a few quotes that I thought best captured this point. The first is from David Romer when he stated:
    • “there is little evidence about whether conservative behaviors arise because individuals have nonstandard objective functions or because they are imperfect maximizers.”
    • He's saying they act conservatively but there's little evidence as to WHY this is happening.

    • Carter & Machol wrote:
    • “We believe the reason for this paradox is that coaches do not have sufficient intuitive feel for the negative value imposed on the opposition.”
    • Notice that they say that they "BELIEVE" this is the reason, but they weren't able to directly test for it.

    • These two quotes are both from Soham Patel, who wrote:
    • “individuals are more sensitive to losses than to gains. In football terms, a coach might find the disutility of a play allowing the opponent to score points surpasses the utility of a play that allows his own team to score.”

    • He later states:
    • “coaches might value losses of a play higher than they would the corresponding gains of a play and so they might be calling conservative plays.”
    • Again, he says they "MIGHT" be doing this. Patel actually mentions Prospect Theory specifically in his paper, but he presents it only as a “potential” reason for the suboptimal decisions.

    All of the previous researchers were handcuffed by their methodologies, which allowed them to determine whether coaches were making optimal decisions but limited them to guessing at why coaches were making suboptimal decisions. They were, of course, educated and well-informed guesses, but guesses nonetheless.

    For my research, I use a Bayesian approach, which provides the flexibility to keep the prior odds separate from the conditional likelihood. This makes it possible to measure how much weight the coaches are giving new information compared to the original information when making decisions, which in turn allows me to directly test for the representativeness heuristic.

    This is Bayes' rule in odds form.

    On the left-hand side of the equation we have what is called the posterior, which in this circumstance is the in-game odds of team C winning the game given the fourth down decision that the coach made.

    The first term on the right-hand side is called the prior, which is the odds of team C winning the game given the game-state, while the second term is called the conditional likelihood, which in this case is the inverse conditional odds of team C making that decision given that they won the game.

    In other words, these two components are the original information and the new information.
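    For reference, one standard way to write Bayes' rule in odds form for this setting is shown below. The notation is an assumption for illustration and may not match the slide: W_C and L_C are team C winning and losing, S is the game-state, and D is the fourth down decision. The first factor on the right is the prior odds and the second is the likelihood ratio (the conditional likelihood). Taking logs turns the product into a sum, which is what allows the two components to be given separate weights.

```latex
$$
\frac{P(W_C \mid D, S)}{P(L_C \mid D, S)}
= \frac{P(W_C \mid S)}{P(L_C \mid S)}
\times \frac{P(D \mid W_C, S)}{P(D \mid L_C, S)}
$$

$$
\ln\frac{P(W_C \mid D, S)}{P(L_C \mid D, S)}
= \ln\frac{P(W_C \mid S)}{P(L_C \mid S)}
+ \ln\frac{P(D \mid W_C, S)}{P(D \mid L_C, S)}
$$
```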

    In this frame, we have the equation for testing the hypothesis. In this equation Beta-One represents the parameter estimates for prior information while Beta-Two represents the parameter estimates for the new information. To test for representativeness we will compare those two numbers.

    As you can see down here, the null hypothesis is that coaches are equally weighting the two components and are therefore not guilty of the representativeness heuristic. The first alternative hypothesis is that they are under-weighting new information, while the second alternative hypothesis is that they are over-weighting new information, which indicates the presence of the representativeness heuristic in their decision-making process.
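    A minimal sketch of how a test like this can be run, assuming the log prior odds and the log likelihood ratio enter a logit as separate regressors with weights Beta-One and Beta-Two. These notes do not spell out the dependent variable used in the thesis, so the sketch simulates a generic binary outcome in which new information is deliberately over-weighted (hypothetical true weights of 0.5 and 1.5), fits the logit by Newton-Raphson, and applies a Wald test of Beta-One = Beta-Two. The function names fit_logit and wald_equal_weights are hypothetical.

```python
import math

import numpy as np


def fit_logit(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns the coefficient
    vector and its estimated covariance matrix (inverse Fisher information).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, cov


def wald_equal_weights(beta, cov, i=1, j=2):
    """Two-sided Wald test of H0: beta[i] == beta[j] (equal weighting)."""
    diff = beta[j] - beta[i]
    se = math.sqrt(cov[i, i] + cov[j, j] - 2.0 * cov[i, j])
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Simulated (hypothetical) data: new information is over-weighted by design.
rng = np.random.default_rng(42)
n = 20_000
log_prior_odds = rng.normal(0.0, 1.0, n)
log_likelihood_ratio = rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), log_prior_odds, log_likelihood_ratio])
true_beta = np.array([0.0, 0.5, 1.5])  # intercept, Beta-One (prior), Beta-Two (new info)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat, cov_hat = fit_logit(X, y)
z, p_value = wald_equal_weights(beta_hat, cov_hat)
print(f"Beta-One = {beta_hat[1]:.3f}, Beta-Two = {beta_hat[2]:.3f}")
print(f"z = {z:.1f}, p = {p_value:.2e}")
```

    Because the Wald statistic uses the covariance between the two estimates, it tests the equality of Beta-One and Beta-Two directly rather than relying on each coefficient's individual p-value.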

  17. PRIOR (1A)
    If this frame looks similar, it's because we are again using Bayes' rule, but this time the posterior is the PRE-DECISION odds of team C winning the game given the game-state, which is shown on the left-hand side of the equation.

    On the right-hand side we again have the prior and conditional likelihood. In this case, the first term is literally the prior (pre-game) odds of team C winning the game while the second term is the odds of team C being in that game-state given that they ended up winning the game.
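    The two applications of Bayes' rule chain together: the pre-decision odds from this step become the prior for the fourth down update. The sketch below illustrates that chaining with made-up probabilities. It assumes the standard likelihood ratio, P(evidence | win) / P(evidence | loss), and the helper names are hypothetical.

```python
def prob_to_odds(p):
    return p / (1.0 - p)


def odds_to_prob(odds):
    return odds / (1.0 + odds)


def bayes_update(prior_prob, p_evidence_if_win, p_evidence_if_loss):
    """Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio."""
    likelihood_ratio = p_evidence_if_win / p_evidence_if_loss
    return odds_to_prob(prob_to_odds(prior_prob) * likelihood_ratio)


# Hypothetical inputs, for illustration only.
pregame = 0.60                                          # e.g. from the closing point spread
pre_decision = bayes_update(pregame, 0.030, 0.015)      # update on the game-state
post_decision = bayes_update(pre_decision, 0.20, 0.25)  # update on the fourth down decision
print(round(pre_decision, 4), round(post_decision, 4))
```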

  18. PRIOR (1B)
    The next step is to calculate these two components. Down here you can see that to estimate the first component (the prior) we are using a logistic regression to predict game outcomes from closing point spreads.
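    A minimal sketch of the first component, assuming a simple logit of a home-team win on the home team's closing point spread. The games are simulated, the coefficients used to generate them are hypothetical, and the spread follows the usual betting convention in which a negative number means the home team is favored. The fitted linear predictor, b0 + b1 × spread, is the log prior odds.

```python
import numpy as np


def fit_logit(X, y, iters=25):
    """Logistic regression by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    return beta


# Simulated games (hypothetical): spreads from -14 to +14, negative = home favored.
rng = np.random.default_rng(7)
n_games = 3_072
spread = rng.integers(-14, 15, n_games).astype(float)
true_intercept, true_slope = 0.0, -0.15  # made-up generating values
p_home = 1.0 / (1.0 + np.exp(-(true_intercept + true_slope * spread)))
home_win = (rng.random(n_games) < p_home).astype(float)

X = np.column_stack([np.ones(n_games), spread])
b0, b1 = fit_logit(X, home_win)


def prior_log_odds(s):
    """Log prior (pre-game) odds of a home win for closing spread s."""
    return b0 + b1 * s


def pregame_win_prob(s):
    return 1.0 / (1.0 + np.exp(-prior_log_odds(s)))


print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, P(win | spread 0) = {pregame_win_prob(0.0):.3f}")
```

    With an intercept near zero, a spread of zero maps to a pre-game win probability near 0.5.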

    For the second component we use a multinomial logistic regression to predict the probability of the team being in that situation given that they won the game. In this case X-Prime represents the matrix of variables that make up the game-state: the score margin, time remaining in the game, timeouts remaining, field position with respect to the offensive team, current down, yards to go to gain a new set of downs, and an indicator variable for possession with respect to the home team, as well as the bookmakers' over/under data for the game.

    This frame shows the conditional likelihood component of the theoretical model. On the left hand side of the equation we have the probability of that decision being made in game G at time T, given that team C ended up winning the game.

    On the right-hand side we have another multinomial logistic regression, where X-Prime again represents the matrix of variables that make up the game-state. The one difference you will notice is that this time it does not include the current down, because this equation looks only at fourth down plays, while the previous one looked at all plays, regardless of down.
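    The decision model can be illustrated with a short sketch of how a multinomial logit turns a game-state into probabilities for the three fourth down options, with first down attempts as the omitted (baseline) option. The feature names and coefficients are placeholders, not estimates from the thesis. Estimating the conditional likelihood would mean fitting these coefficients on fourth down plays from teams that went on to win (and, for the likelihood ratio, on teams that lost); that split is an assumption of the sketch.

```python
import math

# Placeholder feature names for the game-state (down is omitted: fourth downs only).
FEATURES = ["score_margin", "minutes_remaining", "timeouts", "yardline",
            "yards_to_go", "home_possession", "over_under"]


def decision_probs(x, coefs):
    """Multinomial logit probabilities for the three fourth down options.

    x:     dict mapping each feature name to its value for one play.
    coefs: {option: (intercept, {feature: weight})} for the non-baseline
           options; first_down_attempt is the omitted option with logit 0.
    """
    logits = {"first_down_attempt": 0.0}
    for option, (intercept, weights) in coefs.items():
        logits[option] = intercept + sum(weights.get(f, 0.0) * x[f] for f in FEATURES)
    top = max(logits.values())  # subtract the max for numerical stability
    exp_logits = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exp_logits.values())
    return {k: v / total for k, v in exp_logits.items()}


# Hypothetical play and placeholder coefficients (not estimates from the thesis).
example_state = {"score_margin": -3, "minutes_remaining": 6.5, "timeouts": 2,
                 "yardline": 38, "yards_to_go": 4, "home_possession": 1,
                 "over_under": 44.5}
placeholder_coefs = {
    "field_goal": (0.4, {"yardline": -0.05, "yards_to_go": 0.10}),
    "punt": (1.2, {"yardline": 0.03, "yards_to_go": 0.15, "score_margin": 0.02}),
}
probs = decision_probs(example_state, placeholder_coefs)
print({k: round(v, 3) for k, v in probs.items()})
```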

  20. DATA
    This research uses a data set made up of readily available NFL play-by-play data for every regular season game played since 2003, other than the 2015 season and the current season.

    That gives us data for each year from 2003 through 2014. Those 12 seasons provided every play from 3,072 games, for a total of 468,699 observations.
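    As a consistency check, with 32 teams each playing a 16-game regular season in every year from 2003 through 2014, the game count works out exactly, and the play count averages a little over 150 plays per game:

```latex
$$
\frac{32 \times 16}{2} = 256 \text{ games per season}, \qquad
12 \times 256 = 3{,}072 \text{ games}, \qquad
\frac{468{,}699}{3{,}072} \approx 152.6 \text{ plays per game}
$$
```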

    This table provides the summary statistics for game-state information across first down attempts, field goal attempts, and punts. There is obviously a lot of information here, but what I'd like to draw your attention to first is the small number of first down attempts compared to field goals and punts. There are approximately twice as many field goal attempts as first down attempts and more than five times as many punts as first down attempts.

    Beyond that, you would expect that with 12 years' worth of data, things would tend to balance out, so with 60 minutes in a game, you would expect the average minutes remaining to be about 30. This is true for punts (30.57) and field goals (28.90), but not for first down attempts (18.09). The same is true for score margin, which you would expect to be near zero. Again, this is true for punts (-0.39) and field goals (0.73) but not for first down attempts (-6.56).

    This trend holds true across other stats as well, which indicates that coaches are typically choosing to attempt a first down when they are trailing late in the game. In other words, they tend to choose to attempt a first down out of desperation rather than due to it being the optimal decision.

    Like the previous table, this table provides the summary statistics for the game-state information, but this time it is across wins, losses, and ties. This table shows the information for the offense and the next table shows the information for the defense. The data is separated by offense and defense to account for a possession indicator variable that is included in the model.

    Here is the table for the defense. For both of these tables, everything is about how you would expect it to be. The average score margin is positive for teams that win and negative for teams that lose, and the average field position is better for the winning teams as well. I know there's a lot of information here and these tables might be hard to see, so I'll be happy to come back to them (as well as any of the other tables and figures I'll be discussing) for specific questions later on.

    To test the in-game win odds model, I used six years of data to develop the model and six years as a validation set to see how well the model predicted new data. Odd-numbered years (2003, 2005, 2007, etc.) were used to build the model, and the even-numbered years were used as the validation set.

    The regression shown here predicts actual game outcomes from the model's in-game win odds. As you can see, the p-value indicates statistical significance, and the estimate of 0.9925 indicates a strong positive relationship between the in-game win odds and actual game outcomes. Specifically, it tells us that, in the validation set, a 10-percentage-point increase in the in-game win odds corresponds to a roughly 9.9-point increase in a team's win probability, which is nearly a one-to-one relationship.
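    The slope arithmetic is simply 0.10 × 0.9925 ≈ 0.099. A sketch of the calibration check follows, assuming a linear probability regression of the game outcome (1 = win) on the predicted in-game win probability; the notes do not say whether the regression was linear or logistic. The predictions and outcomes are simulated so that the model is perfectly calibrated, in which case the slope should be close to 1 and the intercept close to 0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
# Simulated (hypothetical) validation data: outcomes are drawn from the
# predicted probabilities, so the model is perfectly calibrated by construction.
predicted_wp = rng.uniform(0.02, 0.98, n)
won = (rng.random(n) < predicted_wp).astype(float)

# Linear probability regression: won ~ intercept + slope * predicted_wp.
slope, intercept = np.polyfit(predicted_wp, won, 1)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
# Under this model, +0.10 in predicted WP shifts the expected win rate by 0.10 * slope.
print(f"+0.10 predicted WP -> +{0.10 * slope:.3f} actual win rate")
```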

  26. WP of different score-states across time (A)
    To further illustrate the results of the in-game win odds model, here is a graph that shows teams' predicted in-game win probabilities across five different score-states when they have first and 10 at their own 20-yard line, throughout the course of a game. This situation was chosen because it is the most common situation in football (n=11,423), with nearly four times as many occurrences as the second most common situation (n=3,079).

  27. WP of different score-states across time (B)
    The five score-states are when a team is leading by two possessions, leading by one possession, tied, down by one possession and down by two possessions. The maximum number of points a team is able to score on a single possession is eight, so being up or down by two possessions is when they are up or down by nine or more points, one-possession is from one to eight points, and tied is when the score margin is zero.
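    The score-state definitions above translate directly into a small classifier. This is a sketch; the function name score_state is hypothetical, and the margin is taken from the team's own perspective.

```python
def score_state(margin: int) -> str:
    """Classify a score margin (team's points minus opponent's points).

    Eight points is the most a team can score on one possession, so a
    margin of 9 or more (in either direction) is a two-possession game.
    """
    if margin >= 9:
        return "up two possessions"
    if margin >= 1:
        return "up one possession"
    if margin == 0:
        return "tied"
    if margin >= -8:
        return "down one possession"
    return "down two possessions"


for m in (17, 9, 8, 1, 0, -8, -9):
    print(m, score_state(m))
```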

    This graph behaves exactly as you would expect, which supports the reliability of the in-game win odds model. The five lines never cross: a team up by two possessions always has a higher win probability than a team up by one possession, which in turn is always higher than being tied, and so on.

  28. WP reliability test across different validation sets (A)
    As I mentioned before, the in-game win odds model was developed using six years of data and validated using a separate six years of data. This was done in four different ways to ensure the best model was selected. The model was built using the first six years of data, the last six years, the odd-numbered years, and the even-numbered years. In each case, the validation set was made up of the other six years.
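    The four training/validation splits described above can be generated as follows. This is a sketch; make_splits and the split labels are hypothetical names.

```python
SEASONS = list(range(2003, 2015))  # 2003 through 2014, twelve seasons


def make_splits(seasons):
    """Return {split name: (training seasons, validation seasons)}."""
    train_sets = {
        "first six": seasons[:6],
        "last six": seasons[6:],
        "odd years": [s for s in seasons if s % 2 == 1],
        "even years": [s for s in seasons if s % 2 == 0],
    }
    # The validation set is always the six seasons not used for training.
    return {name: (train, [s for s in seasons if s not in train])
            for name, train in train_sets.items()}


splits = make_splits(SEASONS)
for name, (train, valid) in splits.items():
    print(f"{name:>10}: train {train} | validate {valid}")
```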

  29. WP reliability test across different validation sets (B)
    As you can see, all variations yielded similar results in terms of their parameter estimates, standard errors and p-values. I ended up choosing the odd years for the training set to help negate any underlying effects that could have directly or indirectly impacted the coaches' decision-making such as rules changes and the improved quality of kickers.

    The same tests were conducted for the inclusion (or exclusion) of certain variables as well. A further example of this is shown below, where you can see the results for when the model was developed with respect to the home team or with respect to the possession team.

  30. Optimal decision across YTG (A)
    Those in-game win probabilities were used to determine what the optimal decision was on fourth down. This figure shows us the change in win probability across yards to go, for each of the three options a team has on fourth down.

    The solid line represents a first down attempt (the option to “go for it”), the dotted line is a field goal attempt, and the dashed line is a punt. For the purposes of this graph, I only included plays between the 33- and 40-yard line, because this is where all three choices are considered to be viable options.

  31. Optimal decision across YTG (B)
    As you can see, a first down attempt is the optimal decision when there are fewer than seven yards to go; at exactly seven yards to go it is approximately even with a field goal attempt, and the field goal attempt becomes the optimal decision at eight or more yards to go.

    This means that punting is not the optimal decision at any of these distances. However, it was the most frequently chosen option at every distance except one or two yards to go.
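    One natural way to turn win probabilities into an optimal decision is to compute, for each option, the expected win probability after the play (averaging over its possible outcomes) and pick the option with the largest change from the current win probability. The notes do not detail this step, so treat the sketch as an assumption; every probability in it is made up for illustration, and the function names are hypothetical.

```python
import math


def expected_wp(outcomes):
    """Expected win probability after an option.

    outcomes: list of (probability_of_outcome, win_probability_after_outcome).
    """
    if not math.isclose(sum(p for p, _ in outcomes), 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * wp for p, wp in outcomes)


def best_option(options, current_wp):
    """Return the option with the largest change in win probability, plus all changes."""
    deltas = {name: expected_wp(outs) - current_wp for name, outs in options.items()}
    return max(deltas, key=deltas.get), deltas


# Hypothetical fourth down situation; every number is made up for illustration.
options = {
    "first_down_attempt": [(0.55, 0.58), (0.45, 0.44)],  # convert / fail
    "field_goal": [(0.75, 0.53), (0.25, 0.45)],          # make / miss
    "punt": [(1.00, 0.48)],
}
choice, deltas = best_option(options, current_wp=0.50)
print(choice, {k: round(v, 3) for k, v in deltas.items()})
```

    Written this way, the decision rule is simply an argmax over options, so any model that supplies conversion, make, and post-play win probabilities can be plugged in.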

  32. FGA vs FDA optimal decision across field position
    Where the previous graph illustrated the optimal decision across yards to go, this graph and the next one illustrate the optimal decision across field position. On this graph the solid line is first down attempts while the dotted line is field goal attempts. The graph was generated using fourth down plays with 10 or fewer yards to go in games within two possessions (a score margin between -16 and 16).

    You can see that a first down attempt is the optimal decision at every yard line. However, it is again the less frequently chosen option. Inside the 10-yard line, field goals were chosen about 80% of the time, and between the 10 and the 30 about 85% of the time. From the 30 to the 40, field goals were chosen only slightly more often than first down attempts, and punts were chosen nearly as often.

  33. PNT vs FDA optimal decision across field position (A)
    This graph again illustrates the optimal decision across field position, however this time it is comparing first down attempts to punts. As you can see, there is a minor break in each line at the 70-yard line (the offense's own 30) and this is due to each half of this graph being generated with slightly different sets of data. Like the previous graph, plays within a two-possession score margin were used to generate the portion of the graph from the 70 to the 100, but for the portion from the 40 to the 70, score margins from -11 to 11 were used. This was done to allow for a higher ratio of plays where a team in the middle of the field could be trying to get in field goal range.

  34. PNT vs FDA optimal decision across field position (B)
    The most noteworthy part of this graph is that at about the 20-yard line, a first down attempt draws even with a punt in terms of its impact on the team's in-game win probability. By the 25-yard line, it is clear that the first down attempt is the optimal decision. This strongly contradicts conventional wisdom, which would nearly always have a team punt on a fourth down from its own half of the field. In fact, 95% of the fourth down plays from a team's own half of the field were punts.

    However, this does align with the findings of previous researchers. Romer stated that “even at its 10-yard line, 90 yards from a score, a team within three yards of a first down is better off on average going for it.” While he is referring to situations with three or fewer yards to go, the average yards to go in my data is 8.43, with a median of seven. That is why he cites a team's own 10, while my graph shows the 20 as the point where a first down attempt becomes the optimal decision.

  35. PNT vs FDA optimal decision across field position (C)
    Similarly, in Romer's findings, at the 25-yard line the critical value for a first down attempt to be the optimal decision is five yards to go, and in my data a team at the 25-yard line had a median of five yards to go. As mentioned before, that is about where you start to see clear separation on this graph.

    For this portion of the research, it is important to note the similarities between my findings and those of the previous researchers, because as I mentioned before, my primary goal for this research is to provide a method for being able to directly test for the representativeness heuristic. The win odds model and the resulting recommendations/optimal decisions are simply the vessel that allows me to answer that question of why coaches make suboptimal decisions.

  36. Pre-game win probability summary data across game outcomes
    So to estimate the in-game win probability using a Bayesian approach, I first needed to estimate the pre-game win probabilities. In this table you can see the summary data for the pre-game win probabilities across actual game outcomes.

    These numbers are all with respect to the home team and as I mentioned before, there are 3,072 games in this data set. The home team won 1,760 of those, and the away team won 1,308, while the other four ended in ties.

  37. Pre-game win probability across point spreads
    This graph provides another way to view the pre-game win probabilities. The home team's closing point spreads are shown across the x-axis and the line represents the corresponding pre-game win probability for each point spread.

    As you can see, a point spread of zero translates to a predicted pre-game win probability of almost exactly 0.5, which is what we would expect to see.

  38. Conditional likelihood of decision model results
    While the previous graph illustrated the results of the pre-game component of the in-game win probability model, this table presents the results of the conditional likelihood component. The results are shown with respect to the multinomial logistic regression's omitted option, which in this case is first down attempts.

    As you can see with the p-values, all but three of these estimates are statistically significant at the 0.05 level and all but six are statistically significant at the 0.01 level.

    Now we finally get to the results of the direct test for the representativeness heuristic. You can see the results of the hypothesis test here, with coefficient estimates of 0.1334 and 0.9693 for Beta-1 and Beta-2, respectively, both with p-values less than 0.001, which indicates statistical significance.

    Based on these results, we can reject the null hypothesis, which stated that Beta-1 and Beta-2 would be equal to each other. And if we look at the second alternative hypothesis again, the estimate for the new information (Beta-2) is roughly seven times the estimate for the prior information (Beta-1), so these results do indeed indicate that the representativeness heuristic is impacting coaches' decision-making on fourth down.

    THIS is where my research makes its contribution to the field. Where the previous researchers had to make inferences as to what was causing coaches to make suboptimal decisions, the methodology used in this research enabled me to directly test for the representativeness heuristic.

    • This same test was also conducted on several different subsets of data:
    • years contained in the validation set (even-numbered years)
    • home and away teams separately
    • each of the four quarters
    • all 32 teams in the NFL

  41. Hypothesis test results across years
    Testing for representativeness within each season was done to determine whether coaches were getting better or worse at appropriately weighting the information from year to year. The new information has a higher estimate than the original information in every year; in 2004 the difference between the estimates for the new and original information was 0.8653, while in 2014 it was 0.8367, which indicates that coaches haven't gotten much better or worse during this time frame.

  42. Hypothesis test results across home and away teams
    Separately testing home teams and away teams for representativeness was done to see if coaches are any more or less susceptible to the bias when coaching at home or on the road. The parameter estimates are very similar for both, so it appears that coaches, as a whole, do not weight the information differently when they are on the road or at home.

  43. Hypothesis test results across quarters
    Testing for representativeness across quarters was done to see whether coaches weighted information differently over the course of a game. The estimates for the new information are higher in every quarter; however, they are less lopsided in the second and third quarters, where the estimates for the new and original information differ by approximately 0.53, while the first and fourth quarters show differences of 0.81 and 0.90, respectively.

  44. Hypothesis test results for each team
    Lastly, all 32 teams were individually tested for the representativeness heuristic, and all 32 have a higher parameter estimate for the new information than for the original information.

    Every p-value for the new information is 0.000, while the estimates for the original information were statistically significant at the 0.1 level for 12 teams. However, 15 teams had p-values of 0.3 or greater for the original information. Those estimates are not statistically significant, which suggests that any relationship for those teams could be due to chance, and that they may not be taking the original information into account at all.

    Only five teams have a p-value of 0.015 or less for the original information.

  45. Top Teams Table
    And those five teams are five of the most successful teams during the 12-year span covered in this dataset: Saints, Steelers, Giants, Patriots, and Packers.

    The Steelers, Giants, and Patriots are the only three teams to win multiple Super Bowls during this 12-year span. The Saints and Packers also won Super Bowls during this time, meaning that these five teams accounted for nine of the 12 Super Bowls covered in the time frame of this dataset.

    The ranking over to the left indicates where these teams rank in terms of the smallest difference between the estimates for the new and original information. So the Saints have the smallest difference (0.1768), while on the other end of the spectrum...

  46. Bottom Teams Table
    … the New York Jets have the largest difference (1.6266) indicating that they were over-weighting new information the most. The bottom five teams are the Jets, Cardinals, Vikings, Bears, and Texans. They only made the playoffs an average of 3.2 times, with none more than four. For perspective, on average, each team made the playoffs 4.5 times.
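    The league-wide average of 4.5 follows from the playoff format in place from 2003 through 2014, when 12 teams qualified each season:

```latex
$$
\frac{12 \text{ playoff teams per season} \times 12 \text{ seasons}}{32 \text{ teams}}
= \frac{144}{32} = 4.5 \text{ appearances per team}
$$

$$
5 \times 3.2 = 16 \text{ appearances for the bottom five teams combined}
$$
```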

  47. Compare Table
    And to provide further perspective, here are the totals and averages for the top five teams shown in the first table and the bottom five teams shown in the second table.

    Nine Super Bowls compared to zero, more than twice as many playoff appearances, three times as many playoff wins, and nearly 30 more regular season wins PER TEAM.

  48. Win percentage by optimal decision percentage (A)
    To further illustrate the difference between the successful and unsuccessful teams, this graph shows teams' win percentage across the percentage of optimal decisions they made on fourth down in a game. Note that, on average, teams faced more than seven fourth downs per game (7.42), with a maximum of 16 in a single game.

    As you can see, teams that chose the optimal decision more frequently won more frequently. In games where a team chose the optimal decision on 50% of its fourth downs, it won approximately 60% of the time.

  49. Win percentage by optimal decision percentage (B)
    In games where a team made the optimal decision on more than half of their fourth downs, they won the game approximately two-thirds of the time (67.2%). All we're talking about at this point is teams simply making the optimal decision more often than not, and they won more than two-thirds of the games in which that happened.

    Taking it a step further, teams that made the optimal decision between 25 and 50 percent of the time still won more often than not (51.5%). And let me emphasize that this is between 25 and 50 percent, not 25 percent or more.
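    A sketch of how games can be grouped by the share of optimal fourth down decisions and a win rate computed for each group. The records are hypothetical, and so is the treatment of the boundaries (exactly 25% and exactly 50% fall in the middle group); the function names are illustrative only.

```python
from collections import defaultdict


def share_bin(share):
    """Assign a game to a bin by its share of optimal fourth down decisions.

    Boundary handling is an assumption: exactly 25% and 50% go in the middle bin.
    """
    if share < 0.25:
        return "under 25%"
    if share <= 0.50:
        return "25-50%"
    return "over 50%"


def win_rate_by_bin(games):
    """games: iterable of (optimal_decisions, fourth_downs, won) for each team-game."""
    wins, counts = defaultdict(int), defaultdict(int)
    for optimal, fourth_downs, won in games:
        if fourth_downs == 0:
            continue  # no fourth downs, so the share is undefined
        b = share_bin(optimal / fourth_downs)
        counts[b] += 1
        wins[b] += int(won)
    return {b: wins[b] / counts[b] for b in counts}


# Hypothetical team-game records: (optimal decisions, fourth downs faced, won?)
games = [(1, 8, False), (0, 6, False), (2, 7, True), (3, 6, False),
         (4, 7, True), (5, 8, True), (0, 0, True)]
rates = win_rate_by_bin(games)
print(rates)
```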

    In the end, this research shows that Bayes' rule can be used to directly test for why suboptimal decisions are being made, rather than needing to make inferences.

    This information would be useful not only for helping coaches make better decisions, but also for helping general managers better evaluate coaches.

    Future research could include models to directly test for the other biases and heuristics that make up Prospect Theory.

    If nothing else, I simply hope that my research can shine a light on the fact that there is significantly more to be looked at within the realm of sport analytics than just player evaluation, and that it will help advance the research that is being conducted not only within the field of sport analytics, but within the field of sport management as a whole.

    Thank you very much for listening and I will now open it up to questions.
