Wear Down, Chicago Bears?

I watched the NFC Championship game the weekend before last via a moderately sketchy British stream. It used the Joe Buck/Troy Aikman feed, but whenever that went to commercials they had their own British commentary team whose level of insight, I think it’s fair to say, was probably a notch below what you’d get if you picked three thoughtful-looking guys at random out of an American sports bar. (To be fair, that’s arguably true of most of the American NFL studio crews as well.)

When discussing Marshawn Lynch, one of them brought out the old chestnut that big running backs wear down the defense and thus are likely to get big chunks of yardage toward the end of games, citing Jerome Bettis as an example of this. This is accepted as conventional wisdom when discussing football strategy, but I’ve never actually seen proof of this one way or another, and I couldn’t find any analysis of this before typing up this post.

The hypothesis I want to examine is that bigger running backs are more successful late in games than smaller running backs. All of those terms are tricky to define, so here’s what I’m going with:

Bigger running backs are determined by weight, BMI, or both. I’m using Pro Football Reference data for this, which has some limitations in that it’s not dynamic, but I haven’t heard of any source that has any dynamic information on player size.
Late in games is the simplest thing to define: fourth quarter and overtime.
More successful is going to be measured in terms of yards per carry. This is going to be compared to the YPC in the first three quarters to account for the baseline differences between big and small backs. The correlation between BMI and YPC is -0.29, which is highly significant (p = 0.0001). The low R squared (about 0.1) says that BMI explains about 10% of variation in YPC, which isn’t great but does say that there’s a meaningful connection. There’s a plot below of BMI vs. YPC with the trend line added; it seems like close to a monotonic effect to me, meaning that getting bigger is on average going to hurt YPC. (Assuming, of course, that the player is big enough to actually be an NFL back.)

My data set consisted of career-level data split into 4th quarter/OT and 1st-3rd quarters, which I subset to only include carries occurring while the game was within 14 points (a cut popular with writers like Bill Barnwell—see about halfway down this post, for example) to attempt to remove huge blowouts, which may affect data integrity. My timeframe was 1999 to the present, which is when PFR has play-by-play data in its database. I then subset the list of running backs to only those with at least 50 carries in the first three quarters and in the fourth quarter and overtime (166 in all). (I looked at different carry cutoffs, and they don’t change any of my conclusions.)
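The subsetting described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the record layout (the keys `back`, `quarter`, `margin`, `yards`) is a hypothetical stand-in for whatever play-by-play export you have, and overtime is assumed to be coded as quarter 5.

```python
from collections import defaultdict

MAX_MARGIN = 14   # keep only carries while the game was within 14 points
MIN_CARRIES = 50  # minimum carries required in EACH split

def split_ypc(carries):
    """Return {back: (early_ypc, late_ypc)} for backs meeting the carry cutoff.

    `carries` is an iterable of dicts with hypothetical keys:
    'back', 'quarter' (1-4, with 5 = overtime), 'margin', 'yards'.
    """
    # totals[back][split] = [carry_count, total_yards]
    totals = defaultdict(lambda: {"early": [0, 0], "late": [0, 0]})
    for c in carries:
        if abs(c["margin"]) > MAX_MARGIN:
            continue  # drop carries from blowouts
        split = "late" if c["quarter"] >= 4 else "early"
        totals[c["back"]][split][0] += 1
        totals[c["back"]][split][1] += c["yards"]

    result = {}
    for back, t in totals.items():
        (n_early, y_early), (n_late, y_late) = t["early"], t["late"]
        # a back must clear the cutoff in both parts of the game
        if n_early >= MIN_CARRIES and n_late >= MIN_CARRIES:
            result[back] = (y_early / n_early, y_late / n_late)
    return result
```

Changing `MIN_CARRIES` is the quickest way to check whether a different carry cutoff moves any of the results.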

Before I dive into my conclusions, I want to preemptively bring up a big issue with this, which is that it’s only on aggregate level data. This involves pairing up data from different games or even different years, which immediately raises a few problems. The first is that we’re not directly testing the hypothesis; I think the claim is closer in spirit to “if a big running back gets lots of carries early on, his/his team’s YPC will increase in the fourth quarter,” which can only be looked at with game level data. I’m not entirely sure what metrics to look at, as there are a lot of confounds, but it’s going in the bucket of ideas for research.

The second is that, beyond having to look at this potential effect indirectly, we might actually have biases altering the perceived effect: when a player runs ineffectively in the first part of the game, he will probably get fewer carries at the end, partially because he is probably running against a good defense and partially because his team is likely to be behind and thus passing more. This means that it’s likely that more of the fourth quarter carries come when a runner is having a good day, possibly biasing our data.

Finally, it’s possible that the way that big running backs wear the defense down is that they soften it up so that other running backs do better in the fourth quarter. This is going to be impossible to detect with aggregate data, and if this effect is actually present it will bias against finding a result using aggregate data, as it will be a lurking variable inflating the fourth quarter totals for smaller running backs.

Now, I’m not sure that any of these issues will necessarily ruin any results I get with the aggregate data, but they are caveats worth mentioning. I am planning on redoing some of this analysis with play-by-play level data, but those data are rather messy and I’m a little scared of the small sample sizes that come with looking at one quarter at a time, so I think presenting results using aggregated data still adds something to the conversation.

Enough equivocating, let’s get to some numbers. Below is a plot of fourth quarter YPC versus early game YPC; the line is the identity, meaning that points above the line are better in the fourth. The unweighted mean of the difference (Q4 YPC – Q1–3 YPC) is -0.14, with the median equal to -0.15, so by the regular measures a typical running back is less effective in the 4th quarter (on aggregate in moderately close games). (A paired t-test shows this difference is significant, with p < 0.01.)
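For anyone who wants to run the same kind of comparison on their own numbers, here is a minimal sketch of the summary statistics mentioned above: the unweighted mean and median of the per-back differences, plus the paired t statistic. The four pairs of values are hypothetical and are not taken from the data set in this post; turning the t statistic into a p-value would additionally need a t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

def paired_summary(early_ypc, late_ypc):
    """Mean, median, and paired t statistic of (late - early) per back."""
    diffs = [late - early for early, late in zip(early_ypc, late_ypc)]
    mean_diff = statistics.mean(diffs)
    median_diff = statistics.median(diffs)
    # standard error of the mean difference (sample standard deviation)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, median_diff, mean_diff / std_err

# Hypothetical YPC values for four backs, NOT numbers from the post.
early = [4.0, 4.5, 5.0, 3.5]
late = [3.8, 4.4, 4.6, 3.6]

mean_diff, median_diff, t_stat = paired_summary(early, late)
print(f"mean diff = {mean_diff:.2f}, median diff = {median_diff:.2f}, t = {t_stat:.2f}")
```

Each back counts once regardless of how many carries he has, which is what "unweighted" means here.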

A couple of individual observations jump out here, and if you’re curious, here’s who they are:

The guy in the top right, who’s very consistent and very good? Jamaal Charles. His YPC increases by about 0.01 yards in the fourth quarter, the second smallest change in absolute terms in the data (Chester Taylor has a drop of about 0.001 yards).
The outlier in the bottom right, meaning a major dropoff, is Darren Sproles, who has the highest early game YPC of any back in the sample.
The outlier in the top center with a major increase is Jerious Norwood.
The back on the left with the lowest early game YPC in our sample is Mike Cloud, whom I had never heard of. He’s the only guy below 3 YPC for the first three quarters.

A simple linear model gives us a best fit line of (Predicted Q4 YPC) = 1.78 + 0.54 * (Prior Quarters YPC), with an R squared of 0.12. That’s less predictive than I thought it would be, which suggests that there’s a lot of chance in these data and/or there is a lurking factor explaining the divergence. (It’s also possible this isn’t actually a linear effect.)
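One way to read the fitted line is to plug numbers into it. The sketch below uses only the two coefficients quoted above; the break-even point is my own arithmetic on those coefficients rather than a figure from the analysis, and the function names are just for illustration.

```python
INTERCEPT = 1.78  # from the fitted line quoted above
SLOPE = 0.54

def predicted_q4_ypc(early_ypc):
    """Predicted fourth quarter YPC given YPC over the first three quarters."""
    return INTERCEPT + SLOPE * early_ypc

def break_even_ypc():
    """Early-game YPC at which the line predicts no change: x = a + b * x."""
    return INTERCEPT / (1 - SLOPE)

for ypc in (3.0, 4.0, 5.0):
    print(f"early {ypc:.1f} -> predicted Q4 {predicted_q4_ypc(ypc):.2f}")
print(f"break-even early YPC: {break_even_ypc():.2f}")
```

Because the slope is well below 1, the line pulls everyone toward the middle: backs above the break-even point are predicted to drop off late, and backs below it are predicted to improve.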

However, that lurking variable doesn’t appear to be running back size. Below is a plot showing running back BMI vs. (Q4 YPC – Q1–3 YPC); there doesn’t seem to be a real relationship. The plot below it shows the difference against fourth quarter carries (the horizontal line is the average value of -0.13), which somewhat suggests that the size of the difference shrinks as the sample size grows, though these data are non-normal, so it’s not an easy thing to immediately assess.

That intuition is borne out if we look at the correlation between the two, with an estimate of 0.02 that is not close to significant (p = 0.78). Using weight and height instead of BMI gives us larger apparent effects, but they’re still not significant (r = 0.08 with p = 0.29 for weight, r = 0.10 with p = 0.21 for height). Throwing these variables in the regression to predict Q4 YPC based on previous YPC also doesn’t have any effect that’s close to significant, though I don’t think much of that because I don’t think much of that model to begin with.

Our talking head, though, mentioned Lynch and Bettis by name. Do we see anything for them? Unsurprisingly, we don’t—Bettis has a net improvement of 0.35 YPC, with Lynch actually falling off by 0.46 YPC, though both of these are within one standard deviation of the average effect, so they don’t really mean much.

On a more general scale, it doesn’t seem like a change in YPC in the fourth quarter can be attributed to running back size. My hunch is that this is accurate, and that “big running backs make it easier to run later in the game” is one of those things that people repeat because it sounds reasonable. However, given all of the data issues I outlined earlier, I can’t conclude that with any confidence, and all we can say for sure is that it doesn’t show up in an obvious manner (though at some point I’d love to pick at the play by play data). At the very least, though, I think that’s reason for skepticism next time some ex-jock on TV mentions this.

A Reason Bill Simmons is Bad At Gambling

For those unaware, Bill Simmons, aka the Sports Guy, is the editor-in-chief of Grantland, ESPN’s more literary (or perhaps intelligent, if you prefer) offshoot. He’s hired a lot of really excellent writers (Jonah Keri and Zach Lowe, just to name two), but he continues to publish long, rambling football columns with limited empirical support. I find this somewhat frustrating given that the chief Grantland NFL writer, Bill Barnwell, is probably the most prominent data-oriented football writer around, but you take the good with the bad.

Simmons writes a column with NFL picks each week during the season, and has a pretty so-so track record for picking against the spread, as detailed in the first footnote to this article here. Simmons has also written a number of lengthy columns attempting to construct a system for gambling on the playoffs, and hasn’t done too great in this regard either. I’ve been meaning to mine some of these for a post for a while now, and since he’s written two such posts this year already (wild card and divisional round), I figured the time was right to look at some of his assertions.

The one I keyed on was this one, from two weeks ago:

SUGGESTION NO. 6: “Before you pick a team, just make sure Marty Schottenheimer, Herm Edwards, Wade Phillips, Norv Turner, Andy Reid, Anyone Named Mike, Anyone Described As Andy Reid’s Pupil and Anyone With the Last Name Mora” Isn’t Coaching Them.

I made this tweak in 2010 and feel good about it — especially when the “Anyone Named Mike” rule miraculously covers the Always Shaky Mike McCarthy and Mike “You Know What?” McCoy (both involved this weekend!) as well as Mike Smith, Mike “The Sideline Karma Gods Put A Curse On Me” Tomlin, Mike Munchak and the recently fired Mike Shanahan. We’re also covered if Mike Shula, Mike Martz, Mike Mularkey, Mike Tice or Mike Sherman ever make comebacks. I’m not saying you bet against the Mikes — just be psychotically careful with them. As for Andy Reid … we’ll get to him in a second.

That was written before the playoffs—after Round 1, he said he thinks he might make it an ironclad rule (with “Reid’s name…[in] 18-point font,” no less).

Now, these coaches certainly have a reputation for performing poorly under pressure and making poor decisions regarding timeouts, challenges, etc., but do they actually perform worse against the spread? I set out to find this out, using the always-helpful pro-football-reference database of historical gambling lines to get historical ATS performance for each coach he mentions. (One caveat here: the data only list closing lines, so I can’t evaluate how the coaches did compared to opening spreads, nor how much the line moved, which could in theory be useful to evaluate these ideas as well.) The table below lists the results:

Playoff Performance Against the Spread by Select Coaches
Coach	Win	Loss	Named By Simmons	Notes
Childress	2	1	No	Andy Reid Coaching Tree
Ditka	6	6	No	Named Mike
Edwards	3	3	Yes
Frazier	0	1	No	Andy Reid Coaching Tree
Holmgren	13	9	No	Named Mike
John Harbaugh	9	4	No	Andy Reid Coaching Tree
Martz	2	5	Yes	Named Mike
McCarthy	6	4	Yes	Named Mike
Mora Jr.	1	1	Yes
Mora Sr.	0	6	Yes
Phillips	1	5	Yes
Reid	11	8	Yes
Schottenheimer	4	13	Yes
Shanahan	7	6	Yes	Named Mike
Sherman	2	4	Yes	Named Mike
Smith	1	4	Yes	Named Mike
Tice	1	1	Yes	Named Mike
Tomlin	5	3	Yes	Named Mike
Turner	6	2	Yes

A few notes: first, I’ve omitted pushes from these numbers, as PFR only lists two (both for Mike Holmgren). Second, the Reid coaching tree includes the three NFL coaches who served as assistants under Reid who coached an NFL playoff game before this postseason. Whether or not you think of them as Reid’s pupils is subjective, but it seems to me that doing it any other way is going to either turn into circular reasoning or cherry-picking. Third, my list of coaches named Mike is all NFL coaches referred to as Mike by Wikipedia who coached at least one playoff game, with the exception of Mike Holovak, who coached in the AFL in the 1960s and who thus a) seems old enough not to be relevant to this heuristic and b) is old enough that there isn’t point spread data for his playoff game on PFR, anyhow.

So, obviously some of these guys have had some poor performances against the spread: standouts include Jim Mora, Sr. at 0-6 and Marty Schottenheimer at 4-13, though the latter isn’t actually statistically significantly different from a .500 winning percentage (p = 0.052). More surprising, given Simmons’s emphasis on him, is the fact that Reid is actually over .500 lifetime in the playoffs against the spread. (That’s the point estimate, anyway; it’s not statistically significantly better, however.) This seems to me to be something you would want to check before making it part of your gambling platform, but that disconnect probably explains both why I don’t gamble on football and why Simmons seems to be poor at it. (Not that his rule has necessarily done him wrong, but drawing big conclusions on limited or contradictory evidence seems like a good way to lose a lot of money.)

Are there any broader trends we can pick up? Looking at Simmons’s suggestion, I can think of a few different sets we might want to look at:

Every coach he lists by name.
Every coach he lists by name, plus the Reid coaching tree.
Every coach he lists by name, plus the unnamed Mikes.
Every coach he lists by name, plus the Reid coaching tree and the unnamed Mikes.

A table with those results is below.

Combined Against the Spread Results for Different Groups of Coaches Cited By Simmons
Set of Coaches	Number of Coaches in Set	Wins	Losses	Winning Percentage	p-Value
Named	14	50	65	43.48	0.19
Named + Reid	17	61	71	46.21	0.43
Named + Mikes	16	69	80	46.31	0.41
All	19	80	86	48.19	0.70

As a refresher, the p-value is the probability that we would observe a result at least as extreme as the observed result if there were no true effect, i.e. if the selected coaches were actually average against the spread. (Here’s the Wikipedia article.) Since none of these are significant even at the 0.1 level (which is generally the most lenient threshold for treating a result as meaningful), we wouldn’t conclude that any of Simmons’s postulated sets are actually worse than average ATS in the playoffs. It is true that these groups have done worse than average, but the margins aren’t huge and the samples are small, so without a lot more evidence I’m inclined to think that there isn’t any effect here. These coaches might not have been very successful in the playoffs, but any effect seems to be built into the lines.
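For the curious, the p-values in the table above can be approximated with a few lines of standard-library Python. The post doesn’t spell out which test was used, so treat the choice here as an assumption: a two-sided normal approximation to the binomial with a continuity correction, testing against a true ATS win rate of 50%. The function name `two_sided_p` is just for illustration.

```python
import math

# Combined ATS records (wins, losses) copied from the table above.
RECORDS = {
    "Named": (50, 65),
    "Named + Reid": (61, 71),
    "Named + Mikes": (69, 80),
    "All": (80, 86),
}

def two_sided_p(wins, losses):
    """Two-sided p-value against a 50% win rate.

    Assumption: normal approximation with a 0.5 continuity correction;
    this is a guess at the method, not something stated in the post.
    """
    n = wins + losses
    sd = math.sqrt(n * 0.25)                 # binomial SD when p = 0.5
    gap = max(abs(wins - n / 2) - 0.5, 0.0)  # continuity-corrected distance
    z = gap / sd
    return math.erfc(z / math.sqrt(2))       # equals 2 * (1 - Phi(z))

for label, (w, l) in RECORDS.items():
    print(f"{label}: {100 * w / (w + l):.2f}% ATS, p = {two_sided_p(w, l):.2f}")
```

The same function applied to a single coach’s record, such as `two_sided_p(4, 13)`, gives the individual comparisons. An exact binomial test would give slightly different numbers for small samples, so borderline cases depend on the method.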

Did Simmons actually follow his own suggestion this postseason? Well, he picked against Reid, for Mike McCoy (first postseason game), and against Mike McCarthy in the wild card round, going 1-0-2, with the one win being in the game he went against his own rule. For the divisional round, he’s gone against Ron Rivera (first postseason game, in the Reid coaching tree) and against Mike McCoy, sticking with his metric. Both of those games are today, so as I type we don’t know the results, but whatever they are, I bet they have next to nothing to do with Rivera’s relationship to Reid or McCoy’s given name.

Tied Up in Knots

Apologies for the gap between posts–travel and whatnot. I’ll hopefully have some shiny new content in the future. A narrow-minded, two-part post inspired by the Bears game against the Vikings today:

Part I: The line going into the game was pick ’em, meaning no favorite. This means that a tie (very much on the table) would have resulted in a push. Has a tie game ever resulted in a push before?

As it turns out, using Pro Football Reference’s search function, there have been 19 ties since the overtime rule was introduced in the NFL in 1974, and none of them were pick ’em. (Note: PFR only has lines going back to the mid-1970s, so for two games I had to find out if there was a favorite from a Google News archive search.) (EDIT: Based on some search issues I’ve had, PFR may not list any games as pick ’ems. However, all of the lines were at least 2.5 points, so if there’s a recording error it isn’t responsible for this.)

Part II has to do with ties, specifically consecutive ones. Since 1974, unsurprisingly, no team has tied consecutive games. Were the Vikings, who were ~~24 seconds~~ 1:47 shy of a second tie, the closest?

Only two teams before the Vikes have even had a stretch of two overtime games with one tie, both in 1986. The Eagles won a game on a QB sneak at 8:07 of OT a week before their tie, in a game that seems very odd now–the Raiders fumbled at the Philly 15 and had it taken back to the Raiders’ 4, after which the Eagles had Randall Cunningham punch it in. Given that the coaches today chose to go with field goal tries of 45+ yards even before 4th down, it’s clear that risk calculations with respect to kicking have changed quite a bit.

As for the other team, the 49ers lost on a field goal less than four minutes into overtime the week before their 1986 tie. Thus, the Vikings seem to have come much closer to consecutive ties than any other team since overtime was introduced in 1974.

Finally, a crude estimate of the probability that a team would tie two games in a row. (Caveats follow at the end of the piece.) Assuming everything is independent (though realistically it’s not), we figure a tie occurs roughly 0.207% of the time, or roughly 2 ties for every thousand games played. Once again assuming independence (i.e. that a team that has tied once is no more likely to tie than any other), we figure the probability of consecutive ties in any given pair of games to be 0.0004%, or 1 in 232,000. Given the current format of a 32-team league in which each team plays 16 games, there are 480 such pairs of consecutive games per year.

Ignoring the fact that a tie has to have two teams (not a huge deal given the small probabilities we’re talking about), we would figure there is about a 0.2% chance that a team in the NFL will have two consecutive ties in a given year, meaning that we’d expect about 500 seasons in the current format to be played before we get a streak like that.
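The arithmetic above is easy to reproduce. The sketch below starts from the rounded 0.207% tie rate, so it lands close to, but not exactly on, the figures quoted (which presumably used the unrounded rate); the function name and defaults are just for illustration.

```python
def consecutive_tie_odds(tie_rate=0.00207, teams=32, games=16):
    """Back-of-the-envelope odds of back-to-back ties, assuming independence."""
    p_pair = tie_rate ** 2        # both games in a consecutive pair end tied
    pairs = teams * (games - 1)   # consecutive-game pairs per season
    p_season = pairs * p_pair     # fine as an approximation when p_pair is tiny
    return p_pair, pairs, p_season

p_pair, pairs, p_season = consecutive_tie_odds()
print(f"1 in {1 / p_pair:,.0f} for any given pair of games")
print(f"{pairs} consecutive pairs per season")
print(f"{100 * p_season:.2f}% per season, or about one every {1 / p_season:.0f} seasons")
```

Multiplying the per-pair probability by the number of pairs slightly overstates the chance of at least one streak, but at probabilities this small the difference is negligible.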

I’ll note (warning: dull stuff follows) that there are some probably silly assumptions that went into these calculations, some of which—the ones relating to independence—I’ve already mentioned. I imagine that baseline tie rate is probably wrong, and I imagine it’s high. I can think of two things that would make me underestimate the likelihood of a tie: one is the new rules, which by reducing the amount of sudden death increase the probability that teams tie. The other is that I’ve assumed there’s no heterogeneity across teams in tie rates, and that’s just silly—a team with a bad offense and good defense, i.e. one that plays low scoring games, is more likely to play close games and more likely to have a scoreless OT. Teams that play outside, given the greater difficulty of field goal kicking, probably have a similar effect. Some math using Jensen’s inequality tells us that the heterogeneity will probably increase the likelihood that one team will do it.

However, those two changes will have a much smaller impact, I expect, than that of increasing field goal conversion rates and a dramatic increase in both overall points scored and the amount of passing that occurs, which makes it easier for teams to get more possessions in one OT. Given the extreme rarity of the tie, I don’t know how to empirically verify these suppositions (I’d love to see a good simulation of these effects, but I don’t know of anyone who has one for so specific a scenario), but I’ll put it this way: I wouldn’t put money down at 400-1 that a team would tie twice in a row in a given year. I don’t even think I’d do it at 1000-1, but I’d certainly think about it.
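To make the Jensen’s inequality point concrete, here is a toy illustration. The tie rates are made up, and for simplicity each team’s tie probability is treated as a property of that team alone (ignoring the opponent). The two leagues have the same average tie rate, but the one with varied rates has a higher average chance of back-to-back ties, because the average of the squares is at least the square of the average.

```python
def avg_consecutive_tie_prob(rates):
    """Average over teams of p_i ** 2, the chance a team ties twice in a row."""
    return sum(p * p for p in rates) / len(rates)

# Hypothetical per-game tie rates; both lists average 0.002.
uniform = [0.002, 0.002, 0.002, 0.002]
mixed = [0.001, 0.001, 0.003, 0.003]

print(f"uniform league: {avg_consecutive_tie_prob(uniform):.2e}")
print(f"mixed league:   {avg_consecutive_tie_prob(mixed):.2e}")
```

How much this matters in practice depends on how much tie rates really vary across teams, which the sketch makes no attempt to estimate.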

That's a Clown Hypothesis, Bro!

Sports analysis and commentary, mostly empirically-based.