
Wear Down, Chicago Bears?

I watched the NFC Championship game the weekend before last via a moderately sketchy British stream. It used the Joe Buck/Troy Aikman feed, but whenever that went to commercials they had their own British commentary team whose level of insight, I think it’s fair to say, was probably a notch below what you’d get if you picked three thoughtful-looking guys at random out of an American sports bar. (To be fair, that’s arguably true of most of the American NFL studio crews as well.)

When discussing Marshawn Lynch, one of them brought out the old chestnut that big running backs wear down the defense and are thus likely to get big chunks of yardage toward the end of games, citing Jerome Bettis as an example. This is accepted as conventional wisdom in football commentary, but I’ve never actually seen it demonstrated one way or the other, and I couldn’t find any prior analysis before typing up this post.

The hypothesis I want to examine is that bigger running backs are more successful late in games than smaller running backs. All of those terms are tricky to define, so here’s what I’m going with:

  • Bigger running backs are determined by weight, BMI, or both. I’m using Pro Football Reference data for this, which has the limitation that it lists a single, static size for each player, but I haven’t heard of any source that tracks player size over time.
  • Late in games is the simplest thing to define: fourth quarter and overtime.
  • More successful is going to be measured in terms of yards per carry. This will be compared to YPC in the first three quarters to account for baseline differences between big and small backs. The correlation between BMI and YPC is -0.29, which is highly significant (p = 0.0001). The low R squared (about 0.1) says that BMI explains only about 10% of the variation in YPC, which isn’t great but does indicate a meaningful connection. There’s a plot below of BMI vs. YPC with the trend line added; the effect looks close to monotonic to me, meaning that getting bigger hurts YPC on average. (Assuming, of course, that the player is big enough to actually be an NFL back.)

[Plot: BMI vs. YPC, with trend line]
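
For concreteness, here’s a minimal sketch of that check, assuming a hypothetical per-back table with BMI and YPC columns (the file and column names are placeholders, not the actual PFR export):

```python
import pandas as pd
from scipy import stats

# Hypothetical career-level table: one row per back, with BMI and Q1-3 yards per carry.
backs = pd.read_csv("rb_career_summary.csv")  # placeholder file name

r, p = stats.pearsonr(backs["bmi"], backs["ypc"])
print(f"r = {r:.2f}, p = {p:.4f}, R^2 = {r ** 2:.2f}")
```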

My data set consisted of career-level data split into 4th quarter/OT and 1st-3rd quarters, which I subset to only include carries occurring while the game was within 14 points (a cut popular with writers like Bill Barnwell—see about halfway down this post, for example) to strip out huge blowouts, which could distort the numbers. My timeframe was 1999 to the present, which is when PFR has play-by-play data in its database. I then subset the list of running backs to only those with at least 50 carries in both the first three quarters and the fourth quarter/overtime (166 backs in all). (I looked at different carry cutoffs, and they don’t change any of my conclusions.)
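
Sketching that filtering step in code, with hypothetical play-by-play columns (play_type, score_margin, quarter, rusher, and yards are all assumed names, not PFR’s actual schema):

```python
import pandas as pd

plays = pd.read_csv("pbp_1999_2013.csv")  # placeholder play-by-play export

# Rush attempts only, with the game within 14 points at the snap.
rushes = plays[(plays["play_type"] == "rush") & (plays["score_margin"].abs() <= 14)].copy()
rushes["period"] = rushes["quarter"].map(lambda q: "late" if q >= 4 else "early")

# One row per back: YPC and carry counts for Q1-3 vs. Q4/OT.
per_back = rushes.groupby(["rusher", "period"])["yards"].agg(["mean", "size"]).unstack("period")
per_back.columns = ["early_ypc", "late_ypc", "early_carries", "late_carries"]

# Keep backs with at least 50 carries in each split.
per_back = per_back[(per_back["early_carries"] >= 50) & (per_back["late_carries"] >= 50)]
```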

Before I dive into my conclusions, I want to preemptively bring up a big issue, which is that this analysis relies only on aggregate-level data. That involves pooling carries from different games and even different years, which raises two problems immediately. The first is that we’re not directly testing the hypothesis; the claim is closer in spirit to “if a big running back gets lots of carries early in a game, his (or his team’s) YPC will increase in the fourth quarter of that game,” which can only be tested with game-level data. I’m not entirely sure what metrics to look at there, as there are a lot of confounds, but it’s going in the bucket of ideas for future research.

The second is that, beyond having to look at this potential effect indirectly, we might have biases altering the apparent effect: when a player runs ineffectively in the first part of a game, he will probably get fewer carries at the end—partly because he is probably running against a good defense, and partly because his team is likely to be behind and thus passing more. This means that more of the fourth quarter carries likely come when a runner is having a good day, possibly biasing our data.

Finally, it’s possible that the way that big running backs wear the defense down is that they soften it up so that other running backs do better in the fourth quarter. This is going to be impossible to detect with aggregate data, and if this effect is actually present it will bias against finding a result using aggregate data, as it will be a lurking variable inflating the fourth quarter totals for smaller running backs.

Now, I’m not sure that either of these issues will necessarily ruin any results I get with the aggregate data, but they are caveats to be mentioned. I am planning on redoing some of this analysis with play-by-play level data, but those data are rather messy and I’m a little scared of small sample sizes that come with looking at one quarter at a time, so I think presenting results using aggregated data still adds something to the conversation.

Enough equivocating, let’s get to some numbers. Below is a plot of fourth quarter YPC versus early game YPC; the line is the identity, meaning that points above the line are better in the fourth. The unweighted mean of the difference (Q4 YPC – Q1–3 YPC) is -0.14, with the median equal to -0.15, so by the regular measures a typical running back is less effective in the 4th quarter (on aggregate in moderately close games). (A paired t-test shows this difference is significant, with p < 0.01.)
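
A sketch of that comparison, continuing with the hypothetical per_back table from above (the paired t-test is run on each back’s two YPC figures):

```python
from scipy import stats

# per_back comes from the filtering sketch above (hypothetical columns).
diff = per_back["late_ypc"] - per_back["early_ypc"]
print(diff.mean(), diff.median())

# Paired t-test: is the Q4/OT vs. Q1-3 gap distinguishable from zero?
t_stat, p_val = stats.ttest_rel(per_back["late_ypc"], per_back["early_ypc"])
print(t_stat, p_val)
```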

[Plot: Q4/OT YPC vs. Q1-3 YPC, with the identity line]

A couple of individual observations jump out here, and if you’re curious, here’s who they are:

  • The guy in the top right, who’s very consistent and very good? Jamaal Charles. His YPC increases by about 0.01 yards in the fourth quarter, the second smallest change in the data (Chester Taylor has a drop of about 0.001 yards).
  • The outlier in the bottom right, meaning a major dropoff, is Darren Sproles, who has the highest early game YPC of any back in the sample.
  • The outlier in the top center with a major increase is Jerious Norwood.
  • The back on the left with the lowest early game YPC in our sample is Mike Cloud, whom I had never heard of. He’s the only guy below 3 YPC for the first three quarters.

A simple linear model gives us a best fit line of (Predicted Q4 YPC) = 1.78 + 0.54 * (Prior Quarters YPC), with an R squared of 0.12. That’s less predictive than I thought it would be, which suggests that there’s a lot of chance in these data and/or there is a lurking factor explaining the divergence. (It’s also possible this isn’t actually a linear effect.)
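
The fit itself is a one-liner with scipy, again using the hypothetical per_back columns from above:

```python
from scipy import stats

# Simple linear fit of Q4/OT YPC on Q1-3 YPC.
fit = stats.linregress(per_back["early_ypc"], per_back["late_ypc"])
print(f"Q4 YPC = {fit.intercept:.2f} + {fit.slope:.2f} * early YPC, R^2 = {fit.rvalue ** 2:.2f}")
```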

However, that lurking variable doesn’t appear to be running back size. Below is a plot of running back BMI vs. (Q4 YPC – Q1–3 YPC); there doesn’t seem to be a real relationship. The plot below it shows the difference against fourth quarter carries (the horizontal line is the average value of -0.13), which somewhat suggests the effect shrinks as sample size increases, though these data are non-normal, so it’s not easy to assess at a glance.

[Plot: BMI vs. Q4 YPC difference]

[Plot: Q4 carries vs. Q4 YPC difference]

That intuition is borne out if we look at the correlation between the two, with an estimate of 0.02 that is not close to significant (p = 0.78). Using weight or height instead of BMI gives us larger apparent effects, but they’re still not significant (r = 0.08 with p = 0.29 for weight, r = 0.10 with p = 0.21 for height). Throwing these variables into the regression that predicts Q4 YPC from previous YPC also produces no effect that’s close to significant, though I don’t put much stock in that because I don’t think much of that model to begin with.
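
Those checks, sketched with the same hypothetical tables (a size table with bmi, weight, and height columns is assumed; the regression uses statsmodels):

```python
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

sizes = pd.read_csv("rb_sizes.csv")  # placeholder: rusher, bmi, weight, height
df = per_back.join(sizes.set_index("rusher")).dropna()
df["ypc_diff"] = df["late_ypc"] - df["early_ypc"]

# Correlation of each size measure with the Q4 minus Q1-3 YPC difference.
for col in ["bmi", "weight", "height"]:
    r, p = stats.pearsonr(df[col], df["ypc_diff"])
    print(col, round(r, 2), round(p, 2))

# Size measures alongside early-game YPC in a regression for Q4 YPC.
model = smf.ols("late_ypc ~ early_ypc + bmi + weight + height", data=df).fit()
print(model.summary())
```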

Our talking head, though, mentioned Lynch and Bettis by name. Do we see anything for them? Unsurprisingly, we don’t—Bettis has a net improvement of 0.35 YPC, with Lynch actually falling off by 0.46 YPC, though both of these are within one standard deviation of the average effect, so they don’t really mean much.

On a more general scale, it doesn’t seem like a change in YPC in the fourth quarter can be attributed to running back size. My hunch is that this is accurate, and that “big running backs make it easier to run later in the game” is one of those things that people repeat because it sounds reasonable. However, given all of the data issues I outlined earlier, I can’t conclude that with any confidence, and all we can say for sure is that it doesn’t show up in an obvious manner (though at some point I’d love to pick at the play by play data). At the very least, though, I think that’s reason for skepticism next time some ex-jock on TV mentions this.

Man U and Second Halves

During today’s Aston Villa-Manchester United match, Iain Dowie (the color commentator) mentioned that United’s form is improving and that they are historically a stronger team in the second half of the season, meaning that they may be able to put this season’s troubles behind them and make a run at either the title or a Champions League spot. I didn’t get a chance to record the exact statement, but I decided to check up on it regardless.

I pulled data from the last ten completed Premier League seasons (via statto.com) to evaluate whether there’s any evidence that this is the case. What I chose to focus on was simply the number of first half and second half points for United, with first half and second half defined by number of games played (first 19 vs. last 19). One obvious problem with looking at this so simply is strength of schedule considerations. However, the Premier League, by virtue of playing a double round robin, is pretty close to having a balanced schedule—there is a small amount of difference in the teams one might play, and there are issues involving home and away, rest, and matches in other competitions, but I expect that’s random from year to year.

So, going ahead with this, has Man U actually produced better results in the second half of the season? Well, in the last 10 seasons (2003-04 – 2012-13), they had more points in the second half 4 times, and they did worse in the second half the other 6. (Full results are in the table at the bottom of the post.) The differences here aren’t huge—only a couple of points—but not only is there no statistically significant effect, there isn’t even a hint of an effect. Iain Dowie thus appears to be blowing smoke and gets to be the most recent commentator to aggravate me by spouting facts without support. (The aggravation in this case is compounded by the fact that this “fact” was wrong.)

I’ll close with two oddities in the data. The first is that there are 20 teams that have been in the Premiership for at least 5 of the last 10 years, and exactly one has a difference between first-half and second-half points that is significant at the 5% level. (Award yourself a cookie if you guessed Birmingham City.) This seems like a textbook example of multiplicity to me.

The second, for the next time you want to throw a real stumper at someone, is that there is one team in the last 16 years (all I could easily pull data for) that had the same goal difference and number of points in the two halves of the season. That team is 2002-03 Birmingham City; I have to imagine that finishing 13th with 48 points and a -8 goal difference is about as dull as a season can get, though they did win both their Derby matches (good for them, no good for this Villa supporter).

Manchester United Results by Half, 2003-04 to 2012-13

Season    1st-Half Pts   2nd-Half Pts   Total Pts   1st-Half GD   2nd-Half GD   Total GD
2003-04   46             29             75          25            4             29
2004-05   37             40             77          17            15            32
2005-06   41             42             83          20            18            38
2006-07   47             42             89          31            25            56
2007-08   45             42             87          27            31            58
2008-09   41             49             90          22            22            44
2009-10   40             45             85          22            36            58
2010-11   41             39             80          23            18            41
2011-12   45             44             89          32            24            56
2012-13   46             43             89          20            23            43
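
For what it’s worth, here’s a quick sketch that redoes the comparison directly from the table above; the paired t-test is my choice of test for illustration, not necessarily the one behind the significance claim:

```python
from scipy import stats

# First- and second-half points from the table above (2003-04 through 2012-13).
first_half = [46, 37, 41, 47, 45, 41, 40, 41, 45, 46]
second_half = [29, 40, 42, 42, 42, 49, 45, 39, 44, 43]

better = sum(s > f for f, s in zip(first_half, second_half))
print(better, "of", len(first_half), "seasons with more second-half points")  # 4 of 10

# Paired t-test on the per-season point differences.
t_stat, p_val = stats.ttest_rel(second_half, first_half)
print(t_stat, p_val)
```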

Break Points Bad

As a sentimental Roger Federer fan, the last few years have been a little rough, as it’s hard to sustain much hope watching him run into the Nadal/Djokovic buzzsaw again and again (with help from Murray, Tsonga, Del Potro, et al., of course). Though it’s become clear in the last year or so that the wizardry isn’t there anymore, the “struggles”* he’s dealt with since early 2008 are pretty frequently linked to an inability to win the big points.

*Those six years of “struggles,” by the way, arguably surpass the entire career of someone like Andy Roddick. Food for thought.

Tennis may be the sport with the most discourse about “momentum,” “nerves,” “mental strength,” etc. This is in some sense reasonable, as it’s the most prominent sport that leaves an athlete out there by himself with no additional help–even a golfer gets a caddy. Still, there’s an awful lot of rhetoric floating around there about “clutch” players that is rarely, if ever, backed up. (These posts are exceptions, and related to what I do below, though I have some misgivings about their chosen methods.)

The idea of a “clutch” player is that they should raise their game when it counts. In tennis, one easy way of looking at that is to look at break points. So, who steps their game up when playing break points?

Using data that the ATP provides, I was able to pull year-end summary stats for top men’s players from 1991 to the present, which I then aggregated to get career level stats for every man included in the data. Each list only includes some arbitrary number of players, rather than everyone on tour—this causes some complications, which I’ll address later.

I then computed, separately for service points and return points, the fraction of break points won divided by the fraction of non-break-point points won, and averaged the two ratios. This figure approximates the factor by which a player ups his game on break points. Let’s call it clutch ratio, or CR for short.

This is a weird metric, and one that took me some iteration to come up with. I settled on this as a way to incorporate both service and return “clutchness” into one number. It’s split and then averaged to counter the fact that most people in our sample (the top players) will be playing more break points as a returner than a server.
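
In code, the metric looks something like the sketch below. The column names are hypothetical stand-ins for whatever the scraped ATP summaries actually contain (break points won/played and total points won/played, for serve and return):

```python
import pandas as pd

seasons = pd.read_csv("atp_year_end_stats.csv")  # placeholder: one row per player-season

def clutch_ratio(row):
    # Serve: break-point win rate vs. win rate on all other service points.
    serve_bp = row["serve_bp_won"] / row["serve_bp_played"]
    serve_other = (row["serve_pts_won"] - row["serve_bp_won"]) / \
                  (row["serve_pts_played"] - row["serve_bp_played"])
    # Return: break-point win rate vs. win rate on all other return points.
    ret_bp = row["return_bp_won"] / row["return_bp_played"]
    ret_other = (row["return_pts_won"] - row["return_bp_won"]) / \
                (row["return_pts_played"] - row["return_bp_played"])
    # Average the two ratios so serving and returning count equally.
    return 0.5 * (serve_bp / serve_other + ret_bp / ret_other)

seasons["cr"] = seasons.apply(clutch_ratio, axis=1)
```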

The first interesting thing we see is that the average value of this stat is just a little more than one—roughly 1.015 (i.e. the average player is about 1.5% better in clutch situations), with a reasonably symmetric distribution if you look at the histogram. (As the chart below shows, this hasn’t changed much over time; the correlation with time is near 0 and insignificant. I have no idea what happened in 2004 such that everyone somehow did worse that year.) That average value, to me, suggests we are dealing at least in part with selection bias from looking only at the more successful players. (This could be controlled for with more granular data, so if you know where I can find those, please holler.)

[Histogram of clutch ratios]

[Distribution of clutch ratios by year]

Still, CR, even if it doesn’t perfectly capture clutch (as it focuses on only one issue, only captures the top players and lacks granularity), does at least stab at the question of who raises their game. First, though, I want to specify some things we might expect to see if a) clutch play exists and b) this is a good way to measure it:

  • This should be somewhat consistent throughout a career, i.e. a clutch player one year should be clutch again the next. This is pretty self-explanatory, but just to make clear: a player whose improvement isn’t sustained isn’t “clutch”; he’s lucky. The absence of this consistency is one of the reasons the consensus among baseball folk is that there’s no meaningful variation in clutch hitting.
  • We’d like to see some connection between success and clutchness, or between having a reputation for being clutch and having a high CR. This is tricky and I want to be careful of circularity, but it would be quite puzzling if the clutchest players we found were journeymen like, I dunno, Igor Andreev, Fabrice Santoro, and Ivo Karlovic.
  • As players get older, they get more clutch. This is preeeeeeeeeeetty much pure speculation, but if clutch is a matter of calming down/experience/whatever, that would be one way for it to manifest.

We can tackle these in reverse order. First, there appears to be no improvement year-over-year in a player’s clutch ratio. If we limit to seasons with at least 50 matches played, the probability that a player had a higher clutch ratio in year t+1 than he did in year t is…47.6%. So, no year-to-year improvement, and actually a slight decrease in clutch play. That’s fine; it just means clutch is not a skill someone develops. (The flip side is that it could be that younger players are more confident, though I’m highly skeptical of that. Still, the problem with evaluating these intangibles is that their narratives are really easily flipped.)
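
That check, sketched against the hypothetical seasons table from above (player, year, matches, and cr columns are assumed):

```python
# Pair each qualifying season with the same player's following season.
eligible = seasons[seasons["matches"] >= 50]
pairs = eligible.merge(eligible, on="player", suffixes=("_t", "_t1"))
pairs = pairs[pairs["year_t1"] == pairs["year_t"] + 1]

improved = (pairs["cr_t1"] > pairs["cr_t"]).mean()
print(f"share of players improving their CR year over year: {improved:.1%}")
```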

Now, the relationship between success and CR. Let’s first go with a reductive measure of success: what fraction of games a player won. Looking on either a season basis (50-match minimum, 1006 observations) or a career basis (200-match minimum, 152 observations), we see tiny, insignificant correlations between the two figures. Are these huge datasets? No, but the total absence of any effect suggests there’s really no link between player quality and clutch, assuming my chosen metrics are coherent. (I would have liked to try this with year-end rankings, but I couldn’t find them in a convenient format.)

What if we take a more qualitative approach and just look at the most and least clutch players, as well as some well-regarded players? The tables below show some results in that direction.

Best Clutch Ratios

Rank  Name                    Clutch Ratio
1     Jo-Wilfried Tsonga      1.08
2     Kenneth Carlsen         1.07
3     Alexander Volkov        1.06
4     Goran Ivanisevic        1.05
5     Juan Martin Del Potro   1.05
6     Robin Soderling         1.05
7     Jan-Michael Gambill     1.04
8     Nicolas Kiefer          1.04
9     Paul Haarhuis           1.04
10    Fabio Fognini           1.04

Worst Clutch Ratios

Rank  Name                    Clutch Ratio
1     Mariano Zabaleta        0.97
2     Andrea Gaudenzi         0.97
3     Robby Ginepri           0.98
4     Juan Carlos Ferrero     0.98
5     Jonas Bjorkman          0.98
6     Juan Ignacio Chela      0.98
7     Gaston Gaudio           0.98
8     Arnaud Clement          0.98
9     Thomas Enqvist          0.99
10    Younes El Aynaoui       0.99

See any pattern to this? I’ll cop to not recognizing many of the names, but if there’s a pattern, it’s that a number of the guys at the top of the list are real big hitters (I would put Tsonga, Soderling, Del Potro, and Ivanisevic in that bucket, at least). Otherwise, it’s not clear that we’re seeing the guys you would expect to be the most clutch players (journeyman Volkov at #3?), nor do I see anything meaningful in the list of least clutch players.

Unfortunately, I didn’t have a really strong prior about who should be at the top of these lists, except perhaps the most successful players—who, as we’ve already established, aren’t the most clutch. The only list of clutch players I could find was a BleacherReport article that used as its “methodology” their performance in majors and deciding sets, and their list doesn’t match with these at all.

Since these lists are missing a lot of big names, I’ve put a few of them in the list below.

Clutch Ratios of Notable Names

Overall Rank (of 152)  Name             Clutch Ratio
18                     Pete Sampras     1.03
20                     Rafael Nadal     1.03
21                     Novak Djokovic   1.03
26                     Tomas Berdych    1.03
71                     Andy Roddick     1.01
74                     Andre Agassi     1.01
92                     Lleyton Hewitt   1.01
122                    Marat Safin      1.00
128                    Roger Federer    1.00

In terms of relative rankings, I guess this makes some sense—Nadal and Djokovic are renowned for being battlers, Safin is a headcase, and Federer is “weak in big points,” they say. Still, these are very small differences, and while over a career 1-2% adds up, I think it’s foolish to conclude anything from this list.

Our results thus far give us some odd ideas about who’s clutch, which is a cause for concern, but we haven’t tested the most important aspect of our theory: that this metric should be consistent year over year. To check this, I took every pair of consecutive years in which a player played at least 50 matches and looked at the clutch ratios in years 1 and 2. We would expect there to be some correlation here if, in fact, this stat captures something intrinsic about a player.

As it turns out, we get a correlation of 0.038 here, which is both small and insignificant. Thus, this metric suggests that players are not intrinsically better or worse in break point situations (or at least, it’s not visible in the data as a whole).
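The correlation falls out of the same consecutive-season pairs built in the earlier sketch:

```python
from scipy import stats

# pairs: consecutive 50-match seasons for each player, from the year-over-year sketch above.
r, p = stats.pearsonr(pairs["cr_t"], pairs["cr_t1"])
print(f"year-to-year CR correlation: r = {r:.3f}, p = {p:.2f}")
```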

What conclusions can we draw from this? Here we run into a common issue with concepts like clutch that are difficult to quantify—when you get no result, is the reason that nothing’s there or that the metric is crappy? In this case, while I don’t think the metric is outstanding, I don’t see any major issues with it other than a lack of granularity. Thus, I’m inclined to believe that in the grand scheme of things, players don’t really step their games up on break point.

Does this mean that clutch isn’t a thing in tennis? Well, no. There are a lot of other possible clutch metrics, some of which are going to be supremely handicapped by sample size issues (Grand Slam performance, e.g.). All told, I certainly won’t write off the idea that clutch is a thing in tennis, but I would want to see significantly more granular data before I formed an opinion one way or another.

Doing the Splits with Josh Hamilton

I’m in the course of looking at some splits for active players (mostly day/night splits) and came across something I found interesting.

http://www.baseball-reference.com/play-index/split_stats.cgi?full=1&params=stad%7CDay%7Chamiljo03%7Cbat%7CAB%7C

The link is Josh Hamilton’s statistics during day games by year. (All numbers in this post come from b-r.) The thing I keyed in on is tOPS+, which is his OPS in the split relative to his overall OPS: 100 would be equal, and 120, say, would be a 20% increase. Here’s that number in day games over his career, with the number of day plate appearances in parentheses:

36 (85), 73 (172), 108 (96), 59 (145), 49 (143), 112 (169), 101 (182).
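
As a rough illustration of what that stat is getting at, here’s the relative-OPS reading described above; note that Baseball-Reference’s published tOPS+ works from OBP and SLG components, so this is a simplification of their calculation, not a reproduction of it:

```python
def approx_t_ops_plus(split_obp, split_slg, total_obp, total_slg):
    # The split's OPS relative to the player's overall OPS, scaled so 100 = "same as overall".
    # (Simplified reading; not Baseball-Reference's exact formula.)
    return 100 * (split_obp + split_slg) / (total_obp + total_slg)

# Example with made-up numbers: a day split somewhat worse than the overall line.
print(round(approx_t_ops_plus(0.320, 0.450, 0.360, 0.540), 1))  # about 85.6
```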

Now, that’s a pretty dramatic uptick in the last two years, but this is a player known for his volatility (in more than one sense), and we’re not looking at huge samples. Is there a simple explanation? At first, it seems so:

Rangers outfielder Josh Hamilton walked into the clubhouse wearing contact lenses that made his eye look red on Friday. His hope is that they can cut down in the amount of light and help him see the ball better during the day.

That quote is from ESPN Dallas, dated June 24, 2011.

Is this evidence that those stats aren’t a fluke, or (alternatively) evidence that the red contacts aren’t total quackery?

Of course, it’s not simple. For one, there’s no information I could find suggesting that he actually kept wearing them.* Moreover, some of that difference probably is just randomness, since his BABIP was 100 points higher in night games that year. Relatedly, his SLG was about 300 points higher as well–which is a sign he was making much better contact, though it could just be luck. (I couldn’t find his Line Drive % split by Day/Night, but a higher LD% would account for both SLG and BABIP.) Perhaps most importantly, Hamilton actually played about half his 2011 day games after he got the lenses, and still wound up with that awful split.

Still, the fact remains that his (relative) performance went from really awful to respectable after this. The most obvious reason it evened out, though, is that his nighttime strikeout rate almost doubled (2011: 13.4%, 2012: 25.5%, 2013: 24.2%), while his daytime strikeout rate stayed the same (2011: 28.0%, 2012: 25.4%, 2013: 26.4%).

If you’re a believer in the contacts, you’d say that he’s gotten worse overall, but that overall backsliding was counteracted by his daytime improvement, so his splits normalized. If you’re skeptical, especially since he probably hasn’t been wearing the contacts, you say that there was a lot of luck in that 2011 split and that this is regression to the mean. I’m inclined to go with the latter, not least because it’s much simpler.

However, I’m on the fence as to whether Hamilton actually is a worse hitter during day games. On the one hand, he’s got a season and a half of data and the second worst split among active players with at least 600 day at-bats. On the other hand, there’s a 40 point differential in BABIP that I’m fairly willing to chalk up to luck, and there are major multiplicity concerns when you pull one split for one player out of the vast morass of baseball data. I’m inclined to file this whole thing away as an example of the difficulties of trying to do rigorous data work: sometimes you see an interesting nugget in the data and think you have a great explanation, and then it evaporates when you do a bit more digging. C’est la vie.

*This is a big deal, and probably enough to nullify any conclusions I could draw. I kept going just for the hell of it.

Tim McCarver and Going the Other Way

During the Tigers-Red Sox game last night, Tim McCarver said he thought it was a little odd that the Tigers would bring in lefty Drew Smyly to face David Ortiz while also leaving the shift on, since lefties are more likely to go to the opposite field against a left-handed pitcher. (At the very least, I know he said this last part. Memory is a tricky thing, and I’m now not sure whether he said this about Ortiz or someone else, possibly Alex Avila.) Being Tim McCarver, he didn’t say why this might be true, nor did he cite a source for this information, putting this firmly in the realm of obnoxious hypotheses.

The first question is whether or not this is true. For that, there are these handy aggregated spray charts, courtesy of Brooks Baseball.

David Ortiz Aggregated Spray Charts by Pitcher Handedness, 2007–2013

Alex Avila Aggregated Spray Charts by Pitcher Handedness, 2007–2013

Based on these data, I have to say it seems like McCarver’s assertion is true: they are slightly more likely to go to left field against a left-handed pitcher. I don’t have enough information to say whether the differences are statistically significant (I’d guess they are, given the number of balls these guys have put into play in the last 5-7 years) or practically significant (I kinda doubt it). Regardless of the answer, though, the fact remains that the appropriate thing to do is to bring in the lefty and shift slightly less drastically, so who knows why McCarver brought this up to begin with. After all, Ortiz hits drastically worse against lefties (his OPS against lefties is 24% lower than his lifetime rate, via baseball-reference), as does Avila (36%).

There’s also the question of why this might be true, and in fairness to McCarver, there are some pretty plausible mechanisms for what he was saying. One is that a breaking pitch from a left-hander is more likely to be on the outer part of the plate for a left-handed batter than a similar pitch from a right-handed batter, and outside pitches are more likely to get hit the other way. Another is that left-handed batters can’t pick up a pitch as easily against a left-handed pitcher, so they are more likely to make late contact, which is in turn more likely to go to the opposite field. I can’t necessarily confirm either of these mechanisms empirically, though looking at Brooks splits for Avila and Ortiz suggests that the fraction of outside pitches they see against left-handers is about 3 percentage points larger than the fraction against righties.

So, what McCarver said was true (though not terribly helpful), and there are seemingly good reasons for it to be true. I still posted something, though, because this is a great example of something that pisses me off about sports commentators–a tendency to toss out suppositions and not bother with supporting or explaining them. (Another good example of this is Hawk Harrelson.) That tendency, along with their love of throwing out hypotheses that are totally unfalsifiable (McCarver asserting that the pitching coach coming out to the mound is valuable, e.g.), is one of the things I plan to deal with pretty regularly in this space.

(Happy first post, everyone.)