Category Archives: Baseball History

The Quality of Postseason Play

Summary: I look at averages for hitters and pitchers in the postseason to see how their quality (relative to league average) has changed over time. Unsurprisingly, the gap between postseason and regular season average pitchers is larger than the comparable gap for hitters. The trend over time for pitchers is expected, with a decrease in quality relative to league average from the 1900s to mid-1970s and a slight increase since then that appears to be linked with the increased usage of relievers. The trend for hitters is more confusing, with a dip from 1950 to approximately 1985 and an increase since then. Overall, however, the average quality of both batters and pitchers in the postseason relative to league average is as high as it has been in the expansion era.

Quality of play in the postseason is a common trope of baseball discussion. Between concerns about optics (you want casual fans to watch high quality baseball) and rewarding the best teams, there was a certain amount of handwringing about the number of teams with comparatively poor records into the playoffs (e.g., the Giants and Royals made up the only pair of World Series teams ever without a 90 game winner). This prompted me to wonder about the quality of the average players in the postseason and how that’s changed over time with the many changes in the game—increased competitive balance, different workloads for pitchers, changes in the run environment, etc.

For pitchers, I looked at weighted league-adjusted RA9, which I computed as follows:

  1. For each pitcher in the postseason, compute their Runs Allowed per 9 IP during the regular season. Lower is better, obviously.
  2. Take the average for each pitcher, weighted by the number of batters faced.
  3. Divide that average by the major league average RA9 that year.

You can think of this as the expected result you would get if you chose a random plate appearance during the playoffs and looked at the pitcher’s RA9. Four caveats here:

  1. By using RA9, this is a combined pitching/defense metric that really measures how much the average playoff team is suppressing runs relative to league average.
  2. This doesn’t adjust for park factors, largely because I thought that adjustment was more trouble than it was worth. I’m pretty sure the only effect that this has on aggregate is injecting some noise, though I’m not positive.
  3. I considered using projected RA9 instead of actual RA9, but after playing around with the historical Marcel projections at Baseball Heat Maps, I didn’t see any meaningful differences on aggregate.
  4. For simplicity’s sake, I used major league average rather than individual league average, which could influence some of the numbers in the pre-interleague play era.

When I plot that number over time, I get the following graph. The black dots are observed values, and the ugly blue line is a smoothed rolling estimate (using LOESS). (The gray is the confidence interval for the LOESS estimate.)


While I wouldn’t put too much weight in the LOESS estimate (these numbers should be subject to a large bit of randomness), it’s pretty easy to come up with a basic explanation of why the curve looks the way it does. For the first seventy years of that chart, the top pitchers pitched ever smaller shares of the overall innings (except for an uptick in the 1960s), ceding those innings to lesser starters and dropping the average quality. However, starting in the 1970s, relievers have covered larger portions of innings (covered in this FiveThirtyEight piece), and since relievers are typically more effective on a rate basis than starters, that’s a reasonable explanation for the shape of the overall pitcher trend.

What about hitters? I did the same calculations for them, using wOBA instead of RA9 and excluding pitchers from both postseason and league average calculations. (Specifically, I used the static version of wOBA that doesn’t have different coefficients each year. The coefficients used are the ones in The Book.) Again, this includes no park adjustments and rolls the two leagues together for the league average calculation. Here’s what the chart looks like:


Now, for this one I have no good explanation for the trend curve. There’s a dip in batter quality starting around integration and a recovery starting around 1985. If you have ideas about why this might be happening, leave them in the comments or Twitter. (It’s also quite possible that the LOESS estimate is picking up something that isn’t really there.)

What’s the upshot of all of this? This is an exploratory post, so there’s no major underlying point, but from the plots I’m inclined to conclude that, relative to average, the quality of the typical player (both batter and pitcher) in the playoffs is as good as it’s been since expansion. (To be clear, this mostly refers to the 8 team playoff era of 1995–2011; the last few years aren’t enough to conclude anything about letting two more wild cards in for a single game.) I suspect a reason for that is that, while the looser postseason restrictions have made it easier for flawed teams to make it in the playoffs, they’ve also made it harder for very good teams to be excluded because of bad luck, which lifts the overall quality, a point raised in this recent Baseball Prospectus article by Sam Miller.

Two miscellaneous tidbits from the preparation of this article:

  • I used data from the Lahman database and Fangraphs for this article, which means there may be slight inconsistencies. For instance, there’s apparently an error in Lahman’s accounting for HBP in postseason games the last 5 years or so, which should have a negligible but non-zero effect on the results.
  • I mentioned that the share of batters faced in the postseason by the top pitchers has decreased steadily over time. I assessed that using the Herfindahl-Hirschman index (which I also used in an old post about pitchers’ repertoires.) The chart of the HHI for batters faced is included below. I cut the chart off at 1968 to exclude the divisional play era, which by doubling the number of teams decreased the level of concentration substantially.  HHI

Notes from SaberSeminar

I was fortunate enough to be at SaberSeminar this past weekend, held at Boston University and organized by (among others) Dan Brooks, the titular Brooks of PitchF/X site BrooksBaseball. I took some notes throughout the weekend, and I’ve typed them up below, broken into smaller observations. (All of the reflection was done on the bus home, so any mistakes are my own due to typing with a fried brain.)

One other thing to tell is that I presented some research I did on shifts in the strike zone (it actually came out of this article about high socks), and I’m going to be writing that up as an article soon, though it may appear at a different site. All in all, quite an enjoyable weekend even after factoring in the scattered criticisms below.

It was my first time at a baseball conference and the first time I’d been at any conference in quite some time, and it actually struck me as pretty similar to an indie music festival. The crowd wasn’t huge (a couple hundred people), and they all knew most of the speakers, who mostly stuck to greatest hits sort of things. (Most of what I saw presented wasn’t novel, especially by the more prominent folk.) That’s not to take away from the sessions—it was still interesting to meet and hear people that I had only read, and it was still great to be around a group where everyone was interested in the same sort of stuff and you could bring up things like SIERA and wOBA without much risk of confusion.

They had a panel discussion featuring three of the Red Sox baseball operations interns, and I was reminded of how skeezy some aspects of that system are. The moderator talked about how there were fewer MLB intern slots than there used to be because the feds cracked down on illegal internships, which he framed as a bad thing. I found that a bit horrifying; it seems odd to me that a team with a payroll of hundreds of millions would cut staff rather than pay a semi-reasonable wage to their junior people. (Even if it makes economic sense, it seems like a bad way to treat people.)

The interns, for their part, didn’t offer a whole lot of insight into things. (Not that I blame them; it’s hard to be insightful during a panel discussion.) They twice dodged the question of how many hours they work, only saying “a lot.” (It’s possible I’m being too harsh and they actually don’t know; because I’m billed out by the hour at work, I have to keep very accurate time logs, which means I know how much I’ve worked every week since I’ve started, but I may be an outlier in that regard.) One of the failings of the panel was that it didn’t include anyone who had been an intern and washed out (either quit or wasn’t offered a job), which would have been more informative and more interesting. (I understand not wanting to irk any of the teams, but I don’t think this is too inflammatory.) This is the same problem I ran into in college a lot, where most of the advice I got about whether to get a Ph.D. came from people who had not only loved grad school but also met with astounding success afterward.

I also thought about the fact that they are hiring recent grads of extremely expensive schools (Columbia, Yale, and Georgetown, in this case) to be extremely underpaid interns; I wonder how much of their labor pool is indirectly disqualified simply due to a lack of connections or a need to make money for family or student loan reasons. It’s very puzzling to me that teams, despite being flush with cash, hire people similarly to high-prestige, low-money companies like magazines rather than high-prestige, rich firms like banks and tech companies. I’d love to see more discussion from people who know more about the industry than I do.

One mostly unstated theme that kept occurring to me throughout the weekend was the issue of class, opportunity, and privilege, which popped up in a number of different ways:

  • Baseball Prospectus’s Russell Carleton discussed how teams can (or should) help their players develop into better adults by focusing on their financial, practical, nutritional, and mental well-being. While he focused mostly on the positive effects it would have on a player’s career and thus a team’s investment (fewer distractions and a better makeup will help talent win out), it seems to me that it’s a good thing in its own right to improve the life skills of the wash outs—especially the ones that skipped college and/or come from poorer backgrounds. Fewer guys in society who behave like Dirk Hayhurst’s teammates is probably a good thing.
  • Tom Tippett, a senior analyst for the Red Sox, talked about how he always appreciates players that are diamonds in the rough, i.e. guys who went undrafted, played in the independent leagues, etc. The thing is, though, that equality of access doesn’t exist, and it’s an interesting thought experiment to think of how many guys get cut before they figure this out. In particular, I wonder how many guys are able to get to a good college or (building on Carleton’s point) hang around the minors longer (thus increasing the opportunity that they make the big leagues) because they have better “makeup” that really comes from growing up with a few more advantages.
  • There was a demo of TrackMan, which is a portable radar system that can be used to evaluate pitch speed and rotation. One of the guys I was talking to pointed out that they’d sold as many of the systems as they could to clubs and agencies and were hoping to move on to selling it to amateurs. I don’t know what one costs, but the idea of buying a portable radar system for your high school pitcher seems like a caricature of what a rich family gunning for a scholarship would do (analogous to all of the academic tutoring and test prep that a lot of people I know did).
  • Relatedly, a number of guys talked about the various high school showcases that pit the best high school talent in a region against each other, and one casually mentioned how much money they bring in. Again, seems like the sort of thing that serves to extract cash from hyperzealous parents and limit the opportunities for kids of less means, but I don’t know enough about the system to comment.
  • Internship opportunities for big league teams, which I discussed above.
  • I’ve been of the conviction for a while that pro sports would be equally or more enjoyable and substantially less ethically problematic if the teams were run as non-profits in the same general manner as art museums and whatnot (the Green Bay Packers are something like this already), and while I won’t go into that further here, if that were the case I think it’d be easier for a lot of the explicit privilege issues to be brought up in the game. While teams are still nominally concerned with profits, it’s a lot easier for them to sidestep problems that in a better world they could help address.

There were a number of talks with a more medical and scientific focus, and they provided good examples of how hard it is to apply these things rigorously to baseball (or any other real world application). There are lots of studies with very small N (“N=4” appeared on one slide describing research that had been published), and they are presumably subject to the same sorts of issues that all public health and social science papers are. While I’m sure lots of teams (in all sports) would love to bring in scientists to help them with things like sleep and vision, I imagine there’s a lot of stuff that falls apart between the lab and the field (if it even exists at all).

I should mention that the first talk of the conference, by a UC-Riverside professor who focuses on vision, did have experimental evidence of how improved vision helps college player performance, but it’s still tiny samples and only college students, and thus to be taken with a grain of salt.

There was an interesting panel featuring Matt Swartz, Ben Baumer, and Vince Gennaro discussing the relationship between winning and teams’ making money that prompted at least a couple article ideas for the future. Gennaro said he thought that the way teams spend money might change a bit after the addition of the second wild card, as there is now much greater variety across playoff teams in terms of how valuable the postseason slot is—the first seed became more valuable and the wild card slots substantially less. I have some thoughts on that, but will leave them for future articles.


Astros’ GM Jeff Luhnow gave a pleasant enough if relatively fact-free talk, the main focus of which was the importance of convincing the uniformed personnel of the importance of the sabermetric principles that buck conventional wisdom. He used the example of the shift and how it took the Astros three years to actually get people on board with it; obviously, if the players and the manager don’t like it, it won’t work as well as it would otherwise. I honestly wouldn’t be surprised if this becomes much less of an issue in 10 or so years, when the reasoning will have permeated a bit more through the baseball establishment and managers and young players will be a lot more open to things.

Vince Gennaro gave a very similar talk, and one thing he brought up was that you need to strike a balance between sticking to general principles (about shifting, pitcher workload, etc.) and making exceptions where warranted. Given that Luhnow talked about how he had made too many exceptions about when to shift last year, the point was hammered home, though it’s a vague enough point that it’s hard to really implement. (I was also reminded of this recent article about Ruben Amaro and exceptions.)

While Red Sox GM Ben Cherington didn’t discuss anything much more novel than what Luhnow covered, he was a lot more personable and down-to-earth while doing so. I imagine some of that is personality and a lot of it is the result of being the GM of the defending World Series champs and talking in his own backyard instead of presiding over three years of horrible teams and a lot of criticism from around baseball.

The last question asked of Luhnow was a minute-long ramble that wasn’t really a question and basically turned into “haha, you screwed up the Brady Aiken situation,” and Luhnow looked pretty peeved afterward, prompting Dan Brooks to tell the crowd not to be jerks to the presenters. There were a lot of bad questions all weekend, especially to Cherington, Red Sox manager John Farrell, and Luhnow, who clearly couldn’t say anything about their teams to us that was any more interesting than what they tell the media after a game. That didn’t stop people from bugging Farrell about his bullpen, though.

There was a bit more offensive humor in some of the talks than I would have expected—one professor made a joke about George W. Bush being brain damaged, and another managed to have a slide showing him wearing a t shirt that said “Drunk Bitches Love Me,” a slide with a cartoon captioned “I Will Fucking Cut You, Bitch,” and threw in a couple fat jokes for good measure. As anyone who knows me will attest, I have a reasonably sharp sense of humor, but throwing around jokes with misogynistic overtones at a conference that I would estimate was 90–95% men made me cringe.

More bullets from John Farrell’s talk:

  • When asked (I believe by accomplished sabermetrician Mitchel Lichtman, aka MGL) about his handling of the bullpen on Friday night, he discussed how “it’s about 162 games” (specifically about pulling Koji Uehara after an easy inning). While that’s certainly important, I wonder how often it’s just used as a crutch to justify poor decisions. As Grantland’s Bill Barnwell has written before, you basically only ever see nebulous qualitative concerns like Uehara’s overwork invoked to defend conservative decisions, never more aggressive ones.
  • Farrell thinks that the spree of Tommy John surgeries is more due to guys overthrowing (to impress scouts) than it is due to the sheer volume of pitches. The surgeon that spoke later in the conference disagreed.
  • He mentioned that it’s easier to use less conventional strategies once opponents start doing it, because it seems more normal to the players. One obvious consequence of this is that really conspicuous tactical advances like shifting are going to be relatively short-lived advantages.
  • He thinks that introducing replay and the corresponding promotion of several umpires has led to an expanded strike zone and that’s part of the continued downturn in baseball. That seems like a testable hypothesis, and something I’ll probably look at soon.
  • Farrell mentioned that figuring out how guys will react when they fail as baseball players for the first time is an important part of helping players move through the system, which I guess is a big part of what Russell Carleton later talked about. It reminded me a lot of what people say about elite colleges and places like Stuyvesant High School, where a lot of people have to adjust from being in the 99th percentile of their peers at their prior institution to being median or below at their new one.

Russell Carleton gets major points for treating data as a plural noun rather than a singular one; he was the only one I noticed doing that all conference. I’d be curious to see what the usage rates are depending on background, with my guess being that people with more academic experience use “are” more than people who mostly use data in a private sector setting. (Yes, I’m a pedant about some of these things.)

Two White Sox notes from people’s presentations:

  • Apparently Tyler Flowers is fifth in the bigs in saving runs by catch framing. Some of that is surely because of his workload, but it’s still a surprise.
  • Not surprising, but something I’d forgotten: Erik Johnson was BP’s #1 prospect for the White Sox at the beginning of the year. Sigh.

Another one of those themes that kept popping up to me during the weekend was the idea of how teams preserve their edges, especially the ones they derive through quantitative and sabermetric means. (I’m reminded of the Red Queen hypothesis, which is an evolutionary biology idea I learned about through quiz bowl that applies pretty well to baseball analysis. Basically, you have to keep advancing in absolute terms to stay in the same place relatively, because if you are complacent people will catch up to you naturally.) A few places that this issue manifested itself:

  • MGL said that, for general short term forecasting, any of the major projection public systems (Oliver, Steamer, PECOTA, and ZiPS) will do. He actually suggested that teams were probably wasting their time trying to come up with a better general forecasting system and that understanding volatility and more specific systems is probably more important. Jared Cross, developer of Steamer, disagreed a little bit; his view was that the small gap between current projection systems and perfect estimates meant that a seemingly marginal improvement would actually mean a lot because of how competitive things are.
  • The sort of innovations Carleton discussed are the sort of thing that would pretty quickly spread throughout baseball, making any one team;s advantage relatively fleeting. Of course, if it spreads then it’s likely to increase the overall quality of the talent pool, which would lead to either better games or more teams, which is probably good. (It’s not necessarily good for the current players; any progress in player evaluation and development is unlikely to actually help players, given that the total number of jobs isn’t increasing; if one player does better, another loses his job.)
  • Tom Tippett said that if he had his druthers the public wouldn’t have access to PitchF/X data, because it allows teams who don’t have good analysts to borrow heavily from the public and thus decreases the advantage that analytically-inclined organizations hold. While I think that’s true from his self-interested perspective, I think it’s a bit short-sighted overall, and I wish he’d answered with a “good for the game” perspective. When you think about what’s best for fans, I think defending closed systems is probably harder; one of the things I like about baseball is how freely available the data are, and to the extent that becomes less true I think it’s a sad thing.
  • On that note, I’m a bit amused by what people think of as being “trade secrets”; there were lots of teams bouncing around and a few presentations by companies that are using highly proprietary data analysis methods that they are trying to sell people on. Again, it’s hard for me to evaluate how meaningful that stuff is (though people love kicking around the figure that pro teams are five years ahead of the public in their understanding of things), but even if it does represent a competitive advantage it’s still pretty funny to step back and think about the secrecy that’s applied to sports.

Several different people brought up StatCast, which is the new data collection system MLB Advanced Media is going to roll out some time soon; it will provide a huge amount of data on how fast players move and how quickly they react that will allow for analysis that’s a bit more along the lines of what the SportVU cameras do in basketball. (See the videos in the above link for examples.) There’s still no sense of whether or not it will be made public (and in what form it might be made public), but people were uniformly excited about it.

The projections folk were united in the belief that it would have a huge effect on projecting defense, to the point where MGL thinks defense will go from being the hardest component of the sport to analyze and predict to the easiest. There was a bit more divergence about what it might do for pitching and batting analysis, as well as about when it would come out—one speaker quoted MLB and said it would be ready to go by the beginning of next year, whereas Dave Cameron pointed out that test data hadn’t been released to the teams yet despite what was originally promised and thus thought it was highly unlikely that the data would be ready for teams by next year, much less ready for public consumption.

Dan Brooks jokingly introduced a hitting metric he called “GIP,” for Google Images Performance. It was prompted by the fact that a Google search for “miguel cabrera hitting” or “david ortiz hitting” finds pictures of them hitting home runs, whereas a query for “jose molina hitting” gets mostly pictures of him behind the plate.

More notes from Tippett’s talk:

  • He started by mentioning that the hardest part of his job is deciding when a player’s underperformance is real and not just noise; I don’t think I’ve ever seen a rigorous evaluation of how to do that using Bayes’ rule and the costs of Type I and Type II errors (booting a good player and keeping a bad one, respectively), but I would love to read one. (How projection systems react to new performance data is in principle just Bayesian reasoning regardless.)
  • He non-snarkily talked about the “momentum” that the Red Sox had, and nobody asked about it. All that probably means is that when he’s talking outside the office he’s perfectly willing to let a bit of narrative creep in.
  • He talked about their decision to cut Grady Sizemore and how it was related to certain incentives in his contract that would have vested. I casually wonder how cavalier teams are allowed to be in explicitly making decisions based on players’ contract incentives, given that there have been talks about how players might opt to  file grievances about these things in the past. (The one that comes to mind is Brett Myers, who would have had an option vest if he finished a certain number of games for the White Sox a couple years back. The commentary I read suggested that conspicuously changing his usage pattern would have been grounds for a grievance, but I don’t know how that applies to situations like Sizemore’s.)

Do High Sock Players Get “Hosed” by the Umpires?

I was reading one of Baseball Prospectus’s collections this morning and came across an interesting story. It’s a part of baseball lore that Willie Mays started his career on a brutal cold streak (though one punctuated by a long home run off Warren Spahn). Apparently, manager Leo Durocher told Mays toward the end of the slump that he needed to pull his pants up because the pant knees were below Mays’s actual knees, which was costing him strikes. Mays got two hits the day after the change and never looked back.

To me, this is a pretty great story and (to the extent it’s true) a nice example of the attention to detail that experienced athletes and managers are capable of. However, it prompted another question: do uniform details actually affect the way that umpires call the game?

Assessing where a player belts his pants is hard, however, so at this point I’ll have to leave that question on the shelf. What is slightly easier is looking at which hitters wear their socks high and which cover their socks with their baseball pants. The idea is that by clearly delineating the strike zone, the batter will get fairer calls on balls near the bottom of the strike zone than he might otherwise. This isn’t a novel idea—besides the similarity to what Durocher said, it’s also been suggested herehere, and in the comments here—but I wasn’t able to find any studies looking at this. (Two minor league teams in the 1950s did try this with their whole uniforms instead of just the socks, however. The experiments appear to have been short-lived.)

There are basically two ways of looking at the hypothesis: the first is that it will be a straightforward benefit/detriment to the player to hike his socks because the umpire will change his definition of the bottom of the zone; this is what most of the links I cited above would suggest, though they didn’t agree on which direction. I’m somewhat skeptical of this, unless we think that the umpires have a persistent bias for or against certain players and that that bias would be resolved by the player changing how he wears his socks. The second interpretation is that it will make the umpire’s calls more precise, meaning simply that borderline pitches are called more consistently, but that it won’t actually affect where the umpire thinks the bottom of the zone is.

At first blush, this seems like the sort of thing that Pitch F/X would be perfectly suited to, as it gives oodles of information about nearly every pitch thrown in the majors in the last several years. However, it doesn’t include a variable for the hosiery of the batter, so to do a broader study we need additional data. After doing some research and asking around, I wasn’t able to find a good database of players that consistently wear high socks, much less a game-by-game list, which basically ruled out a large-scale Pitch F/X study.

However, I got a very useful suggestion from Paul Lukas, who runs the excellent Uni Watch site. He pointed out that a number of organizations require their minor leaguers to wear high socks and only give the option of covered hose to the major leaguers, providing a natural means of comparison between the two types of players. This will allow us to very broadly test the hypothesis that there is a single direction change in how low strikes are called.

I say very broadly because minor league Pitch F/X data aren’t publicly available, so we’re left with extremely aggregate data. I used data from Minor League Central, which has called strikes and balls for each batter. In theory, if the socks lead to more or fewer calls for the batter at the bottom of the zone, that will show up in the aggregate data and the four high-socked teams (Omaha, Durham, Indianapolis, and Scranton/Wilkes-Barre) will have a different percentage of pitches taken go for strikes. (I found those teams by looking at a sample of clips from the 2013 season; their AA affiliates also require high socks.)  Now, there are a lot of things that could be confounding factors in this analysis:

  1. Players on other teams are allowed to wear their socks high, so this isn’t a straight high socks/no high socks comparison, but rather an all high socks/some high socks comparison. (There’s also a very limited amount of non-compliance on the all socks side, as based on the clips I could find it appears that major leaguers on rehab aren’t bound by the same rules; look at some Derek Jeter highlights with Scranton if you’re curious.)
  2. AAA umpires are prone to more or different errors than major league umpires.
  3. Which pitches are taken is a function of the team makeup and these teams might take more or fewer balls for reasons unrelated to their hose.
  4. This only affects borderline low pitches, and so it will only make up a small fraction of the overall numbers we observe and the impact will be smothered.

I’m inclined to downplay the first and last issues, because if those are enough to suppress the entire difference over the course of a whole season then the practical significance of the change is pretty small. (Furthermore, for #1, from my research it didn’t look like there were many teams with a substantial number of optional socks-showers. Please take that with a grain of salt.)

I don’t really have anything to say about the second point, because it has to do with extrapolation, and for now I’d be fine just looking at AAA. I don’t have even have that level of brushoff response for the third point except to wave my hands and say that I hope it doesn’t matter given that these reflect pitches thrown by the rest of the league, so they will hopefully converge around league average.

So, having substantially caveated my results…what are they? As it turns out, the percentage of pitches the stylish high sock teams took that went for strikes was 30.83% and the equivalent figure for the sartorially challenged was…30.83%. With more than 300,000 pitches thrown in AAA last year, you need to go to the seventh decimal place of the fraction to see a difference. (If this near equality seems off to you, it does to me as well. I checked my figures a couple of ways, but I (obviously) can’t rule out an error here.)

What this says to me is that it’s pretty unlikely that this ends up mattering, unless there is an effect and it’s exactly cancelled out by the confounding factors listed above (or others I failed to consider). That can’t be ruled out as a possibility, nor can data quality issues, but I’m comfortable saying that the likeliest possibility by a decent margin is that socks don’t lead to more or fewer strikes being called against the batter. (Regardless, I’m open to suggestions for why the effect might be suppressed or analysis based on more granular data I either don’t have access to or couldn’t find.)

What about the accuracy question, i.e. is the bottom of the strike zone called more consistently or correctly for higher-socked players? Due to the lack of nicely collected data, I couldn’t take a broad approach to answering this, but I do want to record an attempt I made regardless. David Wright is known for wearing high socks in day games but covering his hosiery at night, which gives us a natural experiment we can look at for results.

I spent some amount of time looking at the 2013 Pitch F/X data for his day/night splits on taken low pitches and comparing those to the same splits for the Mets as a whole, trying a few different logistic regression models as well as just looking at the contingency tables to see if anything jumped out, and nothing really did in terms of either greater accuracy or precision. I didn’t find any cuts of the data that yielded a sufficiently clean comparison or sample size that I was confident in the results. Since this is a messy use of these data in the first place (it relies on unreliable estimates of the lower edge of a given batter’s strike zone, for instance), I’m going to characterize the analysis as incomplete for now. Given a more rigorous list of which players wear high socks and when, though, I’d love to redo this with more data.

Overall, though, there isn’t any clear evidence that the socks do influence the strike zone. I will say, though, that this seems like something that a curious team could test by randomly having players (presumably on their minor league teams) wear the socks high and doing this analysis with cleaner data. It might be so silly as to not be worth a shot, but if this is something that can affect the strike zone at all then it could be worthwhile to implement in the long run—if it can partially negate pitch framing, for instance, then that could be quite a big deal.

Throne of Games (Most Played, Specifically)

I was trawling for some stats on hockey-reference (whence most of the hockey facts in this post) the other day and ran into something unexpected: Bill Guerin’s 2000-01 season. Specifically, Guerin led the league with 85 games played. Which wouldn’t have seemed so odd, except for the fact that the season is 82 games long.

How to explain this? It turns out there are two unusual things happening here. Perhaps obviously, Guerin was traded midseason, and the receiving team had games in hand on the trading team. Thus, Guerin finished with three games more than the “max” possible.

Now, is this the most anyone’s racked up? Like all good questions, the answer to that is “it depends.” Two players—Bob Kudelski in 93-94 and Jimmy Carson in 92-93—played 86 games, but those were during the short span of the 1990s when each team played 84 games in a season, so while they played more games than Guerin, Guerin played in more games relative to his team. (A couple of other players have played 84 since the switch to 82 games, among them everyone’s favorite Vogue intern, Sean Avery.)

What about going back farther? The season was 80 games from 1974–75 to 1991–92, and one player in that time managed to rack up 83: the unknown-to-me Brad Marsh, in 1981-82, who tops Guerin at least on a percentage level. Going back to the 76- and 78-game era from 1968-74, we find someone else who tops Guerin and Marsh, specifically Ross Lonsberry, who racked up 82 games (4 over the team maximum) with the Kings and Flyers in 1971–72. (Note that Lonsberry and Marsh don’t have game logs listed at hockey-reference, so I can’t verify if there was any particularly funny business going on.) I couldn’t find anybody who did that during the 70 game seasons of the Original Six era, and given how silly this investigation is to begin with, I’m content to leave it at that.

What if we go to other sports? This would be tricky in football, and I expect it would require being traded on a bye week. Indeed, nobody has played more than the max games at least since the league went to a 14 game schedule according to the results at pro-football-reference.

In baseball, it certainly seems possible to get over the max, but actually clearing this out of the data is tricky for the following two reasons:

  • Tiebreaker games are counted as regular season games. Maury Wills holds the raw record for most games played with 165 after playing in a three game playoff for the Dodgers in 1962.
  • Ties that were replayed. I started running into this a lot in some of the older data: games would be called after a certain number of innings with the score tied due to darkness or rain or some unexplained reason, and the stats would be counted, but the game wouldn’t count in the standings. Baseball is weird like that, and no matter how frustrating this can be as a researcher, it was one of the things that attracted me to the sport in the first place.

So, those are my excuses if you find any errors in what I’m about to present; I used FanGraphs and baseball-reference to spot candidates. I believe there’s only been a few cases of baseball players playing more than the scheduled number of games when none of the games fell into those two problem categories mentioned above. The most recent is Todd Zeile, who, while he didn’t play in a tied game, nevertheless benefited from one. In 1996, he was traded from the Phillies to the Orioles after the O’s had stumbled into a tie, thus giving him 163 games played, though they all counted.

Possibly more impressive is Willie Montanez, who played with the Giants and Braves in 1976. He racked up 163 games with no ties, but arguably more impressive is that, unlike Zeile, Montanez missed several opportunities to take it even farther. He missed one game before being traded, then one game during the trade, and then two games after he was traded. (He was only able to make it to 1963 because the Braves had several games in hand on the Giants at the time of the trade.)

The only other player to achieve this feat in the 162 game era is Frank Taveras, who in 1979 played in 164 games; however, one of those was a tie, meaning that according to my twisted system he only gets credit for 163. He, like Montanez, missed an opportunity, as he had one game off after getting traded.

Those are the only three in the 162-game era. While I don’t want to bother looking in-depth at every year of the 154-game era due to the volume of cases to filter, one particular player stands out. Ralph Kiner managed to put up 158 games with only one tie in 1953, making him by my count the only baseball player to play three meaningful games more than his team did in baseball since 1901.

Now, I’ve sort of buried the lede here, because it turns out that the NBA has the real winners in this category. This isn’t surprising, as the greater number of days off between games means it’s easier for teams to get out of whack and it’s more likely than one player will play in every game. Thus, a whole host of players have played more than 82 games, led by Walt Bellamy, who put up 88 in 1968-69. While one player got to 87 since, and a few more to 86 and 85, Bellamy stands alone atop the leaderboard in this particular category. (That fact made it into at least one of his obituaries.)

Since Bellamy is the only person I’ve run across to get 6 extra games in a season and nobody from any of the other sports managed even 5, I’m inclined to say that he’s the modern, cross-sport holder of this nearly meaningless record for most games played adjusted for season length.

Ending on a tangent: one of the things I like about sports records in general, and the sillier ones in particular, is trying to figure out when they are likely to fall. For instance, Cy Young won 511 games playing a sport so different from contemporary baseball that, barring a massive structural change, nobody can come within 100 games of that record. On the other hand, with strikeouts and tolerance for strikeouts at an all-time high, several hitter-side strikeout records are in serious danger (and have been broken repeatedly over the last 15 years).

This one seems a little harder to predict, because there are factors pointed in different directions. On the one hand, players are theoretically in better shape than ever, meaning that they are more likely to be able to make it through the season, and being able to play every game is a basic prerequisite for playing more than every game. On the other, the sports are a lot more organized, which would intuitively seem to decrease the ease of moving to a team with meaningful games in hand on one’s prior employer. Anecdotally, I would also guess that teams are less likely to let players play through a minor injury (hurting the chances). The real wild card is the frequency of in-season trades—I honestly have no rigorous idea of which direction that’s trending.

So, do I think someone can take Bellamy’s throne? I think it’s unlikely, due to the organizational factors laid out above, but I’ll still hold out hope that someone can do it—or at least, finding new players to join the bizarre fraternity of men playing more games than their teams.

The Joy of the Internet

One of the things I love about the Internet is that you can use the vast amounts of information to research really minor trivia from pop culture and sports. In particular, there’s something I find charming about the ability to identify exact sporting (or other) moments from various works of fiction—for instance, Ice Cube’s good day and the game Ferris Bueller attended.

I bring this up because I finally started watching The Wire (it’s real good, you should watch it too) and, in a scene from the Season 3 premiere, McNulty and Bunk go to a baseball game with their sons. This would’ve piqued my interest regardless, because it’s baseball and because it’s Camden Yards, but it’s also a White Sox game, and since the episode came out a year before the White Sox won the series, it features some players that I have fond memories of.

So, what game is it? As it turns out, we only need information about the players shown onscreen to make this determination. For starters, Carlos Lee bats for the Sox:

Carlos Lee

This means the game can’t take place any later than 2004, as Lee was traded after the season. (Somewhat obvious, given that the episode was released in 2004, but hey, I’m trying to do this from in-universe clues only.) Who is that who’s about to go after the pop up?

Javy Lopez

Pretty clearly Javy Lopez:

Lopez Actual

Lopez didn’t play for the O’s until 2004, so we have a year locked down. Now, who threw the pitch?

Sidney Ponson

Sidney Ponson, everyone’s favorite overweight Aruban pitcher! Ponson only pitched in one O’s-Sox game at Camden Yards in 2004, so that’s our winner: May 5, 2004. A White Sox winner, with Juan Uribe having a big triple, Billy Koch almost blowing the save, and Shingo Takatsu—Mr. Zero!—getting the W.

One quick last note—a quick Google reveals that I’m far from the first person to identify this scene and post about it online, but I figured it’d be good for a light post and hey, I looked it up myself before I did any Googling.

Don’t Wanna Be a Player No More…But An Umpire?

In my post about very long 1-0 games, I described one game that Retrosheet mistakenly lists as much longer than it actually was–a 1949 tilt between the Phillies and Cubbies. Combing through Retrosheet initially, I noticed that Lon Warneke was one of the umpires. Warneke’s name might ring a bell to baseball history buffs as he was one of the star pitchers on the pennant winning Cubs team of 1935, but I had totally forgotten that he was also an umpire after his playing career was up.

I was curious about how many other players had later served as umps, which led me to this page from Baseball Almanac listing all such players. As it turns out, one of the other umpires in the game discussed above was Jocko Conlan, who also had a playing career (though not nearly as distinguished as Warneke’s). This raises the question: how many games in major league history have had at least two former players serve as umpires?

The answer is 6,953–at least, that’s how many are listed in Retrosheet. (For reference, there have been ~205,000 games in major league history.) That number includes 96 postseason games as well. Most of those are pretty clustered, for the simple reason that umpires will ump most of their games in a given season with the same crew, so there won’t be any sort of uniformity.

The last time this happened was 1974, when all five games of the World Series had Bill Kunkel and Tom Gorman as two of the men in blue. (This is perhaps more impressive given that those two were the only player umps active at the time, and indeed the last two active period–Gorman retired in 1976, Kunkel in 1984.) The last regular season games with two player/umps were a four game set between the Astros and Cubs in August 1969, with Gorman and Frank Secory the umps this time.

So, two umpires who were players is not especially uncommon–what about more than that? Unfortunately, there are no games with four umpires that played, though four umpires in a regular season game didn’t become standard until the 1950s, and there were never more than 5-7 umps active at a time after that who’d been major league players. There have, however, been 102 games in which three umpires had played together–88 regular season and 14 postseason (coincidentally, the 1926 and 1964 World Series, both seven game affairs in which the Cardinals beat the Yankees).

That 1964 World Series was the last time 3 player/umps took the field at once, but that one deserves an asterisk, as there are 6 umps on the field for World Series games. The last regular season games of this sort were a two game set in 1959 and a few more in 1958. Those, however, were all four ump games, which is a little less enjoyable than a game in which all of the umps are former players.

That only happened 53 times in total (about 0.02% of all MLB games ever), last in October 1943 during the war. There’s not good information available about attendance in those years, but I have to imagine that the 1368 people at the October 2, 1943 game between the A’s and Indians didn’t have any inkling they were seeing this for the penultimate time ever.

Two more pieces of trivia about players-turned-umpires: only two of them have made the Hall of Fame–Jocko Conlan as an umpire (he only played one season), and Ed Walsh as a player (he only umped one season).

Finally, this is not so much a piece of trivia as it is a link to a man who owns the trivia category. Charlie Berry was a player and an ump, but was also an NFL player and referee who eventually worked the famous overtime 1958 NFL Championship game–just a few months after working the 1958 World Series. They don’t make ’em like that anymore, do they?

Notes on Long Games, Part I

Game 1 of the ALCS was one of those games that make baseball (and all sports, really) so great. It was an immensely important game, a near no-hitter (which would have been the first combined no-hitter in postseason history), and a 1-0 game, keeping the tension up for all nine innings. There’s something I’ve always found charming and pure about 1-0 games (whether in soccer, baseball, or hockey); they tend toward the intense, fluid, and (usually) quick.

Game 1, however, was anything but quick, lasting four minutes shy of four hours. It’s a nationally televised game, the Red Sox have a rep for playing slowly, and there were a hefty number of pitching changes. Still, it’s a ridiculous length of time for a 1-0 game, especially a one hitter.

As it turns out, that was the longest 9 inning 1-0 game on record by a margin of 36 minutes, or 15%, which is an astonishingly large leap. (Retrosheet has confirmed this.) I was curious about the prior record holder, so I did some digging, the results of which below. (All info comes from Retrosheet or Baseball Reference, more about which at the bottom of the post.)

There’s now a tie for #2 on the list; one of those games is a 3:20 1997 game between the Brewers and A’s. It’s a bit easier to see why this game lasted so long: there were a combined 341 pitches thrown, 19 more than 2013 ALCS Game 1. (14 walks were issued and 22 runners were left on base, so I imagine it was a pretty ugly game.) A writeup for the game says it was the longest 9 inning 1-0 game in history.

The other 3:20 game? There were only 270 pitches thrown, but it was in the postseason, so that probably accounts for some of it. Either way, it’s maybe my favorite game I’ve ever watched. Yeah, it’s Game 4 of the 2005 World Series. Unsurprisingly, the record was not the lede in any of the recaps I read. (One cause of the length might be things like Carl Everett’s taking about 75 seconds from the time of the previous out to see his first and only pitch (see 1:57:48 of the video). Guess he was moving like a dinosaur.)

One further note: Retrosheet actually lists two 1-0, 9 inning games as having lengths longer than 3:20 before this week. The first was the game between the Phillies and Cubs on July 19, 1949 listed at 3:34. The game looked otherwise entirely ordinary, and in fact, digging through the NYT archives finds a time of game of 1:54. If I had to guess, 1:54 became 5:14 became 214 minutes. The other game—the second of a doubleheader between the Phillies and Brooklyn Robins in 1917, also has the wrong time listed per the NYT archive (note the amusing old-timey recap in the latter link). Here it appears 2:06 became 206. I’ve reached out to Retrosheet and will hopefully have those corrected soon.

Check back in the near future for more on baseball game length.