Leaf-ed Behind by Analytics

As you may have heard, there’s been a whole hullabaloo recently in the hockey world about the Toronto Maple Leafs. Specifically, they had a good run last year and early this season that the more numerically inclined NHL observers attributed to an unsustainably high shooting percentage covering up very weak possession metrics. Accordingly, the stats folk predicted substantial regression, which was met with derision by many Leafs fans and most of the team’s brass. The Leafs have played very poorly since that hot streak and have been eliminated from the playoffs; just a few weeks back, they had an 84% chance of making it. (See Sports Club Standings for the fancy chart.)

Unsurprisingly, this has led to much saying of “I told you so” by the stats folk and a lot of grumbling about the many flaws of the current Leafs administration. Deadspin has a great write-up of the whole situation, but one part in particular stood out. This is a quotation from the Leafs’ general manager, Dave Nonis:

“We’re constantly trying to find solid uses for [our analytics budget],” Nonis said. “The last six, seven years, we’ve had a significant dollar amount in our budget for analytics and most of those years we didn’t use it. We couldn’t find a system or a group we felt we could rely on to help us make reasonable decisions.”

[…]

“People run with these stats like they’re something we should pay attention to and make decisions on, and as of right now, very few of them are worth anything to us,” he said at one point during the panel, blaming media and fans for overhyping the analytics currently available.

This represents a mind-boggling lack of imagination on their part. Let’s say they honestly don’t think there’s a good system currently out there that could help them—that’s entirely irrelevant. They should drop the cash and try to build a system from scratch if they don’t like what’s out there.

There are four factors that determine how good the analysis of a given problem is going to be: 1) the analysts’ knowledge of the problem, 2) their knowledge of the tools needed to solve the problem (basically, stats and critical thinking), 3) the availability of the data, and 4) the amount of time the analysts have to work on the problem. People who know about both hockey and data are available in spades; I imagine you can find a few people in every university statistics department and financial firm in Canada that could rise to the task, to name only two places these people might cluster. (They might not know about hockey stats, but the “advanced” hockey stats aren’t terribly complex, so I have faith that anyone who knows both stats and hockey can figure out their metrics.)

For #3: the data aren’t great for hockey, but they exist and will get better with a minimal investment in infrastructure. Analysts’ having sufficient time is the most important factor in progress, though, and the hardest one to substitute; conveniently, time is an easy thing for the team to buy (via salary, which they even get a discount on because of the non-monetary benefits of working in hockey). If they post some jobs at a decent salary, they basically have their pick of statistically-oriented hockey fans. If a team gets a couple of smart people and has them working 40-60 hours a week thinking about hockey and bouncing ideas off of each other, they’re going to get some worthwhile stuff no matter what.

Let’s say that budget is $200,000 per year, or a fraction of the minimum player salary. At that level, one good idea from the wonks and they’ve paid for themselves many times over. Even if they don’t find a grand unified theory of hockey, they can help with more discrete analyses and provide a slightly different perspective on decisions, and they’re so low cost that it’s hard to see how they’d hurt a team. (After all, if the team thinks the new ideas are garbage it can ignore them—it’s what was happening in the first place, so no harm done.) The only way Toronto’s decision makes sense is if they think that analytics not only are currently useless but can’t become useful in the next decade or so, and it’s hard to believe that anyone really thinks that way. (The alternative is that they’re scared that the analysts would con the current brass into a faulty decision, but given their skepticism that seems like an unlikely problem.)

Is this perspective a bit self-serving? Yeah, to the extent that I like sports and data and I’d like to work for a team eventually. Regardless, it seems to me that the only ways to justify the Leafs’ attitude are penny-pinching and the belief that non-traditional stats are useless, and if either of those is the case, something has gone very wrong in Toronto.


Do High Sock Players Get “Hosed” by the Umpires?

I was reading one of Baseball Prospectus’s collections this morning and came across an interesting story. It’s a part of baseball lore that Willie Mays started his career on a brutal cold streak (though one punctuated by a long home run off Warren Spahn). Apparently, manager Leo Durocher told Mays toward the end of the slump that he needed to pull his pants up because the pant knees were below Mays’s actual knees, which was costing him strikes. Mays got two hits the day after the change and never looked back.

To me, this is a pretty great story and (to the extent it’s true) a nice example of the attention to detail that experienced athletes and managers are capable of. However, it prompted another question: do uniform details actually affect the way that umpires call the game?

Assessing where a player belts his pants is hard, however, so at this point I’ll have to leave that question on the shelf. What is slightly easier is looking at which hitters wear their socks high and which cover their socks with their baseball pants. The idea is that by clearly delineating the bottom of the strike zone, the batter will get fairer calls on pitches near the bottom of the zone than he might otherwise. This isn’t a novel idea—besides the similarity to what Durocher said, it’s also been suggested here, here, and in the comments here—but I wasn’t able to find any studies looking at this. (Two minor league teams in the 1950s did try this with their whole uniforms instead of just the socks, however. The experiments appear to have been short-lived.)

There are basically two ways of looking at the hypothesis: the first is that it will be a straightforward benefit/detriment to the player to hike his socks because the umpire will change his definition of the bottom of the zone; this is what most of the links I cited above would suggest, though they didn’t agree on which direction. I’m somewhat skeptical of this, unless we think that the umpires have a persistent bias for or against certain players and that that bias would be resolved by the player changing how he wears his socks. The second interpretation is that it will make the umpire’s calls more precise, meaning simply that borderline pitches are called more consistently, but that it won’t actually affect where the umpire thinks the bottom of the zone is.

At first blush, this seems like the sort of thing that Pitch F/X would be perfectly suited to, as it gives oodles of information about nearly every pitch thrown in the majors in the last several years. However, it doesn’t include a variable for the hosiery of the batter, so to do a broader study we need additional data. After doing some research and asking around, I wasn’t able to find a good database of players that consistently wear high socks, much less a game-by-game list, which basically ruled out a large-scale Pitch F/X study.

However, I got a very useful suggestion from Paul Lukas, who runs the excellent Uni Watch site. He pointed out that a number of organizations require their minor leaguers to wear high socks and only give the option of covered hose to the major leaguers, providing a natural means of comparison between the two types of players. This will allow us to very broadly test the hypothesis that there is a single direction change in how low strikes are called.

I say very broadly because minor league Pitch F/X data aren’t publicly available, so we’re left with extremely aggregate data. I used data from Minor League Central, which has called strikes and balls for each batter. In theory, if the socks lead to more or fewer calls for the batter at the bottom of the zone, that will show up in the aggregate data, and the four high-socked teams (Omaha, Durham, Indianapolis, and Scranton/Wilkes-Barre) will have a different percentage of their taken pitches called strikes. (I found those teams by looking at a sample of clips from the 2013 season; their AA affiliates also require high socks.) Now, there are a lot of things that could be confounding factors in this analysis:

  1. Players on other teams are allowed to wear their socks high, so this isn’t a straight high socks/no high socks comparison, but rather an all high socks/some high socks comparison. (There’s also a very limited amount of non-compliance on the all socks side, as based on the clips I could find it appears that major leaguers on rehab aren’t bound by the same rules; look at some Derek Jeter highlights with Scranton if you’re curious.)
  2. AAA umpires are prone to more or different errors than major league umpires.
  3. Which pitches are taken is a function of the team makeup and these teams might take more or fewer balls for reasons unrelated to their hose.
  4. This only affects borderline low pitches, and so it will only make up a small fraction of the overall numbers we observe and the impact will be smothered.

I’m inclined to downplay the first and last issues, because if those are enough to suppress the entire difference over the course of a whole season then the practical significance of the change is pretty small. (Furthermore, for #1, from my research it didn’t look like there were many teams with a substantial number of optional socks-showers. Please take that with a grain of salt.)

I don’t really have anything to say about the second point, because it has to do with extrapolation, and for now I’d be fine just looking at AAA. I don’t even have that level of brushoff response for the third point, except to wave my hands and say that I hope it doesn’t matter: these figures reflect pitches thrown by the rest of the league, so they will hopefully converge around league average.

So, having substantially caveated my results…what are they? As it turns out, the percentage of pitches the stylish high sock teams took that went for strikes was 30.83% and the equivalent figure for the sartorially challenged was…30.83%. With more than 300,000 pitches thrown in AAA last year, you need to go to the seventh decimal place of the fraction to see a difference. (If this near equality seems off to you, it does to me as well. I checked my figures a couple of ways, but I (obviously) can’t rule out an error here.)
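For anyone who wants to replicate that comparison, here’s a minimal sketch of the aggregation, assuming a per-batter CSV pulled from Minor League Central; the file name and column names are hypothetical stand-ins for whatever the actual export looks like.

```python
import pandas as pd

# Hypothetical per-batter AAA export from Minor League Central; "team",
# "called_strikes", and "balls" are assumed column names, not the site's schema.
batters = pd.read_csv("aaa_batters_2013.csv")

HIGH_SOCK_TEAMS = {"Omaha", "Durham", "Indianapolis", "Scranton/Wilkes-Barre"}
batters["high_socks"] = batters["team"].isin(HIGH_SOCK_TEAMS)

# Share of taken pitches called strikes, aggregated over each group of teams
taken = batters.groupby("high_socks")[["called_strikes", "balls"]].sum()
strike_rate = taken["called_strikes"] / (taken["called_strikes"] + taken["balls"])
print(strike_rate.round(4))
```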

What this says to me is that it’s pretty unlikely that this ends up mattering, unless there is an effect and it’s exactly cancelled out by the confounding factors listed above (or others I failed to consider). That can’t be ruled out as a possibility, nor can data quality issues, but I’m comfortable saying that the likeliest possibility by a decent margin is that socks don’t lead to more or fewer strikes being called against the batter. (Regardless, I’m open to suggestions for why the effect might be suppressed or analysis based on more granular data I either don’t have access to or couldn’t find.)

What about the accuracy question, i.e. is the bottom of the strike zone called more consistently or correctly for higher-socked players? Due to the lack of nicely collected data, I couldn’t take a broad approach to answering this, but I do want to record an attempt I made regardless. David Wright is known for wearing high socks in day games but covering his hosiery at night, which gives us a natural experiment we can look at for results.

I spent some time looking at the 2013 Pitch F/X data for his day/night splits on taken low pitches and comparing those to the same splits for the Mets as a whole, trying a few different logistic regression models as well as just looking at the contingency tables. Nothing really jumped out in terms of either greater accuracy or precision, and I didn’t find any cut of the data that yielded a sufficiently clean comparison or sample size for me to be confident in the results. Since this is a messy use of these data in the first place (it relies on unreliable estimates of the lower edge of a given batter’s strike zone, for instance), I’m going to characterize the analysis as incomplete for now. Given a more rigorous list of which players wear high socks and when, though, I’d love to redo this with more data.
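For reference, the rough shape of one of those models looked something like the sketch below. The data frame and column names are hypothetical placeholders; the actual Pitch F/X fields would need to be assembled and filtered to taken low pitches first.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical frame of taken low pitches to David Wright in 2013, one row per
# pitch; "called_strike" (0/1), "day_game" (0/1), and "inches_below_zone"
# (how far below his rulebook zone bottom the pitch crossed) are assumed names.
pitches = pd.read_csv("wright_taken_low_pitches_2013.csv")

# Does the day-game (high-socks) indicator shift the called-strike probability,
# controlling for how far below the zone the pitch was?
model = smf.logit("called_strike ~ inches_below_zone + day_game", data=pitches).fit()
print(model.summary())
```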

Overall, then, there isn’t any clear evidence that the socks influence the strike zone. I will say, though, that this seems like something a curious team could test by randomly having players (presumably on their minor league teams) wear the socks high and running this analysis with cleaner data. It might be so silly as to not be worth a shot, but if this is something that can affect the strike zone at all, it could be worthwhile to implement in the long run—if it can partially negate pitch framing, for instance, that could be quite a big deal.

Adrian Nieto’s Unusual Day

White Sox backup catcher Adrian Nieto has done some unusual things in the last few days. To start with, he made the team. That doesn’t sound like much, but as a Rule 5 draft pick, it’s a bit more meaningful than it might be otherwise, and it’s somewhat unusual because he was jumping from A ball to the majors as a catcher. (Sox GM Rick Hahn said he didn’t know of anyone who’d done it in the last 5+ years.)

Secondly, he pinch ran today against the Twins, which is an activity not usually associated with catchers (even young ones). This probably says more about the Sox bench, as he pinch ran for Paul Konerko, who is the worst baserunner by BsR among big league regulars this decade by a hefty margin. Still: a catcher pinch running! How often does this happen?

More frequently than I thought, as it turns out; there were 1530 instances of a catcher pinch running from 1974 to 2013, or roughly 38 times a year. This is about 4% of all pinch running appearances over that time, so it’s not super common, but it’s not unheard of either. (My source for this is the Lahman database, which is why I have the date cutoff. For transparency’s sake, I called a player a catcher if he played catcher in at least half of his appearances in a given year.)
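For the curious, that count came from a query roughly like the sketch below. The Lahman column names (G_all, G_c, G_pr) match the versions of the Appearances table I’ve seen, but treat them as assumptions.

```python
import pandas as pd

# Lahman "Appearances" table; G_all, G_c, and G_pr are games played, games at
# catcher, and games as a pinch runner (column names assumed).
apps = pd.read_csv("Appearances.csv")
apps = apps[(apps["yearID"] >= 1974) & (apps["yearID"] <= 2013)]

# My definition: a "catcher" caught in at least half of his appearances that year
catchers = apps[apps["G_c"] >= 0.5 * apps["G_all"]]

total_pr = catchers["G_pr"].sum()      # catcher pinch-running appearances
print(total_pr, round(total_pr / 40))  # raw count and rough per-season average
```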

If you connect the dots, though, you’ll realize that Nieto is a catcher who made his major league debut as a pinch runner. How often does that happen? As it turns out, just five times previously since 1974 (cross-referencing Retrosheet with Lahman):

  • John Wathan, Royals; May 26, 1976. Wathan entered for pinch hitter Tony Solaita, who had pinch hit for starter Bob Stinson. He came around to score on two hits (though he failed to make it home from third after a flyball to right), but he also grounded into a double play with the bases loaded in the 9th. The Royals lost in extra innings, but he lasted 10 years with them, racking up 5 rWAR.
  • Juan Espino, Yankees; June 25, 1982. Espino pinch ran for starter Butch Wynegar with the Yankees up 11-3 in the 7th and was forced at second immediately. He racked up -0.4 rWAR in 49 games spread across four seasons, all with the Yanks.
  • Doug Davis, Angels; July 8, 1988. This one’s sort of cheating, as Davis entered for third baseman Jack Howell after a hit by pitch and stayed in the game at the hot corner; he scored that time around, then made two outs later in the game. According to the criteria I laid out earlier, though, he counts, as three of the six games he played that year were at catcher (four of seven lifetime).
  • Gregg Zaun, Orioles; June 24, 1995. Zaun entered for starter Chris Hoiles with the O’s down 3-2 in the 7th. He moved to second on a groundout, then third on a groundout, then scored the tying run on a Brady Anderson home run. Zaun had a successful career as a journeyman, playing for 9 teams in 16 years and averaging less than 1 rWAR per year.
  • Andy Stewart, Royals; September 6, 1997. He ran for starter Mike Macfarlane in the 8th and was immediately wiped out on a double play. Stewart only played 5 games in the bigs lifetime.

So, just by scoring a run, Nieto didn’t necessarily have a more successful debut than this cohort. However, as a Sox fan I’m hoping (perhaps unreasonably) that he has a bit better career than Davis, Stewart, and Espino–and hey, if he’s a good backup for 10 or more years, that’s just gravy.

One of my favorite things about baseball is the number of quirky things like this that happen, and while this one wasn’t unique, it was pretty close. When you have low expectations for a team (like this year’s White Sox), you just hope the history they make isn’t too embarrassing.

Brackets, Preferences, and the Limits of Data

As you may have heard, it’s March Madness time. If I had to guess, I’d wager that more people make specific, empirically testable predictions this week than any other week of the year. They may be derived without regard to the quality of the teams (the mascot bracket, e.g.), or they might be fairly advanced projections based on as much relevant data as are easily available (Nate Silver’s bracket, for one), but either way we’re talking about probably billions of predictions. (At 63 picks per bracket, we “only” need about 16 million brackets to get to a billion picks, and that doesn’t count all the gambling.)

What compels people to do all of this? Some people do it to win money; if you’re in a small pool, it’s actually feasible that you could win a little scratch. Other people do it because it’s part of their job (Nate Silver, again), or because there might be additional extrinsic benefits (I’d throw the President in that category). This is really a trick question, though: people do it to have fun. More precisely, and to borrow the language of introductory economics, they maximize utility.

The intuitive definition of utility can be viewed as pretty circular (it both explains and is defined by people’s decisions), but it’s useful as a way of encapsulating the notion that people do things for reasons that can’t really be quantified. The notion of unquantifiability, especially unquantifiable preferences, is something people sometimes overlook when discussing the best uses of data. Yelp can tell you which restaurant has the best ratings, but if you hate the food the rating doesn’t do you much good.*

One of the things I don’t like about the proliferation of places letting you simulate the bracket and encouraging you to use that analysis is that it disregards utility. They presume that your interests are either to get the most games correct or (for some of the more sophisticated ones) to win your pool. What that’s missing is that some of us have strongly ingrained preferences that dictate our utility, and that that’s okay. My ideal, when selecting a bracket, is to make it so I have as high a probability as possible of rooting for the winner of a game.

For instance, I don’t think I’ve picked Duke to make it past the Sweet Sixteen in the last 10 or more years. If they get upset before then, my joy in seeing them lose well outweighs the damage to my bracket, especially since most people will have them advancing farther than I do. On the other hand, if I pick them to lose in the first round**, it will just make the sting worse when they win. I’m hedging my emotions, pure and simple.***

This is an extreme example of my rule of thumb when picking teams that I have strong preferences for, which is to have teams I really like/dislike go one round more/less than I would predict to be likely. This reduces the probability that my heart will be abandoned by my bracket. As a pretty passive NCAA fan, I don’t apply this to too many teams besides Duke (and occasionally Illinois, where I’m from) on an annual basis, but I will happily use it with a specific player (Aaron Craft, on the negative side) or team (Wichita State, on the positive side) that is temporarily more charming or loathsome than normal. (This general approach applies to fantasy, as well: I’ve played in a half dozen or so fantasy football leagues over the years, and I’ve yet to have a Packer on my team.)

However, with the way the bracket is structured, this doesn’t necessarily torpedo your chances. Duke has a reasonable shot of doing well, and it’s not super likely that a 12th-seeded mid-major is going to make a run, but my preferred scenarios are not so unlikely that they’re not worth submitting to whichever bracket challenge I’m participating in. This extends how long my bracket stays viable enough that I still care about it, and thus increases the amount of time I enjoy watching the tournament. (At least, I tell myself that. My picks have crashed and burned in the Sweet Sixteen the last couple of years.)

Another wrinkle to this, of course, is that for games I have little or no prior preference in, simply making the pick makes me root for the team I selected. If it’s, say, Washington against Nebraska, I will happily pick the team in the bracket I think is more likely to win and then pull hard for the team. (I’m not immune to wanting my predictions to be valid.) So, the weaker my preferences are, the more I hew toward the pure prediction strategy. Is this capricious? Maybe, but so is sport in general.

I try not to be too normative in my assessments of sports fandom (though I’m skeptical of people who have multiple highly differing brackets), and if your competitive impulses overwhelm your disdain for Duke, that’s just fine. But if you’re like me, pick based on utility. By definition, it’ll be more fun.

* To be fair, my restaurant preferences aren’t unquantifiable, and the same is true for many other tastes. My point is that following everyone else’s numbers won’t necessarily yield you the best strategy for you.

** Meaning the round of 64. I’m not happy with the NCAA for making the decision that led to this footnote.

*** Incidentally, this is one reason I’m a poor poker player. I don’t enjoy playing in the optimal manner enough to actually do it. Thankfully, I recognize this well enough to not play for real stakes, which amusingly makes me play even less optimally from a winnings perspective.

Valuing Goalie Shootout Performance (Again)

I wrote this article a few months ago about goalie shootout performance and concluded two things:

  • Goalies are not interchangeable with respect to the shootout, i.e. there is skill involved in goalie performance.
  • An extra percentage point in shootout save percentage is worth about 0.002 standings points per game. This comes from some admittedly sketchy calculations using long-term NHL performance, and it’s not something I think is necessarily super accurate.

I’m bringing this up because a couple of other articles have been written about this recently: one by Tom Tango and one much longer one by Michael Lopez. One of the comments over there, from Eric T., mentioned wanting a better sense of the practical significance of the differences in skill, given that Lopez offers an estimate that the difference between the best and worst goalies is worth about 3 standings points per year.

That’s something I was trying to do in the previous post linked above, and the comment prompted me to try to redo it. I made some simple assumptions that align with the ones Lopez made in his follow-up post:

  • Each shot has equal probability of being saved (i.e. shooter quality doesn’t matter, only goalie quality). This probably reduces the volatility in my estimates, but since a goalie should end up facing a representative sample of shooters, I’m not too concerned.
  • The goalie’s team has an equal probability of converting each shot. This, again, probably reduces the variance, but it makes modelling things much simpler, and I think it makes it easier to isolate the effect that goalie performance has on shootout winning percentage.

Given these assumptions, we can compute an exact probability that one team wins given team 1’s save percentage p_1 and team 2’s p_2. If you don’t care about the math, skip ahead to the table. Let’s call P_{i,j} the probability that goalie i stops j of the first three shots he faces (which, under the assumptions above, fixes how many goals the other team scores):

P_{i,j} = {3 \choose j} p_i^j(1-p_i)^{3-j}

P(\text{Team 1 Wins } | \text{ } p_1, p_2) = \sum_{j=1}^3 \sum_{k=0}^{j-1} P_{1,j} \cdot P_{2,k} + \left ( \sum_{j=0}^3 P_{1,j}\cdot P_{2,j} \right ) \frac{p_1(1-p_2)} {1-(p_1p_2+(1-p_1)(1-p_2))}

Since each goalie faces three shots, team 1 wins the first three rounds exactly when its goalie makes more saves than team 2’s. The first term on the right side is therefore just the sum of the probabilities of the ways that team 1 can win the first three rounds, e.g. 2 goals for and 1 allowed or 3 goals for and none allowed. The second term is the sum of all the ways they can win if the first three rounds end in a tie; the sudden-death portion can be expressed easily as the sum of a geometric series, since team 1 wins a given extra round with probability p_1(1-p_2) and the round repeats when both teams score or both are stopped.
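If code is easier to read than notation, here’s a minimal sketch of the same calculation. The 0.67 league-average baseline in the example is my assumption; the exact figure behind the table below isn’t pinned down here.

```python
from math import comb

def win_prob(p1, p2):
    """Probability team 1 wins the shootout, where p1 and p2 are the two
    goalies' per-shot save percentages and shooters are assumed identical."""
    # P1[j], P2[k]: probability each goalie stops exactly j (or k) of the
    # first three shots he faces
    P1 = [comb(3, j) * p1**j * (1 - p1)**(3 - j) for j in range(4)]
    P2 = [comb(3, k) * p2**k * (1 - p2)**(3 - k) for k in range(4)]

    # Team 1 wins inside three rounds exactly when its goalie out-saves the other
    regulation = sum(P1[j] * P2[k] for j in range(1, 4) for k in range(j))

    # Probability the first three rounds end level
    tie = sum(P1[j] * P2[j] for j in range(4))

    # Sudden death: team 1 wins a round when it scores (goalie 2 misses) and
    # goalie 1 saves; the round repeats when both teams score or both miss
    win_round = p1 * (1 - p2)
    repeat = p1 * p2 + (1 - p1) * (1 - p2)
    return regulation + tie * win_round / (1 - repeat)

print(round(100 * win_prob(0.67, 0.67), 2))  # 50.0 by symmetry
print(round(100 * win_prob(0.73, 0.67), 2))  # ~58.4, in line with the +6 row below
```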

Ultimately, we don’t really care about the formula so much as the results, so here’s a table and a plot showing the winning percentage for a goalie a given number of percentage points below or above league average when facing a league-average goalie:

Percentage Points Above/Below League Average    Calculated Winning Percentage
-20 26.12
-19 27.14
-18 28.18
-17 29.24
-16 30.31
-15 31.41
-14 32.52
-13 33.66
-12 34.81
-11 35.98
-10 37.17
-9 38.37
-8 39.60
-7 40.84
-6 42.10
-5 43.37
-4 44.67
-3 45.98
-2 47.30
-1 48.64
0 50.00
1 51.37
2 52.76
3 54.16
4 55.58
5 57.01
6 58.45
7 59.91
8 61.38
9 62.86
10 64.35
11 65.85
12 67.37
13 68.89
14 70.42
15 71.96
16 73.51
17 75.06
18 76.62
19 78.19
20 79.76

We would expect most goalies’ true shootout talent to be close to league average, so if we borrow Tom Tango’s results (see the link above), we can figure the most and least talented goalies are roughly 6 percentage points away from the mean. The difference between +6 and -6 in the table is about 16 points of winning percentage, meaning the best goalies are likely to win about sixteen shootouts per hundred more than the worst goalies, assuming both face average competition.

Multiply this by 13.2%, the historical frequency of shootouts, and by the single standings point at stake in each one, and we get an estimated benefit of only about 0.02 standings points per game from switching from the worst shootout goalie to the best. For a goalie making 50 starts, that’s only about 1 point added to the team, and that’s assuming the maximum possible impact.
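Spelled out, that back-of-the-envelope conversion is just the following, using the figures quoted above:

```python
# Best-to-worst shootout gap converted to standings points: a shootout win is
# worth one extra point, so scale the extra wins by how often shootouts occur.
extra_wins_per_shootout = 0.16   # best vs. worst goalie, from the table above
shootout_rate = 0.132            # historical share of games reaching a shootout

points_per_game = extra_wins_per_shootout * shootout_rate
print(round(points_per_game, 3))       # ~0.021 standings points per game
print(round(points_per_game * 50, 1))  # ~1.1 points over a 50-start season
```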

Similarly, moving up this curve by one percentage point appears to be worth about 1.35 wins per hundred shootouts; multiplying that by 13.2% gives 0.0018 standings points per game, which is almost exactly what I got when I did this empirically in the prior post. That leads me to believe that the earlier estimate is a lot stronger than I initially thought.

There are obviously a lot of assumptions in play here, including the ones behind my probabilities and Tango’s estimates of true performance, and I’m open to the idea that one or another of them is suppressing the importance of this skill. Overall, though, I’m largely inclined to stick with my prior conclusion: for a difference in shootout performance to be enough to make one goalie better overall than another, the shootout gap has to be fairly substantial and the gap in actual save percentage correspondingly small.

The Joy of the Internet, Pt. 2

I wrote one of these posts a while back about trying to figure out which game Bunk and McNulty attend in a Season 3 episode of The Wire. This time, I’m curious about a different game, and we have a bit less information to go on, so it took a bit more digging to find.

The intro to the Drake song “Connect” features the call of a home run being hit. Given that using an actual broadcast clip probably would have required getting the express written consent of MLB, my guess is that he had the call re-recorded by an announcer in the studio (as he implies around the 10:30 mark of this video). Still, does it match any games we have on record?

To start, I’m going to assume that this is a major league game, though there’s of course no way of knowing for sure. From the song, all we get is the count, the fact that it was a home run, the direction of the home run, and the name of the outfielder.  The first three are easy to hear, but the fourth is a bit tricky—a few lyrics sites (including the description of the video I linked) list it as “Molina,” but that can’t be the case, as none of the Molinas who’ve played in the bigs played the outfield.

RapGenius, however, lists it as “Revere,” and I’m going to go with that, since Ben Revere is an active major league center fielder and it seems likely that Drake would have sampled a recent game. So, can we find a game that matches all these parameters?

I first checked for only games Revere has played against the Blue Jays, since Drake is from Toronto and the RapGenius notes say (without a source) that the call is from a Jays game. A quick check of Revere’s game logs against the Jays, though, says that he’s never been on the field for a 3-1 homer by a Jay.

What about against any other team? Since checking this by hand wasn’t going to fly (har har), I turned to play-by-play data, available from the always-amazing Retrosheet. With the help of some code from possibly the nerdiest book I own, I was able to filter every play since Revere has joined the league to find only home runs hit to center when Revere was in center and the count was 3-1.
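My filter looked roughly like the sketch below, run over parsed Retrosheet event files (e.g. via Chadwick). The column names and Revere’s player ID are assumptions here; check them against your own parse and the Retrosheet roster files.

```python
import pandas as pd

# Parsed Retrosheet event data for 2011-2013, one row per play; "event_cd",
# "balls_ct", "strikes_ct", "hit_loc", and "cf_fld_id" are assumed column names.
events = pd.read_csv("retrosheet_events_2011_2013.csv")

HOME_RUN = 23  # Retrosheet event code for a home run

matches = events[
    (events["event_cd"] == HOME_RUN)
    & (events["balls_ct"] == 3)
    & (events["strikes_ct"] == 1)
    & (events["hit_loc"].astype(str).str.contains("8"))  # locations involving center
    & (events["cf_fld_id"] == "reveb001")                # assumed ID for Ben Revere
]
print(matches[["game_id", "bat_id", "inn_ct"]])
```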

Somewhat magically, there was only one: a first-inning shot by Carlos Gomez against the Twins in 2011. The video is here, for reference. I managed to find the Twins’ TV call via MLB.TV, and the MLB.com video carries the Brewers’ broadcast; unsurprisingly, neither call fits the sample, though I didn’t go looking for the radio calls. Still, the home run is such that it wouldn’t be surprising if one of the radio calls matched what Drake used, or if it was close and Drake had it re-recorded in a way that preserved the details of the play.

So, probably through dumb luck, Drake managed to pick a unique play to sample for his track. But even though it’s a baseball sample, I still click back to “Hold On, We’re Going Home” damn near every time I listen to the album.

Throne of Games (Most Played, Specifically)

I was trawling for some stats on hockey-reference (whence most of the hockey facts in this post) the other day and ran into something unexpected: Bill Guerin’s 2000-01 season. Specifically, Guerin led the league with 85 games played. Which wouldn’t have seemed so odd, except for the fact that the season is 82 games long.

How to explain this? It turns out two unusual things have to line up. Perhaps obviously, Guerin was traded midseason, and the receiving team had games in hand on the trading team. Thus, Guerin finished with three games more than the “max” possible.

Now, is this the most anyone’s racked up? Like all good questions, the answer to that is “it depends.” Two players—Bob Kudelski in 93-94 and Jimmy Carson in 92-93—played 86 games, but those were during the short span of the 1990s when each team played 84 games in a season, so while they played more games than Guerin, Guerin played in more games relative to his team. (A couple of other players have played 84 since the switch to 82 games, among them everyone’s favorite Vogue intern, Sean Avery.)

What about going back farther? The season was 80 games from 1974–75 to 1991–92, and one player in that time managed to rack up 83: the unknown-to-me Brad Marsh, in 1981-82, who tops Guerin at least on a percentage level. Going back to the 76- and 78-game era from 1968-74, we find someone else who tops Guerin and Marsh, specifically Ross Lonsberry, who racked up 82 games (4 over the team maximum) with the Kings and Flyers in 1971–72. (Note that Lonsberry and Marsh don’t have game logs listed at hockey-reference, so I can’t verify if there was any particularly funny business going on.) I couldn’t find anybody who did that during the 70 game seasons of the Original Six era, and given how silly this investigation is to begin with, I’m content to leave it at that.

What if we go to other sports? This would be tricky in football, and I expect it would require being traded on a bye week. Indeed, nobody has played more than the maximum number of games at least since the league went to a 14-game schedule, according to the results at pro-football-reference.

In baseball, it certainly seems possible to get over the max, but actually clearing this out of the data is tricky for the following two reasons:

  • Tiebreaker games are counted as regular season games. Maury Wills holds the raw record for most games played with 165 after playing in a three game playoff for the Dodgers in 1962.
  • Ties that were replayed. I started running into this a lot in some of the older data: games would be called after a certain number of innings with the score tied due to darkness or rain or some unexplained reason, and the stats would be counted, but the game wouldn’t count in the standings. Baseball is weird like that, and no matter how frustrating this can be as a researcher, it was one of the things that attracted me to the sport in the first place.

So, those are my excuses if you find any errors in what I’m about to present; I used FanGraphs and baseball-reference to spot candidates. I believe there have only been a few cases of baseball players playing more than the scheduled number of games when none of the games fell into the two problem categories mentioned above. The most recent is Todd Zeile, who, while he didn’t play in a tied game, nevertheless benefited from one. In 1996, he was traded from the Phillies to the Orioles after the O’s had stumbled into a tie, giving him 163 games played, all of which counted in the standings.

Possibly more impressive is Willie Montanez, who played with the Giants and Braves in 1976 and racked up 163 games with no ties. Arguably more notable is that, unlike Zeile, Montanez missed several opportunities to take it even farther: he missed one game before being traded, then one game during the trade, and then two games after he was traded. (He was only able to make it to 163 because the Braves had several games in hand on the Giants at the time of the trade.)

The only other player to achieve this feat in the 162 game era is Frank Taveras, who in 1979 played in 164 games; however, one of those was a tie, meaning that according to my twisted system he only gets credit for 163. He, like Montanez, missed an opportunity, as he had one game off after getting traded.

Those are the only three in the 162-game era. While I don’t want to bother looking in depth at every year of the 154-game era due to the volume of cases to filter, one particular player stands out. Ralph Kiner managed to put up 158 games with only one tie in 1953, making him by my count the only player since 1901 to play three meaningful games more than his team did.
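If you want to hunt for more candidates yourself, a first pass could look something like the sketch below, using the Lahman Appearances table (column names assumed); anything it flags still needs manual checks against game logs for ties, tiebreakers, and the scheduled length that season.

```python
import pandas as pd

# Lahman "Appearances" table; "yearID", "playerID", and "G_all" (games played)
# are assumed column names.
apps = pd.read_csv("Appearances.csv")
apps = apps[apps["yearID"] >= 1961]  # 162-game schedules begin (AL 1961, NL 1962)

# Sum games across stints/teams within each player-season
season_games = (
    apps.groupby(["playerID", "yearID"])["G_all"]
        .sum()
        .reset_index(name="G")
)
print(season_games[season_games["G"] > 162].sort_values("G", ascending=False))
```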

Now, I’ve sort of buried the lede here, because it turns out that the NBA has the real winners in this category. This isn’t surprising, as the greater number of days off between games means it’s easier for teams to get out of whack and it’s more likely that one player will play in every game. Thus, a whole host of players have played more than 82 games, led by Walt Bellamy, who put up 88 in 1968-69. While one player has gotten to 87 since, and a few more to 86 and 85, Bellamy stands alone atop the leaderboard in this particular category. (That fact made it into at least one of his obituaries.)

Since Bellamy is the only person I’ve run across to get 6 extra games in a season and nobody from any of the other sports managed even 5, I’m inclined to say that he’s the modern, cross-sport holder of this nearly meaningless record for most games played adjusted for season length.

Ending on a tangent: one of the things I like about sports records in general, and the sillier ones in particular, is trying to figure out when they are likely to fall. For instance, Cy Young won 511 games playing a sport so different from contemporary baseball that, barring a massive structural change, nobody can come within 100 games of that record. On the other hand, with strikeouts and tolerance for strikeouts at an all-time high, several hitter-side strikeout records are in serious danger (and have been broken repeatedly over the last 15 years).

This one seems a little harder to predict, because there are factors pointing in different directions. On the one hand, players are theoretically in better shape than ever, meaning that they are more likely to be able to make it through the season, and being able to play every game is a basic prerequisite for playing more than every game. On the other, the sports are a lot more organized, which would intuitively seem to decrease the ease of moving to a team with meaningful games in hand on one’s prior employer. Anecdotally, I would also guess that teams are less likely to let players play through a minor injury (hurting the chances). The real wild card is the frequency of in-season trades—I honestly have no rigorous idea of which direction that’s trending.

So, do I think someone can take Bellamy’s throne? I think it’s unlikely, due to the organizational factors laid out above, but I’ll still hold out hope that someone can do it—or at least that new players will keep joining the bizarre fraternity of men who have played more games than their teams.