Category Archives: Basketball

Brackets, Preferences, and the Limits of Data

As you may have heard, it’s March Madness time. If I had to guess, I’d wager that more people make specific, empirically testable predictions this week than any other week of the year. They may be derived without regard to the quality of the teams (the mascot bracket, e.g.), or they might be fairly advanced projections based on as much relevant data as are easily available (Nate Silver’s bracket, for one), but either way we’re talking about probably billions of predictions. (At 63 picks per bracket, we “only” need about 16 million brackets to get to a billion picks, and that doesn’t count all the gambling.)

What compels people to do all of this? Some people do it to win money; if you’re in a small pool, it’s actually feasible that you could win a little scratch. Other people do it because it’s part of their job (Nate Silver, again), or because there might be additional extrinsic benefits (I’d throw the President in that category). This is really a trick question, though: people do it to have fun. More precisely, and to borrow the language of introductory economics, they maximize utility.

The intuitive definition of utility can be viewed as pretty circular (it both explains and is defined by people’s decisions), but it’s useful as a way of encapsulating the notion that people do things for reasons that can’t really be quantified. The notion of unquantifiability, especially unquantifiable preferences, is something people sometimes overlook when discussing the best uses of data. Yelp can tell you which restaurant has the best ratings, but if you hate the food the rating doesn’t do you much good.*

One of the things I don’t like about the proliferation of places letting you simulate the bracket and encouraging you to use that analysis is that it disregards utility. They presume that your interests are either to get the most games correct or (for some of the more sophisticated ones) to win your pool. What that’s missing is that some of us have strongly ingrained preferences that dictate our utility, and that that’s okay. My ideal, when selecting a bracket, is to make it so I have as high a probability as possible of rooting for the winner of a game.

For instance, I don’t think I’ve picked Duke to make it past the Sweet Sixteen in the last 10 or more years. If they get upset before then, my joy in seeing them lose well outweighs the damage to my bracket, especially since most people will have them advancing farther than I do. On the other hand, if I pick them to lose in the first round**, it will just make the sting worse when they win. I’m hedging my emotions, pure and simple.***

This is an extreme example of my rule of thumb when picking teams that I have strong preferences for, which is to have teams I really like/dislike go one round more/less than I would predict to be likely. This reduces the probability that my heart will be abandoned by my bracket. As a pretty passive NCAA fan, I don’t apply this to too many teams besides Duke (and occasionally Illinois, where I’m from) on an annual basis, but I will happily use it with a specific player (Aaron Craft, on the negative side) or team (Wichita State, on the positive side) that is temporarily more charming or loathsome than normal. (This general approach applies to fantasy, as well: I’ve played in a half dozen or so fantasy football leagues over the years, and I’ve yet to have a Packer on my team.)

However, with the way the bracket is structured, this doesn’t necessarily torpedo your chances. Duke has a reasonable shot of doing well, and it’s not super likely that a 12th seeded midmajor is going to make a run, but my preferred scenarios are not so unlikely that they’re not worth submitting to whichever bracket challenge I’m participating in. This lengthens how long my bracket will be viable enough that I’ll still care about it and thus increases the amount of time I will enjoy watching the tournament. (At least, I tell myself that. My picks have crashed and burned in the Sweet Sixteen the last couple of years.)

Another wrinkle to this, of course, is that for games I have little or no prior preference in, simply making the pick makes me root for the team I selected. If it’s, say, Washington against Nebraska, I will happily pick the team in the bracket I think is more likely to win and then pull hard for the team. (I’m not immune to wanting my predictions to be valid.) So, the weaker my preferences are, the more I hew toward the pure prediction strategy. Is this capricious? Maybe, but so is sport in general.

I try not to be too normative in my assessments of sports fandom (though I’m skeptical of people who have multiple highly differing brackets), and if your competitive impulses overwhelm your disdain for Duke, that’s just fine. But if you’re like me, pick based on utility. By definition, it’ll be more fun.

* To be fair, my restaurant preferences aren’t unquantifiable, and the same is true for many other tastes. My point is that following everyone else’s numbers won’t necessarily yield you the best strategy for you.

** Meaning the round of 64. I’m not happy with the NCAA for making the decision that led to this footnote.

*** Incidentally, this is one reason I’m a poor poker player. I don’t enjoy playing in the optimal manner enough to actually do it. Thankfully, I recognize this well enough to not play for real stakes, which amusingly makes me play even less optimally from a winnings perspective.


Throne of Games (Most Played, Specifically)

I was trawling for some stats on hockey-reference (whence most of the hockey facts in this post) the other day and ran into something unexpected: Bill Guerin’s 2000-01 season. Specifically, Guerin led the league with 85 games played. Which wouldn’t have seemed so odd, except for the fact that the season is 82 games long.

How to explain this? It turns out there are two unusual things happening here. Perhaps obviously, Guerin was traded midseason, and the receiving team had games in hand on the trading team. Thus, Guerin finished with three games more than the “max” possible.

Now, is this the most anyone’s racked up? Like all good questions, the answer to that is “it depends.” Two players—Bob Kudelski in 93-94 and Jimmy Carson in 92-93—played 86 games, but those were during the short span of the 1990s when each team played 84 games in a season, so while they played more games than Guerin, Guerin played in more games relative to his team. (A couple of other players have played 84 since the switch to 82 games, among them everyone’s favorite Vogue intern, Sean Avery.)

What about going back farther? The season was 80 games from 1974–75 to 1991–92, and one player in that time managed to rack up 83: the unknown-to-me Brad Marsh, in 1981-82, who tops Guerin at least on a percentage level. Going back to the 76- and 78-game era from 1968-74, we find someone else who tops Guerin and Marsh, specifically Ross Lonsberry, who racked up 82 games (4 over the team maximum) with the Kings and Flyers in 1971–72. (Note that Lonsberry and Marsh don’t have game logs listed at hockey-reference, so I can’t verify if there was any particularly funny business going on.) I couldn’t find anybody who did that during the 70 game seasons of the Original Six era, and given how silly this investigation is to begin with, I’m content to leave it at that.
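If you want to see the “relative to his team” comparison laid out, here is a minimal Python sketch. It uses only the games-played and schedule-length figures quoted above (Lonsberry’s 78-game schedule is inferred from “4 over the team maximum”); the table layout and function names are my own illustration, not anything from hockey-reference.

```python
# Games played vs. scheduled games, using figures quoted in the post.
# Each row: (name, season, games_played, schedule_length)
SEASONS = [
    ("Bill Guerin", "2000-01", 85, 82),
    ("Bob Kudelski", "1993-94", 86, 84),
    ("Jimmy Carson", "1992-93", 86, 84),
    ("Brad Marsh", "1981-82", 83, 80),
    ("Ross Lonsberry", "1971-72", 82, 78),  # 78 inferred: 82 games, 4 over the max
]


def surplus(games_played, schedule_length):
    """Return (extra games, games played as a fraction of the schedule)."""
    return games_played - schedule_length, games_played / schedule_length


def rank_by_ratio(seasons):
    """Sort seasons from the largest games-to-schedule ratio to the smallest."""
    return sorted(seasons, key=lambda s: surplus(s[2], s[3])[1], reverse=True)


if __name__ == "__main__":
    for name, season, gp, sched in rank_by_ratio(SEASONS):
        extra, ratio = surplus(gp, sched)
        print(f"{name:15s} {season}  +{extra}  {ratio:.2%}")
```

Sorting by ratio rather than raw surplus is what lets Marsh (83 of 80) edge out Guerin (85 of 82) even though both are three games over.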

What if we go to other sports? This would be tricky in football, and I expect it would require being traded on a bye week. Indeed, nobody has played more than the max games at least since the league went to a 14 game schedule according to the results at pro-football-reference.

In baseball, it certainly seems possible to get over the max, but actually teasing this out of the data is tricky for the following two reasons:

  • Tiebreaker games are counted as regular season games. Maury Wills holds the raw record for most games played with 165 after playing in a three game playoff for the Dodgers in 1962.
  • Ties that were replayed. I started running into this a lot in some of the older data: games would be called after a certain number of innings with the score tied due to darkness or rain or some unexplained reason, and the stats would be counted, but the game wouldn’t count in the standings. Baseball is weird like that, and no matter how frustrating this can be as a researcher, it was one of the things that attracted me to the sport in the first place.

So, those are my excuses if you find any errors in what I’m about to present; I used FanGraphs and baseball-reference to spot candidates. I believe there have only been a few cases of baseball players playing more than the scheduled number of games when none of the games fell into the two problem categories mentioned above. The most recent is Todd Zeile, who, while he didn’t play in a tied game, nevertheless benefited from one. In 1996, he was traded from the Phillies to the Orioles after the O’s had stumbled into a tie, thus giving him 163 games played, though they all counted.

Possibly more impressive is Willie Montanez, who played with the Giants and Braves in 1976. He racked up 163 games with no ties, and what stands out is that, unlike Zeile, Montanez missed several opportunities to take it even farther. He missed one game before being traded, then one game during the trade, and then two games after he was traded. (He was only able to make it to 163 because the Braves had several games in hand on the Giants at the time of the trade.)

The only other player to achieve this feat in the 162 game era is Frank Taveras, who in 1979 played in 164 games; however, one of those was a tie, meaning that according to my twisted system he only gets credit for 163. He, like Montanez, missed an opportunity, as he had one game off after getting traded.

Those are the only three in the 162-game era. While I don’t want to bother looking in-depth at every year of the 154-game era due to the volume of cases to filter, one particular player stands out. Ralph Kiner managed to put up 158 games with only one tie in 1953, making him by my count the only baseball player since 1901 to play three meaningful games more than his team did.
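My “twisted system” for baseball boils down to subtracting tied games from games played before comparing against the schedule. A rough Python sketch of that bookkeeping, using only the numbers quoted above (the tuple layout and function name are hypothetical, and tiebreaker games like Wills’s are not modeled):

```python
# Tie-adjusted games played, using the figures quoted in the post.
# Each row: (name, year, games_played, tied_games_played_in, schedule_length)
PLAYERS = [
    ("Todd Zeile", 1996, 163, 0, 162),
    ("Willie Montanez", 1976, 163, 0, 162),
    ("Frank Taveras", 1979, 164, 1, 162),
    ("Ralph Kiner", 1953, 158, 1, 154),
]


def meaningful_surplus(games_played, ties, schedule_length):
    """Games that counted in the standings, minus the scheduled total."""
    return games_played - ties - schedule_length


if __name__ == "__main__":
    for name, year, gp, ties, sched in PLAYERS:
        extra = meaningful_surplus(gp, ties, sched)
        print(f"{name:16s} {year}  {gp - ties} meaningful games, +{extra}")
```

By this count Zeile, Montanez, and Taveras each come out one game over, and Kiner comes out three over.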

Now, I’ve sort of buried the lede here, because it turns out that the NBA has the real winners in this category. This isn’t surprising, as the greater number of days off between games means it’s easier for teams to get out of whack and it’s more likely that one player will play in every game. Thus, a whole host of players have played more than 82 games, led by Walt Bellamy, who put up 88 in 1968-69. While one player got to 87 since, and a few more to 86 and 85, Bellamy stands alone atop the leaderboard in this particular category. (That fact made it into at least one of his obituaries.)

Since Bellamy is the only person I’ve run across to get 6 extra games in a season and nobody from any of the other sports managed even 5, I’m inclined to say that he’s the modern, cross-sport holder of this nearly meaningless record for most games played adjusted for season length.

Ending on a tangent: one of the things I like about sports records in general, and the sillier ones in particular, is trying to figure out when they are likely to fall. For instance, Cy Young won 511 games playing a sport so different from contemporary baseball that, barring a massive structural change, nobody can come within 100 games of that record. On the other hand, with strikeouts and tolerance for strikeouts at an all-time high, several hitter-side strikeout records are in serious danger (and have been broken repeatedly over the last 15 years).

This one seems a little harder to predict, because there are factors pointed in different directions. On the one hand, players are theoretically in better shape than ever, meaning that they are more likely to be able to make it through the season, and being able to play every game is a basic prerequisite for playing more than every game. On the other, the sports are a lot more organized, which would intuitively seem to decrease the ease of moving to a team with meaningful games in hand on one’s prior employer. Anecdotally, I would also guess that teams are less likely to let players play through a minor injury (hurting the chances). The real wild card is the frequency of in-season trades—I honestly have no rigorous idea of which direction that’s trending.

So, do I think someone can take Bellamy’s throne? I think it’s unlikely, due to the organizational factors laid out above, but I’ll still hold out hope that someone can do it, or at least that we keep finding new players to join the bizarre fraternity of men playing more games than their teams.

In Search of Losses/Time

While writing up the post about the 76ers’ run of success, something odd occurred to me. The record for most losses in a season is 73, set by the 1972-73 76ers. As you might notice, that means that their loss count matches the year of their particularly putrid season. Per Basketball Reference, only one other team has done this: the expansion 1961-62 Chicago Packers. (Can you imagine having a team called the Packers in Chicago now? It’d be weird for a name to be shared by a city’s team and a rival of another team in that city, but I suppose that’s how it was for Brooklyn Dodgers fans in the 1940s and 1950s, and maybe for St. Louis fans when the NFC West heats up.)

That Packers team went 18-62, though BR says they were expected to finish at 21-59. The only player whose name I recognize is the recently deceased Walt Bellamy, who was a rookie that year. They only hung on in Chicago for one more year before moving to Baltimore. They also put up 111 points a game and gave up 119, because early 1960s basketball was pretty damned wild.

So, this is an exclusive club, if a little arbitrary–there are 4 other teams from the 20th century who lost more games than the corresponding year, and obviously every team from the 21st has lost more than the year. Still, it’s a set of 2 truly terrible teams, but the next member is presumably going to be one of the very best teams in the league in the next five years or so. The benchmark will only get more and more attainable, so club membership will rapidly devalue. Regardless, I can’t see the members of those two teams popping champagne like the 1972 Dolphins when the last team hits 14 losses this year–though it’d be hilarious if they did.
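For anyone who wants to check club membership themselves, the test is simple enough to sketch in Python. This assumes seasons are written like “1972-73” and that “the year” means the two-digit ending year; the function names are mine, and the two examples are the teams named above.

```python
def ending_year(season):
    """Two-digit ending year of a season string such as '1972-73'."""
    return int(season.split("-")[1])


def losses_match_year(season, losses):
    """True if the loss count equals the season's two-digit ending year."""
    return losses == ending_year(season)


def losses_exceed_year(season, losses):
    """True if the team lost more games than the two-digit ending year."""
    return losses > ending_year(season)


if __name__ == "__main__":
    # (team, season, losses) as quoted in the post
    for team, season, losses in [("76ers", "1972-73", 73),
                                 ("Chicago Packers", "1961-62", 62)]:
        print(team, season, losses_match_year(season, losses))
```

The second helper covers the “lost more games than the corresponding year” group, which is why the bar gets so much easier to clear once the ending year resets to small numbers.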

Lining Up Behind the 76ers

Going into the year, there was an honest discussion as to whether the 76ers would post the worst record in NBA history, topping their own record of 9-73 from 1972-73. (See here and here.) After 3 games…well, let’s just say that discussion’s been tabled. 3 wins in four nights make it hard to take a run at 8-74, especially when two of them are against the very best teams in the league. It’s going to take serious commitment to winning the Ender Game* for them to not scrape out another 6+ wins playing against the Raptors, Celtics, et al.

Still, I would say that the story is not that the Sixers aren’t historically terrible—it’s that they’re quite terrible and managed to beat three teams in four nights, two of which were expected to be very good.

Just how unexpected was this? There are obviously a few ways of looking at this. I could look to AccuScore, probably the most notable sports prediction engine out there, though I can’t find their early season NBA picks lying around anywhere. However, everything is more fun when there’s money involved, so we’re going to do this in terms of gambling. Specifically, if you bet the Sixers to beat the Heat straight up, then bet all your winnings on them against the Wizards, and did it again against the Bulls, how would you have done? Moreover, how often does a streak that’s this improbable occur?

Well, according to Odds Portal, if you’d started with $100 and kept reinvesting, you’d have a stake of $13,206.26, for a profit of $13,106.26. If you prefer a percentage return, since we picked $100 you can see this pretty easily—131x, or 13100%. This isn’t unheard of in sports gambling—someone made $375K on two $250 bets on the Cardinals in 2011—but it’s still quite impressive, especially given that these are single game bets rather than bets on a team winning the title.

How impressive is it? To answer that, I scraped NBA money line data from 2007-08 through 2011-12 from Sports Book Reviews, and while I can’t evaluate the exact accuracy of their lines, I figure it’s probably good enough. (They seem a little extreme, but I don’t bet enough basketball to say for sure.) I looked at every three game winning streak in that dataset, counting longer streaks multiple times, e.g. a five game winning streak is three overlapping three game streaks. (Playoffs and multiseason streaks are also included, though neither turned out to be relevant.) For each of those streaks, I calculated how much a prescient (read: lucky) fellow might have made betting $100 on the first game and entirely reinvesting.
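The “count longer streaks multiple times” rule is just a sliding window over each team’s results. A minimal Python sketch, assuming results are stored as a list of booleans in game order (that representation and the function name are my own, not how the scraped data actually looked):

```python
def win_streak_windows(results, length=3):
    """Start indices of every run of `length` straight wins.

    Windows may overlap, so a five-game winning streak yields three
    three-game windows.
    """
    return [
        start
        for start in range(len(results) - length + 1)
        if all(results[start:start + length])
    ]


if __name__ == "__main__":
    # Hypothetical results: True is a win, False is a loss.
    sample = [True, True, True, True, True, False, True]
    print(win_streak_windows(sample))
```

Each returned index marks the first game of a window, which is where the hypothetical $100 bet would start.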

(Quick sidenote: this is a fun exercise, but I’ll acknowledge it’s far from perfect. On a practical level, the data aren’t that trustworthy and the assumption that gambling lines proxy for probability estimates is shaky. More troublingly, on a theoretical level we would expect that the lines for the later games in a given streak shift some with a team’s wins in the first game(s). I don’t know enough to guess how much a given set of lines will jump around, but I suspect that this method overestimates the ex ante probability that a streak like this would occur. Also, if we guess that lines bounce around more in the early season, the odds on the Wizards and Bulls games probably dropped quite a bit, further underestimating how rare this streak is.)
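For the curious, the “lines proxy for probabilities” assumption can be written down in a few lines of Python. This is a rough sketch: it converts an American money line to its break-even win probability and multiplies across games, which treats the games as independent and ignores the bookmaker’s cut. Both are simplifications of mine, and the example lines (+200 and -150) are made up for illustration.

```python
def implied_probability(moneyline):
    """Break-even win probability for an American money line.

    Positive lines (underdogs): 100 / (line + 100).
    Negative lines (favorites): |line| / (|line| + 100).
    """
    if moneyline > 0:
        return 100 / (moneyline + 100)
    return -moneyline / (-moneyline + 100)


def streak_probability(moneylines):
    """Naive probability of winning every game, assuming independence."""
    probability = 1.0
    for line in moneylines:
        probability *= implied_probability(line)
    return probability


if __name__ == "__main__":
    # Hypothetical lines, not taken from the dataset.
    print(implied_probability(200))
    print(implied_probability(-150))
    print(streak_probability([200, 200]))
```

Because the later lines already reflect the earlier wins, plugging in the posted lines would tend to give a larger number than the true ex ante probability, which is the bias described in the sidenote.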

Anyhow, as it turns out, only one team has had a three game streak that was as unlikely as these Sixers’ from a Vegas perspective. The post-lockout Wizards had a streak in April 2012 that is in some senses similar to the current Sixers’ streak. They were 14-46 and, with only six games to go, would presumably tank the shit out of the rest of the season for a shot at Anthony Davis. Instead, though, they beat the league-best Bulls on the road (at +675), Milwaukee at home (at +330), and followed it up by beating the eventual champion Heat in Miami (at +450). Riding them for those three games would have made you a profit of $18,228.75, which is just obscene—it’s an extra $5K (or 5000%) beyond the Sixers. As it turns out, they’d close out the season with three more wins, all as favorites, against two tanking teams and the Heat, who were presumably resting starters by then. If you’d kept piling on, those six games would have gotten you almost 47 grand…though it might have also merited an intervention.
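The reinvestment arithmetic is easy to reproduce. Here is a small Python sketch, assuming the standard American money line payout (a +675 line returns $6.75 of profit per $1 staked); the function names are mine, and the lines are the three Wizards lines quoted above.

```python
def decimal_multiplier(moneyline):
    """Total return per $1 staked (stake included) for an American money line."""
    if moneyline > 0:
        return 1 + moneyline / 100
    return 1 + 100 / -moneyline


def reinvested_profit(stake, moneylines):
    """Profit from betting `stake` on the first game and rolling everything over."""
    bankroll = stake
    for line in moneylines:
        bankroll *= decimal_multiplier(line)
    return bankroll - stake


if __name__ == "__main__":
    wizards_lines = [675, 330, 450]  # at Bulls, vs. Milwaukee, at Heat
    print(round(reinvested_profit(100, wizards_lines), 2))
```

The multipliers are 7.75, 4.30, and 5.50, so $100 becomes $18,328.75, for the $18,228.75 profit cited above.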

There are only two other teams in the same ballpark as these Sixers and Wizards. The December 2007 Trail Blazers picked up a profit of $10,956 and were buoyed by a win in Utah at +1200; apparently they had only won once previously on the road and were missing LaMarcus Aldridge. They were moderate underdogs the other two games (+170 and +215), so the payout was driven by that one game. The only other team above $10K (or even $8K) is the March 2012 Cavaliers at $10,508, who got there by winning at Denver and Oklahoma City.

So in nearly 5 years of data, we have one bigger streak and two that are in the neighborhood. As I said earlier, I think the Sixers’ streak is a little more unexpected than this method gives it credit for (even relative to other streaks), but by this method we figure that this is a once in five years occurrence, even if it seems much odder than that right now.

All this said: even if I take the supposedly rational perspective that weird shit happens in small samples, it still doesn’t make me feel better about the Bulls’ blowing that lead last night. At least Rose is back: he’ll probably be fodder for posts later down the line.

*Andrew Wiggins is the presumptive #1 pick in the next draft. Andrew Wiggin is the real name of the protagonist of Ender’s Game. We can make this happen, people.