Monday, September 17, 2007

The Importance Of Sabermetrics

This post was a comment on Noyam's Jeter post.

Somewhere between 25 and 30 years ago, people started taking a serious look into what makes a baseball team successful. Analyzing the various statistics, they looked for links between specific numbers and winning. One thing they noticed is that, almost universally, winning teams had a high OBP. They reasoned that there was a correlation between winning and a high team OBP, and concluded that successful teams are the teams that get on base the most. The second most common feature was a high SLG. Teams that hit for power also tended to do well. In other words, OBP and SLG were, by far, the best indicators of a team’s success, better than batting average, home runs, stolen bases, runs, and RBIs.

These analysts reasoned that if successful teams are those which get on base the most, the most valuable players are the ones with the highest OBP. If getting on base is the single most important component of a winning team, a smart team should look for players who make the fewest outs. Those players are the most valuable. Obviously not all teams have figured this out yet, but that’s to their detriment.

What makes a player “good?” A good player is one who is valuable. The “best” player on a team is the one who is the most valuable to that team. Value is best determined by looking at a player’s OBP first, then SLG, and then going to other stats. This is true whether looking at the MVP race or trying to figure out who should make the Hall.

While this strategy makes sense, people soon realized some obvious flaws. Some players play in great hitters’ parks, while others hit in stadiums with huge outfields. Moreover, some players played in eras with dominant pitching or rules that tilted the game towards pitching. For example, in 1968 Carl Yastrzemski won the batting title with a .301 batting average. In 1930, the mean batting average was .301. Obviously, Yastrzemski was much more valuable to his team in 1968 than the average player was in 1930. Therefore, OPS (OBP + SLG) is greatly flawed when comparing players from different eras, and sometimes even players from the same era.
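For anyone who wants to see the arithmetic behind OPS, here is a minimal Python sketch using the standard definitions of OBP and SLG. The season line is made up for illustration and does not belong to any real player.

```python
def obp(hits, walks, hbp, at_bats, sac_flies):
    """On-base percentage: times on base divided by the plate
    appearances that count toward OBP."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slg(singles, doubles, triples, home_runs, at_bats):
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp_value, slg_value):
    """OPS is simply OBP plus SLG."""
    return obp_value + slg_value


# Hypothetical season line (invented numbers, not a real player).
hits, walks, hbp, at_bats, sac_flies = 180, 70, 5, 600, 5
doubles, triples, home_runs = 35, 5, 20
singles = hits - doubles - triples - home_runs

player_obp = obp(hits, walks, hbp, at_bats, sac_flies)
player_slg = slg(singles, doubles, triples, home_runs, at_bats)
player_ops = ops(player_obp, player_slg)

print(f"OBP {player_obp:.3f}  SLG {player_slg:.3f}  OPS {player_ops:.3f}")
```

Nothing in this calculation knows what park or what year the player hit in, which is exactly the flaw described above.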

That’s where OPS+ comes in. It takes into account stadium and league differences and calculates the player’s OPS in relation to the rest of the players in the league. For example, a player with an OPS+ of 120 has an OPS that is roughly 20% better than that of the average player in his league, taking into account ballpark advantages/disadvantages. OPS+ is very useful because it allows us to compare Derek Jeter to, say, Honus Wagner even though the latter played in the dead ball era when no one hit home runs or slugged at a very high percentage.
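As a rough sketch, the commonly published form of OPS+ is 100 * (OBP/lgOBP + SLG/lgSLG - 1), with the league figures adjusted for park. Because OBP and SLG are scaled separately, "20% better OPS" is a close approximation rather than an exact reading. In the code below, dividing by a single park_factor is my own simplification, and the league averages are invented for illustration.

```python
def ops_plus(obp, slg, lg_obp, lg_slg, park_factor=1.0):
    """Approximate OPS+.

    Scales OBP and SLG against the league averages, then applies one
    park factor (assumption: 1.0 is a neutral park, above 1.0 favors
    hitters). Real park adjustments are more involved than this.
    """
    relative = obp / lg_obp + slg / lg_slg - 1
    return 100 * relative / park_factor


# Hypothetical league averages, for illustration only.
LG_OBP, LG_SLG = 0.330, 0.420

# A league-average hitter in a neutral park comes out at 100.
print(round(ops_plus(LG_OBP, LG_SLG, LG_OBP, LG_SLG)))

# The same batting line is worth less in a hitter-friendly park.
neutral = ops_plus(0.375, 0.475, LG_OBP, LG_SLG)
hitters_park = ops_plus(0.375, 0.475, LG_OBP, LG_SLG, park_factor=1.05)
print(round(neutral), round(hitters_park))
```

The point is that the baseline moves with the league and the park, so a .301 average in 1968 and a .301 average in 1930 no longer look the same.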

While OPS+ is a major improvement over OPS, it suffers from similar flaws. OPS+ does not take into account positional value. Sabermetrics assumes that the further down one gets on the defensive spectrum, the more valuable the player is, everything else being equal.

Here’s an example. Take a look at a regular baseball team. Let’s use the Mets. How many Mets starters (excluding the pitcher) could play 1B adequately (not average, but good enough that his defense isn’t so atrocious that it greatly outweighs his offensive contributions)? I would guess all 8. How many could play LF? Probably everyone except Delgado. What about RF? Again, probably everyone besides Delgado. What about 3B? Here’s where the biggest drop-off occurs. Obviously Wright, probably Reyes and Castillo, and with enough practice, maybe Lo Duca. What about SS? Reyes and that’s probably it. Maybe Wright could figure it out or Castillo in his prime could be decent. But that’s it.

That’s the point. SS consistently rank at or near the bottom offensively. The average 1B is a much better offensive player than the average SS. A team could increase its offensive output tremendously by playing an average 1B at SS. Why don’t teams do that? Well, let’s ask: why don’t the Cardinals play Pujols at SS and find a league-average 1B to take his place? Because he would be so bad defensively that the offensive improvement from a league-average SS to a league-average 1B would be negated and then some.

Basically, only a few select players can play SS, while almost anyone in the majors can play 1B. So everything else being equal, if two players have the same numbers but one plays 1B and the other SS, the latter is more valuable because he can play a prime defensive position.

OPS+ doesn’t take this problem into account. It equates a SS with a 120 OPS+ with a 1B with the same OPS+. The two are not equally valuable.
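To make the positional point concrete, here is a toy Python sketch that measures a hitter against the typical bat at his own position instead of against the whole league. The baselines are hypothetical numbers I picked for illustration. They are not real league data, and this is not how any published stat actually computes its positional adjustment.

```python
# Hypothetical average OPS+ by position (illustrative only, not real data).
POSITION_BASELINE = {"1B": 110, "SS": 90}


def offense_vs_position(ops_plus, position):
    """OPS+ points above (or below) the typical hitter at that position."""
    return ops_plus - POSITION_BASELINE[position]


shortstop_edge = offense_vs_position(120, "SS")
first_base_edge = offense_vs_position(120, "1B")

print(f"SS with a 120 OPS+: {shortstop_edge:+d} vs. a typical SS")
print(f"1B with a 120 OPS+: {first_base_edge:+d} vs. a typical 1B")
```

Under these made-up baselines, the same 120 OPS+ is a much bigger edge at SS than at 1B, because the player a team would otherwise put at SS hits far worse.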

Another major problem is that OPS+, like OPS, overvalues SLG relative to OBP. As I showed earlier, OBP is the statistic more closely correlated with winning and is the more important statistic. Equating OBP and SLG by simply adding them up does not paint an accurate picture of value.
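Here is a small Python sketch of what weighting OBP more heavily looks like. The 1.8 multiplier is an assumption for illustration (a figure often quoted in sabermetric discussions), and the two batting lines are invented. This is not the formula for EQA or any other published stat.

```python
OBP_WEIGHT = 1.8  # assumed weight, for illustration only


def plain_ops(obp, slg):
    """Ordinary OPS: OBP and SLG count equally."""
    return obp + slg


def weighted_ops(obp, slg, obp_weight=OBP_WEIGHT):
    """OPS-style sum that counts each point of OBP more than a point of SLG."""
    return obp_weight * obp + slg


# Two hypothetical hitters with the same OPS but different shapes.
on_base_guy = (0.400, 0.450)
slugger = (0.350, 0.500)

print(f"{plain_ops(*on_base_guy):.3f} {plain_ops(*slugger):.3f}")
print(f"{weighted_ops(*on_base_guy):.3f} {weighted_ops(*slugger):.3f}")
```

Plain OPS calls these two hitters equals. Once OBP gets extra weight, the hitter who makes fewer outs comes out ahead.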

EQA addresses the latter problem by weighting OBP more than SLG, while VORP and WARP3 adjust for positional differences. That’s why, despite David Ortiz’s power being down this season, his EQA is the highest of his career (his OBP is much higher than in past years). It’s also why Hanley Ramirez is second in VORP (behind ARod), even though Prince Fielder has a higher OPS+. WARP3 tries to take into account defense as well, and is probably the most comprehensive statistic available (although it’s not without its share of flaws).

These statistics are far more advanced and useful than batting average or RBI. They remove a lot of the subjectivity that plagues baseball analysis. Rather than simply looking at a player and saying “eh, .317 average and only 200 homers, that’s not great,” they allow you to compare Jeter to other players, and especially to SS of other eras. And Jeter clearly stacks up to those guys.