The Paradox of the National Baseball Hall of Fame, Part Two

Despite his tremendous mustache, critics believe that Jack Morris's playing ability is below Hall of Fame standards

The first part of this two-part series was published last Friday. In it, I began my discussion of the arbitrary nature of baseball's sacred space by examining the 75 percent threshold required for admission, the character clause, and how to make sense of "contributions to the team(s) on which the player played." If you have not yet read part one, I encourage you to read it first, as some of the references in part two will not make sense without at least the first section, entitled "Arbitrary Standards."

Today's piece will tackle the question of finding a way to objectively measure the most significant category, playing ability. Following this, I will address a few proposed solutions to the arbitrary nature of the Hall of Fame, then offer some concluding remarks with regard to the upcoming election.

Playing Ability

The final category, playing ability, is the most difficult to evaluate. To simplify the matter, I will consider playing ability only through the statistics the player accrued during his career while removing all weight from claims such as “The Greatest Hitter to Ever Live” (Ted Williams), “The Master of Them All” (Christy Mathewson), and “Mr. October” (Reggie Jackson). While flattering, these claims are often inaccurate.

The debate over the greatest hitter to ever live is typically between Barry Bonds and Babe Ruth. And while Reggie Jackson was renowned for his elite postseason performances, not all would agree that Jackson is deserving of the title.

Before turning to Hall of Fame career standards, it is worth noting that the problem of vagueness hinders assessments of even single-season performances for awards such as the MVP, the Cy Young Award (best pitcher in each league), and the Rookie of the Year. As with the Hall of Fame vote, members of the BBWAA vote for the winners of each award, and it would seem reasonable to suggest that the writers ought to be able to reach a consensus on each award. But this is far from the case.

The best example of this is the national debate that has emerged from the American League MVP vote the past two seasons. Angels’ outfielder Mike Trout and Tigers’ third baseman Miguel Cabrera have clearly been the two top players in the league, and Cabrera has won the award both times.

Cabrera is the traditional slugger who won the Triple Crown in 2012 (highest batting average, most home runs, and most runs batted in), while Trout is a true five-tool player, meaning he hits for average, hits for power, fields his position well, has a strong throwing arm and good foot speed. Cabrera is a two-tool player, hitting for average and power, and is below average at the other three categories. How do we compare players who are so different?

The Cabrera versus Trout debate is the best representation of the two schools of thought in modern player evaluation: the traditionalists and the sabermetricians. Traditionalists rely on statistics such as batting average, home runs, and RBIs to evaluate hitters and win-loss record, ERA, and strikeouts to evaluate pitchers. Defense is measured by fielding percentage, but typically is not valued as highly as any of the offensive categories. Miguel Cabrera appeals to traditionalists because of his dominance in the Triple Crown categories that they place extreme value upon. His below average defense is an afterthought and may reduce his value in their eyes, but only slightly. The majority of BBWAA writers are traditionalists.

I place myself in the other school of thought, among the sabermetricians. Sabermetricians are advocates of a new brand of statistics, commonly referred to as sabermetrics, hence the name. The word “sabermetrics” is rooted in the acronym SABR, which stands for the Society for American Baseball Research. SABR is an independent, third-party baseball think tank with the goal of finding improved ways to evaluate the game through advanced statistical analysis. The sabermetric movement began in 1977 with the publication of the first annual Baseball Abstract by Bill James, who was then employed as a night watchman at a pork and beans factory. James now works for the sabermetrically oriented Boston Red Sox, winners of three of the past 10 World Series titles.

Perhaps the greatest achievement of sabermetricians has been the development of the statistic WAR (wins above replacement), which measures, in team wins, the gap between any player and a hypothetical replacement player (best described as someone readily available in the minor leagues or from the waiver wire). Taking this one step further, sabermetricians have been able to calculate the market value of one win in free agency (currently $5 million) and evaluate free agents accordingly. Other sabermetric breakthroughs include the ability to find the exact value in runs of every event that takes place on the field, defense-independent pitching statistics (DIPS), and, most recently, measuring the skill of individual catchers at framing pitches.
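To make the free-agent arithmetic concrete, here is a minimal sketch in Python. It assumes nothing more than the $5 million-per-win figure quoted above and a simple linear relationship between wins and dollars. The function name and the 4-win example player are hypothetical, and real front-office valuations are certainly more involved.

```python
# Assumption: one win above replacement is priced at $5 million, the
# free-agent figure quoted in the text. Names here are hypothetical.
DOLLARS_PER_WIN = 5_000_000


def market_value(war: float, dollars_per_win: int = DOLLARS_PER_WIN) -> float:
    """Estimated free-agent value of a season worth `war` wins (linear model)."""
    return war * dollars_per_win


# A made-up player worth 4 wins above replacement in a season.
print(f"${market_value(4.0):,.0f}")
```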

Returning to the Trout versus Cabrera debate, sabermetricians prefer Trout because his WAR is substantially higher than Cabrera’s. While Cabrera is certainly the better hitter, Trout’s superior base running and defense more than cover his inferior hitting and hitting for power. Cabrera’s inadequacy in these categories is defined by more than a poor fielding percentage and low stolen base totals, as sabermetricians have devised statistics to measure the range of a defender and base running value apart from mere stolen base totals. These advanced statistics are often overlooked or not trusted by traditionalists, but they greatly enhance Trout’s value.

The Trout versus Cabrera debate demonstrates the arbitrary nature of ranking players. If the BBWAA cannot agree on even the best player in one given season, how can they be expected to agree on a ranking of all players over their entire careers? Unless one school of thought somehow overtakes the other (which is not imminent), this debate is likely here to stay. Since the methodology of evaluation differs between these schools of thought, it can't be reasonable to expect that voters will come to the same conclusions when assessing candidates.

Proposed Standards for Admission

One way to define playing ability worthy of Hall of Fame consideration is through milestones, specifically hitting 500 home runs, recording 3,000 hits, and for pitchers, winning 300 games. Reaching any of these milestones typically results in automatic entry into the Hall of Fame, but why are these numbers chosen? What is wrong with winning 299 games or hitting 499 home runs?

Fred McGriff hit 493 home runs (and was not connected with PEDs), but is not a member of the Hall of Fame. Had McGriff hit seven more home runs during his career to reach 500, it's extremely likely that he would be a Hall of Famer and be rewarded with a legacy as one of the greatest power hitters of all time. Despite retiring less than 10 years ago, McGriff has largely been forgotten. And although he will appear on the ballot again this winter, he is not expected to receive the necessary votes for induction.

In addition to their arbitrary nature, milestones are a poor measure of playing ability because they only consider one statistic. It's entirely possible that a player can excel in one category, but be a poor (or at least not elite) player overall. Adam Dunn fits this mold perfectly. An active player, Dunn has hit 440 home runs through his age 33 season, and is a decent bet to reach 500 for his career. Despite the high home run total, Dunn is a career .238 hitter and a poor defender, meaning that the vast majority of his value comes from hitting home runs. Dunn could reach the 500 home run mark, but should not be inducted into the Hall of Fame because of his lack of contributions elsewhere.

One of the challenges of defining a playing career worthy of the Hall of Fame is comparing players of different eras. Was Barry Bonds better than Babe Ruth? How about Jimmie Foxx or Joey Votto? Unlike most of the questions that have been raised thus far, this question has been answered well: the best way to evaluate players is by comparing them to their peers. League averages change from year to year and era to era, from the dead ball era to the year of the pitcher in 1968 to the steroid era in the late 1990s and early 2000s, so the best way to compare players over different eras is by comparing their statistics to the league average. For example, if the league ERA is 4.50 and a pitcher posts a 3.50, his ERA is 22.2 percent better than that of the average pitcher that year, a number that is much more useful in Hall of Fame consideration than mere ERA.
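The league-relative calculation in that example can be sketched in a few lines of Python. This is only an illustration of the percentage arithmetic: the function name is my own, and published era-adjusted statistics typically add park and league adjustments that are ignored here. For a 3.50 ERA against a 4.50 league average, it works out to roughly 22 percent better than average.

```python
def percent_better_than_league(era: float, league_era: float) -> float:
    """Percentage by which `era` beats the league average.

    Positive means better (lower) than league average, negative means worse.
    """
    if league_era <= 0:
        raise ValueError("league_era must be positive")
    return (league_era - era) / league_era * 100


# The example from the text: a 3.50 ERA in a league averaging 4.50.
print(round(percent_better_than_league(3.50, 4.50), 1))
```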

Bill James took this principle a step further by creating a Hall of Fame monitor which is based on “black ink.” This principle refers to the back of baseball cards, where categories in which the player led the league were printed in boldface type, hence the “black ink” label. Players receive points for leading the league in certain statistical categories, with the underlying theory being that consistent league leaders are most likely to be Hall of Fame inductees. His formula is not perfect, as a substantial number of arbitrary point values are assigned, but generally speaking, his Hall of Fame monitor has proved to be a fairly accurate measure. Character clause omissions aside, leaders of his Hall of Fame monitor match actual Hall of Fame inductees reasonably well.
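A points-for-leading-the-league system of this kind might be sketched as follows. The point values below are hypothetical placeholders chosen for illustration, not James's actual weights, and the player is made up. The sketch only shows the mechanism: each category carries a point value, and a player collects that value once for every season in which he led the league.

```python
from typing import Dict

# Hypothetical point values for illustration only; these are NOT
# Bill James's actual weights.
POINTS: Dict[str, int] = {
    "home_runs": 4,
    "rbi": 4,
    "batting_average": 4,
    "hits": 3,
    "doubles": 2,
}


def black_ink_score(seasons_led: Dict[str, int],
                    points: Dict[str, int] = POINTS) -> int:
    """Sum (points for a category) x (seasons led) over all categories.

    Categories without an assigned point value contribute nothing.
    """
    return sum(points.get(cat, 0) * n for cat, n in seasons_led.items())


# A made-up player who led the league in home runs three times and hits once.
example = {"home_runs": 3, "hits": 1}
print(black_ink_score(example))  # 3*4 + 1*3 = 15
```

Changing any entry in the points table changes the ranking, which is exactly the arbitrariness described above.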

While James’s model is useful, perhaps a better model would be based upon a player’s wins above replacement (WAR). This statistic has recently been influential in Hall of Fame voting, and a career total of at least 60 WAR is generally seen as necessary to fulfill the playing ability category of the Hall of Fame vote. WAR is not a perfect statistic and has multiple, differing versions, but flaws in math aside, the concept is extremely useful. Sixty is still an arbitrary number, but it's not as bad as other arbitrary facets of the process since it was derived from already existing standards (though they are not exactly precise, as I will soon explain). I have no better idea than James’s Hall of Fame monitor or simple player WAR apart from a combination of the two, but adding two arbitrary numbers only makes the process more arbitrary. There is no perfect formula for defining a career deserving of the Hall of Fame, and it is unlikely that one can ever be devised.

Final Thoughts and the Upcoming Election

We have accrued many more unanswered questions than answered questions thus far, and unfortunately that reality isn't going to change. The four most significant unanswered questions are “How do we measure character?”, “How do we measure postseason performances?”, “How do we measure playing ability?”, and “What is the relationship between these categories?” For the sake of advancing the conversation, let's assume that we have devised perfect formulas for ranking every player to have ever lived in every category, properly weighted the categories, and compiled a list ranking the players. We would likely find that most of the members of the Hall of Fame would rank near the top of the list, but there are likely to be some exceptions.

Now that we have compiled a perfect list of players, the standard for Hall of Fame entry must be set. Should the standard be set at the average score of the current members of the Hall of Fame? This seems problematic since half of the current members of the Hall of Fame would not deserve to be inducted by that method. Perhaps we could choose an arbitrary numerical score based upon player ranking that candidates must surpass. This is still arbitrary, but seems better, although again some current members of the Hall of Fame would still be found to be undeserving by this method.

How about players only having to surpass the score of the worst Hall of Fame player? This seems fair, but would also result in an enormous number of new inductees, which would drastically lower the existing standard for induction. It seems that there is no optimal place to set the standard, and that no matter where it is set, we will have to deal with a consequence of either an excessively large Hall of Fame or current members of the Hall of Fame who are no longer deserving of their enshrinement based on the new standards.
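The trade-off between these cutoffs can be illustrated with a small Python sketch. Everything in it is invented for the purpose: the six players, their scores, and the function name are hypothetical, and the sketch assumes the perfect ranking imagined above already exists. For any cutoff, it reports which current members fall below it and which outsiders clear it.

```python
from statistics import mean


def threshold_consequences(scores, members, threshold):
    """Return (members_below, outsiders_at_or_above) for a given cutoff.

    scores:  dict mapping player name -> overall score
    members: set of names currently in the Hall of Fame
    """
    members_below = sorted(n for n in members if scores[n] < threshold)
    outsiders_in = sorted(
        n for n, s in scores.items() if n not in members and s >= threshold
    )
    return members_below, outsiders_in


# Made-up scores for six fictional players; A, B and C are "members".
scores = {"A": 95, "B": 80, "C": 62, "D": 75, "E": 70, "F": 40}
members = {"A", "B", "C"}

avg_cut = mean(scores[n] for n in members)  # average member: 79
min_cut = min(scores[n] for n in members)   # weakest member: 62

print(threshold_consequences(scores, members, avg_cut))  # (['C'], [])
print(threshold_consequences(scores, members, min_cut))  # ([], ['D', 'E'])
```

With the average as the cutoff, member C no longer qualifies. With the weakest member as the cutoff, no one is removed but D and E both get in.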

While one of these consequences may seem acceptable to you, consider that this is a choice we aren't currently able to make. Even the ability to decide between these two negative consequences is preceded by the ability to construct an accurate ranking of all candidates, an ability we do not currently have. Until we find legitimate ways to answer the four questions regarding the ranking of candidates, we have no legitimate way of creating such a ranking.

What does the paradoxical nature of the current Hall of Fame vote mean for the upcoming vote? Some candidates are easy to judge. Greg Maddux and Frank Thomas seem to be worthy candidates by any measure while Armando Benitez does not, but what about borderline candidates?

Members of the BBWAA have been arguing about the candidacy of Jack Morris for 15 years and have failed to reach a conclusion because there is no guideline for how to properly measure and weight the categories. Morris’s playing ability is generally agreed to be below Hall of Fame standards, but he gets high marks for his character and even higher marks for his postseason performance in the 1991 World Series. Morris is an easy choice for writers who place an equal weight on each of the categories, but an easy denial for those who favor playing ability over the others.
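How much the weighting matters can be seen in a short Python sketch. The ratings, the weights, and the 0 to 100 scale are all made up for illustration and describe no real candidate. The point is only that the same three ratings produce very different overall scores depending on how a voter weights the categories.

```python
def overall_score(ratings, weights):
    """Weighted average of category ratings, using the given weights."""
    total_weight = sum(weights.values())
    return sum(ratings[c] * weights[c] for c in weights) / total_weight


# Hypothetical candidate rated on an invented 0-100 scale per category.
candidate = {"playing_ability": 55, "character": 80, "postseason": 95}

# Two hypothetical voters: one weights all categories equally,
# the other counts playing ability eight times as heavily as the rest.
equal_weights = {"playing_ability": 1, "character": 1, "postseason": 1}
ability_heavy = {"playing_ability": 8, "character": 1, "postseason": 1}

print(round(overall_score(candidate, equal_weights), 1))  # 76.7
print(round(overall_score(candidate, ability_heavy), 1))  # 61.5
```

Against a bar of, say, 70, this candidate clears it comfortably for the first voter and falls well short for the second, even though neither voter disagrees about a single rating.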

Which method is best? I don't know. No one knows. The guidelines for voting don't answer this question, so it's up to the voters to decide. Voters have valued playing ability higher than the other two categories (with the exception of some but not all “cheaters”) and perhaps rightfully so. But how much higher should playing ability be valued? Again, no one knows and to answer that question is as arbitrary as defining the grain of sand where a heap ceases to be a heap.

Yes, it's necessary to answer this question, but there can never be an answer with reasonable justification. If one hypothetically chooses 60 WAR as the standard to fulfill the playing ability category for the Hall of Fame, one (most likely myself) can always ask why it is not 59 or 61. There is no right answer (although there are many wrong answers!).

Some philosophers solve the paradox of the heap by saying that four grains of sand counts as a heap because three grains on the bottom support one grain on top. That's not a very big heap, but it is, perhaps, the best answer since it's justifiable and not arbitrary. There is no such solution to what I can now call “The Paradox of the Hall of Fame.” Instead, there are four solutions that we must find: the proper way to measure character, postseason performance, and playing ability and the relationship between those three categories. Only after answering these questions do we arrive at the Hall of Fame’s version of the paradox of the heap: defining an arbitrary standard that all candidates must meet.