Daily Fantasy Baseball: The Dangers of Split Stats
Over the first three seasons of his big-league career, Anthony Rizzo was straight doggy doo against left-handed pitching. And this wasn't the fake doo you use to scare partygoers; this was of the stank variety.
Those three seasons saw Rizzo record 320 at-bats with southpaws on the mound; he managed to turn that into a whopping 62 hits, a .194 batting average. He was basically a younger version of Henry Blanco, which is far from the most flattering comparison for a former top prospect.
Apparently Rizzo was none too fond of his label as a platoon hitter who could only face righties.
After those rough first three seasons, Rizzo went on a tear against lefties in 2014 and 2015, recording an on-base percentage above .400 against them in both seasons. He didn't have quite the same pop as he showed against righties, but in just one season, he had gone from a ghoul against lefties to a borderline savant.
If you were basing your daily fantasy baseball decisions around Rizzo's early-career platoon marks, you'd avoid him like he's an old classmate at the grocery store over Thanksgiving break. You're not only skipping that aisle, but you're likely sprinting out of the store screaming. Ain't no way you want that nastiness in your life.
The problem becomes that you'd be missing out on some seriously acceptable production if you simply wrote off Rizzo against lefties due to a sample of 320 at-bats. And this isn't some rare incident with splits being misleading; it happens all the time.
At the same time, split stats can be immensely valuable when we implement them correctly. This means we can't disregard them. Doing so would put us at a major disadvantage.
Because of this, let's dissect why split stats are tricky and what we can do to combat those problems, allowing us to fully bask in the advantages they provide without falling prey to their pitfalls.
Why Split Stats are Risky
Some people wake up in cold sweats after dreams of clowns forcing them to violate HIPAA laws at gunpoint. I wake up in cold sweats about small sample sizes. The clown one was also my dream, but that was my fault for eating the raunchy potato salad.
The baseball statistics we rely upon most -- slugging percentage, on-base percentage, wOBA -- all take a long time to normalize. All of those numbers would have told us that Rizzo was awful against lefties and we should never even entertain the thought of using him. Yet, that ended up not being the case.
Every year, we will discuss players who are bound to regress from their previous year's production. Yet, that will often involve sample sizes of nearly 600 plate appearances. If a guy has a .400 batting average on balls in play (BABIP), then we'll still likely predict a slide, even when the sample size is relatively large. That's how long stats can take to normalize in a sport with as much luck involved as baseball.
If we can see these things over 600 plate appearances, you had better believe we can see them when it comes to split stats. In 2015, Prince Fielder led the league in plate appearances against left-handed pitchers; he had 282 of them. When that's the largest sample size, you can't draw actionable information from the platoon stats of any hitter in the majors for a single season.
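To put a rough number on that noise, here's a back-of-the-envelope sketch. It treats every plate appearance as an independent coin flip with the same chance of reaching base, which is a simplifying assumption, and the .320 on-base percentage is just an illustrative input rather than any particular player's mark. The function name is made up for this example.

```python
import math

def obp_standard_error(true_obp, plate_appearances):
    """Standard error of an observed OBP under a simple binomial model."""
    return math.sqrt(true_obp * (1 - true_obp) / plate_appearances)

# 282 PA was the league-leading total against lefties in 2015;
# 600 PA is roughly a full season.
for pa in (282, 600):
    se = obp_standard_error(0.320, pa)
    print(f"{pa} PA: plus or minus about {se:.3f} of OBP (one standard error)")
```

Under those assumptions, one standard error is close to 28 points of OBP at 282 plate appearances, compared with about 19 points over 600.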
Let's do a quick example here to illustrate why, using Brian Dozier. He had 206 plate appearances against lefties in 2015, resulting in 42 hits, 23 walks, and 1 hit-by-pitch. This puts him at a slash of .232/.320/.442. They're certainly not bad numbers, but you're not necessarily going to be targeting him against left-handed pitching.
However, Dozier had a bit of bad luck that year. He had a .611 average on line drives, a good chunk below the .685 league-average mark. What would his stats against lefties have looked like if he had been average?
Dozier had a 22.8 percent line-drive rate against lefties and put 144 balls in play. This puts him at 33 line drives for the season against lefties. His batting average on line drives can allow us to deduce that 20 of those turned into hits. If he had been league average on line drives, he would have had at least 22 hits instead of 20.
Once we add those two hits into his slash from the season, things tick up to marks of .243/.330/.453. That doesn't look like a lot, but what if Dozier had been slightly above average on line drives? Then it would have been .254/.340/.464. Things can swing so quickly based on only a few plate appearances that it's impossible to trust data until it expands.
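Here's that arithmetic as a quick sketch. Dozier's at-bats, sacrifice flies, and total bases against lefties aren't listed above, so the 181 at-bats, 1 sacrifice fly, and 80 total bases below are assumptions backed out of the .232/.320/.442 line, and every extra hit is assumed to be a single that replaces an out. The helper names are made up for illustration.

```python
def slash_line(hits, at_bats, total_bases, walks, hbp, sac_flies):
    """Return (AVG, OBP, SLG) from basic counting stats."""
    avg = hits / at_bats
    obp = (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)
    slg = total_bases / at_bats
    return avg, obp, slg

def fmt(rate):
    """Format a rate baseball-style, e.g. 0.232 -> '.232'."""
    return f"{rate:.3f}".lstrip("0")

def dozier_vs_lefties(extra_hits=0):
    # From the article: 42 hits, 23 walks, 1 hit-by-pitch (206 PA).
    # Assumed (inferred from the slash line): 181 AB, 1 SF, 80 total bases.
    # Each extra hit is assumed to be a single that turns an out into a hit,
    # so at-bats and plate appearances don't change.
    line = slash_line(
        hits=42 + extra_hits,
        at_bats=181,
        total_bases=80 + extra_hits,
        walks=23,
        hbp=1,
        sac_flies=1,
    )
    return "/".join(fmt(x) for x in line)

for extra in (0, 2, 4):
    print(f"+{extra} hits: {dozier_vs_lefties(extra)}")
```

With those inputs, two extra singles are worth about 11 points of batting average and slugging, and four are worth about 22.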
These complications don't exist solely for platoon splits. They're also found in home-road splits and by looking at how a batter has performed in recent games. You're limiting the sample size in all of these scenarios, and that's pushing you into some dangerous waters.
That doesn't mean that you should stop trying to look for advantages when it comes to splits. Let's check out ways you can combat these small sample sizes and turn them into actionable information even if the raw stats are still misleading.
Focusing on Quickly-Stabilizing Stats
The main thread of this is that we shouldn't trust the triple-slash stats right away when limiting the sample size. There are other stats, though, that will stabilize quickly, allowing us to use them as cruxes of our analysis.
Based on this FanGraphs article on stabilization points, it looks like strikeout rate and walk rate are going to be our best friends. Strikeout rate takes only 60 plate appearances to stabilize, and walk rate takes 120, according to the piece. We're generally going to get to both marks over a full season for a platoon or home-road split.
One other stat I will often use is hard-hit rate. It wasn't included in the FanGraphs piece, though I would assume its stabilization point would be close to ground-ball and fly-ball rates, both of which were at 80 balls in play. This is another mark that batters are going to reach over the course of one season, allowing you to use platoon splits rather quickly.
The other option is to simply wait until the sample size expands for veteran hitters. The thresholds mentioned were 460 plate appearances for on-base percentage and 320 at-bats for slugging percentage. In all likelihood, you're going to hit the important marks within two full seasons, allowing you to operate based on those numbers as opposed to the other rate stats discussed above.
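As a rough sketch, the thresholds quoted above can be turned into a simple checklist. The numbers are the ones cited from the FanGraphs piece, except for hard-hit rate, where 80 balls in play is only the guess made above. The dictionary, the function name, and the 181 at-bats in the example (inferred from Dozier's .232 average on 42 hits) are assumptions for illustration.

```python
# (sample unit, threshold) for each stat.
STABILIZATION_POINTS = {
    "strikeout rate": ("PA", 60),
    "walk rate": ("PA", 120),
    "ground-ball rate": ("BIP", 80),
    "fly-ball rate": ("BIP", 80),
    "hard-hit rate": ("BIP", 80),  # assumed, not from the FanGraphs piece
    "on-base percentage": ("PA", 460),
    "slugging percentage": ("AB", 320),
}

def trustworthy_stats(sample):
    """Return the stats whose stabilization point the sample has reached."""
    return [
        stat
        for stat, (unit, threshold) in STABILIZATION_POINTS.items()
        if sample.get(unit, 0) >= threshold
    ]

# Dozier against lefties in 2015: plate appearances, at-bats, balls in play.
dozier_2015_vs_lhp = {"PA": 206, "AB": 181, "BIP": 144}
print(trustworthy_stats(dozier_2015_vs_lhp))
```

For Dozier's single season against lefties, that leaves the strikeout, walk, and batted-ball rates in play while on-base and slugging percentage stay on the shelf.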
Split stats are great. Small sample sizes suck. When we can find ways to separate the two, life is beautiful, friends.
We can do just this by narrowing our scope to statistics such as hard-hit rate, strikeout rate, and walk rate. If a player is performing well in those areas, then we can expect his slash stats will follow suit sometime in the not-so-distant future. If he's struggling, then it's best to divest before the brown stuff hits the fan.
As for our old friend Anthony Rizzo, we should have been looking at these quick-stabilization stats sooner.
In his final season of perceived struggles against lefties, Rizzo had a 34.7 percent hard-hit rate with a 20.4 percent strikeout and 10.2 percent walk rate. With those numbers, you'd expect him to have a slugging percentage well above league average to go with an above-average on-base percentage. That's what he got the following season, and we could have predicted that by utilizing those three categories.
We don't have to wake up in cold sweats thinking about the dangers of small sample sizes. We can easily cure these ills by either focusing on the actionable information that stabilizes quickly or digging a bit further back into a player's history. If this lets us use this information on a more regular basis with fewer fears, then we're dipping our toes into some pretty sweet waters.