Golf Betting and Daily Fantasy: Strokes Gained Sample Sizes and Stabilization Rates

When can we start to trust the strokes gained data from golfers? Here's what the math says.

Betting on golf and building daily fantasy lineups for PGA Tour events requires some level of subjectivity.

With so much variance in a single golf event, you aren't able to predict a tournament's winner just based on long-term averages in strokes gained or scoring average or whichever stats you deem the most vital in a given week.

But hold on a second. Why did I mention long-term stats? Don't golfers get on heaters? They sure do.

Yet it's also true that a stats leaderboard spanning 100 rounds looks more like a proper list of the world's best than one covering only the past 12 rounds.

So then what might 12 (or 24 or 36 or 4) rounds tell us about a golfer's ability? It depends.

That's why I wanted to find out when strokes gained stats start to stabilize and see what that means for our weekly golf prep.

If you just want the takeaways, you can skip ahead past the math.

The Method

Stabilization rates of statistics are often studied in baseball. How many batters faced does it take for us to trust a pitcher's strikeout rate? How many balls in play are required before we can start to trust a batter's fly-ball rate? Those are questions that have answers by now.

I wanted to figure out how to apply this to golf.

So, leveraging Russell Carleton's methodology of seeking an R-squared of 49% (the point at which the correlation reaches roughly 0.70), here's what I found when comparing a golfer's past 100 rounds to smaller samples (of 4, 8, 12, 24, 36, 50, and 75 rounds), via FantasyNational.

Why 100 rounds? Again, take a look at a 100-round leaderboard in any strokes gained stat, and it'll pass the eye test of who the best players are in a given stat. It's long enough to remove short-sample noise and not so long that we're looking at too much old data for a golfer's career.
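For anyone who wants to run this kind of check on their own data, here is a minimal Python sketch of the comparison. It assumes each golfer's round-level strokes gained values are stored most recent first, and that the short sample is simply the most recent n of those 100 rounds. The function names and the data layout are hypothetical, not FantasyNational's export format or Carleton's code.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def r_squared_vs_baseline(rounds_by_golfer, n_recent, n_baseline=100):
    """R-squared between each golfer's n_recent-round average and his
    n_baseline-round average.

    rounds_by_golfer: dict of golfer -> list of per-round strokes gained,
    most recent round first (hypothetical layout).
    """
    recent, baseline = [], []
    for rounds in rounds_by_golfer.values():
        if len(rounds) < n_baseline:
            continue  # skip golfers without a full baseline sample
        recent.append(mean(rounds[:n_recent]))
        baseline.append(mean(rounds[:n_baseline]))
    # Squaring the correlation gives the R-squared of a one-variable fit.
    return pearson_r(recent, baseline) ** 2
```

Calling r_squared_vs_baseline(data, 12) on such a dictionary would give the 12-round figure for whichever strokes gained stat the lists hold. Because the short sample overlaps the baseline under this assumption, the value has to approach 100% as n_recent approaches 100.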

The Results

Here is the R-squared value (i.e. the share of the variation in one variable that another variable accounts for) between a given stat over a given timeframe and that stat over the 100-round sample for PGA Tour golfers.

The time frame during which the R-squared surpasses 49% is highlighted in green.

(An example: a golfer's past four rounds in strokes gained: total explain 37.5% of the variation in his strokes gained: total over a 100-round sample.)

R-squared With Past 100 Rounds

SG: X                    Past 4   Past 8   Past 12  Past 24  Past 36  Past 50  Past 75
SG: Total                37.5%    50.1%*   54.8%    69.6%    78.0%    84.5%    93.5%
SG: Tee to Green         39.0%    46.9%    53.4%*   69.5%    75.5%    85.7%    94.2%
SG: Off the Tee          21.4%    34.7%    49.8%*   67.6%    80.8%    89.6%    95.3%
SG: Approach             24.1%    35.4%    49.4%*   57.9%    69.5%    77.9%    92.0%
SG: Around the Green     17.1%    31.2%    31.8%    51.5%*   58.6%    74.6%    90.5%
SG: Putting              18.8%    37.8%    42.5%    51.9%*   64.3%    69.8%    90.8%

* Highlighted cell: the first sample size at which the R-squared surpasses 49%.

The stats barely crack a 49% R-squared at an 8-round sample for strokes gained: total; a 12-round sample for tee to green, off the tee, and approach; and a 24-round sample for around the green and putting.
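That threshold check is mechanical, so here is a short Python sketch of it using the numbers from the table. The dictionary layout and the function name are illustrative only.

```python
BUCKETS = [4, 8, 12, 24, 36, 50, 75]

# R-squared (%) with the 100-round sample, copied from the table.
R_SQUARED = {
    "SG: Total":            [37.5, 50.1, 54.8, 69.6, 78.0, 84.5, 93.5],
    "SG: Tee to Green":     [39.0, 46.9, 53.4, 69.5, 75.5, 85.7, 94.2],
    "SG: Off the Tee":      [21.4, 34.7, 49.8, 67.6, 80.8, 89.6, 95.3],
    "SG: Approach":         [24.1, 35.4, 49.4, 57.9, 69.5, 77.9, 92.0],
    "SG: Around the Green": [17.1, 31.2, 31.8, 51.5, 58.6, 74.6, 90.5],
    "SG: Putting":          [18.8, 37.8, 42.5, 51.9, 64.3, 69.8, 90.8],
}

THRESHOLD = 49.0  # the R-squared cutoff used throughout this article


def first_stable_bucket(values, buckets=BUCKETS, threshold=THRESHOLD):
    """Return the smallest tested sample size whose R-squared exceeds
    the threshold, or None if no tested bucket gets there."""
    for rounds, r2 in zip(buckets, values):
        if r2 > threshold:
            return rounds
    return None


for stat, values in R_SQUARED.items():
    print(stat, first_stable_bucket(values))
```

Run as written, this prints 8 for total, 12 for tee to green, off the tee, and approach, and 24 for around the green and putting.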

I know I'm not breaking this down into smaller intervals, but I want this to have some practicality, too. I don't think we'd start citing the past 11 rounds of strokes gained: approach and the last 19 rounds of putting even if that's what the math found most relevant.

What This Means

Strokes Gained: Total
Back-to-back strong finishes (i.e. good results in total strokes gained), in theory, are meaningful.

That'd be a full eight rounds of solid strokes gained data.

But this also suggests that one good finish (four rounds) doesn't tell us enough to trust. Kinda obvious, right? Well, just keep that in mind when you (I'm not excluding myself from this, of course) cite a golfer coming off of a top-10 last week. That doesn't really mean enough for us to trust. We should want at least two good finishes if not three before we consider any sort of overall recent form.

Strokes Gained: Tee to Green
It takes a full 12 rounds for us to crack the 49% R-squared value for strokes gained: tee to green (though it nearly gets there through just eight rounds).

So, if we're looking for some hot tee-to-green performances, even two events probably isn't enough -- but you could do way worse than citing an all-encompassing tee-to-green performance two weeks straight. That predictiveness jumps up to 69.5% after 24 rounds.

If you see a small-sample turnaround with tee-to-green data, it's meaningful fairly early.

Strokes Gained: Off the Tee
Off-the-tee play is the most predictive of the individual strokes gained stats.

It really starts to win out after a 24-round sample and then stays consistently ahead from there relative to the other strokes gained categories.

Basically, you can't fake good off-the-tee data for very long. That's a simple way to think about it.

Strokes Gained: Approach
Frequently cited as the stat in a given week, strokes gained: approach becomes pretty stable after a 12-round sample. The reason it's so important is that you can gain more strokes from approach than you can with strokes gained: off the tee or around the green in an event.

Historically, field leaders in approach and putting gain between 8.5 and 9.0 strokes in an event. For off the tee and around the green, those numbers are 5.5 to 6.0.

Two key things to remember here.

First, you can start to trust hot iron play after around three events.

Second, if you see a great long-term iron player have a bad ball-striking week or two, don't overreact. It's not particularly meaningful and could be an opportunity for us to buy lower on a golfer's outright win odds.

Strokes Gained: Around the Green
While strokes gained: putting is often crowned as the king of volatility, strokes gained: around the green is actually more volatile.

Through 12 rounds, its R-squared with a golfer's 100-round sample is just 31.8%, compared to 42.5% for strokes gained: putting.

We need -- among our buckets -- 24 rounds to start to trust around-the-green play.

When scoping out a golfer's event log, I always look for unsustainable putting results.

This suggests that it's even more important not to overlook spike weeks in around-the-green play. Unless, of course, you're seeing gains over a full six-event sample.

Two or three strong finishes supplemented by elite strokes gained: around the green? Maybe time to fade a golfer.

Strokes Gained: Putting
I recently looked at our ability to predict strokes gained: putting and found that you can explain around 80% of a golfer's strokes gained: putting if you look only at his putting from within 15 feet. That lets us exclude lag putting from the data.

What do these numbers say?

They say that you need around six events of putting data to buy into a surge, so once again: be careful if you see high-end finishes aided too much by putting that don't have tee-to-green data to back them up.

But also, two or three cold putting weeks don't suggest a trend just yet!

Applying These Results

I already account for recency in my stats, giving more weight to recent rounds and less weight to older ones. Doing this lets me factor in recent, relevant shifts in ball-striking while reducing the importance of strokes gained from the short-game stats.
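As one illustration of what recency weighting can look like, here is a minimal Python sketch of an exponentially decayed average. The decay scheme, the function name, and the half-life default are all assumptions chosen for the example, not necessarily the weights behind the numbers in this article.

```python
def recency_weighted_average(rounds, half_life=12.0):
    """Exponentially decayed mean of per-round strokes gained.

    rounds: values ordered most recent first.
    half_life: number of rounds after which a round's weight halves
    (hypothetical default, not a recommended setting).
    """
    if not rounds:
        raise ValueError("need at least one round")
    # The most recent round gets weight 1.0; each older round gets less.
    weights = [0.5 ** (i / half_life) for i in range(len(rounds))]
    total = sum(w * x for w, x in zip(weights, rounds))
    return total / sum(weights)
```

One possible design choice, given the results above, is a shorter half-life for the ball-striking stats that stabilize quickly and a longer one for around the green and putting, so that a short hot streak with the wedges or putter moves the number less.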

But at a more basic level, this can help us when we study individual golfers or look at different ranges of stats, whether on FantasyNational or Data Golf.

Whichever event logs you examine when doing your weekly prep, be mindful of hot putters and wedges over samples of four events or fewer. It's not necessarily a sign of long-term success in those categories. That one isn't that shocking.

But on the flip side, one or two bad weeks of off-the-tee or approach data isn't anything to worry about if the golfer in question excels at those stats long-term. If anything, that can give us a chance to buy low on a golfer in the betting markets and daily fantasy contests.