Why We Always Pick a 12-Seed Over a 5-Seed

Is there a mathematical reason we always choose the 12-seed?

While we fill out our brackets, we continually hear: "You have to pick at least one 12-seed over a 5-seed." It has seemingly become the breakpoint where we expect the upset to happen; no one is overly surprised by a 12-over-5 victory.

Using data going back to 2000, I looked at estimated point differential in the first round based on each team's seed. For estimated point differential, I used nERD, a measure of overall team efficiency. nERD is an estimate of a team's score differential against a league-average team on a neutral court. For example, if Duke's nERD is 12.4 and Syracuse's nERD is 6.5, we would expect Duke to win by 5.9 points on a neutral court. The results are below.

Estimated point differential, plotted against seed, fits a cubic function extremely well. That means the differential is far more extreme at the tails and flattens out over the middle. Looking at the graph, the flattening appears to begin right around the 4-13 or 5-12 seed mark. The estimated differential in a 3-14 matchup is almost 2.5 times that in a 5-12 matchup.

But what does this mean in terms of win probability?

We can estimate win probability with a logistic regression on estimated point differential. The resulting win probability by seed matchup is roughly linear. This is due to diminishing returns on point differential, which compress the extreme tails of the cubic: being favored by 30 points is not much different from being favored by 20, yet being favored by 15 is a whole lot different from being favored by 5.
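
To make the diminishing-returns point concrete, here is a minimal Python sketch of a logistic curve that maps point differential to win probability. The function name win_probability and the scale value of 0.15 are assumptions chosen only for illustration; they are not the coefficients fitted in this analysis.

```python
import math

def win_probability(point_diff, scale=0.15):
    """Logistic curve from estimated point differential to win probability.

    `scale` is a hypothetical slope picked for illustration,
    not a coefficient fitted to the tournament data.
    """
    return 1.0 / (1.0 + math.exp(-scale * point_diff))

# Compare the same 10-point improvement at two spots on the curve.
gain_5_to_15 = win_probability(15) - win_probability(5)
gain_20_to_30 = win_probability(30) - win_probability(20)

for diff in (5, 15, 20, 30):
    print(f"favored by {diff:>2}: {win_probability(diff):.3f}")
print(f"gain from 5 to 15:  {gain_5_to_15:.3f}")
print(f"gain from 20 to 30: {gain_20_to_30:.3f}")
```

Whatever positive slope is used, the jump from 5 to 15 points is worth far more win probability than the jump from 20 to 30, which is why the steep tails of the point-differential curve matter less once they are converted to win probability.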

So, what should we see each year? The estimated win probability for a 5-seed is 74.0%. With four 5-12 games per tournament, that means we would expect about one 12-seed upset per year on average (4 x 26.0% = 1.04). If we treat each matchup as independent, we can use a simple binomial distribution to find the probability of any number of 12-seeds winning.
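
The binomial calculation can be sketched in a few lines of Python. This assumes four independent 5-12 games per year and the 74.0% figure above; the function name upset_distribution is just an illustrative choice.

```python
from math import comb

def upset_distribution(favorite_win_prob, games=4):
    """Binomial probabilities of 0..games upsets, assuming independent games."""
    upset_prob = 1.0 - favorite_win_prob
    return [
        comb(games, k) * upset_prob**k * favorite_win_prob**(games - k)
        for k in range(games + 1)
    ]

# 5-seeds are estimated to win 74.0% of the time.
dist_12 = upset_distribution(0.74)
expected_upsets = sum(k * p for k, p in enumerate(dist_12))

for k, p in enumerate(dist_12):
    print(f"{k} twelve-seed wins: {p:.1%}")
print(f"expected twelve-seed wins per year: {expected_upsets:.2f}")
```

Under these assumptions, there is roughly a 30% chance that no 12-seed wins and roughly a 42% chance that exactly one does, so at least one 12-seed advances in about 70% of years.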

4-seeds are expected to win 83.5% of the time, which means we expect to see fewer than one 13-seed win per year. The single most likely scenario (48.6% of the time) is that no 13-seed wins, although we are ever-so-slightly more likely to see at least one 13-seed upset than to see none at all.
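
The 4-13 figures can be checked with quick arithmetic. This is a short sketch that again assumes four independent games per year; the variable names are illustrative.

```python
four_seed_win_prob = 0.835  # estimated 4-seed win probability from the text
games = 4                   # one 4-13 game per region

# All four 4-seeds win, so no 13-seed advances.
p_no_upset = four_seed_win_prob ** games
# Complement: at least one 13-seed wins.
p_at_least_one = 1.0 - p_no_upset
# Expected number of 13-seed wins per year.
expected_13_wins = games * (1.0 - four_seed_win_prob)

print(f"no 13-seed wins:           {p_no_upset:.1%}")
print(f"at least one 13-seed wins: {p_at_least_one:.1%}")
print(f"expected 13-seed wins:     {expected_13_wins:.2f}")
```

The no-upset outcome comes out at about 48.6% against about 51.4% for at least one upset, with an expected 0.66 wins by 13-seeds per year.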