## Tuesday, March 24, 2020

#### Projection Hindsight Is 20/20 and It’s Totally Awesome

One of the things you have to get used to when you work with projections is being wrong. Like, All. Of. The. Time. While I’d like to believe that the projections are accurate and it’s just real life that mucked things up, that isn’t quite how they work. There are always events you didn’t see coming, assumptions you made erroneously, and just plain old irreducible error, all of which are going to thwart you.

On a basic level, you’re supposed to be wrong. Imagine a world in which you knew, for an exact fact, that every team was a coin flip to win every game. With this perfect knowledge, you’d still expect nearly a quarter of the league to win either 73 games or fewer, or 89 games or more, through nothing but luck. For the math-inclined, this is a hypergeometric distribution, not a binomial one; the coin flips are not independent because the win totals will still add up to 2,430 and one team’s win invariably is another team’s loss. Here’s a quick table for some of the win totals, showing the probability of a team winning exactly X games and how many of the teams you’d expect to have won up to X games:

....

As an example, you’d expect 3.4% of those coin flip teams to win exactly 74 games, with 15% of all teams winning up to 74 games.
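For anyone who wants to check those figures, here is a minimal sketch. It assumes each team's 162 games are independent fair coin flips, which is a plain binomial approximation and ignores the league-wide constraint mentioned above that every win is someone else's loss. The function names `pmf` and `cdf` are just illustrative helpers, not anything from ZiPS.

```python
from math import comb

GAMES = 162  # regular-season games per team


def pmf(wins, games=GAMES):
    """P(exactly `wins` wins) for a team that is a fair coin flip every game."""
    return comb(games, wins) / 2 ** games


def cdf(wins, games=GAMES):
    """P(`wins` or fewer wins) under the same coin-flip assumption."""
    return sum(comb(games, k) for k in range(wins + 1)) / 2 ** games


# Assumption: independent games, so this is binomial rather than the
# constrained league-wide distribution described in the article.
exactly_74 = pmf(74)
up_to_74 = cdf(74)
# 73 or fewer wins, or 89 or more wins
outside_74_to_88 = cdf(73) + (1 - cdf(88))

print(f"P(exactly 74 wins):      {exactly_74:.3f}")
print(f"P(74 wins or fewer):     {up_to_74:.3f}")
print(f"P(<=73 or >=89 wins):    {outside_74_to_88:.3f}")
```

Under that simplification the three numbers should land close to the 3.4%, 15%, and "nearly a quarter" quoted in the article; the exact constrained calculation would differ slightly.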

But we don’t have anywhere near perfect knowledge about how good a team will be. We’re not even in the same zip code as “near perfect”; we just hope to be on the right continent. As a result, our error bars are going to be significantly larger than even the rather erroneous results you still get with omniscient projections.

One of the best courses I ever took in college was a one-off course on Predicting The Future, so this sort of consideration is of deep interest to me.

1. The 15-Day DL Posted: March 24, 2020 at 08:27 AM (#5933011)
On the flip side, ZiPS overrated the Tigers, Padres, and Red Sox by an average of 14 wins in 2019, an error that is cut nearly in half once you know who actually took the field. I under-projected the plate appearances of Fernando Tatis Jr. and Luis Urías by about 40%, failed to anticipate players like Gordon Beckham getting so much playing time, and greatly overestimated the health of Boston’s non-Eduardo Rodriguez pitchers and Dustin Pedroia.
Son, nobody ever found success estimating any measure of health for Dustin Pedroia.
2. Ron J Posted: March 24, 2020 at 08:59 AM (#5933026)
I knew it was a Szym article just based on the headline.

And Dan, you might want to consider looking at predicted runs scored and runs allowed with perfect information about playing time. This, after all, is really what you're predicting. There's noise in translating runs scored/allowed into wins and losses.

EDIT: Come to that, there's noise in converting counting stats into runs as well, so it would be interesting to see where the noise is coming from.
3. Swedish Chef Posted: March 24, 2020 at 09:58 AM (#5933059)
Son, nobody ever found success estimating any measure of health for Dustin Pedroia.

4. Walt Davis Posted: March 24, 2020 at 06:12 PM (#5933196)
ZiPS is not a playing time projection system. :-)

