
Wednesday, February 12, 2003

How Important are In-Season Winning Percentages?

Rob takes a look at the predictive value of seasonal records.

Recently, Retrosheet made available the game logs for every NL and AL game played in the 20th century (1901-2002).  The logs provide summary information on each game, including the number of runs scored by each team.  A great deal of research can be conducted using this complete collection of game logs.  In this article, I want to scratch the surface of an issue that has been around for a very long time: how accurate are within-season win pcts as predictors of a team's end-of-season win pct?

 

For example, suppose a team starts out 7-3 in its first 10 games.  Do we expect it to go on and win 100 games that season?  What if the team starts out 35-15 (again a .700 win pct)?  Are we now more inclined to predict that it will win 100 games?  From a purely empirical perspective, the game logs allow us to look at each team's win pct evolution through every season in the 20th century.

 

You may remember that Bill James wrote about this issue in his 1985 Baseball Abstract, in the Detroit Tigers commentary, following the Tigers' blistering 35-5 start in 1984.  In my article here, I will present some data and some ideas following in James's footsteps.  However, I merely hope to rekindle a dialog on methods and findings, so feel free to chime in with any comments or ideas you may have.

.700 Teams after 40 Games

Let's start our journey by looking at the best teams ever through the first 40 games of a season.  As you probably know, the 1984 Tigers at 35-5 were the best ever (win pct of .875).  I have arbitrarily set a cutoff of a .700 win pct to qualify as an elite team in my study (28-12 or better after 40 games).  By my count, there have been 75 teams to get off to such a hot start.  The first was the 1902 Pirates (33-7) and the most recent were the 2002 Mariners (28-12) and the 2002 Red Sox (29-11).

 

For comparability purposes, in what follows I will exclude the 1981 Dodgers (29-11) and the 1994 Yankees (28-12) because of the seasons shortened by the strikes.  That leaves us with the 73 teams with the hottest 40-game starts in history.  How did they wind up?  Were they all great teams?  How many clunkers are in the bunch?  Are 40 games enough to give us a good read on these teams?

 

Yes, indeed, many of these were great teams and went on to post fabulous end-of-season records.  Among them were the 1902 Pirates (102-36, .739), the 1905 Giants (105-47, .691), the 1907 Cubs (107-44, .709), the 1909 Pirates (110-42, .724), the 1931 Athletics (107-45, .704), the 1932 Yankees (107-47, .695), the 1939 Yankees (106-45, .702), the 1970 Orioles (108-54, .667), the 1986 Mets (108-54, .667), the 1995 Indians (100-44, .694), the 1998 Yankees (114-48, .704), and the 2001 Mariners (116-46, .716).  So a lot of the game's all-time best teams got off to a hot start after 40 games.

 

But there were also some clunkers that got off to hot starts.  At the very least, several of these teams were far from world beaters.  Consider the 1907 Giants (started 28-12, then went 54-59, for an 82-71 record), the 1912 White Sox (28-12, 50-64; 78-76), the 1951 White Sox (29-11, 52-62; 81-73), the 1972 Mets (29-11, 54-62; 83-73), and the 2001 Twins (28-12, 57-65; 85-77).

 

The average end-of-season win pct of these 73 hot starters is .624.  Let's break that out a little bit:

 

N     W-L      Start WPct   ROS WPct   EOS WPct
29    28-12    .700         .576       .608
24    29-11    .725         .575       .613
20    30-10+   .780 (avg)   .621       .662

 

Not surprisingly, each of these collections performed worse after their hot starts.  This is a well-known phenomenon in stochastic processes, and it has taken many names in the baseball analysis lexicon.  I simply call it the Up-Down theory.  Teams that are up are apt to go down, and vice versa.  The same is true for hitters, pitchers, and everyday people.  You're never as good as you look when you win, and you're never as bad as you look when you lose.  The truth is almost always somewhere in the middle.

 

In fact, only one of these 75 teams (including the two strike-year teams) started off hot and then got hotter.  That was the Honus Wagner-led 1909 Pirates, who went 110-42.  They started off 28-12 (.700) and then proceeded to go 82-30 (.732) the rest of the season.

 

There are many ways to rationalize why the vast majority of teams that get off to a hot start cool off.  You can think of the .500 level as being a magnet of sorts.  The way I like to think about it is the following.  Each team has an underlying "true ability".  When we observe a very hot team, we can ask ourselves: which is more likely, that the team is playing to its actual ability, or that the team's true ability is less than its current record and it has been the recipient of good luck?  Since we intuitively believe that the distribution of true abilities of all the teams in the league forms something like a bell curve, this second possibility is always more likely.
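For readers who want to make that argument concrete, here is a minimal sketch in Python, assuming a normal (bell-curve) prior on true team ability centered at .500.  The prior standard deviation of .060 is an illustrative assumption, not something estimated from the game logs; the point is only that the posterior estimate for a 28-12 team lands well below the observed .700.

import math

# Sketch of the "true ability vs. luck" argument: combine a bell-curve prior
# on ability (mean .500, assumed sd .060) with the likelihood of the observed
# start, and look at where the posterior estimate lands.
def posterior_mean(wins, games, prior_mean=0.500, prior_sd=0.060):
    grid = [i / 1000 for i in range(200, 801)]            # candidate true win pcts
    weights = []
    for p in grid:
        prior = math.exp(-0.5 * ((p - prior_mean) / prior_sd) ** 2)
        likelihood = p ** wins * (1 - p) ** (games - wins)
        weights.append(prior * likelihood)
    total = sum(weights)
    return sum(p * w for p, w in zip(grid, weights)) / total

print(round(posterior_mean(28, 40), 3))                   # roughly .58: below .700, above .500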

.600 Teams after 40 Games

Let's dial down our definition of a hot start from a win pct of .700 (28-12) to .600 (24-16).  Of course, there is more "room" for a team that starts 24-16 to improve over the rest of the season compared to a 28-12 start.

 

There have been 101 teams that began a season 24-16 after 40 games, starting with the 1903 Pirates and most recently the 2001 Phillies and Cardinals.  A quick review of these 101 .600-starting teams shows more of a mixture of quality than among the .700 starters.  We do have very good teams such as the 1909 Cubs (104-49), the 1919 Reds (96-44), the 1954 Yankees (103-51), the 1968 Tigers (103-59), and the 1976 Reds (102-60).  But we also have such mediocrities as the 1924 Red Sox (68-87), the 1927 White Sox (70-83), the 1956 Pirates (66-88), the 1973 White Sox (77-84), the 1976 Rangers (76-86), the 1978 Athletics (69-93), and the 1986 Orioles (73-89).

 

On average, these 101 teams had a .550 win pct during the rest of the season, and an average end-of-season win pct of .562.  The worst end-of-season win pct was .426 and the best was .686.  So you can see that there is quite a large spread here.  For one thing, a .600 win pct is not sufficiently far away from .500; and for another, 40 games do not seem to be sufficient to allow us to make any definitive predictions.  In the next section, I'll keep to a .600 start, but I'll extend the start deeper into the season.

.600 Teams after N Games

Here are the results I found at 20-game intervals.  Hopefully, this table will be formatted properly.

 

Start G   Start W-L   N     Start WPct   ROS WPct   EOS WPct
 40       24-16       101   .600         .550       .562
 60       36-24        86   .600         .559       .575
 80       48-32        69   .600         .562       .581
100       60-40        49   .600         .566       .587
120       72-48        41   .600         .602       .601
140       84-56        37   .600         .608       .600

 

You can see that there are fewer .600 teams as you go deeper into the season.  I believe this confirms the "luck" explanation described above.  Also, the deeper you go into the season, the more confident we can be that the team actually is a .600-ish team. 

 

The next table presents the quartiles of the end-of-season (EOS) distribution of the teams that started out with a .600 win pct.

 

Start G   Lowest EOS   25% EOS   50% EOS   75% EOS   Highest EOS
 40       .426         .519      .568      .600      .686
 60       .478         .549      .574      .601      .669
 80       .487         .556      .578      .605      .649
100       .500         .566      .590      .610      .647
120       .529         .586      .605      .612      .649
140       .568         .593      .599      .605      .625

 

As you can see, there is a "funnel effect" at play here.  The deeper into a season we observe a .600 win pct, the more confident we can be that the team really is (close to) a .600 team.  Plus, the spread in the end-of-season win pcts of these teams decreases as the start is extended.  Part of this is simply because the weight on the starting win pct automatically increases as the start lengthens, but another part is because the rest-of-season win pcts themselves move closer to the starting win pct.


Fiddling with Formulas

Okay, where does this leave us?  I have presented data showing that there is a systematic relationship between a team's starting win pct and its finishing win pct.  The .500 level can be considered a magnet that attempts to pull every team's win pct towards it.  The strength of the .500 attraction depends upon two factors: how far the starting win pct is from .500 (e.g., we saw that the attraction is weaker for a .700 team than for a .600 team), and how deep into the season the start extends.

 

I have attempted to develop a formula that predicts a team?s end-of-season win pct depending upon its starting win pct and how many games the start consists of.  However, my preliminary attempts have not been entirely satisfactory.

 

My formulas have been of the form EOS = (A * START) + (1-A)*.500, where A depends upon the number of games in the start.  I have tried both linear and non-linear functions for the A relationship. 
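As a minimal sketch of that functional form (the weight function below is a hypothetical placeholder; the article leaves the actual shape of A open):

# EOS = A*START + (1-A)*.500, with A depending on games played.
def predicted_eos(start_wpct, games_played, weight_fn):
    a = weight_fn(games_played)
    return a * start_wpct + (1 - a) * 0.500

# Hypothetical linear weight: trust the start more as games accumulate.
def linear_weight(games, season_length=162):
    return min(games / season_length, 1.0)

print(round(predicted_eos(0.700, 40, linear_weight), 3))  # ~.549 under this toy weight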

 

I have also attempted to develop a formula that tells us how confident we should be in our EOS prediction (i.e., the spread in the second table above), without total success.  I am thinking that there is probably some systematic formula that links the two ideas based purely on statistical theory, but I have not yet worked this out.
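One purely statistical starting point for that spread (a sketch from binomial sampling theory, not a formula from the article): if the remaining games were independent coin flips at probability p, the uncertainty in the end-of-season win pct would shrink as fewer games remain, which is the funnel effect in miniature.

import math

# Standard error of the EOS win pct contributed by the games still to be
# played, assuming each remaining game is won with fixed probability p.
def eos_standard_error(p, games_played, season_length=162):
    remaining = season_length - games_played
    ros_se = math.sqrt(p * (1 - p) / remaining)     # SE of the rest-of-season win pct
    return ros_se * remaining / season_length       # scaled to its share of the season

for g in (40, 80, 120):
    print(g, round(eos_standard_error(0.600, g), 3))   # the spread shrinks as the season progresses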

Next Steps

I have thus far done a piecemeal investigation into the relationship between a team's starting win pct and its finishing win pct.  I picked a .600 win pct arbitrarily.  Hopefully others can do a more systematic investigation, perhaps over all 20th century teams.

 

And besides just looking at more data, more thinking can be done on the derivation of useful formulas.  These formulas can have one of two sources, one purely empirical and the other based upon statistical derivations.  Ideally, the best formulas will combine the statistical underpinnings with the empirical observations. 

 

Comments are encouraged.

 

Rob Wood Posted: February 12, 2003 at 06:00 AM | 16 comment(s)

Reader Comments and Retorts


   1. Doug Posted: February 13, 2003 at 02:30 AM (#608764)
Interesting stuff.

You've set yourself an ambitious objective to come up with a predictive formula. If you can make it work, I suspect you could sell this to some folks in Vegas.

Maybe you'll think this is cheating, but I suspect your formula may be aided by working Pythagorean WPCT into it, as a counterbalance to actual WPCT. This should help you get a better feel for the luck vs. true ability angle.

Some other things to throw in the mix might be:
- WPCT in most recent N games (possible fine tuning mechanism to account for your up/down rule)
- WPCT of opponents (current WPCT of each opponent already played or yet to play, weighted by games played or games remaining against that opponent - provides further refined view of results to date, OR predictor of future schedule challenges/opportunities)
- Home/Road mix to date or remaining (another take on the schedule angle)

Obviously, it would be a major labor of love to try and program the latter suggestions, but the Pythagorean bit should be workable, and probably a good enhancer to the predictive capabilities.
   2. jimd Posted: February 13, 2003 at 02:30 AM (#608789)
Interesting stuff (yes, there is an echo in here).

Here's a formula I've used before when trying to convert final standings from one schedule length to another. It's based on calculating a binomial (p=.5) z-score in the shorter schedule and converting that to the longer schedule. (Hope I'm using the right terminology here, or at least you can figure out what I'm trying to say; it's been 30 years since stat class and I don't use it professionally.)

AdjustedDelta = (ActualWins - ActualLosses) * sqrt(ProjectedGames / ActualGames)

ProjectedWins = (ProjectedGames + AdjustedDelta)/2

Using this formula, the 29-11 start translates to 99.11 ProjectedWins on a 162 game schedule (.612 WPct).
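A quick sketch that transcribes the two lines above and reproduces the 29-11 example:

import math

def projected_wins(actual_wins, actual_losses, projected_games=162):
    actual_games = actual_wins + actual_losses
    adjusted_delta = (actual_wins - actual_losses) * math.sqrt(projected_games / actual_games)
    return (projected_games + adjusted_delta) / 2

wins = projected_wins(29, 11)
print(round(wins, 2), round(wins / 162, 3))   # 99.11 projected wins, .612 WPct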

I tried applying this to your .600 team data, and it seems to track the EOS results pretty well. But those darn teams seem to have a consistent ability to outperform this projection by one-to-two wins in each start-category. I'll leave explaining that discrepancy to the rest of you (because I don't have a clue).

It's just a thought; I hope it helps in some way.
   3. Ned Garvin: Male Prostitute Posted: February 13, 2003 at 02:30 AM (#608792)
I like the pyth. WP idea - maybe you could do the exact same analysis with PythWP and see if it gives you better results for what the rest of their season looks like. Does the pyth WP for the first N games match the actual record for the first N games? Does the pythWP for the first N games match the pythWP for the rest of the season? How about the actual WP for the rest of the season?
   4. Rob Wood Posted: February 13, 2003 at 02:31 AM (#608802)
Thanks for the good comments. Here I am not all that interested in the uber-stat to predict end-of-season records. I am only interested in extrapolating a team's current won-loss record to their final won-loss record, and how that prediction improves over the course of the season (the "funnel" effect).

I believe TangoTiger is preparing a prediction system for use during the upcoming season that will include many other factors such as Pythagorean information, strength of upcoming schedule, etc.

As people have pointed out, there are several reasons why teams fall back towards the .500 mark besides purely statistical factors. The schedule is the main one, strength of opposition as well as home vs road games need to even out over a full season. Possibly also riding your best players (pitchers) too much during the good times.

By the way, my preliminary estimate of a team's end-of-season win pct is given as:

EOS_Pred = [((1-f)^2.5)*0.500] + [(1-((1-f)^2.5))*WPct]

where f is the fraction of the season that has already been played, and WPct is the team's current win pct.
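In code form (a direct transcription of the formula above; the 28-12 example is mine):

def eos_pred(wpct, games_played, season_length=162):
    f = games_played / season_length                 # fraction of the season played
    shrink = (1 - f) ** 2.5
    return shrink * 0.500 + (1 - shrink) * wpct

print(round(eos_pred(0.700, 40), 3))                 # ~.602 for a 28-12 start, vs. the .608 observed above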

There is probably a straightforward relationship between f and the "precision" of the prediction that people can work out from pure statistical sampling theory.

Thanks again.
   5. Marc Stone Posted: February 14, 2003 at 02:31 AM (#608810)
You said you have the Retrosheet logs through 2002. I can only find them listed through 2001 on the web site. Is there some trick to getting 2002?
   6. Marc Stone Posted: February 14, 2003 at 02:31 AM (#608812)
About the article:

If a team wins 28 out of its first 40 games, how likely is it that the team is
1) A true .750 team playing slightly below its true level?
2) A true .700 team playing at its expected level?
3) A true .600 team playing above its true level?
4) A true .500 (or worse team) playing well above its true level?

A big part of the answer depends upon how common it is to have teams whose "true" (at least over the course of a season) winning percentage is .700 or .600 or whatever.

A true .500 team will win 25 times or more in a 40 game period about 15% of the time. A true .625 team will win 25 or more games about half the time. If .500 teams are ten times more common than .625 teams, a team with a hot start is three times more likely to play .500 ball over the rest of the season than .625 ball. If .500 teams are only twice as common, then a .625 pace would be more common.

In an era where there is a great disparity in quality among teams, it is more likely that a team that won 70% of its first 40 games would be a true .700 team. In an era with great parity, a true .700 team would be unlikely so a hot start
   7. Rob Wood Posted: February 14, 2003 at 02:31 AM (#608815)
Marc, I accessed the Retrosheet game logs through the new "Games" feature of the Baseball-Reference website. The 2002 season information is available there.
   8. Rob Wood Posted: February 15, 2003 at 02:31 AM (#608831)
As a quick answer to Bernard, I think we are looking for a robust formula (or approach) that applies to any team after any number of games. Of course, the most interesting teams are the ones that start out really hot so they have received the most attention. By the way, the pull of the .500 record (or call it the regression to the mean) is quite real and, statistically speaking, should be factored in when projecting a team's final win pct, no matter what the team's record.
   9. Rob Wood Posted: February 17, 2003 at 02:31 AM (#608843)
In response to DCA and others, in a Bayesian framework as you describe one starts with a prior belief on the distribution of team win pct "abilities", and then "updates" one's beliefs based upon new observations.

The standard approach in a binomial setting such as winning or losing games is to use a beta-distribution as the prior (with mean .500). With a beta, the updating formula is quite simple.

The best estimate after observing a team has gone W-L to start a season is simply:

Pred = [W + X] / [W + L + 2X] where the value of X reflects the "confidence" we place in the prior beliefs.
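In code, the update is just X pseudo-wins and X pseudo-losses added to the observed record (the choice X = 20 below is only an illustration of the confidence knob):

def beta_update(wins, losses, x):
    # X pseudo-wins and X pseudo-losses pull the estimate toward .500
    return (wins + x) / (wins + losses + 2 * x)

print(round(beta_update(28, 12, 20), 3))   # .600: the hot 28-12 start is shrunk toward .500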

The empirical evidence suggests that we can improve on this formula in the baseball setting. Possibly the reason is that going into each season we have different prior beliefs about the different teams. People expect the Yankees to win more games this year than Tampa Bay. The above formula ignores such information.

Maybe we can simply substitute last year's win pct for the mean of the prior distribution to improve the accuracy of the formula?
   10. Rob Wood Posted: February 17, 2003 at 02:31 AM (#608845)
Okay, the best updating formula using last year's record as the basis for the prior distribution is quite simple (and maybe obvious):

Pred = [W + WPREV] / [W + L + WPREV + LPREV]

where W is the team's current number of wins, L is the current losses, WPREV is last year's win total, and LPREV is last year's losses.

To give a couple examples, suppose a team starts 24-16 for a .600 win pct. If it went 96-64 last season (.600), then we'd predict that it really is a .600 team. If it went 80-80 last season (.500), then we'd predict that it really is a .520 team. And if it went 64-96 last season (.400), then we'd predict that it really is a .440 team.

Note that this is the best predictor for the team's win pct over the remainder of the season (ROS). So to get the end-of-season (EOS) win pct, you'd have to factor in how many games they have already played and how many games they have left to play.
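Here is a sketch of that two-step calculation (the rest-of-season formula is as posted above; the blend into an end-of-season number follows the note in the last paragraph and assumes a 162-game schedule):

def ros_pred(w, l, w_prev, l_prev):
    # last season's record serves as the prior: pseudo-wins and pseudo-losses
    return (w + w_prev) / (w + l + w_prev + l_prev)

def eos_pred(w, l, w_prev, l_prev, season_length=162):
    remaining = season_length - (w + l)
    return (w + ros_pred(w, l, w_prev, l_prev) * remaining) / season_length

# the 24-16 start with an 80-80 previous season, from the example above
print(round(ros_pred(24, 16, 80, 80), 3))   # .520 rest of season
print(round(eos_pred(24, 16, 80, 80), 3))   # about .540 end of season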

Finally, this approach essentially assumes that this year's team is identical (drawn from the same population) as last year's team. This is an extreme assumption, especially in this era of free agency and frequent player movement. This approach also ignores scheduling factors that have been raised above such as a team with a good record is likely to have played a lot of weak teams and/or a lot of home games. Over the course of the season the schedule, of course, balances. These two assumptions work in opposite directions so I'm not sure how to incorporate them, or if it is necessary to do so.
   11. Rob Wood Posted: February 18, 2003 at 02:32 AM (#608856)
Here is a report on some further calculations. I used the updating prediction formula I posted above with the previous season's win pct as the prior distribution, and looked at the same 456 .600 and .700 teams I presented above.

Compared to the "base" formula using a .500 as the prior for all teams, this new formula leads to significantly more accurate predictions. The root mean squared error (RMSE) of the EOS predictions using the new formula is .034 compared to .048 using the base formula. That is, using the team's previous season win pct as the original "prior" expectation is significantly better than using a league-wide .500 level for the prior expectation.

Per the suggestion above, I then tweaked the new formula by multiplying last season's "weight" by a constant A (originally set to 1.0). I then minimized the RMSE using Excel's Solver by modifying A. The RMSE does not change very much (.034 becomes .033) and the best multiplicative factor A is 0.83. This result is based entirely on the 456 teams I included in my original study, so I am not sure how much the 83% figure will carry over.
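For anyone who wants to replicate the fit without Excel, here is a minimal sketch of the same idea: scale last season's weight by a factor a and pick a by grid search to minimize RMSE. The three data rows are made-up placeholders; the optimization described above ran over the 456 actual team-seasons.

import math

def eos_pred(w, l, w_prev, l_prev, a, season_length=162):
    ros = (w + a * w_prev) / (w + l + a * (w_prev + l_prev))
    remaining = season_length - (w + l)
    return (w + ros * remaining) / season_length

sample = [  # (W, L, WPREV, LPREV, actual EOS win pct) -- illustrative placeholders only
    (28, 12, 95, 67, 0.630),
    (24, 16, 80, 82, 0.540),
    (29, 11, 70, 92, 0.520),
]

def rmse(a):
    errors = [(eos_pred(w, l, wp, lp, a) - actual) ** 2 for w, l, wp, lp, actual in sample]
    return math.sqrt(sum(errors) / len(errors))

best_a = min((a / 100 for a in range(50, 151)), key=rmse)
print(round(best_a, 2), round(rmse(best_a), 4))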
   12. Rob Wood Posted: February 18, 2003 at 02:32 AM (#608860)
I should have included this in the previous post. The formula using last season's win pct as the prior expectation outperforms the non-linear formula (with exponent 2.25) that does not use last season's win pct. The RMSE over the same teams for the non-linear (2.25 exponent) formula is .036 compared to .033 from the previous post.

It turns out that there are several teams for which knowing last season's win pct is critical information, such as the 1924 Red Sox, the 1938 Yankees, the 1954 Yankees, the 1956 Pirates, and the 1978 Athletics.

I'm not exactly sure where this leaves us, but I appreciate the discussion.
   13. Rob Wood Posted: February 19, 2003 at 02:32 AM (#608867)
Dennis, yes your math looks correct. And thank you for running all those cases. If you wouldn't mind, could you use a multiplier of 0.83 on WPREV and on WPREV+LPREV in the formula to see if the predictions get any better? And if it is feasible, it would really be interesting to optimize the multiplier; this would be fairly easy if you have it all in one Excel spreadsheet via the Solver tool.

Plus, it should be pointed out that my "cherry-picked" examples cover the entire 20th century, whereas Dennis' cases are all recent teams (I forget how far back you go). Since we know that there is less stability in teams' records from year to year nowadays, the formula that uses last year's record as the prior expectation is likely to perform less well in this sample.

The formula could not know that Randy Johnson, Luis Gonzalez, Steve Finley (any others?) had signed with Arizona prior to the 1999 season, thus catapulting the Diamondbacks from 65 wins to 100 wins in one season. Such occurrences were very rare back in the day.

So maybe we need a variable formula depending upon the degree of "parity" (may not be the proper term) exhibited by that era's teams?
   14. Rob Wood Posted: February 19, 2003 at 02:32 AM (#608877)
Again, many thanks to Dennis for doing all that work. I really didn't realize that the linkage between consecutive season win pcts was so weak in the modern game.

By the way, in case it is not obvious, Dennis' linkage formula can be written as Next Year Win Pct Pred = .500 + 0.40*(Last Year - .500).

If anybody has the data handy, I wonder what that relationship looked like pre-1976, say, before free agency. I would bet that the 0.40 is more like 0.75. Anybody?
   15. Michael Posted: February 22, 2003 at 02:32 AM (#608893)
Dennis...could you send me that file of sequential W/L records in Excel? I'd like to mess around a bit with this problem and that would help me a lot. This is great stuff by the way.

My email address is mlm1968@yahoo.com. Thanks
   16. Kevin Harlow Posted: February 23, 2003 at 02:32 AM (#608897)
I came up with the following cubic equation for rest-of-season winning percentage:

P = [.5*(P*(1-P)/G + 0.004/G) + Y*0.0028] / [P*(1-P)/G + 0.004/G + 0.0028]

where

P=ROS winning percentage
G=games played
Y=actual winning percentage in G games played

The derivation is on my website.
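Since the equation is implicit in P (P appears on both sides), here is a small fixed-point iteration that solves it numerically; the 28-12 example is mine, not Kevin's.

def ros_wpct(y, g, iterations=100):
    # y = actual win pct in g games played; iterate the right-hand side until P settles
    p = y
    for _ in range(iterations):
        var = p * (1 - p) / g + 0.004 / g
        p = (0.5 * var + y * 0.0028) / (var + 0.0028)
    return p

print(round(ros_wpct(0.700, 40), 3))   # roughly .56 rest-of-season for a 28-12 start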

I also thought I'd compare my result to Dennis Boznango's, so I computed an estimate of EOS win% from my P estimate and actual winning percentage. Boznango's and my estimates have an R^2=0.998. The standard error is about 0.0044, which would usually lead to a difference of less than 1 win.
