Primate Studies: Where BTF's Members Investigate the Grand Old Game
Wednesday, February 12, 2003
How Important are In-Season Winning Percentages?
Rob takes a look at the predictive value of seasonal records.
Recently Retrosheet has made available the Game Logs for every game played in the 20th century (1901-2002) in the NL and AL. The logs provide summary information on each game, including the number of runs scored by each team. A great deal of research can be conducted using this complete collection of game logs. In this article, I want to scratch the surface of an issue that has been around for a very long time. Namely, how accurate are within-season win pcts as a predictor of a team's end-of-season win pct?
For example, suppose a team starts out 7-3 in its first 10 games. Do we expect it to go on and win 100 games that season? What if the team starts out 35-15 (again a .700 win pct), are we now more inclined to predict that it will win 100 games? From a purely empirical perspective, the game logs allow us to look at each team's win pct evolution through every season in the 20th century.
You may remember that Bill James wrote about this issue in his 1985 Baseball Abstract in the Detroit Tigers commentary following the Tigers' blistering 35-5 start in 1984. In my article here, I will present some data and some ideas following in James' footsteps. However, I merely hope to rekindle a dialog on methods and findings, so feel free to chime in with any comments or ideas you may have.
.700 Teams after 40 Games
Let's start our journey by looking at the best teams ever after 40 games into the season. As you probably know, the 1984 Tigers at 35-5 were the best ever (win pct of .875). I have arbitrarily set a cutoff of a .700 win pct to qualify for the elite teams in my study (28-12 or better after 40 games). By my count, there have been 75 teams to get off to such a hot start. The first was the 1902 Pirates (33-7) and the most recent were the 2002 Mariners (28-12) and the 2002 Red Sox (29-11).
For comparability purposes, in what follows I will exclude the 1981 Dodgers (29-11) and the 1994 Yankees (28-12) because of the shortened seasons due to the strikes. So that leaves us with the 73 teams with the hottest 40-game starts in history. How do they wind up? Are all these great teams? How many clunkers are in the bunch? Are 40 games enough to allow us to get a good read on these teams?
Yes, indeed, many of these were great teams and went on to post fabulous end-of-season records. Among them were the 1902 Pirates (102-36, .739), the 1905 Giants (105-47, .691), the 1907 Cubs (107-44, .709), the 1909 Pirates (110-42, .724), the 1931 Athletics (107-45, .704), the 1932 Yankees (107-47, .695), the 1939 Yankees (106-45, .702), the 1970 Orioles (108-54, .667), the 1986 Mets (108-54, .667), the 1995 Indians (100-44, .694), the 1998 Yankees (114-48, .704), and the 2001 Mariners (116-46, .716). So a lot of the game's all-time best teams got off to a hot start after 40 games.
But there were also some clunkers who got off to hot starts. At least, several of these teams are far less than world beaters. Consider the 1907 Giants (started 28-12, then went 54-59, for an 82-71 record), the 1912 White Sox (28-12, 50-64; 78-76), the 1951 White Sox (29-11, 52-62; 81-73), the 1972 Mets (29-11, 54-62; 83-73), and the 2001 Twins (28-12, 57-65; 85-77).
The average end-of-season win pct of these 73 hot starters is .624. Let's break that out a little bit:
Not surprisingly, each of these collections performed worse after their hot starts. This is a well-known phenomenon in stochastic processes, and has taken many names in the baseball analysis lexicon. I simply call it the Up-Down theory. Teams that are up are apt to go down, and vice versa. The same is true for hitters, pitchers, and everyday people. You're never as good as you look when you win and you're never as bad as you look when you lose. The truth is almost always somewhere in the middle.
In fact, only one of these 75 teams (including the two strike-year teams) started off hot and then got hotter: the Honus Wagner-led 1909 Pirates, who went 110-42. They started off 28-12 (.700) and then proceeded to go 82-30 (.732) the rest of the season.
There are many ways to rationalize why the vast majority of teams that get off to a hot start cool off. You can think of the .500 level as being a magnet of sorts. The way I like to think about it is the following. Each team has an underlying "true ability". When we observe a very hot team, we can ask ourselves the question: which is more likely, that the team is playing to its actual ability, or that the team's true ability is less than its current record but it has been the recipient of good luck? Since we intuitively believe that the distribution of true abilities of all the teams in the league forms something like a bell curve, this second possibility is always more likely.
.600 Teams after 40 Games
Let's dial down our definition of hot start from a win pct of .700 (28-12) to .600 (24-16). Of course there is more "room" for a team that starts 24-16 to improve over the rest of the season compared to a 28-12 start.
There have been 101 teams that began a season 24-16 after 40 games, starting with the 1903 Pirates and most recently the 2001 Phillies and Cardinals. A quick review of these 101 .600 starting teams shows that there is more of a mixture of quality than the .700 starters. We do have very good teams such as the 1909 Cubs (104-49), the 1919 Reds (96-44), the 1954 Yankees (103-51), the 1968 Tigers (103-59), and the 1976 Reds (102-60). But we also have such mediocrities as the 1924 Red Sox (68-87), the 1927 White Sox (70-83), the 1956 Pirates (66-88), the 1973 White Sox (77-84), the 1976 Rangers (76-86), the 1978 Athletics (69-93), and the 1986 Orioles (73-89).
On average, these 101 teams had a .550 win pct during the rest of the season, and an average end-of-season win pct of .562. The worst end-of-season win pct was .426 and the best was .686. So you can see that there is quite a large spread here. For one thing, a .600 win pct is not sufficiently far away from .500; and for another, 40 games does not seem to be sufficient to allow us to make any definitive predictions. In the next section, I'll keep to a .600 start, but I'll increase the number of games into the season the start extends.
.600 Teams after N Games
Here are the results I found at 20-game intervals. Hopefully, this table will be formatted properly.
You can see that there are fewer .600 teams as you go deeper into the season. I believe this confirms the "luck" explanation described above. Also, the deeper you go into the season, the more confident we can be that the team actually is a .600ish team.
The next table presents the quartiles of the end-of-season (EOS) distribution of the teams that started out with a .600 win pct.
As you can see, there is a "funnel effect" at play here. The deeper into a season we observe a .600 win pct, the more confident we can be that the team really is (close to) a .600 team. Plus, the spread in the end-of-season win pcts of these teams decreases as the start is extended. Part of this is simply because the "weight" on the start win pct automatically increases, but another part is because the rest-of-season win pcts are closer to the start win pct as well.
Fiddling with Formulas
Okay, where does this leave us? I have presented data that shows that there is a systematic relationship between a team's starting win pct and its finishing win pct. The .500 level can be considered a magnet that attempts to pull every team's win pct towards it. The strength of the .500 attraction depends upon two factors: how far away from .500 is the starting win pct (e.g., we saw that the attraction is weaker for a .700 team than for a .600 team); and how deep into the season the start extends.
I have attempted to develop a formula that predicts a team's end-of-season win pct depending upon its starting win pct and how many games the start consists of. However, my preliminary attempts have not been entirely satisfactory.
My formulas have been of the form EOS = (A * START) + (1 - A) * .500, where A depends upon the number of games in the start. I have tried both linear and nonlinear functions for the A relationship.
I have also attempted to develop a formula that tells us how confident we should be in our EOS prediction (i.e., the spread in the second table above), without total success. I am thinking that there is probably some systematic formula that links the two ideas based purely on statistical theory, but I have not yet worked this out.
Next Steps
I have thus far done a piecemeal investigation into the relationship between a team's starting win pct and its finishing win pct. I picked a .600 win pct arbitrarily. Hopefully others can do a more systematic investigation, perhaps over all 20th century teams.
And besides just looking at more data, more thinking can be done on the derivation of useful formulas. These formulas can have one of two sources, one purely empirical and the other based upon statistical derivations. Ideally, the best formulas will combine the statistical underpinnings with the empirical observations.
Comments are encouraged.

Reader Comments and Retorts
1. Doug Posted: February 13, 2003 at 01:30 AM (#608764)
You've set yourself an ambitious objective to come up with a predictive formula. If you can make it work, I suspect you could sell this to some folks in Vegas.
Maybe you'll think this is cheating, but I suspect your formula may be aided by working Pythagorean WPCT into it, as a counterbalance to actual WPCT. This should help you get a better feel for the luck vs. true ability angle.
Some other things to throw in the mix might be:
- WPCT in most recent N games (possible fine-tuning mechanism to account for your up/down rule)
- WPCT of opponents (current WPCT of each opponent already played or yet to play, weighted by games played or games remaining against that opponent; provides a further refined view of results to date, OR a predictor of future schedule challenges/opportunities)
- Home/Road mix to date or remaining (another take on the schedule angle)
Obviously, it would be a major labor of love to try and program the latter suggestions, but the Pythagorean bit should be workable, and probably a good enhancer to the predictive capabilities.
Here's a formula I've used before when trying to convert final standings from one schedule length to another. It's based on calculating a binomial (p=.5) z-score in the shorter schedule and converting that to the longer schedule. (Hope I'm using the right terminology here, or at least you can figure out what I'm trying to say; it's been 30 years since stat class and I don't use it professionally.)
AdjustedDelta = (ActualWins - ActualLosses) * sqrt(ProjectedGames / ActualGames)
ProjectedWins = (ProjectedGames + AdjustedDelta)/2
Using this formula, the 29-11 start translates to 99.11 ProjectedWins on a 162-game schedule (.612 WPct).
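A minimal Python sketch of the two-line conversion above, taking AdjustedDelta as wins minus losses scaled by the square root of the schedule ratio. The function and variable names are illustrative choices, not part of the original comment.

```python
from math import sqrt


def projected_wins(wins, losses, projected_games):
    """Convert a W-L record to a win total over a different schedule length.

    Scales the win-loss differential by sqrt(projected / actual games),
    following the formula quoted in the comment.
    """
    actual_games = wins + losses
    adjusted_delta = (wins - losses) * sqrt(projected_games / actual_games)
    return (projected_games + adjusted_delta) / 2


wins_162 = projected_wins(29, 11, 162)
print(f"{wins_162:.2f} wins, {wins_162 / 162:.3f} WPct")  # 99.11 wins, 0.612 WPct
```

Note that this projects the full-season total directly from the differential; it does not separately add the games already won to a rest-of-season estimate.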
I tried applying this to your .600 team data, and it seems to track the EOS results pretty well. But those darn teams seem to have a consistent ability to outperform this projection by one to two wins in each start category. I'll leave explaining that discrepancy to the rest of you (because I don't have a clue).
It's just a thought; I hope it helps in some way.
I believe TangoTiger is preparing a prediction system for use during the upcoming season that will include many other factors such as Pythagorean information, strength of upcoming schedule, etc.
As people have pointed out, there are several reasons why teams fall back towards the .500 mark besides purely statistical factors. The schedule is the main one: strength of opposition as well as home vs. road games need to even out over a full season. Another possibility is riding your best players (pitchers) too much during the good times.
By the way, my preliminary estimate of a team's end-of-season win pct is given as:
EOS_Pred = [((1 - f)^2.5) * 0.500] + [(1 - ((1 - f)^2.5)) * WPct]
where f is the fraction of the season that has already been played, and WPct is the team's current win pct.
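A short Python sketch of this estimate, with the weight on .500 taken as (1 - f) raised to the 2.5 power. The 162-game default and the function name are assumptions added for illustration.

```python
def eos_pred(wpct, games_played, season_games=162):
    """Shrink the current win pct toward .500 by a weight of (1 - f)^2.5.

    f is the fraction of the season already played. season_games=162 is
    an assumed default; pass 154 for older seasons.
    """
    f = games_played / season_games
    weight_on_500 = (1 - f) ** 2.5
    return weight_on_500 * 0.500 + (1 - weight_on_500) * wpct


# A 24-16 (.600) team after 40 games of a 162-game season
print(round(eos_pred(0.600, 40), 3))
```

At f = 0 the prediction is exactly .500, and at f = 1 it is the team's actual record, so the weight moves smoothly from the prior to the observed win pct as the season goes on.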
There is probably a straightforward relationship between f and the "precision" of the prediction that people can work out from pure statistical sampling theory.
Thanks again.
If a team wins 28 out of its first 40 games, how likely is it that the team is
1) A true .750 team playing slightly below its true level?
2) A true .700 team playing at its expected level?
3) A true .600 team playing above its true level?
4) A true .500 (or worse) team playing well above its true level?
A big part of the answer depends upon how common it is to have teams whose "true" (at least over the course of a season) winning percentage is .700 or .600 or whatever.
A true .500 team will win 25 times or more in a 40-game period about 15% of the time. A true .625 team will win 25 or more games about half the time. If .500 teams are ten times more common than .625 teams, a team with a hot start is three times more likely to play .500 ball over the rest of the season than .625 ball. If .500 teams are only twice as common, then a .625 pace would be more common.
In an era where there is a great disparity in quality among teams, it is more likely that a team that won 70% of its first 40 games would be a true .700 team. In an era with great parity, a true .700 team would be unlikely so a hot start
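The tail probabilities and odds in this comment can be checked with an exact binomial calculation. The sketch below is only illustrative (the function names are made up, and it compares just two candidate true abilities, as the comment does). Run this way, the chance of a true .500 team winning 25 or more of 40 comes out closer to 8% than the roughly 15% quoted, so the resulting odds shift somewhat, though the shape of the argument is the same.

```python
from math import comb


def tail_prob(n, k, p):
    """P(at least k wins in n games) for a team with true win probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def odds_a_vs_b(prior_ratio, n, k, p_a, p_b):
    """Odds that a team with >= k wins in n games is type A rather than type B.

    prior_ratio is how many times more common type-A teams are than type-B.
    """
    return prior_ratio * tail_prob(n, k, p_a) / tail_prob(n, k, p_b)


print(tail_prob(40, 25, 0.500))   # true .500 team winning 25+ of 40
print(tail_prob(40, 25, 0.625))   # true .625 team winning 25+ of 40
print(odds_a_vs_b(10, 40, 25, 0.500, 0.625))  # .500 teams ten times as common
print(odds_a_vs_b(2, 40, 25, 0.500, 0.625))   # .500 teams twice as common
```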
The standard approach in a binomial setting such as winning or losing games is to use a beta distribution as the prior (with mean .500). With a beta, the updating formula is quite simple.
The best estimate after observing a team has gone W-L to start a season is simply:
Pred = [W + X] / [W + L + 2X] where the value of X reflects the "confidence" we place in the prior beliefs.
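As a small Python sketch, this is the posterior mean of a Beta(X, X) prior after W wins and L losses. The value X = 40 used in the example is purely hypothetical; the thread does not give a fitted value.

```python
def beta_pred(wins, losses, x):
    """Posterior mean win pct under a symmetric Beta(x, x) prior (mean .500).

    Larger x means more confidence in the .500 prior, so the estimate
    moves less in response to the observed record.
    """
    return (wins + x) / (wins + losses + 2 * x)


# 24-16 start with a hypothetical prior strength of x = 40
print(round(beta_pred(24, 16, 40), 3))
```

With x = 0 the formula returns the raw win pct, and as x grows the estimate approaches .500, which is the "magnet" behavior described in the article.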
The empirical evidence suggests that we can improve on this formula in the baseball setting. Possibly the reason is that going into each season we have different prior beliefs about the different teams. People expect the Yankees to win more games this year than Tampa Bay. The above formula ignores such information.
Maybe we can simply substitute last year's win pct for the mean of the prior distribution to improve the accuracy of the formula?
Pred = [W + WPREV] / [W + L + WPREV + LPREV]
where W is the team's current number of wins, L is the current losses, WPREV is last year's win total, and LPREV is last year's losses.
To give a couple examples, suppose a team starts 24-16 for a .600 win pct. If it went 96-64 last season (.600), then we'd predict that it really is a .600 team. If it went 80-80 last season (.500), then we'd predict that it really is a .520 team. And if it went 64-96 last season (.400), then we'd predict that it really is a .440 team.
Note that this is the best predictor for the team's win pct over the remainder of the season (ROS). So to get the endofseason (EOS) win pct, you'd have to factor in how many games they have already played and how many games they have left to play.
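The three examples, and the conversion from a rest-of-season (ROS) estimate to an end-of-season (EOS) win pct, can be sketched in Python as follows. The helper names and the 162-game default are assumptions added for illustration.

```python
def ros_pred(w, l, w_prev, l_prev):
    """Rest-of-season win pct using last season's record as the prior."""
    return (w + w_prev) / (w + l + w_prev + l_prev)


def eos_from_ros(w, l, ros, season_games=162):
    """Combine wins already banked with expected wins over the remaining games."""
    remaining = season_games - (w + l)
    return (w + ros * remaining) / season_games


for w_prev, l_prev in [(96, 64), (80, 80), (64, 96)]:
    ros = ros_pred(24, 16, w_prev, l_prev)
    eos = eos_from_ros(24, 16, ros)
    print(f"last year {w_prev}-{l_prev}: ROS {ros:.3f}, EOS {eos:.3f}")
```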
Finally, this approach essentially assumes that this year's team is identical to (drawn from the same population as) last year's team. This is an extreme assumption, especially in this era of free agency and frequent player movement. This approach also ignores scheduling factors that have been raised above, such as the fact that a team with a good record is likely to have played a lot of weak teams and/or a lot of home games. Over the course of the season the schedule, of course, balances. These two assumptions work in opposite directions so I'm not sure how to incorporate them, or if it is necessary to do so.
Compared to the "base" formula using a .500 as the prior for all teams, this new formula leads to significantly more accurate predictions. The root mean squared error (RMSE) of the EOS predictions using the new formula is .034 compared to .048 using the base formula. That is, using the team's previous season win pct as the original "prior" expectation is significantly better than using a league-wide .500 level for the prior expectation.
Per the suggestion above, I then tweaked the new formula by multiplying last season's "weight" by a constant A (originally set to 1.0). I then minimized the RMSE using Excel's Solver by modifying A. The RMSE does not change very much (.034 becomes .033) and the best multiplicative factor A is 0.83. This result is based entirely on the 456 teams I included in my original study, so I am not sure how much the 83% figure will carry over.
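A rough Python sketch of the weighted variant and the error measure. How the constant A enters the formula is an assumption here (it scales both last season's wins and last season's games); the comment does not spell this out, and the function names are illustrative.

```python
from math import sqrt


def weighted_ros_pred(w, l, w_prev, l_prev, a=1.0):
    """Previous-season prior with its weight scaled by a (assumed form).

    a = 1.0 reproduces the unweighted formula; a = 0.83 is the fitted
    value reported in the comment.
    """
    return (w + a * w_prev) / (w + l + a * (w_prev + l_prev))


def rmse(predictions, actuals):
    """Root mean squared error between predicted and actual win pcts."""
    squared = [(p - y) ** 2 for p, y in zip(predictions, actuals)]
    return sqrt(sum(squared) / len(squared))


# 24-16 start after an 80-80 season, unweighted vs. weighted prior
print(round(weighted_ros_pred(24, 16, 80, 80), 3))
print(round(weighted_ros_pred(24, 16, 80, 80, a=0.83), 3))
```

Fitting A then amounts to choosing the value that minimizes `rmse` over the sample of teams, which is what Solver was used for in the comment.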
It turns out that there are several teams for which knowing last season's win pct is critical information, such as the 1924 Red Sox, the 1938 Yankees, the 1954 Yankees, the 1956 Pirates, and the 1978 Athletics.
I'm not exactly sure where this leaves us, but I appreciate the discussion.
Plus, it should be pointed out that my "cherry-picked" examples cover the entire 20th century, whereas Dennis' cases are all recent teams (I forget how far back you go). As we know that there is less stability in teams' records from year to year nowadays, the formula that uses last year's record as the prior expectation is likely to perform less well in this sample.
The formula could not know that Randy Johnson, Luis Gonzalez, Steve Finley (any others?) had signed with Arizona prior to the 1999 season, thus catapulting the Diamondbacks from 65 wins to 100 wins in one season. Such occurrences were very rare back in the day.
So maybe we need a variable formula depending upon the degree of "parity" (may not be the proper term) exhibited by that era's teams?
By the way, in case it is not obvious, Dennis' linkage formula can be written as Next Year Win Pct Pred = .500 + 0.40*(Last Year  .500).
If anybody has the data handy, I wonder what that relationship looked like pre-1976, say, before free agency. I would bet that the 0.40 is more like 0.75. Anybody?
help me a lot. This is great stuff by the way.
My email address is mlm1968@yahoo.com. Thanks
P = [.5*(P*(1 - P)/G + 0.004/G) + Y*0.0028] / [P*(1 - P)/G + 0.004/G + 0.0028]
where
P=ROS winning percentage
G=games played
Y=actual winning percentage in G games played
The derivation is on my website.
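Because P appears on both sides of this formula, one way to evaluate it is by simple fixed-point iteration, starting from the observed win pct. This is only a sketch under that assumption; the derivation referred to above may solve it differently, and the function name, tolerance, and iteration cap are all illustrative.

```python
def ros_estimate(y, g, true_var=0.0028, extra=0.004, tol=1e-9, max_iter=100):
    """Iterate the formula for P until it stops changing.

    y is the actual win pct over g games. The constants 0.0028 and 0.004
    are taken from the formula as quoted in the comment.
    """
    p = y  # start the iteration at the observed win pct
    for _ in range(max_iter):
        noise = p * (1 - p) / g + extra / g
        new_p = (0.5 * noise + y * true_var) / (noise + true_var)
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p


# A .600 team after 40 games
print(round(ros_estimate(0.600, 40), 3))
```

The structure is a weighted average of .500 and the observed win pct, with weights set by the sampling variance of the record and the assumed spread of true abilities, so more games played means less pull toward .500.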
I also thought I'd compare my result to Dennis Boznango's, so I computed an estimate of EOS win% from my P estimate and actual winning percentage. Boznango's and my estimates have an R^2=0.998. The standard error is about 0.0044, which would usually lead to a difference of less than 1 win.