— Where BTF's Members Investigate the Grand Old Game
Tuesday, September 03, 2002
Strength of Schedule
An interesting look at park effects.
The straightforward calculation of park effects must be done with enormous caution. If it is done with partial or even single-season data, the sample size may be too small; larger samples may obscure genuine changes from season to season. The calculation is distorted further if the schedule is unbalanced; runs scored in a particular park may have more to do with the strengths and weaknesses of the teams playing there than with the intrinsic characteristics of the park. This is particularly likely if teams have not played equal numbers of home and away games against the same opponents. The effect of an unbalanced schedule will diminish somewhat as the season progresses, but it will not disappear, because the schedules themselves are highly unbalanced.
Distortion of park effects is only one consequence of the unbalanced schedule. I do not believe there has been adequate consideration of how an unbalanced schedule affects analysis of the accomplishments of teams or players. If a team plays relatively more home games against good offensive teams, for example, the park factor will be falsely inflated. To make this point, I have developed a more sophisticated method of estimating park effects that takes into account strength and balance of schedule. The method turns out to have a number of additional benefits.
The number of runs a team scores in a game can be broken down into four elements: the team's offensive prowess, the pitching and defense of the opposition, the park factor, and whether or not a designated hitter is used. This can be written as a simple linear equation:
Runs scored = Offensive strength + Defensive (pitching and fielding) strength of opponent + Park factor + DH factor + random error
By writing the equation this way, I am suggesting that the park factor, as well as the other three elements, can be estimated using linear regression. Each game played provides two observations, one for each team, each consisting of runs scored, the identities of the park and of the offensive and defensive teams, and whether a DH was used. I estimated these factors for games played through August 23rd.
Before I present the results, I need to make a few statistical notes. Because the distribution of runs scored is skewed to the right (lognormal), calculations are made on a logarithmic transformation of runs scored (the transformations are reversed in the reported results). This provides better statistical estimates and two additional advantages. First, the effect of high scores is diminished. This is desirable because there is a lower limit to runs scored but no upper limit. Second, a logarithmic transformation treats park effects as multiplicative rather than as a fixed number of runs per game, making it consistent with current practice.
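The regression described above can be sketched roughly as follows. This is a minimal illustration, not the code behind the reported results: the function name `fit_run_model`, the use of log(runs + 1) as the shutout workaround, and the convention of centering each group of effects on its own mean are all assumptions made here. It fits ordinary least squares with NumPy's `numpy.linalg.lstsq` on indicator (dummy) variables for offensive team, defensive team and park, plus a DH indicator, and then reverses the log transformation so that each effect is a multiplier. One caveat: if DH use is completely determined by the home park, the DH column is collinear with the park indicators and some further constraint is needed to separate the two; the sketch does not address that.

```python
import numpy as np

def fit_run_model(off, dfn, park, dh, runs, n_teams, n_parks):
    """Fit log(runs + 1) on offense, defense, park and DH indicators.

    off, dfn, park are integer-coded arrays (one entry per team-game),
    dh is a 0/1 array, runs is runs scored by the offensive team.
    Returns multiplicative effects (log transformation reversed).
    """
    off, dfn, park = (np.asarray(a) for a in (off, dfn, park))
    n = len(runs)
    # Assumed workaround for shutouts: add 1 before taking logs.
    y = np.log(np.asarray(runs, dtype=float) + 1.0)

    # Columns: intercept | offense dummies | defense dummies | park dummies | DH
    X = np.zeros((n, 1 + 2 * n_teams + n_parks + 1))
    rows = np.arange(n)
    X[:, 0] = 1.0
    X[rows, 1 + off] = 1.0
    X[rows, 1 + n_teams + dfn] = 1.0
    X[rows, 1 + 2 * n_teams + park] = 1.0
    X[:, -1] = dh

    # The dummy columns are redundant with the intercept; lstsq still
    # returns a least-squares solution for the rank-deficient system.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    o = beta[1:1 + n_teams]
    d = beta[1 + n_teams:1 + 2 * n_teams]
    p = beta[1 + 2 * n_teams:1 + 2 * n_teams + n_parks]

    # Center each group so that 1.0 means "average" for that group,
    # then exponentiate to get multipliers.
    return {
        "offense": np.exp(o - o.mean()),
        "defense": np.exp(d - d.mean()),
        "park": np.exp(p - p.mean()),
        "dh": np.exp(beta[-1]),
    }
```

Under these assumptions, a park value of 1.10 would mean that about 10% more runs (on the runs + 1 scale) are expected there than in an average park, with teams and DH held fixed.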
Here then are the results followed by explanations:
Off r/g is the number of runs per game the team is expected to score in a neutral park against average pitching and defense with no DH.
ExpOff is the number of runs per game the team would be expected to score against the actual opponents in the actual parks but without a DH. Because of the logarithmic transformation mentioned above, this is not the same as the average number of runs scored per game (arithmetic mean). The number is essentially a geometric mean, using a workaround for shutouts (all numbers used in a geometric mean must be positive), and should also be close to the median.
Off Crxn is the Offensive Correction multiplier, the ratio of Off r/g to ExpOff. It is used to correct any unadjusted run-based offensive statistic (runs scored, RBI, runs created, Raw EqR, etc.) for teams or players for both park effects and the defensive strength of opposition.
Def r/g is the expected number of runs per game allowed in a neutral park against average offense with no DH.
ExpDef is the number of runs per game the team would be expected to allow against the actual opponents in the actual parks but without a DH.
Def Crxn is the Defensive Correction multiplier, the ratio of Def r/g to ExpDef. It is used to correct any unadjusted run-based defensive statistic (runs allowed, runs prevented, ERA, etc.) for both park effects and the offensive strength of opposition.
Park r/g is the expected number of runs per game scored per team in that team's home field by teams with average offense and defense and no DH.
Actual Park is the expected runs per team per game without the DH given the actual teams that have played there. It can be multiplied by the DH effect (currently estimated at 9.5%) to give a more realistic number for AL parks.
Teams Effect is the ratio of Actual Park to Park r/g. It represents the relative strength of offense versus defense of the teams that have played in the park, that is, the distortion to apparent park effects caused by the unbalanced schedule. This is split into:
HmTm Effect, the degree to which the home team offense and defense would raise or lower runs scored in the park against average opposition.
VsTm Effect, the degree to which the visiting team offense and defense would raise or lower runs scored in the park against average opposition.
Neutral WP% is the Pythagorean Winning Percentage using Off r/g and Def r/g. It is an adjustment for strength of schedule, estimating how the team would be doing against average opposition. It should be a good measure of the relative quality of all the major league teams.
ExPyth is the Expected Pythagorean Winning Percentage based on the expected runs scored and allowed against the actual opposition. It should be close to the Pythagorean Winning Percentage calculated using actual runs scored and allowed, but it underweights high-scoring games.
SoS is the relative Strength of Schedule, the ratio of Neutral WP% to ExPyth. Multiply SoS by the number of games won to adjust the team's win total for the strength of its opposition. Note the largest adjustments. Toronto, because of all its games against the Yankees and Bosox, has been penalized about five wins. The Twins, in addition to being fortunate in one-run games and extra innings, have accrued about four additional wins from playing their weak opposition.
Offrank is the team's offensive ranking (by Off r/g). It's no surprise that the Yankees are No. 1, but how can the Phillies be second? See below. The teams at the bottom are not surprising, although it's impressive how poor the Tigers look even after correcting for Comerica.
Defrank is the team's defensive ranking (by ascending Def r/g). No surprises at the top: the A's, Atlanta, Arizona, Boston, and Anaheim. I have no concern here about confounding with park effects; there are good hitter's parks (AZ and, this year, Oak), mild pitcher's parks (Bos and Ana) and a neutral park (Atl). The bottom rankings seem plausible except perhaps the Phillies (once again, vide infra).
Parkrank is the ranking of park effects. The ranking of 1 (Colorado) is the best hitter's park; 30 (Philly) is the best pitcher's park. Coors rose from 12th to first from Memorial Day to the All-Star break. So much for the humidor.
Overank is the overall ranking. The Red Sox are on top, followed by the Yankees, D-backs and Braves. Milwaukee ranks dead last behind Tampa and Detroit. The Tigers and Brewers play in weak divisions, while the D-Rays have all sorts of games against the Yankees and Boston and have done badly in one-run games, so cut them some slack. Why are the Red Sox on top? The same reason they are at the top of the Pythagorean standings calculated with actual runs scored and allowed. They too have been unlucky in one-run games, but strength of schedule has not affected them at all, deflating their win total by only 0.6%.
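The ratio-based columns defined above (the correction multipliers, Neutral WP%, ExPyth and SoS) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the Pythagorean exponent of 2 is an assumption, since the exponent actually used is not stated here. The numbers in the usage lines are made up and are not taken from the results.

```python
def pythagorean_wp(runs_for, runs_against, exponent=2.0):
    """Pythagorean winning percentage; exponent=2 is an assumed default."""
    rf = runs_for ** exponent
    return rf / (rf + runs_against ** exponent)

def correction(neutral_rpg, expected_rpg):
    """Off Crxn or Def Crxn: neutral r/g divided by expected r/g."""
    return neutral_rpg / expected_rpg

def strength_of_schedule(off_rpg, def_rpg, exp_off, exp_def, exponent=2.0):
    """SoS: Neutral WP% divided by ExPyth."""
    neutral_wp = pythagorean_wp(off_rpg, def_rpg, exponent)
    ex_pyth = pythagorean_wp(exp_off, exp_def, exponent)
    return neutral_wp / ex_pyth

# Made-up team: better in a neutral setting than its expected
# numbers suggest, so its schedule has been harder than average.
sos = strength_of_schedule(off_rpg=5.0, def_rpg=4.0, exp_off=4.8, exp_def=4.2)
adjusted_wins = 70 * sos                       # schedule-adjusted win total
off_crxn = correction(neutral_rpg=5.0, expected_rpg=4.8)
print(round(sos, 3), round(adjusted_wins, 1), round(off_crxn, 3))
```

An SoS above 1 raises the adjusted win total (a harder-than-average schedule), and an SoS below 1 lowers it.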
So what about the Phillies? Does my method overstate the pitcher's park effect? There is little question the Vet is a pitcher's park: the Phillies are fifth in the majors in EqA using conventionally calculated park effects even though they are only seventh in the National League in runs scored. And the conventional park effect calculation would likely understate that effect; the Phillies play almost half their road games against the NL East, which has no good hitter's parks (I rank Atlanta as the best one at 15th). Their home/road differentials are, on average, measured against other pitcher's parks. Finally, the Phillies face the Braves' pitching as much as anyone, and they are the only team in their division who don't get to hit against their own crummy pitching.
Looking at the results as a whole, there is a moderate negative correlation between offensive and defensive ratings (-.19) and stronger negative correlations between Park r/g and both Def r/g (-.27) and Off r/g (-.47). This would suggest either difficulty in separating park from true offensive and defensive effects or an affirmation of the role of park effects in concealing team weaknesses, i.e., Coors makes the Rockies think their hitting is better and their pitching is worse than it really is, so they concentrate on improving the latter at the cost of the former. I believe the latter to be the case. The method allows for formal statistical tests; these show that the differences among teams in offense, defense and park effects are all highly and separately statistically significant.