Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

## Tuesday, September 03, 2002

#### Strength of Schedule

An interesting look at park effects.

The straightforward calculation of park effects must be done with enormous caution. If it is done with partial or even single-season data, the sample size may be too small; larger samples may obscure genuine changes from season to season. The calculation is distorted further if the schedule is unbalanced; runs scored in a particular park may have had more to do with the strengths and weaknesses of the teams playing there than with the intrinsic characteristics of the park. This is particularly likely if teams have not played equal numbers of home and away games against the same opponents. The effect of an unbalanced schedule will diminish somewhat as the season progresses, but it will not disappear because the schedules themselves are highly unbalanced.

Distortion of park effects is only one consequence of the unbalanced schedule. I do not believe there has been adequate consideration of the effects of an unbalanced schedule on analysis of the accomplishments of teams or players. If a team plays relatively more home games against good offensive teams, for example, the park factor will be falsely inflated. To make this point I have developed a more sophisticated method of estimating park effects that takes into account strength and balance of schedule. The method turns out to have a number of additional benefits.

The number of runs a team scores in a game can be broken down into four elements: the team's offensive prowess, the pitching and defense of the opposition, the park factor, and whether or not a designated hitter is used. This can be written as a simple linear equation:

Runs scored = Offensive strength + Defensive (pitching and fielding) strength of opponent + Park factor + DH factor + random error

By writing the equation this way, I am suggesting that the park factor as well as the other three elements can be estimated using linear regression. Each game played provides two observations, one for each team, consisting of runs scored, the identities of the park and of the offensive and defensive teams, and whether a DH was used. I estimated these factors for games played through August 23rd.

Before I present the results, I need to make a few statistical notes. Because the distribution of runs scored is skewed to the right (lognormal), calculations are made on a logarithmic transformation of runs scored. (The transformations are reversed in the reported results.) This provides better statistical estimates and two additional advantages. First, the effect of high scores is diminished. This is desirable because there is a lower limit to runs scored but no upper limit. Second, a logarithmic transformation treats park effects as multiplicative rather than as a fixed number of runs per game, making it consistent with current practice.
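In symbols, the log-scale version of the equation above might be written as follows. This is only a sketch: the subscripts, the intercept $\mu$ and the small positive shift $c$ (needed so that a shutout has a defined logarithm) are notation assumed here for illustration, not taken from the article.

```latex
\log(R + c) = \mu + O_i + D_j + P_k + \delta\,\mathrm{DH} + \varepsilon
\quad\Longrightarrow\quad
R + c = e^{\mu}\, e^{O_i}\, e^{D_j}\, e^{P_k}\, e^{\delta\,\mathrm{DH}}\, e^{\varepsilon}
```

Here $R$ is the runs scored by offense $i$ against defense $j$ in park $k$, and $\mathrm{DH}$ is 1 when a designated hitter is used and 0 otherwise. Exponentiating turns each additive term into a multiplier, which is why the park effect comes out as a ratio rather than a fixed number of runs.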

Here then are the results followed by explanations:

| Team | Off r/g | ExpOff | Off Crxn | Def r/g | ExpDef | Def Crxn | Park r/g | Actual Park | Teams Effect | HmTm Effect | VsTm Effect | Neutral WP% | ExPyth | SoS | Offrank | Defrank | Parkrank | Overank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anahe | 4.46 | 4.47 | 0.997 | 3.68 | 3.67 | 1.004 | 3.81 | 3.82 | 1.00 | 0.985 | 1.017 | 0.594 | 0.598 | 0.994 | 8 | 5 | 21 | 7 |
| Arizo | 4.48 | 4.57 | 0.981 | 3.46 | 3.49 | 0.994 | 4.59 | 4.30 | 0.94 | 0.962 | 0.973 | 0.626 | 0.632 | 0.990 | 7 | 3 | 6 | 3 |
| Atlan | 4.07 | 4.01 | 1.016 | 3.16 | 3.04 | 1.039 | 4.14 | 3.58 | 0.86 | 0.876 | 0.986 | 0.624 | 0.635 | 0.983 | 16 | 2 | 15 | 4 |
| Balti | 3.82 | 3.91 | 0.978 | 3.88 | 4.09 | 0.950 | 4.09 | 3.98 | 0.97 | 0.933 | 1.044 | 0.492 | 0.478 | 1.031 | 24 | 8 | 16 | 14 |
| Bosto | 4.85 | 4.76 | 1.021 | 3.51 | 3.47 | 1.012 | 3.86 | 3.97 | 1.03 | 1.013 | 1.016 | 0.656 | 0.652 | 1.006 | 3 | 4 | 20 | 1 |
| ChiCub | 4.19 | 3.94 | 1.064 | 4.54 | 4.18 | 1.086 | 3.90 | 3.96 | 1.02 | 1.058 | 0.961 | 0.460 | 0.470 | 0.978 | 13 | 25 | 18 | 20 |
| ChiSox | 4.04 | 4.37 | 0.926 | 4.41 | 4.66 | 0.948 | 4.49 | 4.70 | 1.05 | 1.024 | 1.022 | 0.456 | 0.468 | 0.975 | 17 | 22 | 8 | 22 |
| Cinci | 4.00 | 4.11 | 0.975 | 4.26 | 4.31 | 0.990 | 4.71 | 4.49 | 0.95 | 1.001 | 0.951 | 0.469 | 0.476 | 0.984 | 20 | 18 | 5 | 19 |
| Cleve | 3.67 | 3.88 | 0.947 | 4.41 | 4.62 | 0.955 | 4.22 | 4.25 | 1.01 | 0.978 | 1.030 | 0.409 | 0.413 | 0.990 | 26 | 21 | 10 | 25 |
| Color | 3.86 | 4.31 | 0.897 | 4.20 | 4.75 | 0.885 | 5.70 | 5.43 | 0.95 | 0.977 | 0.976 | 0.458 | 0.451 | 1.015 | 23 | 16 | 1 | 21 |
| Detro | 3.55 | 3.40 | 1.044 | 4.63 | 4.53 | 1.024 | 3.56 | 3.63 | 1.02 | 0.991 | 1.031 | 0.370 | 0.360 | 1.026 | 28 | 27 | 25 | 29 |
| Flori | 4.03 | 3.68 | 1.093 | 4.49 | 4.06 | 1.106 | 3.65 | 3.65 | 1.00 | 1.031 | 0.971 | 0.446 | 0.452 | 0.988 | 18 | 23 | 24 | 23 |
| Houst | 4.18 | 4.27 | 0.979 | 4.01 | 3.92 | 1.023 | 4.36 | 4.17 | 0.96 | 0.992 | 0.963 | 0.520 | 0.542 | 0.959 | 14 | 12 | 9 | 12 |
| Kansa | 3.31 | 3.94 | 0.840 | 4.09 | 4.76 | 0.858 | 5.20 | 4.96 | 0.95 | 0.896 | 1.063 | 0.396 | 0.406 | 0.976 | 30 | 14 | 3 | 27 |
| Los A | 4.28 | 3.81 | 1.122 | 3.96 | 3.50 | 1.130 | 3.47 | 3.35 | 0.97 | 0.997 | 0.970 | 0.539 | 0.542 | 0.994 | 11 | 10 | 28 | 9 |
| Milwa | 3.72 | 3.42 | 1.090 | 4.96 | 4.54 | 1.092 | 3.79 | 3.80 | 1.00 | 1.051 | 0.954 | 0.361 | 0.362 | 0.997 | 25 | 29 | 22 | 30 |
| Minne | 4.16 | 4.45 | 0.934 | 3.94 | 3.99 | 0.989 | 4.21 | 4.24 | 1.01 | 0.981 | 1.026 | 0.527 | 0.555 | 0.949 | 15 | 9 | 11 | 11 |
| Montr | 4.35 | 4.05 | 1.073 | 4.49 | 4.09 | 1.098 | 3.88 | 3.99 | 1.03 | 1.070 | 0.960 | 0.484 | 0.495 | 0.976 | 9 | 24 | 19 | 16 |
| NY Me | 3.91 | 3.64 | 1.074 | 4.07 | 3.81 | 1.068 | 3.98 | 3.71 | 0.93 | 0.967 | 0.965 | 0.480 | 0.477 | 1.006 | 22 | 13 | 17 | 17 |
| NY Ya | 5.00 | 5.12 | 0.978 | 3.83 | 4.00 | 0.958 | 4.14 | 4.52 | 1.09 | 1.070 | 1.020 | 0.630 | 0.620 | 1.015 | 1 | 7 | 14 | 2 |
| Oakla | 3.93 | 4.35 | 0.904 | 3.10 | 3.53 | 0.879 | 4.72 | 4.24 | 0.90 | 0.851 | 1.055 | 0.616 | 0.603 | 1.022 | 21 | 1 | 4 | 5 |
| Phila | 4.88 | 4.08 | 1.195 | 5.01 | 4.15 | 1.207 | 3.11 | 3.62 | 1.16 | 1.198 | 0.972 | 0.486 | 0.492 | 0.989 | 2 | 30 | 30 | 15 |
| Pitts | 3.36 | 3.48 | 0.964 | 4.14 | 4.16 | 0.995 | 4.55 | 4.07 | 0.89 | 0.908 | 0.984 | 0.397 | 0.411 | 0.964 | 29 | 15 | 7 | 26 |
| San D | 4.27 | 3.71 | 1.149 | 4.83 | 4.33 | 1.116 | 3.39 | 3.63 | 1.07 | 1.101 | 0.971 | 0.439 | 0.424 | 1.034 | 12 | 28 | 29 | 24 |
| San F | 4.65 | 4.13 | 1.124 | 3.96 | 3.50 | 1.131 | 3.48 | 3.44 | 0.99 | 1.042 | 0.948 | 0.580 | 0.583 | 0.995 | 5 | 11 | 27 | 8 |
| Seatt | 4.75 | 4.56 | 1.042 | 3.76 | 3.65 | 1.033 | 3.54 | 3.75 | 1.06 | 1.031 | 1.025 | 0.614 | 0.610 | 1.007 | 4 | 6 | 26 | 6 |
| St. L | 4.56 | 4.19 | 1.088 | 4.31 | 3.76 | 1.147 | 3.67 | 3.75 | 1.02 | 1.074 | 0.951 | 0.527 | 0.553 | 0.953 | 6 | 19 | 23 | 10 |
| Tampa | 3.64 | 3.68 | 0.989 | 4.60 | 4.93 | 0.933 | 4.20 | 4.33 | 1.03 | 0.998 | 1.034 | 0.385 | 0.358 | 1.075 | 27 | 26 | 12 | 28 |
| Texas | 4.01 | 4.60 | 0.873 | 4.23 | 5.01 | 0.844 | 5.27 | 5.41 | 1.03 | 0.999 | 1.029 | 0.473 | 0.457 | 1.036 | 19 | 17 | 2 | 18 |
| Toron | 4.34 | 4.38 | 0.992 | 4.39 | 4.78 | 0.919 | 4.19 | 4.59 | 1.09 | 1.058 | 1.035 | 0.494 | 0.456 | 1.084 | 10 | 20 | 13 | 13 |
| Mean | 4.14 | 4.11 | 1.01 | 4.14 | 4.11 | 1.01 | 4.13 | 4.11 | 1.00 | 1.00 | 1.00 | 0.500 | 0.501 | | | | | |
| StdDev | 0.44 | 0.41 | 0.08 | 0.48 | 0.51 | 0.09 | 0.59 | 0.53 | 0.06 | 0.07 | 0.04 | 0.085 | 0.088 | | | | | |

Off r/g is the number of runs per game the team is expected to score in a neutral park against average pitching and defense with no DH.

ExpOff is the number of runs per game the team would be expected to score against the actual opponents in the actual parks but without a DH. Because of the logarithmic transformation mentioned above, this is not the same as the average number of runs scored per game (arithmetic mean). The number is essentially a geometric mean using a workaround for shutouts (all numbers used in a geometric mean must be positive) and should also be close to the median.

Off Crxn is the Offensive Correction multiplier, the ratio of Off r/g to ExpOff. It is used to correct any unadjusted run-based offensive statistic (runs scored, RBI, runs created, Raw EqR, etc.) for teams or players for both park effects and the defensive strength of opposition.

Def r/g is the expected number of runs per game allowed in a neutral park against average offense with no DH.

ExpDef is the number of runs per game the team would be expected to allow against the actual opponents in the actual parks but without a DH.

Def Crxn is the Defensive Correction multiplier, the ratio of Def r/g to ExpDef. It is used to correct any unadjusted run-based defensive statistic (runs allowed, runs prevented, ERA, etc.) for both park effects and the offensive strength of opposition.

Park r/g is the expected number of runs per game scored per team in that team's home field by teams with average offense and defense and no DH.

Actual Park is the expected runs per team per game without the DH given the actual teams that have played there. The DH effect (currently estimated at 9.5%) can be multiplied in to give a more realistic number for AL parks.

Teams Effect is the ratio of Actual Park to Park r/g. It represents the relative strength of offense versus defense of the teams that have played in the park, that is, the distortion to apparent park effects caused by the unbalanced schedule. This is split into

HmTm Effect, the degree to which the home team offense and defense would raise or lower runs scored in the park against average opposition.

VsTm Effect, the degree to which the visiting team offense and defense would raise or lower runs scored in the park against average opposition.

Neutral WP% is the Pythagorean Winning Percentage using Off r/g and Def r/g. It is an adjustment for strength of schedule, estimating how the team would be doing against average opposition. It should be a good measure of the relative quality of all the major league teams.

ExPyth is the Expected Pythagorean Winning Percentage based on the expected runs scored and allowed against the actual opposition. It should be close to the Pythagorean Winning Percentage calculated using actual runs scored and allowed but underweights high-scoring games.

SoS is the relative Strength of Schedule, the ratio of Neutral WP% to ExPyth. Multiply SoS by the number of games won to adjust the team's win total for the strength of its opposition. Note the largest adjustments. Toronto, because of all its games against the Yankees and Bosox, has been penalized about five wins. The Twins, in addition to being fortunate in one-run games and extra innings, have accrued about four additional wins from playing their weak opposition.

Offrank is the team's offensive ranking (by Off r/g). It's no surprise that the Yankees are No. 1, but how can the Phillies be second? See below. The teams at the bottom are not surprising, although it's impressive how poor the Tigers look even after correcting for Comerica.

Defrank is the team's defensive ranking (by ascending Def r/g). No surprises at the top: the A's, Atlanta, Arizona, Boston, and Anaheim. I have no concern here about confounding with park effects; there are good hitter's parks (AZ and Oak (this year)), mild pitcher's parks (Bos and Ana) and a neutral park (Atl). The bottom rankings seem plausible except perhaps the Phillies (once again, vide infra).

Parkrank is the ranking of park effects. The ranking of 1 (Colorado) is the best hitter's park; 30 (Philly) is the best pitcher's park. Coors rose from 12th to first from Memorial Day to the All-Star break. So much for the humidor.

Overank is the overall ranking. The Red Sox are on top, followed by the Yankees, D-backs and Braves. Milwaukee ranks dead last behind Tampa and Detroit. The Tigers and Brewers play in weak divisions while the D-rays have all sorts of games against the Yankees and Boston and have done badly in one-run games, so cut them some slack. Why are the Red Sox on top? The same reason they are at the top of the Pythagorean standings calculated with actual runs scored and allowed. They too have been unlucky in one-run games, but strength of schedule has barely affected them, deflating their win total by only 0.6%.

So what about the Phillies? Does my method overstate the pitcher's park effect? There is little question the Vet is a pitcher's park: the Phillies are fifth in the majors in EqA using conventionally calculated park effects even though they are only seventh in the National League in runs scored. And the conventional park effect calculation would likely understate that effect; the Phillies play almost half their road games against the NL East, which has no good hitter's parks (I rank Atlanta as the best one at 15th). Their home/road differentials are made against other pitcher's parks on average. Finally, the Phillies face the Braves' pitching as much as anyone and they are the only team in their division who don't get to hit against their own crummy pitching.

Looking at the results as a whole, there is a moderate negative correlation between offensive and defensive ratings (-.19) and stronger negative correlations between Park r/g and both Def r/g (-.27) and Off r/g (-.47). This would suggest either difficulty in separating park from true offensive and defensive effects or an affirmation of the role of park effects in concealing team weaknesses, i.e., Coors makes the Rockies think their hitting is better and their pitching is worse than it really is so they concentrate on improving the latter at the cost of the former. I believe the latter to be the case. The method allows for formal statistical tests; these show that the differences among teams in offense, defense and park effects are all highly and separately statistically significant.

Marc Stone Posted: September 03, 2002 at 06:00 AM | 10 comment(s)

1.  Posted: September 03, 2002 at 12:44 AM (#606084)
Any chance we can convince you to post some of the detailed calculations behind this table? It's not at all clear from the explanation how you got to these numbers.

-- MWE
2. Walt Davis Posted: September 04, 2002 at 12:45 AM (#606108)
Well, part of the problem is that, if I read this right, this is based only on this year's numbers. Estimating park effects (and adjusting them for schedule strength, etc.) based on less than 4 months of data is bound to lead to weirdness.

Like Oakland. So far this year it is playing like a hitter's park, though who knows why. Oakland is scoring 5.33 runs at home, 4.63 on the road. Oakland's opponents are scoring 4.19 runs at Oakland, 3.93 elsewhere. I'm pretty sure any measure of park factors that is based on just 2002 data will show Oakland as a hitter's park. However, in 2001, facing a very similar schedule, Oakland and their opponents scored about .4 runs more in A's road games.

One BP (2001?) argued that park factors, usually based on 3 years of data, should be based on 5 years of data.
3. Marc Stone Posted: September 04, 2002 at 12:45 AM (#606118)
Thanks for the feedback.

Walt, you beat me to the punch in pointing to the unadjusted numbers for Oakland this year. The comparable numbers for the Phillies: Offense 4.08 r/g at home, 5.02 on the road.
Defense 4.06 r/g at home, 5.24 on the road.

Is four months of data a sufficient sample size? The differences in observed park effects are highly statistically significant, i.e., not due to chance. One reason three to five years of data may look more reliable is that most park effects may be genuine but unstable - weather-dependent, for example. Three to five years of data may average out all but the most persistent park effects. I don't believe there is any one correct time frame; it depends on what you are trying to measure and compare. What my method tries to do is increase the accuracy and efficiency of measurements for whatever time frame you use by correcting for the effects of an unbalanced schedule.
4. Marc Stone Posted: September 05, 2002 at 12:45 AM (#606122)
Consistent with the Woody Allen movie "Sleeper", by 2150 we will know that cigarettes and saturated fat are good for you and that the heliocentric solar system really is claptrap. Also, Coors is really a neutral park; it just looks the way it does because deceased Hall of Fame sluggers channel through whoever is in the batter's box.

I didn't address Mike Emeigh's comment about how these numbers are calculated. Three of the numbers in the table (Off r/g, Def r/g, and Park r/g) come directly from the regression estimation, which is done by a statistical program. The remainder are derived as follows:
ExpOff is the (adjusted) geometric mean of runs scored in each game for each team. Runs scored in AL parks are reduced by approximately 9.5% (DH adjustment). ExpDef is the same for runs allowed.

I already explained the calculation of the Crxn factors.

Actual Park is the adj. geometric mean of all run totals of games played in that park (reduced 9.5% for AL parks).

HmTmEffect is the ratio of Off r/g to the average multiplied by the ratio of Def r/g to the average. If the home team is stronger offensively than defensively this will be greater than one.

VsTmEffect is the ratio of Teams Effect to HmTmEffect because HmTmEffect * VsTmEffect = Teams Effect.

Neutral WP% and ExPyth% are the usual Pythagorean calculations with an exponent of 2.

SoS is explained in the article.
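As a rough check on the derived columns, here is a minimal sketch of the Pythagorean and SoS arithmetic, using the exponent of 2 stated above and Toronto's row from the table. The function names are hypothetical, chosen only for this illustration.

```python
def pythagorean_wp(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def strength_of_schedule(off_rg, def_rg, exp_off, exp_def):
    """Return (Neutral WP%, ExPyth, SoS) from the four run rates."""
    neutral = pythagorean_wp(off_rg, def_rg)     # against average opposition
    expected = pythagorean_wp(exp_off, exp_def)  # against actual opposition
    return neutral, expected, neutral / expected


# Toronto: Off r/g 4.34, Def r/g 4.39, ExpOff 4.38, ExpDef 4.78
neutral, expected, sos = strength_of_schedule(4.34, 4.39, 4.38, 4.78)
print(round(neutral, 3), round(expected, 3), round(sos, 3))
```

With the rounded table inputs this gives values within about 0.001 of the published 0.494, 0.456 and 1.084; the small gap in SoS presumably comes from the table being computed on unrounded run rates.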

5. Marc Stone Posted: September 06, 2002 at 12:45 AM (#606132)
First, it's doubtful that Earned Runs help any measurement. Home town scorers are likely to give their guy a hit whenever possible and help out the home team pitchers by awarding errors on close plays if the fielder isn't a known defensive whiz.

More importantly, I am trying to measure total defense, not just pitching; earned runs, even if accurate, are irrelevant.

Finally, the disparity you note about the A's opponents hitting better on the road while the A's do better at home is why you need to correct for the unbalanced schedule. If, at this point in the season, the A's have faced better hitting teams on the road than at home you can see this kind of contradictory result.
6. Rob Wood Posted: September 06, 2002 at 12:45 AM (#606134)
Very interesting stuff. I am curious whether there might be a problem with circular reasoning or something like that somewhere. In the traditional way of estimating park effects (assuming a balanced schedule), the calculation is fairly straightforward since you can use season totals of runs scored and runs allowed at home and away (for both teams combined).

As you point out if you fully take into account the unbalanced schedule, the traditional calculations are no longer correct. However, there does not seem to be any "easy" way to calculate park effects under this scenario. Your sample sizes become very small very quickly, since you no longer can lump all opponents into one bin. It might not be a case of circular reasoning, but your degrees of freedom seem to shrink precipitously. Maybe it's a massive simultaneous equations system.

Could you describe your method of calculating park effects in the face of unbalanced schedules? Thanks much.
7. Marc Stone Posted: September 07, 2002 at 12:45 AM (#606139)
Rob, it's simply linear regression on (logarithmically transformed) runs scored in each game against dummy variables for the identities of the offensive teams, defensive teams, and parks. This gives roughly 4000 observations for 90 variables; degrees of freedom isn't a problem.
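For readers who want to see the shape of such a fit, here is a minimal pure-Python sketch. It is not the statistical program used for the article: it reaches the least-squares solution of the dummy-variable model by backfitting (repeatedly averaging residuals for each factor in turn), and the function name, the input layout and the single shift of 4.0 runs for every game are assumptions made for illustration.

```python
import math


def fit_effects(games, shift=4.0, sweeps=200):
    """Least-squares fit of log(runs + shift) = mu + offense + defense + park.

    games: list of (offense_team, defense_team, park, runs) tuples.
    Returns [offense, defense, park]: three dicts of log-scale effects,
    each centered so that the league average is zero.
    """
    y = [math.log(g[3] + shift) for g in games]
    n = len(games)
    # One effect table per factor: tuple position 0, 1, 2.
    effects = [{g[f]: 0.0 for g in games} for f in range(3)]
    mu = 0.0
    for _ in range(sweeps):
        # Intercept: mean residual after removing all three factor effects.
        mu = sum(y[r] - sum(effects[f][games[r][f]] for f in range(3))
                 for r in range(n)) / n
        # Each factor level: mean residual after removing everything else.
        for f in range(3):
            totals, counts = {}, {}
            for r, g in enumerate(games):
                others = sum(effects[h][g[h]] for h in range(3) if h != f)
                totals[g[f]] = totals.get(g[f], 0.0) + (y[r] - mu - others)
                counts[g[f]] = counts.get(g[f], 0) + 1
            for level in totals:
                effects[f][level] = totals[level] / counts[level]
    # Dummy-variable blocks are only identified up to a constant, so center them.
    centered = []
    for table in effects:
        mean = sum(table.values()) / len(table)
        centered.append({k: v - mean for k, v in table.items()})
    return centered
```

Taking `math.exp` of a centered effect gives a multiplier relative to the league average on the shifted-runs scale. A regression routine from a statistics package would normally be preferable, since it also yields standard errors and significance tests.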
8. Marc Stone Posted: September 09, 2002 at 12:45 AM (#606151)
Jonathan, it's entirely possible to replace the "BoSox pitching" variable with two variables, "Derek or Pedro starting" and "Other Bosox starting". There are enough games played at this point in the season so there are enough degrees of freedom, in fact, to easily allow 10 or 15 subcategories for each team (one for each regular starter, Barry Bonds playing/not playing, dome open/closed, park effect by month, etc.). How much you want to do this depends on the question you're trying to ask. Also there's the matter of a level playing field. If we split out Derek and Pedro, should we also split out Mark Buehrle for the White Sox? Because Boston and AZ have two pitchers that are significantly better than the rest of the staff, the effect is more likely to even out over opponents than if there is only one really good (or really bad) pitcher in the rotation. Hence it probably would make more sense to split out Buehrle.

Dave, the "lognormal" transformation I make is calculated by the statistical program to make the run distribution unskewed, essentially normal. For AL teams it adds about 4 runs, takes the geometric mean, then subtracts the four runs. For the NL it adds about 3.5. For example, if an AL team scores 4 runs twice the average would be sqrt((4+4)*(4+4))-4 = sqrt(64)-4 = 4. If the team scores 0 and 8 runs, the average would be sqrt((0+4)*(8+4))-4 = sqrt(48)-4 or 2.93. For an NL team the result for the first example is the same and the second would be sqrt(3.5*11.5)-3.5 = 2.84. Why the difference? In the NL a shutout is slightly more typical and scoring 8 runs is slightly more unusual.
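The arithmetic in these examples can be reproduced with a short function. This is a sketch only: the name `shifted_geometric_mean` is hypothetical, and the shifts of 4 and 3.5 are the approximate values quoted above, not exact outputs of the statistical program.

```python
import math


def shifted_geometric_mean(runs, shift):
    """Add `shift` to every score, take the geometric mean, subtract `shift`."""
    logs = [math.log(r + shift) for r in runs]
    return math.exp(sum(logs) / len(logs)) - shift


print(round(shifted_geometric_mean([4, 4], 4.0), 2))  # AL, two 4-run games
print(round(shifted_geometric_mean([0, 8], 4.0), 2))  # AL, shutout plus 8 runs
print(round(shifted_geometric_mean([0, 8], 3.5), 2))  # NL, same two scores
```

Working through logarithms rather than multiplying the scores directly gives the same result for two games and stays numerically stable over a full season of games.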
9. Marc Stone Posted: September 09, 2002 at 12:45 AM (#606153)
I should add that I really do think blowouts should be underweighted because they exaggerate the differences between teams. I think this is subjectively true, but it can well be argued that failing to make an adjustment causes a bias because you cannot accurately measure the magnitude of a DEFENSIVE blowout (shutout): a team cannot score less than zero runs, so in essence the data is censored. Offensive blowouts also act as statistical outliers which distort the fit of a least squares regression. Furthermore, since we are trying to calculate an expected number of runs scored for each team in each game, using a logarithmic transformation also prevents the model from producing an expected number of runs that is negative.
10. Marc Stone Posted: October 09, 2002 at 12:54 AM (#606641)

You are right about batting events (hits and bbs) being a useful way of gauging the magnitude of a shutout/defensive blowout but using them as a general measure is problematic. First, the winner of the game is whichever side scores more runs, not whichever has the most hits and bbs. Second, how do you weight the different events to make an aggregate measure of offensive or defensive strength (runs created, linear weights, OPS, etc.)? None of these measures is perfect so you are adding another level of error and uncertainty. You could use this method to adjust for park and opposition for the different events: Team X hits a lot of HRs, Team Y allows a lot of doubles, or there are relatively few Ks at Park Z after team tendencies are accounted for.
