Primate Studies: Where BTF's Members Investigate the Grand Old Game
Monday, March 10, 2003
Baserunning Park Factors
Do ballparks have a measurable impact on taking that extra base?
We all know that ballparks have a significant impact on offense. They have an impact on home runs, strikeouts and other hitting and pitching elements. In the past couple of years, we've even come to understand that ballparks have an impact on defense. But did you know that ballparks also have an impact on baserunning?
At least, this is the conclusion I came to after analyzing some baseball statistics, courtesy of Ray Kerby's Astros Statistical Software (aka ASS).
Here's what I did. I pulled data for four years (1999 to 2002) and calculated each team's baserunning performance at home and on the road. Specifically, I reviewed how often each team:
I then calculated how often the baserunner on first made it to third or home, and I looked at the difference between home games and away games. Here are the results:
Keep in mind that the "Diff" column is expressed in percentage points. Expressed in pure percentage terms, Atlanta baserunners were 33% more likely to advance that extra base at home vs. on the road.
You may have noticed that Colorado was very good at getting runners to third or home both at home and on the road. I have no idea why.
Four years of data represent about 800 total at bats, half at home and half on the road. This should be large enough to achieve statistical significance and "blend" any specific situational biases, such as number of outs or baserunner speed.
To be sure, I calculated the same statistic for each year in the four-year period. Here are the results:
This data verifies that the results are real, at least for the extreme cases of Atlanta, Texas and San Diego. I'll let you draw your own conclusions about the other parks.
Don't forget that Detroit, Pittsburgh, San Francisco, Houston and Milwaukee all moved to new parks during this time period. It's interesting to note the impact Miller Park and PNC Park had on their respective teams' baserunning results, in particular.
So how big is this impact? Well, this situation occurred in about 100 at bats for each team in a season. Pete Palmer's old run chart from The Hidden Game of Baseball indicates that the difference between second and third is about .2 runs. This means that Atlanta added six or seven runs (.2*33%*100) as a result of Turner Field's baserunning factor. Given that they scored about 360 runs at home last year, this means that baserunning increased scoring at home by about two percent.
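The estimate above can be written out as a short calculation. This is only a sketch of the arithmetic as stated in the text: the variable and function names are illustrative, and the inputs are the article's own rounded figures.

```python
# Rounded inputs taken from the paragraph above; names are illustrative.
RUN_VALUE_EXTRA_BASE = 0.2   # Palmer: second vs. third is worth about .2 runs
PARK_EFFECT = 0.33           # Atlanta's home advantage as quoted in the text
CHANCES_PER_SEASON = 100     # times the situation came up per team per season
HOME_RUNS_SCORED = 360       # approximate Atlanta runs scored at home


def extra_runs(run_value, effect, chances):
    """Runs added = value of one extra base * effect size * opportunities."""
    return run_value * effect * chances


added = extra_runs(RUN_VALUE_EXTRA_BASE, PARK_EFFECT, CHANCES_PER_SEASON)
share = added / HOME_RUNS_SCORED
print(f"extra runs: {added:.1f}, share of home scoring: {share:.1%}")
```

With these inputs the result lands between six and seven runs, a little under two percent of home scoring.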
However, this is certainly understated. First of all, Palmer's run expectancy probably underestimates the impact of advanced bases in these run-inflated times. But more importantly, it seems likely that these parks have a similar impact on other baserunning situations. A 3% to 4% impact seems likely for the extreme ballparks.
This is what the data says. Anyone know why?

Reader Comments and Retorts
1. J. Lowenstein Apathy Club Posted: March 10, 2003 at 01:39 AM (#609333)
As I would have expected, the turf parks are all in the negatives, because the ball gets to the outfielders quicker on turf and the outfielder can charge a little harder because of the solid footing and speed increase on turf. (Am I right in supposing that the only parks that were turf throughout the study are PHI, MON, TBA, TOR, MIN?)
Colorado's superb performance at home would be explained by the depth that outfielders have to play in Coors, which makes it tougher to get to the ball. Their good performance on the road may just be aggressiveness seeping over from their home environment, or just a naturally aggressive base coach or baserunning instructor, or it might be because they always struggle to score runs on the road and feel the need to run more aggressively. Minnesota, another team that usually struggles to score runs, also has high scores.
Another thing to check is whether the ball is hit to left, center or right.
Ironically, this trend doesn't hold for San Diego (which also had more home at bats with two outs: 33% vs. 29%) or Texas (reverse: 28% vs. 30%). These two teams' trends appear to be occurring in spite of their out situations.
I also did some more examination of the Rockies. They appear to have had some awesome baserunners on their team the last few years.
The average runner makes it to third or home about 30% of the time in these situations. Individual players range from 66% (Juan Pierre) to 5% (Jason Varitek). Here are some of the top Colorado runners' percentages (besides Pierre): Hollandsworth: 53%, Uribe: 47%, Helton: 45%, Larry Walker: 35%.
More analysis to come.
TB 6.5
MIL 5.75
COL 5.25
PHI 4
PIT 3.5
OAK 2.75
MON 2.75
ARI 2.5
NYY 2.25
SF 1.75
BAL 1.75
NYM 1.5
KC 1.5
DET 1.25
ANA 1
ATL .5
CHN .25
TEX -.75
TOR -.75
SL -1.25
CHA -1.5
LAN -2
SEA -2
SDN -2.5
CLE -3.75
MIN -3.75
FLO -4.25
BOS -4.25
CIN -5
HOU -6
Substantially similar results, completely different teams! How could this be?
I will return around 10:00 PCT. Any guesses as to my results?
This seems to lump together too many disparate batting events: the bang-bang play resulting in an infield single; do-or-die defense of the bunt single; the clean single; and what can charitably be called a "failed double", a ball well hit to the outfield that plates the runner but records the batter as "out stretching" at second.
In the last case you can consider that a ball hit to the same spot with the same velocity against the same defense might result in a clean double for Andruw Jones but only a single for Chipper Jones, whether or not he's out stretching at second or content to stop at first.
I'd prefer to see the numbers stated with the batter-runner not out.
I wonder if it might not be useful to survey all attempts at the extra base to establish a team's overall aggressiveness. Some coaching and baserunning gaffes aside, you'd have to think that most cases of attempting the extra base begin with a reasonable chance of success.
Once you have an overall gauge of a team's habits, you might be able to draw inferences about how they react on the bases in relation to Win Expectancy (does Team X try to slug or run their way to victory in close games or non-close non-blowouts?) and see if this strategy varies home and road.
Bangkok, you make some good points. As you and Tango point out, there are probably ways I can filter out some of the extraneous "stuff." These include number of outs, score and "batter/runner not out." My assumption was that these sorts of things even out over enough at bats, but I guess 800 at bats (or 400 each) isn't enough.
If we use the binomial standard error formula for p=.7 and q=.3, in 400 trials, we get a variance of .00058 or .058%. The variance of the difference between two samples (home and away) is the sum of the two variances, which is .00058 plus .00058, or .00116. Take the square root of that to get the standard deviation, or the standard error of the difference between the home and away percentages, which is .034, or 3.4%. So by chance alone, even if there were no differences among parks, we would expect that 10 or so of the parks would have home and away differences greater than 3.4%. We would also expect that at least 1 park out of the 30 would have a home/away difference of over 7%, again by chance alone (again, assuming no "real" park effect)!
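The binomial reasoning in this comment can be checked with a few lines of code. This is a minimal sketch, assuming a normal approximation for the tail probabilities; the function names are mine. Plugging p=.3 and n=400 directly into p*q/n gives a variance of about .000525 per sample and a standard error of the difference of about .032, slightly below the .034 quoted above. The conclusion is the same either way.

```python
import math


def binomial_variance(p, n):
    """Variance of an observed proportion with true rate p over n trials."""
    return p * (1.0 - p) / n


def se_of_difference(p, n_home, n_away):
    """Standard error of (home rate - away rate) when both share true rate p."""
    return math.sqrt(binomial_variance(p, n_home) + binomial_variance(p, n_away))


def two_sided_tail(z):
    """P(|Z| > z) for a standard normal Z (normal approximation)."""
    return math.erfc(z / math.sqrt(2.0))


se = se_of_difference(0.30, 400, 400)
n_parks = 30
print(f"SE of home/away difference: {se:.4f}")
print(f"parks expected beyond 1 SE: {n_parks * two_sided_tail(1.0):.1f}")
print(f"parks expected beyond 2 SE: {n_parks * two_sided_tail(2.0):.1f}")
```

Under these assumptions roughly a third of the 30 parks fall outside one standard error and one or two fall outside two, with no park effect at all.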
This makes it almost impossible to identify with any reasonable certainty whether park effects do in fact exist, and makes it even more difficult, if in fact park effects do exist, to identify which parks and to what magnitude.
To add insult to the above injury, because the samples in the study do include different elements home and away (different numbers of outs, different percentage of baserunners on first versus second, etc.), we have even more variance than the binomial results above would indicate.
OK, back to my chart. It was a trick or a trap as I said. It was generated by computer using the exact same success rate (30%) for all teams and for all parks, home and away. So any differences were a result of chance alone based on the same sample sizes as in the study (400 trials at home and 400 trials away). If those were actual results, it sure would look like park effects existed even though we know that they did not!
In fact, when I ran the above experiment 1000 times (the chart in my last post was the result of one of those experiments), over 50% of the time (54.05) there was a "team" that had a home/road difference of over 7%, again by chance alone! This of course should correspond to what you would expect if you used a binomial model, as in the above calculations.
So if you hypothesize that there may be park factors in terms of baserunning (which there are, of course: turf gets the ball faster to the outfield, it is easier to run on turf, it is never wet in the indoor stadiums or in the "never rain" stadiums, outfielders play deeper in Coors, etc.), how do we verify or quantify that supposition?
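The experiment described here is easy to reproduce in outline. The following is a sketch under the stated assumptions (30 teams, 400 chances at home and 400 on the road, a 30% success rate everywhere); it is not the poster's actual program, and the function names, seed, and default run count are illustrative.

```python
import random


def simulate_league(rng, n_teams=30, trials=400, p=0.30):
    """Home-minus-road success rate for each team when no park effect exists."""
    diffs = []
    for _ in range(n_teams):
        home = sum(rng.random() < p for _ in range(trials))
        road = sum(rng.random() < p for _ in range(trials))
        diffs.append((home - road) / trials)
    return diffs


def share_with_outlier(runs=1000, threshold=0.07, seed=0):
    """Fraction of simulated leagues with at least one team beyond the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        if any(abs(d) > threshold for d in simulate_league(rng)):
            hits += 1
    return hits / runs


# A smaller run count keeps this quick; raise it to 1000 to mirror the post.
result = share_with_outlier(runs=200, seed=1)
print(f"{result:.1%} of simulated leagues had a team with a 7-point gap")
```

The exact fraction depends on the seed and the number of runs, but it should land in the same neighborhood as the figure reported in the comment.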
Unfortunately, it is not so easy. The key is sample size. You don't necessarily have to control for outs and things like that, because they WILL "even out" given a large enough sample, but it will help (if you don't control for things like that, your "effective" sample is smaller than if you do control for things like that).
I really can't think of anything else but to include home AND away teams, of course, like you do with park factors (that will automatically double your sample size), and then to use as many years as possible, which is often limited by the number of years a park or team is in existence, of course.
OK, now for real, I redid the study going back to 93. I also did not break down the data by outs. I counted a success as advancing the extra base on a single or double (by the lead runner only) and no success as the lead runner getting thrown out or not attempting an advance at all (basically everything else). Here are my results:
If a team completely changed parks (not just a renovation, like ANA or OAK, or a dimension change), or switched from turf to grass, I list them separately, as for example ATL2(96), which means Atlanta's new stadium in 1996, or KC2(95), which means KC with grass installed in 1995.
The second number represents approx. 2 standard errors (margin of error) of the home and away difference based on the sample size of the home and away chances.
93-02
ARI 1.6, 2.6
ATL .5, 3.8
CHN .2, 1.9
CIN 1.6, 2.2
COL 6.2, 3.9
FLO 1.1, 1.9
HOU 1.3, 2.3
LA .5, 2.0
MIL .9, 2.1
MON .2, 1.9
NYN 1.7, 1.9
PHI .3, 1.8
PIT 1.9, 2.1
SDN 3, 1.9
SLN .5, 3.7
SFN 2.2, 2.4
BAL .3, 1.8
BOS 1.5, 1.7
ANA 1.4, 1.8
CWS .4, 1.8
CLE 5.1, 5.5
DET 1.9, 2.3
KC 3.1, 4.6
MIN .1, 1.8
NYY .5, 1.8
OAK 2.1, 1.9
SEA .2, 2.4
TB .2, 2.8
TEX 1.4, 5.7
TOR .1, 1.9
CLE2(94) .3, 2.0
TEX2(94) .5, 1.8
COL2(95) 3.4, 1.8
ATL2(96) .9, 2.3
SEA2(99, second half) .9, 3.2
DET2(00) .8, 3.2
SFN2(00) 1.7, 3.4
HOU2(00) .5, 3.3
PIT2(01) 3.2, 4.5
MIL2(01) .6, 4.6
KCA2(95) .9, 2.1
SLN2(96) 0, 2.3
CIN2(01) .5, 4.4
Now all of a sudden we get a different story! Even though we haven't even accounted for number of outs, no team shows a significant difference between home and away except for COL (which we expect), SD, and OAK (and the SD and OAK differences for some reason show up in the road games only; maybe they had lots of fast runners)!
In addition, another 13 out of the remaining 27 are more than 1 SD from 0, which is not too far from what we would expect by chance.
(My COL numbers do not show anything weird on the road, BTW. My numbers are for both teams of course, but the road success rate for COL and their opponents was 26.2% in the old stadium and 28.1% in the new stadium, which is right around average.)
Interestingly, as you can see, the turf parks do not show any results significantly different than the grass parks (you would expect a negative average value)...
The way I adjusted each team's home or away stats was to take the league (NL and AL) average ratio for each of the 3 possibilities and apply it to each team's data. IOW, league-wide during 93-02, there were 18,756 singles with a runner on 1st (and not 2nd) and 0 outs, 11,359 with a runner on 2nd or on 1st and 2nd with 0 outs, and 6,329 doubles with a runner on 1st and 0 outs. With 1 out, the numbers are 23,446 and 19,979 and 9,212, and with 2 outs, they are 21,840 and 22,364 and 9,553. The total of all these numbers is 142,838. I divide each number by 142,838 to get the "share" for each situation (e.g., 16.41% of the total chances are a single with a runner on 1st and 1 out). I then prorate each team's situation by the league "shares".
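One way to read the adjustment just described is as a weighted average: compute a team's success rate in each of the nine out-and-situation cells, then weight those rates by the league-wide shares. The sketch below uses the league counts quoted above; the cell labels, the function names, and the renormalizing of weights when a team is missing a cell are my assumptions, not the poster's stated method.

```python
SINGLE_1ST = "single, runner on 1st"
SINGLE_2ND = "single, runner on 2nd or 1st and 2nd"
DOUBLE_1ST = "double, runner on 1st"

# League-wide chances for 93-02 as quoted in the text, keyed by (outs, situation).
LEAGUE_CHANCES = {
    (0, SINGLE_1ST): 18756, (0, SINGLE_2ND): 11359, (0, DOUBLE_1ST): 6329,
    (1, SINGLE_1ST): 23446, (1, SINGLE_2ND): 19979, (1, DOUBLE_1ST): 9212,
    (2, SINGLE_1ST): 21840, (2, SINGLE_2ND): 22364, (2, DOUBLE_1ST): 9553,
}


def league_shares(chances):
    """Each cell's share of all league chances."""
    total = sum(chances.values())
    return {cell: count / total for cell, count in chances.items()}


def adjusted_rate(team_rates, shares):
    """Weight a team's per-cell success rates by league shares.

    Cells the team never faced are skipped and the remaining weights
    are renormalized (an assumption made for this sketch).
    """
    weight = sum(shares[cell] for cell in team_rates)
    return sum(rate * shares[cell] for cell, rate in team_rates.items()) / weight


shares = league_shares(LEAGUE_CHANCES)
print(f"total chances: {sum(LEAGUE_CHANCES.values())}")
print(f"single, runner on 1st, 1 out: {shares[(1, SINGLE_1ST)]:.2%}")
```

Because every team's home and road rates are weighted by the same league shares, a team that happened to see more two-out chances at home no longer gets a boost from that alone.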
If that makes no sense, take my word for it that I am adjusting for outs and "baserunner and hit" situation.
ARI .1, 2.6
ATL .2, 3.8
CHN 2.2, 1.9
CIN 1.6, 2.2
COL 7.0, 3.9
FLO 2.8, 1.9
HOU 4.3, 2.3
LA .7, 2.0
MIL .2, 2.1
MON .2, 1.9
NYN 1.0, 1.9
PHI 2.6, 1.8
PIT 1.1, 2.1
SDN 4, 1.9
SLN .7, 3.7
SFN 1.4, 2.4
BAL 1.4, 1.8
BOS .9, 1.7
ANA 1.3, 1.8
CWS .3, 1.8
CLE 6.2, 5.5
DET 1.6, 2.3
KC 2.6, 4.6
MIN 2.1, 1.8
NYY .3, 1.8
OAK .5, 1.9
SEA 2, 2.4
TB .5, 2.8
TEX 2.1, 5.7
TOR 1.4, 1.9
CLE2(94) 0, 2.0
TEX2(94) 2.5, 1.8
COL2(95) 6.3, 1.8
ATL2(96) 2.9, 2.3
SEA2(99, second half) .5, 3.2
DET2(00) 2.4, 3.2
SFN2(00) 3.5, 3.4
HOU2(00) 1.7, 3.3
PIT2(01) .5, 4.5
MIL2(01) 4.9, 4.6
KCA2(95) 2.6, 2.1
SLN2(96) .5, 2.3
CIN2(01) 1.8, 4.4
As you can see, these numbers are completely different from the previous ones. Because of the bug, the previous ones have no meaning.
There definitely seems to be a correlation between turf and a lower success rate. The average difference among all the turf parks is -1.05. And of course, COL stands by itself with around a 6 or 7 point difference (although I still don't get anything different from league average on the road).
Here are the parks that have more than a 2 SD (from 0) difference between home and away, which suggests (at the 95% confidence level) that they have some kind of "real" park factor:
Cubs (thick grass and wet a lot?)
FLO (short grass?)
HOU (Astrodome) (slick turf)
PHI (slick turf, at least until 01 when they put in new "Nexturf")
SD (short grass, no rain)
CLE (old stadium) (short grass?)
MIN (slick turf)
TEX (new stadium) (thick grass?)
ATL (new stadium) (thick grass, wet a lot?)
SF (new stadium) (short grass?)
KC (with grass) (thick grass?)
MIL (new stadium) (??)
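The screening rule behind this list is simple: flag a park when the size of its home/away difference exceeds the 2-standard-error margin listed next to it. Here is a small sketch of that rule using a handful of rows from the adjusted numbers above. Only the magnitudes are compared, and the dictionary and function names are illustrative.

```python
# (difference, margin) in percentage points, copied from a few rows of the
# adjusted numbers above. The margin is the approx. 2-standard-error figure.
PARKS = {
    "ARI": (0.1, 2.6),
    "CHN": (2.2, 1.9),
    "CIN": (1.6, 2.2),
    "COL": (7.0, 3.9),
    "FLO": (2.8, 1.9),
    "PIT2(01)": (0.5, 4.5),
    "SFN2(00)": (3.5, 3.4),
}


def significant_parks(parks):
    """Parks whose difference is larger in size than their margin of error."""
    return sorted(
        name for name, (diff, margin) in parks.items() if abs(diff) > margin
    )


print(significant_parks(PARKS))
```

Of these seven rows, the Cubs, Colorado, Florida and the new San Francisco park clear the bar. With about 40 parks tested at the 95% level, a couple of false positives are to be expected.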
Thank you Dave for coming up with this concept! I will use some kind of (regressed, of course) baserunning park factors for the "outfield arm" and baserunning components of Slwts...
Taking the Extra Base
John Jarvis Team Baserunning Charts
BTW, tango, those John Jarvis pages are awesome. I had never run across those before. Thanks.
Arvin, thanks for the feedback. You are also out of my league statistically. I do understand what you are saying, however. Just for convenience and illustrative purposes I identified each park that had a significant difference (2 SD's) from a zero mean.
I'm sure that the grass and turf park differences will pass a t-test at the .05 level, and I will use whether a park is turf or grass as my "starting point" for baserunning and OF arm park factors.
Of course, if we "know something" about each of the parks, we can use that info to supplement our statistical analysis of the sample park factors (IOW, we no longer have random samples drawn from the same population). I was hoping that some readers who are familiar perhaps with certain parks could comment on my speculation as to short or tall grass, etc.
Anyway, interesting topic...
IOW, if you are calculating the CIN park factor, whether it be for batter and pitcher offensive adjustments, or for baserunning, or for defense, you take all the data from all CIN home games (excluding IL games), and compare this to all the data from CIN road games (excluding IL games). That is how I did the above baserunning stuff and that is how everyone, including STATS, does the regular park stuff. Technically, of course, with the imbalanced schedule (or any non-homogeneous schedule, where you don't play every other team an equal number of times), you should "adjust" the road data, but if you don't it's close enough...
I, too, was hoping for a bit more discussion about ballpark characteristics. For instance, does the dry grass in San Diego really have an impact? Of course, we've got to get the math straight first. I'd retackle the subject, but I think you've already done a great job updating the approach. Thanks again.
This is actually an excellent point, so I'm glad you brought it up. Too often we forget why we do the things we do. (Break out into song here.)
If let's say Coors' RF zone is especially cavernous, and you have a RF there with a cannon arm, then the opponents' 1st to 3rd rates will pale much against Colorado's rates. Take these same teams on the road where the RF's arm won't be as much a factor, and maybe the opponents 1st to 3rd rate will be only a little worse.
The same thing applies with pitchers, except with pitchers, as a group, you won't have the same level of effect, and you have more of them too.
I would definitely do a break down by lf/cf/rf and by out. I would also split the home team from the road team in these computations. And while Dave looked only at 1st to 3rd on a single with 2b open, and MGL lumped various categories together (but with adjustments), I would keep all these separate as well. I know, I know, sample size. But before things get lumped in and a uniform single adjustment is made, I think we should study the issue first.
As well, I might even consider putting in the thrown outs as well. After all, these guys felt they could go, and they didn't make it either because they were too stupid, too slow, or the RF arm was too strong. They make it to 3rd because they are smart, fast, RF arm is weak, or it was a gimme. I suppose you can also break it up by throw out rate, and come up with a park factor for that.
And 4 years is just not enough data to draw any conclusion. 1974-1990 is also available, and I might revive my program and take another look at it in the future.
Good job everyone!
As far as Flo, Tex, and SF, those are all grass parks, so as you point out (I think you forgot about SF), the Tex difference is not really significant (at 2 SD's) since the "expected mean" should be around +1, not zero, for grass parks. That leaves Flo and SF particularly tough to run in for some reason, perhaps, like SD, due to extremely short grass, at least in the OF.
Now, interestingly, the "GB out" PF (that I use for my UZR ratings) for SD is 1.01, which implies that it DOESN'T have short grass in the IF. Same exact thing for SF and FLO (IF PF of 1.01). Ari, on the other hand, has the lowest IF PF (.93) by far of all grass parks, at least after 1999, when they changed the grass. Their baserunning PF is only slightly negative, however (-.1), which is only around 1/2 a SD away from an average grass park.
There must be something else going on with these baserunning PF's other than the length of the grass...
The home and road data are "extra base percentage". The "difference" listed in the "chart" is home minus road, so if the difference is plus, that means that the home extra base percentage is higher; therefore, it is easier to run at home. Colorado is plus. It is easier to run in Colorado because the outfielders play so deep. Hou is negative because it is harder to run on artificial turf because the ball gets to the OF'ers too quickly.
So the average turf (artificial turf) factor is around -1.0 (harder to run) and the average grass factor is around +1.0.
It is obviously "easier to run" on turf (SB percentages are higher), but it is harder to take the extra base because the ball gets to the OF'ers more quickly than on grass fields. When I say "harder" or "easier" to run, I mean harder or easier to take the extra base, not harder or easier to actually "run"...