Predicting the 2009 Playoffs
Over the past couple of years, I have added to the original work of Vinay Kumar (who, the last time I saw him put in an
appearance around here, went by the name Harold) in trying to find the statistical categories that have the most value to a
team in the postseason. Vinay’s work first appeared in an article at The Hardball Times web site, “So Billy, What Does Work in
the Playoffs?” (http://www.hardballtimes.com/main/article/so-billy-what-does-work-in-the-playoffs), about how a team’s regular
season statistics could forecast its chances of success in the postseason.
Vinay’s work has stood the test of time, although with one or two minor changes. The result is a system that does work quite
well at identifying the teams that will reach the World Series. Its record in the finals themselves is not very good, and
this year it became apparent to me that there is an issue related to the difference between the two leagues. National League
teams have won twice (2006 and 2008) when they should have lost, and American League teams once (2004). For the World Series
itself, different rules seem to apply.
The Categories
Vinay used 30 categories in his original research. But rather than using the data straight up, he required a minimum split (a
minimum gap between the two teams in a category) in order to eliminate about half of the results, to ensure that the data only
reflected series in which a team had a distinct advantage over its opponent. The columns below show the winning percentage in
each category through 2007, and how the 2008 results changed them. The numbers in parentheses indicate in how many series each
category was a factor. In the case of Won-Lost Record, 50 series featured one team having an advantage of five wins over its
opponent.
Category                      Series  Through 2007  Adding 2008
Team totals:
Won-lost record               (50)    .600          .580
Runs Scored/Runs Allowed      (48)    .565          .563
Batting records:
Runs scored total             (47)    .419          .426
Batting average               (54)    .450          .440
On-base percentage            (51)    .468          .451
Slugging percentage           (42)    .452          .476
Doubles                       (58)    .453          .448
Triples                       (63)    .448          .484
Home runs                     (48)    .477          .479
Batter walks                  (49)    .543          .551
Batter strikeouts (fewer)     (59)    .577          .559
Stolen bases                  (47)    .477          .511
Stolen base attempts (more)   (49)    .522          .551
Net stolen bases              (52)    .396          .442
Stolen base average           (59)    .327          .373
Caught stealing (fewer)       (54)    .396          .426
Pitching records:
Runs allowed                  (48)    .638          .646
ERA                           (49)    .592          .592
Pitchers’ strikeouts          (50)    .551          .540
Pitchers’ walks (fewer)       (44)    .550          .523
Hits allowed (fewer)          (48)    .739          .729
Home runs allowed (fewer)     (37)    .600          .595
Complete games                (51)    .617          .608
Pitchers’ shutouts            (55)    .667          .673
Saves                         (56)    .480          .482
Saves by team leader          (53)    .574          .566
Bullpen ERA                   (61)    .545          .574
Fielding records:
Errors committed (fewer)      (50)    .660          .680
Defensive efficiency          (52)    .667          .654
Fielding double plays         (54)    .500          .500
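For readers who want to see the minimum-split idea in code, here is a minimal Python sketch. The function name, the sample win totals and the error threshold are all hypothetical. Only the five-win split for Won-Lost Record comes from the description above, and treating it as "at least five" is my assumption.

```python
from typing import Optional


def advantage(team_a: float, team_b: float, min_split: float,
              higher_is_better: bool = True) -> Optional[str]:
    """Return "A" or "B" for the team holding a distinct advantage,
    or None when the gap is smaller than the minimum split."""
    gap = team_a - team_b
    if gap == 0 or abs(gap) < min_split:
        return None  # too close: the series is not counted for this category
    a_is_ahead = gap > 0 if higher_is_better else gap < 0
    return "A" if a_is_ahead else "B"


# Hypothetical win totals, using the five-win split for Won-Lost Record.
print(advantage(100, 95, min_split=5))  # a gap of exactly five counts (assumption)
print(advantage(97, 95, min_split=5))   # a gap of two is ignored

# "Fewer is better" categories such as errors; the split of 10 is made up.
print(advantage(80, 95, min_split=10, higher_is_better=False))
```

A series only contributes to a category's winning percentage when this kind of check returns a team rather than None, which is why the series counts in parentheses differ from category to category.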
Over the years there have been fluctuations in the ranking of the categories. Vinay caught the speed indicators at a high.
After his 2003 snapshot, they started to fall and only reversed the decline last postseason. They’ve recovered a little bit
of ground, but the running game isn’t all that significant for postseason success. Bullpen depth continued its rise. Vinay
found the Bullpen ERA winning percentage was .471. It’s added over 100 points in the last five seasons. The top three gainers
since 2003 have been Bullpen ERA, Home Runs and Slugging Percentage. The worst three losers have been Batter Strikeouts
(whiffing isn’t such a penalty), Stolen Bases and Triples.
Two seasons ago I divided the categories into strong and weak ones, the strong ones being those where the teams holding the
advantage have won more than half the playoff series. This creates a set of super categories. It is beginning to look like it
might be worth jettisoning some of the worst-performing categories. Batting Average, for example, has never looked like
getting close to the .500 level. In fact, there is very little movement across the .500 boundary, all of two categories. But
that’s enough to keep me from reducing the number of categories.
Here are the strong categories, ranked by their winning percentage:
Hits allowed
Errors committed (fewer)
Pitchers’ shutouts
Defensive efficiency
Runs allowed
Complete Games
Home Runs Allowed (fewer)
ERA
Won-lost Record
Bullpen ERA
Saves by team leader
Runs Scored/Runs allowed
Batters’ strikeouts (fewer)
Batters’ walks
Stolen Base Attempts
Pitchers’ strikeouts
Pitchers’ walks (fewer)
Stolen Bases
Stolen Bases returns after a season out of the list. Let’s carry this information forward and profile the 2009 Divisional
Series. I’ve put the strong categories in italics.
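The strong/weak split is simple to reproduce. The sketch below is only an illustration: it uses a handful of rows from the "adding 2008" column of the table rather than all 30 categories, and the variable names are my own.

```python
# A subset of the "adding 2008" winning percentages from the table above.
win_pct = {
    "Hits allowed (fewer)": 0.729,
    "Errors committed (fewer)": 0.680,
    "Won-lost record": 0.580,
    "Stolen bases": 0.511,
    "Fielding double plays": 0.500,
    "Saves": 0.482,
    "Net stolen bases": 0.442,
    "Batting average": 0.440,
}

# Strong categories: the team with the advantage has won MORE than half
# of the series, so a category sitting exactly at .500 stays out.
strong = sorted(
    (name for name, pct in win_pct.items() if pct > 0.5),
    key=lambda name: win_pct[name],
    reverse=True,
)

for name in strong:
    print(f"{name}: {win_pct[name]:.3f}")
```

With the full table as input, the same filter and sort should give the ranked list of strong categories shown above.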
Minnesota Twins vs New York Yankees
The Twins had to win one of the greatest baseball games of all time in order to get humiliated by the Yankees. I’m on record
as saying that the 2010 British General Election will be a good one to lose, and I think the same applied to the playoff game
between the AL Central’s postseason candidates.
Twins’ advantages
Triples
Pitchers’ Walks
Yankees’ advantages
Won-Lost Record
Runs Scored/Runs Allowed
Runs Scored
On-Base Percentage
Slugging Percentage
Doubles
Home Runs
Batters’ Walks
Net Stolen Bases
Pitchers’ Strikeouts
Hits Allowed
Defensive Efficiency
PREDICTOR PICK: NEW YORK YANKEES
Hedging my bet: St Jude, ora pro geminis.
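The article does not spell out how the lists of advantages turn into a pick. One plausible reading, and it is only an assumption, is that the pick goes to the team holding more advantages in strong categories. A hypothetical Python sketch of that tally for this series, with category names spelled as in the matchup lists:

```python
from typing import List, Optional

# Strong categories, spelled the way the matchup lists spell them
# (this name mapping is an assumption made for the example).
STRONG = {
    "Hits Allowed", "Errors", "Shut Outs", "Defensive Efficiency",
    "Runs Allowed", "Complete Games", "Home Runs Allowed", "ERA",
    "Won-Lost Record", "Bullpen ERA", "Saves by Closer",
    "Runs Scored/Runs Allowed", "Batters' Strikeouts", "Batters' Walks",
    "Stolen Base Attempts", "Pitchers' Strikeouts", "Pitchers' Walks",
    "Stolen Bases",
}


def strong_count(advantages: List[str]) -> int:
    """Count how many of a team's advantages fall in strong categories."""
    return sum(1 for category in advantages if category in STRONG)


def predictor_pick(team_a: str, adv_a: List[str],
                   team_b: str, adv_b: List[str]) -> Optional[str]:
    """Assumed rule: more strong-category advantages wins; a tie gives no pick."""
    a, b = strong_count(adv_a), strong_count(adv_b)
    if a == b:
        return None
    return team_a if a > b else team_b


twins = ["Triples", "Pitchers' Walks"]
yankees = [
    "Won-Lost Record", "Runs Scored/Runs Allowed", "Runs Scored",
    "On-Base Percentage", "Slugging Percentage", "Doubles", "Home Runs",
    "Batters' Walks", "Net Stolen Bases", "Pitchers' Strikeouts",
    "Hits Allowed", "Defensive Efficiency",
]

print(strong_count(twins), strong_count(yankees))
print(predictor_pick("Minnesota Twins", twins, "New York Yankees", yankees))
```

Under this assumed rule the tally agrees with the pick given above, but the actual method may weight the categories differently.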
Los Angeles Angels of Anaheim vs. Boston Red Sox
This has become a perennial postseason match-up, and each year the Red Sox’ fans wave a friendly farewell to the Rally Monkey
as they celebrate moving on to the next round. Last season, the Angels should have won, and didn’t. This time, have the Angels
finally found a recipe to keep that Rally Monkey jumping around past the Boston series?
Los Angeles Angels of Anaheim advantages
Batting Average
Triples
Batters’ Strikeouts
Stolen Base Attempts
Shut Outs
Saves
Saves by Closer
Double Plays
Boston Red Sox’ advantages
Doubles
Home Runs
Batters’ Walks
Net Stolen Bases
Stolen Base Percentage
Caught Stealing
Pitchers’ Strikeouts
PREDICTOR PICK: LOS ANGELES ANGELS of ANAHEIM
Hedging my bets: The Red Stockings seem to have a hoo-doo over the Angels.
Colorado Rockies vs. Philadelphia Phillies
The Phillies are out to avenge 2007. (Has anyone else noticed that this is the third year in succession the Phillies have
made the postseason? Wasn’t Disco music fashionable the last time the Phillies were this consistent? Maybe B.T. Express
should have the last word on that.) This Colorado team, though, is very different to that 2007 National League Champion. It
doesn’t have the bullpen and that is the fashionable accessory for the playoff team du jour.
Rockies’ advantages
Triples
Batters’ Walks
Home Runs Allowed
Double Plays
Phillies’ advantages
Home Runs
Batters’ Strikeouts
Net Stolen Bases
Stolen Base Average
Caught Stealing
Shut Outs
Complete Games
Bullpen ERA
PREDICTOR PICK: PHILADELPHIA PHILLIES
Hedging my bet: That’s an uninspiring set of advantages the Phillies have. This is the first decisive prediction I’ve seen in
three seasons that I have no confidence in.
Los Angeles Dodgers vs St Louis Cardinals
Tony La Russa’s team hits for power and they are fiends on the basepaths. What’s missing? They don’t get on base. The Dodgers
come with tons of pitching and some defence. This could be more of a mismatch than the predictor suggests.
Dodgers’ advantages
Runs Scored/Runs Allowed
On-base Percentage
Batters’ strikeouts
Caught Stealing
Pitchers’ Strikeouts
Pitchers’ Walks
Hits Allowed
Bullpen ERA
Errors
Defensive Efficiency
Cardinals’ advantages
Doubles
Home Runs
Stolen Bases
Stolen Base Attempts
Net Stolen Bases
Stolen Base Average
Complete Games
Shut Outs
Double Plays
PREDICTOR PICK: LOS ANGELES DODGERS
Hedging my bets: La Russa’s teams always do the opposite of what the numbers say, as when they lost in 2004 to a Red Sox team
they should have beaten.
Peering Further Ahead
Wow, the network bosses will be rubbing their hands with glee. Look at the size of those media markets in the Championship
Series! And it’s a gift that keeps on giving as we go back to the 1950s for a Dodgers vs Yankees World Series. How will that
turn out? Well, let’s see a little closer to the date if I can’t give you better information about what does work in the
World Series.
fra paolo
Posted: October 07, 2009 at 01:52 PM
Reader Comments and Retorts
Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.
1. RoyalsRetro (AG#1F) Posted: October 07, 2009 at 02:25 PM (#3343557)
Red Sox over Angels
Red Sox over Twins
Cards over Dodgers
Phillies over Rockies
Cards over Phillies
Cards over Red Sox in 7
I like you.
Two of the big advantages for the Angels over the Red Sox are shutouts and saves by closer. These are, I assume, rough measures of the quality of the frontline starting pitching and the quality of the ace reliever. In both cases, though the Angels score higher, the Red Sox are clearly the superior club.
To make a substantive argument, what I'm interested in here are less the quantitative relations between stolen base attempts and playoff success (.53 or .47?), but the more general claims we can make from this data about what works in the playoffs.
It seems like what we learn is, mostly, that pitching and defense win championships. There are mild exceptions (batters' walks), but mostly, it's better to be a run-prevention team than a run-scoring team in the playoffs. That's definitely interesting. The specifics look to me to get too fluctuate-y to say a lot about individual categories.
Angels 3 Red Sox 2
Cards 3 Dodgers 2
Rockies 3 Phillies 2
Yankees 4 Angels 2
Rockies 4 Cards 3
Yankees 4 Rockies 0
Red Sox 3 Angels 2
Yankees 3 Twins 1
Cardinals 3 Dodgers 2
Phillies 3 Rockies 0
Red Sox 4 Yankees 3
Cardinals 4 Phillies 2
Red Sox 4 Cardinals 2
FWIW, my q'n'd log5 numbers also have the Red Sox / Angels and Phillies / Rockies as a toss, with the Yankees as massive favorites and the Dodgers as moderate favorites.
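For anyone unfamiliar with log5: it is Bill James's formula for estimating the chance that one team beats another from their winning percentages. A quick sketch follows. The inputs are hypothetical and are not the numbers behind the comment above, and applying a single-game estimate to a whole series is a further simplification.

```python
def log5(p_a: float, p_b: float) -> float:
    """Estimated probability that team A beats team B in a single game,
    given each team's winning percentage against average opposition."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Hypothetical winning percentages, for illustration only.
print(round(log5(0.600, 0.500), 3))  # against a .500 team, A keeps its own rate
print(round(log5(0.550, 0.550), 3))  # evenly matched teams come out as a toss-up
```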
I like you.
I BELIEVE!
It's more wishful thinking than anything. Obviously the Yanks look heads and tails better than the Twins. But that Metrodome does not want to go down without a fight.
Angels 3 Red Sox 2
Cardinals 3 Dodgers 1
Phillies 3 Rockies 0
For what it's worth I think there is a real chance that the Angels/Red Sox series takes its place alongside the great five game series with Houston/Philly '80 and New York/Seattle '95 to name a couple. I think it's a compelling, somewhat nasty matchup with two extremely good teams.
Yankees 4 Red Sox 1
Cardinals 4 Phillies 2
Yankees 4 Cardinals 2
Angels 3-2
Cards 3-0
Phillies 3-1
Yanks 4-2
Cards 4-2
Yanks 4-2; jmurph sad
I had the Twins winning in the other thread. Here's what I foresee happening.
Yankees win Games 1 and 2 easily.
Yankees take late inning lead into Game 3- then something screwy related to the Metrodome occurs. Flyballs get lost in the roof, balls start bouncing all over the field, keying an amazing Twins comeback to keep their season alive.
Game 4- Twins fans show up and go bonkers. Sabathia is rocked and A-Rod goes 0 for 4 and commits 2 errors. Twins win easily.
Game 5- Facing enormous pressure, the Yankees start A.J. Burnett in this must-win game. Need I say more? Sometime around the 7th inning, Fox cameras catch Jorge Posada choking Joe Girardi in the Yankees dugout.
Yes, I am a Yankees fan.
I think if the losing team advances to the ALCS, it will undeniably go down as one of the most memorable. ;-)
I love how English grows as a language.
It's not the WS, per se, but the difference in league strengths. For the past decade or so, the AL has been a much stronger league. So to compare the AL champs' stats to those of the NL champs, you should first adjust for the quality of the competition. Otherwise, the NL team will tend to look more competitive than it really is.
Colorado is west of the Mississippi.
-got bored of picking 3-2 series
Maybe. Going by this year alone, the Phillies project to be starting pitchers with ERA+s of 126-99-146-106, while the Rockies will have 130-108-104-111. That's probably an edge to the Phillies, but not a huge one.
What worries me about the Phillies is that they'll start three lefties. At this point, Tulowitzki is the Rockies' only big righthanded bat.
Rockies 3 Phillies 1
Angels 3 Red Sox 1
Twins 3 Yankees 0
Dodgers 4 Rockies 2
Angels 4 Twins 3
Dodgers 4 Angels 2
Angels 3 Red Sox 0 (Biggest 'surprise' series)
Phillies 3 Rockies 1
Cardinals 3 Dodgers 2 (Best Series)
Angels 4 Yankees 3
Phillies 4 Cardinals 1
Angels 4 Phillies 2
The Phillies' starters probably worry me a bit more than Tom. Hamels and Lee have "shut down" stuff. The Rockies' guys after Ubaldo don't really have that. I'm a little worried that this could turn out like those late-season road San Francisco series.
I'd still be surprised and disappointed at a sweep.
Have you noticed that since Helton's 2004/2005 power outage, he turns into Luis Castillo when he's batting against lefties? I added it up once and 2006-2009, Helton's something like .375 OBP/.355 SLG against lefties.
Yankees
Dodgers
Phillies
Dodgers
Yankees
Dodgers
MIN over NYY (3-2) (I have a bad feeling about this)
PHI over COL (3-1)
LAD over STL (3-2)
BOS over MIN (4-3)
LAD over PHI (4-1)
LAD over BOS (4-2) aka "Manny and Torre's Revenge"
Yankees over Twins (5)
Phillies over Rockies (3)
Cardinals over Dodgers (4)
Red Sox over Yankees (7)
Cardinals over Phillies (5)
Cardinals over Red Sox (6)
Angels (5)
Rockies (4)
Cards (3)
Yankees (5) (dammit)
Cards (6)
Cards (7)
Dodgers (4)
Yankees (4)
Angels (5)
Angels (6)
Phillies (7)
Phillies (7)
The problem is that two of the three reversals during 2004-8 have been the NL team (St Louis, Philadelphia) beating the AL team (heavily favoured Detroit, marginally favoured Tampa Bay). The quality of the AL only applies in the case of Boston over St Louis in 2004.
I have a hypothesis, and I'm going to try and work through data before the World Series starts and come back with a 'What works in the World Series' study.
The specifics look to me to get too fluctuate-y to say a lot about individual categories.
Just eyeballing the fluctuations (I have year-to-year figures) leads me to think that there are upper and lower limits involved. In other words, as we get more data, we will be able to eliminate some categories (eg, Doubles) from consideration.
It gets more difficult in the Championship Series, as it should. Here goes anyway:
Phillies
Dodgers
Red Sox (the coin was heads)
Yankees
Phillies
Yankees
Yankees
Why so far back? In the five years from 1977 through 1981, the Yankees and Dodgers played in 3 World Series.
I got 3 out of the 4 series EXACTLY right! (The fourth was spectacularly wrong, however.)
Next up, I'll stick with Angels over Yankees in 7 but with the Dodgers in, I will say Phillies in 6, instead of 5.
Still the Angels over the Phillies in 6 for the WS.
For the past decade or so (this century), the WS has split: NL, AL, NL, AL, AL, NL, AL, NL.
Yes, that is a delightfully selected endpoint, but "for the last decade or so", the WS is evenly decided. So, if the WS is "different" it seems that some mythical "AL is so much stronger" doesn't work in the WS, and so I don't know if "adjusting for league strength" is warranted, or even makes sense.
Well, here are the last five years of interleague play results:
Year    AL    NL
2005   136   116
2006   154    98
2007   137   115
2008   149   103
2009   137   114
Total  713   546
That's a .566 winning percentage, and that recent AL superiority is no myth. To argue against that you have to go back past 2005, not to mention write off the last 21 years of All-Star games, where the AL is 18-3-1.
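The totals and the AL winning percentage can be rechecked from the yearly rows of the table. A short sketch, with the records copied from the table and the variable names my own:

```python
# Interleague wins by year, as (AL wins, NL wins), copied from the table.
interleague = {
    2005: (136, 116),
    2006: (154, 98),
    2007: (137, 115),
    2008: (149, 103),
    2009: (137, 114),
}

al_wins = sum(al for al, _ in interleague.values())
nl_wins = sum(nl for _, nl in interleague.values())
al_pct = al_wins / (al_wins + nl_wins)

print(f"AL {al_wins}, NL {nl_wins}, AL winning percentage {al_pct:.3f}")
```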
Oh, and in the last five World Series, the AL has won 14 of 22 games, including three sweeps.
All other points aside for a moment, are we not mentioning this because it's obvious it should never have been on the books (in order to be written off) in the first place?
Someone care to check home-field records in the WS (adjusted for relative qualities of team, of course) for the last 15 years or so? Because I imagine you'd find at least a little evidence that this isn't really a joke, and that NL teams win more of their home games (no DH) than they otherwise should.
Of course all this doesn't mean a whole lot in a short series, since it's two teams and not two leagues. And on paper the Dodgers have a huge pitching edge over either of the AL teams, even allowing for league difference. So as a predictor in a World Series, the AL's superiority doesn't help us much more today than the NL's superiority did from the mid-50's through the mid-70's.
It's not the pitcher-hitting advantage that's most key. It's the not-having-a-DH disadvantage, most highlighted on teams with true DHs. If the Sox make the WS they have to both juggle half the lineup and then deal with the nightmare that is "David Ortiz, fielding man" and deal with a pitcher hitting. In contrast, an NL team in the AL park just has to tell its best guy on the bench (and it's not like they're stuck roster-wise and end up with 6 slap-hitting MIF) "Okay, you pinch-hit 4 times this game instead of 1."
You argued against my statements which are based on the decade comment, so yes, you did.
That wasn't my intention, and anyway, I explicitly referred to the interleague results from 2005-2009 when I made my case for AL superiority.
The All-Star gap goes back a lot longer than that, and IMO when it's that lopsided it can't be brushed aside, but obviously the interleague results give us a far larger sample size.
Right, and like other analyses, when we have the large sample, the ASG *can* be brushed away, and thus is from 1997 to 2005.
Well, if you're trying to say that for the past decade the historical results are murkier, that's one thing. But if you're trying to say that 2005-2009 results shouldn't be weighted over those from 1997-2004, that's little more than a way of evading the two leagues' relative current strength, which is what the whole point (at least my point) is about.
The important point is that when league strengths differ, Vinay's method should take account of that rather than just comparing stats which came under differing levels of competition. If you were trying to estimate the likelihood of a AAA team beating an MLB team, you wouldn't just compare the regular season strikeout rates of the two pitching staffs, right?
And when you refer to "mythical 'AL is so much stronger,'" you aren't seriously questioning whether the AL has been better over the past 5 years are you?
I'm okay with that. No harm, no foul.
Second, my quibble is with the "so much". It's a little stronger, and the top teams may not be at all. The difference is small enough that it isn't likely to matter in the WS.
It was the whole "decade" thing.
I don't think there's any reason to think the advantage is any less at the WS level. The AL payroll disparity is slightly larger than the NL's, even accounting for the higher average salary in the AL. You'd have to check, but I think you will find that the disparity in win% in the AL in recent years has been at least as large as in the NL, probably larger. So if anything, the AL likely distributes its talent less equally than the NL, and so the AL talent advantage may be magnified in the postseason.
Only if the results of prediction vs outcome cannot be explained except by taking the apparent disparity into account. Based on my preliminary research, I'm not sure that this is the case. If anything, at first glance the AL's superiority seems to have little influence on the outcome of the World Series.
It's the not-having-a-DH disadvantage, most highlighted on teams with true DHs.
This may be more important, but not necessarily in the way Jeff K is arguing. I'm not ready yet to commit to any hypothesis. Again, at first glance, it does seem that 'run prevention', which appears powerful in the LDS and LCS, has less influence on the World Series. That implies the DH rule gives an advantage to AL teams. But there could be another explanation, which I haven't formulated yet. The problem is that NL teams are more successful in the World Series than they ought to be.
We're good on that.
WRT the MGL study, the Beltre taint has me concerned about that.
agreed.
Re-read the final paragraph! Or are you demanding a complete breakdown of advantages?
I would disagree. You have only 14 WS outcomes to work with here (assuming you're looking only at post 1994 postseasons). Your goal should not be "explaining" those, because there's so much luck involved. If you believe these variables are telling you something real based on the larger sample of playoff results (I'm agnostic), then the logical thing to do for the WS is to compare the teams as accurately as you can on those same dimensions. So actually you need two adjustments: 1) normalize these by league (for example, strikeout rates are higher in NL because pitchers hit), and then 2) adjust for overall league quality.
Whoa, whoa, whoa. I'm not saying that what I said is 100% certain to be correct, but I think this drastically overstates the case that can be made from the numbers above. There is no on-field rule that applies in a stadium in October that doesn't apply to the same one in May. Given that, inputs with dramatically different run predicting ability/causation in the WS vs. the regular season should warrant a very close examination, and "run prevention" is so fundamental a piece that there's almost zero chance of my accepting numbers from small-sample size WS histories that aren't exceedingly close to regular season numbers. Just like if a guy has 1000 PAs against lefties and sucks but then hits HR in 3 consecutive PA, I'm not breaking that out from the 1000.
The first thing I would look at there would be the decision to use minimum splits, which not only further reduces sample size, but very well may mask what is a sizable and measurable effect of run prevention that has rapidly diminishing marginal returns. That would not be at all hard to believe, especially in short series of games, and would match the numbers.
The evidence available is limited in quantity, and therefore all that we have learned so far suffers from a tremendous handicap. The problem with being too sceptical is that it does seem to have a predictive power based on experience over the last three postseasons. Is that significant yet? Not mathematically, but for practical purposes I'm inclined to give it the benefit of the doubt based on past performance.
However, it is clear that using this predictor as I have done above is no better than flipping a coin when it comes to the World Series. Already, however, I've detected two odd consistencies about World Series winners, but I'm not yet convinced it's meaningful, because the research is still very preliminary.