Baseball for the Thinking Fan

Baseball Primer Newsblog
— The Best News Links from the Baseball Newsstand

Monday, November 26, 2007

THT: Beamer: Introducing Markov chains

A teaser from the 2008 Hardball Times Baseball Annual. Or as the late great J.J. Smuggo Mohl used to say about collecting every edition of Kicks Magazine…“Have and own, own and have.”

There is also one other reason to buy the Annual. Over the years I have been (haphazardly) developing a Run Modeler that works out the run value of each base-out state, the linear weight value of each event, and run frequency distributions, among other things. This forms the basis of an article in the book, and in the spirit of sharing we have decided to let everyone who buys the book download a copy of the Run Modeler gratis.

For the more technical among you, the Run Modeler is based on Markov chains, takes into account both batting and non-batting plays, is written in Excel, and weighs in at a very healthy 20MB; the full-blown custom LI version actually tips the scales at 200MB! And, if I say so myself, it is an awesome application. For the bandwidth-shy among you, it compresses to 4MB when zipped.

The rest of this column outlines how the Markov works; why it knocks the socks off other run estimators like Runs Created and BaseRuns; and some pitfalls of my particular implementation.
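
For readers who have not met the technique, here is a minimal sketch of a Markov run-expectancy model. It is not the Run Modeler: the event probabilities and the runner-advancement rules (runners hold on outs, every runner moves exactly as many bases as the batter on a hit) are simplifying assumptions made up for illustration, and the names `EVENT_PROBS`, `advance` and `run_expectancy` are hypothetical.

```python
from itertools import product

# Illustrative per-plate-appearance probabilities (assumed, not real league data).
EVENT_PROBS = {"out": 0.68, "walk": 0.08, "single": 0.16,
               "double": 0.045, "triple": 0.005, "hr": 0.03}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "hr": 4}


def advance(bases, event):
    """Return (new_bases, runs). bases is a tuple (on1, on2, on3) of booleans."""
    on1, on2, on3 = bases
    if event == "out":
        return bases, 0                      # assumption: runners hold on every out
    if event == "walk":                      # only forced runners move up
        if not on1:
            return (True, on2, on3), 0
        if not on2:
            return (True, True, on3), 0
        if not on3:
            return (True, True, True), 0
        return (True, True, True), 1         # bases loaded: a run is forced in
    n = HIT_BASES[event]                     # assumption: runners advance n bases
    new = [False, False, False]
    runs = 0
    for base, occupied in enumerate(bases, start=1):
        if occupied:
            if base + n > 3:
                runs += 1
            else:
                new[base + n - 1] = True
    if n == 4:
        runs += 1                            # the batter scores on a home run
    else:
        new[n - 1] = True
    return tuple(new), runs


def run_expectancy(probs=None, tol=1e-12):
    """Expected runs to the end of the inning from each (outs, bases) state."""
    probs = EVENT_PROBS if probs is None else probs
    states = [(o, b) for o in range(3) for b in product((False, True), repeat=3)]
    exp = {s: 0.0 for s in states}
    while True:                              # fixed-point iteration on E = r + P E
        delta = 0.0
        for outs, bases in states:
            total = 0.0
            for event, p in probs.items():
                new_bases, runs = advance(bases, event)
                new_outs = outs + (1 if event == "out" else 0)
                future = 0.0 if new_outs == 3 else exp[(new_outs, new_bases)]
                total += p * (runs + future)
            delta = max(delta, abs(total - exp[(outs, bases)]))
            exp[(outs, bases)] = total
        if delta < tol:
            return exp


if __name__ == "__main__":
    table = run_expectancy()
    print(round(table[(0, (False, False, False))], 3))
```

The key point is that the next state depends only on the current base-out state and the event, which is what makes the chain "Markov"; situational effects would have to enter by letting the event probabilities vary with the state.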

Repoz Posted: November 26, 2007 at 02:47 PM | 54 comment(s)
  Tags: books, sabermetrics

Reader Comments and Retorts

Statements posted here are those of our readers and do not represent the BaseballThinkFactory. Names are provided by the poster and are not verified. We ask that posters follow our submission policy. Please report any inappropriate comments.

   1. Mike Emeigh Posted: November 26, 2007 at 03:38 PM (#2625109)
Markov chain analyses have been around for years - Gary Skoog wrote about Markov modeling in one of the mid-80s Abstracts, and Mark Pankin has been presenting very similar analyses at SABR conventions for a long time - so it's not exactly something new. (Beamer did credit Pankin.)

First, an assumption is that there is no situational hitting, which is probably the biggest source of error.


By far.

Hits and walks (even apart from IBB) are not distributed randomly. Both are more likely to occur in certain situations - for example, any given player is more likely to get a hit with a runner on first than he is with bases empty, something that Bill James noted as far back as the mid-80s, and any given player is more likely to draw a non-intentional walk with runners in scoring position and 1B open than he is with bases empty.

-- MWE
   2. John Northey Posted: November 26, 2007 at 04:29 PM (#2625152)
Something I've wondered, but not bothered to try to figure out (time and all that), is whether the reason players get more hits and the like with runners on is selection bias. Namely, if you have Roger Clemens on the mound (for example), you are far less likely to have people on base than if you have, say, Josh Towers. You therefore end up with far more ABs vs. Towers with runners on than vs. Clemens, so hitters overall will hit better with runners on than without, simply because of the mix of ABs vs. Towers and ABs vs. Clemens. The reverse holds with no runners on, as it is safe to say Clemens would have far more bases-empty situations than Towers would.

Has a study factored this in, and if so how? I know I'd like to see how it affects things.
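
The selection effect described in #2 can be shown with a small sketch. The two pitchers and all of their numbers below are made up for illustration (they are not Clemens's or Towers's actual splits), and `pooled_average` is a hypothetical helper. Each pitcher allows the same average in both base states by construction, yet the pooled average is higher with runners on because the weaker pitcher faces more of those plate appearances.

```python
# Hypothetical pitchers: (average allowed, PA with bases empty, PA with runners on).
# By construction each pitcher is equally good in both base states.
PITCHERS = {
    "ace": (0.220, 700, 300),
    "journeyman": (0.300, 500, 500),
}


def pooled_average(pitchers, runners_on):
    """League-wide average in one base state, pooling all pitchers."""
    hits = 0.0
    plate_appearances = 0
    for avg, pa_empty, pa_on in pitchers.values():
        n = pa_on if runners_on else pa_empty
        hits += avg * n
        plate_appearances += n
    return hits / plate_appearances


if __name__ == "__main__":
    print("bases empty:", round(pooled_average(PITCHERS, False), 3))
    print("runners on: ", round(pooled_average(PITCHERS, True), 3))
```

Comparing each hitter-pitcher matchup with itself across base states, rather than pooling, is one way a study could control for this.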
   3. Vogon Poet Posted: November 26, 2007 at 04:41 PM (#2625161)
#2: In The Book, they found that pitchers' wOBA goes up by 5 points when pitching from the stretch (which they split up as bases empty v. men on base). That would control for what you're talking about.
   4. Slinger Francisco Barrios (Dr. Memory) Posted: November 26, 2007 at 04:42 PM (#2625163)
First, an assumption is that there is no situational hitting, which is probably the biggest source of error.

By far.


Isn't that a limitation of RC as well? I.e., w.r.t. comparisons with RC, it's a wash?

It's kind of hard to tell from the article whether the Markov analysis leans more heavily on this assumption than RC.
   5. Honkie Kong Posted: November 26, 2007 at 04:48 PM (#2625175)
It's kind of hard to tell from the article whether the Markov analysis leans more heavily on this assumption than RC

IIRC, Markov chains are rooted in Markov events, and a Markov event says that the outcome of the current event is not based on any history (i.e. it will totally ignore the situation of the game). So I would think it would be the same as RC, if not worse.
   6. Mike Emeigh Posted: November 26, 2007 at 04:56 PM (#2625187)
Isn't that a limitation of RC as well? I.e., w.r.t. comparisons with RC, it's a wash?


That's a limitation of ALL models based on aggregate performance - RC, Linear Weights, Base Runs, etc.

In The Book, they found that pitchers' wOBA goes up by 5 points when pitching from the stretch (which they split up as bases empty v. men on base). That would control for what you're talking about.


Actually, it wouldn't, because it doesn't account for variations in the distribution of the "individual" events that make up wOBA - and it doesn't account for the fact that not all runner-on-base situations are identical, either. Leverage makes a difference, for example - pitchers are more inclined to walk a hitter in a high-leverage situation than in an identical base/out situation with lower leverage.

-- MWE
   7. Slinger Francisco Barrios (Dr. Memory) Posted: November 26, 2007 at 05:03 PM (#2625196)
That's a limitation of ALL models based on aggregate performance - RC, Linear Weights, Base Runs, etc.

Thanks...that's what I figured. I just wanted to bring it out so the casual reader doesn't dismiss it out of hand for that reason.
   8. Super Creepy Derek Lowe (GGC) Posted: November 26, 2007 at 05:04 PM (#2625197)
Isn't that a limitation of RC as well? I.e., w.r.t. comparisons with RC, it's a wash?


Depends what version of Runs Created you are using. The one that James uses in Win Shares looks like the 1040 form and accounts for homers with runners on base, batting average with RISP, and even adjusts upwards or downwards if a team exceeds or underperforms their expected Runs Created.

This sounds interesting, but

a) my math education ended right before they started tackling stuff like Markov chains. We did some work with matrix algebra, but I've forgotten it over the years.

b) is there really a need for a better mousetrap? I suppose that there might be for evaluating players at the extremes.

In any case, I look forward to the full article in the book.
   9. beamer Posted: November 26, 2007 at 05:13 PM (#2625208)
MWE -- in the model released with the book you can "fix" the situational hitting to an extent. You can specify different weightings for different events by different base states and out states.

Looking at the retrosheet data it is the walk that is affected most for the base states, while for the out state we see the hitting stats change. It is by no means perfect but it improves the model.

And for the record I'm not suggesting that this is anything particularly new. What is new, I think (correct me if I am wrong) is that there is no Markov tool that the general public can get their hands on save Tango's simple markov at his website. If there is then I have probably been wasting my time!

John
   10. AROM Posted: November 26, 2007 at 05:16 PM (#2625210)
is there really a need for a better mousetrap?


I'm not sure we do. On the team level baseruns, eqr, rc all get you very close.

As for the situational hitting mentioned here, I wonder if John built that into the program. 20 MB is a lot of calculations.
   11. AROM Posted: November 26, 2007 at 05:20 PM (#2625213)
At the very last minute I did add in some tweaking by situation (for instance the user can tweak where and when walks are issued), but it is by no means perfect.


Looks like he's done something with situational hitting.
   12. beamer Posted: November 26, 2007 at 05:28 PM (#2625220)
Sean -- yes, I did build it in. The reason for the size is because I allow the user to analyze any combination of hitters, which effectively requires 9 Markov models. Add in win expectancy, base-out state frequency, run distribution frequency and the calcs multiply. Excel isn't particularly efficient at matrix algebra either.

I don't think the world needs a better mouse trap but I think the Markov is a fun, different, interesting, analytically correct, way to look at a game of baseball.
   13. Super Creepy Derek Lowe (GGC) Posted: November 26, 2007 at 05:41 PM (#2625230)
John, I still have some of my textbooks from my college days. I may dig them out because of this. Heck, BTF, THT, and BPro are the main reason that I remember anything from stats class. My job doesn't require much quantitative analysis, so those skills that I have would've completely atrophied if Jim Furtado never turned me on to this place.
   14. beamer Posted: November 26, 2007 at 05:54 PM (#2625241)
gary -- snap!
   15. Mike Emeigh Posted: November 26, 2007 at 06:04 PM (#2625244)
I think the Markov is a fun, different, interesting, analytically correct, way to look at a game of baseball.


I differ on the last. I don't think it's analytically correct, primarily because once you've removed a player from the game, you can't re-use him. If the probability distribution going forward can be shown to depend on the path that you took through the process to get to the current state, it's not a Markov process, and I think it's very easy to show that the probability distribution going forward does in fact depend on the path you took to get to where you currently are in the game - specifically, which players (especially pitchers) you've already used.

-- MWE
   16. The importance of being Ernest Riles Posted: November 26, 2007 at 07:15 PM (#2625300)
15: correct, but the effects may be small. Here's an example of a non-Markovian process in baseball: http://www.hardballtimes.com/main/article/the-memory-remains/
   17. Slinger Francisco Barrios (Dr. Memory) Posted: November 26, 2007 at 07:16 PM (#2625301)
I suppose that there might be for evaluating players at the extremes.

I do wonder why it is such a problem with BaseRuns that it doesn't function when the on-base average is .500.
   18. Kyle S Posted: November 26, 2007 at 07:45 PM (#2625342)
I thought BsR was okay for extreme situations, whereas RC was not.
   19. Ron Johnson Posted: November 26, 2007 at 08:00 PM (#2625360)
John,

I think selection bias is a fairly big component of the differences in results with runners on/nobody on.

There are a couple of other issues though. First of all, sac flies. No reliable way to hit a sac fly without runners on. Otherwise you've got a generic 0-1. That's obviously only a subset of PAs with runners on.

Second, while it's true pitchers as a group get somewhat worse results with runners on, batters hit significantly worse in PAs where there's a stolen base attempt. (BA and SLG drop quite a bit while walks are up significantly. Some of the walks are intentional) Nobody's looked at the issue since Doug Drinen's study, but I'd be really surprised if this has changed. (As a group, players with high walk rates do pretty well in these PAs. No power or BA, but their walk rates tend to go up quite a bit more than other players)

Doesn't exactly speak to the issue, but when I looked at RISP (which obviously includes a fair number of PAs with runners on first) I found that players with 1000+ PAs with RISP hit .275/.346/.432/.340 overall and .278/.369/.430/.345 with RISP. (The last number being OBP with IBB removed. Didn't think to remove SF.)
   20. TOLAXOR Posted: November 26, 2007 at 08:03 PM (#2625363)
That's a limitation of ALL models based on aggregate performance - RC, Linear Weights, Base Runs, etc.

FORGIVE ME IF I'M CONFUSED, BUT I'M UNDER THE IMPRESSION THAT THE ISSUE IS WITH THE MODEL, NOT THE MARKOV CHAIN APPROACH... IN OTHER WORDS, THERE ISN'T (SHOULDN'T BE) A "MARKOV CHAIN :: RUNS CREATED" COMPARISON, BUT A COMPARISON OF PROBABILISTIC MODELS (OF WHICH MARKOV CHAIN CAN BE A TOOL FOR USE)....

IT'S AN ENGINE, NOT A CAR/TRUCK...
   21. GuyM Posted: November 26, 2007 at 08:33 PM (#2625388)
Second, while it's true pitchers as a group get somewhat worse results with runners on, batters hit significantly worse in PAs where there's a stolen base attempt. (BA and SLG drop quite a bit while walks are up significantly. Some of the walks are intentional) Nobody's looked at the issue since Doug Drinen's study, but I'd be really surprised if this has changed.

That's largely, if not entirely, because after a SBA there is no longer a runner on 1B with 2B open. That situation increases BA and SLG, espec for LH hitters, but is removed by the SBA. If you compare hitter performance after a successful SB to other runner-on-2B/1B open situations, I think you'll find little or no difference. Same thing if you compare failed SBA to other bases empty situations.

Also, Tango/Lichtman/Dolphin look at this in The Book.
   22. Ron Johnson Posted: November 26, 2007 at 09:23 PM (#2625452)
Guy, it's not just after a successful SB and the difference is huge.

Quoting from Doug Drinen's study:

For example, here is the data for the 1980 AL:

Profile of an SBA hitter: 0.270 0.336 0.399
League average hitter: 0.269 0.331 0.399

SIT     PA     AB      H    2B   3B    HR    BB     BA    OBP    SLG
--------------------------------------------------------------------
NSA  85211  76303  20640  3443  538  1823  6936  0.271  0.332  0.401
SBA   1933   1585    318    46   15    21   285  0.201  0.319  0.288

NSA = no steal attempt
SBA = stolen base attempted

It's similar for all the other seasons -- the typical SBA
hitter was the same as or slightly better than average.
BA and SLG are always down. OBP is sometimes up and
sometimes down.
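
As a quick check on the 1980 AL lines quoted above, BA and SLG can be recomputed from the counting stats. This is only a sketch: the dictionary layout and the helper names are mine, and OBP is left out because the quoted lines do not list HBP or SF.

```python
# Counting stats copied from the quoted 1980 AL table.
LINES = {
    "NSA": {"ab": 76303, "h": 20640, "doubles": 3443, "triples": 538, "hr": 1823},
    "SBA": {"ab": 1585, "h": 318, "doubles": 46, "triples": 15, "hr": 21},
}


def batting_average(line):
    return line["h"] / line["ab"]


def slugging(line):
    # Total bases: one per hit, plus the extra bases on doubles, triples, homers.
    total_bases = line["h"] + line["doubles"] + 2 * line["triples"] + 3 * line["hr"]
    return total_bases / line["ab"]


if __name__ == "__main__":
    for name, line in LINES.items():
        print(name, round(batting_average(line), 3), round(slugging(line), 3))
```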

Also, from another post:

The IBB rate is very slightly up in the SBA PAs. I just checked a few
random years.

          NSA    SBA
Year  LG  IBB%   IBB%
---------------------
1981  NL  12.1   12.3
1986  NL  12.0   15.6
1984  AL   7.8    9.1
1987  AL   6.4    8.4

IBB% = IBB / TBB

If you're interested, it's on usenet.
newsgroup:rec.sport.baseball,
author: Doug Drinen,
subject: More on hitting and the running game (long)
   23. Edmundo got dem ol' Kozma blues again mama Posted: November 26, 2007 at 09:39 PM (#2625468)
Heh, and I thought Markov Chains were those 2 sticks w/ chain thingeys that the football refs use
   24. studes Posted: November 26, 2007 at 10:55 PM (#2625560)
That's a limitation of ALL models based on aggregate performance - RC, Linear Weights, Base Runs, etc.

I don't think that's really right, depending on what you're using. The "value added" approach to linear weights does take situational impacts into account, in the modeling of the initial weights.

Something I've wondered, but not bothered to try to figure out (time and all that) is if the reason players get more hits and the like with runners on is due to selection bias.

That's a great point, and worth remembering when dealing with aggregate stats, but I think Mike and Ron are also correct that there are other factors driving the difference with men on.
   25. GuyM Posted: November 27, 2007 at 12:39 AM (#2625635)
Ron: I didn't find Drinen's study (is subscription required?), but there has to be a mistake. If hitters lost 110 points of slugging every time there was a SBA, the break-even percentage for stealing would be something like 120%. And the idea that this could go unnoticed by managers and players for the last 30 years seems implausible. In The Book, Tango et al. find that a SBA produces a .007 drop in wOBA, a small drop in production but nothing remotely like what you are reporting. I don't know what the mistake was -- perhaps counting some 2-out PAs where the runner was out? -- but I find these results hard to believe.
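
The break-even argument can be made concrete with a small sketch. The run-expectancy values below are round, made-up numbers (not figures from The Book or from this thread), and `break_even` with its `penalty` parameter is a hypothetical helper: the penalty is an assumed loss, in runs, applied to the inning after any attempt to stand in for a hitter who performs worse once the runner goes.

```python
def break_even(re_stay, re_success, re_fail, penalty=0.0):
    """Success rate at which attempting a steal matches staying put.

    re_stay:    run expectancy if the runner holds
    re_success: run expectancy after a successful steal
    re_fail:    run expectancy after a caught stealing
    penalty:    assumed runs lost after any attempt (hitter performs worse)
    """
    # Solve p*(re_success - penalty) + (1-p)*(re_fail - penalty) = re_stay for p.
    return (re_stay - re_fail + penalty) / (re_success - re_fail)


if __name__ == "__main__":
    # Illustrative values only: runner on 1B/0 out, runner on 2B/0 out, empty/1 out.
    print(round(break_even(0.85, 1.10, 0.25), 3))
    print(round(break_even(0.85, 1.10, 0.25, penalty=0.30), 3))
```

With a large enough penalty the required success rate climbs above 100%, which is the sense in which a break-even point of "120%" is possible on paper.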
   26. Ron Johnson Posted: November 27, 2007 at 05:55 AM (#2625828)
Guy,

Try Doug's Study

And your point about the true break even point was brought up in the first study on the issue. (I brought up an earlier -- much smaller -- study, whose results Doug verified)

There's yet another issue. The hideous numbers are generally driven by a couple of off the wall horrible performances.

One spectacular example: In 1985 Ken Griffey and Don Mattingly went a combined 3-48 (18 walks, no extra base hits) in PAs when a SB was attempted. IOW it's possible that the damage can be minimized by careful selection of the guys batting after the base stealers.

From another post:

Oh, I thought I'd decipher the mystery of the 80's Cardinals,
but I got tired...I just did the SBPA totals for the guys
who made the lists.

 PA   AB    H  2B  3B  HR   BB   BA/ OBP/ SLG
697  530  138  21   6   5  156  .260/.421/.351

Conclusion: When a runner was stealing, players for the
1980s St. Louis Cardinals showed a tendency to turn
into Max Bishop.
   27. GuyM Posted: November 27, 2007 at 02:40 PM (#2625933)
Ron: Thanks for the link. Tango also pointed me to this article with similar results:
http://www.knology.net/~johnfjarvis/sbcs.html. So I stand corrected: there clearly was a major dropoff in BA and SLG following a SBA in the early/mid 80s. I was skeptical because the pattern is much different (and more logical) in today's game: some loss of BA/SLG, but offset by a higher OBP. That tradeoff makes sense, as hitters are willing to make an out that advances runner to 3B (if 0 outs), and pitchers often elect to issue BB w/ runner on 2B. And since they mainly walk good hitters, the non-BB PAs are disproportionately weak hitters.

That said, a lot more needs to be controlled for before we can conclude that SBAs per se affect the hitter. For example:

1) About 75% of these PAs are successful SBs (because 2-out CS result in no PA), so we're mainly looking at fast runner on 2B and/or 3B. Also, it's probably disproportionately situations that call for 1-run strategies (or why would SBA have occurred?). We have to compare the successful SB PAs to other PAs with that same runner-speed/base-state/score situation. Note that Doug reports a significantly smaller reduction in BA/SLG when comparing SB to all runner-on-2B PAs. It appears that hitters in the early 80s were quick to trade an out for a base, but that may have been just as true when a Willie McGee type hit a double as when he stole 2B.

2) A significant proportion of the CS here are really busted hit-and-runs. This means we know the hitter has one swinging strike, and that will substantially reduce BA and SLG. However, that doesn't mean the attempted SB/H&R caused the lower performance.

I don't think there's any question that hitters change their approach in some base/out/score situations. But how much SBAs specifically impact performance is still an open question.
   28. BFFB Posted: November 27, 2007 at 02:57 PM (#2625948)
Arrgghh. Markov Chains brings back bad memories of using Matlab.
   29. AROM Posted: November 27, 2007 at 03:41 PM (#2625975)
So I stand corrected: there clearly was a major dropoff in BA and SLG following a SBA in the early/mid 80s. I was skeptical because the pattern is much different (and more logical) in today's game: some loss of BA/SLG, but offset by a higher OBP.


I didn't realize how big the dropoff was, his results are very consistent for every season from 1980-1987. It could be that to get the opportunities for the high steal totals from the 80's, batters had to be instructed to take more. Maybe batters today approach the situation just like any other AB, swing when they get a good pitch, giving the runners fewer chances.
   30. Mike Emeigh Posted: November 27, 2007 at 04:05 PM (#2626002)
Also, it's probably disproportionately situations that call for 1-run strategies (or why would SBA have occurred?).


It's also disproportionately early-inning situations; in late-inning situations that call for 1-run strategies, teams usually sacrifice.

-- MWE
   31. Mike Emeigh Posted: November 27, 2007 at 04:09 PM (#2626009)
One thing that's become increasingly apparent to me as I work through my data is that there are performance changes related to leverage as well as performance changes related to the base/out situation. Simply put, there tend to be more walks and fewer extra-base hits as the leverage of the situation increases (controlling for base/out situation, of course).

-- MWE
   32. Super Creepy Derek Lowe (GGC) Posted: November 27, 2007 at 04:23 PM (#2626025)
Simply put, there tend to be more walks and fewer extra-base hits as the leverage of the situation increases (controlling for base/out situation, of course).


Is this because pitchers change their approach in these situations? I've heard this as a reason that some pitchers outperform their peripheral stats and/or component ERA.
   33. GuyM Posted: November 27, 2007 at 04:30 PM (#2626037)
The other big factor to consider here is the count. I would guess that PAs with a SBA disproportionately end with a count that favors the pitcher. Now, to some extent that may be a real result of the SBA, if hitters take good pitches to allow the runner to steal. But it's also true that many SBAs occur after the hitter is already in a hole. A 1-2 count with a good hitter up is a great time to steal, since you either advance runner to 2B or the hitter gets to start again with a 0-0 count if runner is caught. I suspect this explains most or all of the drop in BA/SLG that isn't explained by the base/out/game state.

Does anyone know how SBAs are distributed by count?
   34. AROM Posted: November 27, 2007 at 04:35 PM (#2626044)
Simply put, there tend to be more walks and fewer extra-base hits as the leverage of the situation increases (controlling for base/out situation, of course).


Including or excluding intentional walks? High leverage situations are when just about all intentional walks occur, mostly to the hitters with the best ability to produce extrabase hits. Even excluding intentionals, you still have the semi-intentional walk/pitch-around situations.
   35. BFFB Posted: November 27, 2007 at 04:40 PM (#2626052)
One thing that's become increasingly apparent to me as I work through my data is that there are performance changes related to leverage as well as performance changes related to the base/out situation. Simply put, there tend to be more walks and fewer extra-base hits as the leverage of the situation increases (controlling for base/out situation, of course).


Logically this makes sense.

In a high leverage situation most pitchers whose pitches tend towards the "average" are going to be more careful where they aim the ball and stick to the corners, which would lead to more walks and fewer extra base hits.
   36. Mike Emeigh Posted: November 27, 2007 at 04:48 PM (#2626061)
Including or excluding intentional walks?


Excluding them.

High leverage situations are when just about all intentional walks occur, mostly to the hitters with the best ability to produce extrabase hits.


True, but even with bases empty there are fewer EBH.

-- MWE
   37. AROM Posted: November 27, 2007 at 05:00 PM (#2626080)
Makes sense. If it's a tie game in the bottom 9th and A-Rod or Pujols is up with the bases empty, I'm not going to challenge him, and I can accept issuing a walk.
   38. Ron Johnson Posted: November 27, 2007 at 06:38 PM (#2626201)
Does anyone know how SBAs are distributed by count?


You can get them for any given year at baseball-reference.com.

EG, for 2007 NL batting splits
   39. GuyM Posted: November 27, 2007 at 07:09 PM (#2626239)
Ron: the splits there only show a small number of SBAs. I'm guessing those are SBs which took place on a pitch that also ended the PA. Any other ideas?
   40. Slinger Francisco Barrios (Dr. Memory) Posted: November 27, 2007 at 08:22 PM (#2626316)
EG, for 2007 NL batting splits

OT, but I find it curious that there were six games in which no LHB appeared.
   41. Ron Johnson Posted: November 27, 2007 at 11:01 PM (#2626507)
Guy,

I have the numbers for 1993 at home (they're in the 1994 Stats Scoreboard -- the last of the really good Scoreboards)

Always assuming I can find the damned thing.

That said, I do know that 26% of SB attempts happened on the first pitch and that just over half of the attempts came on either the first or second.

Memory says there's a pretty fair variation by manager though.
   42. GuyM Posted: November 27, 2007 at 11:14 PM (#2626521)
I think count probably explains a lot. Just take the 26% where SBA is on first pitch. If hitter made contact it didn't count as SBA, so hitter either took pitch or had swinging strike. He's probably 0-1 at least 60% of the time, and OPS+ after an 0-1 count was 69 in NL last year.
   43. Dan Turkenkopf Posted: November 28, 2007 at 12:16 AM (#2626568)
Ron: the splits there only show a small number of SBAs. I'm guessing those are SBs which took place on a pitch that also ended the PA. Any other ideas?


Here's the breakdown by count for 2006 - columns are SBA, Balls, Strikes. I think this includes successful (SB) or unsuccessful attempts (CS) on the same pitch as the plate appearance ended, but doesn't include cases where the runner was going on the pitch and the ball was either fouled off or put into play.

887 0 0
403 0 1
139 0 2
490 1 0
384 1 1
245 1 2
139 2 0
218 2 1
188 2 2
22 3 0
54 3 1
3 3 2 


If you're interested, I ran the same calculations for every event where retrosheet actually has the pitch sequence, which I'm willing to provide.
   44. Los Angeles Waterloo of Black Hawk Posted: November 28, 2007 at 12:50 AM (#2626590)
So last year major league basestealers were:

130-87 (SB-CS) after a 1-0 count,
126-63 after an 0-1 count, and
2918-1002 overall.

Unless I'm mistaken, this doesn't make any sense.

(This is per the splits at BB-Ref.)
   45. Dan Turkenkopf Posted: November 28, 2007 at 01:05 AM (#2626597)
So last year major league basestealers were:

130-87 (SB-CS) after a 1-0 count,
126-63 after an 0-1 count, and
2918-1002 overall.

Unless I'm mistaken, this doesn't make any sense.

(This is per the splits at BB-Ref.)


I'm really confused by the count splits on BB-Ref when it comes to stolen base attempts. According to the splits, there were no attempted steals on the first pitch, which can't be correct. Also, I only get 1718 attempts total by adding up the unique splits (not including the three balls and two strikes categories since they're rollups of individual counts).

So I don't know where the issue lies. I'm missing about 800 or so from the BB-Ref overall total from 2006, so maybe I am missing the attempts where the PA ended during the at-bat. I'll try and run my query again with some different criteria after dinner.
   46. GuyM Posted: November 28, 2007 at 02:08 AM (#2626638)
Dan: If you can do it, it would also be interesting to see 1) the distribution of counts after the SBA (for example, how many of the 887 1st pitch attempts resulted in 0-1 vs. 1-0 count?), and 2) the distribution of final counts for SBA PAs.
   47. Dan Turkenkopf Posted: November 28, 2007 at 02:17 AM (#2626645)
I'll try and run my query again with some different criteria after dinner.


Ok, rerunning the query with slightly different parameters, and ensuring I record multiple events when more than one runner attempts a steal on the same pitch, gets me 3874 SB attempts, which is pretty close to the 3877 that BB-Ref has. Here's the breakdown by count:

Balls  Strikes  Attempts     %
0      0             954  24.6
0      1             444  11.5
0      2             180   4.6
1      0             530  13.7
1      1             419  10.8
1      2             309   7.8
2      0             147   3.8
2      1             240   6.2
2      2             281   7.3
3      0              25   0.6
3      1              59   1.5
3      2             286   7.3
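
For anyone who wants to slice the table above differently, here is a small sketch that recomputes each count's share of attempts from the raw numbers. The attempt counts are copied from the table; the helper names are mine. A straight recomputation can differ from a posted percentage by a tenth or two.

```python
# (balls, strikes) -> stolen base attempts, copied from the 2006 table above.
ATTEMPTS = {
    (0, 0): 954, (0, 1): 444, (0, 2): 180,
    (1, 0): 530, (1, 1): 419, (1, 2): 309,
    (2, 0): 147, (2, 1): 240, (2, 2): 281,
    (3, 0): 25, (3, 1): 59, (3, 2): 286,
}

TOTAL = sum(ATTEMPTS.values())


def share(balls, strikes):
    """Percentage of all attempts that came on the given count."""
    return 100.0 * ATTEMPTS[(balls, strikes)] / TOTAL


def share_with_two_strikes():
    """Percentage of attempts that came with two strikes on the hitter."""
    two_strike = sum(n for (_, strikes), n in ATTEMPTS.items() if strikes == 2)
    return 100.0 * two_strike / TOTAL


if __name__ == "__main__":
    for (balls, strikes) in sorted(ATTEMPTS):
        print(f"{balls}-{strikes}: {share(balls, strikes):5.1f}%")
    print(f"two strikes: {share_with_two_strikes():.1f}%")
```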


Other interesting tidbits:

3 people were picked off on 3-0 counts
32 runners were caught stealing home versus 11 successful steals of home - when runners are out at the plate when the ball gets away from the catcher, is that considered a CS? whereas it's a WP or PB if the runner scores?
   48. Dan Turkenkopf Posted: November 28, 2007 at 02:20 AM (#2626648)
Dan: If you can do it, it would also be interesting to see 1) the distribution of counts after the SBA (for example, how many of the 887 1st pitch attempts resulted in 0-1 vs. 1-0 count?), and 2) the distribution of final counts for SBA PAs.


Guy, that's going to take a little more work than I can devote right now. I might be able to get to it this weekend. I have to change how I'm doing the analysis and probably write a program for this instead of just using a query.
   49. JoeArthur Posted: November 28, 2007 at 05:05 PM (#2627061)
Here's an incomplete answer to Guy's question, using retrosheet data, limited to 2006 attempts to steal second (including successful double steal attempts, but not including failed double steals where a lead runner was caught stealing and the trail runner does not get an official attempt, and not including at bats which had foul balls with the runner going):
<table>
<tr><th>count</th> <th>Number</th> <th>SB/CS</th> <th>ball</th> <th>strike</th> <th>pickoff-throw</th></tr>
<tr><td>0-0</td> <td>829</td> <td>595-234</td> <td>466</td> <td>309</td> <td>51</td></tr>
<tr><td>0-1</td> <td>390</td> <td>284-106</td> <td>249</td> <td>113</td> <td>28</td></tr>
<tr><td>0-2</td> <td>152</td> <td>115-37</td> <td>117</td> <td>20</td> <td>15</td></tr>
<tr><td>1-0</td> <td>455</td> <td>341-114</td> <td>232</td> <td>196</td> <td>27</td></tr>
<tr><td>1-1</td> <td>361</td> <td>258-103</td> <td>192</td> <td>144</td> <td>25</td></tr>
<tr><td>1-2</td> <td>274</td> <td>215-59</td> <td>212</td> <td>39</td> <td>22</td></tr>
<tr><td>2-0</td> <td>126</td> <td>97-29</td> <td>67</td> <td>55</td> <td>4</td></tr>
<tr><td>2-1</td> <td>207</td> <td>140-67</td> <td>107</td> <td>85</td> <td>15</td></tr>
<tr><td>2-2</td> <td>254</td> <td>186-68</td> <td>169</td> <td>61</td> <td>24</td></tr>
<tr><td>3-0</td> <td>16</td> <td>14-2</td> <td>0</td> <td>16</td> <td>0</td></tr>
<tr><td>3-1</td> <td>52</td> <td>32-20</td> <td>0</td> <td>50</td> <td>2</td></tr>
<tr><td>3-2</td> <td>228</td> <td>119-109</td> <td>0</td> <td>220</td> <td>8</td></tr>
</table>
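
The SB/CS column above converts directly into a success rate by count. A minimal sketch, with the SB-CS pairs copied from the table and the helper name `success_rate` my own:

```python
# count -> (stolen bases, caught stealing), copied from the table above.
SB_CS = {
    "0-0": (595, 234), "0-1": (284, 106), "0-2": (115, 37),
    "1-0": (341, 114), "1-1": (258, 103), "1-2": (215, 59),
    "2-0": (97, 29), "2-1": (140, 67), "2-2": (186, 68),
    "3-0": (14, 2), "3-1": (32, 20), "3-2": (119, 109),
}


def success_rate(count):
    """Fraction of attempts on this count that were successful."""
    sb, cs = SB_CS[count]
    return sb / (sb + cs)


if __name__ == "__main__":
    for count in SB_CS:
        print(count, f"{success_rate(count):.3f}")
    total_sb = sum(sb for sb, _ in SB_CS.values())
    total_cs = sum(cs for _, cs in SB_CS.values())
    print("all counts", f"{total_sb / (total_sb + total_cs):.3f}")
```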

"ball" and "strike" are the results of the pitch on which the SBA occurred; sometimes the SBA occurs on a pickoff throw instead of a pitch to the plate. I thought there might be a platoon bias in the batter/pitcher matchup when steals were attempted, but off the top of my head the platoon breakdowns look normal
RHBvRHP: 1412
RHBvLHP: 565
LHBvRHP: 1173
LHBvLHP: 194

and here's a list by "pitch" outcome
<table>
<tr><th>SB-CS</th> <th>Type</th></tr>
<tr><td>1368-330</td> <td>Ball</td></tr>
<tr><td>53-60</td> <td>Pitchout(taken)</td></tr>
<tr><td>1-7</td> <td>swinging strike on pitchout</td></tr>
<tr><td>399-203</td> <td>swinging strike</td></tr>
<tr><td>3-0</td> <td>missed bunt</td></tr>
<tr><td>13-3</td> <td>caught foul tip</td></tr>
<tr><td>517-161</td> <td>called strike</td></tr>
<tr><td>37-184</td> <td>pickoff throw</td></tr>
</table>
and a few data errors on pitch type:
3-0 [pitch sequence blank]
2-0 foul [these might be miscoded caught foul tips, or perhaps more likely a pickoff throw may be missing; there were 2 other cases where the decisive pitch was supposedly a foul but the result was a pitcher initiated caught stealing.]
There are also 8 caught stealings which look like pickoff throws because the pitcher was given the assist, not the catcher, but a pickoff throw was not recorded. Probably the assist in the play-by-play is correct and the pitch sequence is wrong (in leaving out the pickoff), but I've left them in as caught stealings on pitches (4 balls, 3 called strikes, 1 swinging strike).

There were 339 strikeouts on the 839 pitches with a two-strike count. Since I am not counting fouls or balls in play with the runner going, this means the batter took 498 balls in these situations (plus the 2 mysterious fouls). With the 339 K's dropping out, and 221 counts unchanged because the attempt came on a pickoff throw, here are the post-attempt counts:

<table>
<tr><th>count after</th><th>Number</th><th>at bat continued (3rd out not made on base)</th></tr>
<tr><td>0-0</td><td>54</td><td>36</td></tr>
<tr><td>0-1</td><td>337</td><td>301</td></tr>
<tr><td>0-2</td><td>128</td><td>111</td></tr>
<tr><td>1-0</td><td>493</td><td>432</td></tr>
<tr><td>1-1</td><td>470</td><td>420</td></tr>
<tr><td>1-2</td><td>283</td><td>255</td></tr>
<tr><td>2-0</td><td>236</td><td>215</td></tr>
<tr><td>2-1</td><td>262</td><td>237</td></tr>
<tr><td>2-2</td><td>322</td><td>291</td></tr>
<tr><td>3-0</td><td>67</td><td>61</td></tr>
<tr><td>3-1</td><td>125</td><td>115</td></tr>
<tr><td>3-2</td><td>228</td><td>209</td></tr>
</table>
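The bookkeeping in this post can be cross-checked with a few lines of Python. This is only a sketch: it assumes the platoon breakdown covers every attempt in the sample (my reading, not something stated outright), and the variable names are made up for illustration.

```python
# Platoon breakdown of attempts, copied from earlier in this post.
platoon = {"RHBvRHP": 1412, "RHBvLHP": 565, "LHBvRHP": 1173, "LHBvLHP": 194}
total_sba = sum(platoon.values())  # assumed to be every attempt in the sample

strikeouts_on_attempt = 339   # batter struck out on the steal pitch
two_strike_pitches = 839      # attempts on pitches thrown with two strikes
balls_taken, odd_fouls = 498, 2

# count after the attempt -> (number, at bat continued)
post_counts = {
    "0-0": (54, 36),   "0-1": (337, 301), "0-2": (128, 111),
    "1-0": (493, 432), "1-1": (470, 420), "1-2": (283, 255),
    "2-0": (236, 215), "2-1": (262, 237), "2-2": (322, 291),
    "3-0": (67, 61),   "3-1": (125, 115), "3-2": (228, 209),
}

remaining = total_sba - strikeouts_on_attempt
table_total = sum(n for n, _ in post_counts.values())
continued_total = sum(c for _, c in post_counts.values())

print("attempts:", total_sba, "after dropping Ks:", remaining)
print("post-attempt table total:", table_total)
print("two-strike pitches that were not Ks:", two_strike_pitches - strikeouts_on_attempt)
print(f"share of attempts ending in a K: {strikeouts_on_attempt / total_sba:.1%}")
print(f"share of remaining at bats that continued: {continued_total / table_total:.1%}")
```

Under that assumption the totals reconcile: the attempts minus the 339 strikeouts equal the sum of the "Number" column, and the non-strikeout two-strike pitches equal the 498 balls plus the 2 fouls.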

I did not attempt to figure out the actual results of these continued at bats. But by my rough estimate (using a couple of different tables for eventual outcomes when passing through a particular count), overall these post steal counts are pretty neutral, not biased in favor of the hitter or the pitcher. If you bring the strikeouts on the steal attempt itself back into the calculation, THEN I estimate a drop of 95 points of OPS. [Doug Drinen measured a drop of 126 points in a study based on 8 years of data (1980-87).] But I don't think it's fair to do that. If the runner had not been going, the batter would have struck out on that two strike pitch sometimes anyway. And off the top of my head again, it doesn't look like the batter struck out at an unusual rate on those pitches with the runner going.

I dug out my copy of The Great American Baseball Stat Book [1987] and reread the article by Kevin Hoare which I think Ron alluded to in #26. He didn't look at pitch sequence or count data, and Doug's study, which said it replicated Hoare's method with more years of data, doesn't seem to have used pitch-by-pitch or count data either. Nor does the John Jarvis study mentioned by Guy in #27.

A different variation on the topic is in Chapter 11 of The Book, but that also does not appear to involve analysis of pitch or count information. Instead it compares situations in which the runner remained at first to those in which a stolen base attempt occurred.

Obviously a more careful study would use more years of data, and could look at a host of factors: at bats with foul balls with the runner going, batter quality, platoon advantage, and base-out situation at least. The accuracy of the pitch sequence data itself needs to be looked at. But I think it is likely that the stolen base attempt penalty on the batter is much weaker than indicated by Hoare and Drinen and Jarvis, if it actually exists at all.
   50. JoeArthur Posted: November 28, 2007 at 05:09 PM (#2627064)
well the tables in #49 looked OK in preview mode and appear OK when I check them with a browser. Not sure what I did wrong. sorry about that
   51. GuyM Posted: November 28, 2007 at 06:45 PM (#2627164)
Joe:
So what you're saying, I think, is that the SBA/non-SBA gap is a function mainly of the PAs that end on the same pitch as the SBA, because a K results in the SBA being counted whereas other outcomes (hit, BB, out-on-BIP) do not? Is that right? And if we look only at PAs which continue after the SBA, should that allow us to do a fair comparison? (I would think we would still need to control for base/out, score, runner speed, and count, though perhaps that would all turn out to be a wash.)
   52. JoeArthur Posted: November 28, 2007 at 09:04 PM (#2627287)
Guy,
yes, given how I understand the Hoare/Drinen/Jarvis studies to have been done, that's my explanation of why they show such a severe "penalty." There may still be some penalty, just not of that magnitude.

This may be a case in which different people have a different theory about what exactly the problem is. I can think of 3 separate aspects to this, which should be unbundled and studied separately.

1) do hitters take pitches they might otherwise swing at, to give the runner a chance to get a jump, although the runner may not actually go for several pitches (or at all)? (Taking more pitches ultimately could work for the hitter or against him, in terms of putting him into a pitcher's count or hitter's count.) [I think the theory that batters tend to fall into pitchers' counts when stolen bases are attempted is a post hoc attempt to explain the Hoare/Drinen/Jarvis results. I don't remember seeing data to support it, and I think the 2006 data undercut it.]
2) are hitters distracted when the runner runs? This could have a couple of consequences, if true: a) weaker or less accurate swings, so more swings and misses and more poorly hit batted balls b) poorer recognition of the pitch leading to worse decisions about whether to take or swing, so more called strikes and fewer called balls.
3) repositioning of the defense caused by the steal attempt leading to less favorable outcomes on actual batted balls after the steal attempt [the big hole at first is closed because the runner is not held; the SS and 2B probably back up from double play depth, so a little harder to hit it past them; on the other hand one of the middle infielders may be slightly out of position trying to "hold" the runner at 2nd]. Counterbalancing that, greater likelihood for the batter to get a walk with 1st base now open. The controls you mention are certainly appropriate for this problem.
   53. GuyM Posted: November 29, 2007 at 04:35 AM (#2627726)
Joe: Great job solving the mystery. So based on the 2006 data, about 10% of the PAs in these SBA samples are those in which the PA ends on the same pitch as the SBA occurs. And 100% of these are Ks. If we remove these from the analysis (assuming the proportion was similar in the 1980 AL), then instead of Drinen's line of .201/.319/.288 we get something like .229/.354/.328. Much better.
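The adjustment to Drinen's line can be roughly reproduced in Python. This is a sketch under a simplifying assumption that is not from the thread: PA = AB + BB (no HBP, SF or SH), which lets AB/PA be backed out of BA and OBP. The function name is hypothetical.

```python
def drop_strikeout_pas(ba, obp, slg, k_share_of_pa):
    """Recompute BA/OBP/SLG after removing PAs that ended in a strikeout.

    Assumes PA = AB + BB, so OBP = (BA*AB + BB) / PA and therefore
    AB/PA = (1 - OBP) / (1 - BA). That simplification is an assumption.
    """
    ab_per_pa = (1 - obp) / (1 - ba)
    k_share_of_ab = k_share_of_pa / ab_per_pa
    new_ba = ba / (1 - k_share_of_ab)     # same hits, fewer at bats
    new_slg = slg / (1 - k_share_of_ab)   # same total bases, fewer at bats
    new_obp = obp / (1 - k_share_of_pa)   # same times on base, fewer PAs
    return new_ba, new_obp, new_slg


# Drinen's SBA line, with about 10% of PAs removed as strikeouts.
new_ba, new_obp, new_slg = drop_strikeout_pas(0.201, 0.319, 0.288, 0.10)
print(f"{new_ba:.3f}/{new_obp:.3f}/{new_slg:.3f}")
```

With these inputs the result lands within a couple of points of the .229/.354/.328 quoted above; the small gap comes from the simplified AB/PA estimate.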

But then we have a second bias: even when the batter doesn't strike out, it's still the case that any time the batter swings on an attempted steal, the SBA will only be recorded if it's a swinging strike (no SBA if foul or BIP). According to Joe's 2006 data, there were 610 SBAs on pitches where the batter swung, so that's 271 "extra strikes" for the hitters (once we remove the 339 who Ked on the pitch). As a result, the counts after the SBA occurs put the hitter at a bit of a disadvantage. If we use the B-Ref split stats through these counts, weighted by the SBA, we find that hitters in these counts should have a collective SLG about .025 below average and collective BA about .020 below average. So now we've accounted for about 70% of the alleged BA/SLG gap.

To see if there is any 'SBA penalty' for the hitter at all, I think you'd have to first exclude all SBAs on which the batter swung at the pitch. Then, compare SBA and non-SBA plate appearances, controlling for base-out, runner speed, and score. I'm sure you'll still see a drop in BA/SLG and rise in OBP, but I doubt much of an overall 'penalty' will remain.

* *

One other interesting note: runners succeed at a 76.2% rate on called strikes, but only 65.6% on swinging strikes. This shows that a pretty significant number of CS, probably around 9%, are really busted hit-and-runs, so true SBAs succeed at a higher rate than raw SB/CS data imply.
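The post does not say how the 9% figure was reached. One possible reconstruction, sketched in Python below, treats called strikes as "true" steal attempts and counts the caught stealings on swinging strikes beyond what that success rate would predict. The method and the variable names are assumptions; only the SB-CS counts come from Joe's table in #49.

```python
# SB-CS counts from Joe's table in #49.
called_sb, called_cs = 517, 161
swing_sb, swing_cs = 399 + 1, 203 + 7   # swinging strikes, incl. swings on pitchouts

called_rate = called_sb / (called_sb + called_cs)
swing_rate = swing_sb / (swing_sb + swing_cs)

# CS expected on swinging strikes if runners succeeded at the called-strike rate.
swing_attempts = swing_sb + swing_cs
expected_swing_cs = swing_attempts * (1 - called_rate)
excess_cs = swing_cs - expected_swing_cs

# All CS on pitches (pickoff throws excluded): ball, pitchout, swinging strike
# on pitchout, swinging strike, missed bunt, caught foul tip, called strike.
cs_on_pitches = 330 + 60 + 7 + 203 + 0 + 3 + 161
excess_share = excess_cs / cs_on_pitches

print(f"called strikes: {called_rate:.1%}, swinging strikes: {swing_rate:.1%}")
print(f"excess CS on swings: {excess_cs:.0f} of {cs_on_pitches} ({excess_share:.1%})")
```

Read this way, the excess comes to somewhere between 8% and 9% of caught stealings on pitches, in the neighborhood of the estimate above. A different baseline (balls instead of called strikes, say) would move the number.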
   54. GuyM Posted: November 29, 2007 at 02:28 PM (#2627876)
Ron: Does your data, or Drinen's, break out Ks as a batter outcome? If so, I think you'll find that the main difference in the SBA samples is a huge increase in Ks. If these studies can be re-run excluding PAs on which the batter swung at the pitch during the SBA, I think you'll see that the SBA penalty largely vanishes.

The lesson: when you find a result that can't be true -- like hitters' SLG drops 110 points after a SBA -- it probably isn't.
