Baseball for the Thinking Fan

Primate Studies
— Where BTF's Members Investigate the Grand Old Game

Friday, November 15, 2002

Hot and Cold Streaks

Fact or fiction?

We are all aware of how much stock baseball announcers, commentators, fans, managers, coaches, and players put in the significance of a player (or team) being “hot” or “cold.” You cannot watch or listen to a game without the commentator at some point mentioning that “so-and-so is red hot or ice cold,” presumably referring to the fact that said player has recently (e.g., the last game, week or month) had a very good or very bad spate of performance, measured by whatever statistic is convenient for the commentator or the fan who is listening, or in some cases, no particular statistic at all.

While there is no doubt that players and teams go through short (or less frequently, long) periods of time where their performance is well above or below “average,” the question that this article attempts to answer is “Is there any predictive value to hot and cold streaks, as measured by BA, OBP, SA, and OPS?” Another way to couch this question is “Are hot and cold streaks solely a function of normal statistical fluctuation, or are they a function of statistical fluctuation and a temporary change in a player's ability to perform offensively, which we expect to continue for some, probably short, period of time?”

It is unlikely that these casual fans and baseball pundits are infatuated with hot and cold streaks simply for the sake of pointing out what has happened in the past. To mention that “so-and-so has hit over .400 in his last 10 games,” with no implication that this has anything to do with how we expect him to hit in the near future, would be banal. The fact is that when most people talk about hot and cold streaks, they hold the belief that these streaks have significant predictive value, i.e., that a player who has been hot is expected to hit better than normal in the future, and vice versa for a player who has been cold.

This belief is evidenced by several things that regularly occur in baseball. Managers will often bench a regular player who has been cold and/or play a reserve player who has recently been hot. Managers will sometimes rearrange their batting order depending upon who is hot and who is cold. Opposing managers will often pitch around or intentionally walk a hot player and pitch “right at” a cold player. Managers will sometimes play “small ball” when their team has been cold, and eschew such an approach when their team has been hot. How often have we heard an announcer say “so-and-so (team) is bunting in the first inning because they have had trouble scoring runs lately?” The clear implication is that because the team has been cold (hitting-wise) in the past few games or weeks, they are more likely to score fewer runs than normal in the not too distant future, including in the current game. Finally, fans and handicappers regularly assess the current “value” of a team depending upon whether their players, as a whole, have been hot or cold in the recent past.

The study described in this article looks at what happens exactly one day immediately following a 2-week hot or cold streak and, independently, what happens over a 7-day period immediately following a 2-week hot or cold streak. The fans and pundits mentioned above would say that players, as a group, will hit significantly better than their own normal level of performance for some period of time following (or “during”) a hot streak, and significantly worse following (or “during”) a cold streak. Since no one would have any idea as to how long this effect should last, either our 1-day or our 7-day snapshot (or both) should capture such an effect, if in fact it does exist. We would logically expect this effect to “peter out” as time goes by.

In response to the following potential criticism - “You are looking at a 1 or 7-day period after the streak has been identified, therefore the streak may be over” - remember that in real life, a manager, commentator, or fan is always looking at a point in time that is subsequent to the streak (the streak is always in the past). While they have no idea whether the streak is over or not, their assumption is that there is still something left to the streak, at least for some finite and probably short period of time (until the batter returns to normalcy, no doubt). The study described herein tries to duplicate this scenario by looking at games (1 or 7 days after a streak) wherein a player is essentially “in the midst” of a streak, i.e., he has just had 2 weeks of hot or cold hitting.

Again, most managers, players, fans, and commentators would expect these players to hit significantly better or worse than their “true” average during this game or over the next 7 games. Let's see if this is true, and if it is, to what extent.

(Since we would expect that the results of the 1-game study would be more telling than that of the 7-game study, keep in mind that the primary reason for including the 7-game data is to get a larger sample size in order to reduce the standard error of our results.)


The study encompasses all major league play from 1998 to 2001 (4 years). I arbitrarily divided all of the data into 2-week periods. The first and last 2 weeks (more or less) of each month constitute a 2-week period. Since there are around 6 months in a season (April through September), there are twelve 2-week periods in each season.
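The partitioning above can be sketched as follows (a minimal illustration, assuming a split at the 15th of each month, since the article only says the halves are "more or less" two weeks):

```python
from datetime import date, timedelta

def two_week_period(d: date) -> tuple[int, int]:
    """Assign a game date to a 2-week 'mini-season': the first half of the
    month (days 1-15) or the second half (day 16 on)."""
    return (d.month, 0 if d.day <= 15 else 1)

# April 1 through September 30 yields the twelve 2-week periods per season
days = [date(2001, 4, 1) + timedelta(days=i) for i in range(183)]
periods = {two_week_period(d) for d in days}
print(len(periods))  # -> 12
```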

Next I culled all those 2-week “mini-seasons” in which a player was either hot or cold during that 2-week period. (Remember, I am culling a player's stats for the 2 weeks, not just his name.) Any given player could have more than one 2-week period selected. In fact, as you would expect, many players, especially the very good and very bad ones, had several 2-week periods selected. Also, many players' stats are included in the hot as well as the cold group (they were hot in one period and cold in another).

What were the criteria for being hot or cold over any 2-week period? First, OPS (on base percentage plus slugging average) was the stat of choice for determining whether a player was hot or cold. OPS was chosen for no particular reason other than that it is easily computed, it is recognizable, and it fairly accurately represents a player's overall offensive production. Batting average, on base percentage, slugging average, or any one of a number of stats could have been used as well. The overall results should be around the same regardless of the stat used.

I did include the following stats in the results portion of the study, although, as mentioned above, the only stat used in determining the hot and cold groups at the outset was OPS:

BA, OBP, SA, and OPS.

All four of these stats were computed in the traditional way, except OBP. For purposes of this study, OBP is (hits plus non-intentional walks plus HBP) divided by (ABs plus SFs plus non-intentional walks). Traditional OBP includes IBBs and sometimes includes SFs. I do not include IBBs. Obviously, the results of this study will not be affected either way.
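A minimal sketch of the study's OBP definition (the argument names here are my own shorthand, not from the article):

```python
def study_obp(h, bb, ibb, hbp, ab, sf):
    """OBP as defined for this study: intentional walks are removed from
    both the numerator and the denominator; sac flies stay in the denominator."""
    nibb = bb - ibb  # non-intentional walks
    return (h + nibb + hbp) / (ab + sf + nibb)

# e.g., 30 hits, 12 walks (2 intentional), 1 HBP, 100 AB, 2 SF:
print(round(study_obp(30, 12, 2, 1, 100, 2), 3))  # -> 0.366
```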

Getting back to the selection criteria for culling hot or cold 2-week periods, I chose an OPS above which a “2-week player period” went into the hot group and an OPS below which a “2-week player period” went into the cold group. A player must have had at least 30 ABs in order to qualify for a group. The hot and cold OPS cutoffs were chosen such that around 10% of all “2-week player periods” (with at least 30 ABs) in any given year went into the hot group, 10% into the cold group, and 80% into neither group. The exact minimum and maximum OPS depended upon the year, but a typical minimum 2-week OPS for admission into the hot group was 1.100 and a typical maximum 2-week OPS for the cold group was .525.
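The cutoff selection just described amounts to a simple percentile calculation, which might be sketched like this (a hypothetical implementation, computed once per year):

```python
def group_cutoffs(period_stats, min_ab=30):
    """period_stats: (ab, ops) pairs, one per 2-week player-period in a year.
    Returns (cold_max, hot_min) so that roughly the bottom 10% of qualifying
    periods fall at or below cold_max and the top 10% at or above hot_min."""
    qualified = sorted(ops for ab, ops in period_stats if ab >= min_ab)
    n = len(qualified)
    return qualified[n // 10], qualified[n - n // 10 - 1]
```

With real data the article reports the resulting cutoffs land near .525 and 1.100 in a typical year.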

I chose to use absolute OPS maximums and minimums for the group selection criteria rather than a “relative to a player's average OPS” criterion for the following reason: I wanted to mimic real life as much as practicable. We usually don't identify a hot or cold streak by how a player has performed relative to his norm; it is usually defined by whether a player has been hitting well or not, relative to a more or less average player. For example, if Rey Ordonez hits .280 (in BA) over a 10-day period, we normally don't characterize him as hot. Similarly, if Bonds is 10 for his last 32 (.312), we probably don't say that he is cold, even though .312 is substantially less than his 2002 BA of .370.

In other words, if a player has greater than a 1.100 OPS for 2-weeks, most people would consider him hot, no matter who he was. Similarly, if a player has a 2-week OPS of less than .525, he is usually considered cold, even if he is the original Mendoza.

In any case, because the selection thresholds for the hot and cold groups are fairly high and low, respectively, and the mean OPS within each group is even more extreme, we should still capture most of the legitimate hot and cold streaks (significant deviations from a player's normal OPS), even though the hot group will obviously tend to contain a higher percentage of good players and the cold group a higher percentage of poor players. Nor should our groups be significantly “polluted” by extreme players, like Bonds or Rey Ordonez, performing at around their normal level.

Once the hot and cold 2-week periods were selected or identified, I looked at the hitting results of each player in one game (actually one day) immediately following the hot or cold streak and, separately, in the 7 games (actually 7 days) immediately following the streak.

Thus, 6 groups were formed:


  1. The hot group, consisting of the combined 2-week stats of all players who were hot (see the above criteria) for those 2-weeks.
  2. Same as above for the cold players.
  3. Combined stats 1 day immediately following the hot periods.
  4. Combined stats during the 7 days immediately following the hot periods.
  5. Combined stats 1 day immediately following the cold periods.
  6. Combined stats during the 7 days immediately following the cold periods.


In a recent post on Clutch Hits on Primer, a helpful, if somewhat overzealous, critic suggested that I adjust for the park and the opposing pitcher in all of the above stats. (Actually, he said, and I paraphrase, “Without accounting for park and pitcher, your study is completely worthless!”) His harsh criticism notwithstanding, I did indeed adjust all of the 1-day and 7-day stats for park and opposing pitcher. This was done on a PA-by-PA basis using component 3-year regressed park factors and 1-year component “opposing pitcher factors”.

In order to draw any conclusions from this study, we need to compare the 1 and 7-day stats (BA, OBP, SA, and OPS) to those same players' combined “true” stats, which I also call their “expected” stats. Originally, I used a player's full-season stats in the same year as the hot or cold streak as a proxy for his “true” stats. This did not work out well. The combined full-season stats of the players in the hot group ended up being substantially higher than their previous or subsequent year's stats, and the combined full-season stats of the players in the cold group ended up being lower than their subsequent or previous year's stats. (Actually, in the first draft of the study, I used a player's full-season stats not counting the 2-week streak period - that ended up understating his “true” stats.)

This is because the players in the hot group were not only better-than-average players, but were also, as a group, slightly lucky in the entire year in which their streaks occurred. The same was true in reverse for the players in the cold group. Here?s why:

Anytime we select a player or a group of players based upon a criterion that is above (or below) average, we are automatically, and by definition, selecting a lucky (or unlucky) group of players. Even two weeks during an entire season of hot or cold play will mathematically imply that a player has been slightly lucky or unlucky for the entire season.

In the final draft, I ended up using the average of the previous and subsequent year's stats to represent a player's “true” stats. As in the original draft, in order to calculate each group's (hot or cold) expected (“true”) OPS, I weighted the average of each player's subsequent and prior year's OPS by the number of ABs that each player had in their respective group (hot or cold). In other words, if Jeff Kent had 64 ABs in the hot group (one 2-week streak) in a certain year, Tejada had 133 (two hot streaks), and Renteria had 55 (one streak), the average of each player's previous and subsequent year's OPS would be weighted by 64, 133, and 55, respectively, when calculating the entire hot group's expected OPS for that year.

Suffice it to say that the expected OPS of the hot and cold groups, based upon the previous and subsequent year's OPS for each player in each of the groups, is exactly what we expect each group to hit at any point in time, including right after (“during”) a streak, if, in fact, streaks have no predictive value.

The pundits, on the other hand, would theorize that the hot group would have an OPS (and the other stats) significantly greater than their expected OPS, following the hot streak, while the cold group would have an OPS significantly less than their normal OPS, following the cold streak. How much more or less is not clear.

They might also hypothesize that if there is a significant difference between the actual and expected stats immediately following a streak, the difference will be greater in the 1-day stats than in the 7-day stats, as by definition, a hot or cold streak must dissipate over time.

What would be the rationale for such a belief? Perhaps hot players “see the ball better, are ‘in the zone’, and are mentally and physically at the top of their game,” whereas cold players “are mentally and/or physically unfit, out of sync, pressing, and are not seeing the ball well.”

My hypothesis differs from that of the pundits. I expect the hot group to be slightly healthier than the cold group and the cold group to be slightly more injured than the hot group. Other than that, I would not expect there to be any differences between the two groups as far as post-streak hitting is concerned, other than as a result of the differences between their “true” levels of performance. In other words, I expect the hot and cold groups to return almost exactly to their true level of performance during both the 1 and the 7-day time periods.

The reason I do not expect each group to return exactly to their expected level of performance is two-fold:

One, as I said, the day or days immediately following a hot streak and the hot period itself will tend to be a player?s healthiest time of the year, whereas the cold periods and the days following them will tend to be a player?s least healthy time of the year. In other words, a cold streak slightly suggests that a player is injured and a hot streak slightly (less so) suggests that a player is healthy.

Two, the hot and cold streaks are aptly named. A hot streak will tend to be during hot (temperature-wise) periods (in outdoor stadiums of course), while a cold streak will tend to be during a cold period. The weather during the 1 and 7-day periods and their corresponding streak periods will tend to be similar, since the former occurs immediately after the latter. In fact, if I repeat the study and control for weather, I suspect that we might see any differences between the expected and actual results in the 1 and 7-day periods shrink or even disappear.

Some people might characterize the 2-week hot and cold periods as suspect because they are chosen before looking at the content of each period (i.e., before the fact). In other words, while a player in the study may have had a certain high or low OPS within a certain 2-week period, the hot and cold streak may have ended before the 2-week period was up. This is true and it is a valid criticism. A better methodology would have been to choose a starting and ending point (actually, the starting point is not important) after the fact, such that the player in the study is always “in the middle” of the streak (i.e., the streak has definitely not ended yet) when we look at the 1 and 7-day stats. This would almost exactly mimic how we perceive and identify hot and cold players and streaks in real life.

In defense of the present methodology, while my “before-the-fact” 2-week periods may contain some streaks that have already “ended” (Does 1 bad day constitute the “end” of the streak? How about 2 bad days?), they should contain plenty of streaks that have continued more or less to the end of the 2-week period, such that the results should enable us to reach a reliable conclusion one way or another.


                                   BA    OBP   SLG    OPS
During 2-week hot streak:         .383  .458  .754  1.210
1 day after hot streak:           .302  .385  .537   .922
During 7 days after hot streak:   .296  .373  .528   .901
Expected (prev. and sub. years):  .286  .365  .505   .870

During 2-week cold streak:        .165  .222  .218   .440
1 day after cold streak:          .263  .319  .418   .737
During 7 days after cold streak:  .264  .328  .408   .736
Expected (prev. and sub. years):  .267  .328  .410   .738

Analysis and Conclusion:

Looking at OPS only, we see a significant tendency for a hot player to remain hot 1 day after a hot streak, as compared to his normal or expected OPS (.922 to .870). We see the same (significant) tendency over the 7-day period (.901 to .870), although this tendency has dissipated.

The cold group does not exhibit the same tendency. That is, after a cold streak, players tend to hit about the same as their normal or expected OPS, both 1 day after (.737 to .738), and during the 7 days following (.736 to .738), the cold streak.

The standard error of the 1-day OPS is around 15 OPS points (around 2300 ABs) and the standard error of the 7-day OPS is around 5 points (14,500 ABs), so the differences between the 1 and 7-day OPS and the expected OPS for the hot group (see the table above) are statistically significant.
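Using the article's reported standard errors (taken as given, not recomputed here), the rough z-scores are easy to work out:

```python
def z_score(actual, expected, se):
    """How many standard errors the observed group OPS sits from expectation."""
    return (actual - expected) / se

print(round(z_score(.922, .870, .015), 1))  # hot group, 1 day after: ~3.5 SD
print(round(z_score(.901, .870, .005), 1))  # hot group, 7 days after: ~6.2 SD
print(round(z_score(.737, .738, .015), 1))  # cold group, 1 day after: ~-0.1 SD
```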

The weather and injury factors (see explanations above) may explain some of the results, although I have no idea why significant differences between expected and actual OPS are seen in the hot group but not in the cold group. (The average temperature for all outdoor parks 1 day after a hot streak was 73.6 degrees, while the average temperature 1 day after a cold streak was 72.6. In calculating the average temperatures, I did not control for venue - i.e., more hot streaks may have occurred in warmer venues).

In conclusion, there does appear to be some predictive value to a hot streak, but not to a cold streak. Pending further research or explanation, and based upon the results of this study, it is not clear whether typical measures employed by managers, such as benching an otherwise good player, pitching differently to batters depending upon whether they are hot or cold, or rearranging a lineup, are justified.

Mitchel Lichtman | Posted: November 15, 2002 at 05:00 AM | 31 comment(s)

Reader Comments and Retorts



   1. Mikαεl Posted: November 15, 2002 at 01:04 AM (#607238)
Fascinating study, MGL. Thank you.

I was thinking about your "injury" hypothesis. Specifically, I wonder if the difference between a "hot" player and a "healthy" player is not as cut-and-dried as it initially seems. A "hot" player is someone functioning at or near the top of their physiological capabilities. Anyone functioning below that highest level could be considered "injured" - mentally or physically, they are unable to perform at their best.
   2. tangotiger Posted: November 15, 2002 at 01:04 AM (#607239)
I agree, great study, and well-written. I especially like that you answer the questions I had while reading! (Let no stone be left unturned.)

I was surprised by this comment in particular
Even two weeks during an entire season of hot or cold play will mathematically imply that a player has been slightly lucky or unlucky for the entire season.

The only justification that I can think of for this to be true is the "health" aspect. That is, players, on average, will perform to say 80% of their potential. But, by being in the hot group, we automatically know that this player MAY be healthy, and maybe they should be expected to perform to 90% of their potential throughout the year.

Can you also present the "same-year" OPS for the two groups (excluding their streak periods)?

As for the cold group, there is another consideration. A player, given that he has played for 2 weeks (to meet your minimum qualifications of 30 PAs I think), as a given is healthy to some degree. Therefore, it is more likely that this group was just unlucky. If they were really injured, then they might not even show up in your groups.

Good stuff!
   3. Warren Posted: November 15, 2002 at 01:04 AM (#607240)
I'm a bit confused by the "lucky" comment as well - I'm unclear why the current season (minus the streak) would tend to have similar "luckiness" as the streak itself.
   4. tangotiger Posted: November 15, 2002 at 01:04 AM (#607243)
In order to dismiss the idea of a predictable hot streak, you must at least consider the possibility that the day after figure of .922 comes not from 100% of the players distributed about .922, but from 85% of the players (those whose streaks were random) distributed about .870 and 15% (those whose streaks were real) distributed about 1.210.

This is an important piece from the last poster.

If this is true, then we would expect to see the distribution of the players skewed to the right. That is, if you have your normal distribution around .870 and another normal distribution around 1.210, then we would see more players on the extreme right hand side of the curve (say more than 3 SD), than on the left-hand side. I would be shocked if this is the case, and perhaps Mitchel can present this data as well.
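The mixture arithmetic in that hypothesis is easy to check; a blend of 85% of players at the expected .870 and 15% still at the streak-level 1.210 lands almost exactly on the observed .922:

```python
# 85% of post-hot-streak performance centered on the true .870, 15% still
# at the streak level of 1.210 -> blended mean
mix_mean = 0.85 * 0.870 + 0.15 * 1.210
print(round(mix_mean, 3))  # -> 0.921, vs. the observed .922
```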
   5. emancip8d Posted: November 15, 2002 at 01:04 AM (#607244)
Thanks MGL for sharing the results of your interesting study.

1) One aspect of the research was whether or not hot/cold streaks are short term performance predictors. However, in calculating the player's true OPS you used year (i+1) OPS, which obviously is not available at the end of the 2-week streak. Thinking about this helped me formulate this (strained) hypothesis: Streak effects that are stronger than what is shown by this research could be masked by cold streak players being on a career descent (on average) and hot streak players being on a career ascent (on average). For the cold streaks, my proposed masking effect is produced by the player declining in OPS in year (i+1), thereby decreasing the estimated true OPS ability that the post cold streak is being compared to. (just the opposite for hot streaks) Was there any correlation between hot/cold streaks and the change in OPS from year (i-1) to year (i+1)?

2) Might there be some selection bias involved with the cold streak analysis? I would think injured players going on the DL, and perhaps managers correctly identifying someone who is having a mechanical problem, would lead to a cold-streak player more likely sitting during the 1-day after period and/or having fewer PA (less weight) over the 7-day period than players who fell into the cold streak due to normal statistical fluctuation. Would comparing the post-cold streak to the post-hot streak results give you any information? For example, what was the ratio of 1-day AB after a cold streak to 1-day AB after a hot streak? 7-day AB ratio?

3) I think an important question is whether this methodology captured a manager's concept of hot/cold streaks. I think you could get at that question by looking at the ratio of (AB per cold streak) to (AB per hot streak). Although I do not dispute the claim, in the first part of the article a claim is made that managers sit players down in cold streaks. If that is true, then (AB/cold streak) / (AB/hot streak) will be significantly less than 1, especially given that the players in the identified cold streaks might also be injured. If the ratio is very close to 1, then I would think that this methodology has misidentified the streaks. (Although with the minimum AB requirement the ratio probably couldn't get below 0.5?)

4) Some people may make the equivalent of one of the "clutch hitter" arguments, that only certain hitters are streaky. Given that there were about 2300 AB in the 1-day period following a 2-week hot streak, I would guess that there were about 650 hot streaks. A single player could theoretically have 12*4=48 of the ~650 hot streaks. How many *different* players did it take to make up 10% of the hot streaks? 25% of the hot streaks? 50%? For cold streaks?

5) Given a hot or cold streak in "2-week period" i, what are the actual probabilities of having either type of streak during "2-week period" i+1?

6) It seems a little inconsistent that you didn't use relative OPS to determine the streaks since you say that's not how managers do it, yet use OPS as the metric which isn't how managers would do it. How do most managers decide if a player is hot/cold? BA? SO? HR for power hitters? Since you have the PBP data, perhaps a better metric would be line drive%? Most likely it doesn't matter too much though to the analysis, but I'm curious about the best way of capturing the common manager view of a streak using available data.

7) Why did you use AB for minimum requirements and weightings instead of PA if you were using OPS as the metric?

   6. emancip8d Posted: November 15, 2002 at 01:04 AM (#607245)
To clarify my question #1, do the short term streak effects increase when you make an estimate of true OPS using information only available thru the end of the streak?
   7. MGL Posted: November 15, 2002 at 01:04 AM (#607247)
Here is a response I e-mailed to Tango in response to his comments and concerns (BTW, how in the heck do you guys italicize quotes? - I tried , but it doesn't work):


Thanks. Here are some reasons (explanations) why I made the comment you quoted:

We KNOW that if we deliberately choose any above-average player or players for an entire season, on the average, this player or players is in fact, a true above-average player AND he got lucky. The fact that his true ability is less than his sample ability tells us that he (or they) got lucky. That's the essence of regression and that's what you captured in your Banner Year study, right? Anytime we select or simply look at a group of players (or one player) from an average population and find out that their sample stats are NOT average, it is a given that they were a little lucky or unlucky and that their true ability (which can be estimated from a sample from any other time period) is more or less than their sample ability, depending upon whether their sample ability was above or below average.
If that is true (which it most definitely is), then what about a player (or players) who was above average for 99% of the season? The same must be true - their full-season stats will be above average and they will be a lucky (and good of course) group since we chose to look at them based on the fact that they were above average for 99% of the season. Surely selecting a group of players who were, say, substantially above average for the season (like you essentially did in your BY study) and selecting a group of players who were above-average for, say, the entire season other than the last week (say we didn't even look at the last week) must yield the same thing (a group of good AND lucky players). Follow this down its logical path. Say we select players based on the fact that the first 6 months were above average. Will this group have been lucky for the entire season including the last 6 months? Sure. What about the first month? First (or any) 2 weeks?

As you often like to frame it, all players in a season are made up of 10% lucky, 80% around average, and 10% unlucky (obviously it is a continuum in reality) and all different abilities. If from that group, we select players who had a 2-week hot streak sometime during the season, we are effectively eliminating some portion of the 10% of players who were unlucky for the whole season, since those unlucky players would be more likely to not have ANY hot streaks. I wrote some programs on my computer whereby I set up a group of 100 players all with different true BA's (.200 to .299). I then simulated a season with these 100 players (500 AB's each), using each player's true BA to determine the result of an AB (e.g., if a player's true BA was .232, I generate a random integer between 1 and 1000; if it is 1-232, the batter gets a hit; if it is 233 to 1000, the batter makes an out). I then ran 1,000 seasons. Within each season I broke down each player's 500 AB's into 20 groups of 25 AB's per group, similar to what I did in the streaks study. I then looked at what happens when players had hot and cold streaks within those 25 AB periods...

Here's what I did and here's what happens:

The mean true BA for the 100 players is of course .2505 (average of .200, .201, .202,...., .299).

I ran 1000 seasons of 500 ABs for each player. I looked at any player who had at least one "hot streak" in a season. A hot streak was defined as a BA over .350 in one of the (20) 25-AB subsets in a season. Almost all players had at least one hot streak in every season. Actually, out of 100,000 player seasons (100 players/1000 seasons), 91,094 had at least one hot streak. So far this is similar to the "hot group" in my study.

The true average BA for those players (91,094 batter seasons) who had at least one hot streak in a season was .2535. IOW, these are better than average players as a group. IOW, players who have hot streaks (any kind of hot streaks we want to define - it doesn't matter) tend to be above-average players to start with. We pretty much suspected that.

What was the average sample BA for the "hot" players, including the hot periods? It was .2554, slightly higher (2 BA points) than their true or expected BA of .2535. That's why I said that any group of players who have any kind of hot streak (even 2 weeks during a season) will have a full-season sample BA above their true BA - i.e., they got lucky during the year of the batting streak!

Now obviously, the "luck" is only contained in that batting streak. Maybe that's where some of the confusion lies. One poster thought that I meant that they got lucky during the rest of the year as well. Of course, that is not true. One thing has nothing to do with the other. So when I say that players who are identified by virtue of having some kind of a hot streak in any given year automatically got lucky that whole year, it is only because they were lucky (by definition) during the hot streak that we can say that they were lucky for the whole year.

Now, what happens if we look at the whole year for these players, not including the hot period? I originally thought that the average sample BA for this time period (everything but the hot period) would be equal to their true BA. This turned out to NOT be true, which is why I changed the way I estimated the hot and cold players' true BA's (using the previous and subsequent year's stats).

In the simulation program I wrote, the average BA, not including the 25 AB hot period (BTW, since the definition of a season with a hot streak was at least one 25 AB hot period, when I say "the 25 AB hot period" I am arbitrarily using the last hot period in a season, since there may be more than one), was .2482, around 5 BA points less than their true BA! The reason for this phenomenon is that we are selectively looking at a period of time that does not include one particular hot 25 AB period (or in the case of my Primer study, all 15-day hot streaks), so obviously the rest of the year will tend to be "cold" periods. Using the previous or subsequent year's BA is OK as a measure of the hot players' true BA, because those years are independent of the hot streaks, are NOT selectively sampled, and are, in fact, a random sample of BA for players who happened to have some hot streaks in some other year.
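This selection effect is easy to reproduce in a quick Monte Carlo sketch. Everything here is my own illustration - the .400-over-25-AB "hot" cutoff, the 550 AB season, and the 10,000 simulated seasons are assumed parameters, not MGL's actual ones:

```python
import random

random.seed(1)
TRUE_BA = 0.2535    # true BA from the discussion above
SEASON_AB = 550
WINDOW = 25
HOT = 0.400         # assumed threshold: a 25 AB stretch at .400+ counts as "hot"

def last_hot_window(season):
    """Start index of the LAST 25-AB window at .400+, or None if no hot streak."""
    last = None
    hits = sum(season[:WINDOW])
    for i in range(SEASON_AB - WINDOW + 1):
        if i > 0:
            hits += season[i + WINDOW - 1] - season[i - 1]   # slide the window
        if hits / WINDOW >= HOT:
            last = i
    return last

full, rest = [], []
for _ in range(10_000):
    season = [1 if random.random() < TRUE_BA else 0 for _ in range(SEASON_AB)]
    i = last_hot_window(season)
    if i is None:
        continue                      # keep only seasons containing a hot streak
    full.append(sum(season) / SEASON_AB)
    outside = season[:i] + season[i + WINDOW:]
    rest.append(sum(outside) / len(outside))

ba_full = sum(full) / len(full)
ba_rest = sum(rest) / len(rest)
print(f"full season, hot-streak seasons only: {ba_full:.4f}")   # above .2535
print(f"same seasons, hot window removed:     {ba_rest:.4f}")   # below .2535
```

With these (assumed) parameters, the direction matches what MGL describes: selecting seasons for containing a hot streak pushes the full-season average above the true rate, while the same seasons with the hot window carved out fall below it.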

For my Primer study, here are the full-season (during the streak seasons) stats for the hot and cold players, both including the streak periods and NOT including the streak periods (all in OPS), weighted by the number of AB for each player in the 7-day streak periods:

Hot players

Full-season OPS, including streak periods: .917
Compare this to those same players' previous and subsequent year's OPS of .870, which is presumably a good estimate of true OPS for the streak season.
Full-season OPS, NOT including streak periods: .885.

Cold players

Full-season OPS, including streak periods: .716
Compare this to those same players' previous and subsequent year's OPS of .738.
Full-season OPS, NOT including streak periods: .746.

I don't know why the full-season OPS NOT including the hot streaks is greater than the previous and subsequent year's OPS. This should NOT be the case, as was shown in my simulation, unless for some reason the "hot players" in any given year are somehow truly "hot" (have a higher true OPS) for the whole year, while this does not seem to be true for the cold players...

   8. MGL Posted: November 15, 2002 at 01:04 AM (#607248)
   9. MGL Posted: November 15, 2002 at 01:04 AM (#607253)
Let me start by responding to Chris R. (Disclaimer: I'm not that thrilled with this study either, but definitely not for the same reasons as you.)

It is a given that "most of the 1.210 OPS" in the hot periods is due to pure, old-fashioned fluctuation. We all know that. If anyone reading the study doesn't know that, they won't comprehend much if anything of the study anyway. Occasionally I try and explain to the more statistically-challenged folk how "You can never get rid of fluctuation - it always exists and there's nothing you can do about it."

If you intend to find that something else, besides fluctuation, is indeed operating (which is often the goal of these kinds of studies), such as, in this case, 'a temporary change in ability', it necessarily must coexist with the fluctuation, and it is the job of the researcher to excise one from the other. This is often not as hard as it may seem. In this study, if there is nothing but fluctuation, the OPS of all streak players will return to normal (within the bounds of sample error of course, which is why we always try and get our sample sizes large enough, without sacrificing too much of the integrity of the study). It is as simple as that.

Now, if there is something else going on, in this case, a change in ability during and after the streak (which eventually peters off), IN ADDITION to random fluctuation, then regardless of how the distributions will look (and you are not wrong in your characterization of the various distributions), one thing will be perfectly evident - the OPS in the 1-day or the 7-day sample will NOT return to "normal" levels; it will remain high for the hot group and low for the cold group, again regardless of what the distribution of individual OPS's in these samples looks like. (Yes, it is true that if nothing but normal fluctuation is going on, players in the hot group will be more or less normally distributed around the .922, and if there is indeed something else going on, as Tango says, the overlapping of the various distributions will make it look like one big right-skewed somewhat normal distribution.) But who cares! The mean OPS in the 1-day and 7-day samples will tell us everything we need to know. We don't care what the distribution looks like (we can probably infer it anyway from the mean)! For the cold group, the entire group returns exactly to its expected mean OPS (plus or minus 10 OPS points)!

That tells us that there are virtually no players in that group who will continue their cold streak! Period!

In the hot group, we don't get that result. We have the hot players continuing to be somewhat hot (beyond 4 SD's or so). There is indeed something else going on. What it is, why, and how we (a manager) could use this in practice, we don't know for sure. Is it partly or solely due to the weather situation I mentioned? Is it partly or solely due to the fact that all of the players in the hot group are, on the average, not injured, as compared to a random 2-week period for any player (IOW, the .922 is really those players' "true OPS when healthy")? Is it because they are pitched to differently (i.e., as if they were Bonds-like, thus less optimally for their true OPS)? Or is it due to the traditional notions we have about a player being hot (in the zone, seeing the ball well, confident, etc.)? More than likely, it is a combination of these and other things not mentioned herein.

One thing is for sure (actually two things, maybe three): There is virtually no evidence that there is any predictive value to being cold. (Given that, BTW, it is surprising that the same is not true of the hot players.) OTOH, there is some evidence that the opposite is true of the hot players, although I hate to use the words "predictive value", as they imply some cause-effect relationship between the hot streak and the 1 and 7-day after stats. Finally, IMO, and it is mostly a matter of semantics, there does not seem to be as strong a predictive effect as some people traditionally think (IOW, if a red hot player, say with an OPS of 1.250 in his last 30 AB's, stepped up to the plate and I told you that according to my study, this guy was likely to have a true OPS of 30 points above his normal true OPS, would you get all that excited?).

Anyway, you essentially say that "If there is such a thing as some players significantly changing their ability level for short periods of time, some players in the 1 and 7-day after group will have an average true OPS of 1.210 or so, with their sample OPS's normally distributed around this, while the players whose hot streaks were merely fluctuations will have an average true OPS of around .886 or so, with their individual sample OPS's distributed normally around that number." First of all, I don't know why this matters. If this were the case, the average OPS of the whole group (those whose hot streaks were a fluctuation AND those whose hot streaks were "legitimate") would be larger than .886! That's the linchpin of the study - the average OPS of the hot and cold groups after the streaks! We know that no matter how the study turns out, and no matter what is going on (there are no legitimate streaks or there are), the "after" data will contain players who have merely fluctuated during the streaks. If nothing is going on, the "after" data will contain nothing but players who fluctuated during their streaks, and the mean OPS of the "after" data will be .886 or so. If there is something going on, and there are players in the hot group who will continue to be hot, then the "after" data is a combination of players who merely fluctuated and will return to their true OPS of around .886, and those who continue their hotness, who will post an OPS of something higher than .886, so that the overall OPS of the "after" data will be higher than .886 - and that's what we are trying to determine.

There is absolutely nothing wrong with the premise or the execution of the study!

And BTW, in reality of course, if there were indeed something going on with some of the players in the hot groups, other than mere fluctuation, the distribution of OPS's in the 1 or 7-day after data would look something like this:

Some proportion of players (we don't know how many, but most of them) will have an average true OPS of .886 and will distribute normally around that. The other players (the ones who experienced "legitimate" streaks with predictive value) will have an average true OPS of some unknown number, greater than .886, but probably not that much greater, and their sample OPS's during the 1 or 7 days will be normally distributed around that unknown number. The overall distribution, as Tango states correctly, is going to look like a pretty nice normal curve skewed to the right.

I suppose that another way to test the hypothesis of whether streaks are predictive or not would be to look at the skewness of the distribution, as Tango implies might be a valid methodology. However, why do this when the mean tells us everything we need (maybe not "want") to know? And even if the distribution ended up being heavily skewed to the right, we would still need to look at the mean, since if the mean OPS was not higher than the "expected" OPS, we could be sure that there was NO predictive value to the streaks, despite the skewness of the distribution (aren't all baseball stats for the population of ML players skewed towards the lower end anyway, as they are really a slice of the distribution of baseball stats from the general population of male athletes?)...
   10. Warren Posted: November 16, 2002 at 01:04 AM (#607254)
"Full-season OPS, including streak periods: .917 Compare this to those same players' previous and subsequent year's OPS of .870, which is presumably a good estimate of true OPS for the streak season. Full-season OPS, NOT including streak periods: .885."

While both the .885 and .917 values have the problems you discussed, shouldn't those be the lower and upper bounds for the player's "true" OPS for that season? If so, then this might suggest that the 7-day value of .901 is right in line with the players' seasonal OPS, and possibly the 1-day .922 value as well.

The problem I have with most studies (including my own) is that it's human nature to repeatedly refine the study, but stop once you reach the conclusion you expect. The cold streak here is a good example - it's easy to stop and not spend any more time refining the cold streak numbers, since you've reached the conclusion you expected to reach.

Don't take this as a specific personal criticism - it's something everybody does, and I'm not sure there's a good solution to the problem. It's just something to keep in mind - just as you wrote about a player's ability to be lucky, there's a certain level of "luckiness" to these studies. Out of say 20 possible variations of the study, at least one of those is likely to lead you to a conclusion that is the opposite of the other 19. Such is the world of probability :)
   11. MGL Posted: November 16, 2002 at 01:04 AM (#607255)
Warren, very good insight about this kind of research. In statistics it's called "curve fitting" I think. Yes, I tend to do that too! Sometimes when I do research and I'm not "happy" with the results or conclusions, I am even tempted not to publish or share it with anyone! How bad is that?

With this particular study, I really tried to put an "anticipation of the results" in the back of my mind and refine the study as much as possible in order to mimic real life and NOT to come up with a certain result. Because there are so many permutations in the possible methodologies for a study like this, a little bit of "curve fitting" is almost unavoidable. We (particularly I) should keep that in mind.

To answer a few of the questions above:

Using AB as the unit for the "number of trials" for OPS is not correct of course - PA should be used. I only used AB's because it was a remnant in my programs from when I was using BA rather than OPS. I doubt it makes much of a difference.

Also I doubt that using OPS rather than BA, which a manager is more likely to use to identify hot and cold streaks, makes a difference either. I think I mentioned in the study that whether I use BA, OBP, or anything else, it probably won't change the results much if at all. If anyone is interested, I'll post the results in BA, OBP, and SA, as the programs utilize and spit out all of those stats as well. Also, even if managers use BA more than OPS (or whatever a manager's version of OPS is) in identifying streaks (or whatever they use), we are still looking for the answer to the question "Is short-term well-above (or well-below) average hitting production (as measured by OPS for example) predictive of short-term future hitting production?"

Chris R.:

I agree that this is not a novel idea or study. It just occurred to me that I didn't recall any real good studies looking at hot and cold streaks - that's all. I wanted to produce the seminal study on the topic. I think it is close to that, although it is far from perfect, and could easily be refined and improved.

In fact, I am going to redefine the streak periods, such that I go through every player season PA by PA and keep a running total of each player's OPS. When we reach a 2-week (or some number of PA's) period where the OPS is above or below a certain amount, I will call this a hot or cold streak and then start looking at the following PA's (some number) as the "after a streak" period. This is a much better way of defining a streak of course, and is exactly how a streak is defined in real life (a manager is making out the lineup card in his office and he notices that "so-and-so" has been red hot for the last 2 weeks or that "so-and-so" has been ice cold, so perhaps he makes certain assumptions about those players pertaining to their expected results in that day's game).
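A PA-by-PA scan like the one described can be sketched as follows. The window size, the OPS thresholds, and the simplified OBP (ignoring HBP and sac flies) are all my own illustrative choices, not the study's actual cutoffs:

```python
from collections import deque

def find_streaks(pa_log, window=50, hot=1.000, cold=0.550):
    """Flag hot/cold streaks by rolling OPS over the last `window` PA's.
    pa_log: list of (ab, hit, walk, total_bases) tuples, one per PA.
    Returns a list of (pa_index, "hot" | "cold") flags."""
    buf = deque(maxlen=window)          # rolling window of the most recent PA's
    flags = []
    for i, pa in enumerate(pa_log):
        buf.append(pa)
        if len(buf) < window:
            continue                    # not enough PA's yet for a full window
        ab = sum(x[0] for x in buf)
        if ab == 0:
            continue
        h = sum(x[1] for x in buf)
        bb = sum(x[2] for x in buf)
        tb = sum(x[3] for x in buf)
        ops = (h + bb) / (ab + bb) + tb / ab   # simplified OBP + SLG
        if ops >= hot:
            flags.append((i, "hot"))
        elif ops <= cold:
            flags.append((i, "cold"))
    return flags

# tiny check: four straight homers trip the "hot" flag at the fourth PA
print(find_streaks([(1, 1, 0, 4)] * 4, window=4))  # [(3, 'hot')]
```

In a real run you would collapse consecutive flags into single streak episodes and then collect the following PA's as the "after a streak" sample.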

I will look at (and post) the previous year's and the next year's OPS separately. I used the average of both years as an estimate of the players' true OPS for several reasons: 1) larger sample size; 2) it helps to counteract any age or injury things going on in either group; 3) it helps to counteract any environmental differences between year x-1 and year x.

The dice example is a great example of exactly what would be going on if there were no such thing as a "real" hot or cold streak (fair dice) and if there were some players who had real hot or cold streaks (introduction of some loaded dice). The "answer" is, of course, that if we introduce a few loaded dice, the average value of the dice after the streak will no longer be 3.5 - it will be higher or lower (plus or minus of course, due to sample error), no matter how many loaded dice we introduce. Even if we introduce only one loaded die (one truly hot player), the average value of all the dice in the "after the streak" rolls would be ever so slightly higher than 3.5. That was my point about not caring about the "distributions" of OPS in the "after the streak" data and only caring about the average OPS in that same data. If the average OPS in the "after" data equals the true OPS (which, unlike with the dice, we all concede is fairly hard to determine precisely) of each group (3.5), then we have no real hot and cold streaks (no loaded dice). If the average OPS in the "after" data is (statistically) significantly less than or greater than the expected OPS of that group (hot or cold), then we have some (how many?) true streaks somewhere in there (some loaded dice have been introduced).
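The loaded-dice thought experiment can be simulated directly. The loading scheme (a die that shows 6 twice as often), the 4.5 "hot" cutoff, and the 20-in-1000 mix are arbitrary choices of mine, just to show the mean shifting:

```python
import random

random.seed(7)
FAIR = [1, 2, 3, 4, 5, 6]
LOADED = [1, 2, 3, 4, 5, 6, 6]   # mean ~3.86 instead of 3.5
N_DICE, N_LOADED = 1000, 20      # 20 loaded dice hidden among 1000
STREAK, AFTER, CUTOFF = 10, 10, 4.5

after_rolls = []
for _ in range(500):             # repeat the experiment to average out noise
    for d in range(N_DICE):
        faces = LOADED if d < N_LOADED else FAIR
        streak = sum(random.choice(faces) for _ in range(STREAK)) / STREAK
        if streak >= CUTOFF:     # this die just had a "hot streak"
            after_rolls.append(
                sum(random.choice(faces) for _ in range(AFTER)) / AFTER)

mean_after = sum(after_rolls) / len(after_rolls)
print(f"mean roll after a hot streak: {mean_after:.3f}")  # a bit above 3.5
```

Most of the dice flagged as hot are fair ones that merely fluctuated, yet the post-streak mean still sits detectably above 3.5; with zero loaded dice it would settle right at 3.5, which is exactly the logic of checking the group mean rather than the shape of the distribution.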

Of course, we can't identify which dice are loaded if we find that the average roll after the streak period is NOT 3.5! Not possible! Nor could we identify which players and/or streaks were real (i.e., had predictive value) if the post-streak OPS were greater or less than the expected (true) OPS of that group of players! Also not possible! We can only determine (within the confines of sample error) whether or not there were any real streaks (loaded dice) or not. If there were, then anytime a player has just had a streak, we have to assume that there was a "tendency" for it to be a real streak!

In the study, we think we found that there were some real streaks in the hot group and no real streaks in the cold group - that's all we know! We have no idea which streaks in the hot group were real and which were due to fluctuation. For any given streak, it is overwhelmingly more likely that it was fluctuation, but we also have to assume that there is a slight tendency for it to be a real streak, therefore we have to assume that when a player has been hot for 2 weeks, that during his next 7 days or so (maybe more) he will hit slightly (30 points?) better than his expected OPS, as measured by some 2-year period independent of the current year.

Now because of the limitations in sampling statistics (sampling error), we can never say for sure that there aren't at least a few instances of "real" streaks even if the results indicate that there are not (even if the average roll after the streak was 3.5, there could easily be 1 or 2 loaded dice out of 1000). Unfortunately, there is no way of ever identifying those few loaded dice. No way! Same thing for "clutch hitting". We don't find any evidence of clutch hitting. All clutch hitters one year return to their normal hitting in clutch situations the next year (same exact methodology, BTW, as this one), do they not? Because of sampling error, we have no way of knowing if there are not a few "real" clutch hitters in a sea of phony ones. And if G-d came down and told us that there were, we would have no way of identifying them.

BTW, Chris R., you have some good comments. Why the hostility toward this study? Are you a frustrated researcher?

MGL's (non-baseball or statistics) rule #23 (I don't have that many non-baseball or stats rules):

Unless you can at least duplicate someone else's effort, be careful about how much you criticize that effort!
   12. MGL Posted: November 16, 2002 at 01:04 AM (#607256)
Warren, I'm not sure why you think that the full-season OPS including the streak periods and the full-season OPS not including the streak periods should be the "upper and lower bounds" of each group's average true OPS. Why would this be so, and if there were some justification, why would this be better than, say, the previous and subsequent year's OPS?

BTW, the breakdown for the previous and subsequent year's OPS for the hot and cold groups, weighted by the number of AB's (sorry it's not PA's) each player has in the 7-day after period, is:

Hot group

Previous year's OPS: .867
Subsequent year's: .868

Cold group

Previous year's OPS: .748
Subsequent year's: .724

BTW, all players had an OPS of .763 in previous years (97-00) and .765 in subsequent years (99-02).

It is interesting that the subsequent year's OPS for the cold players went down so precipitously, whereas the hot players stayed almost exactly the same. Were the cold players as a group indeed on the decline career-wise? Were they injured and then tended to be less than healthy in year x+1? You would think that the group would have lost the worst/most-injured/oldest players in year x+1, such that the OPS in year x+1 would have been higher than in year x-1. Hmmm...

Here is a correction for the full-season OPS, not including the streak periods:

Hot group:
.819 (rather than .885)

Cold group:
.780 (rather than .746)

That's more like it! Sorry!


There were 511 distinct players in the hot group with an average number of AB's of 76.7 per player (remember that some players had more than one 2-week streak period).

In the one day after a hot streak only 433 (85%) remained. I don't know why; perhaps some of them had no game (I looked at the next day following the streak - not the next game) and they had 5.8 AB's per player.

In the 7 days following a hot streak, 477 (93%) players remained, with 32.9 AB's per player.

There were 588 distinct players in the cold group with an average number of AB's of 54.6 per player.

In the one day after a cold streak only 464 (79%) remained and had 4.5 PA per player. I guess more of these players were benched or had injuries and didn't play.

In the 7 days following a cold streak, 572 (97%) players remained, with 22.9 AB's per player. I don't know why a higher percentage of the cold players were "left" after 7 days. It could be chance.

The reason for the higher number of AB's per player across the board in the hot group (better hitters) is probably because they bat higher in the BO and the cold group (worse hitters) tends to get pinch hit for more.
   13. MGL Posted: November 16, 2002 at 01:04 AM (#607257)
In re-reading my hypothesis and conclusions, I think they were a little unclear and wishy-washy.

My hypothesis was:

Any measures taken by managers which take into consideration a player's hot or cold status, as measured by OPS, are unjustified, as a player's 2-week hot or cold streak has no predictive value beyond that which can be expected from a normal projection model, vis-a-vis his performance both 1 day and for 7 days after the streak periods, again as measured by OPS.


The above hypothesis is true for players who have had a 2-week "cold streak", as roughly defined in the study. However, some adjustments by a manager based upon a player being or having been in a 2-week hot streak, also as defined in the study, are justified, as a 2-week hot streak is a predictor of continued above-average performance (30 to 50 OPS points above "normal") for at least 7 days after a streak. Also, above-average performance after a streak is greater immediately after (1 day) the hot streak and appears to dissipate with time. How much longer the above-average performance continues would be interesting to know, but is not within the scope of this study.

I doubt that Chris R. will like it, but I am very happy with the above!
   14. MGL Posted: November 16, 2002 at 01:04 AM (#607260)
Chris, your criticism of the "tone" of the presentation of the background, problem, hypothesis, etc. is well taken. I will be more careful in the future as far as "tipping my hand" as to which way I expect, hope, etc., the results and conclusion to go. A properly presented study should appear unbiased I suppose, at least until some possible subjective conclusions are drawn from the results.

I still think that if the OPS of either group after the streak returns to "normal" (not considering sample error), there is no possible way that anything other than random fluctuation was going on during the streaks (i.e., none of the streaks can possibly "continue"), unless of course some players perform below normal after a hot streak or better than normal after a cold streak, for some strange reason (other than more fluctuation of course), in which case the continued regular "streakers" and reverse "streakers" could balance each other out to produce a normal OPS. I hope you are not talking about that possibility! I could be wrong though. I like your dice analogy. Please explain again your last scenario with the dice (in different words). I do not understand it.
   15. MGL Posted: November 16, 2002 at 01:04 AM (#607261)
Since we know that the full-season stats, not including the streak periods, are relatively close to a player's "normal", expected stats (previous and subsequent year), it must follow that as time goes on after a hot streak, the OPS must eventually (and probably quickly) drop to normal levels.

This time I looked not only at a 1 and 7-day period after a streak, but also at an 8 to 14 day period and a 15 to 21 day period after a streak. I wanted to get a picture of how much and how quickly the "hotness" dissipated after a hot streak, since we already know that it tends to continue, at least for 7 days.

So far, I have only crunched the data for the hot streaks. Since I had to allow for 3 weeks after a streak, I only looked at streaks that occurred before Sept. 1. Also, since I lopped off the entire month of September in the streak period, I lopped off April as well (I'm not sure why I did that). Anyway, this time the streak periods were only from May 1 to Aug 31 (4 months, eight 2-week periods). Here are the results, all in OPS:

Hot periods: 1.216 (24697 AB's, 378 players)
1 day after: .914 (1758, 341 players)
First week after: .910 (10676, 368 players)
Second week after: .880 (8547, 331 players)
Third week after: .893 (10181, 366 players)
Previous year: .865 (weighted by the first week after AB's)
Previous year: .860 (weighted by the second week after AB's)
Previous year: .866 (weighted by the third week after AB's)
Subsequent year: .875 (weighted by the first week after AB's)
Subsequent year: .874 (weighted by the second week after AB's)
Subsequent year: .880 (weighted by the third week after AB's)

Looks like the hot players are pretty much cooled off by the second week, although we have sample error of around + or - 15 OPS points (2 SD's) in each weekly period.
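Taking the averages above at face value, the week-by-week excess over a baseline (here, the mean of the previous- and subsequent-year figures under each weighting) works out to:

```python
# After-streak OPS and baselines copied from the results above
after = {"week 1": 0.910, "week 2": 0.880, "week 3": 0.893}
baseline = {"week 1": (0.865 + 0.875) / 2,   # prev/subsequent year, week-1 weights
            "week 2": (0.860 + 0.874) / 2,
            "week 3": (0.866 + 0.880) / 2}

for week in after:
    excess = (after[week] - baseline[week]) * 1000
    print(f"{week}: {excess:+.0f} OPS points")  # +40, +13, +20
```

So the week-1 excess of roughly 40 points shrinks to the mid-teens by week 2, which is on the order of the ~15-point sample error quoted above, consistent with the "cooled off by the second week" reading.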

I'll have the cold players tomorrow (today)...
   16. MGL Posted: November 16, 2002 at 01:04 AM (#607262)
One more thing:

If we know for a fact that some players have legitimate streaks, while we can never know FOR SURE which ones are the "real McCoys" (which was the point I think I was trying to make in a previous post), obviously the ones who show the highest post-streak OPS's will be more likely to be the real deal! Again Chris, I don't know what you meant when you were giving examples of dice roll sequences...
   17. Dan Turkenkopf Posted: November 16, 2002 at 01:04 AM (#607265)
unless of course some players perform below normal after a hot streak or better than normal after a cold streak, for some strange reason (other than more fluctuation of course), in which case the continued regular "streakers" and reverse "streakers" could balance each other out to produce a normal OPS. I hope you are not talking about that possibility!

That is exactly the possibility I am talking about. However improbable it may be (I seriously doubt it happens), you have to take this, and other similarly unlikely situations, into account if you want to produce a convincing argument. This is what my earlier "doomed to failure" comment was related to. Even if the study had produced a day-after-hot-streak OPS equal to the expected OPS, the possibility of predictable streaks would still exist.

Can't we just look at the distribution of the actual 1-day OPS to figure the answer to this question? I would assume it would be a fairly normal distribution with a pretty small SD, indicating such an effect doesn't happen. But any strange distributions could be indicative of something more happening here.
   18. MGL Posted: November 16, 2002 at 01:04 AM (#607268)
I agree that we could have looked at the percentage of streaks in any given number of AB's and made some good inferences as to whether they were all normal fluctuations or partly a function of short-term better or worse ability. If the latter, we should see more streaks than expected from a normal distribution of hitting results.

I do think that the average person can "relate to" the methodology of this study, as it tries to model real life "streak" situations. If we simply looked at the percentage of streaks and found that we expected 532 and there were actually 589, the average person is going to say, "so what?" Anyway, that's not important.

If the post-streak OPS falls exactly at our expected OPS, to say that this doesn't imply that there are no legitimate streaks is preposterous for 2 reasons. First, when we conclude from empirical studies (where there is sample error) that "something or other" does not exist, we usually mean that it is likely that it doesn't exist or that there is no significant evidence that it exists, etc. We rarely mean literally that "there is no way on earth that it exists!" We are usually limited by the sample error in empirical studies based upon sample size. That is why most empirical studies have a conclusion and a "level of confidence" associated with that conclusion, again based on the sample size of the population whose results we are looking at.

In this study, we have sample sizes of 3000 and 10,000, etc. Even if the post-streak OPS falls exactly at our expected OPS, as you know, that sample OPS is subject to sample error. So I'm not going to ever say "There are no legitimate streaks based upon the results of this study." I might say, "I have a certain confidence (95%, 99%) that there are no legitimate streaks" or "There is little evidence to indicate the existence of legitimate streaks" or something like that. So for you to say that my conclusion in this study, for example, that "there doesn't seem to be any evidence of the existence of legitimate cold streaks" is wrong is preposterous!

Sure, there could be players who have reverse streaks that counterbalance the existence of any legitimate streaks, which causes the post-streak OPS to regress to normal, but what is the chance that this "theory" is true, and to what extent could there possibly be many players who have reverse streaks (that is, after a hot streak their ability markedly changes for the worse for some short period of time after the streak)? Will the possibility of that unlikely scenario change my conclusion? No! You say that it is not likely, but it is possible! Big deal! How does that change my conclusion? I didn't say (well, maybe I did; if I did, I recant) that if the post-streak OPS returns to normal, "there is no chance that any legitimate streaks occurred!"

If your reverse streak scenario is unlikely (which it is, and you readily admit so), then if we see a post-streak OPS return to normal we can still say with confidence and impunity that we are very (not 100%) certain that no legitimate streaks occurred, or that almost no legitimate streaks occurred (which has the same practical significance - would anyone care if I told them that out of every 100 2-week hot streaks of greater than 1.200 OPS, 1 was legitimate? Of course not!). If the post-streak OPS returns to normal, it is almost impossible for more than an occasional streak to continue (be legitimate); otherwise the post-streak OPS would be higher, unless you think that there are maybe 20 legitimate streaks in 100 and an equal number (20) of reverse streaks, which is preposterous (I don't even know where you got this idea of a "reverse" streak anyway).

So no, the study was not "doomed from the start" by any stretch of the imagination. Nor do the post-streak OPS's tell you "nothing" or almost nothing, as you seem to claim. In fact, they tell you everything, which was the purpose of the study in the first place (although admittedly, there were perhaps other means to arrive at the same conclusions). If the OPS returns to normal, as it did in the cold group, again, we can say with impunity that it is likely that a cold streak has no predictive value, SINCE IT DIDN'T IN THE STUDY OVER A LARGE SAMPLE OF PLAYERS, AB'S, AND STREAKS! If it doesn't, as in the hot group, then we can say, by definition, that there is some (not a whole lot) of predictive value to a hot streak! These conclusions I have just stated simply mimic the empirical results of the study. I don't know what is going on (I can speculate) during and after those streaks, but I do know, from the exact empirical results of the study, that if a player is cold for 2 weeks (and we don't know anything else about him or his situation), I expect him to perform almost exactly at his true level of performance the very next day and for the rest of the season. That's exactly what the study found out! If a player has been hot for 2 weeks (and we don't know anything else about him or his situation), I expect him to perform better than his normal level of performance for a while (for some unknown reason), and after about 2 weeks or so, I expect him to return to normal. Again, that is exactly what the study found out, which I think is a gigantic revelation, BTW!

At this point I think you (Chris) are being contrarian just for the sake of being contrarian, as so many other people do, so I'm going to drop this line of argument...
   19. MGL Posted: November 16, 2002 at 01:04 AM (#607269)
I decided not to do the 1 week, 2 week, 3 week thing for the cold group, since we didn't see any sustained coldness in the 1-day or 7-day post-streak periods, so we wouldn't expect to see anything in the 1, 2, and 3 week periods other than random fluctuation around the cold players' expected OPS (i.e., there is nothing to peter off from).

One very important thing I discovered today that no one pointed out! I was lazy when figuring the standard deviation for OPS in order to compute sample error in the various samples in the study. I just assumed that it would be a little higher than the SD for BA, which we can compute using the standard binomial formula SD=sqrt(p*q)/sqrt(AB), where p=mean BA and q=1-mean BA. For example, the SD for BA (for a mean BA of .250) over 100 AB's is the square root of .250 times .750, which is .433, divided by the square root of 100, which is 10. That works out to .0433, or 43.3 BA points!
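The arithmetic above is easy to sketch in a few lines of code (a minimal illustration; the function name is mine, and the numbers simply reproduce the worked example in the comment):

```python
import math

def ba_standard_deviation(mean_ba: float, at_bats: int) -> float:
    """Binomial standard deviation of batting average over a sample of AB's:
    SD = sqrt(p * q) / sqrt(AB), where p = mean BA and q = 1 - p."""
    p, q = mean_ba, 1.0 - mean_ba
    return math.sqrt(p * q) / math.sqrt(at_bats)

# Worked example from the comment: a .250 hitter over 100 AB's
sd = ba_standard_deviation(0.250, 100)
print(round(sd, 4))  # → 0.0433, i.e., about 43.3 BA points
```

Quadrupling the sample to 400 AB's halves the SD (to about 21.7 points), which is why the large post-streak samples in the study carry so much weight.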

Because most populations and samples of players are composed of players with different BA's, the SD of a sample of different players is a little larger (the distribution of BA for a bunch of players with different true BA's is wider than that of a bunch of players with the same true BA), around 50 points.

I assumed that for OPS, the SD was a little larger than that of BA. Boy, was I wrong! Try around 5 times larger! The SD for OPS in 100 AB's is about 250 OPS points!

So those 10,000 AB samples in my study have a standard deviation of around 24 OPS points, not 5! The SD for those 3,000 AB samples is almost 50 points!
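Those figures follow from the square-root scaling of sampling error (a quick sketch; the ~250-point SD at 100 AB's is the comment's own empirical estimate, taken as a given here):

```python
import math

def scale_sd(sd_at_n: float, n: int, new_n: int) -> float:
    """Sampling SD shrinks with the square root of sample size:
    SD(new_n) = SD(n) * sqrt(n / new_n)."""
    return sd_at_n * math.sqrt(n / new_n)

# Taking the comment's figure of ~250 OPS points of SD at 100 AB's:
print(round(scale_sd(0.250, 100, 10_000), 3))  # → 0.025 (about 25 OPS points)
print(round(scale_sd(0.250, 100, 3_000), 3))   # → 0.046 (almost 50 points)
```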

That puts the results in a whole new light. Maybe we should look at BA only...
   20. MGL Posted: November 17, 2002 at 01:04 AM (#607271)
Here are some more results using BA only. I did the 1-day, 1st week, 2nd week, 3rd week run for the cold streaks. Remember that I only looked at May 1st thru August 31 to identify the streaks (eight 2-week periods). I am including the same results for the hot groups in this post.

Hot groups

During: .385 +-.06 (AB=24,697 N=378) N=# players

1-day after: .303 +-.024 (AB=1,758 N=341)
year x-1: .288
year x+1: .287

1st week after: .299 +-.010 (AB=9,076 N=368)
year x-1: .288
year x+1: .286

2nd week after: .293 +-.010 (AB=8,547 N=331)
year x-1: .286
year x+1: .285

3rd week after: .293 +-.010 (AB=10,181 N=366)
year x-1: .286
year x+1: .287

Cold groups

During: .172 +-.06 (AB=23,296 N=465)

1-day after: .263 +-.022 (AB=2,094 N=362)
year x-1: .266
year x+1: .262

1st week after: .263 +-.009 (AB=13,125 N=445)
year x-1: .267
year x+1: .261

2nd week after: .269 +-.013 (AB=5,805 N=314)
year x-1: .264
year x+1: .258

3rd week after: .263 +-.011 (AB=8,689 N=423)
year x-1: .266
year x+1: .260

The +- designations are "plus or minus" 2 standard deviations in BA, based on the sample number of AB's.

Basically, same results as before (with OPS), but we are more confident in the conclusions as the sample error with BA is smaller.
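One of the +- 2 SD bands above can be sanity-checked with the binomial formula (a sketch; it assumes p is approximately the group's observed BA, which is close enough for this purpose):

```python
import math

def two_sd_interval(p: float, ab: int) -> float:
    """Half-width of the +/- 2 SD band on BA for a sample of AB's."""
    return 2 * math.sqrt(p * (1 - p)) / math.sqrt(ab)

# Hot group, 1st week after: .299 over 9,076 AB's -> should be about +/- .010
print(round(two_sd_interval(0.299, 9076), 3))  # → 0.01
```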

One day after a cold streak, 103 players, or 22%, of the cold group did not play (benched, injured, day off, etc.). There is no evidence that the remaining 362 players were any better than the ones who didn't play, however, as their previous and subsequent year's BA was the same as that of the entire cold group during the cold streaks.

The hot groups, OTOH, only lost 37 players, or 10%, 1 day after a hot streak. IT APPEARS AS IF MANAGERS BENCH A SUBSTANTIAL PORTION OF COLD PLAYERS AFTER A COLD STREAK.

The players who are benched have around the same true BA as the players who are not benched. It is not clear from the study why some players are benched and others are not. Perhaps the benched players had worse streaks than the non-benched players.

It is also possible that the benched players, as a group, may have continued their cold performance for the 1-day in which they were benched, even though the non-benched players did not (continue their cold performance). IOW, perhaps the worst "cold streakers" would have continued their coldness had they not been benched whereas the better "cold streakers", who were allowed to play, did not continue their coldness (i.e., if you are "really" cold for 2-weeks, you will sustain that coldness, but if you were cold, but not THAT cold, you will not sustain that coldness). This is probably not true, however.

In the 1st week after the cold streaks, 445 players from the original cold group "showed up". This means that most of the players who were cold for 2 weeks returned to the lineup sometime in the first week after their streaks, including most of the players who were benched for the 1 day following the streak. Since most of the players were back some time during that first week, and they posted a .263 BA, close to their expected BA, it is unlikely that the benched players (for the 1 day) would have continued their coldness for exactly 1 day, had they been allowed to play, and then returned to normal (which they presumably did) during the first week. It is possible, but not likely.

In any case, I still think it is relatively safe to say that a cold streak has little or no predictive value, and therefore, a manager is unjustified in benching a cold player or moving him down in the lineup (or an opposing manager pitching to him differently), assuming that the manager does not know of a substantial injury which might have contributed to the cold streak. If there are some players in the cold group who ARE substantially injured (and will remain injured for some period of time), they must necessarily comprise a very small percentage of the cold group, otherwise we would see a lower OPS than we do in the post-streak data. It is unlikely that many of the 103 players benched 1 day after their cold streaks were injured players. If they were (and again, remained injured), they would have also brought down the OPS of the 1-week-after group, since most of them returned to that group.

For some reason, only 314 players showed up in the 2-week after group (131 players were lost from the previous week and most of them returned the subsequent week). This seems odd. I will have to comb through my data to see if this is right.
   21. MGL Posted: November 17, 2002 at 01:04 AM (#607273)
There was a slight error in my program that caused there to be only 314 players, out of an original 465 in the cold group, remaining in the 2nd-week-after group. It should be 428, which makes sense. So if you look at the numbers of players in the various groups, what is going on is that the players pretty much show up in the same numbers in each group, except for 1 day after the cold streaks, which again (strongly) implies that managers are benching, for one day only, some players who have been cold.

As we saw, the cold players who are benched are not any better or worse (in terms of their true BA) than the cold players who are not benched. Interestingly, the players who were benched did not even have a worse streak than the ones who were not benched. The average BA during the cold streaks for the benched players was .173. For the non-benched players it was .171 (essentially the same). So it appears as if managers do not bench according to the severity of the cold streak, nor do they bench otherwise good or bad players as compared to the players who are not benched. And to reiterate my last post (this is an important point), if managers were indeed benching cold players for a legitimate reason, why do most of them return within a week, and why, when they return, do they revert to a normal (for them, of course) BA? Although we have no way of proving it, I think it would be a stretch to say that a manager only benches those cold players (pretty much) whom he knows or suspects would continue to be cold for less than a week, but who would return to normal (which they do) within a week or so. It is much more likely that managers bench those cold players who have other "indicia" of coldness besides their BA during the streak. IOW, maybe those players who have had a very low BA (hence they are put in the cold group in my study), but who have been hitting the ball "well" nonetheless (line drives caught, balls hit to the warning track, etc.), are NOT benched, while the ones who have been cold BA-wise AND have "looked" bad (have NOT been hitting the ball well), ARE benched.

Of course, we know that the cold players who were NOT benched returned to normal levels immediately (one day) after the streak, so in some sense, the managers were right (in not benching them). OTOH, we don't know, and will never know, how the benched cold players would have performed immediately (one day) after the streak (since they did not play). Maybe the managers were right in benching them. Maybe they would have performed far below their expected level in the one day or even a few days following the streak. Again, I think this is unlikely as most of those players returned shortly (within a week) to the lineup, and when they did they performed at normal levels. If a manager benched players who were legitimately having "problems" which changed their ability, surely enough of those players would continue to have problems when they came back, such that the overall BA of the one-week after group (which includes most of those benched players) would NOT be quite normal.

My next step (and probably last) is to completely change the criteria for being hot or cold, as I mentioned way back when. In order to duplicate real life more accurately, I am going to put a player in a hot or cold group any time he has 1 or 2 (I haven't decided yet) consecutive weeks of very good or very poor hitting (probably as measured by OPS, since that really is the best indication of a player's ability).

Even if we are measuring the results in BA, rather than OPS, in order to reduce sample error, I still think it is best to put players in hot or cold groups based on OPS and not BA, since we are trying to put them in groups based upon their short-term offensive performance, and offensive performance is better measured by OPS than by BA. I doubt it matters much for this kind of study anyway. As I also said in an earlier post, what measure we use (BA or OPS above league average, BA or OPS above a player's own average, etc.) to determine the hot and cold groups probably does not matter much. In real life, there is no particular criterion or measure that a manager or fan uses or considers when characterizing a player as having been either hot or cold.

Anyway, I will not overlap streaks. After a player's first streak is identified, I will start the very next day looking for a new streak. I don't want to overlap streaks, as this will produce non-independent data sets and all kinds of weighting problems. The nice thing about changing the methodology in this regard is that I will not consider a 1 or 2-week period to be a streak unless the good or bad hitting continues right up to the last day in the 1 or 2-week period. This will sort of eliminate the streaks under the old methodology which may have had the good or bad hitting "bunched up" at the beginning of the 2-week period.

For example, in the study, using the old methodology, if a player had a .300 OPS for the first week of a 2-week period and a somewhat normal OPS for the second week, he still may get included in the cold group (if the total OPS for the 2-week period was below a certain amount), even though technically, that player's coldness "ended" midway through the streak. As I mentioned in the article, the existence of lots of these "hybrid" streaks could arguably affect the results. The new methodology should eliminate this problem. In fact, that is how I should have done the study in the first place. Anyway, now that I have ironed out and presented most of the other details (concerning the results) to my satisfaction, once I incorporate the new methodology above, I think that, despite what Chris R. may think, the study overall will be very enlightening and at least somewhat novel...
   22. MGL Posted: November 18, 2002 at 01:04 AM (#607275)
The short answer to the question about other sports is "I don't know!" I suppose it is plausible. The only other "study" I have heard of was in basketball, where someone did look at hot and cold shooting percentage to see if it had any predictive value. If I remember correctly (I think it is mentioned in the recent book "Curveball"), there was none.

As far as the "What does it all mean in English" question, it is a good one. I think it means that, as a general rule, there is no such thing as a cold streak, in terms of predictive value. IOW, if we notice that a player has been cold lately, we can ignore that fact in terms of predicting or estimating what he is likely to do in any future games.
It means that if you are a manager and one of your regular players has been cold, even ice cold, that unless he could use a rest (in which case this is as good a time as any to rest him), by benching him and replacing him with a worse player, you are simply putting a less-than-optimal team on the field, thereby decreasing your chances of winning that particular game.

The "hot" players are a little trickier. The results seem to indicate that if two players are of equal hitting value, you would "choose" the one that was "hotter" lately, at least for a week or two.

As far as why cold streaks seem to have little or no predictive value and hot streaks do, that is the $64,000 question...
   23. Marc Stone Posted: November 18, 2002 at 01:04 AM (#607278)
This is a very thoughtful presentation and discussion that I have only now had a chance to read. So I feel a little bad to interject so late in the discussion that there is a simple, valid and statistically accurate way to prove or disprove streakiness: compare the result of a player's at-bat with the result of what he does five at-bats later on a simple contingency table and do a chi-square test. (I say look five at-bats later to avoid the effect of seeing the same pitcher, which is unlikely to happen five ABs apart.) If only hot streaks exist, then there should be an excess of hits following earlier ABs with hits, but no excess of outs following a previous out.
   24. MGL Posted: November 18, 2002 at 01:05 AM (#607295)

1) Yes the number of AB's in the "2-week after the cold streak" group is going to be appropriately larger, thereby narrowing the various confidence intervals.

2) The reason for the 5.8 AB's per player in the 1-day group is that some players are counted more than once in that group due to the fact that they had more than one hot or cold streak. The # of AB's is the total number of AB's and the # of players is the total number of DISTINCT players.

3) As far as players who have multiple streaks, it might be something to look at. Of course, the very good and very bad players will always be more likely to have any streak or to have multiple streaks, as defined by an OPS above or below some ABSOLUTE number; this would have to be accounted for.

Marc, yes, the "contingency table" and chi-squared test you suggest would tell us something about the "streakiness" of players in general, as well as whether any prior short-term performance is a good predictor of future performance (independent of the pitcher, etc.). It is a good suggestion, and sometimes I wish that I had more statistical acumen.

I am FAR from a statistics expert, from a statistician's perspective. From a sabermetrician's perspective, I am pretty well versed in statistical theory and its application (sabermetricians tend NOT to be statisticians).

Second, I like to construct and present studies that the average sabermetrician or even the average "fan" can understand and "relate to". As Tango likes to say, hardly anyone reads, let alone understands, the (sometimes good) baseball studies done by hard-core statisticians that include hard-core statistical analyses. I generally hate them.

If I were even capable of doing a "streak" study as you suggest, it would probably have at least as much (maybe much more - I don't know) merit as the one I did. However, almost no one who would be interested in reading a study about baseball hot and cold streaks would have any idea what a "contingency table" or a "chi-square test" was, let alone understand the implications, unless the results themselves could be presented in a manner that would make sense to them, and the statistical tests were merely incidental (at least as far as the reader was concerned).

As you can easily tell from the study and my follow-up comments, I like to construct my studies and present the results in a manner that not only the average person can understand, but in a manner that "mimics" real life, and can be applied to real-life decision making.

In this case, it's like, "OK here are some players who have been red-hot or ice-cold for around 2-weeks. What should a manager do or what should a fan or announcer think? Do we expect this guy to hit above (or below) his normal level THIS game, or over the next few games? Should we pitch differently to him? Should we bench this player or move him down in the lineup because he has been ice-cold over the last 2 weeks?"

These are the kinds of real-life situations that the study is designed to elucidate. Those same situations guide me as to how to construct my study.

That is not to say that I am not meticulous and rigorous in designing, implementing, and interpreting the results of my studies. I think that I am, especially given that I am NOT a statistician.

In fact, I think that the methodology of this particular study is good in that it does closely mimic real-life situations that managers and fans would confront vis-a-vis making decisions and assumptions about the implications of hot and cold streaks. At the same time, I think that the methodology and results also comport very nicely (perhaps not as nicely as the type of analysis that you have proposed) with generally accepted statistical (accounting?) principles.

Thanks for the feedback...
   25. Marc Stone Posted: November 19, 2002 at 01:05 AM (#607311)
Again, a very thoughtful response and I agree with everything you said. I think I used too much statistical jargon because I thought I could be more precise and you would still understand me (which you did). Unfortunately it made what I was describing seem more esoteric than it is. If you define streaks in terms of batting average (it's slightly more complex if you use other definitions) all you do is make a 2x2 table with "hit" and "out" on each side. One side represents the results of the first AB and the second represents the result five ABs later. If streakiness exists, the table will show a player is more likely to have a hit (or out) if he had a hit (or out) five ABs earlier. The table would be clear to a lay person and be consistent with common ideas of streakiness. The chi-square test is only needed to confirm that the observed differences are larger than could be expected by chance.

All you need is a simple dataset of player id and the hit/out result for all ABs listed chronologically. If you can put your data in that form, I can generate the table in five minutes.
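Marc's 2x2 table and chi-square test can be sketched in a few lines (the counts below are hypothetical, purely for illustration; for a 2x2 table the chi-square statistic has a well-known closed form, and 3.84 is the 95% critical value at 1 degree of freedom):

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 contingency table laid out as:
                  AB#6 hit   AB#6 out
    AB#1 hit         a          b
    AB#1 out         c          d
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: hit rate after a hit vs. after an out.
# Under independence (no streakiness) the statistic should be small.
stat = chi_square_2x2(910, 2090, 895, 2105)
print(stat < 3.84)  # → True: no significant streakiness in these made-up counts
```

If streakiness were real, cell `a` (hit followed by hit) would be inflated relative to its expected count, pushing the statistic past the critical value.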
   26. MGL Posted: November 19, 2002 at 01:05 AM (#607313)
I could easily provide the data. How would you adjust or account for the fact that each player has his own true BA, such that the result in AB #1 is NOT independent of the result in AB #6, even if there were no such thing as streakiness?
   27. MGL Posted: November 19, 2002 at 01:05 AM (#607315)
I realize this thread is dying, but I hope there are still a few stragglers...

I redid the study using a new methodology as promised. This time I did not break a player's full-season stats into 12 or 8 2-week periods and then look at each of those 2-week periods to determine whether or not they were hot or cold.

For each player in each season, I kept a running 7-day total of their OPS. Any time a player's last 7 days' OPS was above or below a certain number (around 1.100 and .500), the stats from that 7-day period were put into the hot or cold "bin". If a 7-day period was put into a hot or cold bin, the next day automatically started at "day 1" in terms of the running 7-day OPS's. That way, no 7-day streaks overlapped.
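The non-overlapping running-window binning described above can be sketched roughly as follows (a simplified illustration, not the study's actual code: it treats a season as a list of per-day OPS figures and averages them equally, whereas real OPS would be computed from the aggregated PA events in the window; the thresholds are the comment's ~1.100 and ~.500):

```python
def find_streaks(daily_ops, window=7, hot=1.100, cold=0.500):
    """Scan a player's season as a running `window`-day average OPS.
    When the window average crosses a threshold, record the streak and
    restart the window the next day, so no two streaks overlap."""
    streaks = []
    start = 0
    for end in range(len(daily_ops)):
        if end - start + 1 < window:
            continue  # window not yet full
        avg = sum(daily_ops[start:end + 1]) / window
        if avg >= hot:
            streaks.append(("hot", start, end))
            start = end + 1          # restart: streaks never overlap
        elif avg <= cold:
            streaks.append(("cold", start, end))
            start = end + 1
        else:
            start += 1               # slide the window forward one day
    return streaks

# Toy season: a week of 1.200 OPS days, then a week of .400 OPS days
season = [1.200] * 7 + [0.400] * 7
print(find_streaks(season))  # → [('hot', 0, 6), ('cold', 7, 13)]
```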

In any case, this time there were about 100 hot and 100 cold streaks per season, for a total of around 800 hot and 800 cold streaks for the 4 years in the study. Here are the results in BA:


Hot streaks

During: .403 (21467 PA)

1-day after: .293 (1922)
prev. year: .278
subs. year: .273

1st week after: .276 (7605)
prev. year: .279
subs. year: .275

2nd week after: .279
prev. year: .279
subs. year: .277

3rd week after: .287
prev. year: .279
subs. year: .278


Cold streaks

During: .148 (23116 PA)

1-day after: .271 (1955)
prev. year: .264
subs. year: .264

1st week after: .274 (7387)
prev. year: .265
subs. year: .270

2nd week after: .263
prev. year: .265
subs. year: .272

3rd week after: .261
prev. year: .268
subs. year: .272

There do not seem to be any statistically significant differences between the actual and expected BA in any of the post-streak periods. I think it is fair to say that we have no evidence that a hot or cold streak has any predictive value.
   28. tangotiger Posted: November 20, 2002 at 01:05 AM (#607320)
...but I am not about to argue with someone who thinks that statisticians have no place in sabermetrics.

I think that statisticians, for the most part, are not good sabermetric writers. John Jarvis and Jim Albert, to pick on two that I've read, are intelligent, and generate useful information.

I am very math literate, and I've taken a few stats courses, and I have much desire to learn more about statistical concepts, sabermetrics, and baseball, and the way I read their works, it requires me to read their stuff 4 or 5 times just to let it sink in.

Imagine the typical fan who has some math background, maybe little or no statistical concepts, and is somewhat passionate about sabermetrics.

Statisticians have their place in sabermetrics. It's just that they don't get their message across very well.
   29. Marc Stone Posted: November 20, 2002 at 01:05 AM (#607321)
Independence means that there is the same expectation for a hit regardless of what happened before. If no streakiness exists, a .300 hitter will have a hit 30% of the time in AB#6 whether AB#1 is a hit or an out. If streakiness exists, AB#6 will have >30% hits after a hit in AB#1 and <30% after an out.
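That definition of independence is easy to demonstrate by simulation (a sketch with made-up parameters: a streak-free .300 hitter over 100,000 independent ABs, using a fixed seed so the run is repeatable):

```python
import random

random.seed(42)
LAG = 5  # compare each AB with the AB five later, per the suggestion above

# A streak-free .300 hitter: every AB is an independent coin flip
ab_results = [random.random() < 0.300 for _ in range(100_000)]

after_hit = [ab_results[i + LAG] for i in range(len(ab_results) - LAG) if ab_results[i]]
after_out = [ab_results[i + LAG] for i in range(len(ab_results) - LAG) if not ab_results[i]]

rate_hit = sum(after_hit) / len(after_hit)
rate_out = sum(after_out) / len(after_out)

# With no streakiness built in, both rates hover around .300
print(abs(rate_hit - rate_out) < 0.02)  # → True
```

A streaky hitter would be simulated with a hit probability that drifts above and below .300 for stretches, which would pull `rate_hit` above `rate_out`.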
   30. MGL Posted: November 20, 2002 at 01:05 AM (#607324)
FJM, yes, my statement "There is no evidence..." is strictly based on the fact that none of the post-streak groups shows any statistically significant (at the 95% confidence level) differences between actual and expected BA. That's all I can conclude!

Marc, I don't think I am explaining my concern about independence very well, or I am misunderstanding your methodology. I understand the general concept (if all x AB's had 30% hits, we would expect all y AB's [5 ABs later] to have 30% hits, etc.). It seems to me that this kind of "test" would work just fine if all of the data points belonged to one "entity" that had the same mean BA. But here we have, say, thousands of data points. Some belong to players who have true BA's of .220, some .250, some .320, etc. If the statistical test found some "streakiness", how would it "know" that that "streakiness" was not just high-BA players getting hits in both AB 1 and AB 6, or low-BA players making outs in both? I'm not explaining my concern very well, but perhaps you can tell me where my thinking is incorrect, or give me an example of a small set of data that the test might identify (whether significant or not with a "chi-squared" test) as "streaky"...
   31. Marc Stone Posted: November 20, 2002 at 01:05 AM (#607325)
If you define streakiness as periods of performance above or below a player's average and you assume that all players are either equally prone to streakiness or that propensity to streakiness is distributed at random you can consider all players combined together at once. The independence of players' individual levels of ability is preserved by pairing the observations: AB#1 is paired to AB#6 for the same player.
