Hot and Cold Streaks
Fact or fiction?
We are all aware of how much stock baseball announcers, commentators, fans, managers, coaches, and players put in the significance of a player (or team) being “hot” or “cold.” You cannot watch or listen to a game without the commentator at some point mentioning that “so-and-so is red hot or ice cold,” presumably referring to the fact that said player has recently (e.g., the last game, week or month) had a very good or very bad spate of performance, measured by whatever statistic is convenient for the commentator or the fan who is listening, or in some cases, no particular statistic at all.
While there is no doubt that players and teams go through short (or less frequently, long) periods of time where their performance is well above or below “average,” the question that this article attempts to answer is “Is there any predictive value to hot and cold streaks, as measured by BA, OBP, SA, and OPS?” Another way to couch this question is “Are hot and cold streaks solely a function of normal statistical fluctuation or are they a function of statistical fluctuation and a temporary change in a player's ability to perform offensively, which we expect to continue for some, probably short, period of time?”
It is unlikely that these casual fans and baseball pundits are infatuated with hot and cold streaks simply for the sake of pointing out what has happened in the past. To mention that “so-and-so has hit over .400 in his last 10 games,” with no implication that this has anything to do with how we expect him to hit in the near future would be banal. The fact is that when most people talk about hot and cold streaks they hold the belief that these streaks have significant predictive value - i.e., that a player who has been hot is expected to hit better than normal in the future, and vice versa for a player who has been cold.
This belief is evidenced by several things that regularly occur in baseball. Managers will often bench a regular player who has been cold and/or play a reserve player who has recently been hot. Managers will sometimes rearrange their batting order depending upon who is hot and who is cold. Opposing managers will often pitch around or intentionally walk a hot player and pitch “right at” a cold player. Managers will sometimes play “small ball” when their team has been cold, and eschew such an approach when their team has been hot. How often have we heard an announcer say “so-and-so (team) is bunting in the first inning because they have had trouble scoring runs lately?” The clear implication is that because the team has been cold (hitting-wise) in the past few games or weeks, they are more likely to score fewer runs than normal in the not too distant future, including in the current game. Finally, fans and handicappers regularly assess the current “value” of a team depending upon whether their players, as a whole, have been hot or cold in the recent past.
The study described in this article looks at what happens exactly one day immediately following a 2-week hot or cold streak and, independently, what happens over a 7-day period immediately following a 2-week hot or cold streak. The fans and pundits mentioned above would say that players, as a group, will hit significantly better than their own normal level of performance for some period of time following (or “during”) a hot streak, and significantly worse following (or “during”) a cold streak. Since no one would have any idea as to how long this effect should last, either our 1-day or our 7-day snapshot (or both) should capture such an effect, if in fact it does exist. We would logically expect this effect to “peter out” as time goes by.
In response to the following potential criticism - “You are looking at a 1 or 7-day period after the streak has been identified, therefore the streak may be over” - remember that in real life, a manager, commentator, or fan is always looking at a point in time that is subsequent to the streak (the streak is always in the past). While they have no idea whether the streak is over or not, their assumption is that there is still something left to the streak, at least for some finite and probably short period of time (until the batter returns to normalcy, no doubt). The study described herein tries to duplicate this scenario by looking at games (1 or 7 days after a streak) wherein a player is essentially “in the midst” of a streak - i.e., he has just had 2 weeks of hot or cold hitting.
Again, most managers, players, fans, and commentators would expect these players to hit significantly better or worse than their “true” average during this game or over the next 7 games. Let's see if this is true, and if it is, to what extent.
(Since we would expect that the results of the 1-game study would be more telling than that of the 7-game study, keep in mind that the primary reason for including the 7-game data is to get a larger sample size in order to reduce the standard error of our results.)
The study encompasses all major league play from 1998 to 2001 (4 years). I arbitrarily divided all of the data into 2-week periods. The first and last 2 weeks (more or less) of each month constitute a 2-week period. Since there are around 6 months in a season (April through September), there are twelve 2-week periods in each season.
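The bucketing described above can be sketched in a few lines. This is only an illustration: the split at the 15th of the month and the function names are assumptions, since the article says only that the first and last 2 weeks "(more or less)" of each month form a period.

```python
from datetime import date


def half_month_period(d: date) -> tuple:
    """Return (year, month, half) for a game date.

    Assumption: days 1-15 form the first period of the month and the
    remaining days form the second. The article does not give the exact split.
    """
    half = 1 if d.day <= 15 else 2
    return (d.year, d.month, half)


def period_index(d: date, first_month: int = 4) -> int:
    """0-based index of the period within a season that starts in `first_month`."""
    _, month, half = half_month_period(d)
    return (month - first_month) * 2 + (half - 1)
```

With an April-through-September season this yields indices 0 through 11, matching the twelve periods per season mentioned above.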
Next I culled all those 2-week “mini-seasons” in which a player was either hot or cold during that 2-week period. (Remember I am culling a player's stats for the 2 weeks - not just his name.) Any given player could have more than one 2-week period selected. In fact, as you would expect, many players, especially the very good and very bad ones, had several 2-week periods selected. Also, many players' stats are included in the hot as well as the cold group (they were hot in one period and cold in another).
What were the criteria for being hot or cold over any 2-week period? First, OPS (on base percentage plus slugging average) was the stat of choice for determining whether a player was hot or cold. OPS was chosen for no particular reason other than that it is easily computed, it is recognizable, and it fairly accurately represents a player's overall offensive production. Batting average, on base percentage, slugging average, or any one of a number of stats could have been used as well. The overall results should be around the same regardless of the stat used.
I did include the following stats in the results portion of the study, although, as mentioned above, the only stat I used in determining the hot and cold groups at the outset, was OPS:
BA, OBP, SA, and OPS.
All four of these stats were computed in the traditional way, except OBP. For purposes of this study, OBP is (hits plus non-intentional walks plus HBP) divided by (AB's plus SF's plus non-intentional walks). Traditional OBP includes IBB's and sometimes includes SF's. I do not include IBB's. Obviously, the results of this study will not be affected either way.
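The four stats can be written out as a short sketch. The function and parameter names are illustrative, and the OBP formula follows the wording above literally; note that, as written, it leaves HBP out of the denominator, whereas the standard definition includes it there.

```python
def batting_average(h: int, ab: int) -> float:
    """BA: hits divided by at-bats."""
    return h / ab


def slugging_average(total_bases: int, ab: int) -> float:
    """SA: total bases divided by at-bats."""
    return total_bases / ab


def study_obp(h: int, bb: int, ibb: int, hbp: int, ab: int, sf: int) -> float:
    """OBP as worded in the article, with intentional walks excluded.

    Numerator: hits + non-intentional walks + HBP.
    Denominator: AB + SF + non-intentional walks (HBP omitted, as in the text).
    """
    nibb = bb - ibb  # non-intentional walks
    return (h + nibb + hbp) / (ab + sf + nibb)


def ops(obp: float, sa: float) -> float:
    """OPS: on base percentage plus slugging average."""
    return obp + sa
```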
Getting back to the selection criteria for culling hot or cold 2-week periods, I chose an OPS above which a “2-week player period” went into the hot group and an OPS below which a “2-week player period” went into the cold group. A player must have had at least 30 AB's in order to qualify for a group. The hot and cold OPS's were chosen such that around 10% of all “2-week player periods” (with at least 30 AB's) in any given year went into the hot group, 10% into the cold group, and 80% into neither group. The exact minimum and maximum OPS depended upon the year, but a typical minimum 2-week OPS for admission into the hot group was 1.100 and a typical maximum 2-week OPS for the cold group was .525.
I chose to use absolute OPS maximums and minimums for the group selection criteria rather than a “relative to a player's average OPS” criterion for the following reason: I wanted to mimic real life as much as practicable. We usually don't identify a hot or cold streak by how a player has performed relative to his norm - it is usually defined by whether a player has been hitting well or not, relative to a more or less average player. For example, if Rey Ordonez hits .280 (in BA) over a 10-day period, we normally don't characterize him as hot. Similarly, if Bonds is 10 for his last 32 (.312), we probably don't say that he is cold, even though .312 is substantially less than his 2002 BA of .370.
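One way to derive such cutoffs for a single year is sketched below. The data layout (a list of `(ab, ops)` tuples) and the function names are hypothetical, and the exact rounding used to reach "around 10%" is an assumption.

```python
def streak_cutoffs(periods, min_ab=30, tail=0.10):
    """Return (cold_max, hot_min) OPS cutoffs for one year.

    `periods` is a list of (ab, ops) tuples, one per 2-week player period.
    Only periods with at least `min_ab` at-bats are considered, and roughly
    `tail` of them fall at or beyond each cutoff.
    """
    qualified = sorted(ops for ab, ops in periods if ab >= min_ab)
    n = len(qualified)
    if n == 0:
        raise ValueError("no qualifying periods")
    k = max(1, round(n * tail))   # number of periods in each tail
    cold_max = qualified[k - 1]   # highest OPS still counted as cold
    hot_min = qualified[n - k]    # lowest OPS still counted as hot
    return cold_max, hot_min


def label(ab, ops, cold_max, hot_min, min_ab=30):
    """Classify one 2-week player period; None means too few at-bats."""
    if ab < min_ab:
        return None
    if ops >= hot_min:
        return "hot"
    if ops <= cold_max:
        return "cold"
    return "neither"
```

Because the cutoffs are absolute OPS values for the year rather than values relative to each player's norm, every player is judged against the same bar.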
In other words, if a player has greater than a 1.100 OPS for 2-weeks, most people would consider him hot, no matter who he was. Similarly, if a player has a 2-week OPS of less than .525, he is usually considered cold, even if he is the original Mendoza.
In any case, the cutoffs for the hot and cold groups are fairly high and low, respectively, and the mean OPS in each group is more extreme still. So although the hot group will tend to contain a higher percentage of good players and the cold group a higher percentage of poor players (obviously), we will still capture most of the legitimate hot and cold streaks (significant deviations from a player's normal OPS). As well, our groups should not be significantly “polluted” by extreme players, like Bonds or Rey Ordonez, performing at around their normal level.
Once the hot and cold 2-week periods were selected or identified, I looked at the hitting results of each player in the one game (actually one day) immediately following the hot or cold streak, and in the 7 games (actually 7 days) immediately following the streak.
Thus, 6 groups were formed:
- The hot group, consisting of the combined 2-week stats of all players who were hot (see the above criteria) for those 2-weeks.
- Same as above for the cold players.
- Combined stats 1 day immediately following the hot periods.
- Combined stats during the 7 days immediately following the hot periods.
- Combined stats 1 day immediately following the cold periods.
- Combined stats during the 7 days immediately following the cold periods.
In a recent post on Clutch Hits on Primer, a helpful, if somewhat overzealous, critic suggested that I adjust for the park and the opposing pitcher in all of the above stats. (Actually, he said, and I paraphrase, “Without accounting for park and pitcher, your study is completely worthless!”) His harsh criticism notwithstanding, I did indeed adjust all of the 1-day and 7-day stats for park and opposing pitcher. This was done on a PA-by-PA basis using component 3-year regressed park factors and 1-year component “opposing pitcher factors”.
In order to draw any conclusions from this study, we need to compare the 1 and 7-day stats (BA, OBP, SA, and OPS) to those same players' combined “true” stats, which I also call their “expected” stats. Originally, I used a player's full-season stats in the same year as the hot or cold streak as a proxy for his “true” stats. This did not work out well. The combined full-season stats of the players in the hot group ended up being substantially higher than their previous or subsequent year's stats, and the combined full-season stats of the players in the cold group ended up being lower than their subsequent or previous year's stats. (Actually, in the first draft of the study, I used a player's full-season stats not counting the 2-week streak period - that ended up understating his “true” stats.)
This is because the players in the hot group were not only better-than-average players, but were also, as a group, slightly lucky in the entire year in which their streaks occurred. The same was true in reverse for the players in the cold group. Here's why:
Anytime we select a player or a group of players based upon a criterion that is above (or below) average, we are automatically, and by definition, selecting a lucky (or unlucky) group of players. Even two weeks of hot or cold play during an entire season will mathematically imply that a player has been slightly lucky or unlucky for the entire season.
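A toy simulation illustrates this selection effect. It is only a sketch under invented assumptions: every simulated player has the same true OPS, each 2-week OPS is an independent normal draw, periods are weighted equally, and all numeric values are made up for illustration rather than taken from the study.

```python
import random


def simulate(n_players=2000, n_periods=12, true_ops=0.750, sd=0.150,
             hot_cutoff=1.000, seed=1):
    """Average season OPS of players who had at least one hot period.

    Returns (mean including the hot periods, mean excluding them).
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    with_streak, without_streak = [], []
    for _ in range(n_players):
        season = [rng.gauss(true_ops, sd) for _ in range(n_periods)]
        rest = [x for x in season if x < hot_cutoff]
        if len(rest) == len(season) or not rest:
            continue  # no hot period (or, very rarely, nothing but hot periods)
        with_streak.append(sum(season) / len(season))
        without_streak.append(sum(rest) / len(rest))
    n = len(with_streak)
    return sum(with_streak) / n, sum(without_streak) / n


including, excluding = simulate()
print(f"true OPS .750, season incl. streak {including:.3f}, "
      f"season excl. streak {excluding:.3f}")
```

Although every simulated player is a true .750 hitter, the players selected for having a hot period average above .750 for the full season, and below .750 once the hot periods are removed. The same-season proxy overstates and the streak-excluded proxy understates, which is the pattern described above.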
In the final draft, I ended up using the average of the previous and subsequent year's stats to represent a player's “true” stats. As in the original draft, in order to calculate each group's (hot or cold) expected (“true”) OPS, I weighted the average of each player's subsequent and prior year's OPS by the number of AB's that each player had in their respective group (hot or cold). In other words, if Jeff Kent had 64 AB's in the hot group (one 2-week streak) in a certain year, Tejada had 133 (two hot streaks), and Renteria had 55 (one streak), the average of each player's previous and subsequent year's OPS would be weighted by 64, 133, and 55, respectively, when calculating the entire hot group's expected OPS for that year.
Suffice it to say that the expected OPS of the hot and cold groups, based upon the previous and subsequent year's OPS for each player in each of the groups, is exactly what we expect each group to hit at any point in time, including right after (“during”) a streak, if, in fact, streaks have no predictive value.
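The weighting can be sketched as follows. The AB counts (64, 133, 55) come from the example above; the previous-year and subsequent-year OPS figures are invented purely for illustration, and the function name is hypothetical.

```python
def expected_group_ops(entries):
    """AB-weighted expected OPS for a hot or cold group.

    `entries` is a list of (streak_ab, prev_year_ops, next_year_ops) tuples.
    Each player's "true" OPS is the simple average of the two surrounding
    seasons, weighted by his at-bats inside the streak periods.
    """
    total_ab = sum(ab for ab, _, _ in entries)
    weighted = sum(ab * (prev + nxt) / 2 for ab, prev, nxt in entries)
    return weighted / total_ab


# AB weights from the article's example; the OPS values are made up.
hot_group = [
    (64, 0.900, 0.880),   # Kent: one hot streak
    (133, 0.850, 0.870),  # Tejada: two hot streaks
    (55, 0.760, 0.780),   # Renteria: one hot streak
]
print(f"{expected_group_ops(hot_group):.3f}")
```

Weighting by streak at-bats means a player with two hot periods pulls the group's expected OPS toward his own level roughly twice as hard as a player with one.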
The pundits, on the other hand, would theorize that the hot group would have an OPS (and the other stats) significantly greater than their expected OPS, following the hot streak, while the cold group would have an OPS significantly less than their normal OPS, following the cold streak. How much more or less is not clear.
They might also hypothesize that if there is a significant difference between the actual and expected stats immediately following a streak, the difference will be greater in the 1-day stats than in the 7-day stats, as by definition, a hot or cold streak must dissipate over time.
What would be the rationale for such a belief? Perhaps hot players “see the ball better, are ‘in the zone’, and are mentally and physically at the top of their game,” whereas cold players “are mentally and/or physically unfit, out of sync, pressing, and are not seeing the ball well.”
My hypothesis differs from that of the pundits. I expect the hot group to be slightly healthier than the cold group and the cold group to be slightly more injured than the hot group. Other than that, I would not expect there to be any differences between the two groups as far as post-streak hitting is concerned, other than as a result of the differences between their “true” levels of performance. In other words, I expect the hot and cold groups to return almost exactly to their true level of performance during both the 1 and the 7-day time periods.
The reason I do not expect each group to return exactly to their expected level of performance is two-fold:
One, as I said, the day or days immediately following a hot streak and the hot period itself will tend to be a player's healthiest time of the year, whereas the cold periods and the days following them will tend to be a player's least healthy time of the year. In other words, a cold streak slightly suggests that a player is injured and a hot streak slightly (less so) suggests that a player is healthy.
Two, the hot and cold streaks are aptly named. A hot streak will tend to be during hot (temperature-wise) periods (in outdoor stadiums of course), while a cold streak will tend to be during a cold period. The weather during the 1 and 7-day periods and their corresponding streak periods will tend to be similar, since the former occurs immediately after the latter. In fact, if I repeat the study and control for weather, I suspect that we might see any differences between the expected and actual results in the 1 and 7-day periods shrink or even disappear.
Some people might characterize the 2-week hot and cold periods as suspect because they are chosen before looking at the content of each period (i.e., before the fact). In other words, while a player in the study may have had a certain high or low OPS within a certain 2-week period, the hot and cold streak may have ended before the 2-week period was up. This is true and it is a valid criticism. A better methodology would have been to choose a starting and ending point (actually, the starting point is not important) after the fact, such that the player in the study is always “in the middle” of the streak (i.e., the streak has definitely not ended yet) when we look at the 1 and 7-day stats. This would almost exactly mimic how we perceive and identify hot and cold players and streaks in real life.
In defense of the present methodology, while my “before-the-fact” 2-week periods may contain some streaks that have already “ended” (Does 1 bad day constitute the “end” of the streak? How about 2 bad days?), they should contain plenty of streaks that have continued more or less to the end of the 2-week period, such that the results should enable us to reach a reliable conclusion one way or another.
                                   BA     OBP    SLG    OPS
During 2-week hot streak:          .383   .458   .754   1.210
1 day after hot streak:            .302   .385   .537   .922
During 7 days after hot streak:    .296   .373   .528   .901
Expected (prev. and sub. years):   .286   .365   .505   .870

During 2-week cold streak:         .165   .222   .218   .440
1 day after cold streak:           .263   .319   .418   .737
During 7 days after cold streak:   .264   .328   .408   .736
Expected (prev. and sub. years):   .267   .328   .410   .738
Analysis and Conclusion:
Looking at OPS only, we see a significant tendency for a hot player to remain hot 1 day after a hot streak, as compared to his normal or expected OPS (.922 to .870). We see the same (significant) tendency over the 7-day period (.901 to .870), although this tendency has dissipated.
The cold group does not exhibit the same tendency. That is, after a cold streak, players tend to hit about the same as their normal or expected OPS, both 1 day after (.737 to .738), and during the 7 days following (.736 to .738), the cold streak.
The standard error of the 1-day OPS is around 15 OPS points (around 2300 AB's) and the standard error of the 7-day OPS is around 5 points (14,500 AB's), so the differences between the 1 and 7-day OPS and the expected OPS for the hot group (.922 and .901 versus .870) are statistically significant.
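To make the significance claim concrete, each difference can be expressed in standard errors. This is a minimal sketch, assuming the quoted standard errors (about 15 and 5 OPS points) apply to the cold group as well as the hot group, and treating the expected OPS as a fixed number with no sampling error of its own. Neither assumption is stated in the article.

```python
def z_score(actual: float, expected: float, standard_error: float) -> float:
    """Difference between actual and expected OPS, in standard errors."""
    return (actual - expected) / standard_error


SE_1_DAY = 0.015  # about 15 OPS points, as quoted above
SE_7_DAY = 0.005  # about 5 OPS points, as quoted above

# (label, actual OPS, expected OPS, standard error), from the results table.
rows = [
    ("hot, 1 day", 0.922, 0.870, SE_1_DAY),
    ("hot, 7 days", 0.901, 0.870, SE_7_DAY),
    ("cold, 1 day", 0.737, 0.738, SE_1_DAY),
    ("cold, 7 days", 0.736, 0.738, SE_7_DAY),
]
for name, actual, expected, se in rows:
    print(f"{name:<13} z = {z_score(actual, expected, se):+.2f}")
```

Under these assumptions the hot group sits roughly 3.5 and 6.2 standard errors above expectation, while the cold group is within half a standard error of it in both windows.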
The weather and injury factors (see explanations above) may explain some of the results, although I have no idea why significant differences between expected and actual OPS are seen in the hot group but not in the cold group. (The average temperature for all outdoor parks 1 day after a hot streak was 73.6 degrees, while the average temperature 1 day after a cold streak was 72.6. In calculating the average temperatures, I did not control for venue - i.e., more hot streaks may have occurred in warmer venues).
In conclusion, there does appear to be some predictive value to a hot streak, but not to a cold streak. Until further research or explanation, based upon the results of this study, it is not clear whether typical measures employed by managers, such as benching an otherwise good player, pitching differently to batters depending upon whether they are hot or cold, or re-arranging a lineup, are justified.
Posted: November 15, 2002 at 05:00 AM | 31 comment(s)