Wednesday, December 13, 2017

Batting Streaks: Statistical Illusions?

The Book asserts, after some statistical flourish, that batter streaks possess no predictive information. But this presupposes that batters go through streaks at all.

We can test if a batter is going through a streak using the Wald-Wolfowitz runs test, which is a generic statistical test for determining if there is an anomalous streak in flipping a coin repeatedly (or any other repeated Bernoulli process).
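To make the test concrete, here is a minimal sketch of the two-sided Wald-Wolfowitz runs test using the usual normal approximation. It is not the code from the repository mentioned in this post; the function name `runs_test` and the 0/1 encoding of outcomes are assumptions made for illustration.

```python
import math


def runs_test(outcomes):
    """Two-sided Wald-Wolfowitz runs test (normal approximation).

    `outcomes` is a list of 0/1 values (hypothetical encoding:
    1 = success, 0 = failure). Returns (runs, z, p_value).
    """
    n1 = sum(1 for x in outcomes if x)   # number of successes
    n2 = len(outcomes) - n1              # number of failures
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one success and one failure")

    # A run is a maximal block of identical consecutive outcomes.
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:])
                   if bool(a) != bool(b))

    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    if var == 0.0:
        # Degenerate case (one success, one failure): nothing to test.
        return runs, 0.0, 1.0

    z = (runs - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail
    return runs, z, p_value
```

Too few runs (long blocks of identical outcomes) gives a large negative z, and too many runs (constant alternation) gives a large positive z; either one yields a small p-value.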

The code used for the hypothesis testing with the data is available on GitHub.

Hitting Streaks in General

Using the Retrosheet data for 2014–2016 (and 2006–2016), we can determine whether a batter hit the ball and safely reached base (including home runs) or was put out. In other words, we can label each plate appearance a "success" or a "failure".

What counts as a streak remains ambiguous to me, but it seems to be some amount of nonrandomness in the results of trials. That is to say, while a player is "in a streak", his next plate appearance will tend to have the same outcome as his previous one. This is completely different from the definition given in the rules of Major League Baseball! But it captures what people mean in the vernacular when saying, "Wow, so-and-so is on a hot streak."

Claim 1. Given a batter, his plate appearances over his career appear to be independent of each other.

Corollary. Streaks, hot or cold, do not exist.

Assumptions, Restrictions. If we ignore errors, pickoffs, caught stealing, wild pitches, balks, other advances, and so on (basically everything except singles, doubles, triples, home runs, strikeouts, and other forms of outs), then we can apply the Wald-Wolfowitz test.

To avoid false positives and "small samples", we restrict focus to batters who have at least 50 plate appearances in the 2016 season.
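As a rough sketch of these restrictions, the snippet below turns a stream of events into one success/failure string per batter. The numeric codes are an assumption: they follow the event-type numbering used by the Chadwick tools for Retrosheet data (2 = generic out, 3 = strikeout, 20–23 = single through home run). The function name, the input layout, and the choice to apply the 50-PA threshold after discarding ignored events are likewise hypothetical.

```python
from collections import defaultdict

# Assumed Chadwick/Retrosheet event-type codes (not taken from this post).
HIT_CODES = {20, 21, 22, 23}   # single, double, triple, home run
OUT_CODES = {2, 3}             # generic out, strikeout


def outcome_strings(events, min_pa=50):
    """Build a 0/1 outcome list per batter.

    `events` is an iterable of (batter_id, event_code) pairs in
    chronological order. Hits become 1, outs become 0, and every
    other event type is ignored. Batters with fewer than `min_pa`
    retained plate appearances are dropped.
    """
    by_batter = defaultdict(list)
    for batter_id, code in events:
        if code in HIT_CODES:
            by_batter[batter_id].append(1)
        elif code in OUT_CODES:
            by_batter[batter_id].append(0)
        # Anything else (walks, errors, steals, ...) is skipped.
    return {b: seq for b, seq in by_batter.items() if len(seq) >= min_pa}
```

Each resulting list can then be fed to a runs test as one long Bernoulli sequence.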

Wald-Wolfowitz Testing

We can try getting data for batters from 2014–2016, removing games at Coors Field and batters on the Colorado team, and filtering out batters with fewer than 50 plate appearances. We then consider a plate appearance a "failure" if the batter ends up out, and a "success" otherwise.

There are two ways to handle the data now. We can consider each player's career as one long string of "success"/"failure" (or 1 and 0, respectively), then see if the plate appearances are independent of each other using the Wald-Wolfowitz test. Or we can consider each game as a string of "success"/"failure" and see if, within a given game, the plate appearances of a given batter are independent of each other. Spoiler alert: either way, we get the same result (the plate appearances are independent of each other).

We need to factor in the fact that we're doing multiple-hypothesis testing, so we're going to use either the Holm–Bonferroni method or the Šidák correction. Again, spoiler alert, the results are unchanged by either method.
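For reference, both corrections are short enough to sketch directly. This is an illustrative implementation under the standard textbook definitions, not the code used for the results in this post; the function names are made up.

```python
def sidak_alpha(alpha, m):
    """Per-test significance level under the Sidak correction
    for m tests at family-wise level alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans, True where the null hypothesis is
    rejected, in the same order as `p_values`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared
        # against alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

With m = 837 tests and α = 0.05, the Šidák per-test level comes out slightly above the plain Bonferroni level of 0.05/837, so the two corrections are nearly interchangeable at this scale.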

Player-by-Player testing of independence

We will consider the career of a given batter, and see if the plate appearances are independent of each other. Our hypotheses are:

  • H0: the plate appearances are independent of each other
  • Ha: the plate appearances are not independent of each other

Proposition 1. With α = 0.05, we fail to reject the null hypothesis for batters with at least 50 plate appearances in the games between 2014–2016 (inclusive), across 837 batters.

Proposition 2. With α = 0.05, we fail to reject the null hypothesis for batters with at least 50 plate appearances in the games between 2006–2016 (inclusive), across 1601 batters.

Game-by-Game testing of independence

We will consider the plate appearances of a given batter in a given game, and see if they are independent of each other. Since we are working with small samples (each game has around 4 to 6 plate appearances), we can use exact p-values. Our hypotheses are:

  • H0: the plate appearances are independent of each other
  • Ha: the plate appearances are not independent of each other
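With only a handful of plate appearances per game, an exact p-value can be found by brute force: enumerate every arrangement of the observed successes and failures and count runs in each. The sketch below does that. The two-sided definition used here (sum the probabilities of all run counts that are no more likely than the observed one) is one common convention and is an assumption, as is the function naming.

```python
from collections import Counter
from itertools import combinations


def count_runs(seq):
    """Number of maximal blocks of identical consecutive values."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def exact_runs_p_value(seq):
    """Exact two-sided runs-test p-value for a short 0/1 sequence.

    Conditions on the observed number of successes and treats every
    arrangement of them as equally likely under the null hypothesis.
    """
    n = len(seq)
    n1 = sum(seq)
    if n1 == 0 or n1 == n:
        return 1.0  # only one possible arrangement, nothing to test

    # Tally how many arrangements produce each run count.
    dist = Counter()
    for ones in combinations(range(n), n1):
        arrangement = [1 if i in ones else 0 for i in range(n)]
        dist[count_runs(arrangement)] += 1

    total = sum(dist.values())
    observed = dist[count_runs(list(seq))]
    # Sum over run counts that are at most as likely as the observed one.
    return sum(c for c in dist.values() if c <= observed) / total
```

Under this convention, the most extreme six-appearance game, `[1, 1, 1, 0, 0, 0]`, gives a p-value of 0.2.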

Proposition 3. With α = 0.05, we fail to reject the null hypothesis for batters with at least 50 plate appearances in the games between 2014–2016 (inclusive), across 131 275 samples.

Proposition 4. With α = 0.05, we fail to reject the null hypothesis for batters with at least 50 plate appearances in the games between 2006–2016 (inclusive), across 484 746 samples.

Caution: In roughly a quarter of batter-game combinations, the batter makes an out in every plate appearance. (More precisely, for 2014–2016 this occurs 25.56693% of the time, and for 2006–2016 it occurs 24.8470% of the time.) We should expect this to occur for some batters, accounting for something around 11% of the data; the remaining 13% or so may very well be due to random fluctuation, which seems plausible from order-of-magnitude estimates for batters with a 0.300 BA.
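The order-of-magnitude estimate can be checked with a few lines of arithmetic. The sketch assumes independent plate appearances with a constant success probability of 0.300 and a game of 4 to 6 plate appearances; the function name is made up for illustration.

```python
def prob_all_outs(success_rate, plate_appearances):
    """Probability that every plate appearance in a game is an out,
    assuming independent trials with a fixed success rate."""
    return (1.0 - success_rate) ** plate_appearances


for n in (4, 5, 6):
    print(n, round(prob_all_outs(0.300, n), 4))
# 4 0.2401
# 5 0.1681
# 6 0.1176
```

So an all-out game is far from unusual for a 0.300 hitter: it should happen in roughly 12% to 24% of games, depending on how many times he comes to the plate.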

Batter Streaks in The Book

The Book considers batters over stretches of 5 consecutive games, kind of like n-grams (with n = 5) of the batter's games for the season, excluding the first 5 (and last 5) games. For each of these 5-game stretches, we compute the batter's wOBA. A hot streak is a stretch in the top 5% of these 5-game wOBA scores, and a cold streak is one in the bottom 5%.

We reproduce Table 14 from The Book, which examines data from 2000–2003 (excluding games at Coors Field, and excluding Rockies players):
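A minimal sketch of this windowing, under some simplifying assumptions: each game is summarized as a pair (weighted sum of plays, plate appearances), so the wOBA of a window is the total weighted sum divided by the total PA; the exclusion of games at the start and end of the season is not modelled; and the cutoffs are simple order statistics. All names here are hypothetical, and this is not The Book's own procedure.

```python
def window_wobas(games, size=5):
    """wOBA for every run of `size` consecutive games.

    `games` is a chronological list of (weighted_sum, pa) pairs for
    one batter. Windows with zero plate appearances are skipped.
    """
    result = []
    for start in range(len(games) - size + 1):
        chunk = games[start:start + size]
        pa = sum(g[1] for g in chunk)
        if pa > 0:
            result.append(sum(g[0] for g in chunk) / pa)
    return result


def streak_cutoffs(values, tail=0.05):
    """Return (cold_cutoff, hot_cutoff): windows at or below the first
    value fall in the bottom `tail` fraction, windows at or above the
    second fall in the top `tail` fraction."""
    ordered = sorted(values)
    k = max(1, int(len(ordered) * tail))
    return ordered[k - 1], ordered[-k]
```

Pooling the window scores from many batters and calling `streak_cutoffs` on the pooled list gives the thresholds for labelling a window hot or cold.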

Number of distinct players with one or more 5-game hot streaks    543
Total number of hot streaks                                       6408
Total PA during the streaks                                       141259
Average wOBA during the streaks                                   0.587

Observe that a hot streak averages 141259/6408 ≈ 22.044164 PA (or about 4.4 PA per game). On average, the weighted linear combination of plays for a batter sums to (0.587 × 141259/6408) ≈ 12.939924. Using the formula for wOBA given in The Book, this translates to somewhere between 6.6358585 home runs and 14.377693 singles. Using the league averages for the American League, we find the average BA over the time period is:

Year      AL average BA   1B/H         2B/H         3B/H          HR/H
2000      .276            0.6593407    0.19715579   0.019392373   0.12411118
2001      .267            0.6574882    0.2014775    0.020819342   0.12021491
2002      .264            0.65347886   0.2053206    0.021145975   0.12005457
2003      .267            0.6595318    0.1993311    0.021404682   0.11973244
Average   0.2685          0.65749544   0.20076706   0.020677006   0.12106053
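The hot-streak arithmetic derived from Table 14 can be reproduced as follows. The weights of 0.90 per single and 1.95 per home run are an assumption about The Book's wOBA formula; they are the values consistent with the bounds quoted earlier. The variable names are made up for illustration.

```python
# Figures from Table 14.
total_pa = 141259
hot_streaks = 6408
avg_woba = 0.587

# Assumed wOBA weights (consistent with the bounds quoted in the text).
WEIGHT_SINGLE = 0.90
WEIGHT_HOME_RUN = 1.95

pa_per_streak = total_pa / hot_streaks           # about 22.04 PA
weighted_sum = avg_woba * pa_per_streak          # about 12.94

# Extreme cases: the whole weighted sum comes from one kind of hit.
all_home_runs = weighted_sum / WEIGHT_HOME_RUN   # about 6.64 home runs
all_singles = weighted_sum / WEIGHT_SINGLE       # about 14.38 singles

print(pa_per_streak, weighted_sum, all_home_runs, all_singles)
```

Any real streak is a mix of hit types, so its hit count lies between these two extremes.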

The probability of getting at least 6 hits in 22 PA with a BA of 0.2685 is, according to the binomial distribution, P(H ≥ 6) ≈ 61.130404%. Given the average over these years, the expected wOBA when there are at least 6 hits is 0.2214 with a standard deviation of 0.1002. Assuming everyone were close to the average, we would naively expect a streak to be a mildly rare thing to see.

But looking at the league average wOBA scores, which fluctuate around 0.33, we see a hot streak is then just a 2-sigma event, something which should happen every 3 weeks or so. From this perspective of rough probabilistic arguments, it is unsurprising that The Book concludes there is little predictive information in knowing whether a player is in the middle of a hot streak (or cold streak, for that matter).
