Wednesday, December 13, 2017

Batting Streaks: Statistical Illusions?

The Book asserts, after some statistical flourish, that batter streaks possess no predictive information. But this presupposes that batters go through streaks at all.

We can test if a batter is going through a streak using the Wald-Wolfowitz runs test, which is a generic statistical test for determining if there is an anomalous streak in flipping a coin repeatedly (or any other repeated Bernoulli process).

The code used for the hypothesis testing with the data is available on GitHub.
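For readers who want to experiment, here is a minimal Python sketch of the runs test using the usual large-sample normal approximation. It is not the code from the GitHub repository: the function name runs_test is ours, and it assumes the input is a list of 1s (successes) and 0s (failures).

```python
import math


def runs_test(seq):
    """Wald-Wolfowitz runs test on a list of 0s and 1s (normal approximation).

    Returns (number_of_runs, z_score, two_sided_p_value).
    """
    n1 = sum(1 for x in seq if x == 1)  # successes
    n2 = len(seq) - n1                  # failures
    if n1 == 0 or n2 == 0:
        # Only one kind of outcome: a single run, so the test is undefined.
        raise ValueError("sequence must contain both 0s and 1s")
    # A run is a maximal block of identical consecutive outcomes.
    runs = 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    if var == 0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - mean) / math.sqrt(var)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return runs, z, p
```

For example, runs_test([1, 0] * 10) flags a perfectly alternating sequence (too many runs, z > 0), while runs_test([1] * 5 + [0] * 5) flags two long blocks (too few runs, z < 0).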

Hitting Streaks in General

Using the Retrosheet data for 2014–2016 (and 2006–2016), we can determine whether a batter hit the ball and safely reached base (or hit a home run) or was out; in other words, whether a plate appearance was a "success" or a "failure".

The notion of a streak remains ambiguous to me, but it seems to be some measure of nonrandomness in the outcomes of repeated trials. That is to say, while "in a streak," a player's next plate appearance will be nearly identical to his previous plate appearance. This is completely different from the definition given in the rules of Major League Baseball!!! But it captures what people mean in the vernacular when they say, "Wow, so-and-so is on a hot streak."

Claim 1. Given a batter, his plate appearances over his career appear to be independent of each other.

Corollary. Streaks, hot or cold, do not exist.

Assumptions, Restrictions. If we ignore errors, pickoffs, caught stealing, wild pitches, balks, other advances, and so on (basically everything except singles, doubles, triples, home runs, and strikeouts and other outs), then we can apply the Wald-Wolfowitz test.

To avoid false positives and "small samples", we restrict focus to batters who have at least 50 plate appearances in the 2016 season.

Wald-Wolfowitz Testing

We take the data for batters from 2014–2016, remove games at Coors Field and batters on the Colorado Rockies, and filter out batters with fewer than 50 plate appearances. We then consider a plate appearance a "failure" if the batter ends up out, and a "success" otherwise.

There are two ways to handle the data now. We can treat each player's career as one long string of "success"/"failure" (or 1 and 0, respectively), then use the Wald-Wolfowitz test to check whether the plate appearances are independent of each other. Or we can treat each game as a string of "success"/"failure" and check whether, in a given game, a given batter's plate appearances are independent of each other. Spoiler alert: either way, we get the same result (the plate appearances are independent of each other).

We need to account for the fact that we're doing multiple hypothesis testing, so we use either the Holm–Bonferroni method or the Šidák correction. Again, spoiler alert: the results are unchanged by either method.
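Both corrections are short enough to write out. The following is a sketch, assuming the per-batter (or per-game) p-values have already been computed; the function names are ours, not from the author's code.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure: True where H0 is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; larger p-values are kept
    return reject


def sidak_alpha(alpha, m):
    """Per-test level giving family-wise error rate alpha over m independent tests."""
    return 1 - (1 - alpha) ** (1 / m)
```

For instance, holm_bonferroni([0.01, 0.04, 0.03, 0.005]) rejects only the hypotheses with p = 0.005 and p = 0.01; the Šidák version instead compares every p-value against the single threshold sidak_alpha(0.05, m).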

Player-by-Player testing of independence

We consider the career of a given batter and check whether the plate appearances are independent of each other. Our hypotheses are:

  • H0: the plate appearances are independent of each other
  • Ha: the plate appearances are not independent of each other

Proposition 1. With α = 0.05, we fail to reject the null hypothesis for batters with at least 50 plate appearances in games from 2014–2016 (inclusive), a sample of 837 batters.

Proposition 2. With α = 0.05, we fail to reject the null hypothesis for batters with at least 50 plate appearances in games from 2006–2016 (inclusive), a sample of 1601 batters.

Game-by-Game testing of independence

We consider the plate appearances of a given batter within a given game and check whether they are independent of each other. Since we are working with small samples (each game has around 4 to 6 plate appearances), we can use exact p-values. Our hypotheses are:

  • H0: the plate appearances are independent of each other
  • Ha: the plate appearances are not independent of each other
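With only a handful of plate appearances, the exact null distribution of the number of runs can be found by brute force: conditional on the number of successes, every arrangement is equally likely under H0. The sketch below enumerates them. Doubling the smaller tail is one common convention for a two-sided p-value and may differ from what the author's code does.

```python
from itertools import combinations


def count_runs(seq):
    """Number of maximal blocks of identical consecutive outcomes."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def exact_runs_p_value(seq):
    """Exact two-sided runs-test p-value for a short 0/1 sequence.

    Enumerates every placement of the observed number of successes, all
    equally likely under H0. Only feasible for short sequences (one game).
    """
    n = len(seq)
    k = sum(seq)
    observed = count_runs(seq)
    dist = {}  # number of runs -> number of arrangements with that many runs
    total = 0
    for positions in combinations(range(n), k):
        arrangement = [0] * n
        for i in positions:
            arrangement[i] = 1
        r = count_runs(arrangement)
        dist[r] = dist.get(r, 0) + 1
        total += 1
    lower = sum(c for r, c in dist.items() if r <= observed) / total
    upper = sum(c for r, c in dist.items() if r >= observed) / total
    # Convention: double the smaller tail, capped at 1.
    return min(1.0, 2 * min(lower, upper))
```

For instance, exact_runs_p_value([1, 0, 1, 0]) is 2/3. For sequences this short the exact p-value cannot get very small: with 3 successes in 6 plate appearances, each of the two most extreme run counts (2 and 6) already has probability 2/20 = 0.1 under H0.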

Proposition 3. With α = 0.05, we fail to reject the null hypothesis for batters with at least 50 plate appearances in games from 2014–2016 (inclusive), a total of 131 275 samples (batter-game combinations).

Proposition 4. With α = 0.05, we fail to reject the null hypothesis for batters with at least 50 plate appearances in games from 2006–2016 (inclusive), a total of 484 746 samples (batter-game combinations).

Caution: In roughly a quarter of batter-game combinations, the batter is out in every plate appearance. (More precisely, this occurs 25.56693% of the time for 2014–2016, and 24.8470% of the time for 2006–2016.) We should expect some of this for any batter, perhaps around 11% of the data; the remaining 13% or so may very well be due to random fluctuation, which seems plausible by order-of-magnitude estimates for batters with a .300 BA.
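The order-of-magnitude estimate can be made concrete under a simple assumption: each plate appearance independently succeeds with probability 0.300 (treating BA as the per-PA success rate, which is only an approximation). The chance of failing in every plate appearance of a game is then (1 − 0.300)^n.

```python
def all_failures_probability(p, n):
    """Chance of failing in all n independent plate appearances,
    each succeeding with probability p (an idealized model)."""
    return (1 - p) ** n


for n in range(3, 7):
    print(n, round(all_failures_probability(0.300, n), 4))
```

For games with 4, 5, and 6 plate appearances this comes to about 24%, 17%, and 12%, respectively.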

Batter Streaks in The Book

The Book considers each batter over windows of 5 consecutive games, kind of like the 5-grams of the batter's games for the season, excluding the first 5 (and last 5) games. For each such window, we compute the batter's wOBA. A hot streak is a window in the top 5% of these 5-game wOBA scores, and a cold streak is one in the bottom 5%.
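A sketch of the windowing, assuming each game has already been reduced to two numbers: the sum of wOBA weights the batter earned and his plate appearances. The input format, the margin parameter, and the way the percentile cutoffs are taken are our assumptions, not The Book's exact procedure.

```python
def five_game_woba(game_weights, game_pas, window=5, margin=5):
    """wOBA of every block of `window` consecutive games in one batter's season.

    game_weights[i]: sum of wOBA weights the batter earned in game i (assumed input)
    game_pas[i]:     the batter's plate appearances in game i
    Blocks that touch the first or last `margin` games are skipped.
    """
    scores = []
    for start in range(margin, len(game_pas) - window - margin + 1):
        pa = sum(game_pas[start:start + window])
        if pa > 0:
            scores.append(sum(game_weights[start:start + window]) / pa)
    return scores


def streak_thresholds(all_scores, fraction=0.05):
    """Cutoffs for (roughly) the coldest and hottest `fraction` of window scores."""
    ordered = sorted(all_scores)
    if not ordered:
        raise ValueError("no window scores")
    k = max(1, int(len(ordered) * fraction))
    cold_cutoff = ordered[k - 1]  # scores at or below this are "cold"
    hot_cutoff = ordered[-k]      # scores at or above this are "hot"
    return cold_cutoff, hot_cutoff
```

In practice one would pool the window scores of all qualifying batters into one list before calling streak_thresholds.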

We reproduce table 14 from The Book, which examines data from 2000–2003 (excluding games at Coors Field and excluding Rockies players):

Number of distinct players with one or more 5-game hot streaks: 543
Total number of hot streaks: 6408
Total PA during the streaks: 141259
Average wOBA during the streaks: 0.587

Observe that a hot streak averages 141259/6408 ≈ 22.044164 PA (or about 4.4 PA per game). On average, the weighted linear combination of plays for a batter during a streak sums to 0.587 × 141259/6408 ≈ 12.939924. Using the formula for wOBA given in The Book, this translates to somewhere between 6.6358585 home runs and 14.377693 singles. Using American League averages, we find the following average BA (and breakdown of hits by type) over the time period:

Year      AL average BA   1B/H         2B/H         3B/H          HR/H
2000      .276            0.6593407    0.19715579   0.019392373   0.12411118
2001      .267            0.6574882    0.2014775    0.020819342   0.12021491
2002      .264            0.65347886   0.2053206    0.021145975   0.12005457
2003      .267            0.6595318    0.1993311    0.021404682   0.11973244
Average   0.2685          0.65749544   0.20076706   0.020677006   0.12106053
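The per-streak arithmetic from table 14 can be reproduced in a few lines. The single and home-run weights (0.90 and 1.95) are assumed here to be The Book's; they are the values consistent with the 14.377693 and 6.6358585 figures quoted above.

```python
total_streaks = 6408
total_pa = 141259
avg_woba = 0.587

pa_per_streak = total_pa / total_streaks   # roughly 22.04 PA per streak
pa_per_game = pa_per_streak / 5            # roughly 4.4 PA per game
weighted_total = avg_woba * pa_per_streak  # roughly 12.94

# Assumed linear weights: single = 0.90, home run = 1.95.
W_1B, W_HR = 0.90, 1.95
home_run_equivalent = weighted_total / W_HR  # all the value from home runs
single_equivalent = weighted_total / W_1B    # all the value from singles
print(round(home_run_equivalent, 4), round(single_equivalent, 4))
```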

The probability of getting at least 6 hits in 22 PA with a BA of 0.2685 is, according to the binomial distribution, P(H > 6) ≈ 61.130404%. Given the averages over these years, the expected wOBA when there are at least 6 hits is 0.2214, with a standard deviation of 0.1002. Naively, if everyone were close to the average, we would expect a streak to be a mildly rare thing to see.
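Binomial tail probabilities like this one are easy to compute directly with the standard library. This sketch (Python 3.8+ for math.comb) returns P(H ≥ k); since P(H > 6) is the same as P(H ≥ 7), both tails are printed for comparison.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(H = k) for H ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_at_least(k, n, p):
    """P(H >= k) for H ~ Binomial(n, p)."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))


n, p = 22, 0.2685  # PA per streak (rounded) and the 2000-2003 AL average BA
print("P(H >= 6) =", prob_at_least(6, n, p))
print("P(H >= 7) = P(H > 6) =", prob_at_least(7, n, p))
```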

But looking at the league-average wOBA scores, which fluctuate around 0.33, we see that a hot streak is then just a 2-sigma event, something which should happen every 3 weeks or so. From the perspective of these rough probabilistic arguments, it is unsurprising that The Book concludes there is little predictive information in knowing whether a player is in the middle of a hot streak (or a cold streak, for that matter).

Saturday, December 2, 2017

wOBA is more elegant than you think

Review: How do we compute wOBA?

Any given moment in baseball may be described by (1) how many outs there are, and (2) who's on base. There are possibly 0, 1, or 2 outs; and 8 possible configurations of runners on base. Hence we may describe the game at any moment in a given inning by 24 possible states. We may represent each possible state by a number from 0 to 23, using the formula: 8×outs + (base-configuration) = (game state). Here the base-configuration = 1×(first occupied) + 2×(second occupied) + 4×(third occupied), where the parenthetic (base occupied) is 1 if the given base is occupied and 0 otherwise...think of it as a 3-bit number.
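The encoding is just arithmetic on bits. A direct transcription in Python (the function names are ours) is:

```python
def game_state(outs, first, second, third):
    """Encode a base-out situation as an integer from 0 to 23.

    outs is 0, 1, or 2; first/second/third are True when that base is occupied.
    """
    if outs not in (0, 1, 2):
        raise ValueError("outs must be 0, 1, or 2")
    base_configuration = 1 * first + 2 * second + 4 * third  # a 3-bit number
    return 8 * outs + base_configuration


def decode_state(state):
    """Inverse of game_state: returns (outs, first, second, third)."""
    outs, bases = divmod(state, 8)
    return outs, bool(bases & 1), bool(bases & 2), bool(bases & 4)
```

For example, one out with runners on first and third is 8×1 + 1 + 4 = 13.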

Step 1: Compute the Run-Expectancy Matrix. We set up a table whose rows are the base configurations and whose columns are the numbers of outs. Tom Tango calls this table the "Run Expectancy matrix" (or RE-matrix), but it's really a random variable. For each state, we total the runs scored from that state until the end of the inning over the course of a season (or set of seasons), then divide by the number of times that state occurred over the season(s).

In pseudocode (pidgin Python):

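runs = [0.0] * 24   # total runs scored from each state until the end of the inning
counts = [0] * 24   # number of times each state occurred
for plate_appearance in season:
    state = to_state(plate_appearance)
    # runs scored from this plate appearance until the inning ends
    runs[state] += runs_at_end_of_inning(plate_appearance)
    counts[state] += 1

re = [runs[i] / counts[i] if counts[i] else 0.0 for i in range(24)]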

Step 2: Compute the raw coefficients. The "run expectancy" of a given play is the number of runs scored on the play, plus the RE-matrix value of the final state minus the RE-matrix value of the initial state.

Now, for each play (BB, HBP, 1B, 2B, 3B, HR, Outs), we compute a coefficient k(play). For walks, for example, we compute k(BB) by summing the run expectancy of every walk in the season(s), then dividing by the total number of walks in the season(s). Schematically, in pseudocode (pidgin Python):

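number_of_walks = 0
re_of_walks = 0.0
for walk_event in season_walk_events:
    # runs scored on the play plus the change in run expectancy
    # (runs_scored is a hypothetical helper; if the play ends the inning,
    # the end-state run expectancy should be taken as 0)
    re_of_play = (runs_scored(walk_event)
                  + re[end_state(walk_event)] - re[start_state(walk_event)])
    re_of_walks += re_of_play
    number_of_walks += 1
k_BB = re_of_walks / number_of_walks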

Given the structure of the RE-matrix, k(Outs) < 0 always.

Step 3: Scale the raw coefficients. For each of the offensive plays (BB, HBP, 1B, 2B, 3B, HR) we have the coefficient c(play) = k(play) − k(Outs).

Step 4: Compute the wOBA. We now compute the wOBA for a player by the formula:

wOBA = c(BB)×(BB/PA) + c(HBP)×(HBP/PA) + c(1B)×(1B/PA) + c(2B)×(2B/PA) + c(3B)×(3B/PA) + c(HR)×(HR/PA)

The normalizations vary: sometimes the denominator is (AB + BB − IBB + SF + HBP) instead of PA. The intuition remains the same: we multiply each coefficient by the probability that our batter will make the given play.

But that means wOBA is the expected value for some random variable.
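Steps 3 and 4 translate directly into code. In this sketch the raw coefficients k are passed in as a dictionary keyed by play name (an assumed representation), and a batter with zero PA gets a wOBA of zero, following the post's convention.

```python
PLAYS = ("BB", "HBP", "1B", "2B", "3B", "HR")


def scaled_coefficients(k):
    """Step 3: c(play) = k(play) - k(Outs) for each offensive play."""
    return {play: k[play] - k["Outs"] for play in PLAYS}


def woba(c, counts, pa):
    """Step 4: sum over plays of c(play) * (count of play / PA).

    counts maps play names to how often the batter made that play; missing
    plays count as zero. A batter with no PA gets 0.0 by convention.
    """
    if pa == 0:
        return 0.0
    return sum(c[play] * counts.get(play, 0) / pa for play in PLAYS)
```

For example, woba(scaled_coefficients(k), {"1B": 2, "HR": 1}, 10) gives the wOBA of a batter with two singles and a home run in ten plate appearances, for whatever raw coefficients k were computed in Step 2.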

Exercise 1. Assume for simplicity that PA = BB + HBP + 1B + 2B + 3B + HR + Outs. Prove the following formula holds:

wOBA = k(BB)×(BB/PA) + k(HBP)×(HBP/PA) + k(1B)×(1B/PA) + k(2B)×(2B/PA) + k(3B)×(3B/PA) + k(HR)×(HR/PA) + k(Outs)×(Outs/PA) − k(Outs)

[Hint: plug in the definition of the re-scaled coefficients in terms of the raw coefficients.]
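Before proving Exercise 1, it can be reassuring to check it numerically. The coefficients and counts below are made up purely for illustration and are not real data.

```python
PLAYS = ("BB", "HBP", "1B", "2B", "3B", "HR")

# Made-up raw coefficients and counts, for illustration only.
k = {"BB": 0.30, "HBP": 0.32, "1B": 0.45, "2B": 0.75, "3B": 1.05, "HR": 1.40, "Outs": -0.27}
counts = {"BB": 40, "HBP": 5, "1B": 90, "2B": 25, "3B": 3, "HR": 20, "Outs": 400}
pa = sum(counts.values())  # PA = BB + HBP + 1B + 2B + 3B + HR + Outs

# The definition, using scaled coefficients c(play) = k(play) - k(Outs).
scaled_form = sum((k[play] - k["Outs"]) * counts[play] / pa for play in PLAYS)
# The raw-coefficient formula from Exercise 1.
raw_form = sum(k[play] * counts[play] / pa for play in PLAYS + ("Outs",)) - k["Outs"]
print(scaled_form, raw_form)
```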

Due to the intricacies of a degenerate sigma algebra, the wOBA of a batter who has never been at the plate even once will be zero.

Mathematical Cleverness Hidden in the Coefficients

The raw coefficient k(play) is actually the conditional expectation value E[RE|B=play], where "RE" is the random variable describing the entries of the RE-matrix, and "B" is the random variable for the play at hand (the BB, HBP, 1B, etc.). Recall that the conditional expectation is itself a random variable when "B" is left unspecified.

The expression E[RE|B=play] is precisely step 2 in computing wOBA, and if we do not fix the play it gives us a "random variable": the function which, given a play, produces the corresponding coefficient for that play.

For a player's wOBA, this is just the expectation of the conditional expectation minus the coefficient for outs: $\mathrm{wOBA} = E_{\mathrm{batter}}[E[RE \mid B]] - k(\mathrm{Outs})$.

A more elegant solution would be simply to use $E_{\mathrm{batter}}[E[RE \mid B]]$, so that bad players are penalized for their outs. The only plausible reason for subtracting k(Outs) that I can think of is to make wOBA look superficially "similar" to SLG, but it does have the nifty feature that a batter who only strikes out will have a vanishing wOBA score rather than a negative one (which has its own drawbacks).

Remark 1. The astute reader may recall from basic probability that E[E[X|Y]] = E[X], which holds when both expectations are taken with respect to the same probability distribution on the same probability space. But that is not what we do with wOBA: we take the inner expectation with respect to the average over the season(s), and the outer expectation with respect to the batter's history. The geometry of the probability space is more subtle than one might think; it is more analogous to a fiber bundle, where the fiber is the probability space over the 25 states of the inning, and the base space is the batter's possible plays.

Possible Improvements

Steps 1 and 2 in the algorithm for computing wOBA coefficients didn't specify any conditions on which runs we look at. That is to say, we didn't restrict focus to a particular park, or for particular weather, and so on.

A possible improvement would be to project the statistics onto a particular subset: compute wOBA coefficients relative to a particular park, or for particular weather. This would factor the park's idiosyncrasies into the statistic.

More "controversial" improvements include counting "Caught Stealing" as a play, which doesn't really measure a batter's performance but does measure a player's judgement and, more crucially, ability to steal.

Variance

As far as I can tell, wOBA was first introduced in The Book, which gives its variance rather quickly in an appendix. Recall that for a random variable on a finite probability space, the variance is:

$$\mathrm{Var}(X) = \Big(\sum_j X(j)^2 \Pr(j)\Big) - E[X]^2$$

where I have explicitly written out $E[X^2]$ to emphasize the structure of the formula. The Book asserts it is the same as the Bernoulli distribution's variance.
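The variance formula is easy to evaluate for any finite distribution, which is handy for experimenting with Exercise 2. The helpers below are our own and take parallel lists of values and probabilities.

```python
def expectation(values, probs):
    """E[X] for X taking values[j] with probability probs[j]."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Var(X) = (sum_j X(j)^2 Pr(j)) - E[X]^2 on a finite probability space."""
    second_moment = sum(x * x * p for x, p in zip(values, probs))
    return second_moment - expectation(values, probs) ** 2
```

As a sanity check, variance([1, 0], [q, 1 - q]) equals q(1 − q), the variance of a Bernoulli variable with success probability q.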

Exercise 2. Assume for simplicity that PA = BB + HBP + 1B + 2B + 3B + HR + Outs. Show the following formula holds:

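$$\mathrm{var}(\mathrm{wOBA}) = c(\mathrm{BB})^2 \frac{\mathrm{BB}}{\mathrm{PA}} + c(\mathrm{HBP})^2 \frac{\mathrm{HBP}}{\mathrm{PA}} + c(\mathrm{1B})^2 \frac{\mathrm{1B}}{\mathrm{PA}} + c(\mathrm{2B})^2 \frac{\mathrm{2B}}{\mathrm{PA}} + c(\mathrm{3B})^2 \frac{\mathrm{3B}}{\mathrm{PA}} + c(\mathrm{HR})^2 \frac{\mathrm{HR}}{\mathrm{PA}} - (\mathrm{wOBA})^2$$

where the superscript 2 indicates squaring, $x^2 = x \times x$. Then prove, or find a counterexample to, the claim that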

var(wOBA) = wOBA(1 − wOBA)/PA.