How the Idea Transpired
Reddit user DShep: Darvish Pitch Selection
My idea for this post emerged from my adult baseball league. Most pitchers threw between 70-80 MPH, but their breaking balls were released from different release points or arm slots than their fastballs and were easy to distinguish. I struck out six times in 29 at-bats for the season. Of the six times I struck out during the season, three strikeouts were against the same guy in the same game – and he wasn’t even hitting 70 MPH. He had three to four different pitches, and they all released from what looked like the same point.
He was consistent with his release point regardless of his pitch selection, making it extremely difficult to predict what pitch I would be swinging at.
This gave me the idea to look at the value associated with release point consistency (while keeping in mind that some pitchers use release point inconsistency as a form of deception). Mike Fast at BP reminds us of this, and noted that evaluating this subject could be useful in a number of areas: injury and fatigue, as well as consistency between pitch types, which I would like to evaluate more in the future. This post will focus on the outcomes associated with different levels of consistency and provide what we all love as FanGraphs users: a new leaderboard.
The consistency will be measured by RMSE (Root Mean Squared Error) in release points. The RMSE represents the standard deviation of the differences (errors) between each pitch’s release point (observed) and the mean release point within that game (model/estimator). We are using RMSE instead of general standard deviations from a mean release point because using RMSE assumes that we have a model/estimator or a “line of consistency” rather than a simple deviation from one single value.
The lower the value, the better: a lower RMSE means each pitch is closer to the same release point.
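To make the definition concrete, here is a minimal Python sketch of a within-game RMSE. It assumes each pitch's release point is a horizontal/vertical pair and that the error for a pitch is its straight-line distance from the game's mean release point. The function name and the sample coordinates are illustrative only and are not the exact procedure or data behind the leaderboard.

```python
import math

def within_game_rmse(release_points):
    """RMSE of release points around their own game mean.

    release_points: list of (x, z) pairs, one per pitch, all from one game.
    The (x, z) layout is an assumption made for this sketch.
    """
    n = len(release_points)
    if n == 0:
        raise ValueError("need at least one pitch")
    mean_x = sum(x for x, _ in release_points) / n
    mean_z = sum(z for _, z in release_points) / n
    # Squared distance of each pitch from the mean release point.
    squared_errors = [
        (x - mean_x) ** 2 + (z - mean_z) ** 2 for x, z in release_points
    ]
    return math.sqrt(sum(squared_errors) / n)

# Made-up coordinates: a tight cluster and a looser one.
tight = [(-2.0, 6.0), (-2.1, 6.1), (-1.9, 5.9)]
loose = [(-2.0, 6.0), (-2.6, 6.4), (-1.4, 5.6)]
print(within_game_rmse(tight), within_game_rmse(loose))
```

A game in which every pitch leaves the hand at the same spot scores 0, and the score grows as the release points scatter.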
I am considering this a skill or benefit and form of deception: it is hard to pick up different pitches from an extensive repertoire if they are released from the same point.
Season-long consistency makes the least sense to evaluate because of adjustments a pitcher makes throughout the season, such as a Stephen Strasburg or Madison Bumgarner mound-move. The focus will be on within-game consistency, but I will also list pitch-to-pitch consistency and the game-to-game consistency. The correlations and standard deviations that I reference throughout this post will be on the within-game consistency, even though some correlations and outcomes are slightly stronger with the pitch-to-pitch facet.
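The pitch-to-pitch facet can be sketched in a similar way. The version below is an assumption about how such a score might be built: it takes the root mean square of the distance between each pitch's release point and that of the pitch before it. The helper name and coordinates are hypothetical.

```python
import math

def rms_successive_change(points):
    """Root mean square of the distance between consecutive (x, z) points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    squared_steps = [
        (x2 - x1) ** 2 + (z2 - z1) ** 2
        for (x1, z1), (x2, z2) in zip(points, points[1:])
    ]
    return math.sqrt(sum(squared_steps) / len(squared_steps))

# Made-up sequences covering the same horizontal range.
drift = [(0.0, 6.0), (0.1, 6.0), (0.2, 6.0), (0.3, 6.0)]  # slow drift
flip = [(0.0, 6.0), (0.3, 6.0), (0.0, 6.0), (0.3, 6.0)]   # alternating slots
print(rms_successive_change(drift), rms_successive_change(flip))
```

The alternating sequence scores higher than the drifting one because every consecutive pair of pitches is far apart. Feeding the same helper a pitcher's per-game mean release points would give one possible game-to-game score, though that is also an assumption rather than the documented method.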
The Epitomic Example: Mike Fiers
Mike Fiers is not consistent. He averaged a 2.18 RMSE for the season, which was about .80 standard deviations less consistent than average for pitchers with 70+ innings pitched. However, on his August 14th start, he turned into Corey Kluber, with a 1.43 RMSE, which was a glorious outlier. Do you remember what happened in Fiers’ August 14th start against the Cubs? He struck out 14 guys in 6 innings. Naturally, his swinging-strike (16% vs. 9.5% season rate) and contact (62.2% vs. 78.6% season rate) rates were his best all season. In addition, from a balls-in-play perspective, only 28% of his balls in play were flies relative to his 46.9% for the season.
The correlations between his game outcomes and within-game RMSE’s for his starts were strong, albeit in a small 10 game sample: .80+ for his expected earned run averages and .60 to his contact rates. Mike Fiers epitomizes the value of evaluating release point consistency. When he was really “on,” he dominated.
On almost the exact other side of the spectrum, there is Corey Kluber or “Klubot” as I will further verify. Kluber’s game RMSE averaged 1.46, which was almost .90 standard deviations more consistent than average. His most consistent game was on September 6th versus the White Sox with a 1.29 RMSE. His least consistent game was on July 6th against the Royals with a 1.85 RMSE. He did throw a shutout with 8 strikeouts on September 6th, but he also threw a 10 strikeout, 1-run performance against the Royals on July 6th. The effect of his release point consistency on his outcomes was less apparent: .10 or less correlation to his ERA’s. However, if we look at the games when he was “on” (lower than his season average RMSE) versus the games he was “off” (higher than his season average RMSE) we still see this:
*I cheated and excluded his first game of the season: let’s call it cobwebs. The RMSE presented above is Kluber’s pitch-to-pitch RMSE, which had more of an effect for him relative to his general within-game RMSE.
The 2014 Release Point Consistency Smorgasbord (Leaderboard):
First and foremost, I would like to thank Tanner Bell (@smartfantasybb) of Smart Fantasy Baseball for developing a streamlined approach to processing the PITCHf/x data that I used and for the initial editing of this content. Tanner’s site is fantastic if you want to create your own fantasy baseball projections, rankings, dollar and inflation values, etc. His content is great, but perhaps more impressive is the presentation and demonstration of the “how-to” on his site walking you through each step. I cannot fully comprehend the amount of hours that Tanner cut out for me in the data manipulation. I also want to thank our Jeff Zimmerman who pulled the PITCHf/x data.
A few contingencies out of the way first:
• All pitches with an “IN” (identified as an intentional walk pitch) or “UN”/blank (unidentified) classification were omitted from the data so as not to affect the release point consistency scores.
• Only pitchers with more than 70 innings pitched were included in the data. Therefore, most relievers have been excluded from the evaluation. The sample should be considered large enough (200+ pitchers multiplied by all of the appearances multiplied by the pitch counts).
• For the game-to-game RMSE correlations to the outcomes, the approach was quite manual. I removed relievers and starting pitcher games with noticeably low pitch counts so that they did not carry equal weight. I also removed shellackings (starts > 9.00 ERA), which would have strongly affected the correlations. Therefore, you need to take all of the game-to-game outcomes correlations with a handful of salt. The season RMSE correlations to outcomes are on the complete sample.
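The first filter above is simple to express in code. This is a minimal sketch that assumes each pitch is a dictionary with a hypothetical "pitch_type" key holding the classification code; the real data layout may differ.

```python
# Classifications dropped before scoring: intentional-walk pitches,
# unidentified pitches, and blanks.
EXCLUDED_TYPES = {"IN", "UN", ""}

def keep_classified(pitches):
    """Return only the pitches whose type is not excluded or missing."""
    kept = []
    for pitch in pitches:
        # Treat a missing or None classification the same as a blank one.
        code = (pitch.get("pitch_type") or "").strip()
        if code not in EXCLUDED_TYPES:
            kept.append(pitch)
    return kept

# Made-up rows for illustration.
sample = [
    {"pitch_type": "FF"},
    {"pitch_type": "IN"},
    {"pitch_type": "SL"},
    {"pitch_type": "UN"},
    {"pitch_type": ""},
    {"pitch_type": None},
]
kept = keep_classified(sample)
print([p["pitch_type"] for p in kept])
```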
And now, our 2014 Leaderboard:
Each player referenced in this post is highlighted in yellow for your convenience.
• Column 5 (m_Gm_RMSE) is the mean within-game RMSE for the season, and the embedded file is currently sorted by it in ascending order (i.e. the lowest RMSE’s/most release point consistency at top). Therefore, Jacob deGrom had the best within-game release point consistency in 2014 for a full-time starting pitcher.
• Column 6 (zGm) is how many standard deviations that RMSE is from the mean.
• Column 7 (G2G_Gm_RMSE) is the game-to-game RMS change (i.e. their game-to-game consistency score). While Wily Peralta had the 37th best mean within-game RMSE, he wound up first overall for game-to-game consistency. Brandon McCarthy, Vance Worley, Yovani Gallardo, Felix Hernandez and Sonny Gray were the only pitchers in the top 50 for game-to-game RMSE whose within-game consistency was slightly below average. This means that while their repertoire is not released from the same point as consistently as others, their game-to-game release points are more consistent than others.
• Columns 8-10 (m_P2P_RMSE to G2G_P2P_RMSE) are hidden, but provide the same information as columns 5-7 except on the pitch-to-pitch change consistency within each game. As noted in the “Approach” section above, some outcomes and/or correlations have stronger relationships with the pitch-to-pitch change facet versus the general within-game consistency, but the two sets of values parallel each other; naturally they have a .94 correlation.
• Columns 11-22 (ERA to SwStr%) are the outcomes for the season for each pitcher. Row two within these columns contains the correlation between the season average within-game RMSE and the season outcomes for all pitchers.
• Columns 23-34 (ERA_Corr to SwStr_Corr) are the correlations for each game’s RMSE and each game’s outcomes for all games, but again, note the manual contingencies I made prior to the leaderboard (removed relievers, lower pitch count games, etc.). Row two within these columns contains the correlation between all the game RMSE’s and all the game outcomes for all pitchers. (Relievers were omitted from this segment because of their lower pitch counts and game-to-game variance.)
The “Game RMSE Correlations to Game Outcomes for All Starters” section undercuts the generalizability (results for the full sample are in row two): no correlation was stronger than the .29 between RMSE and SIERA. Still, the relationship is apparent in row two of the “Season Average RMSE Correlations to Season Outcomes for All Pitchers with 70+ Innings Pitched.” You can filter the individual outcome correlations under “Game RMSE Correlations to Game Outcomes for All Starters” to get a sense for which pitchers had those stats correlate to their within-game RMSE most. We want a strong positive correlation between RMSE and ERAs (the higher the RMSE, the less consistent, the worse the ERA). We want a strong negative correlation between RMSE and K% as well as SwStr% (the lower the RMSE, the more consistent, the higher the strikeout and whiff rate).
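For readers who want to check the sign conventions themselves, here is a small sketch of a Pearson correlation. It assumes Pearson is the correlation in use, which the post does not state explicitly, and the numbers are invented purely to show the direction of each relationship; they are not taken from the leaderboard.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Invented game-level values for one hypothetical pitcher.
game_rmse = [1.4, 1.6, 1.8, 2.0, 2.2]
game_era = [3.0, 3.2, 3.1, 3.9, 4.1]      # worse as RMSE rises
game_k_pct = [26.0, 24.0, 23.0, 20.0, 19.0]  # lower as RMSE rises

print(pearson(game_rmse, game_era))    # positive: the sign we hope to see
print(pearson(game_rmse, game_k_pct))  # negative: the sign we hope to see
```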
The effect is much more evident when you bin the outcomes by RMSE (versus through correlations).
Outcomes by Pitchers: For examples, see the blurb on Mike Fiers and Corey Kluber above, but also here are my posts on “What Happens When Madison Bumgarner is Really on” and “Ballsy with Trevor Cahill.” I can go on. Instead, let’s look at the entire sample.
Here are the outcomes and average RMSE’s for pitchers most consistent (> 1 standard deviation better than the mean); within one standard deviation of the mean consistency level; and least consistent (> 1 standard deviation worse than the mean):
Perfect. The outcomes for pitchers with more consistency are superior to those less consistent. ERA does little to separate the pitchers within a standard deviation from those who are less consistent, but the differences become apparent as we look to xFIP and SIERA. Even the balls in play outcomes are worse for the pitchers who are less consistent.
Group three (higher RMSE/less consistency/more than one standard deviation worse than the mean) by no means implies negative value for everyone included. For example, this group includes Adam Wainwright, Collin McHugh, Garrett Richards and Max Scherzer. An obvious reason they are clustered into this group is their breaking ball usage:
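The three-way split can be sketched as a z-score on each pitcher's season mean RMSE. This is an illustration under assumptions: it uses the population standard deviation, the pitcher labels are placeholders, and the RMSE values are made up. Because a lower RMSE means more consistency, a z-score below -1 lands in the most consistent group.

```python
import statistics

def consistency_groups(season_rmse):
    """Map each pitcher to 'most', 'middle' or 'least' consistent.

    season_rmse: dict of pitcher name -> season mean within-game RMSE.
    """
    values = list(season_rmse.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD, an assumption
    if sd == 0:
        raise ValueError("all pitchers have the same RMSE")
    groups = {}
    for name, value in season_rmse.items():
        z = (value - mean) / sd
        if z < -1:
            groups[name] = "most"    # much lower RMSE than the mean
        elif z > 1:
            groups[name] = "least"   # much higher RMSE than the mean
        else:
            groups[name] = "middle"
    return groups

# Placeholder pitchers with invented RMSE values.
sample = {"A": 1.3, "B": 1.7, "C": 1.8, "D": 1.9, "E": 2.4}
groups = consistency_groups(sample)
print(groups)
```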
Breaking balls tend to have more distinguishable release points. On the right-hand side, there is the average Slider, Curveball and combined average usage for major league pitchers last year. The above group averaged six percent more breaking pitches. Those highlighted in red used breaking pitches .5+ standard deviations more, which is a big reason they wind up in this group. Clearly, Waino, McHugh, Richards and Scherzer do just fine despite release point consistency more than one standard deviation worse than the mean.
Outcomes by Games: Whereas the “Outcomes by Pitchers” table split pitchers into three groups, this table breaks games into three groups: all games for all pitchers that were more than a standard deviation more consistent than the pitcher’s individual season RMSE mean, within one standard deviation of it, and more than one standard deviation less consistent than it. To explain with an example, Mike Fiers’ August 14th start against the Cubs (extremely consistent/low RMSE outlier) is in the first group (> 1 SD). Mike Fiers’ games that were one standard deviation less consistent than his mean would be in the third group (< 1 SD).
The results for games that were more consistent than usual are not really distinguishable from games that fell within a standard deviation of the mean. In fact, the average expected ERA’s and contact rates were slightly better for games within a standard deviation than they were for games that were significantly more consistent.
Consistent with the idea, though, the first two sets are still distinguishable from the games that were less consistent.
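The game-level grouping differs from the pitcher-level one in that each game is compared with that pitcher's own mean and spread. Below is a rough sketch of that idea. The tuple layout, group labels, population standard deviation and sample numbers are all assumptions for illustration.

```python
import statistics
from collections import defaultdict

def outcome_by_game_group(games):
    """Average an outcome over games grouped by the pitcher's own z-score.

    games: list of (pitcher, game_rmse, outcome) tuples.
    """
    rmse_by_pitcher = defaultdict(list)
    for pitcher, rmse, _ in games:
        rmse_by_pitcher[pitcher].append(rmse)

    # Per-pitcher mean and SD; skip pitchers with no spread to measure.
    baseline = {}
    for pitcher, values in rmse_by_pitcher.items():
        sd = statistics.pstdev(values)
        if sd > 0:
            baseline[pitcher] = (statistics.mean(values), sd)

    buckets = defaultdict(list)
    for pitcher, rmse, outcome in games:
        if pitcher not in baseline:
            continue
        mean, sd = baseline[pitcher]
        z = (rmse - mean) / sd
        if z < -1:
            label = "more consistent"
        elif z > 1:
            label = "less consistent"
        else:
            label = "typical"
        buckets[label].append(outcome)
    return {label: statistics.mean(vals) for label, vals in buckets.items()}

# One placeholder pitcher: (name, game RMSE, contact rate in percent).
sample_games = [
    ("P1", 1.4, 62), ("P1", 2.1, 78), ("P1", 2.2, 79),
    ("P1", 2.3, 80), ("P1", 3.0, 88),
]
result = outcome_by_game_group(sample_games)
print(result)
```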
From this, we can interpret that while consistency on a high level means more success, it’s on a spectrum of importance for different pitchers. We know that Mike Fiers, Trevor Cahill and Carlos Carrasco perform better when they are “on” (filter by descending ‘xFIP_Corr’ which is the 23rd column in the leaderboard above). On the other hand, Corey Kluber is great even when he is “off” based on his extensive repertoire and per-pitch effect. Some heavy breaking ball users like Collin McHugh, Garrett Richards, and Adam Wainwright get more movement/spin based on their distinguished release points. After all, we know that spin rates are impactful for this group.
Corey Kluber actually winds up at #10 on the spin rate list, which makes him that much more impressive to me: two or three pitches released very consistently from one point (Changeup, Sinker with the Fourseamer tossed in) and then a nasty Slider-Cutter combo released relatively consistently from another. With excellent spin, an extensive repertoire that induces whiffs, above average control as well as pitch-type consistency and sequencing, Corey Kluber should remain dominant.
Future Consideration: Release Point Consistency Value Between Pitch Types
As I mentioned earlier, I am going to look more at RMSE between pitch types including within pitch sequences. Below, I simply looked at the two clusters of Kluber’s release points:
Here, we see that in 20 of Kluber’s 34 starts (column 7, titled “BBdiff”), his Slider-Cutter combination had an RMSE better than his general within-game RMSE when we removed his Sinkers, Fastballs and Changeups. His Sinker-Fastball-Changeup combination was naturally more consistent than his general within-game RMSE in 32 of his 34 starts. In short, when you distinguish these pitch types from each other, the consistency scores get better.
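A sketch of that comparison: compute the RMSE over all pitches in a game, then again within each pitch-type cluster around that cluster's own mean. The two-letter pitch codes, the (type, x, z) layout and the coordinates are assumptions for illustration, not Kluber's actual data.

```python
import math

SLIDER_CUTTER = {"SL", "FC"}
SINKER_FASTBALL_CHANGEUP = {"SI", "FF", "CH"}

def rmse_about_mean(points):
    """RMSE of (x, z) points around their own mean."""
    n = len(points)
    if n == 0:
        raise ValueError("need at least one pitch")
    mean_x = sum(x for x, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    total = sum((x - mean_x) ** 2 + (z - mean_z) ** 2 for x, z in points)
    return math.sqrt(total / n)

def cluster_rmse(pitches, pitch_types):
    """RMSE using only the pitches whose type is in pitch_types."""
    points = [(x, z) for kind, x, z in pitches if kind in pitch_types]
    return rmse_about_mean(points)

# Invented game: two tight clusters released from different spots.
game = [
    ("FF", -1.0, 6.0), ("SI", -1.1, 6.0), ("CH", -0.9, 6.0),
    ("SL", -2.0, 5.5), ("FC", -2.1, 5.5), ("SL", -1.9, 5.5),
]
overall = rmse_about_mean([(x, z) for _, x, z in game])
sl_fc = cluster_rmse(game, SLIDER_CUTTER)
si_ff_ch = cluster_rmse(game, SINKER_FASTBALL_CHANGEUP)
print(overall, sl_fc, si_ff_ch)
```

When the two clusters sit at different release points, the overall score is dominated by the gap between them, so each cluster on its own looks far more consistent.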
While there were not strong correlations to his general within-game consistency, there was a relationship between his Fastball-Sinker-Changeup RMSE and Fastball Pitch Type Linear Weight: a -.23 correlation to Kluber’s general within-game consistency, but a -.43 to his Fastball-Sinker-Changeup-specific consistency. If you filter this by descending column 10, titled ‘zFB/CH,’ his Fastball pitch type linear average was a .37 wFB/c when those three pitches were “on.” It averaged -1.94 when they were “off.”
A Bad Example: Shane Greene
Corey Kluber verifies the value of evaluating release point consistency based on the game and pitch type outcomes. On the other side of the spectrum, Shane Greene had the third best starting pitcher RMSE after Jacob deGrom and Jeremy Guthrie, but from the correlations, we see that he has a rather strong reverse relationship, meaning the farther outward he goes with his breaking pitches, the less in-game consistency he has and the better the outcomes. In the starts less consistent than his season RMSE (7 games), he had a 3.70 ERA, and that included a 20.25 ERA on September 2nd and a 14.73 ERA on September 24th; otherwise, he would have had a 1.10 ERA during this time. In the starts he was more consistent (8 games), he had a 3.86 ERA. His strikeout rate was also about 5% higher when less consistent. Keep in mind that he still had the third most consistent average within-game release consistency.
According to BP’s PITCHf/x leaderboard, Shane Greene has a top 15 whiff-rate on his Slider, Cutter and Changeup. His Sinker is above average, but his Fourseam is just flat – from movement and whiff perspectives.
If he works on his fastball, perhaps shifting the pitch out to a similar release point, I think the fastball could jump in value, ensuring he hovers between his actual (3.73) and expected (3.40) ERA’s from last year. He was not overly lucky, with a left-on-base rate around MLB average and a Batting Average on Balls in Play and home run-to-fly ball ratio slightly worse than average.
Descriptively, he was better when less consistent, but if his fastball was released closer to his other pitches, it A) might get more movement and induce more whiffs from that perspective, and B) could create more deception between both pitch types ensuring more swing and miss on his repertoire in general.
As I typed the above statement, which I do believe, I hear myself reaching to prove my hypothesis. In all honesty, the first few pitchers I evaluated release point consistency for were Madison Bumgarner, Trevor Cahill, Yusmeiro Petit, Corey Kluber, and Mike Fiers. The results (when looking at the outcomes when “on” versus “off”) got me very excited. It took looking at the entire sample and picking out guys on the opposite side of the spectrum (Shane Greene and Jacob deGrom) to ground my excitement.
Mike Fast referenced that pitchers use release point inconsistency (similar pitch types from different arm slots or release points) as a form of deception or confusion. This and within-game mound moves (contingent on batter-handedness) are obvious limitations to the analysis. I do not have the bandwidth to evaluate important filters like these.
Again, consistency on a high level means more success, but it’s on a spectrum of importance for different pitchers and pitch types.
I had an unsober conversation about this with Eno during the All Star weekend in Minneapolis this year. Let me try to type it sober now. We need to:
a) Categorize pitchers by repertoire or pitch-type combinations
b) Score each repertoire/combo by usage, movement, velocity for each pitch and velocity differential between pitches. Eno is starting to score repertoires on the outcome level.
c) Find pitchers with similar scores within each repertoire/combo
d) On like-scores, evaluate pitch sequencing – how one pitch affects the next from a swinging-strike perspective. Through trajectories/swing-commitment points, Jon Roegele depicted popular band usage and outcomes for The Hardball Times.
We have the value of each pitch based on the outcomes (SwStr% and GB/FB) and contextual run-prevention (FanGraphs linear weights). We have the features of each pitch (movement and velocity). We can get the swinging-strike rates for certain pitch sequences. The dark matter that is still there…what distinguishes one repertoire’s swing-and-miss/weak contact outcomes from another? One other factor might be release point deception.
There is an excellent chapter in the “Commentary” segment of Hardball Times Annual 2015 where Jeff Sullivan evaluates whether sabermetrics tipped the balance of power toward pitching (PITCHf/x) and defense (shift effect). Perhaps we can chalk this post up for pitchers as well. Madison Bumgarner works with a mirror doing drills. Or perhaps hitters can look for a slightly distinguishing release point and gain 75 milliseconds prior to swing commitment.
Daniel Schwartz contributes for RotoGraphs when he's not selling industry-leading thermal packaging. You can follow him on Twitter @RotoBanter.