Developing the Bestest xBABIP Equation Yet

by Mike Podhorzer
February 8, 2015

As a projectionist, I am seemingly on a never-ending quest to develop equations for every result statistic. By result statistic, I mean something like home runs, which are fueled by skills such as hitting the ball far, a skill that is itself summarized by the average batted ball distance we reference here quite often.

Another one of those result statistics is batting average. A hitter’s batting average is derived from two underlying skills: his ability to make contact (strikeout rate) and his ability to turn balls in play into hits (batting average on balls in play). While a hitter’s strikeout rate is quite stable from year to year, unfortunately his BABIP is not. It’s one of the metrics we still struggle to explain, with luck considered to play a major role.

There have been numerous attempts to come up with xBABIP equations and calculators, none of which proved reliable enough to use in our everyday analyses. But I refuse to give up, throw up my hands, and simply go with some sort of three-year average to project the following season’s BABIP. So my newest adventure involves journeying to the bestest xBABIP ever developed. And now I share that trip with you.

I began by poring over all the metrics available on the FanGraphs player pages. I decided that the metrics likeliest to affect BABIP would be those in the Plate Discipline and Batted Ball sections, along with ISO and Spd.
My reasons for testing these metrics were as follows:

- Plate Discipline: hitters who swing at and make contact with more pitches outside the strike zone would seemingly produce weaker contact; conversely, pitches hit inside the strike zone should be struck better and go for hits more often
- Batted Ball: we know that line drives go for hits most often of all the batted ball types, while infield fly balls are almost guaranteed outs
- ISO: isolated slugging percentage is used as a proxy for a hitter’s power; greater power should result in harder hit balls, which we know go for hits more often than weakly hit balls
- Spd: the faster a player is, the more likely he’ll be able to record infield hits

In addition to looking into these metrics, I discovered a new secret weapon. Aside from giving us the batted ball distance, Baseball Heat Maps also has a “Ground Ball and Line Drive Pull Percentages” leaderboard. It provides us with the average angle of those batted ball types. You will notice on that leaderboard that all the lefties are at the top and the righties are at the bottom. That’s because right field is measured as positive, and the higher the number, the further from center the ball was hit. Conversely, left field is negative, and the more negative the number, the closer to the left field line the ball was hit. What we care about is the absolute value of those numbers, so the right-handers’ negative values turn into positive numbers. Now the higher the number, the more pull happy the batter is.

In recent years, teams have shifted more and more, and hitters who pull the majority of their ground balls are seeing their BABIP marks plummet as a result. So the hope is that this new data could significantly boost our ability to estimate and project BABIP marks.

After collecting all the necessary data, I calculated the correlation of each metric to BABIP.
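The angle adjustment and the correlation step described above could be reproduced along these lines. This is only a sketch: the function names `pull_magnitude` and `pearson` are hypothetical, the sample values are made up for illustration and are not real player data, and the article does not say which tool was actually used for the calculation.

```python
from math import sqrt


def pull_magnitude(avg_angle):
    """Absolute value of the average GB/LD angle.

    The sign only encodes which side of center the ball was hit to,
    so dropping it puts lefties and righties on the same scale.
    """
    return abs(avg_angle)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT real player data.
angles = [14.2, -11.8, 3.5, -6.1, 9.7]          # signed average angle
babips = [0.285, 0.291, 0.325, 0.310, 0.300]    # BABIP for the same hitters

r = pearson([pull_magnitude(a) for a in angles], babips)
print(f"toy correlation of |angle| to BABIP: {r:.2f}")
```

Running the same `pearson` call once per metric column against the BABIP column would yield one correlation per metric.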
Let’s take a look at the results:

| O-Swing% | Z-Swing% | Swing% | O-Contact% | Z-Contact% | Contact% | SwStr% |
|---|---|---|---|---|---|---|
| 0.02 | 0.04 | 0.03 | -0.07 | -0.06 | -0.08 | 0.08 |

| Absolute Value of Angle | LD% | GB% | FB% | IFFB% | Actual IFFB% | IFH% | BUH% | ISO | Spd |
|---|---|---|---|---|---|---|---|---|---|
| -0.29 | 0.40 | 0.16 | -0.32 | -0.38 | -0.43 | 0.19 | 0.09 | 0.11 | 0.25 |

The Plate Discipline metrics in the first table came in lower than I expected. There is some negative correlation between O-Contact% and BABIP, which is logical, but not as much as I expected.

The second table is where all the fun is. Yup, it looks like angle plays a real role, while line drive rate, IFFB% and Spd also factor into the equation. You might wonder why there is an IFFB% correlation, as well as an “Actual IFFB%” one. The “Actual” version represents the percentage of pop-ups hit out of all batted balls, not just fly balls. It’s calculated simply by multiplying IFFB% by FB%. A hitter who posts a 15% IFFB% and 50% FB% is going to hit significantly more pop-ups that are easy outs than one with the same IFFB% but only a 30% FB%. So performing this adjustment results in a more accurate picture of the hitter’s batted ball distribution. And sure enough, the correlation is higher with this adjusted number.

My population set was composed of 2,375 player seasons from 2007 to 2014. After testing several combinations of components, I landed on the following winning equation:

xBABIP = 0.2530 + (O-Contact% * -0.0484) + (ISO * 0.1814) + (Absolute Value of Angle * -0.0024) + (LD% * 0.3657) + (FB% * IFFB% * -0.4531) + (Spd * 0.0046)

Adjusted R-Squared = 0.423

And now a plot of the results:

The next step was to determine if using xBABIP would help us forecast a player’s next season BABIP better than BABIP itself. So I calculated the correlation of xBABIP Year 1 to BABIP Year 2 and compared it to the correlation of BABIP Year 1 to BABIP Year 2. Here are the results:

| Metric | Correlation |
|---|---|
| xBABIP Yr 1 to BABIP Yr 2 | 0.404 |
| BABIP Yr 1 to BABIP Yr 2 | 0.352 |

Success!
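For readers who want to plug numbers into the equation above, here is a minimal sketch. The coefficients are copied from the equation. The function and parameter names are hypothetical, and the units are an assumption: the rate stats (O-Contact%, LD%, FB%, IFFB%) are entered as fractions such as 0.20 rather than 20, the angle as the absolute average angle in degrees, and Spd as the raw Speed Score. The sample hitter is invented for illustration.

```python
def actual_iffb(fb_pct, iffb_pct):
    """Pop-ups as a share of all batted balls: IFFB% multiplied by FB%."""
    return fb_pct * iffb_pct


def xbabip(o_contact, iso, abs_angle, ld, fb, iffb, spd):
    """Evaluate the article's xBABIP equation.

    Assumed units (not stated in the article): rate stats as fractions,
    abs_angle in degrees, spd as the raw Speed Score.
    """
    return (
        0.2530
        + o_contact * -0.0484
        + iso * 0.1814
        + abs_angle * -0.0024
        + ld * 0.3657
        + actual_iffb(fb, iffb) * -0.4531
        + spd * 0.0046
    )


# The pop-up comparison from the text: same 15% IFFB%, different FB%.
heavy_fb = actual_iffb(fb_pct=0.50, iffb_pct=0.15)   # 7.5% of batted balls
light_fb = actual_iffb(fb_pct=0.30, iffb_pct=0.15)   # 4.5% of batted balls
print(f"pop-up share: {heavy_fb:.3f} vs {light_fb:.3f}")

# Hypothetical hitter, invented for illustration only.
sample = xbabip(o_contact=0.65, iso=0.150, abs_angle=10.0,
                ld=0.20, fb=0.35, iffb=0.10, spd=5.0)
print(f"sample xBABIP: {sample:.3f}")
```

Keeping the pop-up term in its own helper makes the “Actual IFFB%” adjustment explicit, so the same value can be reused when checking correlations.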
Now personally, I would prefer to look at a hitter’s previous three seasons of xBABIP marks to forecast BABIP in Season 4, and perhaps take a three-year average of those xBABIP marks. But we don’t always have three seasons’ worth of data, so this is as good as it gets.

It’s time to put this equation into action and investigate the 2014 season. What follows is a spreadsheet with two separate tabs: the first is the overperformers and the second is the underperformers.

Overperformers

We knew that Drew Stubbs couldn’t possibly deserve a .404 BABIP, but an xBABIP mark of .345 confirms that he was still knocking the snot out of the ball. Along with his blend of power and speed, he should once again be one of the most valuable fourth outfielders in fantasy baseball and be a solid contributor in NL-Only leagues or mixed leagues with daily transactions.

J.D. Martinez’s power surge may or may not be for real, but his BABIP surge most certainly wasn’t. Still, a significantly lower rate of pop-ups led to an xBABIP mark that was higher than any of his previous actual BABIP marks. His draft cost is sure to be highly variable and there will be no telling if he’ll come at a discount, a fair price or be an expensive risk.

Since he rarely takes a walk and strikes out too often for a hitter with mediocre power, Danny Santana is highly dependent on his BABIP for his offensive output. Unfortunately, that’s going to come crashing down in 2015, though he apparently did do all the right things last year to carry an inflated mark. Without any sort of BABIP luck, he’s at risk of losing his grip on the leadoff spot in the lineup, which, if lost, would take a huge bite out of his fantasy value.

Yasiel Puig outperformed his xBABIP mark in 2013 by even more than in 2014, so perhaps he’s doing something else not being captured by our equation. He’s due for a HR/FB rate rebound anyway, which will offset any decline in BABIP and keep his fantasy value near elite levels.
By doubling his stolen base total and posting a crazy BABIP, Lorenzo Cain put himself back on fantasy owners’ radars. But that high BABIP isn’t going to last and he’ll have just his legs to fall back on to earn himself respectable fantasy value.

Underperformers

Surprise, surprise, Chris Davis tops our underperformers list. Despite finishing fifth in baseball in pull-happiness with his ground balls and line drives, xBABIP still thinks Davis profiles as a strong BABIPer. What helped? A low O-Contact%, a high ISO and LD% and very few pop-ups. His pull tendency lost him a whopping 41 points of xBABIP! While the extreme pull guys are losing hits to the shift that the equation can’t account for, there’s little doubt that Davis should enjoy a significant BABIP rebound this season.

Brian McCann and Mark Teixeira are two more pull-happy guys whose xBABIP marks may be a bit inflated without any additional reductions from the shifts they face. McCann actually wasn’t totally about pulling the ball, as he ranked just 55th among left-handed hitters in angle, but Teixeira ranked third in all of baseball in angle. Since 2012, McCann has underperformed his xBABIP by a large margin, after previously being close to it. That’s surely due to the shift. Teixeira is in the same boat, but his underperformance dates back to 2011.

What’s up with players whose first name ends in “hris” and last name is “Davis”?! Khris Davis doesn’t get shifted like the other Davis, but still hit into some apparent bad luck. Like the other Davis though, this one also hits for excellent power and avoids the pop-up. There’s some serious playing time risk here due to the presence of Gerardo Parra, but Davis should perform better from a rate perspective.

Jedd Gyorko underperformed his xBABIP in 2013 as well, but not to the same degree. Expectations of a healthier season and a much improved supporting cast make him a nice option as a cheap second baseman or middle infielder.
Jay Bruce has underperformed his xBABIP in five of his seven seasons and he was one of the most shifted players in 2014. He was a complete disaster at the plate last year, but entering his age 28 season, you have to figure some sort of rebound is in store.

Joey Votto has shown no pattern of xBABIP under or overperformance, which is a good sign for 2015. His BABIP should fully rebound, but his health is obviously the bigger question mark.

In mid-January, Jeff Zimmerman published 2014 xBABIP values using his most updated equation. His formula incorporated “hard hit ball percentage” data from Inside Edge, as well as Speed Score. Let’s compare the xBABIP values using both equations and take a look at whose values differ most.

Pod’s xBABIP More Bullish

My equation loves Jose Abreu’s huge power, relatively low angle, above average line drive rate and pop-up avoidance skills. But Zimm’s equation isn’t so enamored and thinks he should have been a sub-.300 hitter during his phenomenal rookie campaign.

Christian Yelich is a BABIP dream. He hits liners and grounders to all fields, has a bit of pop, above average speed and hit all of one pop-up all season long. Perhaps he didn’t rate highly in terms of “well-hit” balls, which is the type that boosts Zimm’s xBABIP numbers.

Boy oh boy, if Joe Mauer ever lost his sensational ability to turn balls in play into hits, his fantasy value would crater. With a 27.2% LD% and 0 (ZERO!!) pop-ups all year, I have to admit that my .346 xBABIP value appears more reasonable here than Zimm’s mark of .310, which would have represented a career low.

My equation thinks that Jean Segura will enjoy a BABIP rebound this season if his profile holds up, while Zimm’s version thinks his low BABIP was deserved. Either way, it will be tough for him to recover his fantasy value while batting eighth in the lineup.

What’s Dee Gordon without a high BABIP? A one-category standout with lots of risk.
My xBABIP thinks he deserved a high BABIP, while Zimm’s disagrees.

Zimm’s xBABIP More Bullish

Zimm’s xBABIP believes that Carlos Santana was quite unlucky last year, while mine thinks that wasn’t necessarily the case. He led baseball in pull percentage and hit a high percentage of pop-ups, which led to the low mark from my version of the equation.

By my equation, Andrew McCutchen was one of the luckier hitters in BABIP this past year. However, Zimm’s equation thinks an inflated BABIP was well deserved. McCutchen has beaten his xBABIP marks in five of his six seasons, and significantly so in the last three years. So perhaps Zimm’s version better reflects his true BABIP talent.

Edwin Encarnacion hasn’t posted a .300 BABIP since 2007, but Zimm’s equation thinks he deserved one in 2014. A low line drive rate, lots of pop-ups and a pull tendency hurt his xBABIP in my formula.

On the whole, I think I made real progress in explaining and projecting hitter BABIP marks. We still have a lot of room to improve, though, as dealing with shifts is proving difficult. But this is another step in the right direction and I finally feel comfortable using xBABIP values to help me project next season’s BABIP marks.