Projecting X: How to Project Players

February 4, 2013

ZiPS. Marcel. Bill James. Cairo. PECOTA. Steamer. These are not the names of my pet hamsters. They are the names of some of the most well-known baseball player projection systems out there. All of them derive their forecasts by throwing various historical data into a blender and spitting out a projected performance line. These systems are pretty darn good, but for the most part, very little is shared about the formulas that make up their guts. So at some point, you might get that urge to begin projecting players yourself, rather than rely on these systems that are unaware of injury issues or changes in a pitcher’s repertoire, for example.

For over 10 years, I have projected players myself for use in my local fantasy leagues. As such, I have never projected any non-fantasy statistics such as hitters’ walks, doubles, triples, etc. (which is why I don’t mention them below!). This is how I do it.

Hitters

At-Bats: Projecting playing time is just as important as projecting performance. Unfortunately, there is little science to it. It’s the one category that needs constant tweaking leading up to the season as position battles are won, injuries occur and recovery timetables become clearer. A full-time player who makes it through the entire season will generally receive between 525 and 650 at-bats, depending upon his spot in the batting order and how often he takes walks. Research has found that each slot in the batting order receives 15-20 fewer plate appearances than the slot before it. Fourth outfielders and utility infielders could see as many as 350 to 400 at-bats, earning themselves decent fantasy value in single-league or deeper mixed leagues. Most first-string catchers accumulate around 400 to 450 at-bats over a season, though the top ones play nearly every game and record over 500 at-bats.
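As a rough illustration of the batting-order effect, the sketch below steps plate appearances down by a fixed amount per lineup slot. The leadoff baseline of 750 and the 17.5 step (the midpoint of the 15-20 range) are assumptions chosen for illustration, not researched values, and the output is plate appearances, so walks and other non-at-bat events still have to be subtracted to reach at-bats.

```python
def plate_appearances_by_slot(leadoff_pa: float, drop_per_slot: float = 17.5) -> dict[int, float]:
    """Estimate full-season plate appearances for lineup slots 1 through 9.

    leadoff_pa    -- assumed plate appearances for the leadoff slot (hypothetical input)
    drop_per_slot -- assumed decline per slot; 17.5 is the midpoint of the 15-20 range
    """
    return {slot: leadoff_pa - drop_per_slot * (slot - 1) for slot in range(1, 10)}


# 750 is a made-up baseline used only to show the shape of the output.
estimates = plate_appearances_by_slot(leadoff_pa=750)
for slot, pa in estimates.items():
    print(f"slot {slot}: {pa:.1f} PA")
```

Treat the result as a starting point only; injury risk and position battles still call for manual adjustment.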

When projecting at-bats, it is important not to ignore injury risk. Though I am personally not a fan of projecting a lower total that essentially implies some sort of injury will occur, it generally should be done. For example, it would have been unwise to project someone like Chipper Jones for 550 at-bats in recent seasons simply because he was the starting third baseman. He had only recorded more than 500 at-bats once since 2004, so projecting an at-bat total below 500 would have been the right thing to do.

Strikeout Rate (K%): A hitter’s strikeout rate is one of the most stable statistics in baseball. According to research by Russell Carleton (formerly known as Pizza Cutter), the metric becomes reliable after just 150 plate appearances. Furthermore, our own Bill Petti found that it has a high year-to-year correlation. This means that a hitter’s strikeout rate shouldn’t vary too drastically from year to year, which makes it one of the easier metrics to project. Age is an important consideration here. Our own Jeff Zimmerman found that a hitter’s strikeout rate declines until about age 24, levels off until around 31 and then increases until age 35, before plateauing again. Age should be your primary guide for how the hitter’s strikeout rate should be trending.

The most difficult scenario is when a hitter’s strikeout rate experiences a sudden significant increase or decrease. We can never be sure how much of the change the hitter will hold onto the following year, so the projection is going to require an educated guess. A search into a possible cause, such as a change in batting stance or swing, could lead to valuable insights that could assist in developing your projection.

Since 2005, the league average K% has increased every single year from 16.4% to 19.8% in 2012. Knowing that trend and the league-average mark should also help explain changes in a hitter’s strikeout rate.

Batted Ball Distribution (LD%/GB%/FB%): A hitter’s batted ball distribution is important to familiarize yourself with because it directly affects his home run potential and batting average on balls in play (BABIP), which then factors into his batting average. Like a hitter’s strikeout rate discussed above, the distribution of batted balls is also quite stable, though not to the same degree. A hitter’s line drive, ground ball and fly ball rates all become reliable between 150 and 250 plate appearances. However, while ground ball and fly ball rates correlate rather well on a year-to-year basis, line drive rate displays significantly higher variability. All of this research means that a hitter’s ground ball and fly ball rates generally remain similar every season.

Similar to strikeout rate, a significant change from one season to the next could be the result of a change in hitting mechanics or an altered hitting philosophy. Sometimes the new distribution sticks, but most of the time it would be correct to project the distribution to return to the rates from previous seasons. Unfortunately, FanGraphs does not provide these rates for minor leaguers. You can find them instead at MinorLeagueCentral.com in the “Balls in Play Ratios” section; however, you must add the IFB% (infield fly ball rate) to the FB% to give you a FB% that is equivalent to the way FanGraphs calculates it.

In 2012, the league average distribution in LD%/GB%/FB% form was 20.9%/45.1%/34.0%. When in doubt, regress towards those numbers.
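One way to "regress towards those numbers" is to blend a hitter's observed distribution with the 2012 league averages, giving the observed rates more weight as the sample grows. This is a minimal sketch under stated assumptions: the blending formula and the `stabilizer_pa` value of 200 (picked from the middle of the 150-250 plate appearance range mentioned above) are illustrative choices, not a method taken from the article.

```python
# 2012 league-average batted ball distribution quoted in the article.
LEAGUE_2012 = {"LD": 0.209, "GB": 0.451, "FB": 0.340}


def regress_distribution(observed, plate_appearances, stabilizer_pa=200.0, league=LEAGUE_2012):
    """Blend observed LD/GB/FB rates with league average.

    stabilizer_pa is a hypothetical tuning knob: at that many plate
    appearances the observed rates and the league rates get equal weight.
    """
    weight = plate_appearances / (plate_appearances + stabilizer_pa)
    blended = {k: weight * observed[k] + (1.0 - weight) * league[k] for k in league}
    total = sum(blended.values())
    # Renormalize so the three rates still sum to 1 after rounding noise.
    return {k: v / total for k, v in blended.items()}


example = regress_distribution({"LD": 0.25, "GB": 0.40, "FB": 0.35}, plate_appearances=200)
print({k: round(v, 4) for k, v in example.items()})
```

With zero plate appearances the function simply returns the league distribution, which matches the "when in doubt" advice.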

Batting Average on Balls in Play (BABIP): Probably the most well known of all the newfangled sabermetrics, it is also possibly the least understood and most misused. First, batters actually do possess a significant amount of control over their BABIP marks. In 2012, qualified hitters posted BABIPs between .242 and .390. That in itself should tell you that there is some degree of skill involved as the range is much wider than for pitchers. However, despite the skill involved, there are still a ton of other factors outside a hitter’s control that influence it. That’s why it was found that BABIP requires more than 750 plate appearances to become reliable and has the third lowest year-to-year correlation of the many metrics tested. There are many factors that influence a hitter’s BABIP, but unfortunately not all of those are readily available in stat form on the pages of FanGraphs. What we do know that can be found in the stats, though, is the following:

line drives go for hits significantly more often than ground balls
ground balls go for hits more often than fly balls
pop-ups (or IFFB%) are essentially automatic outs
faster players could sustain higher BABIPs than slower players

So, LD > GB > FB > IFFB and speed = good.

To project BABIP for an upcoming season, we could do several things. Since there is a level of skill involved, we could look at what the hitter has actually posted over the last several years. For an established veteran, it should be pretty easy to determine if his BABIP skills are better, equal to, or worse than the league average. A projection that averages the last couple of years should do the trick most of the time. Beware of minor leaguers though as BABIP marks are higher there and a simple average will lead to inflated expectations.
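The "average the last couple of years" approach can be sketched as a simple or weighted mean. The optional weights (for example, leaning on the most recent season) are an assumption added for illustration; the article itself only describes a plain average.

```python
def project_babip(recent_babips, weights=None):
    """Average a hitter's recent major league BABIP marks.

    recent_babips -- list of seasonal BABIPs, e.g. [0.310, 0.330]
    weights       -- optional, hypothetical per-season weights (same length)
    """
    if not recent_babips:
        raise ValueError("need at least one season")
    if weights is None:
        weights = [1.0] * len(recent_babips)
    if len(weights) != len(recent_babips):
        raise ValueError("weights and seasons must have the same length")
    return sum(b * w for b, w in zip(recent_babips, weights)) / sum(weights)


plain = project_babip([0.310, 0.330])
recent_heavy = project_babip([0.310, 0.330], weights=[1, 3])
print(round(plain, 3), round(recent_heavy, 3))
```

As noted above, feeding minor league BABIPs into an average like this would inflate the result, so those seasons need to be adjusted or left out.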

Then there is the expected BABIP formula, originally developed by blogger slash12 and recently updated by Jeff Zimmerman. The appendix section at the bottom of the article links to a Google Docs spreadsheet to download so you can input a player’s data as instructed and it will spit out an xBABIP for each of those seasons. Not only can the spreadsheet help smooth out historical BABIP marks to assist in projecting the future, but you could also use your actual projections as the input data for the upcoming season and get a projected BABIP mark using the formula.

The league average BABIP has ranged between .295 and .300 over the past five seasons.

Home Runs per Fly Ball Rate (HR/FB): The rate at which a fly ball travels over the fence for a round-tripper takes 300 plate appearances to become reliable, which falls in the middle of the pack among all the metrics analyzed. It also sits in the middle of the pack in terms of year-to-year correlation, but it still rates highly and sports an identical mark to fly ball rate itself. So, HR/FB rates are relatively consistent each year, which isn’t too surprising since power hitters usually remain so and weaker hitters don’t generally enjoy sudden power outbursts. Here, age plays a significant role and if a player switches teams, then park factors could also factor into the projection, more so than when projecting other statistics.

Though we have not published a specific aging curve for HR/FB ratio, ISO makes for a decent proxy. Jeff Zimmerman found an earlier peak than most of us would have assumed, with ISO at its highest from ages 24 to 25. After a hitter debuts, he will experience a gradual increase until that age before a consistent decline through retirement. Of course, ISO includes doubles and triples, which may very well peak earlier than the home run skill. But, there would be little disagreement that a hitter’s power peaks in his mid-to-late 20’s followed by steady decline. With our knowledge of how aging affects home run power, we could do a better job of determining when a power surge is for real or is just a one-season fluke. A 26-year-old who enjoyed a HR/FB ratio spike from 10% to 15% has a much better chance of sustaining those gains than a 35-year-old.

Aside from looking at the historical data and a hitter’s age, I also always peek at ESPN Home Run Tracker, which provides the distance of every home run hit and classifies each in various categories depending on how far past the fence it flew. My prior research found that a hitter with a relatively high percentage of “Just Enough” home runs is more likely to experience a decline in his HR/FB rate the following year. Though the data has not been updated since 2006, 27% of league home runs were classified in the “Just Enough” category that season. So, a hitter whose home runs were classified as such at a significantly higher rate (think 40% and above) is at high risk for regression.
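The "Just Enough" check amounts to a share and a threshold. The sketch below uses the 27% league figure and the 40% rule of thumb from the paragraph above; the function names are made up for illustration, and the home run counts would have to be entered by hand from the tracker.

```python
LEAGUE_JUST_ENOUGH_SHARE = 0.27  # 2006 league figure quoted in the article
HIGH_RISK_SHARE = 0.40           # the "think 40% and above" rule of thumb


def just_enough_share(just_enough_hr: int, total_hr: int) -> float:
    """Fraction of a hitter's home runs classified as "Just Enough"."""
    if total_hr <= 0:
        raise ValueError("total_hr must be positive")
    return just_enough_hr / total_hr


def is_regression_risk(just_enough_hr: int, total_hr: int, threshold: float = HIGH_RISK_SHARE) -> bool:
    """Flag hitters whose "Just Enough" share meets or exceeds the threshold."""
    return just_enough_share(just_enough_hr, total_hr) >= threshold


print(is_regression_risk(15, 30))  # half of the home runs barely cleared the fence
print(is_regression_risk(8, 30))   # close to the league share
```

A flag here is a prompt to shade the HR/FB projection down, not a projection in itself.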

In addition to the Home Run Tracker data, I also look into the hitter’s average fly ball and home run distance. After choosing the batter from the drop-down box, check off the home runs and flies boxes, select the start and end dates and submit. There is a high correlation between a batter’s HR/FB ratio and the distance his fly balls and home runs have traveled. In 2012, for those with at least 46 combined fly balls and home runs, the average distance was around 280 feet. If a hitter’s HR/FB ratio does not match up with his distance, he would be a candidate for improvement or regression the following year.

The league average HR/FB ratio tends to bounce around the 10% level, as it has ranged from 9.4% to 11.3% over the last five seasons.

Runs Batted In and Runs Scored: These are two of the three fantasy categories that I project completely manually. I have stumbled upon a couple of formulas to project RBI, but they either require way too much additional work, or I did not feel that they added any additional accuracy. As a result, I simply look at past years, take expected batting order slot into account, any lineup changes (e.g., addition of Josh Hamilton) and any other changes that might factor into these numbers, like a projected increase in power for the hitter in question. Personally, I round these projections off to end in a 5 or 0. It’s impossible to be perfectly accurate, so I don’t believe it’s worth wasting more brain cells trying to come up with an exact number.

As usual, you must remember to assume some regression at the extremes. For example, just because Miguel Cabrera knocked in 139 batters in 2012 doesn’t mean that it would be wise to project him for another 130+ RBI season. From 2006-2009, Ryan Howard recorded between 136 and 149 RBI, but again, no system should have projected that consistency to continue. Regression (to the mean, or league average, and to a hitter’s own previously established level of skill) is an extremely important concept to understand when projecting players and applies here since so much of an RBI and runs scored total is out of a hitter’s control.

Stolen Bases: This is the third fantasy category that I do not rely on any formulas or other inputs to project. I have considered projecting components such as the rate at which the hitter attempts a steal given his opportunities, combined with a projection of his success rate. But, I do not believe the benefit of increased accuracy, if there even would be any, outweighs the extra time required.
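For readers who do want to try the component approach described above, it reduces to opportunities times attempt rate times success rate. This is only a sketch: how "opportunities" is counted (for example, times on first base with second base open) is an assumption left to the user, and all three inputs in the example are invented.

```python
def project_stolen_bases(opportunities: float, attempt_rate: float, success_rate: float) -> float:
    """Component-style stolen base projection.

    opportunities -- projected steal opportunities (definition is up to the user)
    attempt_rate  -- share of opportunities on which the runner goes (0 to 1)
    success_rate  -- share of attempts that succeed (0 to 1)
    """
    for name, rate in (("attempt_rate", attempt_rate), ("success_rate", success_rate)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    attempts = opportunities * attempt_rate
    return attempts * success_rate


# Invented inputs: 150 opportunities, runs 20% of the time, safe 80% of the time.
print(round(project_stolen_bases(150, 0.20, 0.80), 1))
```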

Speed is a skill of the young and therefore age plays a significant role when projecting stolen bases. I look at Speed Score (Spd) and triples as proxies for potential stolen base prowess. If I see a high stolen base total coming from a player with an average or worse Spd score, I raise an eyebrow. The reverse is also true as sometimes a high Spd guy won’t attempt many steals for whatever reason, which makes him a candidate for a breakout stolen base total. Triples are used because they are primarily a speed stat; they could be considered “doubles for fast guys”.

Furthermore, I take into account a batter’s position in the lineup. A third-place hitter is going to attempt fewer steals than he would batting leadoff, out of fear of taking the bat out of the hands of the clean-up hitter. Managerial philosophy also needs to be considered as some teams simply don’t run as often. Recent research suggests that spring training stolen base totals could actually hint at a change in mentality during the season. A significant increase or decrease in attempts during the spring may foreshadow team-wide changes in that direction during the regular season.

Non-speedsters who steal bases are the toughest to project. Nick Markakis’ stolen base total jumped from two to 18 from his rookie to sophomore season, then dropped back to the high-single to low-double digit range and held steady there for the next four seasons. Then in 2012, he stole just one base. When dealing with these types of players, you never really know how much they are going to decide to run. The best way to handle this situation is to be conservative in your projection.

Home Runs: Projecting home runs should be an automated process using formulas. The inputs are simply the hitter’s at-bats, contact rate (the complement of his strikeout rate), fly ball rate and HR/FB ratio, all of which are discussed above. Multiply the three rates together to get home runs per at-bat, multiply that by projected at-bats and voilà, out pops a home run total. Projecting each of the various components rather than the home run total itself will lead to a much more accurate projection.
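The product of the three rates is a per-at-bat home run rate, so a working version scales it by projected at-bats. A minimal sketch, assuming contact rate is defined as (AB - K) / AB and that fly ball rate is measured over all batted balls, home runs included; the sample inputs are invented.

```python
def project_home_runs(at_bats: float, contact_rate: float, fly_ball_rate: float, hr_per_fb: float) -> float:
    """Home runs = at-bats x contact rate x FB% x HR/FB.

    contact_rate  -- assumed to be (AB - K) / AB
    fly_ball_rate -- fly balls per batted ball (0 to 1)
    hr_per_fb     -- home runs per fly ball (0 to 1)
    """
    batted_balls = at_bats * contact_rate     # at-bats that do not end in a strikeout
    fly_balls = batted_balls * fly_ball_rate  # how many of those are in the air
    return fly_balls * hr_per_fb              # how many of those leave the park


# Invented example: 550 AB, 80% contact, 40% fly balls, 15% HR/FB.
print(round(project_home_runs(550, 0.80, 0.40, 0.15), 1))
```

Because each component is projected separately, a change in any one of them (say, an age-related HR/FB decline) flows straight through to the total.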

Batting Average: Like home runs, batting average relies on various other inputs that get thrown into a blender. Here, you will use at-bats, BABIP, K% and home run total to calculate your batting average projection.
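One way those four inputs can be combined is to back hits out of the BABIP definition. This sketch assumes BABIP is approximately (H - HR) / (AB - K - HR), ignoring sacrifice flies, and takes strikeouts as a count; converting a K% into that count depends on whether the rate is per plate appearance or per at-bat, which is left to the user.

```python
def project_batting_average(at_bats: float, strikeouts: float, home_runs: float, babip: float) -> float:
    """Batting average from projected AB, strikeouts, home runs and BABIP.

    Uses BABIP ~ (H - HR) / (AB - K - HR), a simplification that
    ignores sacrifice flies.
    """
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    balls_in_play = at_bats - strikeouts - home_runs  # batted balls that stay in the park
    hits_in_play = babip * balls_in_play
    hits = hits_in_play + home_runs                   # every home run is also a hit
    return hits / at_bats


# Invented example: 550 AB, 110 K, 26 HR, .300 BABIP.
print(f"{project_batting_average(550, 110, 26, 0.300):.3f}")
```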

Pitchers

Innings Pitched: Because standard fantasy categories include two ratio categories for pitchers, versus just one for hitters, playing time projections aren’t as important. But of course they must still be made and with intelligent thought behind them. Surprisingly (at least to me), innings pitched doesn’t correlate very well on a year-to-year basis, finishing in the bottom half of the metrics tested at 0.42. Obviously, injuries are a primary culprit here, which more often rob pitchers of significant playing time than hitters. In terms of the impact of aging, sabermetric extraordinaire Tangotiger found that a pitcher’s seasonal innings total increases until its peak at age 27 and then declines consistently from there.

When projecting a pitcher’s innings pitched for an upcoming season, let the previous season(s) and his age guide you. Of the top 20 pitchers last season in innings pitched, only five of them were over the age of 30, and one of those was a knuckleballer, who the rules don’t apply to. When projecting rookies, the better ones who stick in the rotation all year will typically throw 160-180 innings. In fact, in 2012, the top 10 rookies ranged between 161.0 and 193.2. For set-up men/closers, between 60 and 75 innings is the sweet spot.

Strikeout Rate (K/9): A pitcher’s strikeout rate is one of the most stable skills he possesses (K/PA was tested, rather than K/9, but it’s close enough). That’s a good thing, because it makes it that much easier to project. In addition to the metric becoming reliable after 150 batters faced, it also has a high correlation from year to year. That means that a pitcher doesn’t typically experience large swings in his strikeout rate on a yearly basis.

As you might expect, pitchers lose their strikeout ability as they age. According to research by our Bill Petti and Jeff Zimmerman, a starting pitcher maintains a strikeout rate within a 0.5 point range through age 29. After that age, the rate declines steadily, as they lose about a point through age 32 and another half a point rather quickly before leveling off. Relievers experience a much more rapid and consistent decline as they age.

So as usual, the best starting point for projecting strikeout rate is to use the past several seasons as an initial guide. Factor in age and you are the majority of the way there. After that, it’s time to get nerdy. Check the pitcher’s swinging strike rate (SwStr%), which I found is highly correlated with K/9. If there is a disconnect (e.g., a high SwStr%, but mediocre K/9), it may suggest some hidden upside or downside. Next is a look at the pitcher’s repertoire in the “Pitch Type” section. Not surprisingly, there is a positive relationship between a pitcher’s velocity and his strikeout rate. All else equal, the harder you throw, the more strikeouts you will generate. So when dealing with a pitcher who just experienced a strikeout rate decline or spike, the Pitch Type section might provide the explanation in the form of his average fastball velocity. It could also tell you that he began throwing a new pitch or altered his pitch mix.

Strikeout rates have been on the rise and have increased nearly every season over the last five years. The average starting pitcher’s K/9 has jumped from 6.5 in 2008 all the way up to 7.1 last year. Relievers have experienced the same trend, averaging 8.4 punch outs per nine last season versus 7.5 just five years ago.

Walk Rate (BB/9): Although strikeout rate stabilizes quite quickly, walk rate actually does not. It takes nearly two-thirds of a season for the metric to become reliable. Similarly, it’s not as consistent on a year-to-year basis as K/9, as it bounces around a bit more frequently. Walk rate doesn’t feel the effects of aging as much as strikeout rate, but it does show an interesting trend. A young pitcher will gradually improve his walk rate through age 25-26, at which point it’s at its lowest. After that age, the rate slowly creeps up to the point where the pitcher gives back all the gains he had made as a youngster. Given the aging curve and lower year-to-year correlation, it follows that young pitchers with control problems are your prime breakout candidates.

As with strikeout rate, there’s a chance to dive into the nerd pool again to increase the accuracy of your walk rate projection. RotoGraphs leader Eno Sarris showed us that first strike percentage (F-Strike%) is highly correlated with walk rate. Just as you did with swinging strike rate and K/9, you could look for discrepancies between F-Strike% and BB/9 to uncover pitchers whose walk rates are due to spike or improve.

Amazingly, even though strikeout rates have surged in recent seasons, walk rates have actually declined. While the average starter has posted walk rates from about 3.1 to 3.2 in past years, over the last two seasons they have walked just 2.8 to 2.9 batters per nine. Relievers have also improved their control, but given their higher strikeout rates, do walk more batters than starters by about 0.6 batters per nine.

Batted Ball Distribution (LD%/GB%/FB%): The analysis of a pitcher’s batted ball distribution is similar to that of a hitter’s. A pitcher’s ground ball and fly ball rates are extremely consistent on a year-to-year basis. However, the correlation of a pitcher’s line drive rate is just 0.11, which means that it is almost completely random. Though the effect is rather small, pitchers tend to allow more fly balls and induce fewer ground balls as they age.

Every so often, you will encounter a fly ball pitcher who suddenly becomes a ground ball pitcher or experiences some other change in batted ball distribution. This change is usually the result of an altered repertoire, many times accompanied by the addition of a new pitch, such as a sinker. Sometimes it could be tough to pinpoint the exact reason for the altered batted ball distribution, and if nothing can be found, then a regression toward historical marks is the likeliest scenario for the upcoming season.

Batting Average on Balls in Play (BABIP): Unlike hitters, pitchers have limited control over their batting average on balls in play. In the sample size test, BABIP never became reliable through 750 batters faced. And its year-to-year correlation? Just 0.20, right near the bottom of the list. These tests suggest that using previous seasons as a guide to project future BABIP is not the best way to go about it. While pitchers do exhibit some ability to control their BABIP marks, primarily through their batted ball distribution, other factors such as team defense, the ballpark and good ol’ luck play a major role. We also find that BABIP does rise slightly with age, so if an older pitcher suddenly posts an inflated mark, it might not be poor fortune, but a sign that he’s simply losing it.

For rookies and young pitchers with only one or two full seasons of Major League experience, I will nearly always automatically project a league average BABIP. Until these pitchers can prove to me over many innings that they have a knack for preventing hits on balls in play, I have to assume they have no such ability. When dealing with pitchers who have consistently posted below league average BABIP marks for some time now (see: Matt Cain), we come up with explanations and justify to ourselves why it will continue to happen, but we really don’t have the answers. These types of pitchers are riskier as they rely on a less sustainable skill to repeat their strong performances.

While there are always several pitchers who breach this floor in a given year, I have never projected a starting pitcher for a BABIP below .270. On the high end, my ceiling is .320, which I rarely project, and I hand out some .310 marks here and there as well. Over the last five years, the league average BABIP for starting pitchers has remained pretty consistent, ranging narrowly between .293 and .298. Relievers have posted slightly lower marks, having typically sat between .291 and .294, with 2011 coming in at a fluky .287.

Home Runs per Fly Ball Rate (HR/FB): Also unlike hitters, pitchers have limited control over their home run per fly ball ratios. Its year-to-year correlation is just 0.29, which fittingly is identical to that of pitcher wins, and we know those are quite random. A pitcher’s HR/FB rate is influenced significantly by ballpark factors. Take a pitcher on the Padres and make him pitch half his games in Colorado instead. Think his HR/FB rate will increase? Even if a pitcher hasn’t switched teams, you still need to know how his home ballpark plays. It will help you determine how sustainable a lower or higher than league average mark is given the park’s home run factor.

Just like BABIP, there are some pitchers who are consistently below or above the league average each season with no clear explanation for their success or lack thereof. I will project as low as an 8% HR/FB rate and as high as a 12% mark, even though every year there will be pitchers who finish below and above those levels.
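The projection floors and ceilings described for pitcher BABIP (.270 to .320) and HR/FB (8% to 12%) can be applied mechanically with a clamp. A small sketch; the constant and function names are invented for illustration, and the bounds are the author's personal limits rather than hard rules.

```python
BABIP_BOUNDS = (0.270, 0.320)  # author's stated floor and ceiling for starting pitchers
HR_FB_BOUNDS = (0.08, 0.12)    # author's stated HR/FB projection range


def clamp(value: float, low: float, high: float) -> float:
    """Pull a raw estimate back inside the [low, high] projection range."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(high, value))


raw_babip = 0.255  # e.g. a multi-year average for a pitcher who has beaten the league
raw_hr_fb = 0.14   # e.g. last season's mark in a hitter-friendly park
print(clamp(raw_babip, *BABIP_BOUNDS), clamp(raw_hr_fb, *HR_FB_BOUNDS))
```

Values already inside the range pass through unchanged, so the clamp only bites on the extreme cases.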

Over the last five years, starting pitchers have typically allowed HR/FB ratios between 10% and 11%. However, last season that rate spiked to nearly 12%. Whether it was a one-year blip or the start of consistently higher rates, we’ll have to see. Relievers also saw their HR/FB rate jump last season, but not to the same degree. As with BABIP, they have posted marks lower than starters, coming in at around 9% to 10%.

ERA: You should not project ERA itself, but rather calculate a projected mark using all the other peripherals you developed a projection for. It would be impossible to make the adjustments to balance expected changes in the various skills and luck metrics in your head. The best method is to pick an expected ERA formula you like, such as SIERA, and input your projected components to produce your ERA projection. Some of the formulas may need to be modified so that you can use your projected BABIP and HR/FB ratios since they sometimes assume league-average marks in both of those metrics. The whole point of projecting those to begin with is so that you don’t have to assume league average marks, so you want to make sure to make these modifications.

WHIP: This is an easy calculation to make, as you already have innings pitched and walk projections, so you’re just missing the hits allowed component. With BABIP and home runs allowed projections (derived from FB% and HR/FB rate), you will be able to come up with that missing data.
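The missing hits component can be backed out of the other projections. This is a sketch under loud simplifying assumptions: every out that is not a strikeout is treated as an out on a ball in play (double plays, caught stealing and the like are ignored), BABIP is taken over non-home-run batted balls, and fly ball rate is taken over all batted balls including home runs. Innings must be entered as a true decimal (180.667, not the baseball-style 180.2).

```python
def project_whip(innings: float, k_per_9: float, bb_per_9: float,
                 babip: float, fly_ball_rate: float, hr_per_fb: float) -> float:
    """WHIP = (hits + walks) / innings, with hits derived from BABIP and HR."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    strikeouts = k_per_9 * innings / 9.0
    walks = bb_per_9 * innings / 9.0

    # Assumption: all non-strikeout outs come on balls in play.
    outs_in_play = 3.0 * innings - strikeouts
    balls_in_play = outs_in_play / (1.0 - babip)  # non-HR batted balls
    hits_in_play = balls_in_play * babip

    # Home runs per batted ball (HR included in the batted ball count).
    hr_share = fly_ball_rate * hr_per_fb
    home_runs = balls_in_play * hr_share / (1.0 - hr_share)

    hits = hits_in_play + home_runs
    return (hits + walks) / innings


# Invented example: 180 IP, 9.0 K/9, 3.0 BB/9, .300 BABIP, 35% FB, 10% HR/FB.
print(round(project_whip(180, 9.0, 3.0, 0.300, 0.35, 0.10), 3))
```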

Wins: We all know how frustrating the wins category can be in our fantasy leagues. It is therefore no surprise that its year-to-year correlation is so low. That’s why I stopped trying to project them manually by looking at previous season win totals. Instead, I now use a modified Pythagorean formula, the same one that takes team runs scored and runs allowed and spits out a projected wins and losses total. In this version of the formula, the runs allowed figure is actually the pitcher’s, rather than the team’s.
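The author's exact modification is not spelled out, so the following is only a sketch of the general idea using the classic Pythagorean expectation, RS^2 / (RS^2 + RA^2), with the pitcher's runs allowed per nine innings in place of the team's. The exponent of 2, the use of projected decisions to convert a winning percentage into wins, and the sample inputs are all assumptions.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def project_wins(team_runs_per_game: float, pitcher_runs_allowed_per_9: float,
                 decisions: float, exponent: float = 2.0) -> float:
    """Projected wins = projected decisions x Pythagorean winning percentage.

    pitcher_runs_allowed_per_9 should include unearned runs if possible;
    plugging in ERA alone would understate runs allowed.
    """
    return decisions * pythagorean_win_pct(team_runs_per_game, pitcher_runs_allowed_per_9, exponent)


# Invented example: offense scores 4.5 per game, pitcher allows 3.8 per nine, 25 decisions.
print(round(project_wins(4.5, 3.8, 25), 1))
```

The appeal of this approach is that a pitcher's win projection moves automatically when either his run prevention or his team's offense is revised.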

Saves: Perhaps the one category with the least amount of magic involved. This is almost like picking a number out of a hat. There have been attempts to predict save totals in the past, but none have proven to be useful. The most important numbers to look at when projecting saves are innings pitched and ERA. The more innings a closer throws, the more opportunity he has to record a save. And of course, the fewer earned runs he allows, the greater chance he has of converting that save opportunity.

Aside from a cursory glance at those basic metrics, I also consider two other factors. First, a closer’s experience is important, but not for the reason you might think. I wholeheartedly do not believe in the “closer mentality.” In my opinion, if you can pitch successfully in relief, then you can close. But experience does come into play. A manager is more likely to remove an inexperienced closer from his role after a rough patch, whereas he would be more patient with a veteran. The second factor is how strong the bullpen is behind the closer. If a team employs a top set-up man behind its closer, there is added pressure for the closer to succeed since the team has another man waiting in the wings who could do the job just as well.

John says:

February 4, 2013 at 6:00 pm

“These are not the names of my pet hamsters.” Brilliant.

La Peste says:

February 20, 2014 at 5:50 pm

Hi, I also sent you an email but maybe someone can help me here. I purchased Projecting X as well as your Pod Projections. I am trying to set up the projections outlined in Projecting X. In cells R2, S2, and T2 I am getting the error message “#DIV/0!”. I believe I have entered the formulas exactly as outlined. This becomes a problem when I try to increase the decimal places starting in column R. I am brand new to Excel, so I am not sure if this is my error or if there is something I am overlooking.


2 Responses to “Projecting X: How to Project Players”