Statcast Primer: What Do I Need To Know?
Hi. My name is Jim. I write about launch angles, exit velocities, expected on-base averages, barrels and all manner of things that try to answer the question “why?” Why do baseball stats like batting average, WHIP, home runs, earned run average, isolated slugging percentage and on-base percentage become what they become for various major league players? While I’ll often focus on hitters because their sample sizes are larger and easier to work with, I also do what I can to apply my knowledge to pitchers. A large amount of this knowledge comes from what we can mine out of Statcast™. So that poses the following questions: what is Statcast data specifically, and why should you care?
Statcast is an information-gathering system introduced by MLB Advanced Media that tracks a whole ton of information on everything that happens in a ball game. For every pitched ball and subsequently every ball put in play, a large amount of data is recorded about those events. That being said, there are a few basic building blocks when constructing a profile of a hitter or pitcher using data collected by Statcast. To get a feel for what I’m talking about, let’s peek behind the data curtain quickly.
“Hey nerd, do you even watch baseball?”
Yes, a whole bunch! But watching baseball alone doesn’t allow me to form hot takes like: “Xander Bogaerts has not changed!” For that, I need this data. Each pitch and subsequent event/outcome has more data than you’ll need to know about, so we’ll stick to some high-level data points you can use to gain a deeper understanding of the players’ stats you collect on your fantasy baseball teams.
For pitchers, I’m interested in the following data points (and I’ll give examples):
- pitch_type – FF, SI, SL, CU, CH, KC, FC, etc.
- pitch_name – 4-seam Fastball, Sinker, Slider, Change-up, Knuckle-Curveball, Cutter, etc.
- release_speed – 94.632 mph
- release_spin_rate – 2108.0 rpm
- description (outcome of the pitch) – swinging_strike, hit_into_play, called_strike, ball, hit_into_play_no_out
- zone – 5, 12, 9
For hitters, I’m interested in:
- bb_type – line_drive, popup, ground_ball, etc.
- events – walk, home_run, single, double, strikeout, etc.
- hit_distance_sc – 381 ft
- launch_angle – 34.96°
- launch_speed (exit velocity) – 102.6 mph
- woba_value – 0.9, 1.25, 2.10, etc.
- estimated_woba_using_speedangle – 1.523, 0.345, 1.983, etc.
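To make those fields a little more concrete, here is a minimal sketch of reading pitch-level rows in Python. The sample rows are made up for illustration, the helper names (`to_float`, `load_pitches`) are mine, and I'm assuming the export is a CSV where the plate appearance result lives in a column called `events` and batted-ball fields are blank on pitches that weren't put in play.

```python
import csv
import io

# Tiny made-up sample in the shape of a Statcast CSV export (not real data).
SAMPLE_CSV = """pitch_type,release_speed,release_spin_rate,description,events,bb_type,launch_angle,launch_speed
FF,94.6,2108,called_strike,,,,
SL,86.1,2450,swinging_strike,strikeout,,,
FF,95.2,2210,hit_into_play,home_run,fly_ball,28.0,104.3
CH,84.0,1720,hit_into_play,field_out,ground_ball,-4.0,88.9
"""

NUMERIC_FIELDS = ("release_speed", "release_spin_rate", "launch_angle", "launch_speed")


def to_float(text):
    """Return a float, or None when the field is blank."""
    return float(text) if text.strip() else None


def load_pitches(csv_text):
    """Parse CSV text into one dict per pitch, converting numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in NUMERIC_FIELDS:
            row[field] = to_float(row[field])
        rows.append(row)
    return rows


pitches = load_pitches(SAMPLE_CSV)
# Only pitches that were put in play carry a launch angle.
batted_balls = [p for p in pitches if p["launch_angle"] is not None]
print(len(pitches), "pitches,", len(batted_balls), "batted balls")
```

The main thing to notice is that most pitches are not batted balls, so any hitter analysis starts by filtering down to the rows where the batted-ball fields are filled in.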
Perhaps without knowing it, you consume a lot of this data already. It feeds things like the spray charts you see on FanGraphs, the pitcher profiles you use on Brooks Baseball, and the new player profiles you can check out on Baseball Savant. You also use it when you quote things like LD%, Hard%, HR/FB, BABIP or xStats. What you may not have known is that you can use this data to gain a deeper understanding of a hitter’s batted ball profile and make more in-depth assessments of his talent, especially during April and May, or what we call “small sample size season”.
So what should you care about? Let’s start with some simple facts.
- Understanding BABIP requires understanding how frequently players hit the ball in the air, on the ground and everywhere in between. This is because each type of batted ball has a different probability of becoming a hit.
- Understanding HR/FB, SLG and ISO requires understanding how frequently players hit the ball at more detailed launch angles in the air. This is because each degree of launch angle has a different probability of becoming a home run or an extra-base hit.
- The combination of exit velocity (how hard the ball is hit into the field of play) and the launch angle at which the ball is hit helps predict how successful each batted ball will be.
- Spray angle (pull, straight/center, opposite field) has an impact on all these outcomes as well because balls that are pulled are typically hit harder and farther.
- Balls hit at launch angles between 24° and 32° have the highest probability of becoming home runs and extra-base hits.
- Balls hit in the air at or above 95 mph have the highest probability of becoming home runs and extra-base hits.
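The two thresholds in that list are easy to turn into simple filters. This is only a sketch on made-up batted balls: the function names are my own, and the hard-hit check looks at exit velocity alone, so it does not try to decide whether a ball was "in the air."

```python
# Made-up (launch_angle in degrees, launch_speed in mph) pairs, not real data.
batted_balls = [(28.0, 104.3), (-4.0, 88.9), (12.0, 97.5), (30.0, 91.0), (45.0, 99.0)]


def in_power_window(launch_angle):
    """True for the 24-32 degree band mentioned above."""
    return 24 <= launch_angle <= 32


def is_hard_hit(launch_speed):
    """True for balls hit at or above 95 mph."""
    return launch_speed >= 95


power_window = [bb for bb in batted_balls if in_power_window(bb[0])]
hard_hit = [bb for bb in batted_balls if is_hard_hit(bb[1])]
# Balls that clear both bars are the ones most likely to do damage.
both = [bb for bb in power_window if is_hard_hit(bb[1])]

total = len(batted_balls)
print(f"24-32 degrees: {len(power_window) / total:.0%}")
print(f"95+ mph:       {len(hard_hit) / total:.0%}")
print(f"both:          {len(both) / total:.0%}")
```

Looking at the overlap is the point: a ball at 30° and 91 mph lands in the right angle band but is missing the exit velocity that usually carries it out.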
All these things translate into new statistics you’ll hear bandied about. A player’s “launch angle” is simply the average of a player’s launch_angle for every ball put in play from the Statcast database. You’ll often hear this in relation to a player “raising his launch angle,” which for powerful players will give them more chances at extra-base hits, including home runs. A player’s “exit velocity” is a simple average of a player’s launch_speed. A pitcher’s “spin rate” on a pitch type is just the average of his release_spin_rate for every pitch thrown. Similarly, velocity associated with a pitcher or a specific pitch type is the average of his release_speed.
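Since all of these are plain averages, they take only a few lines to compute. The sketch below uses made-up numbers and assumes each row is a dict keyed by the Statcast field names; the variable names are mine.

```python
from collections import defaultdict
from statistics import mean

# Made-up batted balls for one hitter (not real data).
batted_balls = [
    {"launch_angle": 28.0, "launch_speed": 104.0},
    {"launch_angle": -4.0, "launch_speed": 88.0},
    {"launch_angle": 12.0, "launch_speed": 96.0},
]

# A hitter's "launch angle" and "exit velocity" are simple means.
avg_launch_angle = mean(bb["launch_angle"] for bb in batted_balls)
avg_exit_velocity = mean(bb["launch_speed"] for bb in batted_balls)

# Made-up pitches for one pitcher (not real data).
pitches = [
    {"pitch_type": "FF", "release_speed": 95.0, "release_spin_rate": 2200.0},
    {"pitch_type": "FF", "release_speed": 94.0, "release_spin_rate": 2100.0},
    {"pitch_type": "SL", "release_speed": 86.0, "release_spin_rate": 2450.0},
]

# Group by pitch type first, then average within each group.
by_type = defaultdict(list)
for pitch in pitches:
    by_type[pitch["pitch_type"]].append(pitch)

avg_spin = {t: mean(p["release_spin_rate"] for p in rows) for t, rows in by_type.items()}
avg_velo = {t: mean(p["release_speed"] for p in rows) for t, rows in by_type.items()}

print(avg_launch_angle, avg_exit_velocity)
print(avg_spin, avg_velo)
```

Keep in mind that an average hides the shape of the distribution, which is why two hitters with the same average launch angle can have very different results.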
Using the Statcast Leaderboard, you’ll be able to see some of these high-level metrics:
You’ll note that there are a couple of main stats reported here that we haven’t touched on. Statcast reports the number of balls hit at 95+ mph along with Barrels. Barrels are balls with high probabilities of being hits, extra-base hits, and home runs (depending on which launch angles they are hit at).
For my own purposes, I’ve created some dashboards that give me a quick look at how frequently players hit balls at various launch angles and exit velocities along with what level of wOBA is produced by those batted balls. While there are many ways you could decide to look at this type of data, I use a method called “binning.” Binning attempts to bundle up similar launch angles in order to look at the results of those batted balls together.
In this visualization, I simply count up the number of batted ball events (BBEs) in 8° chunks. The lower three “bins” represent mostly ground balls. The next two bins, red and grey, represent mostly line drives, followed by the black bin and the upper pink bin, which represent higher fly balls. The last bin, which is all batted balls over 40°, is mostly popups but can also contain balls hit at extreme launch angles to the outfield.
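If you want to try binning yourself, here is a rough sketch of counting batted balls in 8° chunks. The upper edges (16, 24, 32, 40) line up with the bins discussed in this article, but the lower edges and the labels are my assumption for illustration, and the launch angles are made up.

```python
from bisect import bisect_right
from collections import Counter

# Bin edges in degrees. Everything below -8 and everything at 40 or above
# is lumped into a catch-all bin at each end (lower edges are an assumption).
BIN_EDGES = [-8, 0, 8, 16, 24, 32, 40]
BIN_LABELS = [
    "below -8", "-8 to 0", "0 to 8",   # mostly ground balls
    "8 to 16", "16 to 24",             # mostly line drives
    "24 to 32", "32 to 40",            # higher fly balls
    "40 and up",                       # mostly popups
]


def launch_angle_bin(angle):
    """Return the label of the bin containing this launch angle.

    Each bin includes its lower edge and excludes its upper edge.
    """
    return BIN_LABELS[bisect_right(BIN_EDGES, angle)]


# Made-up launch angles for one hitter (not real data).
launch_angles = [-20.0, -3.0, 5.0, 12.0, 18.0, 28.0, 30.0, 35.0, 52.0]

bbe_counts = Counter(launch_angle_bin(a) for a in launch_angles)
for label in BIN_LABELS:
    print(f"{label:>10}: {bbe_counts[label]}")
```

Dividing each count by the total number of BBEs turns this into the kind of distribution you can compare from one season to the next.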
The thing to notice about this visualization is that the distribution of the “bins” is generally similar for a batter from year to year, but with slight variations. These variations are what give us slightly different or completely new outcomes for things like BABIP, AVG, OBP, SLG, ISO, HR/FB, LD% from year to year. The framework of this chart also allows me to see where a hitter has over or under-achieved on a type of batted ball, and will further let me dig deeper to see if he’s acquired a new skill or if it’s just due to luck.
Take, for instance, JD Martinez’s 2017 season. If we walk our eyes upward from the grey box to the black box to the pink box, we can see he homered on 18% of balls hit 16-24°, 53% of balls hit 24-32° and 31% of balls hit 32-40°. If I scan to the left of his 2017 season, I can see that he slightly over-achieved on every type of fly ball in 2017, but the over-performance in the pink 32-40° bin looks like the biggest outlier. Martinez achieved a .785 weighted on-base average on those balls in 2017 against a .350 average previously in his career. You can see that in the smaller sample size to begin the 2018 season, he’s back in that .350 range.
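The per-bin home run rates and wOBA figures come from grouping batted balls by bin and summarizing the outcomes. Here is a hedged sketch of that idea using made-up batted balls rather than Martinez's actual data. It assumes the plate appearance result is stored as `events` and that averaging `woba_value` over the balls in a bin is a fair stand-in for wOBA on those balls; `bin_start` is a shortcut of mine that does not merge the extreme angles into catch-all bins.

```python
from collections import defaultdict

# Made-up (launch_angle, events, woba_value) rows, not real data.
batted_balls = [
    (26.0, "home_run", 2.0),
    (29.0, "field_out", 0.0),
    (25.0, "home_run", 2.0),
    (31.0, "double", 1.25),
    (34.0, "field_out", 0.0),
    (38.0, "home_run", 2.0),
    (18.0, "single", 0.9),
    (20.0, "field_out", 0.0),
]


def bin_start(angle, width=8):
    """Lower edge of the 8-degree bin holding this angle (e.g. 26 -> 24)."""
    return int(angle // width) * width


# Collect the batted balls that fall into each bin.
by_bin = defaultdict(list)
for angle, event, woba in batted_balls:
    by_bin[bin_start(angle)].append((event, woba))

summary = {}
for start, rows in sorted(by_bin.items()):
    hr_rate = sum(1 for event, _ in rows if event == "home_run") / len(rows)
    avg_woba = sum(woba for _, woba in rows) / len(rows)
    summary[start] = (len(rows), hr_rate, avg_woba)
    print(f"{start}-{start + 8} deg: {len(rows)} BBE, HR {hr_rate:.0%}, wOBA {avg_woba:.3f}")
```

With only a handful of balls in a bin, one extra home run swings these numbers a lot, which is exactly why a single-season outlier in one bin deserves a second look.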
We can get a little bit fancier in our analysis and try comparing different players to see what makes them go, while at the same time adding in strikeouts (Blue) and walks (Green).
Here you can quickly train your eye to see that JD Martinez hits quite a few more ideal launch angle balls (Red through Black) than someone like Domingo Santana, while striking out at approximately the same rate. You can compare that to someone like Lorenzo Cain who strikes out in far fewer of his plate appearances while also hitting many more ground balls in the lower pink regions of the chart. Due to his speed and ability to hit the ball hard on the ground, Cain also has higher wOBAs than his slugging counterparts.
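Adding strikeouts and walks to the picture just means counting plate appearance results instead of only batted balls. A minimal sketch, assuming you already have one result per plate appearance (the list below is made up):

```python
from collections import Counter

# Made-up plate appearance results for one hitter (not real data).
pa_results = [
    "strikeout", "walk", "single", "field_out", "strikeout",
    "home_run", "field_out", "walk", "field_out", "strikeout",
]

counts = Counter(pa_results)
plate_appearances = len(pa_results)

# Share of plate appearances ending in a strikeout or a walk.
k_rate = counts["strikeout"] / plate_appearances
bb_rate = counts["walk"] / plate_appearances
# Whatever is left is mostly balls in play, which the launch angle bins split up.
other_rate = 1 - k_rate - bb_rate

print(f"K%: {k_rate:.0%}  BB%: {bb_rate:.0%}  everything else: {other_rate:.0%}")
```

Putting everything on a per-plate-appearance basis is what lets you compare a high-strikeout slugger and a contact hitter on the same chart.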
We’ll take a deeper dive into this data in Part II of the Statcast primer, but hopefully this is enough to whet your appetite for why you should consume this information and what it can begin to help you learn. As an exercise for the reader, go take a look at Christian Villanueva and tell me what you’ve learned!