Defense Independent Pitching Statistics (DIPS) - QMIND-Team/Sabermetrics GitHub Wiki
Defense-independent pitching statistics (DIPS) is a metric conceived in 1999 by Voros McCracken, a paralegal who played in a rotisserie baseball league. The idea grew out of a question: “How much control, if any, does a pitcher have over whether a batted ball in play falls in for a hit?” Originally, the purpose of pitching was to serve the ball across the plate so that the batter could hit it into play. Although the change was slow to catch on at first, pitchers soon began trying to deceive the batter by varying the speed of the ball and the path it took across the plate in order to get outs.

By 1912, wins, losses and earned run average (ERA) were the main measures of a pitcher's skill, and it was noticed how strongly these statistics were tied to a team's win/loss outcomes. Throughout the mid-1900s many statistics and rankings were developed around pitchers' stats and their effect on the overall outcome of the game; however, they all ran into problems when a pitcher was evaluated solely on win/loss statistics. The game splits 50/50 between scoring runs and preventing the other team from scoring, and what these models failed to consider was how much effect the pitcher truly has on run prevention. The problem with ranking a pitcher solely on ERA, wins and losses is that ERA is not a mirror image of how the pitcher performed on the mound; it also reflects the strength of the fielders. McCracken wanted a metric that would reflect only the pitcher's effect on the team's defensive play, excluding the fielders' contribution to preventing runs.
To do so, he divided pitching statistics into two categories: events that the fielders contribute to (singles, doubles, triples, ground balls, etc.) and events that depend solely on the pitcher (walks, strikeouts, home runs, etc.). Examining the data on the pitcher's sole contribution to defense, he found a year-to-year correlation in strikeout rates and walk rates, a slight correlation in home run rates, and a low correlation in batting average on balls in play (BABIP). McCracken's original DIPS calculation rested on a few assumptions: that the game is played in an average park, with average defense and average luck on balls in play.

Two formulas were devised from this: Balls kept out of play = (home runs + walks + hit by pitch + strikeouts) / total batters faced; Hits per ball in play = (hits - home runs) / (outs + hits - strikeouts - home runs). It was then found that the pitcher had control over balls kept out of play but did not have control over hits per ball in play.

Later work by various other authors led to these conclusions about pitching: major league pitchers have a better BABIP than minor league pitchers; the ability to induce infield pop-ups is a repeatable skill for major league pitchers; pitchers have a high level of control over whether the ball is hit in the air or along the ground; pitchers have less control over hits than over walks, home runs and strikeouts; and run average (RA = runs / innings pitched * 9) has a much greater year-to-year correlation than ERA does. Using regression analysis, these authors also developed a breakdown of what determines the outcome of a batted ball: luck, 44%; pitcher, 28%; fielding, 17%; park, 11%.
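The two rates and the run average described above can be sketched in a few lines of Python. This is only an illustration of the formulas as written in the text; the function and parameter names are hypothetical, and the sample season line in the usage note is made up rather than taken from any real pitcher.

```python
def balls_kept_out_of_play(hr, bb, hbp, so, bfp):
    """Share of batters faced that end in a home run, walk, hit by pitch or strikeout."""
    return (hr + bb + hbp + so) / bfp


def hits_per_ball_in_play(h, hr, so, outs):
    """Hits that stayed in the park, divided by balls put in play.

    Balls in play are counted as in the text: outs + hits - strikeouts - home runs.
    """
    return (h - hr) / (outs + h - so - hr)


def run_average(runs, innings_pitched):
    """Runs allowed per nine innings (earned and unearned)."""
    return runs / innings_pitched * 9
```

For a made-up season of 800 batters faced, 170 hits, 20 home runs, 50 walks, 5 hit batsmen, 180 strikeouts and 600 outs, `balls_kept_out_of_play(20, 50, 5, 180, 800)` gives the share of plate appearances the fielders never touched, and `hits_per_ball_in_play(170, 20, 180, 600)` gives the rate that the text describes as largely outside the pitcher's control.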
Both DIPS and the related fielding independent pitching (FIP) statistic do not rely on batted-ball data such as ground balls, fly balls and line drives, which typically fall under a team's fielding stats. This keeps the statistics focused on the pitcher rather than on the defense behind him. The stats used in the calculation of DIPS are batters faced, home runs, walks, intentional walks, strikeouts and hit batsmen.

The primary equation McCracken devised for DIPS is: (1B * 0.50) + (2B * 0.72) + (3B * 1.04) + (HR * 1.44) + ((TBB + HBP) * 0.33) - ((BFP - H - TBB - HBP) * 0.098). The result is multiplied by the share of all runs that are earned (0.9297) and then by 9 to express it as an average over nine innings.

McCracken later improved the formula by simplifying it slightly and accounting for small differences between specific groups of pitchers, such as left-handers and knuckleballers. On average, knuckleball pitchers have a BABIP 0.01 lower than right-handed pitchers, and left-handed pitchers have a BABIP 0.002 lower than right-handed pitchers. These differences were accounted for because handedness does have a measurable effect on baseball statistics. The second derived equation was: dER = (dH - dHR) * 0.49674 + dHR * 1.294375 + (dBB - dIBB) * 0.3325 + dIBB * 0.0864336 + dSO * (-0.084691) + dHP * 0.3077 + (BFP - dHP - dBB - dSO - dH) * (-0.082927), where dH is hits allowed, dHR is home runs, dBB is total walks allowed, dIBB is intentional walks allowed, dSO is strikeouts, dHP is hit batsmen, and BFP is batters faced by the pitcher. Following the success of DIPS, other statistics and formulae on the same topic were released.
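As a rough sketch, the two equations above translate directly into Python. The coefficients are copied from the text; the function names and arguments are hypothetical. The text says only that the first result is multiplied by 9, so dividing by innings pitched to turn the earned-run total into a nine-inning rate is an assumption made here and is flagged in the code.

```python
EARNED_RUN_SHARE = 0.9297  # share of all runs that are earned, as given in the text


def dips_runs(singles, doubles, triples, hr, tbb, hbp, bfp):
    """Linear-weights run estimate from the primary DIPS equation."""
    hits = singles + doubles + triples + hr
    return (singles * 0.50 + doubles * 0.72 + triples * 1.04 + hr * 1.44
            + (tbb + hbp) * 0.33
            - (bfp - hits - tbb - hbp) * 0.098)


def dips_era(singles, doubles, triples, hr, tbb, hbp, bfp, ip):
    """Earned runs per nine innings.

    ASSUMPTION: the earned-run estimate is divided by innings pitched (ip)
    before multiplying by 9; the text mentions only the multiplication by 9.
    """
    earned = dips_runs(singles, doubles, triples, hr, tbb, hbp, bfp) * EARNED_RUN_SHARE
    return earned / ip * 9


def dips_earned_runs(d_h, d_hr, d_bb, d_ibb, d_so, d_hp, bfp):
    """Second derived equation (dER), term by term as written in the text."""
    return ((d_h - d_hr) * 0.49674
            + d_hr * 1.294375
            + (d_bb - d_ibb) * 0.3325
            + d_ibb * 0.0864336
            + d_so * (-0.084691)
            + d_hp * 0.3077
            + (bfp - d_hp - d_bb - d_so - d_h) * (-0.082927))
```

Calling `dips_earned_runs(170, 20, 50, 3, 180, 5, 800)` on a made-up stat line returns an earned-run total, not a rate. Any conversion to a per-nine-innings figure would again need innings pitched, which the dER equation itself does not include.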
These include FIP (fielding independent pitching) and its variant xFIP (expected fielding independent pitching). FIP is currently the most widely used defense-independent statistic, and its most common formula is C + (13 * HR + 3 * BB - 2 * SO) / IP, where C is a constant that puts the result on the scale of league-average ERA (typically near 3.20), HR is home runs, BB is walks allowed, SO is strikeouts and IP is innings pitched. xFIP is almost identical to FIP, except that the home run term is replaced by an expected total based on fly balls (FB) allowed.

Another is LIPS (luck independent pitching statistics), which takes a similar approach to DIPS but excludes factors such as luck, defence and park, all estimated through the regression analysis seen earlier. The goal of this sabermetric is to provide a model that depends entirely on the pitcher and leaves out outside factors. From year to year, LIPS was found to correlate better with ERA than FIP does.
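A minimal sketch of the FIP formula above, with hypothetical function names. The default constant of 3.20 is the approximate value quoted in the text. For xFIP, the text says only that fly balls stand in for home runs; converting fly balls to expected home runs with a league home-run-per-fly-ball rate is an assumption here, so that rate is a required argument with no default.

```python
def fip(hr, bb, so, ip, constant=3.20):
    """FIP as given in the text: C + (13*HR + 3*BB - 2*SO) / IP."""
    return constant + (13 * hr + 3 * bb - 2 * so) / ip


def xfip(fb, bb, so, ip, league_hr_per_fb, constant=3.20):
    """xFIP sketch: same formula with home runs replaced by expected home runs.

    ASSUMPTION: expected home runs = fly balls * league HR-per-fly-ball rate.
    The caller must supply league_hr_per_fb; no value is given in the text.
    """
    expected_hr = fb * league_hr_per_fb
    return fip(expected_hr, bb, so, ip, constant)
```

Here `ip` is treated as a plain number of innings, so a box-score entry such as 200.1 (meaning 200 and one third innings) would need converting before being passed in.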