This document summarizes research analyzing how MLB player salaries are determined and how they have changed over time. It finds that while sabermetric stats like on-base percentage are now considered, traditional stats like batting average and RBI are still overvalued. Comparing models from 1985-2003 to 2004-2013 shows sabermetrics have increased in influence since Moneyball popularized them, but traditional stats still carry substantial weight. The results suggest front offices are competent evaluators but could better value power, runs, and walks over batting average and RBI.
2. Background
• How should we evaluate players?
• Triple Crown statistics (batting average, home runs, RBI): crude and flawed
• Bill James (pictured): father of sabermetrics
• Sabermetrics is the empirical analysis of baseball in pursuit of objective knowledge about it
• Sabermetric research remained a niche aspect of the game until the early 2000s
http://www.foxsports.com/mlb/story/Bill-James-the-man-who-changed-how-we-analyze-baseball-092111
3. Moneyball
• Published in 2003, focused on strategies of Oakland A’s GM Billy Beane
• Hugely successful: #1 New York Times bestseller and Oscar-nominated movie
• Popularized sabermetrics (OBP, walks) and exposed flaws in baseball groupthink
• Changed the way MLB front offices evaluate players
http://en.wikipedia.org/wiki/Moneyball
4. Current Market Conditions / Run-Scoring Environment
• Scarcity of offense: why?
• Limited availability of free agents
• Sport flooded with money from recent TV deals
http://www.hardballtimes.com/wp-content/uploads/2014/08/Judge-1.jpg
5. Data
• Used baseball statistics from the Lahman baseball database
• Spans the 1985 season through the 2013 season
• Mix of traditional stats (R, HR, RBI, Avg) and sabermetrics (wOBA: weighted on-base average, SecA: secondary average, ISO: isolated power, RC: runs created) to measure different batting skills
• Combination of career statistics and most recent season stats
• Looking only at arbitration-eligible and free-agent-eligible players
http://throughthefencebaseball.com/the-hit-list-most-useless-baseball-statistics-ever/38429
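The rate statistics listed on this slide can be derived from the counting stats in a batting table. The sketch below uses the standard textbook definitions of Avg, OPS, ISO, SecA, and basic runs created; it is an assumption that the study computed them the same way. The function name `batting_metrics` and its argument names are hypothetical, and wOBA is left out because its weights vary by season.

```python
def batting_metrics(ab, h, doubles, triples, hr, bb, sb=0, cs=0, hbp=0, sf=0):
    """Rate stats from one line of counting stats (standard definitions, assumed)."""
    singles = h - doubles - triples - hr
    tb = singles + 2 * doubles + 3 * triples + 4 * hr  # total bases
    avg = h / ab                                        # batting average
    slg = tb / ab                                       # slugging percentage
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)         # on-base percentage
    return {
        "Avg": avg,
        "OPS": obp + slg,
        "ISO": slg - avg,                               # extra bases per at-bat
        "SecA": (bb + (tb - h) + (sb - cs)) / ab,       # secondary average
        "RC": (h + bb) * tb / (ab + bb),                # basic runs created
    }


# Made-up season line, for illustration only.
line = batting_metrics(ab=500, h=150, doubles=30, triples=5, hr=20, bb=60, sb=10, cs=5)
for name, value in line.items():
    print(f"{name}: {value:.3f}")
```

Separating ISO and SecA from Avg is what lets the regression distinguish power and walks from contact hitting.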
6. Process
• Basic model follows semi-log function
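• $\ln(\text{salary}) = \beta_0 + \beta_1\,\text{GSavg} + \beta_2\,\text{R} + \beta_3\,\text{HR} + \beta_4\,\text{RBI} + \beta_5\,\text{RC} + \beta_6\,\text{Avg} + \beta_7\,\text{OPS} + \beta_8\,\text{wOBA} + \beta_9\,\text{SecA} + \beta_{10}\,\text{BBKratio} + \beta_{11}\,\text{ISO} + \beta_{12}\,\text{YearsinMLB} + \beta_{13}\,\text{YearsinMLBsq} + e$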
• Position dummy variables: C, IF, OF, DH
• Separate regressions for pre-Moneyball and post-Moneyball eras
• Panel data are both time-series and cross-sectional; used fixed effects, with standard errors corrected for heteroskedasticity and autocorrelation
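To make the semi-log form concrete, here is a minimal one-regressor sketch: ordinary least squares of ln(salary) on a single variable, written out by hand. It is only a toy under stated assumptions. The full model above has thirteen regressors, position dummies, and fixed effects, none of which are reproduced here, and the data are synthetic. The function name `fit_semilog` is hypothetical.

```python
import math


def fit_semilog(x, salary):
    """OLS of ln(salary) on one regressor x; returns (intercept, slope)."""
    y = [math.log(s) for s in salary]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Synthetic data (not from the study): salary grows 2% (in logs) per unit of x
# from a base of $500,000, so the fit should recover a slope of 0.02.
x = [0, 10, 20, 30, 40]
salary = [500_000 * math.exp(0.02 * xi) for xi in x]
b0, b1 = fit_semilog(x, salary)
print(f"intercept={b0:.4f}, slope={b1:.4f}")
```

Because the dependent variable is in logs, each slope reads as an approximate proportional change in salary per unit of the regressor rather than a dollar amount.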
7. Aggregate Model Results
| Variable | Career coefficient | Career t-stat | Most recent season coefficient | Most recent season t-stat |
|---|---|---|---|---|
| constant | 5.654496 | 12.72* | 10.95082 | 95.59* |
| GSavg | .0214179 | 11.70* | .0038353 | 7.00* |
| R | .0012259 | -1.96* | .0038035 | 3.48* |
| HR | -.0029078 | -2.04* | .0017204 | 0.52 |
| RBI | .0009285 | 1.58 | .0036076 | 4.04* |
| RC | .0006847 | 0.79 | -.002643 | -1.81* |
| Avg | 10.52837 | 1.75* | -2.949597 | -1.97* |
| OPS | 8.260738 | 1.02 | 3.944636 | 2.11* |
| wOBA | -12.11063 | -0.82 | -4.171996 | -1.34 |
| SecA | .000551 | 1.47 | -.0001244 | -0.11 |
| BBKratio | .4054353 | 1.43 | .0307032 | 0.74 |
| ISO | 2.483054 | 0.59 | -2.59103 | -3.07* |
| YearsinMLB | .2306129 | 12.94* | .4345581 | 25.12* |
| YearsinMLBsq | -.0125622 | -14.48* | -.016977 | -17.63* |
| observations | 7,782 (1,475 clusters) | | 7,438 (1,453 clusters) | |

Regression estimates of ln(salary), aggregated 1985-2013
An asterisk (*) marks significance at the 10% level
• Most of the skills represented by these stats are significant determinants of salary, or nearly so
• wOBA, ISO, runs, and home runs potentially undervalued
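Semi-log coefficients are easier to read once converted to percentage effects, and the two experience terms together imply a peak in the salary profile. The sketch below does both using Career-column values copied from the aggregate table; the helper names `pct_effect` and `turning_point` are hypothetical, and the conversion is the generic semi-log interpretation rather than a calculation reported in the study.

```python
import math

# Career-column coefficients copied from the aggregate 1985-2013 table.
b_gsavg = 0.0214179
b_years = 0.2306129
b_years_sq = -0.0125622


def pct_effect(coef, delta=1.0):
    """Percent change in salary for a `delta` change in a regressor (semi-log model)."""
    return (math.exp(coef * delta) - 1) * 100


def turning_point(b_linear, b_quadratic):
    """x at which b_linear*x + b_quadratic*x**2 peaks (requires b_quadratic < 0)."""
    return -b_linear / (2 * b_quadratic)


gsavg_pct = pct_effect(b_gsavg)
peak_years = turning_point(b_years, b_years_sq)
print(f"One more unit of GSavg: {gsavg_pct:.2f}% higher salary")
print(f"Experience profile peaks at {peak_years:.1f} years")
```

Under these estimates, one extra unit of GSavg goes with roughly a 2.2% higher salary, and the experience profile peaks a little past nine years in MLB, holding the other regressors fixed.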
8. 1985-2003 Results
| Variable | Career coefficient | Career t-stat | Most recent season coefficient | Most recent season t-stat |
|---|---|---|---|---|
| constant | 4.812548 | 10.16* | 10.69811 | 53.51* |
| GSavg | .0214313 | 12.95* | .0037818 | 7.01* |
| R | -.0010166 | -1.77* | .0042812 | 3.98* |
| HR | .0003011 | 0.23 | .0015807 | 0.48 |
| RBI | .0001063 | 0.21 | .0034555 | 3.96* |
| RC | .0018103 | 2.23* | -.0031446 | -2.17* |
| Avg | 12.99415 | 2.40* | -2.803983 | -1.85* |
| OPS | 1.197967 | 0.16 | 3.534443 | 1.77* |
| wOBA | 2.956756 | 0.21 | -3.194013 | -0.95 |
| SecA | -.0001522 | -0.44 | .0002442 | 0.21 |
| BBKratio | .1613303 | 0.61 | .0127604 | 0.30 |
| ISO | 5.84307 | 1.53 | -2.40542 | -2.81* |
| YearsinMLB | .2368564 | 12.21* | .443404 | 22.13* |
| YearsinMLBsq | -.0125806 | -14.58* | -.0168725 | -17.44* |
| observations | 7,782 (1,475 clusters) | | 7,438 (1,453 clusters) | |

Regression estimates of ln(salary), aggregated 1985-2003
An asterisk (*) marks significance at the 10% level
• Heavy emphasis on batting average and power (ISO and RBI)
• Not much attention paid to walks or SecA
9. 2004-2013 Results
| Variable | Career coefficient | Career t-stat | Most recent season coefficient | Most recent season t-stat |
|---|---|---|---|---|
| constant | 5.99605 | 13.71* | 10.95749 | 95.14* |
| GSavg | .0220074 | 12.38* | .0039531 | 7.22* |
| R | -.0012323 | -2.04* | .0036692 | 3.37* |
| HR | -.0017564 | -1.25 | .0013545 | 0.41 |
| RBI | .0006166 | 1.09 | .0035256 | 3.95* |
| RC | .0011461 | 1.38 | -.0027398 | -1.88* |
| Avg | 11.95485 | 1.96* | -2.657011 | -1.77* |
| OPS | 7.819687 | 0.94 | 3.855477 | 2.03* |
| wOBA | -14.04781 | -0.93 | -4.358033 | -1.37 |
| SecA | .0003033 | 0.84 | -.0000462 | -0.04 |
| BBKratio | .4432352 | 1.59 | .0415112 | 1.00 |
| ISO | 3.407737 | 0.83 | -2.420432 | -2.86* |
| YearsinMLB | .2458825 | 14.12* | .444569 | 25.40* |
| YearsinMLBsq | -.0125346 | -14.61* | -.0169879 | -17.64* |
| observations | 7,782 (1,475 clusters) | | 7,438 (1,453 clusters) | |

Regression estimates of ln(salary), aggregated 2004-2013
An asterisk (*) marks significance at the 10% level
• Note increased emphasis on OPS, SecA, and BBKratio
• Still a lot of weight given to batting average and RBI
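The era comparison can be laid out side by side. The sketch below takes Career-column coefficients and t-stats copied from the 1985-2003 and 2004-2013 tables, reports the change in each coefficient, and flags significance at the 10% level. The critical value assumes a large-sample normal approximation (about 1.645, two-sided), which is an assumption about how the asterisks were assigned; the function name `compare` is hypothetical.

```python
from statistics import NormalDist

# (coefficient, t-stat) pairs from the Career columns of the two era tables.
pre = {"Avg": (12.99415, 2.40), "RBI": (0.0001063, 0.21), "OPS": (1.197967, 0.16),
       "SecA": (-0.0001522, -0.44), "BBKratio": (0.1613303, 0.61)}
post = {"Avg": (11.95485, 1.96), "RBI": (0.0006166, 1.09), "OPS": (7.819687, 0.94),
        "SecA": (0.0003033, 0.84), "BBKratio": (0.4432352, 1.59)}

# Two-sided 10% critical value under a normal approximation (assumed).
CRIT = NormalDist().inv_cdf(0.95)


def compare(pre, post):
    """Coefficient change and 10%-level significance flags for each variable."""
    rows = {}
    for name in pre:
        (coef_pre, t_pre), (coef_post, t_post) = pre[name], post[name]
        rows[name] = {
            "change": coef_post - coef_pre,
            "sig_pre": abs(t_pre) > CRIT,
            "sig_post": abs(t_post) > CRIT,
        }
    return rows


summary = compare(pre, post)
for name, row in summary.items():
    print(f"{name:9s} change={row['change']:+.4f} "
          f"sig_pre={row['sig_pre']} sig_post={row['sig_post']}")
```

In the Career columns, the OPS, SecA, and BBKratio coefficients all rise between eras, though none reaches the 10% threshold, while Avg is significant in both.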
10. Conclusion
• The market has corrected somewhat since Moneyball, but traditional statistics remain prominent
• Based on the aggregate model, most of the statistics appear properly valued, suggesting front offices are competent
• Not surprisingly, batting average and RBI are overvalued
• Power (home runs and ISO) and runs are surprisingly undervalued
• Teams could also exploit wOBA