Recharging the Batteries: Pitching and Catching Should Lead Youth Movements

Now that I’ve allowed myself one horrifying pun in the title of this piece, I should explain what could justify something so egregious. As someone familiar with the Phillies as an organization, I have found it interesting to watch them finally begin to rebuild this off-season.

In trading veterans like Jimmy Rollins, Marlon Byrd, and Antonio Bastardo, the team has added some depth to its minor league system – with a noticeable pattern. Rollins was traded for pitching prospects Zach Eflin and Tom Windle, Byrd was moved for Reds pitching prospect Ben Lively, and Bastardo was moved to the Pirates for lefty Joely Rodriguez.

The double-A rotation in Reading is currently overflowing with starting pitchers, and that’s not an accident. The team has had scarce minor league pitching depth in recent seasons, but has a glut of near MLB-ready starting pitchers heading into 2015.

Additionally, most rumors concerning a Cole Hamels trade involve a team with a major catching prospect. Obviously, with this opportunity to receive impact prospects, the Phillies would take the best players possible. However, catching is a position of notable weakness in their minor league system, and I’d argue that between two comparable prospects at different positions, the Phillies should favor the catcher. This is not specifically because of that weakness, but because it is a sustainable strategy. I’d similarly argue that, while the focus on pitching may be a coincidence due to what was available, it should be a concerted effort by the team.

With 25 roster spots to fill on a major league team, and nine different positions on the diamond, why would a team focus on two? Five reasons.

Value is value is value

The first point that should be addressed is that prospects are, in a sense, assets. They have a certain level of talent and can have a dollar value attached to them. As such, having a young pitcher worth $20 million and a shortstop worth $20 million should make the two relatively transferable in a trade. There are extenuating circumstances of supply and demand (AKA, desperation), but overall, equally valued players should be seen as such.

This is relevant here because below the major league level, there’s no requirement that every position at every minor league affiliate be filled to a certain threshold of quality. The Cubs likely have the best farm system in baseball, and they still have a lopsided ratio of position players to pitchers. If a team developed eight MLB-ready starting pitchers, its surplus talent could be traded for MLB-ready pieces from other teams that could fill holes. There’s no reason to demand a team acquire a wide variety of prospects in trade; it’s about total depth, not positional breadth.

Pitchers are risky

In an article claiming that teams should acquire lots of pitchers and catchers, a bullet-point titled “Pitchers Are Risky” is a bit counter-intuitive. Here’s the reasoning: pitchers are often the most expensive players on the free agent market. In some crazy instances, they are getting seven-year deals even though, as a group, they carry demonstrably more risk than position players.

Using the disabled list database from BaseballHeatMaps.com, pitchers have accounted for 61.09% of the total days spent on the DL in the last five years, although they only fill 48% of the roster spots. Tens of millions of dollars are wasted each season with pitchers on the DL – here’s an interesting interactive report from the New York Times at the end of the 2013 season. A disproportionately large number of the players on that list are, you guessed it, pitchers.

In this piece from 2011, Jeff Sullivan calculates that in a given year, 39.1% of all the prior year’s qualifying starters will end up on the disabled list, with trips averaging about 66.3 days per pitcher. This means that if you are a pitcher who qualifies for the ERA title, you have roughly a 39.1% chance of landing on the DL the next year.
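To put the figures quoted above side by side, here is a rough back-of-the-envelope sketch in Python. It is not part of Sullivan's analysis: the function names are my own, and it assumes the 66.3-day average applies to the starters who actually land on the DL.

```python
# Back-of-the-envelope arithmetic using the numbers quoted in the text.
# Assumption: 66.3 days is the average among starters who do hit the DL.


def expected_dl_days(p_injury: float, avg_days_if_injured: float) -> float:
    """Expected DL days per qualifying starter in the following season."""
    return p_injury * avg_days_if_injured


def overrepresentation(share_of_dl_days: float, share_of_roster: float) -> float:
    """How many times larger a group's share of DL days is than its roster share."""
    return share_of_dl_days / share_of_roster


if __name__ == "__main__":
    print(round(expected_dl_days(0.391, 66.3), 1))     # 25.9
    print(round(overrepresentation(0.6109, 0.48), 2))  # 1.27
```

Under that assumption, a qualifying starter projects to lose roughly 26 days to the DL the following year, and pitchers take up about 1.27 times their "fair share" of DL days.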

I would argue that, given all else is equal, there is a higher return on investment on a position player than on a pitcher of the same projected value. If a team can help it, drafting and acquiring young, cheap pitching leaves less of their money in the trainer’s room, making a more efficient team. That’s not to say that a team shouldn’t spend money – just spend it on free agent position players.

As for catchers, as prospects they have the longest developmental path to the majors, and therefore typically debut at an older age. They then also hit free agency for the first time older than most, and typically have a steeper defensive aging curve than others (due to the wear and tear of the job).

More than likely, a team is paying for older years of a catcher than it would with other first-time free agents. Teams are also beginning to commit to more years for the privilege – Russell Martin and Brian McCann each received 5-year deals at over $80 million apiece in the last two off-seasons, even though catchers are assumed to miss at least 30-40 games a year.

Luke Hochevar Syndrome

“Luke Hochevar Syndrome,” as I’m calling it, is named after the Kansas City Royal and 2006 first-overall draft pick. To summarize the idea, it is that pitchers have an inherently higher floor than other positions. There are twelve spots for pitchers on the typical major league roster, and short of the occasional LOOGY, any one pitcher could hypothetically pitch in any of those twelve spots.

I reference Hochevar in this case, because for him, falling short of his lofty draft position has still resulted in seven major league seasons. He didn’t have a ton of value as a starter, but transitioned to a different role as a reliever, and suddenly was a lights out relief ace, and just re-signed to a $10 million deal. And he’s what is technically considered a “disappointment,” because he didn’t become an ace.

Because there are twelve pitchers on a team, with different roles that can be filled by (roughly) the same player, the bar for value created is a little lower. If an expected all-star first baseman can’t hit, he won’t make it out of double-A. A projected ace starter who falters is still fairly likely to make it to the majors in some capacity.

This flexibility also exists to a smaller extent with catchers. Below is a chart detailing batting lines by position from 1969-2013, using Retrosheet data.

Pos BA OBP SLG OPS
P 0.144 0.181 0.183 0.364
C 0.252 0.318 0.384 0.702
1B 0.274 0.352 0.451 0.802
2B 0.266 0.329 0.377 0.707
3B 0.264 0.332 0.414 0.747
SS 0.258 0.315 0.363 0.678
LF 0.272 0.343 0.433 0.776
CF 0.269 0.334 0.410 0.745
RF 0.272 0.342 0.441 0.782

Aside from shortstops, catchers have had the lowest offensive bar to entry of the eight positions. Fielding and arm strength are (relatively speaking) easier to predict than hitting, from a scouting perspective. Even though most catchers fail to hit in the end, much of their value is generated from fielding the position, so again there is a higher “floor”, if the defensive scouting is done well.

Depth has heightened value

Depth has higher value for pitchers and catchers than any other positions, so having a lot of cheap, young depth is important. With the amount of time the average pitcher spends on the DL (mentioned above), most teams do not feel comfortable unless they have six or seven strong starting pitching options between triple-A and the majors.

Often, due to lack of depth, the options in triple-A are past-their-prime veterans waiting for another opportunity, a waiver wire acquisition, or similarly unappealing options. A valuable free agent could never be convinced to sign and then wait for months at a time in the minors. The way to have quality minor league depth is through prospects. Having a future 3/4 starter in triple-A to begin a season is a best-case option for a team in the event of injury. Having as many of those guys as possible would be a good thing.

Additionally, relief pitchers are sent back and forth to triple-A more frequently than any other position. They are the most inconsistent kind of ballplayer, and having extra arms to better a team’s options can help combat that inherent unpredictability.

Catchers, on the other hand, have the built-in expectation of missing games. In the last 10 years, there have only been 29 times that a catcher has started 130 games or more. Out of the 300 starting catcher roster spots that have existed during that time, that’s a pretty small amount (9.67%). In fact, no catcher has caught 150 games since Gary Carter in 1982.

Because of the number of games the starter will miss, having a good backup catcher is more important than filling other bench positions. Developing a surplus of young catching is a good way to capture value, because while a team might not keep a star on the bench, multiple league-average catchers can all be used throughout the year.

Perennially wide trade market

All of the same reasoning listed above dictates that teams without this pitching and catching depth will need to trade for it. Outside of the Nationals and the Dodgers, there isn’t a team in baseball that is wholly content with its five starting pitchers. There isn’t a single team in baseball with seven relief pitchers that it couldn’t replace. By contrast, even though the outfield offers three starting spots (and center field calls for a different kind of player than the corners), the Pirates, Dodgers, Padres, Nationals, Red Sox, and Brewers each already have three. An almost universal trade market exists for a good pitcher, and if that pitcher has five or six seasons of team control remaining at below-market cost? That player would certainly have most teams interested.

A young, team-controlled shortstop also has large value, but there’s only a need for one starting shortstop on a team. The Braves are not looking to upgrade over Andrelton Simmons; the Rockies would rather improve over almost any player but Troy Tulowitzki. On almost any given day, more GMs would call back about a young starting pitcher than a comparably young and talented position player.

This is before factoring in the July trade deadline boost to pitching. It has been postulated that teams are willing to spend more than double the off-season $/Win rate in July, and pitching is more in-demand than offense at the time, possibly because (this is my speculation) of the heightened impact relievers and starters have in individual postseason games. Without a free agent market for teams to turn to instead, contenders must spend that additional money per win on the trade market. Being the team with that pitching to trade is smart, if you are rebuilding.

Also, there’s been a recent trend towards valuing catching more and more. As mentioned above, top free agent catchers have begun to receive longer, more expensive deals. Additionally, sixteen catchers have changed teams through trade this off-season alone, including Derek Norris, Yasmani Grandal, Evan Gattis, John Jaso, Miguel Montero, and Hank Conger. These are not teams cashing out and rebuilding – teams like the A’s, Padres, Astros, and White Sox have taken from their surplus to add pieces and are building towards contention.


For a rebuilding team like the Phillies, a prospect acquisition strategy focusing on pitching and catching is a smart way to lower risk and create value. While the opportunity to get a clearly better player shouldn’t be forgone, if given the choice between players of equal quality, take the pitching and catching.

Pitchers and catchers of any given quality can be traded for other positions of need once a surplus has been developed. These positions have inherently high floors due to the necessity for redundancy, so there is less risk in picking a starting pitcher or defensive-minded catcher over another player. That league-wide need for redundancy also creates wider trade markets for these players and lets a team take advantage of July pitching needs, both of which increase their value. Once a team is competitive, having these positions filled cheaply with youth leaves less money potentially exposed to wasted production, and any risk that exists can be offset by greater depth.

The Phillies may insist that a pitching-heavy strategy is “just the way the cards fell,” but they should be happy they have fallen that way.


[Part 2] Ernie Banks’ Performance in Double Headers (Using MySQL and R)

Earlier today, I wrote about Ernie Banks’ performance in double headers, inspired by his famous quote “Let’s Play Two!”, and provided the MySQL queries and R code required to calculate these statistics in Retrosheet. Shortly afterward, I was asked a follow-up question on Twitter:

This is true – I counted all games that were part of a double header, regardless of whether Banks played in both games. Checking if Banks appeared in both games is slightly more complicated, but still manageable.

As explained in the prior post, Retrosheet game IDs include both the date and the number of the game on that day. If Ernie Banks batted in a game whose ID ends in ‘1’, and a sub-query confirms that he also batted in a game whose ID ends in ‘2’ on the same date, we’re golden. I use the ‘GAME_DT’ integer field in the ‘games’ table for simplicity.

SELECT DISTINCT g1.GAME_DT
FROM GAMES as g1, EVENTS as e1
WHERE e1.GAME_ID = g1.GAME_ID
 AND SUBSTRING(e1.GAME_ID,12,1) = 1
 AND e1.BAT_ID = 'banke101'
 AND g1.GAME_DT IN (
  SELECT DISTINCT g2.GAME_DT
  FROM GAMES as g2, EVENTS as e2
  WHERE e2.GAME_ID = g2.GAME_ID
   AND SUBSTRING(e2.GAME_ID,12,1) = 2
   AND e2.BAT_ID = 'banke101'
 );

The returned values here are a series of dates in which Ernie Banks played in both games of a double header. Plugging this in as a sub-query of the original post’s WHERE clause should provide the same counting stats used in the prior calculations, only you’ll notice the number of games now matches for each half of a double-header.
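For anyone who wants to sanity-check the sub-query logic outside the database, here is a minimal Python sketch of the same idea. It is not part of my workflow: the function name and the sample IDs (other than CHN195405221) are made up for illustration, and it assumes the input is the list of game IDs in which a single player batted.

```python
from collections import defaultdict


def dates_with_both_games(game_ids):
    """Return sorted dates (YYYYMMDD strings) on which both game 1 and game 2
    of a double header appear in game_ids."""
    games_by_date = defaultdict(set)
    for game_id in game_ids:
        game_date = game_id[3:11]   # characters 4-11: the date
        game_number = game_id[11]   # character 12: 0, 1, or 2
        games_by_date[game_date].add(game_number)
    return sorted(
        game_date
        for game_date, numbers in games_by_date.items()
        if {"1", "2"} <= numbers
    )


if __name__ == "__main__":
    sample_ids = [
        "CHN195405221",  # first game of a double header
        "CHN195405222",  # second game, same date
        "CHN195405230",  # single game
        "CHN195406061",  # first game only: the player sat out game 2
    ]
    print(dates_with_both_games(sample_ids))  # ['19540522']
```

Only the date on which both halves appear survives, which is the same filter the IN (...) sub-query applies.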

Applying the same R calculations as before, I received these tables as my final output:

Year-By-Year

Year Games PA AB H X2B X3B HR BB IBB HBP SO SF SH DI BA OBP SLG OPS
1 1953 2 8 7 1 0 0 0 1 0 0 1 0 0 0 0.143 0.250 0.143 0.393
2 1954 42 172 162 44 9 2 4 6 1 0 11 1 2 0 0.272 0.300 0.426 0.726
3 1955 34 140 130 46 7 2 12 7 1 0 13 2 0 0 0.354 0.386 0.715 1.101
4 1956 46 191 173 54 14 1 10 14 4 0 16 0 0 0 0.312 0.377 0.578 0.955
5 1957 56 232 204 54 7 0 17 18 5 3 30 2 0 0 0.265 0.345 0.549 0.894
6 1958 38 167 149 44 6 3 7 12 3 1 21 2 0 0 0.295 0.359 0.517 0.876
7 1959 28 123 111 21 3 0 5 6 5 1 14 0 0 0 0.189 0.268 0.351 0.620
8 1960 38 164 142 36 11 0 7 12 7 1 15 2 0 0 0.254 0.341 0.479 0.820
9 1961 36 153 136 39 6 1 6 7 7 1 25 0 0 2 0.287 0.358 0.478 0.836
10 1962 36 146 138 42 8 2 10 5 1 2 18 0 0 0 0.304 0.342 0.609 0.951
11 1963 26 99 88 21 4 0 4 5 2 0 14 3 1 0 0.239 0.286 0.420 0.706
12 1964 34 136 127 41 8 2 7 3 3 2 14 0 1 0 0.323 0.363 0.583 0.946
13 1965 40 162 139 36 5 2 9 15 4 1 12 3 0 0 0.259 0.346 0.518 0.864
14 1966 30 111 107 21 4 1 1 1 1 0 11 2 0 0 0.196 0.207 0.280 0.488
15 1967 38 149 138 40 8 0 9 6 2 0 24 1 2 0 0.290 0.327 0.543 0.870
16 1968 30 121 108 33 5 0 8 7 0 1 13 1 4 0 0.306 0.350 0.574 0.925
17 1969 28 111 97 30 2 1 3 10 1 1 17 1 1 0 0.309 0.382 0.443 0.825
18 1970 6 20 20 7 1 0 2 0 0 0 2 0 0 0 0.350 0.350 0.700 1.050

By Game

Game Games PA AB H X2B X3B HR BB IBB HBP SO SF SH DI BA OBP SLG OPS
1 1 294 1226 1107 321 60 7 60 70 26 7 130 11 5 0 0.290 0.347 0.519 0.867
2 2 294 1179 1069 289 48 10 61 65 21 7 141 9 6 2 0.270 0.326 0.505 0.831

Total Stats

Games PA AB H X2B X3B HR BB IBB HBP SO SF SH DI BA OBP SLG OPS
1 588 2405 2176 610 108 17 121 135 47 14 271 20 11 2 0.280 0.337 0.512 0.849

Finding the Houston Astros/Milwaukee Brewers Game From Boyhood

Who are the Astros playing today? The Milwaukee Brewers. Get to know them, get to hate them.

Richard Linklater’s Boyhood is arguably the front-runner for this year’s Academy Award for Best Picture. In the film, the same group of actors is filmed each year for 12 years to tell the story of a Houston-area family’s path through life.

Midway through the film, the divorced father of the two children (Ethan Hawke) takes his children to a Houston Astros game, where they are playing the despised Milwaukee Brewers. Roger Clemens is the starting pitcher, he wins, and the team magically walks off on a Jason Lane home run.

The home run is seen in the above trailer at 0:46, and the filmmakers were clearly actually at this game. So, I searched through game logs and found that Linklater took some dramatic license with the events of the actual game.

Clemens was a starting pitcher with the Astros for three seasons, from 2004 to 2006, late in his career. In the film, it is mentioned that he is 42 years old, and on that day has a 1.47 ERA. Clemens was age 42 for the 2005 season, and also did have a league-leading (and career-best) 1.87 ERA that season. He was third in Cy Young voting that season, and was worth an astounding 7.8 rWAR over 211.1 IP.

The Astros became the National League Champions that season, losing to the Chicago White Sox in the World Series.

But, did he ever face the Brewers in Houston that year? Yes. He faced Milwaukee twice that season, on August 18 and September 9. Only August 18 was a home game.

Jason Lane also did hit a home run in that game – so this was clearly the game attended by the filmmakers. However, although the teams, starting pitcher, and home run were all accurate, most other details were changed to benefit the story.

First, Lane didn’t hit a walk-off home run. He hit a solo shot in the bottom of the 2nd, temporarily putting the Astros ahead 1-0. I say temporarily, because it turns out that Houston didn’t even win this game.

Clemens gave up four runs in the 6th and 7th innings, and was replaced by Chad Qualls. Additionally, the detail about Clemens’ ERA entering the day wasn’t totally accurate either. Game logs report that he did have a 1.47 ERA after his July 17 start (roughly a month earlier), but he actually had an even lower 1.32 ERA entering the filmed start.

All this being said, it’s obviously not a criticism of the film that they took dramatic license with a baseball game – it’s a phenomenal film that I will be rooting for at the Oscars.

I have been unable to find the broadcast footage of Jason Lane’s 2nd-inning home run, but I would be very curious to see it if anyone has it.


Ernie Banks’ Performance In Double Headers (Using MySQL and R)

Chicago Cubs’ legend Ernie Banks passed away this week at age 83, leaving behind a legacy of 12 All-Star appearances, two NL MVP awards, and 512 home runs over a 19-year Hall of Fame career. He also earned 67.5 career rWAR, and is considered one of the greatest shortstops in history. However, the most memorable moment in his career is arguably a single quote, before an otherwise irrelevant game:

It’s a great day for a ballgame; let’s play two!

In the days since his passing, “Let’s play two!” has been the popular refrain for anyone referencing his career. Hearing it said so many times, I had an idea – how well did Banks perform when he DID play two?

Unfortunately, nothing in this world is original, and at least one blogger, William Juliano of the Yankees-centric The Captain’s Blog, got to it back in 2011. In an effort to find my own spin on this idea, I decided to work these numbers out myself using MySQL, R, and the Retrosheet database, and to explain how others can do the same.

I’m not an expert in the area, despite having some past database experience, and I consider this as much a learning experience for myself as it is for anyone reading, so anyone with comments or suggestions, please let me know.

Prerequisites

Someone attempting these queries and calculations should have some prior experience with SQL and relational databases, and some understanding of basic statistics. I use an installation of MySQL and the R statistics package, and I find programs like Sequel Pro and RStudio to be incredibly helpful interfaces for these languages.

As far as the data itself, we’ll be using Retrosheet’s Event Files. ASCII text versions of these event logs can be found on their website, for free, linked above. However, the easier (and faster) solution would be to install the SQL dump generously maintained by Baseball Heat Maps, an absolutely awesome site. I’ll be using the schema in their version, but at worst only the column names would change slightly from version to version.

Finding the Games

In Retrosheet, we want to find all games played by Ernie Banks that were part of a double header. Fortunately, unique Retrosheet Game IDs are formatted in such a way as to help with this.

CHN195405221 -> CHN 1954 05 22 1

The prefix is the home team’s Retrosheet ID, in this case the Chicago Cubs. The next eight digits are the date of the game, in YYYYMMDD format. The final number identifies the number of the game played in that stadium, on that day. On days where only one game is played, this final digit is a 0. In the first game of a double header, this number is 1, and in the second game it is 2. This query returns all Game IDs of the relevant games:

SELECT DISTINCT e.GAME_ID
FROM EVENTS as e
WHERE SUBSTRING(e.GAME_ID, 12, 1) != 0
 AND e.BAT_ID = 'banke101';
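
The ID layout described above can also be sketched in Python, which may help when checking the SUBSTRING positions used in these queries. This is only an illustration: GameId, parse_game_id, and is_double_header are names I made up, not anything provided by Retrosheet.

```python
from datetime import date
from typing import NamedTuple


class GameId(NamedTuple):
    home_team: str    # three-character Retrosheet team ID
    game_date: date
    game_number: int  # 0 = single game, 1 or 2 = half of a double header


def parse_game_id(game_id: str) -> GameId:
    """Split a 12-character Retrosheet game ID into its three parts."""
    if len(game_id) != 12:
        raise ValueError(f"expected a 12-character game ID, got {game_id!r}")
    year, month, day = int(game_id[3:7]), int(game_id[7:9]), int(game_id[9:11])
    return GameId(game_id[:3], date(year, month, day), int(game_id[11]))


def is_double_header(game_id: str) -> bool:
    """Same test as SUBSTRING(GAME_ID, 12, 1) != 0 in the query."""
    return parse_game_id(game_id).game_number != 0


if __name__ == "__main__":
    print(parse_game_id("CHN195405221"))
    print(is_double_header("CHN195405221"))  # True
```

Note that SQL's SUBSTRING is 1-indexed while Python slices are 0-indexed, so SUBSTRING(GAME_ID, 12, 1) corresponds to game_id[11].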

Next, we’ll want to gather a list of all individual ‘events’ (roughly, plate appearances) that occurred while Banks was at the plate, in each of these games. Since I’ll be calculating some basic stats (Batting Average, On-Base Percentage, Slugging Percentage, and OPS), I want to check if each event falls under a certain category (for these purposes, AB, H, 2B, 3B, HR, BB, IBB, HBP, SO, SF, SH, and Reaching on Defensive Indifference).

Almost all of these events can be determined by the ‘EVENT_CD’ field. It’s an integer value that relates to the data-use table found here. By checking if the EVENT_CD = 3, for instance, one can see if the plate appearance ended in a strikeout. Other fields used here are the ‘AB_FL’ (was this an at-bat?), ‘SF_FL’ (sacrifice fly), and ‘SH_FL’ (sacrifice hit).

Removing the ‘DISTINCT’ keyword from the previous query, and adding the necessary IF statements, this query returns the results of every Ernie Banks plate appearance in a double header.

SELECT SUBSTRING(e.GAME_ID, 4, 4) as Year, SUBSTRING(e.GAME_ID, 12, 1) as Game,
 IF(e.AB_FL='T', 1, 0) as AB,
 IF(e.EVENT_CD>=20,IF(e.EVENT_CD<=23,1,0),0) as H,
 IF(e.EVENT_CD=21,1,0) as 2B,
 IF(e.EVENT_CD=22,1,0) as 3B,
 IF(e.EVENT_CD=23,1,0) as HR,
 IF(e.EVENT_CD=14,1,0) as BB,
 IF(e.EVENT_CD=15,1,0) as IBB,
 IF(e.EVENT_CD=16,1,0) as HBP,
 IF(e.EVENT_CD=3,1,0) as SO,
 IF(e.SF_FL='T',1,0) as SF,
 IF(e.SH_FL='T',1,0) as SH,
 IF(e.EVENT_CD=5,1,0) as DI
FROM EVENTS as e
WHERE SUBSTRING(e.GAME_ID, 12, 1) != 0
 AND e.BAT_ID = 'banke101';
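
To make the IF() columns a little more concrete, here is a small Python sketch of the same per-event classification. The helper name classify_event is hypothetical; the numeric codes and flag fields are simply the ones used in the query above.

```python
# Hypothetical helper mirroring the IF() columns in the query above.
# The numeric codes are the ones used in this post (3 = strikeout, 14 = walk, ...).


def classify_event(event_cd: int, ab_fl: str, sf_fl: str, sh_fl: str) -> dict:
    """Turn one EVENTS row into 0/1 indicator columns."""
    return {
        "AB": int(ab_fl == "T"),
        "H": int(20 <= event_cd <= 23),  # single, double, triple, or home run
        "2B": int(event_cd == 21),
        "3B": int(event_cd == 22),
        "HR": int(event_cd == 23),
        "BB": int(event_cd == 14),
        "IBB": int(event_cd == 15),
        "HBP": int(event_cd == 16),
        "SO": int(event_cd == 3),
        "SF": int(sf_fl == "T"),
        "SH": int(sh_fl == "T"),
        "DI": int(event_cd == 5),
    }


if __name__ == "__main__":
    home_run = classify_event(event_cd=23, ab_fl="T", sf_fl="F", sh_fl="F")
    print(home_run["AB"], home_run["H"], home_run["HR"], home_run["2B"])  # 1 1 1 0
```

Each row yields a set of indicators, which is why wrapping them in SUM() later produces the counting stats.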

For ease of use, we want to group these counting stats by year, and by game in the double header (either 1 or 2). Add a COUNT(DISTINCT e.GAME_ID) selection to count the number of games played under each year/game combination (a plain COUNT(*) would count events rather than games), a SUM() around each IF statement, and a GROUP BY at the end of the query.

SELECT SUBSTRING(e.GAME_ID, 4, 4) as Year, SUBSTRING(e.GAME_ID, 12, 1) as Game,
 COUNT(DISTINCT e.GAME_ID) as Games,
 SUM(IF(e.AB_FL='T', 1, 0)) as AB,
 SUM(IF(e.EVENT_CD>=20,IF(e.EVENT_CD<=23,1,0),0)) as H,
 SUM(IF(e.EVENT_CD=21,1,0)) as 2B,
 SUM(IF(e.EVENT_CD=22,1,0)) as 3B,
 SUM(IF(e.EVENT_CD=23,1,0)) as HR,
 SUM(IF(e.EVENT_CD=14,1,0)) as BB,
 SUM(IF(e.EVENT_CD=15,1,0)) as IBB,
 SUM(IF(e.EVENT_CD=16,1,0)) as HBP,
 SUM(IF(e.EVENT_CD=3,1,0)) as SO,
 SUM(IF(e.SF_FL='T',1,0)) as SF,
 SUM(IF(e.SH_FL='T',1,0)) as SH,
 SUM(IF(e.EVENT_CD=5,1,0)) as DI
FROM EVENTS as e
WHERE SUBSTRING(e.GAME_ID, 12, 1) != 0
 AND e.BAT_ID = 'banke101'
GROUP BY Year, Game;

Export these results to ‘banks.csv’, and open RStudio to manipulate the data.

Calculating Rates

Once in RStudio, the first thing to do is install and import the “plyr” library, which will be used to split and aggregate the data, and then import the data itself from ‘banks.csv’:

install.packages("plyr")
library("plyr")

data <- read.csv( file = "Desktop/banks.csv", header=TRUE, sep="," )

A new column calculating the number of plate appearances in each row should be calculated for convenience:

data$PA <- data$AB + data$BB + data$IBB + data$HBP + data$SH + data$SF + data$DI
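
As a quick check on that definition, here is the same sum as a Python sketch, applied to the 1954 row of the year-by-year results later in this post (AB 178, BB 7, IBB 1, HBP 0, SH 2, SF 1, DI 0). The function name is my own; the formula is just the R line above.

```python
def plate_appearances(ab, bb, ibb, hbp, sh, sf, di):
    """Plate appearances as defined in this post: at-bats plus every
    non-at-bat outcome tracked by the query."""
    return ab + bb + ibb + hbp + sh + sf + di


if __name__ == "__main__":
    # 1954 row of the year-by-year table
    print(plate_appearances(ab=178, bb=7, ibb=1, hbp=0, sh=2, sf=1, di=0))  # 189
```

The result matches the PA column for 1954, so the column definition and the table are at least consistent with each other.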

New data frames are needed, as we want to look at Banks’ year-by-year performance, his career performance in either the first or second game of a double header, and the 19-year total. The ddply() function will be used to split by year or game and summarize that data, and summarize() will be used for the single career total.

year_stats <- ddply(data, .(Year), summarize, Games = sum(Games), PA = sum(PA), AB = sum(AB), H = sum(H), X2B = sum(X2B), X3B = sum(X3B), HR = sum(HR), BB = sum(BB), IBB = sum(IBB), HBP = sum(HBP), SO = sum(SO), SF = sum(SF), SH = sum(SH), DI = sum(DI))
game_stats <- ddply(data, .(Game), summarize, Games = sum(Games), PA = sum(PA), AB = sum(AB), H = sum(H), X2B = sum(X2B), X3B = sum(X3B), HR = sum(HR), BB = sum(BB), IBB = sum(IBB), HBP = sum(HBP), SO = sum(SO), SF = sum(SF), SH = sum(SH), DI = sum(DI))
total_stats <- summarize(data, Games = sum(Games), PA = sum(PA), AB = sum(AB), H = sum(H), X2B = sum(X2B), X3B = sum(X3B), HR = sum(HR), BB = sum(BB), IBB = sum(IBB), HBP = sum(HBP), SO = sum(SO), SF = sum(SF), SH = sum(SH), DI = sum(DI))

View()-ing each of these data frames, you’ll notice the aggregate of each counting stat category.

The final steps are to append the necessary rate stat columns to each of these new data frames. For simplicity’s sake, we’ll stick with the basics: BA, OBP, SLG, and OPS. The “By Year” stats and the “Double Header Game” stats are calculated in a slightly different way than the total line, due to having more than one row as a result.

Instead of “summarizing” this data again (which creates additional data frames), the “transform” function should be used to append new columns to existing frames:

year_stats <- ddply(year_stats, .(Year), transform, BA = H/AB, OBP = (H+BB+IBB+HBP)/(AB+BB+IBB+HBP+SF), SLG = (H+X2B+(2*X3B)+(3*HR))/AB) 
year_stats$OPS <- year_stats$OBP + year_stats$SLG
game_stats <- ddply(game_stats, .(Game), transform, BA = H/AB, OBP = (H+BB+IBB+HBP)/(AB+BB+IBB+HBP+SF), SLG = (H+X2B+(2*X3B)+(3*HR))/AB) 
game_stats$OPS <- game_stats$OBP + game_stats$SLG

As the “OBP” and “SLG” columns don’t exist at the time of calling the ddply function, the “OPS” column has to be created in a separate step in the code above.

As for the total data frame, we won’t use a function to do the transform. For ease of writing, I attach() the “total_stats” data frame to cut down on typing, and detach it before the final line.

attach(total_stats)
total_stats$BA <- H/AB 
total_stats$OBP <- (H+BB+IBB+HBP)/(AB+BB+IBB+HBP+SF)
total_stats$SLG <- (H+X2B+(2*X3B)+(3*HR))/AB
detach(total_stats)
total_stats$OPS <- total_stats$OBP + total_stats$SLG
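
The rate formulas are easy to get subtly wrong, so here is a hedged cross-check in Python using the career total line from the results below (H 641, AB 2298, 2B 112, 3B 17, HR 127, BB 142, IBB 51, HBP 14, SF 20). The function name is mine; the formulas are the ones in the R code above, with walks and intentional walks kept as separate columns.

```python
def rate_stats(h, ab, x2b, x3b, hr, bb, ibb, hbp, sf):
    """BA, OBP, SLG, and OPS using the same formulas as the R code above."""
    ba = h / ab
    obp = (h + bb + ibb + hbp) / (ab + bb + ibb + hbp + sf)
    # H already counts one base per hit, so extra-base hits add only
    # their additional bases: +1 per double, +2 per triple, +3 per home run.
    slg = (h + x2b + 2 * x3b + 3 * hr) / ab
    return {"BA": ba, "OBP": obp, "SLG": slg, "OPS": obp + slg}


if __name__ == "__main__":
    total = rate_stats(h=641, ab=2298, x2b=112, x3b=17, hr=127,
                       bb=142, ibb=51, hbp=14, sf=20)
    for name, value in total.items():
        print(name, f"{value:.3f}")
    # BA 0.279
    # OBP 0.336
    # SLG 0.508
    # OPS 0.844
```

These agree with the Total row, which suggests the R transformations are doing what they are meant to.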

The Results

At this point, stat lines have been calculated and are ready for viewing. Below are my results:

Year-By-Year

Year Games PA AB H X2B X3B HR BB IBB HBP SO SF SH DI BA OBP SLG OPS
1 1953 2 8 7 1 0 0 0 1 0 0 1 0 0 0 0.143 0.250 0.143 0.393
2 1954 46 189 178 52 9 2 6 7 1 0 11 1 2 0 0.292 0.321 0.466 0.787
3 1955 37 153 142 49 8 2 14 8 1 0 16 2 0 0 0.345 0.379 0.725 1.104
4 1956 47 196 176 55 15 1 10 15 5 0 17 0 0 0 0.312 0.383 0.580 0.962
5 1957 56 232 204 54 7 0 17 18 5 3 30 2 0 0 0.265 0.345 0.549 0.894
6 1958 39 171 153 44 6 3 7 12 3 1 21 2 0 0 0.288 0.351 0.503 0.854
7 1959 28 123 111 21 3 0 5 6 5 1 14 0 0 0 0.189 0.268 0.351 0.620
8 1960 38 164 142 36 11 0 7 12 7 1 15 2 0 0 0.254 0.341 0.479 0.820
9 1961 36 153 136 39 6 1 6 7 7 1 25 0 0 2 0.287 0.358 0.478 0.836
10 1962 36 146 138 42 8 2 10 5 1 2 18 0 0 0 0.304 0.342 0.609 0.951
11 1963 30 110 98 23 4 0 4 6 2 0 17 3 1 0 0.235 0.284 0.398 0.682
12 1964 36 145 135 42 8 2 7 4 3 2 16 0 1 0 0.311 0.354 0.556 0.910
13 1965 41 166 143 36 5 2 9 15 4 1 12 3 0 0 0.252 0.337 0.503 0.841
14 1966 34 129 123 25 4 1 1 2 2 0 13 2 0 0 0.203 0.225 0.276 0.501
15 1967 40 158 146 42 8 0 10 6 3 0 25 1 2 0 0.288 0.327 0.548 0.875
16 1968 33 134 119 37 7 0 8 7 1 1 14 1 5 0 0.311 0.357 0.571 0.928
17 1969 29 115 101 31 2 1 3 10 1 1 18 1 1 0 0.307 0.377 0.436 0.813
18 1970 12 33 32 9 1 0 2 1 0 0 5 0 0 0 0.281 0.303 0.500 0.803
19 1971 6 15 14 3 0 0 1 0 0 0 3 0 1 0 0.214 0.214 0.429 0.643

Double Header Game

Game Games PA AB H X2B X3B HR BB IBB HBP SO SF SH DI BA OBP SLG OPS
1 1 319 1317 1188 337 63 7 63 76 29 7 149 11 6 0 0.284 0.343 0.508 0.850
2 2 307 1223 1110 304 49 10 64 66 22 7 142 9 7 2 0.274 0.329 0.509 0.838

Total

Games PA AB H X2B X3B HR BB IBB HBP SO SF SH DI BA OBP SLG OPS
1 626 2540 2298 641 112 17 127 142 51 14 291 20 13 2 0.279 0.336 0.508 0.844

It makes logical sense that a batter would be fresher (AKA better) in the first game of a double header, and the data agree: Banks’ OPS was .850 in first games and .838 in second games. Ernie Banks was a career .274/.330/.500 hitter. In this surprisingly large sample (double headers were more common in the past), Banks was not too far off from his career line. However, he did in fact have a slightly higher OPS on days where he was able to “play two.”


Creating a Productive Ryan Howard in 2015

It’s no secret that the Philadelphia Phillies have spent the 2015 off-season attempting to trade first baseman Ryan Howard. It’s even less of a surprise that they’ve had some difficulty in finding a suitor to take on the remaining two years of his contract, while receiving any significant salary relief.

Since tearing his Achilles tendon on the final play of the 2011 NLDS, Howard has ceased to be the power hitter he was before the injury, when he was already beginning to decline. The additional irony of Howard’s injury is that it occurred at exactly the moment when his widely criticized extension went into effect.

Period BA OBP SLG OPS HR % BB % SO %
2004-2011 .275 .368 .560 .928 6.48 12.25 27.38
2012-2014 .213 .309 .412 .720 3.82 9.15 30.55

So, it’s not pretty. Since the injury, he has struck out a bit more, walked a bit less, and hit fewer home runs. The defense is even worse, and he has never been the fastest runner on the basepaths. As a $25 million commitment each year, it is highly unlikely he produces at a level to be worth it to a team, and a trade appears unlikely (even if heavily subsidized). Odds are he’s with the Phillies opening day, and the goal will be to make him as appealing to potential partners as possible. A creative solution to generating value would be useful, as there is little reason to expect much to change remaining with the status quo.

What’s the problem?

One would assume that the most immediate problem would be the worsening platoon splits against LHP that have plagued Howard for his entire career. While they still are a problem of late, Howard’s performance against each handedness isn’t as different as in the past.

Platoon BA OBP SLG OPS HR % BB % SO %
RHP .248 .325 .418 .743 3.31 10.06 26.29
LHP .200 .272 .397 .669 4.97 7.07 40.31

Howard’s OPS split since 2012 (74 points) is actually exactly one-third his career split (222 points). His performance against RHP is far behind his career average – even when attempting to imagine a reasonable decline, the 212 point drop is extreme. There is another split that is very noticeable since Howard’s injury, however.
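The split arithmetic can be checked with a few lines of Python. This is only a sketch: the helper name is invented, the post-injury OPS figures come from the platoon table above, and the 222-point career split is taken from the text rather than recomputed.

```python
from fractions import Fraction


def ops_split_points(ops_vs_rhp: float, ops_vs_lhp: float) -> int:
    """Platoon split expressed in OPS 'points' (thousandths)."""
    return round((ops_vs_rhp - ops_vs_lhp) * 1000)


if __name__ == "__main__":
    recent_split = ops_split_points(0.743, 0.669)
    career_split = 222  # as quoted in the text
    print(recent_split)                          # 74
    print(Fraction(recent_split, career_split))  # 1/3
```

So the post-2012 gap really is one-third of the career gap, driven by the drop against right-handers rather than any improvement against left-handers.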

Base-State BA OBP SLG OPS HR % BB % SO %
RISP .271 .383 .462 .845 3.81 14.97 27.92
Other .218 .275 .392 .667 3.82 6.49 31.75

Batting with runners in scoring position (RISP) is not typically the most compelling piece of statistical evidence. Situational stats over which the batter has no control are not good for evaluative purposes – but what if there were something inherently different about Ryan Howard coming to the plate with RISP than in other base-states?

It so happens that game conditions are typically different for Howard in that specific situation. With a runner on second and/or third, the opponent has to closely cover third base. It’s a proxy for situations where the fielding team is unable to use an extreme shift to combat his spray charts.

So, while it is not an exact measure of when other teams are shifting against Howard, it appears that when coming to the plate in situations where the shift is impossible, he is still hitting at near an All-Star level.

With this piece of information, is it possible to construct a plan to not only make Ryan Howard a usable player, but a productive one as well? Below are four suggestions for maximizing the return from playing Howard in 2015.

Limit Exposure to Fielding When Possible

The first and most obvious way to minimize Howard’s deficiencies is to use him in as limited a defensive role as possible. Any interleague game played by the Phillies in an American League ballpark should see Ryan Howard as the starting designated hitter. He should be used as a pinch hitter on days off, and replaced defensively any time it seems like he’s had his final plate appearance in the 8th or 9th inning.

If traded to an American League club, there’s no reason for him to be anything other than the DH in the vast majority of games.

Keep Ryan Howard In The Cleanup Spot

A less intuitive recommendation would be to keep Howard in his traditional lineup position, cleanup hitter. He is no longer the power hitter he used to be, nor is he the best hitter on the team, so he does not justify that position on merit, but there’s an interesting side effect to keeping him there. In The Book, Tom Tango, Mitchel Lichtman, and Andy Dolphin describe lineup optimization, the most efficient way to order batters using data on how frequently each spot comes to bat with men on base. A quick summary of their findings can be read here.

Unsurprisingly, one important realization from their research is that the cleanup hitter comes to the plate with men on-base more frequently than any other spot in the lineup. They conclude that the team’s best overall hitter (preferably with some power) should be placed here.

However, the 2015 Phillies are not a team expected to compete – they are more concerned with creating trade value for various veterans than scoring every last run. As mentioned above, Ryan Howard’s largest problem is how frequently teams shift against him – in order for him to produce as much value as possible, the team will want to minimize exposing him to the situations that result in the shift.

The best way to do that is to place him in the lineup position where he’s most likely to come to the plate with runners in scoring position, making a shift impossible. While it may fly in the face of sabermetric wisdom, Ryan Howard’s production will be strongest from a position that may actually be detrimental to the run differential.

Bunt Against The Shift

The argument about having RISP is good, but Howard has been in the cleanup spot in almost every game he’s played since his injury, and he is still in a less-than-ideal position in most situations. From 2012-2014, Ryan Howard has had RISP in 31.3% of plate appearances, and shift-possible base states in the other 68.7%. How can this negative majority be neutralized?

He could be platooned, limiting the number of times he faces left-handed pitchers who induce weak contact. However, as his platoon splits illustrate, Howard has performed far below his career rates against either handedness since 2012. The only obvious way to address this unavoidable situation is a solution widely written about, yet largely dismissed by baseball.

Bunting. The obvious hole created for left-handed hitters at third base by the extreme shift is almost always ignored. In fact, opposing pitchers even get angry with batters for exploiting such an obvious flaw. Why is this sound strategy ignored? It’s largely pride, a loose definition of “unwritten rules”, and some traditional notions about the roles of a slugging first baseman.

It should first be acknowledged that the reason for discussions about Ryan Howard of late is that his injury and age have limited his abilities to the point of not being an effective slugging first baseman – so his role does need to change.

If pride can be swallowed, and bunting into the shift is framed as an intentional walk instead of a concession, and this strategy is truly committed to in all opportunities, how effective would it be?

Copious amounts have been written estimating the value of bunting into the shift. Jeff Sullivan from FanGraphs estimated that 66% of bunts put down in fair territory would result in the batter reaching first base. Bill James himself even mentioned that he thought a young slugger could hit .700 by adopting the bunt.

However, as statistics directly measuring shifts are not readily available, the baseline success rate chosen here comes from Grantland’s Ben Lindbergh, who cites Inside Edge’s statistics on bunts and shifting in 2013:

“…there were only 50 bunt attempts against the shift last season (out of several thousand opportunities) … Twenty-seven of the 50 bunt attempts went for hits — a .540 average…”.

That is still an astoundingly high rate in a disappointingly small sample size. It remains to be seen how teams would adjust to bunting, were it this successful and more widespread. However, for this article, this number will be used (as the only direct measurement of bunts against the shift).

Howard’s body type, injury history, and age make the idea of a .540 average on any kind of ground ball seem unlikely, and as that number is considered league average, an adjustment should be made for his below-league-average speed. From 2012-2014 (since his injury), Ryan Howard had a 5.28% infield-hit rate, whereas the league average over that time was 6.42%.

That means he has converted ground balls into hits at only 82.2% of the league-average rate, and when applying that to the .540 bunt average, it works out to .444. To simplify estimating a shift slash line, assume both that every hit is a single, and that he draws no walks while bunting. However, Howard can still strike out and is still hit by pitches at his career rate, so expect an on-base percentage slightly higher than his batting average. In these situations, one could reasonably expect Ryan Howard to have a .444/.448/.444 slash line (a .892 OPS), were he to bunt every time.
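The speed adjustment is simple enough to sketch in a few lines of Python. This is only a rough check using the figures quoted above; the variable names are mine, and the .448 on-base percentage is taken directly from the text rather than derived, since the hit-by-pitch rate behind it is not given.

```python
# Figures quoted in the article (2012-2014).
HOWARD_INFIELD_HIT_RATE = 0.0528
LEAGUE_INFIELD_HIT_RATE = 0.0642

# Inside Edge sample cited via Grantland: 27 hits in 50 bunt attempts.
LEAGUE_BUNT_AVG_VS_SHIFT = 27 / 50

# How Howard's infield-hit rate compares with the league's.
speed_factor = HOWARD_INFIELD_HIT_RATE / LEAGUE_INFIELD_HIT_RATE

# Scale the league bunt average down by that factor.
adjusted_bunt_avg = round(LEAGUE_BUNT_AVG_VS_SHIFT * speed_factor, 3)

# Every hit is assumed to be a single, so slugging equals batting average.
adjusted_bunt_slg = adjusted_bunt_avg

# On-base percentage as stated in the article (not derived here).
bunt_obp = 0.448
bunt_ops = round(bunt_obp + adjusted_bunt_slg, 3)

print(round(speed_factor, 3), adjusted_bunt_avg, bunt_ops)
```

The whole estimate leans on a 50-attempt sample and a single speed proxy, so it is best read as a ballpark figure.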

Howard is also still an effective player against pitchers of either handedness when given the opportunity to face them without the shift. Humor me for a moment, and imagine that between improvements from finally being healthy, and decline related to his increased age, his abilities in 2015 are a relative push from the previous year. If, as he has since the injury, he produces a line near .271/.383/.462 (a .845 OPS) with RISP, that’s still a very productive player. With RISP being a proxy for situations without the shift, the only remaining question is how other teams would adjust to this strategy. The three options are:

  1. Opponents continue to shift, preferring an almost guaranteed single over possible extra-base hits.
  2. Opponents change the manner in which they shift.
  3. Opponents shift less frequently against Howard.

The first option is great for Howard: opponents would keep exposing this hole, and he would exploit it all the way to the Comeback Player of the Year award and the most hilariously strange MVP hot takes of all time. If the proportion of situational plate appearances remains the same (68.7% versus 31.3%), using the above calculated slash lines, you’d see Ryan Howard batting .397/.427/.445 with fewer than 10 home runs and over 175 strikeouts across 600 plate appearances. It sounds absolutely ridiculous, and it’s hard to imagine this ever happening in real life.

The next alternative is a fairly likely one, in that teams somehow adjust how they shift to compensate for the bunts. Because Howard is a slow runner, this would likely cut into his ability to bunt for hits, but it might also compromise the integrity of opponents’ shifts, making Howard a more effective batter in the general case. This is the least preferable scenario, but it would still likely improve his numbers over the present situation.

The final option is that the overall frequency of shifts goes down. The change would likely take a few weeks to take effect, and shifts would not disappear altogether for the rest of the year. If the proportion of shifts roughly flips (and they instead occur in 31.3% of plate appearances), Howard could earn a line around .331/.403/.458, with still fewer than 20 HR and about 175 strikeouts. That is still an outrageously strong line, even if he is more Joey Votto than old Ryan Howard. Basically, there doesn’t seem to be a way that the bunting strategy would be detrimental to Howard’s results.
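The blended lines for the first and third scenarios can be roughly reproduced with a short Python sketch. It weights each slash-line component by plate-appearance share, which is a simplification: batting average and slugging are properly weighted by at-bats, so the results land a few points away from the figures quoted above. The function name blend_line is my own label, not something from a stats package.

```python
def blend_line(shift_line, no_shift_line, shift_share):
    """Blend two (BA, OBP, SLG) lines by the share of plate appearances with a shift.

    Simplification: every component is weighted by plate appearances,
    even though BA and SLG are really per-at-bat rates.
    """
    return tuple(
        round(shift_share * s + (1 - shift_share) * n, 3)
        for s, n in zip(shift_line, no_shift_line)
    )

# Estimated line when bunting against the shift (from the article).
BUNT_LINE = (0.444, 0.448, 0.444)

# Howard's 2012-2014 line with RISP, the article's proxy for "no shift".
RISP_LINE = (0.271, 0.383, 0.462)

# Scenario 1: opponents keep shifting in 68.7% of plate appearances.
keep_shifting = blend_line(BUNT_LINE, RISP_LINE, 0.687)

# Scenario 3: the proportion flips and shifts occur in 31.3% of plate appearances.
fewer_shifts = blend_line(BUNT_LINE, RISP_LINE, 0.313)

print(keep_shifting)
print(fewer_shifts)
```

The on-base percentages line up closely with the quoted figures, which is what you would expect, since OBP is the one component that really is a per-plate-appearance rate.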

Do Not Implement A Traditional Platoon

To extract the most possible value out of Ryan Howard, the Phillies should not artificially limit his perceived abilities by unnecessarily platooning him, tempting as it may be. His numbers may improve somewhat, but reinforcing the perception of Howard’s weaknesses and limited role will place an asterisk on his production.

A platoon would be necessary were he totally unproductive otherwise, but these scenarios estimate that he would still more than justify his everyday role, given the totality of the performance.

This also isn’t a problem of blocking a prospect, as some have opined. Maikel Franco is, in 2015, actually a better defensive third baseman than Cody Asche, and should be given priority in all positional decisions. Darin Ruf has one more Triple-A option, and that will likely be used.

Additionally, Phillies’ GM Ruben Amaro has stated publicly that he hopes Franco begins 2015 back in Triple-A Lehigh Valley. It is unclear why the GM of the team needs to “hope” that, as he has the power to make it happen.

Regardless, Franco may benefit from the extra time, and he only requires 40 days in the minors this season to assure an extra year of team control (through service time weirdness). For all relevant purposes, giving Ryan Howard playing time is not a roadblock.


2015 looks to be a rough season for the Phillies at the major league level. With eyes firmly pointed at the future, the team has a vested interest in moving veterans like Ryan Howard while shedding as much of its future financial obligation as possible. Given that his value is likely at its lowest point, it will not hurt the team in any measurable way to try to use 2015 to reconstruct his image in the eyes of the league.

While Howard is unlikely ever to hit 58 home runs again, there are simple things that can be done to improve his productivity and possibly result in what most would actually consider to be a good season.

Creating a Productive Ryan Howard in 2015