Data Analysis on the Steroid Era in MLB

Posted on Nov 6, 2020

Introduction

This is a quick look back at what is referred to as the Steroid Era in baseball. The data show that from the mid-'90s into the 2000s, MLB had some great home run races between hitters like Barry Bonds, Sammy Sosa, Mark McGwire, and Ken Griffey Jr. The hallowed mark of 61 homers was shattered after standing untouched for nearly 40 years.

Objective

The question is: What accounts for that dramatic rise in home runs? Could it just be that we had some great hitters during that time? Or could it be a lively ball? Could it be rule changes, or umps calling a tighter strike zone? Can we chalk it up to smaller ballparks, or to more teams and diluted pitching?

To try to find the answers, we are going to look at the time frame from 1995 to 2005 as the Steroid Era. In fact, no one knows for certain which years were affected or which players took steroids. For our purposes, we're relying on a list found here... from Bleacher Report of players linked to steroid use. This is for comparison only, and not to point fingers at particular players. I'll leave that to Jose Canseco (admitted user and writer of several books on players using steroids).

Let's start with a quick look back at baseball and home runs!

Background Data

The period prior to 1920 is known as the dead-ball era in baseball. Home runs were not a large part of the game until a man named Babe Ruth showed up. He started hitting the long ball just before the '20s and changed the way batters approached the game. Babe Ruth dominated baseball for 20 years, sometimes putting up more home runs than many teams.

Home runs continued to rise in the sport until WWII, when they dipped because top talent was serving in the war. After the '40s, home runs continued to rise until the '60s, when rule changes impacted the game. 1968 was called "the year of the pitcher" because the rules had shifted so far that pitchers dominated. MLB responded by changing the rules going forward, lowering the mound and shrinking the strike zone.

Data

Sudden Increase

From 1994 until 2005, baseball had more batters hitting over 40 home runs per year than in any other 10-year period in its history. In fact, the count was more than twice that of any other 10-year period. (Chart below)
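A comparison like this can be sketched in a few lines of Python. This is only a minimal illustration, not the code behind the chart: the function names, the (player, year, home runs) row layout, and the sample rows are all made up, and "over 40" is read here as strictly more than 40.

```python
from collections import Counter

def big_seasons_per_year(seasons, threshold=40):
    """Count player-seasons with more than `threshold` home runs, keyed by year."""
    counts = Counter()
    for _player, year, hr in seasons:
        if hr > threshold:
            counts[year] += 1
    return counts

def rolling_window_totals(per_year, first_year, last_year, window=10):
    """Total qualifying seasons in each `window`-year span, keyed by start year."""
    totals = {}
    for start in range(first_year, last_year - window + 2):
        years = range(start, start + window)
        totals[start] = sum(per_year.get(y, 0) for y in years)
    return totals

# Made-up rows for illustration only; these are not real player statistics.
sample = [
    ("A", 1961, 45), ("B", 1961, 41), ("C", 1975, 38),
    ("D", 1996, 47), ("E", 1996, 50), ("F", 1998, 66), ("G", 2001, 40),
]

per_year = big_seasons_per_year(sample)
totals = rolling_window_totals(per_year, 1961, 2005)

# Start year of the 10-year span with the most qualifying seasons.
best_start = max(totals, key=totals.get)
print(best_start, totals[best_start])
```

With real season-level batting data in place of the sample rows, the same two steps give the count for every 10-year span, so the spans can be compared directly.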

Continued Growth

Below is the same chart with every batter's total scaled to 500 at-bats. The count is still double that of any other 10-year period.
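Scaling to 500 at-bats is simple proportional arithmetic: home runs divided by at-bats, times 500. A minimal sketch follows; the function name and the sample seasons are made up for illustration, and a real analysis would probably also set a minimum number of at-bats so that tiny samples don't produce inflated rates.

```python
def hr_per_500_ab(hr, ab):
    """Scale a season's home run total to a 500 at-bat pace."""
    if ab <= 0:
        raise ValueError("at-bats must be positive")
    return hr * 500 / ab

# Made-up seasons for illustration only: (label, home runs, at-bats).
seasons = [("short season", 30, 400), ("full season", 45, 600)]

scaled = {label: hr_per_500_ab(hr, ab) for label, hr, ab in seasons}
print(scaled)  # both seasons work out to 37.5 home runs per 500 at-bats
```

The point of the scaling is visible in the example: a 30-homer season in 400 at-bats and a 45-homer season in 600 at-bats are the same pace, so shortened or injury-limited seasons are compared fairly with full ones.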

Age

The other remarkable oddity during this time is that players were hitting home runs at an older age. Players extended their careers by gaining strength as they aged. They also managed to stay healthier later in their careers. They accomplished remarkable feats.

The vertical axis of the chart shows career home runs. The horizontal axis shows home runs hit after age 31. Orange marks players who finished their careers before 1994. Blue marks players who finished, or were still playing, after 1994. Players are hitting more home runs in their 30s than ever before.
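The two values plotted for each player, plus the orange/blue grouping, can be derived from season-level rows roughly as follows. This is a sketch under assumptions: the (player, year, age, home runs) row layout, the function name, and the sample rows are made up, and "after 31" is read here as seasons played at age 32 or older.

```python
from collections import defaultdict

def summarize_careers(seasons, age_cutoff=31, era_start=1994):
    """Return career HR, HR after `age_cutoff`, and an era group per player."""
    careers = defaultdict(lambda: {"career_hr": 0, "late_hr": 0, "last_year": 0})
    for player, year, age, hr in seasons:
        c = careers[player]
        c["career_hr"] += hr
        if age > age_cutoff:  # assumption: "after 31" means age 32 and up
            c["late_hr"] += hr
        c["last_year"] = max(c["last_year"], year)
    return {
        player: {
            "career_hr": c["career_hr"],
            "late_hr": c["late_hr"],
            # "before" = career ended before era_start (orange in the chart)
            "group": "before" if c["last_year"] < era_start else "after",
        }
        for player, c in careers.items()
    }

# Made-up rows for illustration only; these are not real player statistics.
sample = [
    ("X", 1980, 30, 20), ("X", 1982, 32, 10),
    ("Y", 1993, 31, 25), ("Y", 1996, 34, 40),
]

summary = summarize_careers(sample)
print(summary["X"])
print(summary["Y"])
```

Plotting `late_hr` against `career_hr`, colored by `group`, would give a chart of the kind described above.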

Conclusion

My conclusion is that it appears obvious that something was going on during this time. It may even still be going on (look up Nelson Cruz). When stats are scaled per at-bat, number of teams, etc., we still have more batters hitting over 40 home runs in a season during that time than at any other time in history.

We had the 60 home run mark beaten 6 times during that time and only once in all the other years of baseball. During that time, 55 home runs were hit 11 times, compared to 7 times total in all the other years of baseball. Baseball parks didn't suddenly grow after that time. Players didn't start aging faster after that time period, and rules didn't suddenly change. Given the evidence above, plus the number of players who have since admitted to using or whose names have leaked from reports, it looks to me like the best explanation is still steroids.

About Author

Paul Sprouse
