Data Analysis on the Progression of Video Games

Posted on Jun 15, 2020

I was an avid gamer from the days of the Super Nintendo Entertainment System all the way to the PlayStation 3. However, I took a break and haven't played video games for the past five years, and I can hardly recognize today's games anymore. With that in mind, I wanted to explore the general trends in video game sales over the past 20 to 30 years.

The data set can be downloaded from Kaggle. I wanted to do some basic exploratory data analysis to get some insight into which games sell the most and how. I've also created a Tableau visualization, and my code can be found in my GitHub repository.

Data

The first step was to see the scope and range of the data available. I wanted to see what types of games were available, so I calculated the percentage of games per video game platform.

As you can see, the games are predominantly on the Nintendo DS, with the PlayStation 2 a close second.
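As a rough illustration of the percentage-per-platform step (not the code from the repository), here is a minimal sketch in plain Python. The `platform` field name and the sample rows are made-up placeholders; the real data set has one row per game.

```python
from collections import Counter

# Made-up placeholder rows, for illustration only.
sample_games = [
    {"name": "Game A", "platform": "DS"},
    {"name": "Game B", "platform": "DS"},
    {"name": "Game C", "platform": "PS2"},
    {"name": "Game D", "platform": "Wii"},
]

def platform_percentages(rows):
    """Return {platform: percent of titles}, largest share first."""
    counts = Counter(row["platform"] for row in rows)
    total = sum(counts.values())
    return {platform: 100 * n / total for platform, n in counts.most_common()}

shares = platform_percentages(sample_games)
print(shares)
```

Note that this counts titles per platform, not sales, which is why a platform can lead here without leading in revenue.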

Global Sales

More importantly, I also wanted to get a sense of global sales, both by platform and by year (sales are in millions).

From this chart, we can see that most global sales in the data set come from the PlayStation 2, with the PlayStation 3 and Xbox 360 also well represented. These platforms are from an older generation, so I wanted to see how dated my data set was. The next step was to look at global_sales per year.

In our data set, most global sales fall between 2007 and 2010. Possible explanations include the then-recent releases of the PS3 and Xbox 360, as well as the popular PS2 being in its prime.
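Both views, sales by platform and sales by year, come down to the same grouping operation. The sketch below shows one way to do it in plain Python. The `platform` and `year` field names and the sample rows are assumptions made for illustration; only `global_sales` is named in this post.

```python
from collections import defaultdict

# Made-up placeholder rows; sales are in millions.
sample_rows = [
    {"platform": "PS2", "year": 2004, "global_sales": 2.0},
    {"platform": "PS3", "year": 2008, "global_sales": 1.5},
    {"platform": "X360", "year": 2008, "global_sales": 1.0},
    {"platform": "PS2", "year": 2008, "global_sales": 0.5},
]

def total_sales_by(rows, key):
    """Sum global_sales for each distinct value of `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["global_sales"]
    return dict(totals)

by_platform = total_sales_by(sample_rows, "platform")
by_year = total_sales_by(sample_rows, "year")
peak_year = max(by_year, key=by_year.get)  # year with the largest total
print(by_platform, by_year, peak_year)
```

Using one helper for both groupings keeps the two charts consistent, since they are built from the same totals.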

Histogram

I created a histogram to see how well games were selling in general. 

It seems like there's an outlier (which we'll get to later), so I zoomed in for a closer look.

The histogram is heavily right-skewed, which makes sense because this data frame includes many games that were released but didn't do well. For every best-seller like Final Fantasy VII out there, there are many more like the poorly received "Men in Black II: Alien Escape".
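To make the "closer look" concrete, the sketch below bins sales into fixed-width buckets, once over the full range and once with an upper cutoff that hides the outlier. The sample values, bin widths, and cutoff are made-up placeholders, not figures from the data set.

```python
def histogram(values, bin_width, upper=None):
    """Count values per fixed-width bin, keyed by the bin's lower edge.

    Values above `upper` (if given) are skipped, which zooms in on the
    bulk of the distribution.
    """
    counts = {}
    for v in values:
        if upper is not None and v > upper:
            continue
        start = int(v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Made-up sales figures in millions: many small sellers and one huge outlier.
sample_sales = [0.1, 0.2, 0.3, 0.4, 1.2, 2.5, 80.0]

full = histogram(sample_sales, bin_width=10)
zoomed = histogram(sample_sales, bin_width=1, upper=10)
print(full)
print(zoomed)
```

With the outlier included, almost everything collapses into the first bin. Cutting it off and narrowing the bins shows the right-skewed shape of the remaining games.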

I needed to check the global sales per game to see how these years differ from the others. 

This scatterplot shows a major outlier around 2007, with more than twice the sales of the next best-selling games. I printed the head of the data set, showing the important columns, to see just what this game was and how far behind the next games were.

The numbers are in millions, and it's clear that Wii Sports is our outlier, with more than twice the sales of the next best-selling game, Super Mario Bros.
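A check like this can be sketched by sorting on sales and comparing the top two entries. The names and numbers below are made-up placeholders, and `global_sales` is assumed to be the sales field.

```python
# Made-up placeholder rows; sales are in millions.
sample_titles = [
    {"name": "Game A", "global_sales": 40.0},
    {"name": "Game B", "global_sales": 90.0},
    {"name": "Game C", "global_sales": 30.0},
]

def top_sellers(rows, n=5):
    """Return the n rows with the highest global_sales, best first."""
    return sorted(rows, key=lambda r: r["global_sales"], reverse=True)[:n]

top = top_sellers(sample_titles, n=2)
# How many times larger the best seller is than the runner-up.
ratio = top[0]["global_sales"] / top[1]["global_sales"]
print([t["name"] for t in top], ratio)
```

A ratio above 2 between first and second place is what marks the top title as an outlier rather than just the leader.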

I then noticed that all five of the best-selling games were from Nintendo. The next step was to see how strong each publisher was in each region to get an idea of the different regions' tastes.

It's interesting to note the similar tastes in games in North America and Europe: Nintendo leads, with the same order of publishers following closely behind. Japan and the rest of the world, on the other hand, told a different story.

While Nintendo was still the strongest, Japan and the rest of the world showed more love for Sony Computer Entertainment. 
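The per-region publisher comparison can be sketched as a ranking of publishers by total sales within one region at a time. The regional field names (`na_sales`, `eu_sales`, `jp_sales`, `other_sales`), the `publisher` field, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed names for the regional sales fields.
REGIONS = ("na_sales", "eu_sales", "jp_sales", "other_sales")

# Made-up placeholder rows; sales are in millions.
sample_rows = [
    {"publisher": "Publisher X", "na_sales": 3.0, "eu_sales": 2.0,
     "jp_sales": 1.0, "other_sales": 0.5},
    {"publisher": "Publisher Y", "na_sales": 1.0, "eu_sales": 1.0,
     "jp_sales": 2.0, "other_sales": 0.25},
    {"publisher": "Publisher X", "na_sales": 1.0, "eu_sales": 0.5,
     "jp_sales": 0.5, "other_sales": 0.25},
]

def publisher_ranking(rows, region):
    """Return (publisher, total sales) pairs for one region, best first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["publisher"]] += row[region]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

for region in REGIONS:
    print(region, publisher_ranking(sample_rows, region))
```

Comparing the order of the rankings across regions, rather than the raw totals, is what reveals whether two regions share the same tastes.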

To dive even deeper into this, I wanted to explore the differences in genre by region to get an idea of what types of games sell best and where.

If the graphs are too small in this post, you can head over to the Tableau page mentioned above for a closer look.

Analysis

Across all regions, the best-selling genres are Action, Sports, and Shooter, and the largest markets are North America and Europe. It's interesting to note that Japan is the prime market for role-playing games, which are its highest-selling genre.
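One way to arrive at a "highest-selling genre per region" result is to total sales by genre within each region and take the maximum. The sketch below assumes a `genre` field and regional fields named `na_sales` and `jp_sales`; the rows are made-up placeholders.

```python
from collections import defaultdict

# Made-up placeholder rows; sales are in millions.
sample_rows = [
    {"genre": "Action", "na_sales": 3.0, "jp_sales": 0.5},
    {"genre": "Role-Playing", "na_sales": 1.0, "jp_sales": 2.0},
    {"genre": "Shooter", "na_sales": 2.0, "jp_sales": 0.25},
]

def genre_totals(rows, regions):
    """Return {region: {genre: total sales}} for the given region fields."""
    totals = {region: defaultdict(float) for region in regions}
    for row in rows:
        for region in regions:
            totals[region][row["genre"]] += row[region]
    return totals

def best_genre(totals):
    """Return {region: genre with the highest total sales}."""
    return {region: max(genres, key=genres.get)
            for region, genres in totals.items()}

totals = genre_totals(sample_rows, ("na_sales", "jp_sales"))
winners = best_genre(totals)
print(winners)
```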

It's clear that games sell best in North America and Europe, mainly those of the action or shooter type. Games also sell well in Japan if they're role-playing games. There appears to be seasonality in when games sell best, but I think I'll need an updated data set to test that hypothesis. Finally, the best-selling games are from Nintendo, which, ironically, focuses on family games and not action shooters.

It was an interesting deep dive, but for future studies I'd like to see an updated version of the data set to check whether games do indeed sell best during generational shifts in video game platforms.

About Author

Ira Villar

Ira is currently a Data Science Fellow at the NYC Data Science Academy. He has nearly a decade of experience in film directing and production. This gives him a unique insight and perspective when it comes to data...