Video Game Sales Over the Past Few Years

Ira Villar
Posted on Jun 15, 2020


I used to be an avid gamer, from the days of the Super Nintendo Entertainment System all the way to the PlayStation 3. However, I took a break and haven't played video games for the past five years, so I no longer recognize today's games. A lot has happened, so I wanted to explore the general trends in video game sales over the past 20 or even 30 years.

The data set, a Kaggle dataset, can be downloaded here. I wanted to do some basic exploratory data analysis to gain insight into how games sell and which games sell the most. I've also created a Tableau visualization, which can be found here. Finally, my code is available in my GitHub repository here.

First, I wanted to see the scope and range of the available data. To find out which platforms the games were released on, I calculated the percentage of games per video game platform.

As you can see, the games are predominantly on the Nintendo DS, with PlayStation 2 games a close second.
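The platform percentages can be reproduced in a few lines of Python. This is a minimal sketch, not the code from the GitHub repository: it assumes the data has been read into a list of dicts (for example with `csv.DictReader`) and that the platform column is named `Platform`, as in the commonly used Kaggle `vgsales.csv` file. The rows below are hypothetical toy data, not real figures.

```python
from collections import Counter

def platform_share(rows):
    """Return each platform's share of all titles, as a percentage."""
    counts = Counter(row["Platform"] for row in rows)
    total = sum(counts.values())
    # most_common() lists platforms from most to fewest titles
    # (ties keep the order in which they were first seen).
    return {platform: 100 * n / total for platform, n in counts.most_common()}

# Hypothetical toy rows; real rows might come from
# csv.DictReader(open("vgsales.csv")), an assumed file name.
rows = [
    {"Name": "Game A", "Platform": "DS"},
    {"Name": "Game B", "Platform": "DS"},
    {"Name": "Game C", "Platform": "PS2"},
    {"Name": "Game D", "Platform": "Wii"},
]
print(platform_share(rows))  # {'DS': 50.0, 'PS2': 25.0, 'Wii': 25.0}
```

With pandas, `df["Platform"].value_counts(normalize=True) * 100` would give the same percentages in one line.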

More importantly, I wanted to get a sense of global sales, both by platform and by year (sales are in millions).


From this chart, we can see that within the data set, most global sales come from the PlayStation 2, with a good mix of PlayStation 3 and Xbox 360 thrown in. These games seemed to be from an older generation, so I wanted to see how dated my data set was. The next step was to look at global sales per year.

In our data set, most global sales occurred between 2007 and 2010. Possible explanations include the then-new PS3 and Xbox 360 systems, as well as the so-called prime age of the popular PS2.
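Both charts (sales by platform and sales by year) come down to summing a sales column per group. Below is a minimal standard-library sketch, assuming the Kaggle `vgsales.csv` column names `Platform`, `Year`, and `Global_Sales`, and assuming unknown years are stored as `"N/A"`. The rows are hypothetical toy data.

```python
from collections import defaultdict

# Assumed markers for missing values (e.g. an unknown release year).
MISSING = {"", "N/A"}

def total_sales_by(rows, key, sales_col="Global_Sales"):
    """Sum a sales column (millions of units) per value of `key`, largest first."""
    totals = defaultdict(float)
    for row in rows:
        group = row[key]
        if group in MISSING:
            continue  # skip rows without a usable group value
        totals[group] += float(row[sales_col])
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))

rows = [  # hypothetical toy rows, values as strings like csv.DictReader gives
    {"Platform": "PS2", "Year": "2004", "Global_Sales": "2.0"},
    {"Platform": "PS2", "Year": "2008", "Global_Sales": "1.5"},
    {"Platform": "X360", "Year": "2008", "Global_Sales": "3.0"},
    {"Platform": "PS3", "Year": "N/A", "Global_Sales": "0.5"},
]
print(total_sales_by(rows, "Platform"))  # {'PS2': 3.5, 'X360': 3.0, 'PS3': 0.5}
print(total_sales_by(rows, "Year"))      # {'2008': 4.5, '2004': 2.0}
```

For a per-year line chart you would re-sort the result by year rather than by sales; sorting by sales simply makes the peak years easy to spot.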

Next, I created a histogram of global sales per game to see how well games were selling in general.

There seems to be an outlier (which we'll get to later), so I zoomed in for a closer look.

The histogram is right-skewed, which makes sense because this data frame includes many games that were released but didn't sell well. For every best-seller like Final Fantasy 7, there are many more like the poorly received “Men in Black II: Alien Escape”.
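To make the histogram and the "closer look" concrete, here is a small sketch that bins per-game sales into fixed-width buckets, with an optional cutoff to drop extreme values before binning. The bin width, the cutoff, and the sales values are all hypothetical choices for illustration; a plotting library would normally do the binning for you. The last line uses a rough right-skew signal: the mean sitting above the median.

```python
import statistics
from collections import Counter

def sales_histogram(sales, bin_width=1.0, max_sales=None):
    """Count games per [k*w, (k+1)*w) sales bin; only non-empty bins are returned."""
    if max_sales is not None:
        sales = [s for s in sales if s <= max_sales]  # "zoom in" below a cutoff
    bins = Counter(int(s // bin_width) for s in sales)
    return {k * bin_width: bins[k] for k in sorted(bins)}

sales = [0.02, 0.1, 0.3, 0.4, 0.8, 1.2, 2.5, 40.0]  # hypothetical, in millions
print(sales_histogram(sales))                 # {0.0: 5, 1.0: 1, 2.0: 1, 40.0: 1}
print(sales_histogram(sales, max_sales=5.0))  # {0.0: 5, 1.0: 1, 2.0: 1}
# In a right-skewed distribution the mean sits above the median.
print(statistics.mean(sales) > statistics.median(sales))  # True
```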

Next, I plotted global sales for each game by year to see how these peak years differ from the others.

This scatterplot shows a large outlier around 2007, with more than twice the sales of the next best-selling games. To identify this game and see how far behind the next games were, I displayed the head of the data set with the most important columns.

The numbers are in millions, and Wii Sports is clearly our outlier, with more than twice the sales of the next best-selling game, Super Mario Bros.
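Finding the top titles and checking the "more than twice" gap is a sort plus a division. A minimal sketch, assuming the `Name`, `Publisher`, and `Global_Sales` columns of the Kaggle file; the toy rows and publisher names are hypothetical. The final line shows how one might check whether all top titles share one publisher.

```python
def top_games(rows, n=5, sales_col="Global_Sales"):
    """Return the n best-selling rows, highest sales first."""
    return sorted(rows, key=lambda r: float(r[sales_col]), reverse=True)[:n]

def lead_ratio(rows, sales_col="Global_Sales"):
    """How many times more the #1 title sold than the #2 title."""
    first, second = top_games(rows, n=2, sales_col=sales_col)
    return float(first[sales_col]) / float(second[sales_col])

rows = [  # hypothetical toy rows, not real sales figures
    {"Name": "Game B", "Publisher": "Pub 1", "Global_Sales": "4.0"},
    {"Name": "Game A", "Publisher": "Pub 1", "Global_Sales": "10.0"},
    {"Name": "Game C", "Publisher": "Pub 2", "Global_Sales": "3.0"},
]
print([r["Name"] for r in top_games(rows, n=3)])  # ['Game A', 'Game B', 'Game C']
print(lead_ratio(rows))  # 2.5, i.e. the leader sold more than twice the runner-up
print({r["Publisher"] for r in top_games(rows, n=2)})  # {'Pub 1'}
```

In pandas, `df.nlargest(5, "Global_Sales")` returns the equivalent top rows.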

I then noticed that all five of the best-selling games were from Nintendo. The next step was to see how strong each publisher was in each region to get an idea of the different regions' tastes.

Interestingly, North America and Europe have similar tastes in games: Nintendo leads in both, with the same publishers following closely behind in the same order. Japan and the rest of the world, on the other hand, tell a different story.

While Nintendo was still the strongest, Japan and the rest of the world showed more love for Sony Computer Entertainment. 
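The per-region publisher comparison amounts to summing each regional sales column by publisher and ranking the totals. A minimal sketch, assuming the Kaggle column names `NA_Sales`, `EU_Sales`, `JP_Sales`, and `Other_Sales`; the publishers and numbers below are hypothetical.

```python
from collections import defaultdict

# Regional sales columns as named in the Kaggle vgsales.csv (assumed).
REGIONS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

def top_publishers_by_region(rows, k=3):
    """Rank publishers by total sales within each region, keeping the top k."""
    ranking = {}
    for region in REGIONS:
        totals = defaultdict(float)
        for row in rows:
            totals[row["Publisher"]] += float(row[region])
        ranking[region] = sorted(totals, key=totals.get, reverse=True)[:k]
    return ranking

rows = [  # hypothetical toy rows
    {"Publisher": "Pub A", "NA_Sales": "5.0", "EU_Sales": "3.0",
     "JP_Sales": "1.0", "Other_Sales": "0.5"},
    {"Publisher": "Pub B", "NA_Sales": "2.0", "EU_Sales": "1.0",
     "JP_Sales": "4.0", "Other_Sales": "1.5"},
    {"Publisher": "Pub A", "NA_Sales": "1.0", "EU_Sales": "0.5",
     "JP_Sales": "0.25", "Other_Sales": "0.25"},
]
for region, publishers in top_publishers_by_region(rows, k=2).items():
    print(region, publishers)
```

In this toy data, Pub A leads in North America and Europe while Pub B leads in Japan and the rest of the world, the same kind of split described above.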

To dive even deeper, I explored how genres differ by region to get an idea of which types of games sell best, and where.

If the graphs are too small to read in this blog, you can go to the Tableau page linked above for a closer look.

Overall, the best-selling genres are Action, Sports, and Shooter, and the largest markets are North America and Europe. Interestingly, Japan is the prime market for role-playing games, which are its highest-selling genre.
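The genre-by-region view is essentially a pivot table: genres as rows, regions as columns, summed sales in the cells. From it you can read off each region's top genre and each region's total market size. This standard-library sketch assumes the Kaggle `Genre` and regional sales columns; the genres and numbers are hypothetical toy data.

```python
from collections import defaultdict

REGIONS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]  # assumed column names

def genre_by_region(rows):
    """Genre x region table of summed sales (a hand-rolled pivot table)."""
    table = defaultdict(lambda: dict.fromkeys(REGIONS, 0.0))
    for row in rows:
        for region in REGIONS:
            table[row["Genre"]][region] += float(row[region])
    return dict(table)

def summarize(table):
    """Best-selling genre per region, and total sales per region."""
    top_genre = {r: max(table, key=lambda g: table[g][r]) for r in REGIONS}
    region_totals = {r: sum(sales[r] for sales in table.values()) for r in REGIONS}
    return top_genre, region_totals

rows = [  # hypothetical toy rows
    {"Genre": "Action", "NA_Sales": "3.0", "EU_Sales": "2.0",
     "JP_Sales": "0.5", "Other_Sales": "1.0"},
    {"Genre": "Role-Playing", "NA_Sales": "1.0", "EU_Sales": "0.5",
     "JP_Sales": "2.0", "Other_Sales": "0.25"},
    {"Genre": "Shooter", "NA_Sales": "2.0", "EU_Sales": "1.5",
     "JP_Sales": "0.25", "Other_Sales": "0.5"},
]
top_genre, region_totals = summarize(genre_by_region(rows))
print(top_genre)
print(region_totals)
```

With pandas, `df.pivot_table(index="Genre", values=REGIONS, aggfunc="sum")` would build the same table.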

Games clearly sell best in North America and Europe, mainly action and shooter titles. Games also sell well in Japan if they're role-playing games. There may be a pattern in when games sell best, but I'll need an updated data set to test that hypothesis. Finally, the best-selling games are from Nintendo, which, ironically, focuses on family games rather than action shooters.

It was an interesting deep dive, but for future studies I'd like an updated version of the data set to see whether games indeed sell best during generational shifts in video game platforms.





About Author

Ira Villar

Ira is currently a Data Science Fellow at the NYC Data Science Academy. He has nearly a decade of experience in film directing and production. This gives him a unique insight and perspective when it comes to data...