A Data Analysis of Video Game Popularity

Posted on Feb 27, 2020


Ever since I was little, I have enjoyed the sheer thrill that video games can bring. Whether it is climbing an enormous mountain to defeat a dangerous foe, riding a horse across the desert, or simply solving a puzzle with a gun that generates portals, the fun and creativity found in the art of video games is endless, and data shows it's getting bigger and better every year.

Given how immensely powerful video games have become, and the love that I have for them, I wondered: what is it that makes a video game popular? The style of play? Marketing power? The beautiful design? Critics? Users? These questions motivated me to dive into this project and see what I could find out.


Keep in mind:

The work done on this project can be found at https://github.com/jhoffme1/Video-Game-Web-Scraping-Project. This project is not yet complete. The material analyzed is based on data that I scraped from metacritic.com; more data will be gathered and analyzed for further understanding.

The tools and libraries that I used for this project include:

  • Jupyter notebook
  • Matplotlib
  • Pandas
  • Seaborn
  • nltk
  • Scrapy
  • Anaconda

Metacritic and review scores:

Metacritic has been around for the better part of two decades, and through it critics and users alike have been able to voice their opinions on whether games live up to the hype. Whether a game is rated great to play, average across the board, or terrible in all senses, each game has its own reviews. Critics such as PlayStation Universe, PlayStation LifeStyle, Vandal, and Forbes have all posted their opinions on hundreds of video games on Metacritic.

Metascores range from 0 to 100; the greater the score, the more highly regarded the game is. A green score means the game has received very favorable reviews, a yellow score means average reviews, and a red score means poor reviews. User scores range from 0 to 10, with the same principles applying as for Metascores.
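
The color bands described above can be sketched as a small function. This is only an illustration: the cutoffs used here (75 and 50 on the 0-100 scale) are assumptions about where the bands fall, not values taken from this project, and score_band and user_score_band are hypothetical names.

```python
def score_band(metascore: float) -> str:
    """Map a 0-100 Metascore to the color band described above.

    The cutoffs (75 and 50) are assumptions, not taken from the project.
    """
    if not 0 <= metascore <= 100:
        raise ValueError(f"Metascore must be between 0 and 100, got {metascore}")
    if metascore >= 75:
        return "green"   # very favorable reviews
    if metascore >= 50:
        return "yellow"  # average reviews
    return "red"         # poor reviews


def user_score_band(user_score: float) -> str:
    """Apply the same bands to a 0-10 user score by rescaling it to 0-100."""
    return score_band(user_score * 10)


print(score_band(83))        # green
print(user_score_band(6.4))  # yellow
```

Rescaling the user score by 10 puts both kinds of score on one scale, which also makes it easier to compare them side by side later on.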

What a critic score looks like:

[Image: example of a Metacritic critic score]

What a user score looks like:

[Image: example of a Metacritic user score]

Data Scraped:

For this project, only PlayStation 4 games were scraped. All of the data was collected with Scrapy, using a spider developed to retrieve the following fields:

  • Video game developer
  • Release Date
  • Video game genre
  • Critic Review scores
  • User Review scores
  • Critic names, and separate scores
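
One way to picture a single scraped game is as a record holding the fields listed above. The sketch below is a guess at a convenient shape, not the structure actually used in the project: GameRecord and all of its field names are hypothetical, the title field is added because titles are discussed later, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GameRecord:
    """One scraped PS4 game; the class and field names are hypothetical."""
    title: str
    developer: str
    release_date: str              # kept as scraped text
    genre: str
    critic_score: Optional[int]    # Metascore, 0-100; None if missing
    user_score: Optional[float]    # user score, 0-10; None if missing
    # Individual critic name -> that critic's score for this game
    critic_reviews: Dict[str, int] = field(default_factory=dict)


# Made-up values for illustration only
game = GameRecord(
    title="Example Game",
    developer="Example Studio",
    release_date="Nov 15, 2013",
    genre="Action",
    critic_score=80,
    user_score=7.1,
    critic_reviews={"Critic A": 85, "Critic B": 75},
)
print(len(game.critic_reviews))  # 2
```

Keeping the individual critic scores in their own mapping separates the per-game score from the per-review scores, which matters for the histograms discussed next.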

Data Summary:

Histograms Data:

[Figure: histograms of critic and user review scores]

  • The histogram of critic scores per game shows the overall review grade given to each video game released on PS4. The average score appears to be around 80, which could indicate that most video games are rated rather well.
  • It is also interesting to note that few video games have a bad rating.

  • The histogram of individual review grades shows every separate review grade given for a video game. For example, if one game has 17 reviews, all 17 of those reviews are counted in the histogram. It shows similar results: most games overall appear to have generally high ratings.

  • The histogram of user scores shows the average user rating for each video game. Users appear to be a bit tougher when grading video games, with scores falling mostly in the 60s and 70s. There are also several games that do not have a user score; they are marked with a 0 at the left of the histogram.
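
The three histograms differ only in what goes into them, which can be sketched as follows. The data here is made up, the per-game value is a simple mean (an assumption; the published Metascore may be computed differently), and dropping the 0 placeholders before plotting is a suggestion rather than something the project is known to do. Only the standard library is used to keep the sketch dependency-free.

```python
from statistics import mean

# Made-up data: each game maps to its individual critic grades (0-100)
critic_reviews = {
    "Game A": [90, 85, 80],
    "Game B": [70, 60],
    "Game C": [75],
}
# Made-up user scores (0-10); 0 marks a game with no user score yet
user_scores = {"Game A": 8.1, "Game B": 0, "Game C": 6.5}

# Input 1: one value per game (a simple mean of that game's critic grades)
per_game_average = {game: mean(grades) for game, grades in critic_reviews.items()}

# Input 2: one value per review, so a game with 17 reviews contributes 17 values
all_reviews = [grade for grades in critic_reviews.values() for grade in grades]

# Input 3: user scores with the 0 placeholders dropped before plotting
rated_user_scores = [score for score in user_scores.values() if score > 0]

print(per_game_average)
print(len(all_reviews))
print(rated_user_scores)
```

Leaving the 0 placeholders in would pull the average user score down and add a spike at the left edge that says nothing about how users actually rated those games.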

The information is interesting, as the trends for how well video games are rated appear to be consistent across all three histograms. Further analysis of the individual games is needed to fully understand why this is the case.


Deeper Dive:

Video games are split into different genres, and each genre represents the type of game that the player will experience. For example, puzzle games have a more strategic style that makes you think, compared to a shooter that may simply require you to shoot the enemy.


This was another category that was interesting to look into further, so I arranged all the video games into separate categories: Action, Fantasy, RPG, Indie, and Sports.

Bar plots of critic and user score data for genres:

  • The bar plot of critic scores shows the average critic grade in each of the specified genres for all PS4 video games. It is interesting to note that popularity appears to be rather consistent across genres, although Action and RPG show the highest levels.

  • The bar plot of user scores shows the average user grade in each of the specified genres for all PS4 video games. Indie appears to have the highest grade among user scores, compared to Fantasy, for example. Action is surprisingly a bit lower for user scores, but that could be because there are so many more Action games than Indie games, so users grade them more critically.
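
The genre averages behind these bar plots can be sketched as a simple group-and-average step. The rows below are made up and the variable names are hypothetical; the sketch also prints how many games each genre contains, since an average over many Action games and an average over a handful of Indie games are not equally reliable.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (genre, critic score 0-100, user score 0-10)
games = [
    ("Action", 82, 7.0),
    ("Action", 78, 6.6),
    ("RPG", 84, 7.8),
    ("Indie", 76, 8.0),
    ("Sports", 74, 6.0),
]

critic_by_genre = defaultdict(list)
user_by_genre = defaultdict(list)
for genre, critic_score, user_score in games:
    critic_by_genre[genre].append(critic_score)
    user_by_genre[genre].append(user_score)

# Report the group size next to each average so small groups stand out
for genre in sorted(critic_by_genre):
    n = len(critic_by_genre[genre])
    avg_critic = mean(critic_by_genre[genre])
    avg_user = mean(user_by_genre[genre])
    print(f"{genre:<7} n={n}  critic={avg_critic:.1f}  user={avg_user:.1f}")
```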


Relationships Between Scores and Game Characteristics Using Scatter Plots:

A deeper area to look into was how critic grades and user grades vary with specific characteristics of each video game. Characteristics include release date, title, developer, etc.

  • The scatter plots show that scores are relatively consistent across all types of video games with regard to their title, developer, and release date.
  • What this means is that deeper analysis should be done into specific categories. For developers, are some more important than others? Do release dates play a role in how popular video games are? There are many fields to analyze more deeply.
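
As one possible starting point for the release date question, scores can be averaged by release year. This is a minimal sketch on made-up rows; it assumes dates are stored as text in an abbreviated month, day, year form (for example "Nov 15, 2013") and that month names are in English, neither of which is confirmed by the project.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up rows: (release date as text, critic score 0-100)
games = [
    ("Nov 15, 2013", 80),
    ("Mar 20, 2015", 92),
    ("Oct 27, 2015", 70),
    ("Apr 20, 2018", 94),
]

scores_by_year = defaultdict(list)
for release_date, critic_score in games:
    # Assumed date format: abbreviated month, day, four-digit year
    year = datetime.strptime(release_date, "%b %d, %Y").year
    scores_by_year[year].append(critic_score)

# Average critic score per release year, in chronological order
average_by_year = {
    year: mean(scores) for year, scores in sorted(scores_by_year.items())
}
print(average_by_year)
```

The same grouping would work for developers by swapping the release year for the developer name.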


Conclusions and further questions:

Based on the data collected and analyzed so far, Action and RPG appear to be the most popular types of video game. This can be seen in the level of reviews they regularly receive from critics and users alike. The data also suggests that video games tend to do rather well on average, scoring around 75-80% with critics and users.

There is also an interesting trend among video game critics: some review far more video games than others do (some critics have reviewed just one video game). Further questions to look into in the future include, but are not limited to:

  • Why are Action/Adventure video games so popular?
  • How much does a critic with more reviews (such as PlayStation Universe) determine a game's popularity compared to a critic with fewer reviews?
  • What do the top video game critics say about different types of video games? What do they think is good or bad?
  • What do the specific user score ratings say about video games?
  • Do release dates play a role in popularity?
  • Which developers create more demand? 
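
The question about critics with many versus few reviews starts with counting reviews per critic. The sketch below does that on made-up rows with placeholder critic names; it is an illustration of the counting step only, not a result from the scraped data.

```python
from collections import Counter

# Made-up rows: (critic name, game title, score); not real review data
reviews = [
    ("Critic A", "Game 1", 85),
    ("Critic A", "Game 2", 70),
    ("Critic A", "Game 3", 90),
    ("Critic B", "Game 1", 80),
    ("Critic C", "Game 2", 65),
]

reviews_per_critic = Counter(critic for critic, _game, _score in reviews)

# Critics ordered from most to fewest reviews
for critic, count in reviews_per_critic.most_common():
    print(critic, count)

# Critics who reviewed only a single game
one_time_critics = sorted(c for c, n in reviews_per_critic.items() if n == 1)
print(one_time_critics)
```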

With the information analyzed so far, and the information that is yet to come, we will further understand what exactly makes video games so popular!

About Author

Jason Hoffmeier

Jason Hoffmeier is an NYC Data Science fellow who currently resides in New York City. He has a Master's degree in Systems Engineering from SUNY Binghamton, and has recently earned his Lean Six Sigma Black Belt for quality...