Metarecommendr: A recommendation system for video games, movies and TV shows

Posted on Apr 5, 2017

Metarecommendr is a recommendation system for video games, TV shows and movies created by Yvonne Lau, Stefan Heinz, and Daniel Epstein. It uses word-embedding neural networks, sentiment analysis and collaborative filtering to suggest items that match your preferences. It is our capstone project, delivered at the end of the NYC Data Science Academy Data Science Bootcamp program.

You can take a look at our app here. Please keep in mind that, for the time being, only a scaled-down version of our models is running online due to memory restrictions. Only the "Content-based" option is functional at this time. The code is online on GitHub.


Finding a piece of media today can be difficult. So many games, movies, and TV shows come out every week that it is hard to keep up. It can take hours to look through blogs, videos, and reviews to determine whether a new piece of media is something you will like. Finding a game from the past that you are sure you will like is even harder. Websites like Metacritic attempt to simplify this process by aggregating reviews. However, there are still some major flaws, including:

  • Product suggestions are generally obvious and tied to the title of a product (e.g. if you like Super Mario 64, you will get inundated with other Mario games)
  • User interface is too crowded with ancillary and unnecessary information
  • The text of reviews does not always match up with the scores associated with them

Hence, for our capstone project, we decided to address these issues by creating an application to improve your search for your next game (and even let you find movies and TV shows if you wish!). Metarecommendr is a web application that combines a sleek and intuitive user interface with the power of content filtering and collaborative filtering to deliver the best recommendations for you.

Project Workflow

Metarecommendr was designed and built in the span of 2 weeks. The project workflow is summarized below:

[Figure: Project workflow]

Data Collection

To collect all the data and reviews about our items (games, movies and TV shows), we used the Python web scraping framework Scrapy. In total we implemented 12 spiders: one for each item list, one for the summary and details of each specific item, and one each for the critic and user reviews of each item. While some spiders finished quickly, the longest one, which scraped game reviews, took 10 days to finish.

Because we expected a rather large amount of data, we decided to scrape directly into a database instead of text files. A preliminary version of our database used SQLite, a self-contained SQL database engine, which was set up within minutes. After the scraping was finished, we exported the data to a MySQL database running on the Amazon Web Services (AWS) Relational Database Service (RDS). To avoid inserting 584 MB of scraped data from a local machine into a remote database, we uploaded all our data to AWS Simple Storage Service (S3) and implemented an AWS Data Pipeline to stream it directly from S3 to RDS via an AWS Elastic Compute Cloud (EC2) instance. This reduced the migration time by a factor of 7. Our final app could then read the data directly from the MySQL database.
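Writing scraped items straight into SQLite can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `reviews` table, its columns, and the helper names are hypothetical and not taken from the project's code. In a Scrapy project, logic like this would typically live in an item pipeline.

```python
import sqlite3


def create_schema(conn):
    """Create a minimal reviews table (hypothetical schema)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS reviews (
            item_title TEXT NOT NULL,
            reviewer   TEXT NOT NULL,
            score      INTEGER,
            body       TEXT,
            UNIQUE (item_title, reviewer)
        )
        """
    )


def save_reviews(conn, reviews):
    """Insert scraped reviews, skipping duplicates from re-crawled pages."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO reviews (item_title, reviewer, score, body) "
            "VALUES (:item_title, :reviewer, :score, :body)",
            reviews,
        )


def count_reviews(conn):
    """Return the number of stored reviews."""
    return conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]


# In-memory database for the example; a file path would persist the data.
conn = sqlite3.connect(":memory:")
create_schema(conn)
scraped = [
    {"item_title": "Example Game", "reviewer": "critic_a", "score": 90, "body": "Great."},
    {"item_title": "Example Game", "reviewer": "critic_b", "score": 40, "body": "Dull."},
    # Same review seen again on a second crawl; the UNIQUE constraint drops it.
    {"item_title": "Example Game", "reviewer": "critic_a", "score": 90, "body": "Great."},
]
save_reviews(conn, scraped)
```

The uniqueness constraint is a design choice for this sketch: long-running spiders are often restarted, and ignoring duplicates keeps a restart from inflating review counts.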

Exploratory Data Analysis

One of the reasons we opted to implement both content-based and collaborative recommendations was the distribution of ratings in our dataset. There were roughly a million reviews in total, half from critics and half from users. For both critic and user review scores, the distribution of ratings was negatively skewed. Hence, relying solely on ratings (for collaborative filtering) would not offer enough granularity to produce sensible recommendations, as most products are perceived positively.
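A negatively skewed distribution has most of its mass at high scores and a long tail toward low scores. One way to check this is the moment-based skewness coefficient (third central moment divided by the 1.5 power of the variance). The sketch below uses made-up scores, not values from our dataset.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical review scores: mostly high, with a few very low outliers.
scores = [95, 90, 90, 85, 85, 80, 80, 75, 60, 30]
g1 = skewness(scores)  # negative for a left-skewed sample like this one
```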



In terms of observations scraped from Metacritic, we ended up with:

Item          Games    Movies   TV Shows   Reviews
Observations  20,416   5,470    1,978      998,582

Interestingly, in our early exploration of the dataset, we found that the number of reviews was not necessarily indicative of the quality of a product. Infestation: Survivor Stories (The War Z) is among the most reviewed items, yet it has very poor average critic and user scores. This makes some intuitive sense. Games that skew either very positive or very negative create more discussion. Extremely bad games can be fun to talk about with others, much like bad movies can live on as cult favorites. Mediocre games, where there isn’t much to say, tend to generate less discussion and therefore fewer reviews.


Recommendation Systems

There are two main types of recommendation algorithms: content filtering and collaborative filtering.

  • Content filtering: makes recommendations based on a product’s metadata. A classic example is how Pandora works.
  • Collaborative filtering: takes into account users’ behaviors and interactions with items. It can be further subdivided into two kinds:
    • User-based: recommendations are items liked by users who are similar to you. A classic example is how Spotify works.
    • Item-based: recommendations are made according to an item-item similarity metric based on ratings from users. An example is how Amazon works.



a) Content Filtering


Since a large portion of the dataset consisted of review text, we chose Doc2Vec for feature engineering in the content-based recommendations. Doc2Vec is an unsupervised algorithm that generates vectors for documents. It is an extension of the Word2Vec algorithm in which a document (instead of a word) is turned into a vector representation. A Python implementation is available in the Gensim library.

Doc2Vec is able to learn semantic similarities among words, which makes it a more sophisticated representation than tf-idf. An example output of our model trained on critic reviews shows that it learned words similar to “excellent” quite well. Pretty good job!



For Metarecommendr, two Doc2Vec models were trained separately on summaries and critic reviews. We opted not to use user reviews, since they did not contain enough descriptive words to yield meaningful recommendations. On the user interface, a user selects a product they like. Products are then recommended according to a cosine similarity metric. The closer the value is to 1, the more similar two vectors (products) are.
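The ranking step can be sketched with NumPy, assuming each product already has a document vector. The three-dimensional vectors and titles below are made up for illustration (real Doc2Vec vectors typically have far more dimensions), and `top_similar` is a hypothetical helper, not a function from our codebase.

```python
import numpy as np


def top_similar(item_vectors, liked_title, n=3):
    """Rank other items by cosine similarity to the liked item's vector."""
    titles = list(item_vectors)
    matrix = np.array([item_vectors[t] for t in titles], dtype=float)
    # Normalize rows to unit length so a dot product equals cosine similarity.
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = unit @ unit[titles.index(liked_title)]
    ranked = sorted(zip(titles, sims), key=lambda pair: pair[1], reverse=True)
    # Drop the liked item itself, which always has similarity 1.
    return [(t, float(s)) for t, s in ranked if t != liked_title][:n]


# Hypothetical document vectors for four products.
vectors = {
    "Game A": [0.9, 0.1, 0.0],
    "Game B": [0.8, 0.2, 0.1],
    "Movie C": [0.0, 0.1, 0.9],
    "Show D": [0.1, 0.9, 0.2],
}
recommendations = top_similar(vectors, "Game A", n=2)
```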

b) Collaborative Filtering

i) SVD - Singular Value Decomposition


A major challenge in implementing collaborative filtering on this particular dataset was the high dimensionality and sparsity of the user-item matrix. There were around 27,500 products and 63,000 users, with fewer than 3 reviews per user on average. To reduce the dimensionality of the user-item matrix, we used truncated Singular Value Decomposition (SVD).

Consider a user-item matrix $A$ where $a_{ij}$ is the rating from user $i$ for product $j$. SVD states that every matrix $A_{n \times p}$ can be factored as:

$$A_{n \times p} = U_{n \times n} \, S_{n \times p} \, V_{p \times p}^{T}$$

where $U_{n \times n}$ and $V_{p \times p}$ are orthogonal matrices and $S_{n \times p}$ is an $n \times p$ diagonal matrix with the singular values of $A$ along the diagonal. Because $S$ is diagonal, SVD gives a more compact representation of $A$. Truncated SVD takes this one step further by keeping only the $k$ largest singular values of $S$ instead of all of them. Under this approach, we compute the rank-$k$ approximation $A'$ of $A$ that minimizes the Frobenius norm of the error:

$$A' = U_k S_k V_k^{T} = \underset{\operatorname{rank}(B) \le k}{\arg\min} \; \lVert A - B \rVert_F$$

where $U_k$ and $V_k$ contain the first $k$ columns of $U$ and $V$, and $S_k$ is the $k \times k$ diagonal matrix of the $k$ largest singular values.

For Metarecommendr, the dataset was split into training and test sets, and k was chosen to be 13 based on Cattell’s scree test.

[Figure: Scree plot]

Once we obtain the rank-k matrix $A'$, we can make recommendations according to its entries. In the context of our dataset, $A'$ is a matrix of predicted user ratings where $a'_{ij}$ is the predicted rating from user i for item j. Compared to a baseline where every user rating is predicted to be the average user rating (RMSE = 7.50), truncated SVD reduces the error on predicted user ratings by 19% (RMSE = 6.07).
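The 19% figure is the relative reduction in RMSE, (7.50 − 6.07) / 7.50 ≈ 0.19. A short sketch of both calculations follows; the function names are ours for illustration, and only the two RMSE values come from the results above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long rating sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, model_rmse):
    """Fraction by which the model reduces the baseline error."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# RMSE values reported above: mean-rating baseline vs. truncated SVD.
improvement = relative_improvement(7.50, 6.07)
```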

To sum up, for collaborative filtering with SVD, a user inputs and rates a few items. A user-item matrix is then generated and decomposed by SVD. For a given user i, this approach gives us a predicted rating for each item, and we recommend the items with the highest predicted ratings.
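The whole loop can be sketched on a toy matrix with NumPy. This is a simplified illustration rather than our production code: the ratings are made up, missing ratings are stored as 0 (a common simplification that treats "unrated" as a very low score), and k = 2 is chosen only because the toy matrix is tiny.

```python
import numpy as np


def rank_k_approximation(ratings, k):
    """Keep the k largest singular values and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def recommend(ratings, predicted, user, n=2):
    """Return indices of the user's unrated items, highest predicted rating first."""
    unrated = np.flatnonzero(ratings[user] == 0)
    order = unrated[np.argsort(predicted[user, unrated])[::-1]]
    return order[:n].tolist()


# Rows are users, columns are items; 0 marks an item the user has not rated.
ratings = np.array([
    [9, 8, 0, 1, 0],
    [8, 9, 7, 0, 1],
    [0, 1, 2, 9, 8],
    [1, 0, 0, 8, 9],
], dtype=float)

predicted = rank_k_approximation(ratings, k=2)
suggestions = recommend(ratings, predicted, user=0)
```

On the real user-item matrix, a sparse solver would be a more practical choice than a dense decomposition, since almost all entries are missing.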

ii) Pearson's Correlation

To better understand the relationship between item review scores, we compared items against each other using a modified Pearson’s correlation formula. To help scale down this correlation matrix, item pairs with fewer than 3 overlapping reviews were disregarded and given a score of 0, meaning no correlation.

Pearson's Correlation

This item-item matrix approach also allowed us to make cross-category recommendations, since the algorithm was no longer bound to an item’s metadata (as it is in content filtering). On the user interface, a user can select a product they like and receive the products with the highest correlation to it.
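The overlap rule can be sketched as follows. Note the assumptions: this uses the standard Pearson correlation over users who rated both items, not our exact modified formula, and the ratings and user IDs are made up.

```python
import math


def pearson_with_min_overlap(ratings_a, ratings_b, min_overlap=3):
    """Pearson correlation over shared reviewers; 0.0 if the overlap is too small."""
    shared = sorted(set(ratings_a) & set(ratings_b))
    if len(shared) < min_overlap:
        return 0.0
    xs = [ratings_a[u] for u in shared]
    ys = [ratings_b[u] for u in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant ratings carry no correlation signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings keyed by user ID; items may come from different categories.
game = {"u1": 9, "u2": 7, "u3": 3, "u4": 8}
movie = {"u1": 8, "u2": 6, "u3": 2, "u5": 10}
show = {"u1": 5, "u9": 7}

game_movie = pearson_with_min_overlap(game, movie)  # three shared reviewers
game_show = pearson_with_min_overlap(game, show)    # one shared reviewer, so 0.0
```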

c) Sentiment Analysis

As mentioned in the introduction, a major problem with Metacritic’s dataset was that the score of a review did not necessarily match the sentiment of its text. To address this issue, we performed sentiment analysis on the critic reviews. Positive and negative were defined as follows: reviews with scores of 55 and below were classified as negative, and those with scores of 85 and above were classified as positive. Reviews with scores between these values were not used for sentiment analysis.
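The labeling rule can be written as a small function. The thresholds are the ones stated above; the function name and the sample reviews are made up for illustration.

```python
def sentiment_label(score, negative_max=55, positive_min=85):
    """Map a review score to a sentiment class, or None for the unused middle band."""
    if score <= negative_max:
        return "negative"
    if score >= positive_min:
        return "positive"
    return None


# Hypothetical (text, score) pairs; the middle-band review is dropped.
reviews = [
    ("Dull and repetitive.", 40),
    ("Fine, nothing special.", 70),
    ("A masterpiece.", 95),
]
training_set = [
    (text, sentiment_label(score))
    for text, score in reviews
    if sentiment_label(score) is not None
]
```

Leaving out the middle band gives the classifier two clearly separated classes, at the cost of not learning anything about lukewarm reviews.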

The sentiment analysis used vectors from Doc2Vec as features. We tried several machine learning models, including logistic regression, Naive Bayes, SVM, and different types of neural networks. The performance of each model is listed below:

  • Logistic regression: 75% accuracy
  • SVM: ~65% accuracy
  • Naive Bayes: ~65% accuracy
  • Long short-term memory (LSTM) recurrent neural networks (RNN) [a known method for NLP, good for assessing sequential data]: ~75% accuracy
  • Convolutional neural networks (CNN) [commonly used in image processing, but also in NLP tasks]: ~88% accuracy

In the end, the best model was a CNN with an added RNN component, with the following architecture: 2 convolution and pooling layers, 2 recurrent LSTM layers, and 3 dense, fully connected layers. This model led us to an accuracy rate above 90%.

On Metarecommendr, this sentiment analysis is showcased interactively: a user types in a review and the text is evaluated by our model. Users receive feedback on whether the score they gave aligns with or diverges from the text. We hope to continue with this aspect of the project to improve accuracy and use it as another pre-processing step for our recommendation system.

Flask App

Since the models were built in Python, the Flask framework was a natural choice for our web application. The frontend is an interactive application built on top of Bootstrap, AngularJS and Angular Material. On the backend, the app pulls data directly from the aforementioned MySQL database on AWS. Models were exported to Pickle and H5 files, which are stored on AWS S3. When a user visits our application, these files are loaded from S3.

Future Improvements

There are a few improvements that could be made to Metarecommendr, including:

  • Creating a hybrid recommendation system that blends both content and collaborative filtering
  • Adding more filters on the user interface to create an even more customizable user experience
  • Expanding the sentiment analysis model to give a more refined rating prediction using NLP (e.g. a 1-10 score)

About Authors

Stefan Heinz

Stefan received his Bachelor's degree in Logistics from Heilbronn University in Germany, including a one year stopover in Hong Kong. He then went on to graduate cum laude from Maastricht University's School of Business and Economics in the...

Yvonne Lau

Yvonne Lau is a recent Yale University graduate with a B.A. degree in Economics and Mathematics. Hailing from Rio de Janeiro, Brazil, she became interested in data science after serving as a Data Analyst for a nonprofit organization,...

Daniel Epstein

Daniel Epstein is a neuroscience PhD candidate at the University of Utah, expecting to graduate in summer 2017. While performing analyses on behavioral and neuroimaging data, he became interested in utilizing data science to understand human behavior and...
