Metarecommendr: A recommendation system for video games, movies and TV shows

Stefan Heinz, Yvonne Lau and Daniel Epstein

Posted on Apr 5, 2017

Metarecommendr is a recommendation system for video games, TV shows and movies created by Yvonne Lau, Stefan Heinz and Daniel Epstein. It uses word-embedding neural networks, sentiment analysis and collaborative filtering to deliver suggestions that match your preferences. It is our capstone project, delivered at the end of the NYC Data Science Academy Data Science Bootcamp program.

You can take a look at our app here. Please keep in mind that for the time being only a scaled-down version of our models is running online due to memory restrictions. Only "Content-based" is functional at this time. The code is online on GitHub.

Introduction

Finding a piece of media today can be difficult. So many games, movies and TV shows come out every week that it is hard to keep up. It can take hours to look through blogs, videos and reviews to determine whether a new piece of media is something you will like. Finding a game from the past that you are sure you will like is even harder. Websites like metacritic.com attempt to simplify this process by aggregating reviews. However, there are still some major flaws, including:

Product suggestions are generally obvious and tied to the title of a product (e.g. if you like Super Mario 64, you get inundated with other Mario games)
The user interface is too crowded with ancillary and unnecessary information
The text of reviews does not always match the scores associated with them

Hence, for our capstone project, we decided to address these issues by creating an application to improve your search for your next game (and even let you find movies and TV shows if you wish!). Metarecommendr is a web application that combines a sleek and intuitive user interface with the powers of content-filtering and collaborative-filtering in order to deliver the best recommendation for you.

Project Workflow

Metarecommendr was designed and built in the span of 2 weeks. The project workflow is summarized below:

[Figure: Project Workflow]

Data Collection

To collect all the data and reviews about our items (games, movies and TV shows), we used the Python web scraping framework Scrapy. In total we implemented 12 spiders: one for each item list, one for the summary and details of each specific item, and one each for the critic and user reviews of each item. While some spiders finished quickly, the longest one, which scraped game reviews, took 10 days to finish.

Because we expected a rather large amount of data, we decided to scrape directly into a database instead of using text files. A preliminary version of our database was built in SQLite, a self-contained SQL database engine that can be set up within minutes. After the scraping was finished, we exported the data to a MySQL database running on the Amazon Web Services (AWS) Relational Database Service (RDS). To avoid inserting 584 MB of scraped data from a local machine into a remote database, we uploaded all our data to AWS Simple Storage Service (S3) and implemented an AWS Data Pipeline to stream it directly from S3 to RDS via an AWS Elastic Compute Cloud (EC2) instance. This reduced the migration time by a factor of 7. Our final app was then ready to read the data directly from the MySQL database.
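As a rough illustration of scraping straight into a database, the sketch below shows a Scrapy-style item pipeline that writes each review into SQLite. Scrapy item pipelines are plain Python classes with open_spider, process_item and close_spider methods; the class, table and field names here are illustrative assumptions, not necessarily the ones used in the project.

```python
import sqlite3


class SQLiteReviewPipeline:
    """Scrapy-style item pipeline that stores scraped reviews in SQLite.

    Table, column and item field names are illustrative assumptions.
    """

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the DB and create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS reviews ("
            "item_title TEXT, reviewer TEXT, score INTEGER, body TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every scraped item: insert it with a parameterized query.
        self.conn.execute(
            "INSERT INTO reviews (item_title, reviewer, score, body) "
            "VALUES (?, ?, ?, ?)",
            (item["item_title"], item["reviewer"], item["score"], item["body"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the connection.
        self.conn.close()
```

In a Scrapy project, a pipeline like this would be enabled through the ITEM_PIPELINES setting, and db_path would point to a file on disk rather than an in-memory database.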

Exploratory Data Analysis

One of the reasons we opted to implement both content-based and collaborative recommendations was the distribution of ratings in our dataset. There were roughly a million reviews in total, half from critics and half from users. We found that for both critic and user review scores, the distribution of ratings was negatively skewed. Hence, relying solely on ratings (for collaborative filtering) would not offer enough granularity to produce sensible recommendations, as most products are perceived positively.

EDA
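To make "negatively skewed" concrete, here is a minimal sketch that computes the skewness coefficient of a list of scores. The scores are made-up values chosen to mimic a distribution with mostly high ratings and a few low outliers; they are not taken from our dataset.

```python
def skewness(values):
    """Return the Fisher-Pearson skewness coefficient (m3 / m2^1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical review scores: mostly high, with a few low outliers.
scores = [90, 85, 88, 92, 80, 87, 95, 30, 45, 89]
skew = skewness(scores)  # negative: the long tail is on the low side
```

A negative value means that most of the mass sits at the high end of the scale, which is why the ratings alone discriminate poorly between products.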

In terms of observations scraped from metacritic.com we ended up with:

Item	Games	Movies	TV Shows	Reviews
Observations	20,416	5,470	1,978	998,582

Interestingly, in our early exploration of the dataset, we found that the number of reviews was not necessarily indicative of the quality of a product. Infestation: Survivor Stories (The War Z) is among the most reviewed items, and yet it has very poor average critic and user scores. This makes some intuitive sense. Games that skew either very positive or very negative create more discussion. Extremely bad games can be fun to talk about with others, similar to how bad movies can live on as cult favorites. Mediocre games, where there isn’t much to say, tend to generate less discussion and therefore fewer reviews.

Recommendation Systems

There are two main types of recommendation algorithms: content filtering and collaborative filtering.

Content filtering: makes recommendations based on a product’s metadata. A classic example is how Pandora works.
Collaborative filtering: takes into account users’ behavior and interactions with items. It can be further subdivided into two kinds:
- User-based: recommendations are items liked by users who are similar to you. A classic example is how Spotify works.
- Item-based: recommendations are made according to an item-item similarity metric based on ratings from users. An example is how Amazon works.

a) Content Filtering

Content Filtering

Since a big portion of the dataset consisted of text from reviews, we chose Doc2Vec for feature engineering in the content-based recommendations. Doc2Vec is an unsupervised algorithm that generates vectors for documents. It is an extension of the Word2Vec algorithm in which a document (instead of a word) is turned into a vector representation. A Python implementation is available in the Gensim library.

Doc2Vec is able to learn semantic similarities among words, which makes it more sophisticated than tf-idf. An example output of our model trained on critic reviews shows that it learned words similar to “Excellent” quite well. Pretty good job!

Doc2Vec

For Metarecommendr, two Doc2Vec models were trained separately, one on summaries and one on critic reviews. We opted not to use user reviews since they did not contain enough descriptive words to yield meaningful recommendations. On the user interface, a user selects a product they like. Products are then recommended according to a cosine similarity metric: the closer to 1, the more similar two vectors (products) are.
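The ranking step can be sketched as follows. This is a minimal pure-Python illustration that assumes each product already has a document vector; the three-dimensional vectors and product names are hypothetical stand-ins for real Doc2Vec output, which has many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(selected, vectors, top_n=3):
    """Rank all other products by cosine similarity to the selected one."""
    target = vectors[selected]
    scored = [
        (title, cosine_similarity(target, vec))
        for title, vec in vectors.items()
        if title != selected
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical document vectors keyed by product title.
product_vectors = {
    "Game A": [0.9, 0.1, 0.0],
    "Game B": [0.8, 0.2, 0.1],
    "Movie C": [0.0, 0.1, 0.9],
}
recommendations = most_similar("Game A", product_vectors, top_n=2)
```

Here "Game B" ranks first for "Game A" because their vectors point in nearly the same direction, while "Movie C" is almost orthogonal to both.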

b) Collaborative Filtering

i) SVD - Singular Value Decomposition

Collaborative Filtering: SVD

A major challenge in implementing collaborative filtering on this particular dataset was the high dimensionality and sparsity of the user-item matrix. There were around 27,500 products and 63,000 users in total, with an average of fewer than 3 reviews per user. To reduce the dimensionality of the user-item matrix, we applied truncated Singular Value Decomposition (SVD).

Consider a user-item matrix A where a_ij represents the rating from user i for product j. SVD states that every n x p matrix A can be decomposed as follows:

$$A = U S V^T$$

where U (n x n) and V (p x p) are orthogonal matrices and S is an n x p diagonal matrix with the singular values of A along the diagonal. As S is a diagonal matrix, we can obtain a more compact representation through SVD. Truncated SVD takes this approach one step further by using only the k most significant values of S instead of all values. Under this approach, we compute a rank-k approximation A' to A that minimizes the Frobenius norm error:

$$A' = U_k S_k V_k^T = \arg\min_{\operatorname{rank}(B) \le k} \lVert A - B \rVert_F$$

where U_k and V_k contain the first k columns of U and V, and S_k is the k x k diagonal matrix of the k largest singular values.

For Metarecommendr, the dataset was split into training and test sets, and k was chosen to be 13 according to Cattell’s scree plot.
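The rank-k approximation can be sketched with NumPy as shown below. The tiny rating matrix and the choice of k = 2 are purely illustrative, and filling missing ratings with 0 is an assumption made for this example, not a description of how the real matrix was prepared.

```python
import numpy as np


def truncated_svd_approximation(A, k):
    """Return the best rank-k approximation of A in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Hypothetical 4-user x 3-item rating matrix (0 marks a missing rating).
A = np.array([
    [9.0, 8.0, 0.0],
    [8.0, 9.0, 1.0],
    [0.0, 1.0, 9.0],
    [1.0, 0.0, 8.0],
])
A_approx = truncated_svd_approximation(A, k=2)
```

Each entry of A_approx can be read as a predicted rating, including the positions where the original matrix had no rating.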

Once we obtain the rank-k matrix A', we can make recommendations according to its entries. In the context of our dataset, A' corresponds to a matrix of predicted user ratings where a'_ij is the predicted rating from user i for item j. Compared to a baseline where every user rating is simply predicted to be the average user rating (RMSE = 7.50), truncated SVD reduces the error on predicted user ratings by 19% (RMSE = 6.07).
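For reference, the error metric and the relative improvement can be computed as in the short sketch below. The function names are our own for this illustration; the two RMSE values are the ones reported above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse, model_rmse):
    """Fraction by which the model reduces the baseline error."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# RMSE values reported above: baseline 7.50, truncated SVD 6.07.
improvement = relative_improvement(7.50, 6.07)  # (7.50 - 6.07) / 7.50
```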

To sum up, for collaborative filtering with SVD, a user inputs and ranks a few items. A user-item matrix is then generated and decomposed by SVD. For a given user i, this approach gives us a predicted user rating for each item, and we recommend the items with the highest predicted ratings.

ii) Pearson's Correlation

To better understand the relationship between item review scores, we compared items against each other using a modified Pearson’s correlation formula. To help scale down this correlation matrix, item pairs with fewer than 3 overlapping reviews were disregarded and given a score of 0, or no correlation.

This item-item matrix approach also allowed us to make cross-category recommendations, since the algorithm was no longer bound to an item’s metadata (as it is in content filtering). On the user interface, a user has the option to select a product they like, and they receive the products with the highest correlation to it.
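A minimal sketch of this idea follows. It uses the standard Pearson formula together with the overlap cutoff described above; the exact modification we applied to the formula is not reproduced here, and the reviewer names and scores are hypothetical.

```python
import math


def item_correlation(ratings_a, ratings_b, min_overlap=3):
    """Pearson correlation between two items over their shared reviewers.

    ratings_a and ratings_b map reviewer -> score. Pairs with fewer than
    min_overlap shared reviewers get 0, i.e. no correlation.
    """
    shared = sorted(set(ratings_a) & set(ratings_b))
    if len(shared) < min_overlap:
        return 0.0
    xs = [ratings_a[r] for r in shared]
    ys = [ratings_b[r] for r in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant scores: correlation is undefined, treat as none
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores keyed by reviewer name.
game = {"ann": 9, "bob": 7, "cat": 3, "dan": 8}
movie = {"ann": 8, "bob": 6, "cat": 2}
show = {"ann": 5, "eve": 9}
```

With these inputs, the game and the movie share three reviewers whose scores move together, so they are strongly correlated even though they belong to different categories. The game and the show share only one reviewer, so the pair is scored 0.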

c) Sentiment Analysis

As mentioned in the introduction, a major problem with Metacritic’s dataset was that the score of a review did not necessarily match its text. To address this issue, we performed sentiment analysis on the critic reviews. Positive and negative were defined as follows: reviews with scores of 55 and below were classified as negative, and those with scores of 85 and above were classified as positive. Reviews with scores in between these values were not used for sentiment analysis.
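The labeling rule can be written down in a few lines. This is a sketch only: the function name and the example reviews are hypothetical, while the two thresholds are the ones stated above.

```python
def sentiment_label(score, negative_max=55, positive_min=85):
    """Map a review score to a training label, or None if excluded."""
    if score <= negative_max:
        return "negative"
    if score >= positive_min:
        return "positive"
    return None  # mid-range scores are left out of the training data


# Hypothetical (score, text) pairs.
reviews = [(90, "A masterpiece."), (40, "Dull and buggy."), (70, "Decent.")]
labeled = [
    (text, sentiment_label(score))
    for score, text in reviews
    if sentiment_label(score) is not None
]
```

Dropping the middle band leaves only clearly positive and clearly negative examples, which gives a classifier a cleaner signal to learn from.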

Sentiment analysis used vectors from Doc2Vec as features. We tried several machine learning models, including logistic regression, Naive Bayes, SVM and different types of neural networks. The performance of each model is listed below:

Logistic regression: 75% accuracy
SVM: ~65% accuracy
Naive Bayes: ~65% accuracy
Long short-term memory (LSTM) recurrent neural networks (RNN) [a known method for NLP, good for assessing sequential data]: ~75% accuracy
Convolutional neural networks (CNN) [commonly used in image processing, but also in NLP tasks]: ~88% accuracy

In the end, the best model was a CNN with an added RNN component, with the following architecture: 2 convolution and pooling layers, 2 recurrent LSTM layers, and 3 dense, fully connected layers. This model led us to an accuracy above 90%.

On Metarecommendr, this sentiment analysis is showcased interactively: a user types in a review and the text is evaluated by our model. Users then receive feedback on whether the given score aligns with or diverges from the text. We hope to continue with this aspect of the project to improve accuracy and use it as another pre-processing step for our recommendation system.

Flask App

Since the models were built in Python, a natural choice was to use the Flask framework to implement our web application. The frontend is an interactive application built on top of Bootstrap, AngularJS and Angular Material. On the backend, the app pulls data directly from the aforementioned MySQL database on AWS. Models were exported to Pickle and H5 files, which were stored on AWS S3. When a user visits our application, these files are loaded from S3.

Future Improvements

There are a few improvements that could be made to Metarecommendr, including:

Creating a hybrid recommendation system that blends both content and collaborative filtering.
Adding more filters on the user interface to create an even more customizable user experience.
Expanding the sentiment analysis model into a more refined rating prediction using NLP (e.g. a 1-10 score).

About Authors

Stefan Heinz

Stefan received his Bachelor's degree in Logistics from Heilbronn University in Germany, including a one year stopover in Hong Kong. He then went on to graduate cum laude from Maastricht University's School of Business and Economics in the...

Yvonne Lau

Yvonne Lau is a recent Yale University graduate with a B.A. degree in Economics and Mathematics. Hailing from Rio de Janeiro, Brazil, she became interested in data science after serving as a Data Analyst for a nonprofit organization,...

Daniel Epstein

Daniel Epstein is a neuroscience PhD candidate at the University of Utah, expecting to graduate in summer 2017. While performing analyses on behavioral and neuroimaging data, he became interested in utilizing data science to understand human behavior and...

Nique Devereaux September 16, 2017

FYI see below for what happens when I try to access your app. Application error An error occurred in the application and your page could not be served. If you are the application owner, check your logs for details.
