# Deep Learning Meets Recommendation Systems

### Introduction

Almost everyone loves spending their leisure time watching movies with family and friends. We have all had the same experience: we sit on the couch to choose a movie for the next two hours, and after 20 minutes we still cannot find one. It is so disappointing. We definitely need a computer agent that recommends movies when we have to choose one and saves us time. Movie recommendation agents have already become an essential part of our lives. According to Data Science Central, "Although hard data is difficult to come by, many informed sources estimate that, for the major ecommerce platforms like Amazon and Netflix, that recommenders may be responsible for as much as **10% to 25% of incremental revenue**." In this project, I study some basic recommendation algorithms for movie recommendation and also try to integrate deep learning into my movie recommendation system.

Movies are great examples of a combination of entertainment and visual art. Movie posters can often convey the idea of a movie to an audience directly and immediately. According to DesignMantic, "Post and pre-release of any movie their posters are the main elements which create the hype about them. More than half of the people (i.e., the target audience) decide whether to book tickets and watch the movie or not based on the movie posters." We can even predict a movie's mood just by looking at the typography of its poster. It sounds a bit like magic, but it is definitely possible to predict a movie's genre just by looking at its poster. Personally, I know whether I want to watch a movie just by looking at its poster. For example, since I am not a fan of cartoon movies, whenever I see posters with cartoon themes or colors, I know they are not for me. This decision process is very straightforward and does not require reading any reviews (I am not sure people have time to read reviews anyway). Therefore, in addition to some standard movie recommendation algorithms, I also use deep learning to process movie posters and find similar movies to recommend to users. The goal is to mimic a human's visual ability and to build an intuitive, deep-learning-based movie recommender that works just by looking at movie posters. This project is inspired by Ethan Rosenthal's blog posts, and I modified the code from those posts to fit the algorithms used here.

We use the movie dataset downloaded from the MovieLens website. The dataset consists of 100,000 ratings and 1,300 tag applications applied to 9,066 movies by 671 users. It was last updated in October 2016.

### Collaborative Filtering

Roughly speaking, there are three types of recommendation systems (excluding the simple ranking approach):

-- **Content-based recommendation**

-- **Collaborative filtering**

-- **Hybrid models**

A content-based recommendation system treats the task as a regression problem: we predict user-to-item ratings using the content of the items as features. For a collaborative filtering based recommendation system, on the other hand, we usually do not know the features in advance. Instead, we use the similarity between users (users may give similar ratings to the same items) and the similarity between items (similar movies may be given similar ratings by the users) to learn the latent features and predict user-to-item ratings at the same time. After we learn the features of the items, we can also measure the similarity between items and recommend the most similar items to users based on their previous usage information.

Content-based and collaborative filtering recommendation were the state of the art more than 10 years ago, and there are now many different models and algorithms that improve the prediction performance. For example, when we do not have user-to-item rating information in advance, we can use the so-called implicit matrix factorization and replace the user-to-item ratings with preference and confidence measures, such as how many times the users click the corresponding items, to perform collaborative filtering. Furthermore, we can combine content-based and collaborative filtering methods and use content as "side information" to improve the prediction performance. This hybrid approach is usually implemented with a "Learning to Rank" algorithm.
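The preference and confidence measures mentioned above can be written explicitly. The following is a sketch in the formulation of Hu et al. (listed in the references), where $r_{ui}$ is the usage count (for example, the number of clicks) and $\alpha$ and $\lambda$ are tuning constants. The symbols $p_{ui}$, $c_{ui}$, $\mathbf{x}_u$ and $\mathbf{y}_i$ are notation assumed here for illustration:

$$
p_{ui} = \begin{cases} 1, & r_{ui} > 0 \\ 0, & r_{ui} = 0 \end{cases}
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
$$

$$
\min_{\mathbf{x},\,\mathbf{y}} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - \mathbf{x}_u^\top \mathbf{y}_i\bigr)^2 + \lambda \Bigl(\sum_u \lVert \mathbf{x}_u \rVert_2^2 + \sum_i \lVert \mathbf{y}_i \rVert_2^2\Bigr)
$$

Unlike the explicit case, the sum runs over all user-item pairs, and the confidence $c_{ui}$ decides how much each pair counts.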

In this project, I focus on the collaborative filtering approach. First, I will discuss using item (user) similarity to predict user-to-item ratings without regression and to make recommendations based on item similarity. Then, I will discuss how to use regression to learn the latent features and make recommendations simultaneously. After that, we will see how to use deep learning in a recommendation system.

### Item Similarity

For a collaborative filtering based recommendation system, the first building block is the rating matrix, in which each row represents a user and each column represents a movie, so each entry is the rating that a user gives to a particular movie. We build our rating matrix as follows:

https://gist.github.com/Wann-Jiun/d4d6c03d0df4928b8b2320023465dca2

where "ratings.csv" contains user id, movie id, rating, and time information, and "link.csv" contains movie id, IMDB id, and TMDB id. We combine these two tables since the IMDB id information is required for each movie to get the movie poster from The Movie Database website using its API. We examine the sparsity of our rating matrix as follows:

https://gist.github.com/Wann-Jiun/7b99d36a2ebaabad3df6da506a095a92

The rating matrix is sparse, with only 1.40% of its entries non-zero. Now, let's split the rating matrix into two smaller matrices for training and testing. We remove 10 ratings from the rating matrix and place them in the test set.

https://gist.github.com/Wann-Jiun/d91f7ccbd20659e9725052a9ac5aed10
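The three steps above (building the rating matrix, checking its sparsity, and holding out a test set) can be sketched as follows. This is a minimal illustration, not the code in the gists: the column names `userId`, `movieId`, and `rating`, the helper names, and the choice to hold out ratings per user are all assumptions, and the toy data stands in for "ratings.csv".

```python
import numpy as np
import pandas as pd

def build_rating_matrix(ratings):
    """Return a dense (n_users x n_movies) matrix; 0 means 'not rated'."""
    # Map raw ids to consecutive row/column indices (assumed column names).
    user_index = {u: k for k, u in enumerate(sorted(ratings["userId"].unique()))}
    movie_index = {m: k for k, m in enumerate(sorted(ratings["movieId"].unique()))}
    matrix = np.zeros((len(user_index), len(movie_index)))
    for row in ratings.itertuples(index=False):
        matrix[user_index[row.userId], movie_index[row.movieId]] = row.rating
    return matrix

def sparsity_percent(matrix):
    """Percentage of entries that hold a rating (non-zero)."""
    return 100.0 * np.count_nonzero(matrix) / matrix.size

def train_test_split(matrix, n_holdout=10, seed=0):
    """Move n_holdout ratings per user into a test matrix (assumed per-user split)."""
    rng = np.random.default_rng(seed)
    train = matrix.copy()
    test = np.zeros_like(matrix)
    for u in range(matrix.shape[0]):
        rated = np.flatnonzero(matrix[u])
        if len(rated) <= n_holdout:
            continue  # keep users with too few ratings entirely in training
        held = rng.choice(rated, size=n_holdout, replace=False)
        train[u, held] = 0.0
        test[u, held] = matrix[u, held]
    return train, test

# Toy stand-in for ratings.csv: 3 users, 3 movies, 6 ratings.
toy = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 30, 20],
    "rating":  [4.0, 3.0, 5.0, 2.0, 4.5, 1.0],
})
R = build_rating_matrix(toy)
density = sparsity_percent(R)
train, test = train_test_split(R, n_holdout=1)
```

Training and test matrices never share a non-zero entry, so the test ratings are truly unseen when the similarity is computed from `train`.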

The (cosine) similarity among users/movies is calculated based on the following formula.
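Written out in the standard form, with $\mathbf{r}_u$ denoting the row of the rating matrix for user $u$ and $r_{ui}$ the rating that user $u$ gives to movie $i$ (this notation is assumed here), the cosine similarity is:

$$
s(u,v) = \frac{\mathbf{r}_u \cdot \mathbf{r}_v}{\lVert \mathbf{r}_u \rVert \, \lVert \mathbf{r}_v \rVert}
= \frac{\sum_i r_{ui}\, r_{vi}}{\sqrt{\sum_i r_{ui}^2}\;\sqrt{\sum_i r_{vi}^2}}
$$

The same expression applied to the columns of the rating matrix gives the similarity between two movies.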

where *s(u,v)* is just the cosine similarity measure between user *u* and user *v*.

https://gist.github.com/Wann-Jiun/8e0c7169ba60be99bb7c1019bb78b8e2
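A minimal NumPy sketch of this similarity computation is given below. The function name, the `kind` switch, and the `eps` guard against all-zero rows are assumptions made for illustration and may differ from the gist.

```python
import numpy as np

def cosine_similarity(ratings, kind="user", eps=1e-9):
    """Pairwise cosine similarity between rows (users) or columns (items)."""
    if kind == "user":
        dot = ratings @ ratings.T      # (n_users x n_users) dot products
    elif kind == "item":
        dot = ratings.T @ ratings      # (n_items x n_items) dot products
    else:
        raise ValueError("kind must be 'user' or 'item'")
    # The diagonal holds squared norms; eps avoids division by zero.
    norms = np.maximum(np.sqrt(np.diag(dot)), eps)
    return dot / np.outer(norms, norms)

# Toy rating matrix: users 0 and 1 rate identically, user 2 shares nothing.
toy_ratings = np.array([[5.0, 0.0, 3.0],
                        [5.0, 0.0, 3.0],
                        [0.0, 4.0, 0.0]])
user_sim = cosine_similarity(toy_ratings, kind="user")
item_sim = cosine_similarity(toy_ratings, kind="item")
```

In the toy matrix, the first two users have similarity 1 and neither has any overlap with the third, which is the behaviour the formula describes.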

Using the similarity among the users, we are able to make a prediction for each user-to-movie rating and also calculate the corresponding MSE of our user-to-movie rating prediction. The prediction is made by considering the ratings that similar users give. In particular, we can make a user-to-movie rating prediction based on the following formula.
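In the usual similarity-weighted form, with $\hat{r}_{ui}$ denoting the predicted rating of user $u$ for movie $i$ (notation assumed here), the prediction is:

$$
\hat{r}_{ui} = \frac{\sum_v s(u,v)\, r_{vi}}{\sum_v \lvert s(u,v) \rvert}
$$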

where the prediction for user *u* to movie *i* is a weighted sum (normalized) of ratings that user *v* gives to movie *i* with the similarity between user *u* and *v* as the weight.

https://gist.github.com/Wann-Jiun/b8a9ae273557b8b0a6434704e3b9223e
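The weighted-sum prediction and its MSE can be sketched as below. The function names are hypothetical, and the sketch assumes that the MSE is computed only over the entries that are actually rated in the test matrix.

```python
import numpy as np

def predict_user_based(train, similarity, eps=1e-9):
    """Similarity-weighted average of other users' ratings for every movie."""
    # Normalize each user's weights by the sum of absolute similarities.
    weights = np.abs(similarity).sum(axis=1, keepdims=True)
    return similarity @ train / np.maximum(weights, eps)

def mse_on_observed(pred, actual):
    """Mean squared error over the non-zero (rated) entries of `actual`."""
    mask = actual != 0
    return float(np.mean((pred[mask] - actual[mask]) ** 2))

# Toy example: two users, two movies, one held-out rating.
toy_train = np.array([[4.0, 0.0],
                      [2.0, 2.0]])
toy_sim = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
toy_test = np.array([[0.0, 3.0],
                     [0.0, 0.0]])
toy_pred = predict_user_based(toy_train, toy_sim)
toy_mse = mse_on_observed(toy_pred, toy_test)
```

Note that missing ratings enter the weighted sum as zeros, which pulls predictions toward zero; this is one plausible reason for a large MSE with this simple scheme.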

The MSE we obtained is 9.8252 for our prediction. What does this number mean? Is it a good or bad recommendation? It is not very intuitive to evaluate our prediction performance by just looking at the MSE score. Therefore, let's evaluate the performance by checking the movie recommendation directly. We will query a movie of interest and ask our computer agent to recommend a few movies to us. The first thing to do is to get the corresponding movie posters so that we can see what the recommended movies are. We use the IMDB id numbers to get the movie posters from The Movie Database website using its API.

https://gist.github.com/Wann-Jiun/2ec32c544e68813dad0cc54b8f8a856b
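One possible way to go from an IMDB id to a poster URL is sketched below. It assumes TMDB's "find by external id" endpoint and image host, which may not be the endpoint used in the gist; the helper names and the `w185` poster size are illustrative choices. The functions only build URLs and read an already-parsed JSON response, so no network call or API key is needed to try them.

```python
from urllib.parse import urlencode

TMDB_API_ROOT = "https://api.themoviedb.org/3"    # assumed API root
TMDB_IMAGE_ROOT = "https://image.tmdb.org/t/p"    # assumed image host

def imdb_tt_id(imdb_id):
    """MovieLens stores IMDB ids as integers; IMDB uses 'tt' plus 7 digits."""
    return "tt{:07d}".format(int(imdb_id))

def find_url(imdb_id, api_key):
    """URL of the TMDB lookup for a movie identified by its IMDB id."""
    query = urlencode({"api_key": api_key, "external_source": "imdb_id"})
    return "{}/find/{}?{}".format(TMDB_API_ROOT, imdb_tt_id(imdb_id), query)

def poster_url(find_response, size="w185"):
    """Extract the poster URL from a parsed response, or None if missing."""
    results = find_response.get("movie_results", [])
    if not results or not results[0].get("poster_path"):
        return None
    return "{}/{}{}".format(TMDB_IMAGE_ROOT, size, results[0]["poster_path"])

# Example with a hand-written response instead of a real API call.
heat_lookup = find_url(113277, api_key="YOUR_KEY")
example_poster = poster_url({"movie_results": [{"poster_path": "/abc.jpg"}]})
```

In practice the URL returned by `find_url` would be fetched with an HTTP client and the JSON body passed to `poster_url`.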

Now, it's fun time! Let's see what our recommendation is. We will show the four most similar movies along with the movie we query. The movie we query is placed on the left-hand side, followed by the four recommended movies. Let's try querying "Heat".

Heat is a 1995 American crime film starring Robert De Niro and Al Pacino. The results look fine. Leaving Las Vegas might not be a good recommendation, though. My guess is that it shows up because Nicolas Cage also stars in The Rock, which is a good recommendation for an audience that loves Heat. This may be one of the disadvantages of using a similarity matrix with collaborative filtering. Let's try more examples.

It looks OK. Toy Story 2 definitely should be recommended to an audience that loves Toy Story, but Forrest Gump does not make much sense to me. Apparently, Forrest Gump was recommended because Tom Hanks voices a character in the Toy Story movies. Note that just by looking at the posters, one can tell the differences in movie type, mood, and so on between Toy Story and Forrest Gump, right? Assuming every child likes Toy Story, a child may still ignore Forrest Gump after seeing its poster.
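The lookup behind the queries above is simple once an item-to-item similarity matrix exists. A minimal sketch follows; the function name and the toy matrix are hypothetical.

```python
import numpy as np

def top_k_similar(item_similarity, query_index, k=4):
    """Indices of the k items most similar to the query, best first."""
    scores = item_similarity[query_index].astype(float).copy()
    scores[query_index] = -np.inf          # never recommend the query itself
    return np.argsort(scores)[::-1][:k].tolist()

# Toy item-to-item similarity matrix for four movies.
toy_item_sim = np.array([[1.0, 0.2, 0.9, 0.5],
                         [0.2, 1.0, 0.1, 0.3],
                         [0.9, 0.1, 1.0, 0.4],
                         [0.5, 0.3, 0.4, 1.0]])
recommended = top_k_similar(toy_item_sim, query_index=0, k=2)
```

For the toy matrix, querying movie 0 returns movies 2 and 3, in that order. The returned indices would then be mapped back to movie ids and posters.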

### Alternating and Stochastic Gradient Descent

In the previous discussion, we simply calculated the cosine similarity of users and items and used this similarity measure to predict user-to-item ratings and to make item-to-item recommendations. We now formulate our problem as a regression problem. We introduce latent features **y** for all movies and weight vectors **x** for all users. The objective is simply to minimize the MSE (with 2-norm regularization terms) of the rating prediction.
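One common way to write this objective is shown below. It is a sketch under assumptions: $\mathcal{K}$ denotes the set of observed (user, movie) pairs, $r_{ui}$ the observed rating, and a single regularization weight $\lambda$ is used for both groups of variables. The actual model may use separate weights or additional bias terms.

$$
\min_{\mathbf{x},\,\mathbf{y}} \; \sum_{(u,i) \in \mathcal{K}} \bigl(r_{ui} - \mathbf{x}_u^\top \mathbf{y}_i\bigr)^2 + \lambda \Bigl(\sum_u \lVert \mathbf{x}_u \rVert_2^2 + \sum_i \lVert \mathbf{y}_i \rVert_2^2\Bigr)
$$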

Note that both the weight vectors and the feature vectors are now decision variables, so this is not a convex problem. For now, don't worry too much about the convergence properties of this non-convex problem. There are many ways to solve it. One approach is to solve for the weight vectors (for users) and the feature vectors (for movies) in an alternating way. When we solve for the weight vectors, we treat the feature vectors as constants. When we solve for the feature vectors, we treat the weight vectors as constants. Another way to solve this regression problem is to combine the updates of the weight vectors and feature vectors, updating them within the same iteration. Also, one can implement stochastic gradient descent to speed up the computation. Here, I use the stochastic gradient descent approach to solve this regression problem. The MSE of my prediction is shown below.

The MSE is much smaller than the one obtained by using the similarity matrix. Of course, we can also use grid search and cross-validation to tune the parameters of our model and algorithm.
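The stochastic gradient descent approach described above can be sketched in a few lines of NumPy. This is a simplified illustration without bias terms; the function names, the hyperparameter values, and the toy matrix are assumptions and are not taken from the project code.

```python
import numpy as np

def sgd_factorize(ratings, n_factors=2, lr=0.02, reg=0.02, n_epochs=300, seed=0):
    """Learn user weights x and movie features y from the observed ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    x = rng.normal(scale=0.1, size=(n_users, n_factors))   # user weight vectors
    y = rng.normal(scale=0.1, size=(n_items, n_factors))   # movie feature vectors
    users, items = np.nonzero(ratings)                     # observed entries only
    for _ in range(n_epochs):
        for idx in rng.permutation(len(users)):            # one rating at a time
            u, i = users[idx], items[idx]
            err = ratings[u, i] - x[u] @ y[i]
            x_u = x[u].copy()                              # keep old value for y update
            # Gradient step (the factor 2 is absorbed into the learning rate).
            x[u] += lr * (err * y[i] - reg * x[u])
            y[i] += lr * (err * x_u - reg * y[i])
    return x, y

def observed_mse(ratings, x, y):
    """MSE of the reconstruction x y^T over the rated entries."""
    mask = ratings != 0
    return float(np.mean(((x @ y.T)[mask] - ratings[mask]) ** 2))

toy_R = np.array([[5.0, 3.0, 0.0, 1.0],
                  [4.0, 0.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0, 5.0],
                  [0.0, 1.0, 5.0, 4.0]])
x, y = sgd_factorize(toy_R)
baseline_mse = observed_mse(toy_R, np.zeros((4, 2)), np.zeros((4, 2)))
fitted_mse = observed_mse(toy_R, x, y)
```

Both `x[u]` and `y[i]` are updated within the same iteration, which corresponds to the combined update mentioned above rather than the alternating scheme. The rows of `y` are the learned movie features, so item similarity can be computed from them in the same way as before.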

So again, let's see our recommendation by querying movies of interest.

It doesn't look good. I don't know any of the four movies that were recommended to me when I queried Heat. They look totally irrelevant to Heat. They look like romantic/drama movies. Why on earth would I want to watch a drama if I am looking for a movie similar to an American crime film with big movie stars? I find it very intriguing that a good MSE result may give us a very bad recommendation.

So let's discuss the weaknesses of collaborative filtering based recommendation systems.

-- Collaborative filtering finds similar users and movies from usage data, so popular items are recommended more easily than unpopular ones.

-- It is difficult for collaborative filtering to recommend new movies to users, since there is not much usage data associated with these movies.

In the next discussion, we will consider a different approach to address the issues of collaborative filtering. We use deep learning to recommend movies to users.

### Deep Learning

We will use VGG16 in Keras as our neural network. There is no target in our data set, so we only take the fourth-to-last layer as a feature vector and use it to characterize each movie in our data set. Some preprocessing steps are needed before the posters are fed to the network. The process is summarized below.

https://gist.github.com/Wann-Jiun/475894d3ff5ca89a6fd61d96a885d180

In the code, we first get the movie posters from the TMDB website using its API with the IMDB id. We then feed the posters to VGG16, and finally we calculate the cosine similarity using the features learned by VGG16. Once we have the movie similarity, we can recommend to users the movies with the highest similarity. Note that there are 25088 features in total from VGG16, and we use these features to characterize each movie in our data set.
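The pipeline above can be sketched as follows. This is an illustration under assumptions, not the gist itself: it assumes ImageNet-pretrained weights, `include_top=False`, and 224x224 inputs, for which the last pooling layer has shape 7 x 7 x 512 = 25088, matching the feature count mentioned above. All helper names are hypothetical. TensorFlow is imported lazily so the NumPy helpers can be tried without it, and random arrays stand in for real VGG16 outputs in the demo.

```python
import numpy as np

def build_vgg16_extractor():
    """VGG16 without its dense layers (assumed setup; downloads weights)."""
    from tensorflow.keras.applications.vgg16 import VGG16
    return VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

def load_poster_batch(paths):
    """Load poster files and apply the VGG16 preprocessing."""
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.preprocessing import image
    arrays = [image.img_to_array(image.load_img(p, target_size=(224, 224)))
              for p in paths]
    return preprocess_input(np.stack(arrays))

def flatten_features(feature_maps):
    """Turn (n, 7, 7, 512) feature maps into (n, 25088) feature vectors."""
    return feature_maps.reshape(feature_maps.shape[0], -1)

def cosine_similarity_rows(features, eps=1e-9):
    """Cosine similarity between every pair of feature vectors."""
    norms = np.maximum(np.linalg.norm(features, axis=1, keepdims=True), eps)
    unit = features / norms
    return unit @ unit.T

# With TensorFlow and poster files available, the steps would be:
#   model = build_vgg16_extractor()
#   maps = model.predict(load_poster_batch(["heat.jpg", "toy_story.jpg"]))
# Here random arrays of the same shape stand in for the VGG16 output.
fake_maps = np.random.default_rng(0).random((3, 7, 7, 512))
poster_features = flatten_features(fake_maps)
poster_sim = cosine_similarity_rows(poster_features)
```

The resulting `poster_sim` matrix plays the same role as the item similarity matrix in the collaborative filtering sections, so the same top-k lookup can be reused for recommendation.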

Let's see the recommendation using deep learning.

There is no love drama along with Heat! These posters definitely share some common characteristics. They are dark blue, have people in them, and so on. Again, let's try Toy Story.

Forrest Gump was not recommended! The results look fine! I am really enjoying this, so let's try a few more examples.

Note that these posters have one to two people in them and have a very cold theme or style.

These posters want to let the audience know that the corresponding movies are fun, loud, and intense, with a lot of action, so the colors and images of the posters are very strong.

On the other hand, these posters want to show the audience that the corresponding movies are all about a single man.

We found some things that are similar to Kung Fu Panda.

This is a very interesting one. We indeed found similar monsters and also found Tom Cruise!

All these posters have a woman in a similar pose. Wait! Is that Shaq!?

We successfully found Spider-Man!

This one found the posters with similar typography.

### Conclusions

There are several ways to use deep learning in recommendation systems:

-- **Unsupervised learning approach.**

-- **Predict the latent features derived from collaborative filtering.**

-- **Use the features generated from deep learning as side information.**

Movie posters have elements that create hype and interest among viewers. In this project, we use deep learning as an unsupervised learning approach and learn the similarity of movies by processing movie posters. This is just the first step in using deep learning in recommendation systems, and there are many more things we can try. For example, we can use deep learning to predict the latent features derived from collaborative filtering. A similar approach has been studied by Spotify for music recommendation. Instead of processing images, they use deep learning to predict the latent features derived from collaborative filtering by processing the audio of a song. Another possible direction is to use the features learned by deep learning as side information to improve prediction accuracy.

References:

-- http://blog.ethanrosenthal.com/2015/11/02/intro-to-collaborative-filtering/

-- http://blog.ethanrosenthal.com/2016/01/09/explicit-matrix-factorization-sgd-als/

-- http://blog.ethanrosenthal.com/2016/10/19/implicit-mf-part-1/

-- http://blog.ethanrosenthal.com/2016/11/07/implicit-mf-part-2/

-- http://blog.ethanrosenthal.com/2016/12/05/recasketch-keras/

-- https://www.designmantic.com/blog/2016-movie-poster-design-trends/

-- https://www.designmantic.com/blog/movie-moods-in-typography/

-- http://www.datasciencecentral.com/profiles/blogs/understanding-and-selecting-recommenders-1

-- http://www.datasciencecentral.com/profiles/blogs/5-types-of-recommenders

-- http://benanne.github.io/2014/08/05/spotify-cnns.html

-- Andrew Ng, "Machine Learning," Recommender Systems, 2016

-- Aaron van den Oord, et al., "Deep content-based music recommendation," NIPS, 2013

-- Yifan Hu, et al., "Collaborative Filtering for Implicit Feedback Datasets,"

-- Steffen Rendle, "BPR: Bayesian Personalized Ranking from Implicit Feedback,"