Analyzing Critic and User Movie Reviews

Posted on May 22, 2018


Metacritic stands as one of the most popular review aggregators for all types of media. What that means is that critic reviews from The New York Times, Washington Post, and others are read by staffers at Metacritic, and assigned a numerical grade based on their review reading experience. However, that seems pretty subjective and the main motivation for me personally was to try to reverse-engineer a review scoring system using some recently-learned natural language processing (NLP) techniques. Another motivation of mine was to try to analyze the differences between how critics and users rate movies. All the tools used for the scraping and analysis below, can be found at my GitHub repo.

Data Scraping

I used scrapy to gather review data on all of the most discussed movies from the years 2010 - 2018. An example of the elements scraped from each movie page can be found below:

With 100 movies per each year, making 800 in total, I grabbed a total of 133,180 reviews separated between critics and users.


I started off my analysis at looking at the divide between critics and users. The easiest way to visualize this was quantitatively. In order to better compare the two, I converted the user scores from a 0 - 10 scale to a 0 - 100 scale. After looking at the breakdown of genres amongst the most discussed movies, it became very clear that users are generally more forgiving than critics across the entire spectrum. It came to no surprise that the more than 50% of the most discussed movies were action movies, thanks to the meteoric rise of Marvel superhero movies.

More curious about the biggest divide between both groups, I looked at 20 movies with the biggest difference from both metascore to user score.

Reasons for the divide between critics and users can be inferred from external reasons. For instance, critics gave the 2016 Ghostbusters film a moderate score of 60. The film garnered lots of controversy prior to release from die-hard fans of the original. Bright on Netflix was a similar case, but in the opposite direction: universally panned by critics whereas users gave it a second chance.

After looking at the reviews in a quantitative way, I decided to analyze word choice between the two groups. After filtering and lemmatization of all the reviews, I created a wordcloud with word sizing based on relative frequency.

Critic Reviews                                                                             User Reviews

From the word cloud, we can see some interesting choices that I have highlighted. It seems like critics are more focused on assessing the technical side of a movie while reviewing, with words like "actor", "performance", "director" showing up more often. It seems like aspects of the story and the world of each movie shows up more within the user reviews, with words like "plot", "scene", and "character" having higher emphasis.

Another NLP technique I have employed is sentiment analysis (SA) via the TextBlob python library. With this tool, it is possible to estimate the mood of a passage of text on a scale of -1 to 1 (-1 being a very negative message and 1 being most positive). My thought was that I could measure the mood of each review and rescale it from 0 to 100, being that a more positive review would correlate to a more positive message that SA would read.

As you can see from the graph above, it seems like SA read most reviews as a more neutral score, with both critic and user SA score distributions looking similar. The Pearson correlation coefficient between the user score and the SA of their reviews were 0.5091, and for the critics reviews, the correlation coefficient was 0.2443. The large discrepancy in correlation coefficients make sense, when you consider that reviewing movies is a critic's profession, so having a neutral message while conveying a review is important for that professional environment.


The results show that SA alone is not an accurate tool to predict review scores from a given review. Future work would include implementing a TF*IDF analysis, which measures term frequency multiplied by an inverse document frequency factor per review grade scale (overall positive, mixed score, and negative scores), to look at the important words used for reviews for a given grade range. This would allow me to have a much better fitting predictive score model.

About Author


Josh Vichare

BS in Materials Science & Engineering with a concentration on the study of Nanomaterials at Rutgers University. Josh has worked in the biomedical engineering field for close to 4 years in research and development, analyzing various performance metrics...
View all posts by Josh Vichare >

Related Articles

Leave a Comment

Amir June 11, 2019
Can i know how did you scraped out the user review eg:0-3 negative, 4-6 mixed, and 7-10 positive? Ive been trying to scraped using scrapy but i only could managed to scraped the negative reviews. I dont know how to scrape all the reviews including mixed and positive reviews altogether. Please help me asap. Thank you

View Posts by Categories

Our Recent Popular Posts

View Posts by Tags

2019 airbnb Alex Baransky alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus API Application artist aws beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep Bundles California Cancer Research capstone Career Career Day citibike clustering Coding Course Demo Course Report D3.js data Data Analyst data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization Deep Learning Demo Day Discount dplyr employer networking feature engineering Finance Financial Data Science Flask gbm Get Hired ggplot2 googleVis Hadoop higgs boson Hiring hiring partner events Hiring Partners Industry Experts Instructor Blog Instructor Interview Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter lasso regression Lead Data Scienctist Lead Data Scientist leaflet linear regression Logistic Regression machine learning Maps matplotlib Medical Research Meet the team meetup Networking neural network Neural networks New Courses nlp NYC NYC Data Science nyc data science academy NYC Open Data NYCDSA NYCDSA Alumni Online Online Bootcamp Open Data painter pandas Part-time Portfolio Development prediction Prework Programming PwC python python machine learning python scrapy python web scraping python webscraping Python Workshop R R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn Selenium sentiment analysis Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau team TensorFlow Testimonial tf-idf Top Data Science Bootcamp twitter visualization web scraping Weekend Course What to expect word cloud word2vec XGBoost yelp