Movie Metacritic - Exploring Critics' Movie Reviews
The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Do you check IMDb and Rotten Tomatoes scores before watching a movie? As a regular moviegoer, I always check critic scores on Metacritic. When I was choosing a topic for this project, two movies on my list caught my eye: Joker and Parasite. Both are crime movies that are highly rated by audiences, and both scored above 90 after taking home top prizes at film festivals. However, Joker's critic score dropped significantly to 59 around its release date, while Parasite's score remained the same.
The difference between these two movies' scores raised two key questions:
- Why and how do movie scores change over time?
- Do critics have movie preferences?
This project answers the questions above by scraping metacritic.com with Scrapy and combining natural language processing (NLP), sentiment analysis, and numerical analysis with data visualization in Pandas. All Python scripts and data can be found in my GitHub repository.
Background: Metacritic and Metascore
Launched in January 2001, Metacritic has evolved over the last decade to distill critics' voices into a single Metascore, a weighted average of the most respected critic reviews online and in print. Metascores range from 0-100; the higher the score, the better the overall reviews. Metascores are highlighted in three colors below: green for favorable reviews, yellow for mixed reviews, and red for unfavorable reviews.
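The three color bands can be expressed as a small helper function. This is a sketch, not Metacritic's code; the function name is mine, and the cutoffs (61+ green, 40-60 yellow, below 40 red) are the bands Metacritic publishes for movies, taken here as an assumption.

```python
def metascore_color(score: int) -> str:
    """Map a 0-100 Metascore to its display color (assumed movie cutoffs)."""
    if not 0 <= score <= 100:
        raise ValueError("Metascore must be between 0 and 100")
    if score >= 61:
        return "green"   # generally favorable reviews
    if score >= 40:
        return "yellow"  # mixed or average reviews
    return "red"         # generally unfavorable reviews
```

Under these cutoffs, Joker's score of 59 lands in the yellow (mixed) band, while Parasite's 90+ stays green.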
Web Scraping
Two separate spiders were implemented to avoid scraping duplicate information for each movie. Spider 1 scraped the first layer, the 'Best Movies of All Time' list, collecting the following features:
- Movie titles
- Movie genres
- Release date
- Metascore and user score
- Number of positive, mixed, and negative reviews
Spider 2 went one level deeper and scraped each movie's individual critic reviews, collecting the following features:
- Critic’s Name
- Media Name
- Critic’s Individual Score
- Review Date
- Review Content
NLP and Sentiment Analysis
The word clouds below are derived from the reviews of good movies (Metascore over 70) and bad movies (Metascore below 30) for easy comparison.
The most frequent words are "character," "story," and "director," for both positive and negative reviews.
Even though reviews of good and bad movies are right- and left-skewed respectively, most critics choose words and express sentiment in a neutral way.
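The counting behind a word cloud like this can be sketched with the standard library alone. The tiny stop-word set here is a placeholder; the real project would use a fuller list (e.g. NLTK's).

```python
import re
from collections import Counter

# Placeholder stop-word list; a real analysis would use a larger one.
STOPWORDS = {"the", "a", "an", "and", "of", "is", "it", "in", "to", "its"}


def top_words(reviews, n=3):
    """Count the most frequent non-stopword tokens across a list of reviews."""
    counts = Counter()
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)
```

Running this separately on the high-Metascore and low-Metascore review sets yields the two frequency tables that feed the word clouds.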
Reviews for good and bad movies show different movie genre keywords as well.
Drama and documentary are frequently mentioned in positive reviews, while comedy, thriller, action, and horror appear more often in negative ones.
User scores on Metacritic are used here to compare critic preferences with those of users. In general, the user score for each movie genre is higher than the Metascore, with 6 genres as exceptions; overall, the average Metascore is 58 against a user average of 67. Metascores also have a higher standard deviation.
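A genre-level comparison like this is a one-liner in Pandas. The tiny DataFrame and its column names ("genre", "metascore", "user_score") are illustrative assumptions, not the project's actual schema; Metacritic user scores are on a 0-10 scale, so they are rescaled to 0-100 first.

```python
import pandas as pd

# Toy data standing in for the scraped movie table (assumed schema).
movies = pd.DataFrame({
    "genre":      ["Drama", "Drama", "Horror", "Horror"],
    "metascore":  [80, 60, 40, 50],
    "user_score": [75, 70, 55, 65],  # 0-10 user scores already rescaled x10
})

# Mean and spread of both scores per genre.
by_genre = movies.groupby("genre")[["metascore", "user_score"]].agg(["mean", "std"])

# Positive gap = users rate the genre higher than critics do.
gap = by_genre[("user_score", "mean")] - by_genre[("metascore", "mean")]
```

Sorting `gap` then shows at a glance which genres critics rate below (or, for the exceptions, above) the audience.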
Review Date Analysis
Randomly sampling movies and plotting their review dates and score distributions shows that reviews are published mainly before and around the release date, with only a few arriving later. For most movies, the pre-release reviews came out on the same dates, so some plots show two or three vertical lines. Most likely there were special screening events before the release date, and the next day reviews from different media outlets came out at the same time.
Score distribution shows that most of the highest scores come before release date, while the reviews tend to be more neutral after the movie is released.
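One way to surface those clusters is to convert each review date into an offset (in days) from the release date, using only the standard library. The function name is mine; the dates below are illustrative, not scraped values.

```python
from datetime import date


def days_from_release(review_dates, release_date):
    """Offset of each review, in days, relative to the release date.
    Tight clusters of negative offsets correspond to festival or
    press-screening days; 0 and positive offsets are release-window reviews."""
    return sorted((d - release_date).days for d in review_dates)
```

Histogramming these offsets across many movies reproduces the pattern above: a spike just before day 0, a bump at day 0, and a thin tail afterward.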
With these observations, let’s go back and see what happened to the reviews for Joker. The same patterns can be found here as well:
Review Dates & Review Score
Tier 1: Venice Film Festival. Reviews came out the day after the festival screening, mostly with scores above 60.
Tier 2: Critic screenings before the movie's release, with mixed reviews.
Tier 3: Around the movie's release, when most reviews are published; feedback is mixed, unlike after the film festival.
Weighted Average Metascore
At the time of this project, the Metascore for Joker was 59, while its equally weighted average was above 59, indicating that the negative reviews carry higher weights in the calculation. The negative reviews came from The New York Times (30), The New Yorker (20 and 30; two reviews were published and collected in the same week), The Wall Street Journal (20), and Time (20).
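The effect of outlet weighting can be illustrated with a small weighted-average sketch. Metacritic does not publish its weights, so the weights and scores below are made up purely to show the mechanism.

```python
def weighted_metascore(scores, weights):
    """Weighted average of critic scores; weights need not sum to 1."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


scores = [80, 70, 30, 20]  # made-up critic scores, not Joker's actual reviews

equal = weighted_metascore(scores, [1, 1, 1, 1])   # 50.0
# Doubling the weight of the two negative outlets pulls the average down.
heavier = weighted_metascore(scores, [1, 1, 2, 2])
```

With the same four reviews, up-weighting the negative outlets drops the score below the equal-weighted 50.0, which is the direction of the gap observed for Joker.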
Please note that scores are assigned by Metacritic at its own discretion. Some conversions are obvious (for example, if a critic uses a 0-10 scale, the grade is simply multiplied by ten). Others are less obvious or do not exist at all.
Conclusion
- Critics favor drama and documentary over comedy, thriller, action, and horror movies.
- Reviews, like news, are time-sensitive. They come out mainly after special events, such as film festivals and critic or private screenings before and around the official release, which causes the Metascore to change over time.
- Scores are higher before release and more mixed around and after it, so the Metascore decreases over time for most movies.
- For the Metascore, which media outlets are included in the calculation pool, and their weights, both matter.
With this information revealed, we can better understand critic review scores and make the most informed decision possible about the movies we really want to see. 😊