Sentiment Analysis Of Yelp User Review Data
Social data provides important, real-time insights into consumer opinion: lifestyle, habits, brands, and preferences. Because these opinions are unsolicited, they offer a genuine view of consumer feelings and should be valued accordingly. Yelp provides restaurant details including name, price, rating, address, and reviews. A restaurant's rating says how good it is, but is the rating alone sufficient to give the correct information? No, because people who really hated a restaurant tend to describe their experience in a review, and the same goes for people who had a good experience. Thus, one would expect sentiment analysis of the reviews to give better insight into the masses’ opinions of restaurants.
Restaurant data was scraped from yelp.com using the Python package BeautifulSoup. The data consists of restaurant name, rating, price, number of reviews, address, and user reviews. I split the web scraping into two tasks. The first scraped the restaurant name, rating, price, number of reviews, and address. The second scraped the restaurant name and user reviews. The two datasets were then merged.
The code used to scrape the first dataset is as follows:
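A minimal sketch of what this first scraping task might look like is given here; it is not the original listing. It parses an inline HTML snippet rather than a live Yelp page, and the tag and class names (`restaurant`, `name`, `rating`, `price`, `review-count`), the sample restaurants, and the function name `parse_restaurants` are all assumptions made for illustration. Yelp's real markup differs and changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup and made-up restaurants, for illustration only.
SAMPLE_HTML = """
<div class="restaurant">
  <a class="name">Example Diner</a>
  <span class="rating">4.0</span>
  <span class="price">$$</span>
  <span class="review-count">1250 reviews</span>
  <address>1 Example St, New York, NY</address>
</div>
<div class="restaurant">
  <a class="name">Sample Bistro</a>
  <span class="rating">4.5</span>
  <span class="price">$$$</span>
  <span class="review-count">310 reviews</span>
  <address>2 Sample Ave, New York, NY</address>
</div>
"""


def parse_restaurants(html):
    """Return one dict per restaurant card found in the page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="restaurant"):
        count_text = card.find("span", class_="review-count").get_text(strip=True)
        rows.append({
            "name": card.find("a", class_="name").get_text(strip=True),
            "rating": float(card.find("span", class_="rating").get_text(strip=True)),
            "price": card.find("span", class_="price").get_text(strip=True),
            # "1250 reviews" -> 1250
            "num_reviews": int(count_text.split()[0]),
            "address": card.find("address").get_text(strip=True),
        })
    return rows


restaurants = parse_restaurants(SAMPLE_HTML)
for row in restaurants:
    print(row)
```

In a real run the HTML would come from an HTTP request for each results page, and the selectors would need to match whatever markup Yelp serves at that time.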
The code used to scrape the second dataset is as follows:
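The second task, together with the merge on restaurant name, could be sketched along the following lines. Again this is an illustration rather than the original code: the markup, the sample reviews, and the helper names `parse_reviews` and `merge_on_name` are hypothetical, and the merge is done with plain dictionaries to keep the example small.

```python
from bs4 import BeautifulSoup

# Hypothetical markup and made-up reviews, for illustration only.
SAMPLE_REVIEW_HTML = """
<div class="restaurant-page">
  <h1 class="name">Example Diner</h1>
  <p class="review">Great burgers and friendly staff.</p>
  <p class="review">Slow service, cold fries.</p>
</div>
"""


def parse_reviews(html):
    """Return the restaurant name and the list of review texts on one page."""
    soup = BeautifulSoup(html, "html.parser")
    name = soup.find("h1", class_="name").get_text(strip=True)
    reviews = [p.get_text(strip=True) for p in soup.find_all("p", class_="review")]
    return {"name": name, "reviews": reviews}


def merge_on_name(details, review_sets):
    """Attach reviews to each details row, joining on the restaurant name."""
    by_name = {r["name"]: r["reviews"] for r in review_sets}
    # Restaurants with no scraped reviews get an empty list.
    return [{**row, "reviews": by_name.get(row["name"], [])} for row in details]


# Rows as the first task might have produced them (made-up values).
details = [
    {"name": "Example Diner", "rating": 4.0},
    {"name": "Sample Bistro", "rating": 4.5},
]
merged = merge_on_name(details, [parse_reviews(SAMPLE_REVIEW_HTML)])
for row in merged:
    print(row["name"], row["rating"], len(row["reviews"]))
```

Joining on the name assumes restaurant names are unique within the scraped area; if they are not, a more stable key such as the restaurant's page URL would be safer.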
Exploratory Data Analysis:
My scraping was restricted to restaurants within a 2-mile radius of Times Square.
The distribution of the ratings is as follows
Most of the restaurants in the first 20 pages of the Yelp data have a rating of 4.
The distribution of the number of reviews is as follows
The distribution shows that most restaurants have between 0 and 1,000 reviews.
Sentiment analysis was performed using the Natural Language Toolkit (NLTK). The specific package used is VADER Sentiment. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains. The code for the sentiment analysis is as follows:
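As a rough illustration of lexicon-based, word-level classification (not the original listing), the sketch below counts positive, negative, and neutral words in a review. To stay self-contained it uses a tiny toy lexicon whose words and scores are made up; they are not VADER's actual values. The function name `count_sentiment_words` is likewise hypothetical.

```python
import re
from collections import Counter

# Toy lexicon standing in for VADER's word -> valence mapping.
# The scores are illustrative assumptions, not VADER's real values.
# With NLTK installed and its "vader_lexicon" resource downloaded, the
# `lexicon` attribute of nltk.sentiment.vader.SentimentIntensityAnalyzer
# should be usable in its place.
TOY_LEXICON = {
    "great": 3.0,
    "delicious": 3.0,
    "friendly": 2.0,
    "slow": -1.5,
    "cold": -1.0,
    "terrible": -2.5,
}


def count_sentiment_words(text, lexicon):
    """Count words with positive, negative, and zero/unknown valence."""
    counts = Counter(positive=0, negative=0, neutral=0)
    for word in re.findall(r"[a-z']+", text.lower()):
        score = lexicon.get(word, 0.0)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


counts = count_sentiment_words(
    "Great food, friendly staff, but terrible parking.", TOY_LEXICON
)
print(counts["positive"], counts["negative"], counts["neutral"])
```

Counting words one at a time ignores VADER's rules for negation, intensifiers, and punctuation, so it is only an approximation of what the full tool computes for a sentence.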
VADER works on the word level, classifying each word as positive, negative, or neutral. I concentrate on the positive and negative words, as neutral words don't add value. The plot of the sentiment analysis is as follows
There are a few interesting observations where reviews and ratings contradict each other. The plots of those observations are as follows
A restaurant with a rating of 4 has an equal mixture of negative and positive words, so it is safe to say that its reviews are mixed.
This restaurant has a rating of 4, but the sentiment analysis says that it has more negative reviews than positive reviews.
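One simple way to surface such cases automatically is to compare each restaurant's rating with its positive and negative word counts. The sketch below is an assumption about how that check could be written, not the method used for the plots; the function name `compare_rating_with_sentiment`, the 4.0 cutoff for a "high" rating, and the sample counts are all hypothetical.

```python
def compare_rating_with_sentiment(rating, positive, negative, high_rating=4.0):
    """Label how word-level sentiment relates to a star rating.

    `high_rating` is an assumed cutoff for what counts as a good rating.
    """
    if positive == negative:
        return "mixed"
    sentiment_is_positive = positive > negative
    rating_is_positive = rating >= high_rating
    if sentiment_is_positive == rating_is_positive:
        return "consistent"
    return "contradicts"


# Made-up (rating, positive words, negative words) triples.
samples = [(4.0, 10, 10), (4.0, 8, 12), (4.5, 20, 5)]
labels = [compare_rating_with_sentiment(r, p, n) for r, p, n in samples]
print(labels)
```

Requiring an exact tie for "mixed" is strict; in practice a tolerance band around equal counts may be more useful.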
The algorithm could be combined with text mining so that dish names mentioned in the reviews are incorporated into the sentiment analysis, indicating whether a particular dish has a positive or negative sentiment, along with an overall score.
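A very rough sketch of that idea, under several assumptions, is shown here: dish names come from a fixed hypothetical list rather than from text mining, sentences are split naively on punctuation, and a made-up toy lexicon stands in for VADER's. Each dish accumulates the valence of the sentences that mention it.

```python
import re
from collections import defaultdict

# Toy lexicon with made-up scores; not VADER's real values.
TOY_LEXICON = {"great": 3.0, "delicious": 3.0, "cold": -1.0, "bland": -1.5}

# Hypothetical dish list; a real system would extract these from the reviews.
DISHES = ["burger", "pad thai"]


def dish_sentiment(reviews, dishes, lexicon):
    """Sum the valence of every sentence that mentions each dish."""
    totals = defaultdict(float)
    for review in reviews:
        for sentence in re.split(r"[.!?]+", review.lower()):
            words = re.findall(r"[a-z']+", sentence)
            score = sum(lexicon.get(word, 0.0) for word in words)
            for dish in dishes:
                if dish in sentence:
                    totals[dish] += score
    return dict(totals)


# Made-up reviews for illustration.
reviews = [
    "The burger was great. The pad thai was cold and bland.",
    "Delicious burger!",
]
scores = dish_sentiment(reviews, DISHES, TOY_LEXICON)
print(scores)
```

A positive total suggests the dish is mentioned favorably and a negative total the opposite; a sentence naming two dishes would credit both with the same score, which is one of the limitations of this simple approach.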