Spicy is good: exploring Mexican restaurants
The Yelp Open Dataset is a collection of data from Yelp (the famous online review platform) that provides data scientists with information regarding businesses, reviews, user information, etc. I intended to draw on this data that spans years to find the answers to two particular questions that piqued my interest.
- Is there a relationship between the star rating and the restaurant's relative priciness based on its location?
- How are Mexican restaurants reviewed by users?
Are food establishments in more expensive cities better rated?
To tackle this question I sourced information from Zillow.com, specifically the Median List Price for All Homes that they publish. I used the figures from February 2021, which is the cutoff date for the Yelp Dataset. After I cleaned the data and merged the datasets, I got the following results:
Dividing the cities into 5 categories based on Zillow's information shows that there are many more restaurants reviewed in cities with high and mid-high home list prices. Both the mean and standard deviation of the star ratings are pretty much the same for the mid-low to high list price cities (with a slight bump in the mid-price cities). It is only in the cities with low home list prices that the mean review drops to 3.3. However, it is important to point out that the number of restaurants reviewed in such cities is much lower than in other locations.
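For anyone who wants to try a similar split, here is a minimal sketch of the binning and summary step. It assumes the five categories are quintiles of the city-level list prices, which may not be exactly how I cut them, and the cities, prices, and ratings below are made-up placeholders rather than real Zillow or Yelp values.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, quantiles, stdev

# Hypothetical city -> median home list price (USD), standing in for Zillow data.
city_price = {"A": 150_000, "B": 220_000, "C": 310_000, "D": 450_000, "E": 900_000}

# Hypothetical (city, stars) rows, standing in for the Yelp business data.
restaurants = [
    ("A", 3.0), ("A", 3.5), ("B", 4.0), ("B", 3.5), ("C", 4.0),
    ("C", 3.5), ("D", 4.0), ("D", 3.0), ("E", 4.5), ("E", 3.5),
]

LABELS = ["low", "mid-low", "mid", "mid-high", "high"]


def price_category(price, cut_points):
    """Map a price to one of five labels using four ascending cut points."""
    return LABELS[bisect_right(cut_points, price)]


def summarize(rows, prices, cut_points):
    """Return {category: (count, mean stars, std of stars)}."""
    by_category = defaultdict(list)
    for city, stars in rows:
        by_category[price_category(prices[city], cut_points)].append(stars)
    return {
        cat: (len(vals), mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for cat, vals in by_category.items()
    }


# Assumption: the categories are quintiles of the city-level prices.
cut_points = quantiles(list(city_price.values()), n=5)
summary = summarize(restaurants, city_price, cut_points)
for label in LABELS:
    print(label, summary.get(label))
```

Keeping the count next to the mean makes it easy to spot categories, like the low-price one here, where a small sample could be driving the number.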
I was surprised by the uniformity of the data. While I didn't believe there'd be much difference in the review ratings for restaurants based on where they are located, I did believe there was going to be a wider gap. Good and bad food can be found in all locations.
What about Mexican restaurants?
I ran the same analysis for Mexican restaurants. The results were pretty much the same, though the dip in the low list price cities was larger. It's possible that those areas have fewer Mexican establishments than other areas, which could skew the results a bit.
That got me to the next question: Are Mexican restaurants that are located near the border better reviewed?
One would think that Mexican restaurants that are located in border states (California, Arizona, New Mexico and Texas) might be more authentic and better, right? What does the data reveal?
It appears that there is a slight lift in the reviews of Mexican restaurants in border states. The difference in means between the two groups is only 0.06, and the median and standard deviation are the same.
This suggests that the rating of a Mexican restaurant is largely independent of its location. It doesn't necessarily correlate with being in a higher-priced city or a border state. The question then is: what does it correlate with? We move on to the hot issue at hand.
Is being reviewed as a spicy restaurant good for its rating?
We all know that Mexican food is famous for being spicy, but is that good for its ratings or not? To answer this question I used the review dataset of the Yelp Open Dataset. In addition to each user's star rating, it includes the text of their review. With that information I set out to answer two questions:
1. Does including a word like spicy (or its synonyms) imply a restaurant is better rated?
2. Is there an ideal number of times to reference such words, given that a suggestion of too spicy might turn off some people?
The reviews analyzed were of this sort:
'Chips and salsa were good but you had to pay for them including salsa picante which was perfectly spicy. Food was tasty and margaritas were refreshing. I had the tacos and were pretty good.'
This reviewer included two references to spicy: picante and spicy. So in a new feature called "has_picante" it would be classified as a yes, and in the "picante_counter" it would be a 2.
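The two features described above can be sketched with a regular expression. The word list here is an assumption for illustration (just "spicy" and "picante"); the actual set of synonyms would need to be longer, and the function name is hypothetical.

```python
import re

# Assumption: an illustrative word list, not the full set of synonyms used.
PICANTE_WORDS = ["spicy", "picante"]

# Match whole words only, ignoring case.
PICANTE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in PICANTE_WORDS) + r")\b",
    re.IGNORECASE,
)


def picante_features(text):
    """Return the has_picante flag and picante_counter for one review."""
    count = len(PICANTE_PATTERN.findall(text))
    return {"has_picante": count > 0, "picante_counter": count}


review = (
    "Chips and salsa were good but you had to pay for them including salsa "
    "picante which was perfectly spicy. Food was tasty and margaritas were "
    "refreshing. I had the tacos and were pretty good."
)
features = picante_features(review)
print(features)
```

Matching on word boundaries avoids counting fragments inside longer words, but it also means variants like "spiciness" are missed unless they are added to the list.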
Then, running this information against the reviews we can see the following:
Mexican restaurants that are reviewed with a picante word have a higher mean and a lower standard deviation. But is that statistically significant? Yes, it is. I ran a t-test to compare both means, and that gave me a p-value of less than 0.05.
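As a rough illustration of that comparison, here is a sketch of a two-sample test using only the standard library. It assumes Welch's version of the t-test (unequal variances) and approximates the p-value with a normal distribution, which is only reasonable for large samples like a review dataset; `scipy.stats.ttest_ind(a, b, equal_var=False)` would give the exact t-distribution result. The ratings below are invented.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Return (t statistic, approximate two-sided p-value) for two samples."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / standard_error
    # Normal approximation to the t distribution; fine only for large samples.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical star ratings, not values from the Yelp data.
with_picante = [4, 5, 4, 5, 4, 4, 5, 4] * 50
without_picante = [3, 4, 2, 5, 3, 4, 3, 4] * 50

t_stat, p_value = welch_t_test(with_picante, without_picante)
print(f"t = {t_stat:.2f}, significant at 0.05: {p_value < 0.05}")
```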
Now, what about the optimal number? Using a picante word too many times might suggest that the food was indeed too spicy for most tastes.
Here we can see the number of times a picante word was used against the mean star rating for those restaurants. We see that 5 mentions appears optimal, and that a steep drop-off occurs after 6.
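A curve like this comes from grouping by the counter and averaging the stars in each group. The sketch below assumes each row is a (picante_counter, stars) pair; the rows are made up and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_stars_by_count(rows):
    """Return {picante_counter: (number of rows, mean stars)}, sorted by counter."""
    groups = defaultdict(list)
    for count, stars in rows:
        groups[count].append(stars)
    return {c: (len(s), mean(s)) for c, s in sorted(groups.items())}


# Hypothetical (picante_counter, stars) rows.
rows = [(0, 3), (0, 4), (1, 4), (1, 5), (2, 5)]
by_count = mean_stars_by_count(rows)
for count, (n, avg) in by_count.items():
    print(count, n, avg)
```

The group size is returned alongside the mean because high counts tend to be rare, so the far end of the curve may rest on very few rows.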
So, when you are choosing which Mexican restaurant to go to, don't take into consideration if it is in an expensive city, or even if it is in a border town; you just want to focus on those that are the right level of spicy!