Can The New York Times Improve Their Recipes?
The New York Times Cooking section is a favorite among many who read the online publication. Many great authors and cooks contribute to the postings although some recipes fare much better than others. This is exhibited both by the overall rating of the recipe and the number of people that they advertise have viewed the recipe. In this project, I set out to explore trends in recipe popularity and quantify those differences that I was observing. The goal was to produce recommendations for future recipe postings to boost customer interaction with the website.
About the Data
I chose 5 variables to analyse within each recipe page:
- Rating/Number of ratings
- Cooking Category
- Length of recipe
- Recipe Description
- Tag Words
Many recipes included all, some or few of these variables. I chose these five categories because they are important visual aspects of the recipes and have the possibility to significantly increase consumer traffic.
About the Scraping
To scrape the data from the Times website, I used Scrapy and built a data collection spider to acquire info from around 1,800 recipes. The scrape required the spider to travel down 3 levels of hyperlinks to arrive at each individual recipe page. There were a total of 7 pages with each collection of 'Editors' Collections' which the spider navigated to collect each recipe.
I transformed the data from the html format into a workable dataframe using both R and Python. I was able to parse the timestamps that were provided by the NYT and use them for time analysis after converting them to a common measurement.
My analysis of the data consisted of 5 main questions
- Is the overall number of ratings correlated with the star rating?
- Does the inclusion of a picture affect popularity?
- Does the inclusion of a lengthy description affect number of reviews?
- Does the inclusion of a "NYT cooking category" affect popularity?
- What are the most used and most popular recipe tags?
The review (0 - 5 stars) and the number of reviews were positively correlated. Although not directly correlated, you could infer that as a recipes number of reviews increased, so did their star rating. I used number of ratings as a variable to assess popularity rather than review because it was a much more granular statistic and the review was more like a categorical variable, having only 3 tiers.
Picture or No Picture
I assumed the inclusion of a picture on a recipe would have a large impact on the popularity i.e. having a picture would result in a more popular post. This did not turn out to be the case. Although the spread of data in the category "Picture" was centered around a higher average number of reviews, this was not a significant difference. Overall, having a picture may indicate a small number of change in reviews although I could not conclude empirically that this was true.
There were 5 different 'recipe categories' that a recipe could have apart from not being given a category:
- Editor's Pick
- Cookbook Exclusive
- Times Classic
These five categories are correlated with very different popularity results. The majority of "Healthy" recipes seem to be much more unpopular than than even the recipes with no category at all. Although the only category that has a significantly different average number of reviews than the 'no category" category, is "Times Classic". This category performs much better than if you are to write a similar recipe without the category tag.
After converting the recipe times to a standard measurement in minutes, you can deduce from the graphs that there is not a real difference in the popularity of recipes depending on the time it takes to cook them. Longer recipes have a slightly lower median number of reviews, though this did not prove to be a significant difference.
The inclusion of a description blurb at the beginning of a recipe proved to make the most significant difference in overall popularity out of all of the variables. This makes sense as the description is a chance for the author to 'sell' the recipe to the reader before they take the plunge to cook it.
The two charts shown here are the most popular tags sorted by average number of ratings of the recipes they are used with, and total number of recipes they are used with, respectively. The two charts provide very different tags and give insight into what categories of recipes people are gravitated towards. Because the charts show different tag words, this proves very useful for designing future recipes. Because a tag word is used in many recipes does not necessarily mean that it is providing value.
Based on the data collected and analysis, I can conclude the following from the findings:
- Popularity of a recipe, measured by the total number of ratings is positively correlated with the 5 star review of the recipe.
- Having a picture does not necessarily negatively impact the recipe as a whole. This is important when developing recipes when thinking whether to pay for a professional food photographer etc. It could be overall beneficial to save money on the back end because it does not significantly negatively affect the front end.
- Having a Recipe Category can both positively and negatively affect the popularity of the recipe. The "Times Classic" Recipe category proves to be correlated with a significantly higher popularity. Although this might be a reason to add the category to more highly rated recipes, this would have to be analysed further. The inclusion of the tag in more recipes may dilute the affect.
- Including a recipe description significantly affects the popularity of the recipe. This is an easy change to make. Add a description and generally you will see your recipe become more popular.
- The inclusion of specific tag words may significantly increase the popularity of the recipe. Alternatively, looking at the most tag words with the highest average number of ratings may be a good "map" to steer the direction of new recipes.