Can The New York Times Improve Their Recipes?

Aydan Ellman
Posted on May 13, 2019


The New York Times Cooking section is a favorite among many who read the online publication. Many great authors and cooks contribute to the postings although some recipes fare much better than others. This is exhibited both by the overall rating of the recipe and the number of people that they advertise have viewed the recipe. In this project, I set out to explore trends in recipe popularity and quantify those differences that I was observing. The goal was to produce recommendations for future recipe postings to boost customer interaction with the website.

My project can be viewed on a R Shiny app here and the code used to scrape and analyse the data can be found on my github here

About the Data

I chose 5 variables to analyze within each recipe page:

  • Rating/Number of ratings
  • Cooking Category
  • Length of recipe
  • Recipe Description
  • Tag Words

Many recipes included all, some or few of these variables. I chose these five categories because they are important visual aspects of the recipes and have the possibility to significantly increase consumer traffic. 

About the Scraping

To scrape the data from the Times website, I used Scrapy and built a data collection spider to acquire info from around 1,800 recipes. The scrape required the spider to travel down 3 levels of hyperlinks to arrive at each individual recipe page. There were a total of 7 pages with each collection of 'Editors' Collections' which the spider navigated to collect each recipe. 

Data Analysis

I transformed the data from the html format into a workable dataframe using both R and Python. I was able to parse the timestamps that were provided by the NYT and use them for time analysis after converting them to a common measurement. 

My analysis of the data consisted of 5 main questions

  • Is the overall number of ratings correlated with the star rating?
  • Does the inclusion of a picture affect popularity?
  • Does the inclusion of a lengthy description affect number of reviews?
  • Does the inclusion of a "NYT cooking category" affect popularity?
  • What are the most used and most popular recipe tags?



The review (0 - 5 stars)  and the number of reviews were positively correlated. Although not directly correlated, you could infer that as a recipes number of reviews increased, so did their star rating. I used number of ratings as a variable to assess popularity rather than review because it was a much more granular statistic and the review was more like a categorical variable, having only 3 tiers. 

Picture or No Picture

I assumed the inclusion of a picture on a recipe would have a large impact on the popularity i.e. having a picture would result in a more popular post. This did not turn out to be the case. Although the spread of data in the category "Picture" was centered around a higher average number of reviews, this was not a significant difference. Overall, having a picture may indicate a small number of change in reviews although I could not conclude empirically that this was true. 

Recipe Category

There were 5 different 'recipe categories' that a recipe could have apart from not being given a category:

  • Easy
  • Editor's Pick
  • Cookbook Exclusive
  • Times Classic
  • Healthy

These five categories are correlated with very different popularity results. The majority of "Healthy" recipes seem to be much more unpopular than than even the recipes with no category at all. Although the only category that has a significantly different average number of reviews than the 'no category" category, is "Times Classic". This category performs much better than if you are to write a similar recipe without the category tag. 



Recipe Length

After converting the recipe times to a standard measurement in minutes, you can deduce from the graphs that there is not a real difference in the popularity of recipes depending on the time it takes to cook them. Longer recipes have a slightly lower median number of reviews, though this did not prove to be a significant difference. 


The inclusion of a description blurb at the beginning of a recipe proved to make the most significant difference in overall popularity out of all of the variables. This makes sense as the description is a chance for the author to 'sell' the recipe to the reader before they take the plunge to cook it. 

Tag Words

The two charts shown here are the most popular tags sorted by average number of ratings of the recipes they are used with, and total number of recipes they are used with, respectively. The two charts provide very different tags and give insight into what categories of recipes people are gravitated towards. Because the charts show different tag words, this proves very useful for designing future recipes. Because a tag word is used in many recipes does not necessarily mean that it is providing value. 


Based on the data collected and analysis, I can conclude the following from the findings:

  • Popularity of a recipe, measured by the total number of ratings is positively correlated with the 5 star review of the recipe. 
  • Having a picture does not necessarily negatively impact the recipe as a whole. This is important when developing recipes when thinking whether to pay for a professional food photographer etc. It could be overall beneficial to save money on the back end because it does not significantly negatively affect the front end. 
  • Having a Recipe Category can both positively and negatively affect the popularity of the recipe. The "Times Classic" Recipe category proves to be correlated with a significantly higher popularity. Although this might be a reason to add the category to more highly rated recipes, this would have to be analysed further. The inclusion of the tag in more recipes may dilute the affect. 
  • Including a recipe description significantly affects the popularity of the recipe. This is an easy change to make. Add a description and generally you will see your recipe become more popular. 
  • The inclusion of specific tag words may significantly increase the popularity of the recipe. Alternatively, looking at the most tag words with the highest average number of ratings may be a good "map" to steer the direction of new recipes.

About Author

Aydan Ellman

Aydan Ellman

Data Science Fellow and Cornell University graduate with demonstrated experience in business development and the application of data analysis to drive business performance.
View all posts by Aydan Ellman >

Related Articles

Leave a Comment

No comments found.

View Posts by Categories

Our Recent Popular Posts

View Posts by Tags

2019 airbnb Alex Baransky alumni Alumni Interview Alumni Reviews Alumni Spotlight alumni story Alumnus API Application artist aws beautiful soup Best Bootcamp Best Data Science 2019 Best Data Science Bootcamp Best Data Science Bootcamp 2020 Best Ranked Big Data Book Launch Book-Signing bootcamp Bootcamp Alumni Bootcamp Prep Bundles California Cancer Research capstone Career Career Day citibike clustering Coding Course Demo Course Report D3.js data Data Analyst data science Data Science Academy Data Science Bootcamp Data science jobs Data Science Reviews Data Scientist Data Scientist Jobs data visualization Deep Learning Demo Day Discount dplyr employer networking feature engineering Finance Financial Data Science Flask gbm Get Hired ggplot2 googleVis Hadoop higgs boson Hiring hiring partner events Hiring Partners Industry Experts Instructor Blog Instructor Interview Job Job Placement Jobs Jon Krohn JP Morgan Chase Kaggle Kickstarter lasso regression Lead Data Scienctist Lead Data Scientist leaflet linear regression Logistic Regression machine learning Maps matplotlib Medical Research Meet the team meetup Networking neural network Neural networks New Courses nlp NYC NYC Data Science nyc data science academy NYC Open Data NYCDSA NYCDSA Alumni Online Open Data painter pandas Part-time Portfolio Development prediction Prework Programming PwC python python machine learning python scrapy python web scraping python webscraping Python Workshop R R language R Programming R Shiny r studio R Visualization R Workshop R-bloggers random forest Ranking recommendation recommendation system regression Remote remote data science bootcamp Scrapy scrapy visualization seaborn Selenium sentiment analysis Shiny Shiny Dashboard Spark Special Special Summer Sports statistics streaming Student Interview Student Showcase SVM Switchup Tableau team TensorFlow Testimonial tf-idf Top Data Science Bootcamp twitter visualization web scraping Weekend Course What to expect word cloud word2vec XGBoost yelp