Studying Data and Exploring Food Across the World

Posted on Aug 7, 2016
EDIT: Several updates and changes have been made since this blog post was published.

Please visit the latest version of my app.

The source code on my GitHub has also been updated.

The dataset used in this app can be found on the Open Food Facts website.


Traveling abroad can be painful, especially when you are trying to keep to your regular diet. Imagine you have to spend a few days in Russia and you do not speak Russian: what should you eat? McDonald's can be a safe choice, but what if there is no familiar restaurant nearby, or you want to keep to a healthy diet?


[Figure 1. Open Food Facts]

Open Food Facts might help you find out what to eat. It is an open source database that started in France and provides nutrition facts for food products sold in each country, with all nutrition information available in English. This can be very helpful for foreigners trying to understand the food in an unfamiliar country. Currently, the database contains more than 90,000 records. All of its contents are contributed by volunteers, and the entire database can be downloaded for free.

That said, a few issues make this great website less accessible. For example, although the search engine on the Open Food Facts website gives many options for digging into the dataset, its interface is not very user-friendly. Users have to switch back and forth between result pages to check product details, which makes searching quite inefficient. Therefore, in an attempt to build a more efficient explorer for the Open Food Facts database, I have made a World Food Explorer with RStudio’s Shiny web application framework.



[Figure 2. Overview]

As shown in Figure 2, the first page of the web app gives a brief overview of all food records in the database. As this summary shows, although more than 14 countries are included, the majority of product records come from France and other European countries, while users from the United States, Canada, Brazil, and Australia have also contributed thousands of records. This is understandable, since the project was founded in France.

Data Explorer


[Figure 3. Country Selector and Nutrition Filters]

The Explorer page provides an interface for digging into the dataset. The Nutrition Filters box at the top of the page (collapsed by default) provides slider filters for 10 major nutritional elements found in food products, including total calories, carbohydrates, and sugar. After users select a target country at the top of the page, check the nutrients they are interested in, and adjust the filter ranges, the app filters all products based on those choices.
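The filtering logic described above can be sketched in a few lines. This is a minimal illustration in Python rather than the app's actual Shiny code; the sample records, the field names (such as `sugars_g`), and the `filter_products` helper are all assumptions made for the example.

```python
# Hypothetical sample records; the field names are illustrative and are
# not the app's or the Open Food Facts schema.
PRODUCTS = [
    {"name": "Plain yogurt", "country": "France",
     "energy_kcal": 61.0, "sugars_g": 4.7, "carbohydrates_g": 4.7},
    {"name": "Chocolate bar", "country": "France",
     "energy_kcal": 535.0, "sugars_g": 48.0, "carbohydrates_g": 59.0},
    {"name": "Maple syrup", "country": "Canada",
     "energy_kcal": 260.0, "sugars_g": 60.0, "carbohydrates_g": 67.0},
]


def filter_products(products, country, ranges):
    """Keep products from `country` whose values fall inside every checked range.

    `ranges` maps a nutrient field to a (low, high) pair, mirroring one
    checked slider. Nutrients that are not in `ranges` are not filtered.
    """
    matches = []
    for product in products:
        if product["country"] != country:
            continue
        in_range = all(
            field in product and low <= product[field] <= high
            for field, (low, high) in ranges.items()
        )
        if in_range:
            matches.append(product)
    return matches


# Only the sugar slider is checked here, limited to 0-10 g per 100 g.
low_sugar = filter_products(PRODUCTS, "France", {"sugars_g": (0.0, 10.0)})
print([p["name"] for p in low_sugar])  # ['Plain yogurt']
```

Passing an empty `ranges` dictionary corresponds to leaving every filter unchecked, so all products from the selected country are returned.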


[Figure 4. DataTable]

Meanwhile, the Matching Items box lists all products that match the filtering criteria and displays their nutrition facts along with product names and packaging barcodes. Users can also click the “Info” button on the right side of each record to open the product's detail page on the Open Food Facts website.

Scatter Plot

[Figure 5. Scatter Plot]

At the bottom of the Explorer page, users can find another box called Correlation Between Nutrition. This box projects all the food records in the Matching Items box onto a scatterplot, showing the relationship between two selected nutrition items. This chart reveals some very interesting relationships.

For example, Figure 5 shows a scatterplot of sugar against carbohydrates per 100 grams of product, and you can immediately see a clear line of points along the x = y boundary. What does this line tell us? It means that in many food products, sugar is the only carbohydrate present! That sounds scary, but what on earth are those items? By narrowing down the nutrition filters, the app allows users to investigate this sweet list further.
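To make the x = y observation concrete, here is a small Python sketch that picks out products whose sugar content is (almost) equal to their total carbohydrate content. Because sugars are counted as part of total carbohydrates, such products contain essentially no other carbohydrate. The sample values, the field names, and the 0.5 g tolerance are assumptions chosen for illustration, not figures from the dataset.

```python
# Hypothetical per-100 g values; not taken from the Open Food Facts data.
PRODUCTS = [
    {"name": "White sugar", "sugars_g": 100.0, "carbohydrates_g": 100.0},
    {"name": "Cola", "sugars_g": 10.6, "carbohydrates_g": 10.6},
    {"name": "Whole wheat bread", "sugars_g": 4.0, "carbohydrates_g": 45.0},
    {"name": "Olive oil", "sugars_g": 0.0, "carbohydrates_g": 0.0},
]


def on_sugar_line(products, tolerance=0.5):
    """Return products that sit on the sugar = carbohydrate line.

    A product qualifies when it has some carbohydrate and its sugar value
    is within `tolerance` grams of its carbohydrate value.
    """
    return [
        p for p in products
        if p["carbohydrates_g"] > 0
        and abs(p["carbohydrates_g"] - p["sugars_g"]) <= tolerance
    ]


print([p["name"] for p in on_sugar_line(PRODUCTS)])  # ['White sugar', 'Cola']
```

Products with no carbohydrates at all are excluded, since they would otherwise sit at the origin and trivially satisfy x = y.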

High Sugar items

[Figure 6. High Sugar Items]

Ha! As shown in Figure 6, it turns out that the products with extremely high sugar content are really just pure sugar or syrup. Feeling relieved now? Well, let’s look further.

Real High Sugar Food

[Figure 7. High Sugar Items - Below 50g]

When we lower the sugar limit to below 50 grams per 100 grams of product, we finally find dentists’ top enemies: chocolate and sugary drinks. All those sweet devils are hiding in this range!


Summary - Warning

[Figure 8. Summary Tab]

Now, after exploring different types of food and nutrition items, users might want to save a list of items for further consideration.

This is what the Summary tab is for. To save items from the Explorer tab, users select them in the data table and then click the “Add Selections to Summary” button at the top right of the table. The number of saved items is shown in both the sidebar and the Explorer tab, and users can reset the list at any time by clicking the “Reset Selections” button in the sidebar.

Saved items can be reviewed in the Summary tab. In addition, this tab calculates the average nutrition level of all selected items, compares it with the FDA’s suggested Daily Value (DV) for each nutritional element, and shows the average DV% across the items in the list. If the average DV% exceeds 50%, the corresponding info box turns yellow as a warning.
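The Summary calculation can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the app's actual code: the `DAILY_VALUES` numbers are placeholders rather than the FDA's figures, and the field names and helper functions are made up for the example. Only the 50% warning threshold comes from the description above.

```python
from statistics import mean

# Placeholder reference amounts for illustration only; substitute the
# FDA's actual Daily Values before relying on the output.
DAILY_VALUES = {"fat_g": 80.0, "sugars_g": 50.0}

WARNING_THRESHOLD = 50.0  # percent, as described in the post


def average_dv_percent(items, daily_values):
    """Average each nutrient over the saved items and express it as a % of DV."""
    result = {}
    for nutrient, daily_value in daily_values.items():
        values = [item[nutrient] for item in items if nutrient in item]
        if values:  # skip nutrients that no saved item reports
            result[nutrient] = mean(values) / daily_value * 100.0
    return result


def warning_flags(dv_percent, threshold=WARNING_THRESHOLD):
    """Flag nutrients whose average DV% exceeds the threshold (a yellow box)."""
    return {nutrient: pct > threshold for nutrient, pct in dv_percent.items()}


# Hypothetical saved items with per-100 g values.
saved_items = [
    {"name": "Chocolate bar", "fat_g": 30.0, "sugars_g": 48.0},
    {"name": "Plain yogurt", "fat_g": 2.0, "sugars_g": 4.0},
]

percentages = average_dv_percent(saved_items, DAILY_VALUES)
print({nutrient: round(pct, 1) for nutrient, pct in percentages.items()})
print(warning_flags(percentages))
```

With these placeholder numbers, the sugar average lands above the threshold and would be flagged, while fat would not. Note that the nutrient values are per 100 grams of product, so the percentages here describe 100 grams of an "average" saved item rather than a real serving.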

End Note

This Shiny app is designed as a replacement for the "unfriendly" user interface on the Open Food Facts website. While it improves the user experience, there is still more that can be done. For example, a download button would let users export their selected items. If you are interested, please feel free to fork my source code on GitHub.


About Author

Jonathan Liu

Through years of self-learning on programming and machine learning, Jonathan has discovered his interests and passion in Data Science. With his B.B.A. in accounting, M.S. in Business Analytics, and two years of experience as operation analyst, he is...