Data Analysis on Global Plastic Pollution

Posted on Feb 6, 2020

“The global plastic production has mushroomed over the past 70 years. In 1950 the world produced only 2 million tonnes per year. Since then, annual production has increased by nearly 200-fold, reaching 381 million tonnes in 2015. For context, this is roughly equivalent to the mass of two-thirds of the world population.” An estimated 8.3 billion tons of plastic have been produced since the 1950s, equivalent to the weight of more than 800,000 Eiffel Towers, and only 20% of it has been recycled.
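As a quick check of the "nearly 200-fold" figure, the arithmetic can be sketched in a few lines of Python (used here only for illustration; the app itself was built in R Shiny). The two production values come from the quote above. The constant annual growth rate is a simplifying assumption added for this sketch, not a figure from the sources.

```python
# Figures quoted in the text above (million tonnes per year).
production_1950 = 2.0
production_2015 = 381.0

# Fold increase: how many times larger 2015 output is than 1950 output.
fold_increase = production_2015 / production_1950

# Implied constant annual growth rate over the 65 years. This is an
# illustrative simplification: real growth was not constant year to year.
years = 2015 - 1950
annual_growth = fold_increase ** (1 / years) - 1

print(f"Fold increase: {fold_increase:.1f}x")
print(f"Implied average annual growth: {annual_growth:.1%}")
```

The fold increase works out to about 190, which is consistent with "nearly 200-fold".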

Here are several facts about plastic pollution:

  • Worldwide, about 2 million plastic bags are used every minute.
  • New Yorkers alone use 23 billion plastic bags every year (source: NYSDEC).
  • A plastic bag is used for 12 minutes on average, while it takes around 500 years to biodegrade in the ocean.
  • Plastic kills more than 1.1 million seabirds and other animals every year and harms an astronomical number more.
  • The pollution eventually returns to us: the average person eats about 70,000 microplastic particles each year.

The film Plastic Ocean vividly portrays the problem.

After learning these shocking facts, I decided to build a web app in R Shiny that explores and visualizes the factors driving plastic pollution across countries, to help environmental NGOs (ENGOs) tackle the issue from a global perspective.

Data Description

The data comes from two studies: Geyer et al., published in Science Advances in 2017, and Jambeck et al., published in Science in 2015. After merging the datasets and doing some feature engineering, the variables in the dataset, shown below, include country, population, coastal population, economic development, etc.:
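The merge and feature engineering step can be sketched roughly as follows. This is a minimal Python illustration, not the author's R code: the country names, column names, and all values are made up, and the two derived features (per capita waste and coastal share) are assumptions about what such engineering might look like.

```python
# Toy rows with made-up values, standing in for the two source datasets.
# Country names and column names are hypothetical.
waste_rows = [
    {"country": "Country A", "plastic_waste_tonnes_per_year": 1200.0},
    {"country": "Country B", "plastic_waste_tonnes_per_year": 300.0},
    {"country": "Country C", "plastic_waste_tonnes_per_year": 50.0},
]
population_rows = [
    {"country": "Country A", "population": 4_000_000, "coastal_population": 1_000_000},
    {"country": "Country B", "population": 500_000, "coastal_population": 450_000},
]


def merge_by_country(left, right):
    """Inner-join two lists of dict rows on the 'country' key."""
    right_by_country = {row["country"]: row for row in right}
    merged = []
    for row in left:
        match = right_by_country.get(row["country"])
        if match is not None:  # keep only countries present in both sources
            merged.append({**row, **match})
    return merged


def add_features(row):
    """Derive per capita waste (kg per person per day) and coastal share."""
    kg_per_year = row["plastic_waste_tonnes_per_year"] * 1000
    return {
        **row,
        "waste_kg_per_capita_per_day": kg_per_year / row["population"] / 365,
        "coastal_share": row["coastal_population"] / row["population"],
    }


dataset = [add_features(row) for row in merge_by_country(waste_rows, population_rows)]
for row in dataset:
    print(row["country"], round(row["waste_kg_per_capita_per_day"], 4), row["coastal_share"])
```

An inner join drops countries that appear in only one source (Country C here). Whether the original analysis kept or dropped such countries is not stated.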

Mismanaged waste is material that is either littered or inadequately disposed of; it could eventually enter the ocean via inland waterways, wastewater outflows, and transport by wind or tides. Inadequately managed waste is waste that is not formally managed, which includes disposal in dumps or open, uncontrolled landfills where it is not fully contained. Both types can end up polluting rivers and oceans.
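Read as a formula, the definition above says that mismanaged waste is the sum of littered waste and inadequately managed waste. A minimal Python sketch, assuming both components are given as shares of total plastic waste (the function name, parameter names, and example values are hypothetical):

```python
def mismanaged_waste(total_waste, inadequately_managed_share, littered_share):
    """Return mismanaged waste as the sum of its two components.

    total_waste: plastic waste generated (any consistent unit).
    inadequately_managed_share, littered_share: fractions of total_waste
    between 0 and 1 (hypothetical inputs for this sketch).
    """
    for share in (inadequately_managed_share, littered_share):
        if not 0 <= share <= 1:
            raise ValueError("shares must be between 0 and 1")
    if inadequately_managed_share + littered_share > 1:
        raise ValueError("shares cannot sum to more than 1")
    inadequately_managed = total_waste * inadequately_managed_share
    littered = total_waste * littered_share
    return inadequately_managed + littered


# Made-up example: 1,000 tonnes of waste, 30% inadequately managed, 2% littered.
example = mismanaged_waste(1000.0, 0.30, 0.02)
print(example)
```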

Data Analysis and Visualization

As you can see from the picture below, I first explored all the variables by country on a map and a bar chart and attached the corresponding findings.

Here are the observations from the exploration:

  • Coastal countries with a large population generate more plastic waste. The top two countries are China and the United States.
  • Five of the eight countries with the highest per capita plastic waste are small island countries. The other three are high-income countries. Based on these observations, geographical features (inland or coastal) and economic level (GDP per capita) appear to be factors that influence plastic pollution. They warrant further investigation.
  • The top 8 countries with the highest mismanaged plastic waste are all developing countries.
  • Developed countries have noticeably lower mismanaged plastic waste, both per person and by share.
  • Developing countries have a significantly higher share of inadequately managed plastic waste, which has the highest risk of pollution.

Based on the observations above, I carried out a further analysis focused on plastic waste, mismanaged plastic waste, coastal population, and geographical features.

Data on Per Capita Plastic Waste 

As can be seen from the graph below, GDP per capita has a positive linear relationship with plastic waste per capita, which implies that plastic waste tends to increase as people and countries become more productive and richer. The converse also holds: per capita plastic waste in low-income countries is noticeably smaller.
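One way to quantify a relationship like this is with a Pearson correlation coefficient and a least-squares line. The sketch below is a minimal Python illustration on made-up numbers, not the author's analysis or the real data. The variable names and units are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Made-up values: GDP per capita (USD) and plastic waste (kg/person/day).
gdp_per_capita = [1_000, 5_000, 10_000, 30_000, 50_000]
waste_per_capita = [0.02, 0.06, 0.10, 0.22, 0.35]

r = pearson_r(gdp_per_capita, waste_per_capita)
slope, intercept = least_squares_line(gdp_per_capita, waste_per_capita)
print(f"r = {r:.3f}, slope = {slope:.2e} kg/day per USD, intercept = {intercept:.3f}")
```

A coefficient close to 1 with a positive slope would support the "positive linear relationship" reading. A strong correlation on its own does not show that income causes the extra waste.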

The plastic waste per capita in developed countries is significantly higher than in developing countries, as shown below.

Per Capita Mismanaged Plastic Waste 

As demonstrated in the plot below, per capita mismanaged plastic waste tends to be higher in industrialized middle-income and fast-growing developing countries.

This likely happens because these countries' waste management infrastructure cannot keep pace with their rapid industrial and manufacturing growth. The problem is compounded by massive quantities of plastic trash imported from developed countries. As a result, developed countries generate a significant amount of plastic waste but only a small amount of mismanaged plastic waste.

Therefore, the development of sufficient waste management infrastructure in middle-income and growing lower-income countries is crucial to tackling the issue of plastic pollution.

As shown below, in contrast to the per capita plastic waste, the per capita mismanaged plastic waste in developing countries is significantly higher than in developed countries.

As demonstrated in the plot below, I engineered an economic growth variable from each country's average per capita GDP growth rate and measured per capita mismanaged plastic waste against it. It seems that fast-growing countries have less mismanaged plastic waste on average. That might be because the fastest-growing countries are either landlocked oil-rich countries or low-income countries whose demand has not surged yet. I will take a closer look at this in future work.
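The text does not say exactly how the average growth rate was computed. One plausible reading is the mean of year-over-year growth rates in per capita GDP, sketched below in Python on made-up series (the function name and all values are hypothetical):

```python
def average_growth_rate(series):
    """Mean of year-over-year growth rates for a chronologically ordered series."""
    if len(series) < 2:
        raise ValueError("need at least two yearly values")
    rates = [(curr - prev) / prev for prev, curr in zip(series, series[1:])]
    return sum(rates) / len(rates)


# Made-up per capita GDP values (USD) for four consecutive years.
gdp_by_country = {
    "Country A": [1_000, 1_100, 1_210, 1_331],  # grows 10% every year
    "Country B": [20_000, 20_400, 20_808, 21_224.16],  # grows 2% every year
}

growth = {country: average_growth_rate(values) for country, values in gdp_by_country.items()}
for country, rate in growth.items():
    print(f"{country}: {rate:.1%}")
```

A compound (geometric) average would be an equally reasonable choice and gives slightly different values when growth varies from year to year.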

Mismanaged Plastic Waste by Coastal Population and Geographical Features

As can be seen in the plot below, coastal population has a positive correlation with mismanaged plastic waste across countries, which might be because waste generated in coastal regions has a higher risk of entering the ocean and causing severe environmental damage.

Note: Coastal population is measured as the population within 50 kilometers of a coastline.

As shown below, coastal countries have much higher mismanaged plastic waste per person than landlocked countries.
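A comparison like this boils down to averaging per capita mismanaged waste within each group. The Python sketch below shows the idea on made-up rows. The column names, the coastal flag, and the values are all hypothetical, and the original analysis may have grouped countries differently.

```python
from collections import defaultdict

# Made-up rows: per capita mismanaged plastic waste in kg/person/day.
rows = [
    {"country": "Country A", "is_coastal": True, "mismanaged_kg_per_capita": 0.04},
    {"country": "Country B", "is_coastal": True, "mismanaged_kg_per_capita": 0.06},
    {"country": "Country C", "is_coastal": False, "mismanaged_kg_per_capita": 0.01},
    {"country": "Country D", "is_coastal": False, "mismanaged_kg_per_capita": 0.03},
]


def group_means(rows, key, value):
    """Average `value` within each group defined by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: sum(values) / len(values) for group, values in groups.items()}


means = group_means(rows, "is_coastal", "mismanaged_kg_per_capita")
print(f"coastal: {means[True]:.3f}, landlocked: {means[False]:.3f}")
```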

Therefore, coastal countries, especially developing ones, need more help in building effective plastic waste management systems because of their higher risk of producing mismanaged plastic waste.

Tackling plastic pollution has become a problem that brooks no delay, and the effort can start with each of us.


  • Geyer, R., Jambeck, J. R., & Law, K. L. (2017). Production, use, and fate of all plastics ever made. Science Advances, 3(7), e1700782.
  • Jambeck, J. R., Geyer, R., Wilcox, C., Siegler, T. R., Perryman, M., Andrady, A., ... & Law, K. L. (2015). Plastic waste inputs from land into the ocean. Science, 347(6223), 768-771.
  • Ritchie, H., & Roser, M. (2018). Plastic Pollution. First published in September 2018.


About Author

Fred (Lefan) Cheng - 程乐帆

Fred Cheng is a certified data scientist working as a data science consultant at Zenon. He holds a Master’s degree in Management and Systems from New York University and a bachelor’s in business management from The...