USA x China: Who is winning the commodity trade war?

Guilherme Strachan
Posted on May 7, 2018


The United States has held the title of the world's biggest economy for a long time. However, China's economy has been growing at a pace that threatens that leadership. The Gross Domestic Product (GDP) of the US was worth 18.57 trillion dollars in 2016, while China's GDP was worth 11.2 trillion dollars in the same period.

Recently, the Trump Administration placed tariffs on Chinese products such as flat-screen televisions and medical devices. China retaliated by placing tariffs on products such as soybeans and pork. Exports and imports of these products can have a direct impact on GDP.

This project is designed to analyze and visualize commodity exports and imports around the world, with a special focus on China and the United States.


The dataset is from the United Nations Statistics Division and covers import and export trade values in USD for 5,000 commodities across most countries on Earth over the last 30 years. The size of the file is 1.25GB and a preprocessing step was necessary to reduce that size.

My first attempt was to create an image of the database inside R, which reduced the size of the file to less than 100MB. However, loading the data from that file took an impractically long time. The next logical attempt was to migrate to a SQL database. However, after creating the table, the file size was still greater than 1GB and required additional manipulation. The following steps were taken:

  • Drop unused columns
  • Normalize the database
  • Tune the database

The first step is straightforward, since there was no need to store values that are never used. The next step was necessary because two text columns (commodity and category) repeated the same values throughout the dataset. Since text usually takes more storage space than an integer, two new domain tables were built to store the text values, and the main table references the unique identifier created for each value. For the last step, primary and foreign keys were created to improve query performance. The figure below shows the difference between the original and final database version:
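The three steps described above can be sketched in a few lines. This is only a minimal illustration using Python's built-in sqlite3 module, not the project's actual schema: the table names, column names, the unused `quantity` field, and the sample rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical raw rows (made-up values, not taken from the UN dataset):
# (country, year, commodity, category, flow, trade_usd, quantity)
raw_rows = [
    ("A", 2016, "Soya beans", "oil_seeds", "Export", 100.0, 7),
    ("B", 2016, "Soya beans", "oil_seeds", "Import", 90.0, 6),
    ("A", 2016, "Swine meat", "meat", "Import", 20.0, 2),
    ("B", 2016, "Swine meat", "meat", "Export", 15.0, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Normalize: the repeated text values live once in small domain tables,
# and the main table stores only their integer ids.
# Tune: primary keys, foreign keys, and indexes on the join columns.
conn.executescript("""
CREATE TABLE commodity (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE category  (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE trade (
    id           INTEGER PRIMARY KEY,
    country      TEXT    NOT NULL,
    year         INTEGER NOT NULL,
    flow         TEXT    NOT NULL,
    trade_usd    REAL    NOT NULL,
    commodity_id INTEGER NOT NULL REFERENCES commodity(id),
    category_id  INTEGER NOT NULL REFERENCES category(id)
);
CREATE INDEX idx_trade_commodity ON trade(commodity_id);
CREATE INDEX idx_trade_category  ON trade(category_id);
""")


def lookup_id(table, name):
    """Return the id for `name` in a domain table, inserting it if missing."""
    # `table` is only ever one of our own two table names, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,))
    return row.fetchone()[0]


# Drop unused columns: `quantity` is read from the raw row but never stored.
for country, year, commodity, category, flow, usd, _quantity in raw_rows:
    conn.execute(
        "INSERT INTO trade (country, year, flow, trade_usd, commodity_id, category_id)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (country, year, flow, usd,
         lookup_id("commodity", commodity), lookup_id("category", category)),
    )
conn.commit()

n_trade = conn.execute("SELECT COUNT(*) FROM trade").fetchone()[0]
n_commodity = conn.execute("SELECT COUNT(*) FROM commodity").fetchone()[0]
n_category = conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]

# The text values are recovered with a join when they are needed.
names = conn.execute(
    "SELECT DISTINCT c.name FROM trade t"
    " JOIN commodity c ON c.id = t.commodity_id ORDER BY c.name"
).fetchall()
```

With four trade rows, each commodity and category string is stored only once, so the saving grows with the number of repeated rows in the main table.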


This application has multiple tabs, each one offering a different approach to how we can compare the commodities.

The Bar Graph tab lets us choose a country and see which commodities contribute most to its total exports and imports. For China, the total export trade amount was more than US$ 750 billion. The top 10 categories, and the share of the total amount that those ten categories account for, are highlighted below.
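The calculation behind such a bar graph can be sketched as follows. This is a hedged illustration in plain Python, not the app's code: the function name `top_categories` and the sample rows are hypothetical.

```python
from collections import defaultdict


def top_categories(rows, n=10):
    """Rank categories by total trade value and attach each one's share.

    `rows` is an iterable of (category, trade_usd) pairs for one country
    and one flow. Returns a list of (category, total_usd, percent_of_total).
    """
    totals = defaultdict(float)
    for category, usd in rows:
        totals[category] += usd

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [(cat, usd, 100.0 * usd / grand_total) for cat, usd in ranked]


# Made-up values, for illustration only.
sample = [
    ("machinery", 50.0),
    ("textiles", 30.0),
    ("machinery", 10.0),
    ("toys", 10.0),
]
top_two = top_categories(sample, n=2)
```

The percentages are computed against the total over all categories, not only the top n, so they show how much of the country's trade the highlighted categories cover.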

The Map section, in contrast to the first tab, lets you compare countries' trade values for a specific commodity. The map graph below was generated by calculating the balance (exports minus imports) with all commodities taken into consideration. We can see that China and the US are polar opposites in terms of trade balance: China shows a large positive balance, while the US shows a large negative one.
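The balance used for the map is simply exports minus imports, summed per country. A minimal sketch in plain Python, assuming rows of (country, flow, trade_usd) with flow values "Export" and "Import"; the function name and the sample values are hypothetical:

```python
from collections import defaultdict


def trade_balance(rows):
    """Return {country: exports - imports} summed over all given rows."""
    balance = defaultdict(float)
    for country, flow, usd in rows:
        if flow == "Export":
            balance[country] += usd
        elif flow == "Import":
            balance[country] -= usd
        # Any other flow label is ignored in this sketch.
    return dict(balance)


# Made-up values for two placeholder countries.
sample = [
    ("A", "Export", 100.0),
    ("A", "Import", 40.0),
    ("B", "Export", 30.0),
    ("B", "Import", 90.0),
]
balances = trade_balance(sample)
```

A positive value means the country exports more than it imports (a surplus), and a negative value means the opposite (a deficit).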

The last visualization option is a bubble chart. The graph lets a user see the behavior of two commodities over time by selecting more than one country and the flow (Export or Import). Some insights can be extracted from the graphs below. We see that within ten years China overtook the US in exports, and in 2016 China was still leading overall commodity exports. One curious detail shown in this graph is that in 2009 every country exported a lower trade value. This was probably because of the global financial crisis of 2008-2009, which likely reduced either the number of products exported or the value of each commodity.


Clearly, we can see that China is winning the trade war in commodities. China has a positive trade balance, in contrast to the US, and since net exports are one component of the gross domestic product calculation, that surplus adds to its GDP. However, the trade balance is only one component of a country's economy.

This app is flexible and generic, in the sense that the same analysis could be done for other countries as well.

App in ShinyIO

Code in GitHub
