Data Study on US Annual Payroll

Posted on Oct 24, 2016


How have annual payroll payments in the United States evolved over the past ten years, and are there any trends in the data? Do geographic factors have an impact on their distribution? These are the questions I hope to address in this project.
In this study, we investigate the distribution of US payroll payments from 2005 to 2014 and explore how business patterns evolved over this period.
The data used in the study is published by the US Census Bureau.

We focus on the following information in the data:

  • State: 50 US states and the District of Columbia
  • Industry: 21 industries classified by NAICS code (North American Industry Classification System)
  • Employee number: sampled from the first-quarter payroll of each year
  • Aggregate payroll payment: total pay to employees for each industry defined above

Due to missing data, the following industries are mostly excluded from the study:

  • Agriculture, forestry, fishing and hunting
  • Mining
  • Utilities (mostly)
  • Industries not classified

Design of the Study

  • Explore the behavior of aggregate payroll payments from 2005 to 2014
  • Explore the distribution of average annual pay at the state level and how it developed across 2007, 2009 and 2014
  • Explore the distribution of the standard deviation of average annual pay at the state level across 2007, 2009 and 2014
  • Investigate the top and bottom industries across the United States based on average annual pay over time
  • Investigate employment change at the industry level over the period

The analysis and all plots in this study were produced in R.

Data Results of the Study

The following plots show the US payroll total annual payment, total employees paid and average annual pay over the period.

[Figures: total annual payroll payment, total employees paid, and average annual pay, 2005 to 2014]

The shaded area marks the financial crisis and gives us some insight into business patterns during that time.
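The three yearly series plotted above can be derived from the state and industry records roughly as follows. This is only a minimal sketch in Python (the study itself was done in R). The record layout, the sample numbers, and the function name `yearly_summary` are hypothetical, and we assume that average annual pay is aggregate payroll divided by the number of employees, which the data description suggests but does not state outright.

```python
from collections import defaultdict

# Hypothetical records: (year, state, industry, employees, payroll in USD).
# These numbers are made up for illustration; they are not Census figures.
records = [
    (2007, "NY", "Finance and Insurance", 100, 9_000_000),
    (2007, "NY", "Retail Trade", 200, 5_000_000),
    (2009, "NY", "Finance and Insurance", 90, 8_100_000),
    (2009, "NY", "Retail Trade", 180, 4_500_000),
]


def yearly_summary(rows):
    """Sum employees and payroll per year, then derive average annual pay."""
    totals = defaultdict(lambda: [0, 0])  # year -> [employees, payroll]
    for year, _state, _industry, employees, payroll in rows:
        totals[year][0] += employees
        totals[year][1] += payroll
    return {
        year: {
            "employees": emp,
            "payroll": pay,
            # Assumption: average annual pay = aggregate payroll / employees.
            "avg_pay": pay / emp if emp else None,
        }
        for year, (emp, pay) in sorted(totals.items())
    }


summary = yearly_summary(records)
```

Each entry of `summary` then holds the three quantities for one year, which is the shape a plotting routine would need.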

2007, 2009, 2014

Let's take three important years and see how average annual pay across industries evolved at the state level. The following plots show the distribution in 2007, 2009 and 2014, which fall before, during and after the financial crisis:


[Figures: avg2007, avg2009, avg2014]

The boxplots show the distribution and development of annual average pay across the United States. They indicate that pay increased more between 2009 and 2014 than between 2007 and 2009. They also show that annual average pay differs from state to state, and that for some states the difference is sizable.

Here is another point of view regarding the distribution of annual average pay we discussed above:



Overall, the plot above shows that most states recovered after 2009.

Annual Average Pay

Next, let's look at the variation of annual average pay across the United States over those three years.

[Figures: std2007, std2009]


The plots above show, for each state, the standard deviation of annual average pay across industries. They seem to indicate that the differences in annual average pay among industries are also increasing after 2009.
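The per-state spread described above could be computed along these lines. This is a sketch under assumptions, not the code used in the study: the input values and the function name `pay_spread_by_state` are hypothetical, and we use the sample standard deviation (`statistics.stdev`, which divides by n-1 like R's `sd()`), since the text does not say which variant was used.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical average annual pay (USD) by (state, industry) for one year.
avg_pay = {
    ("NY", "Finance and Insurance"): 90_000,
    ("NY", "Information"): 70_000,
    ("NY", "Retail Trade"): 50_000,
    ("OH", "Finance and Insurance"): 60_000,
    ("OH", "Information"): 55_000,
    ("OH", "Retail Trade"): 50_000,
}


def pay_spread_by_state(pay):
    """Return the sample standard deviation of industry pay for each state."""
    by_state = defaultdict(list)
    for (state, _industry), value in pay.items():
        by_state[state].append(value)
    # stdev() needs at least two values, so states with one industry are skipped.
    return {state: stdev(values) for state, values in by_state.items() if len(values) >= 2}


spread = pay_spread_by_state(avg_pay)
```

A larger value for a state means its industries pay less alike, which is the quantity the plots compare across years.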

Average Pay in Different Industries

Next, we investigate annual average pay in different industries and collect the top three and bottom three performers in each year of the ten-year period. Here are the results:




The plot above shows that Finance & Insurance, Information, and Management of Enterprises were the top three performers during the period. We also compare the annual average pay of the top and bottom industries:



This graph indicates that the gap between the top and bottom industries has widened over the years, and the trend appears likely to continue in the near future.
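The ranking and the gap discussed in this section can be sketched as follows for a single year. The pay figures and the function name `rank_industries` are hypothetical, and measuring the gap as the mean of the top three minus the mean of the bottom three is our assumption; the study does not spell out how the comparison was computed.

```python
# Hypothetical average annual pay (USD) by industry for one year.
pay_by_industry = {
    "Management of Enterprises": 100_000,
    "Finance and Insurance": 95_000,
    "Information": 85_000,
    "Manufacturing": 55_000,
    "Construction": 50_000,
    "Retail Trade": 27_000,
    "Accommodation and Food": 18_000,
}


def rank_industries(avg_pay_by_industry, n=3):
    """Return the n best-paid names, the n worst-paid names, and their gap."""
    ranked = sorted(avg_pay_by_industry.items(), key=lambda kv: kv[1], reverse=True)
    top, bottom = ranked[:n], ranked[-n:]
    # Assumption: gap = mean pay of the top group minus mean pay of the bottom group.
    gap = sum(p for _, p in top) / len(top) - sum(p for _, p in bottom) / len(bottom)
    return [name for name, _ in top], [name for name, _ in bottom], gap


top3, bottom3, gap = rank_industries(pay_by_industry)
```

Repeating this for each year from 2005 to 2014 and plotting `gap` against the year would give a series comparable to the graph described above.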

Employment Change Distribution

Finally, we investigate how employment change is distributed among industries over the years. Employment change is defined as the annual increase in employees divided by the employee count of the previous year. The results are shown below:

[Figures: empincr1, empicr2]

The plots suggest that construction took the biggest hit during the financial crisis. Although employment in every industry recovers somewhat after 2009, the top-paying industries do not show larger increases in the rate of employment.
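The employment change defined in this section, the yearly increase in employees divided by the previous year's count, can be written out as a short sketch. The employee counts below are invented for illustration and the function name `employment_change` is hypothetical.

```python
def employment_change(employees_by_year):
    """Map each year to (employees - previous year's employees) / previous year's employees."""
    years = sorted(employees_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        base = employees_by_year[prev]
        # Skip gaps in the series and zero baselines, where the ratio is undefined.
        if curr - prev == 1 and base:
            changes[curr] = (employees_by_year[curr] - base) / base
    return changes


# Hypothetical employee counts for one industry (not Census figures).
construction = {2007: 1000, 2008: 950, 2009: 760, 2010: 798}
changes = employment_change(construction)
```

A negative value means the industry shed employees relative to the previous year, and a positive value means it added them. The first year of the series has no entry because it has no previous year to compare against.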



Based on the study above, we find the following:

  • US aggregate payroll payments have been increasing since 2009. The recession had its most serious impact on total payroll and less impact on average pay nationwide.
  • The Northeast and West Coast are the regions with the biggest pay increases over the years compared with the Central region, and they also have larger pay variance across industries within each state.
  • Finance, insurance and information are the top-paying sectors across the United States. Retail, Accommodation and Food have the lowest average pay. Average pay in the top sectors grows faster than in the bottom ones, and the trend shows no sign of stopping in the near future.
  • After the financial crisis in 2009, payroll recovered over the following years and returned to pre-crisis levels in 2014. However, the top-paying industries did not generate the largest employment growth over the recovery period.


About Author

Connie Zhang

Connie Zhang, a marketing specialist, has been working in the field of data analysis since 2010. She holds a Ph.D. in Engineering, an MBA, and an Associateship of the Society of Actuaries in the United States.