US Annual Payroll Study

Connie Zhang
Posted on Oct 24, 2016




How have annual payroll payments in the United States evolved over the past ten years, and are there any trends? Do geographic factors affect their distribution? These are the questions I hope to address in this project.
In this study, we investigate the distribution of US payroll payments from 2005 to 2014 and explore how business patterns evolved over this period.
The data used in the study are published by the US Census Bureau.

We focus on the following fields:

  • State: the 50 US states and the District of Columbia
  • Industry: 21 industries classified by NAICS code (North American Industry Classification System)
  • Employee number: the number of employees, sampled from the first-quarter payroll of each year
  • Aggregate payroll payment: total pay to employees for each industry defined above

Because of missing data, the following industries are mostly excluded from the study:

  • Agriculture, forestry, fishing and hunting
  • Mining
  • Utilities (mostly)
  • Industries not classified

Design of the Study

  • Explore how aggregate payroll payments behaved from 2005 to 2014
  • Explore the state-level distribution of average annual pay and how it developed across 2007, 2009 and 2014
  • Explore the state-level distribution of the standard deviation of average annual pay across 2007, 2009 and 2014
  • Investigate the top and bottom industries across the United States by average annual pay over time
  • Investigate employment change at the industry level over the period

The analysis and visualizations in this study were produced in R.

Results of the Study

The following plots show total annual US payroll payments, total employees paid and average annual pay over the period.

[Figures: annual total pay, annual total employees, annual average pay]

The shaded area marks the financial crisis and gives some insight into business patterns during that time.
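The three yearly series described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it is written in Python rather than R, the record layout and the name `yearly_summary` are hypothetical, the figures are made up, and it assumes that average annual pay is total payroll divided by total employees.

```python
from collections import defaultdict

# Hypothetical records: (year, state, industry, employees, annual_payroll_usd).
# The figures are made up for illustration and are not Census Bureau data.
records = [
    (2008, "NY", "Finance and Insurance", 100, 9_000_000),
    (2008, "NY", "Retail Trade", 300, 7_500_000),
    (2009, "NY", "Finance and Insurance", 90, 8_100_000),
    (2009, "NY", "Retail Trade", 280, 7_000_000),
]


def yearly_summary(rows):
    """Return {year: (total_payroll, total_employees, average_pay)}.

    Assumption: average pay = total payroll / total employees.
    """
    payroll = defaultdict(int)
    employees = defaultdict(int)
    for year, _state, _industry, emp, pay in rows:
        payroll[year] += pay
        employees[year] += emp
    return {
        year: (payroll[year], employees[year], payroll[year] / employees[year])
        for year in sorted(payroll)
    }


summary = yearly_summary(records)
for year, (total_pay, total_emp, avg_pay) in summary.items():
    print(year, total_pay, total_emp, round(avg_pay, 2))
```

Summing payroll and employees before dividing weights each industry by its headcount, which is one reasonable reading of "average annual pay"; averaging the per-industry averages would give a different number.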

Let's take three important years and see how average annual pay across industries evolved at the state level. The following plots show the distribution in 2007, 2009 and 2014, which fall before, during and after the financial crisis:


[Figures: average annual pay by state, 2007, 2009 and 2014]

The boxplots show the distribution and development of annual average pay across the United States. They indicate that pay increased more from 2009 to 2014 than from 2007 to 2009. They also show that annual average pay differs between states, and that the difference is sizable for some states.

Here is another view of the distribution of annual average pay discussed above:



Overall, the plot above shows that most states have recovered since 2009.

Next, let's look at the variation in annual average pay across the United States over those three years.

[Figures: standard deviation of average annual pay by state, 2007 and 2009]


The plots above show, for each state, the standard deviation of annual average pay across industries. They suggest that the differences in annual average pay among industries have also been increasing since 2009.
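The per-state spread described above could be computed along these lines. This is a hedged sketch in Python rather than the R code behind the plots: the pay figures and the name `pay_spread_by_state` are hypothetical, and it assumes the sample standard deviation, since the study does not say whether the sample or population form was used.

```python
from statistics import stdev

# Hypothetical average annual pay (USD) by state and industry for one year.
avg_pay = {
    "NY": {"Finance and Insurance": 90_000, "Information": 80_000, "Retail Trade": 25_000},
    "OH": {"Finance and Insurance": 60_000, "Information": 55_000, "Retail Trade": 23_000},
}


def pay_spread_by_state(pay_by_state):
    """Sample standard deviation of industry-level average pay within each state."""
    return {
        state: stdev(list(by_industry.values()))
        for state, by_industry in pay_by_state.items()
    }


spread = pay_spread_by_state(avg_pay)
for state, value in spread.items():
    print(state, round(value, 2))
```

A larger value means a wider gap between high-paying and low-paying industries within that state.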

Next, we investigate annual average pay in different industries and collect the top three and bottom three performers in each year of the ten-year period. Here are the results:




The plot above shows that Finance and Insurance, Information, and Management of Enterprises were the top three performers during the period. We also compare annual average pay between the top and bottom industries as follows:



This graph indicates that the gap between the top and bottom industries has widened over the years, and the trend appears likely to continue in the near future.
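The ranking and the top-versus-bottom comparison above can be sketched for a single year as follows. This is an illustration under stated assumptions, in Python rather than R: the pay figures are made up, the helper names `top_and_bottom` and `mean_gap` are hypothetical, and measuring the gap as the difference between the mean pay of the two groups is one possible definition, not necessarily the one behind the graph.

```python
# Hypothetical national average annual pay (USD) by industry for a single year.
avg_pay_one_year = {
    "Finance and Insurance": 95_000,
    "Information": 88_000,
    "Management of Enterprises": 105_000,
    "Construction": 55_000,
    "Retail Trade": 27_000,
    "Accommodation and Food Services": 18_000,
    "Educational Services": 36_000,
}


def top_and_bottom(pay_by_industry, n=3):
    """Return (top n, bottom n) industry names, both ordered from highest to lowest pay."""
    ranked = sorted(pay_by_industry, key=pay_by_industry.get, reverse=True)
    return ranked[:n], ranked[-n:]


def mean_gap(pay_by_industry, n=3):
    """Difference between the mean pay of the top n and bottom n industries (assumed gap measure)."""
    top, bottom = top_and_bottom(pay_by_industry, n)
    top_mean = sum(pay_by_industry[name] for name in top) / n
    bottom_mean = sum(pay_by_industry[name] for name in bottom) / n
    return top_mean - bottom_mean


top, bottom = top_and_bottom(avg_pay_one_year)
gap = mean_gap(avg_pay_one_year)
print(top)
print(bottom)
print(gap)
```

Repeating the calculation for each year from 2005 to 2014 would give the gap series whose trend is discussed above.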

Finally, we investigate how employment change is distributed among industries over the years. Employment change is defined as the annual increase in employees divided by the employee count of the previous year. The results are shown below:

[Figures: employment change by industry (two panels)]

The plots suggest that construction took the biggest hit during the financial crisis. Although employment in every industry appears to recover somewhat after 2009, the top-paying industries do not show larger increases in the rate of employment.
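The employment change defined above is a simple year-over-year ratio, change(t) = (employees(t) - employees(t-1)) / employees(t-1). A minimal Python sketch, with made-up employee counts and a hypothetical function name `employment_change`, assuming the counts cover consecutive years:

```python
# Hypothetical employee counts by year for one industry (not Census Bureau data).
construction = {2007: 1000, 2008: 950, 2009: 760, 2010: 779}


def employment_change(counts):
    """Year-over-year change: (this year - previous year) / previous year.

    Assumes the years in `counts` are consecutive; the first year has no
    previous year and is therefore omitted from the result.
    """
    years = sorted(counts)
    return {
        year: (counts[year] - counts[prev]) / counts[prev]
        for prev, year in zip(years, years[1:])
    }


change = employment_change(construction)
for year, ratio in change.items():
    print(year, f"{ratio:+.1%}")
```

A negative ratio marks a year in which the industry shed employees, and a positive one marks growth relative to the prior year.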


The study leads to the following conclusions:

  • Aggregate US payroll payments have been increasing since 2009. The recession hit total payroll hardest and had less impact on average pay nationwide.
  • The Northeast and the West Coast saw the biggest pay increases over the years compared with the Central region, and they also have larger pay variance across industries within each state.
  • Finance and Insurance and Information are the top-paying sectors across the United States. Retail and Accommodation and Food are the lowest in terms of average pay. Average pay in the top performers increases more quickly than in the bottom ones, and the trend shows no sign of stopping in the near future.
  • After the financial crisis in 2009, payroll recovered over the following years and returned to pre-crisis levels in 2014. However, the top-paying industries did not generate the most employment over the recovery period.


About Author

Connie Zhang

Connie Zhang, a marketing specialist, has been working in the field of data analysis since 2010. She holds a Ph.D. in Engineering, an MBA and an Associateship of the Society of Actuaries in the United States.