Data Study on US Annual Payroll
Overview
How have annual payroll payments in the United States evolved over the past ten years, and are there any trends in the data? Do geographic factors affect their distribution? These are the questions I hope to address in this project.
In this study, we investigate the distribution of US payroll payments from 2005 to 2014 and explore how business patterns evolved over this period.
The data used in the study are published by the US Census Bureau at the following link:
http://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=BP_2014_00A1&prodType=table
The data contain the following fields, which we focus on:
- State: the 50 US states and the District of Columbia
- Industry: 21 industries classified by NAICS code (North American Industry Classification System)
- Employee number: the sample collected from first-quarter payroll each year
- Aggregate payroll payment: total pay to employees in each industry defined above
Due to missing data, the following industries are mostly excluded from the study:
- Agriculture, forestry, fishing and hunting
- Mining
- Utilities (mostly)
- Industries not classified
Design of the Study
- First, explore the behavior of aggregate payroll payments from 2005 to 2014
- Explore the distribution of average annual pay at the state level and its development across 2007, 2009 and 2014
- Explore the distribution of the standard deviation of average annual pay at the state level across 2007, 2009 and 2014
- Investigate the top and bottom industries across the United States based on average annual pay over time
- Investigate employment change at the industry level over the period
The study was carried out in R, mainly through visualization.
Data Results of the Study
The following plots show the US payroll total annual payment, total employees paid and average annual pay over the period.
The shaded area marks the financial crisis and gives us some insight into business patterns during that time.
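The three yearly series described above can be derived from the raw fields with a simple aggregation. The sketch below is in Python rather than the R used for the study, and it assumes that average annual pay is aggregate payroll divided by the number of employees, which the text implies but does not state. The records and the function name `yearly_totals` are made up for illustration and are not Census figures.

```python
from collections import defaultdict

# Toy records (made-up numbers, NOT Census figures):
# (year, state, industry, employees, payroll in thousands of dollars)
records = [
    (2007, "NY", "Industry A", 100, 9000),
    (2007, "NY", "Industry B", 300, 7500),
    (2009, "NY", "Industry A", 90, 8100),
    (2009, "NY", "Industry B", 280, 7280),
]


def yearly_totals(rows):
    """Sum employees and payroll per year, then derive average pay."""
    totals = defaultdict(lambda: {"employees": 0, "payroll": 0})
    for year, _state, _industry, employees, payroll in rows:
        totals[year]["employees"] += employees
        totals[year]["payroll"] += payroll

    result = {}
    for year in sorted(totals):
        t = totals[year]
        # Assumed definition: average annual pay = payroll / employees
        result[year] = {
            "employees": t["employees"],
            "payroll": t["payroll"],
            "average_pay": t["payroll"] / t["employees"],
        }
    return result


summary = yearly_totals(records)
for year, row in summary.items():
    print(year, row)
```

Grouping by state or industry instead of year only requires changing the key used in `totals`.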
2007, 2009, 2014
Let's take three important years and see how average annual pay across industries evolved at the state level. The following plots show the distribution in 2007, 2009 and 2014, that is, before, during and after the financial crisis:
The boxplots show the distribution and development of annual average pay across the United States. They indicate that pay increased more between 2009 and 2014 than between 2007 and 2009. They also show that annual average pay differs from state to state, and that the difference is sizable for some states.
Here is another view of the distribution of annual average pay discussed above:
Overall, the plot above shows that most states recovered after 2009.
Annual Average Pay
Next, let's look at the variation of annual average pay across the United States over those three years.
The plots above show, for each state, the standard deviation of annual average pay across industries. They suggest that the differences in annual average pay among industries have also been increasing since 2009.
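The per-state spread described above could be computed along the following lines. This is a minimal Python sketch (the study itself used R). It assumes the sample standard deviation with an n - 1 denominator, since the text does not say which variant was used. The data values and the function name `pay_spread_by_state` are hypothetical.

```python
from collections import defaultdict
from statistics import stdev

# Toy values for a single year (made up, NOT Census figures):
# (state, industry, average annual pay in thousands of dollars)
avg_pay = [
    ("NY", "Industry A", 90.0),
    ("NY", "Industry B", 30.0),
    ("NY", "Industry C", 60.0),
    ("TX", "Industry A", 70.0),
    ("TX", "Industry B", 30.0),
    ("TX", "Industry C", 50.0),
]


def pay_spread_by_state(rows):
    """Standard deviation of average pay across industries, per state."""
    by_state = defaultdict(list)
    for state, _industry, pay in rows:
        by_state[state].append(pay)
    # Assumption: sample standard deviation (n - 1 denominator).
    # statistics.pstdev would give the population version instead.
    return {state: stdev(pays) for state, pays in by_state.items()}


spread = pay_spread_by_state(avg_pay)
for state, sd in spread.items():
    print(state, round(sd, 2))
```

A larger value for a state means its industries pay more unevenly in that year.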
Average Pay in Different Industries
Next, we investigate annual average pay in different industries and collect the top three and bottom three performers in each year of the ten-year period. Here are the results:
The plot above shows that Finance and Insurance, Information, and Management of Companies and Enterprises were the top three performers during the period. We also compare annual average pay between the top and bottom industries:
This graph indicates that the gap between the top and bottom industries has widened over the years, and the trend appears likely to continue in the near future.
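Picking the top and bottom performers for one year amounts to a ranking step, sketched below in Python (the study itself used R). The gap is taken here as the mean pay of the top three minus the mean pay of the bottom three. That definition is an assumption, because the text does not say how the two groups were compared. Industry names and values are placeholders, and `top_and_bottom` is a hypothetical helper.

```python
def top_and_bottom(pay_by_industry, k=3):
    """Return the k highest-paid and k lowest-paid industries and their gap."""
    ranked = sorted(pay_by_industry.items(), key=lambda item: item[1], reverse=True)
    top, bottom = ranked[:k], ranked[-k:]
    # Assumed gap definition: mean of the top k minus mean of the bottom k
    gap = sum(pay for _, pay in top) / k - sum(pay for _, pay in bottom) / k
    return [name for name, _ in top], [name for name, _ in bottom], gap


# Toy average annual pay for one year, in thousands of dollars (made up)
pay_one_year = {
    "Industry A": 90,
    "Industry B": 80,
    "Industry C": 100,
    "Industry D": 28,
    "Industry E": 18,
    "Industry F": 50,
    "Industry G": 35,
}

top3, bottom3, gap = top_and_bottom(pay_one_year)
print("top:", top3)
print("bottom:", bottom3)
print("gap:", gap)
```

Running the same function once per year and collecting `gap` gives a series that can be plotted over time.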
Employment Change Distribution
Finally, we investigate how employment change is distributed among industries over the years. Employment change is defined as the annual increase in employees divided by the employee count in the previous year. The results are shown below:
The plots suggest that construction took the biggest hit during the financial crisis. Although employment in every industry appears to recover somewhat after 2009, the top-performing industries do not show larger increases in the rate of employment.
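The employment change ratio defined above, (employees this year - employees last year) / employees last year, can be sketched as follows. The example is in Python rather than the R used for the study. The employee counts are made up, `employment_change` is a hypothetical helper, and the sketch assumes the input has one entry for every consecutive year.

```python
def employment_change(employees_by_year):
    """Year-over-year change relative to the previous year's employee count."""
    years = sorted(employees_by_year)
    changes = {}
    # Assumes years are consecutive; each year is compared with the one before it
    for prev_year, year in zip(years, years[1:]):
        before = employees_by_year[prev_year]
        changes[year] = (employees_by_year[year] - before) / before
    return changes


# Toy employee counts for one hypothetical industry (NOT Census figures)
toy_industry = {2007: 1000, 2008: 950, 2009: 760, 2010: 798}

changes = employment_change(toy_industry)
for year, ratio in changes.items():
    print(year, f"{ratio:+.1%}")
```

The first year has no entry in the result because there is no earlier year to compare it with.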
Conclusion