Job Risks in the USA - Occupation Vs Automation

Posted on May 17, 2021


Throughout history, technological progress has vastly shifted the composition of employment, from manufacturing and clerking to service and management occupations. In this era of automation, artificial intelligence and robotics have the potential to penetrate deeply into occupations while steadily decreasing labor costs. As businesses across industries transition to digital operations, many job positions are being deemed redundant.

This project highlights those risk areas so that the relevant audiences can plan ahead to reduce unemployment across US states. The main audience is the HR departments of companies, along with government workforce teams in each state. Companies should report planned workforce changes to a global automation observatory: based on the predicted risk, this gives workers time to look for a new job before becoming unemployed, and gives governments and civil society time to plan for reskilling, tax revenue forecasting, and recruitment of new businesses.


The dataset was retrieved from Kaggle. It is based on the work of Carl Benedikt Frey and Michael Osborne, who researched the impact of computerization with the primary objective of estimating the number of jobs at risk and examining the relationship between an occupation's probability of computerization, wages, and educational attainment. The dataset consists of 702 job titles and their probabilities of automation.


An interactive Shiny web application was developed using R, ggplot2, gvisMap, and Shiny Dashboard to analyze the current status of job positions and their automation risk probabilities. The web app consists of 5 links for exploring the current status of job automation. The functionality and visualizations of each tab are described below.

Overview of Automation Job Risks

This screen gives overall statistics on the most and least impacted states, with counts of job positions at risk. The state-based job count view is an interactive map: hovering over a state highlights the number of jobs at risk, based on automation probability.

States highlighted in red, such as Texas and California, are already at high risk because they have large numbers of manual jobs that could be fully automated in the near future.

This screen also contains a histogram showing the count of job titles in the high, medium, and low probability ranges.
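The grouping behind such a histogram can be sketched in a few lines. This is a minimal Python illustration, not the R code behind the app: the cut-offs of 0.3 and 0.7, the function names, and the sample probabilities are all assumptions made for the example, since the post does not state how the ranges were defined.

```python
from collections import Counter

# Assumed cut-offs (not stated in the post): below 0.3 is low, above 0.7 is high.
LOW_CUTOFF = 0.3
HIGH_CUTOFF = 0.7


def risk_level(probability):
    """Map an automation probability in [0, 1] to a risk label."""
    if probability > HIGH_CUTOFF:
        return "high"
    if probability < LOW_CUTOFF:
        return "low"
    return "medium"


def count_by_risk_level(probabilities):
    """Count job titles per risk level, as the histogram would display them."""
    counts = Counter(risk_level(p) for p in probabilities.values())
    return {level: counts.get(level, 0) for level in ("low", "medium", "high")}


# Made-up probabilities for illustration only; not values from the dataset.
sample = {"Title A": 0.05, "Title B": 0.45, "Title C": 0.92, "Title D": 0.88}
level_counts = count_by_risk_level(sample)
print(level_counts)
```

Keeping the cut-offs as named constants makes it easy to try other range definitions without touching the counting logic.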

Explore by Job Title

This link lets users search by job title and view statistics for that title. Users can find the safest states for a job position and decide whether to relocate. They can also check how many jobs with that title exist in every state. An interactive feature on this screen looks up the automation probability of similar titles for comparison.

As an example, if a user searches for the job title 'Advertising and Promotional Managers', the top safe states for this title look like the following:

The gvisMap shows the count of jobs with the title 'Advertising and Promotional Managers' for every US state.

If the user wants to check the probability of automation for other similar manager titles, the bar chart shows the exact figures.
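A similar-title lookup of this kind can be approximated with a case-insensitive keyword match. The sketch below is an illustration in Python rather than the app's R code. The function name `similar_titles` is hypothetical, substring matching is an assumed stand-in for whatever matching the app uses, and the probabilities are made-up numbers, not values from the dataset.

```python
def similar_titles(probabilities, keyword):
    """Return (title, probability) pairs whose title contains the keyword,
    sorted from highest to lowest automation probability."""
    needle = keyword.strip().lower()
    matches = [
        (title, p) for title, p in probabilities.items() if needle in title.lower()
    ]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Illustrative titles with made-up probabilities; not values from the dataset.
sample = {
    "Advertising and Promotional Managers": 0.10,
    "Sales Managers": 0.20,
    "Office Clerks": 0.90,
}

manager_matches = similar_titles(sample, "manager")
for title, p in manager_matches:
    print(f"{title}: {p:.2f}")
```

The sorted pairs map directly onto the bars of a comparison chart, one bar per matching title.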

Explore By State

This link shows the automation status by state. The user gets an interactive bar chart of the riskiest jobs in the selected state and a scatter chart of overall job density for that state.

As an example, if someone wants to check the impact of automation on the state of Massachusetts, the following graph shows the top risky jobs.

The overall job density for Massachusetts shows that almost 25% of the jobs in this state have an automation probability above 75%.
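A share like this one is the number of employees in jobs above the probability threshold divided by total employment in the state. The following Python sketch shows the arithmetic under that assumption. The function name `high_risk_share` is hypothetical and the input rows are made-up numbers chosen to keep the calculation easy to follow; they are not Massachusetts data.

```python
def high_risk_share(jobs, threshold=0.75):
    """Fraction of employees working in jobs whose automation probability
    exceeds the threshold.

    jobs: list of (employees, probability) pairs for one state.
    """
    total = sum(employees for employees, _ in jobs)
    if total == 0:
        return 0.0  # avoid dividing by zero for an empty state
    at_risk = sum(employees for employees, p in jobs if p > threshold)
    return at_risk / total


# Made-up rows for illustration: (employees, automation probability).
state_jobs = [(100, 0.90), (300, 0.20)]
share = high_risk_share(state_jobs)
print(f"{share:.0%} of jobs are above the threshold")
```

Weighting by employee count matters here: counting job titles instead would treat a small occupation and a very large one as equally important.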

An additional interactive feature lets the user check the number of employees impacted across similar job titles in this state. For example, applying this feature to the 'Clerk' job title produces a bar chart showing that almost 60,000 office clerks are at high risk. More than 45,000 Stock Clerks and Order Fillers are impacted as well. This feature provides an easy comparison that gives an overall idea of the risk for that job title.

Summary of Job Risks from Automations

This link gives a full summary of the analysis done through the visualizations and interactive operations, and presents four graphs that draw conclusions about automation risk by job title.

Across the US, these are the top 10 job titles whose automation would affect the most people in the coming decades. Retail Salespersons and Cashiers are in a vulnerable position.

The job category plot shows that Sales and Clerical are the two highest-risk job categories, whereas Education remains in a safer zone.

The automation probability density scatter plot shows that the density is very high for job titles with an automation probability above 85%, which is extremely concerning.

The final summary is a column chart of the 10 most impacted states, with the number of employees affected by job automation. California, Texas, and New York are the top three, where millions of people could lose their jobs if they are not reskilled before the crisis arrives.
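A ranking like the one in this column chart can be produced by summing, per state, the employees in jobs above a high-risk cut-off and then sorting. The Python sketch below illustrates that aggregation. It is not the app's R code: the function name `top_states_at_risk`, the 0.7 cut-off, and the placeholder rows are all assumptions for the example.

```python
def top_states_at_risk(rows, threshold=0.7, n=10):
    """Sum employees in high-risk jobs per state and return the top n states.

    rows: iterable of (state, employees, probability) tuples.
    threshold: assumed high-risk cut-off; the post does not state the one used.
    """
    totals = {}
    for state, employees, probability in rows:
        if probability > threshold:
            totals[state] = totals.get(state, 0) + employees
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Placeholder rows for illustration only; not values from the dataset.
rows = [
    ("State X", 500, 0.90),
    ("State Y", 800, 0.95),
    ("State X", 400, 0.80),
    ("State Z", 1000, 0.10),
]

ranking = top_states_at_risk(rows)
print(ranking)
```

States with no jobs above the cut-off do not appear in the result at all, which suits a top-10 chart but would need changing if every state should be listed.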

Based on the analysis above, the final conclusion is that 49% of all US jobs are at high risk. These statistics can help the HR teams of different organizations, government workforce teams, and workers understand the urgency and plan ahead for reskilling or upskilling, to avoid job losses for millions of people.

For R code, please click here

About Author

Chaitali Majumder

Decisive, analytical-minded Data Scientist and Business Leader with a proven track record of 10+ years of work experience in Business Process Management project implementations.