Exploration of Permanent Labor Certification Applications

Posted on Oct 12, 2022

Shiny App


Permanent Employment Certification (PERM) is one of the ways a foreign national can obtain permanent residence status in the United States.

The H-1B visa is one of the most popular work visas in the US but has a maximum duration of six years. After their six years are up, many H-1B holders want to stay and work in the US. In that case, the first step in the transition from H-1B to green card (permanent residence) is for the employer to apply for a Permanent Labor (PERM) certification.

Obtaining a PERM certification requires your employer to establish the prevailing wage for your position and set your salary to this amount. The employer also has to go through a recruitment process to prove that there are no qualified US candidates for your position. Certification approval is not guaranteed and can depend on many factors, such as the applicant's qualifications and nationality, the position, and the employer's location.

This project explores whether, and how much, various factors contribute to the approval of PERM applications, and provides visualizations of all applications from 2008-2019.

Data Cleaning

The data from 2008-2014 had slightly different columns from year to year, with between 25 and 27 features. The data from 2015 onwards appears to have been standardized, and the number of features increased to 125, with many additional minute details about each application. This project explores only a few features in detail: the year, case status, employer state, SOC code (the occupational code associated with the job being requested for permanent labor certification, as classified by the Standard Occupational Classification (SOC) System), job title, prevailing wage and the applicant's country of origin. The data from all years are combined into a single dataset containing only these columns.

Specific instances of data cleaning/preparation include:

  • Making sure country names are correct/updated and can be correctly plotted using the googleVis package. For example, replacing 'COTE d'IVOIRE' with 'IVORY COAST' and 'BURMA (MYANMAR)' with 'MYANMAR'.
  • Making sure U.S. state names are correct/updated and can be correctly plotted using the googleVis package. For example, replacing 'BC' with 'BRITISH COLUMBIA' and 'DISTRICT OF COLUMBIA' with 'WASHINGTON'.
  • Prevailing wage is calculated from the PW (prevailing wage) amount and the unit of pay (hourly, weekly, bi-weekly, monthly or yearly). All wages are converted to yearly amounts to allow for comparison.
  • Making sure the SOC code has the correct number of digits.
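
The cleaning steps listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's own code (the project is a Shiny app). The function names, the unit labels, the 2,080 hours-per-year factor, and the `NN-NNNN` SOC pattern with an optional decimal suffix are all assumptions; only the two country replacements come from the post.

```python
import re

# Country fixes named in the post; a real table would contain more entries.
COUNTRY_FIXES = {
    "COTE d'IVOIRE": "IVORY COAST",
    "BURMA (MYANMAR)": "MYANMAR",
}

# Assumed unit labels and pay periods per year.
# 2080 hours assumes a 40-hour week over 52 weeks.
PERIODS_PER_YEAR = {
    "hour": 2080,
    "week": 52,
    "bi-weekly": 26,
    "month": 12,
    "year": 1,
}

# Assumed SOC format: two digits, a hyphen, four digits (e.g. "15-1132").
SOC_PATTERN = re.compile(r"\d{2}-\d{4}")


def normalize_country(name):
    """Map an outdated or unplottable country name to its replacement."""
    name = name.strip()
    return COUNTRY_FIXES.get(name, name)


def annualize_wage(amount, unit):
    """Convert a wage to a yearly amount; return None for unknown units."""
    factor = PERIODS_PER_YEAR.get(unit.strip().lower())
    if factor is None:
        return None
    return amount * factor


def clean_soc(code):
    """Return the SOC code in NN-NNNN form, or None if it is malformed."""
    # Assumption: a decimal suffix such as ".00" can be dropped safely.
    code = code.strip().split(".")[0]
    return code if SOC_PATTERN.fullmatch(code) else None
```

Returning `None` for unknown units or malformed codes keeps the bad rows visible so they can be counted or dropped explicitly rather than silently coerced.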

Data Exploration and Analysis

Country of Citizenship

The number of applications as well as the certification rate (1 being 100%) across all the years (2008-2019) can be seen below.

As shown, India has by far the most applicants, more than six times as many as China, the country with the second most applications. The certification rate appears relatively consistent across countries and continents: most are close to 100%, and only a few countries have noticeably lower rates, as low as around 50%. The African continent appears to have a lower certification rate on average, and a few countries stand out with lower certification rates than their neighbors, including South Sudan, North Korea, Ecuador, Guatemala and Mongolia, to name a few.

Of the top 25 countries by number of applications, only 3 (Mexico, Vietnam and Ecuador) have a certification rate below 80%. In the author's opinion, the lower certification rates cannot necessarily be attributed to prejudice or poor international relations, because the rates do not track a country's region or the state of its relations with the U.S. For instance, the certification rate of China is higher than that of Taiwan, even though it is relatively established that the U.S. considers Taiwan more of an ally. The certification rates of countries on good terms with the U.S., such as Canada, Australia and the U.K., are not necessarily higher than those of less friendly countries such as Iran, Russia or China. Interestingly, the top 2 countries by number of applications also have the highest certification rates among the top 25.

The dashboard also shows the number of applications and the certification rate through the years. China is shown below as an example.

As shown, the number of applications for Chinese citizens increased dramatically from 2013 to 2016. The China-United States trade war persisted through 2018-2019; however, no convincing evidence of that tension is reflected in the application count or the certification rate, which further suggests that international relations are likely not a major contributor.

While there does not appear to be blatant prejudice towards particular countries or regions of the world, the number of applications by itself could point to some deeper phenomenon. Since H-1B sponsorship is very commonly the precursor to PERM, there could be trends or "quotas" at that level that more effectively dictate the number of green cards approved for each country.

Employer State

The number of applications as well as the certification rate (1 being 100%) across all the years (2008-2019) can be seen below.

The number of applications for employers from California is the highest, being more than double that of the second highest state, Texas. The certification rate varies quite drastically between states, with the highest being Minnesota at 94.68% and lowest being Oklahoma at 69.07% (only looking at states with more than 1000 applications).
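
A per-group certification rate with a minimum-application threshold, as used in the state comparison above, could be computed along these lines. This is a hedged sketch: the record layout, the field names `employer_state` and `case_status`, and the choice of which statuses count as certified are assumptions, since the post does not say how statuses were grouped.

```python
from collections import defaultdict

# Assumption: these status labels are treated as "certified".
CERTIFIED_STATUSES = {"certified", "certified-expired"}


def certification_rates(records, key, min_applications=0):
    """Return {group: certified / total} for groups with enough applications.

    `records` is an iterable of dicts; `key` names the grouping field,
    for example a hypothetical "employer_state" column.
    """
    totals = defaultdict(int)
    certified = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        if rec["case_status"].strip().lower() in CERTIFIED_STATUSES:
            certified[group] += 1
    return {
        group: certified[group] / totals[group]
        for group in totals
        if totals[group] >= min_applications
    }
```

Calling `certification_rates(rows, "employer_state", min_applications=1000)` would mirror the "more than 1000 applications" restriction, so that states with very few filings do not dominate the extremes.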

The top 10 states by number of applications have relatively consistent and high certification rates, with the lowest being Florida at 86.16%.

It is relatively well established that H-1B sponsorship is predominantly offered for positions in science and technology. It is therefore expected that California leads in applications with a high certification rate, while states whose industries are less technology focused, such as Oklahoma and Mississippi, have both fewer applications and lower certification rates.

Prevailing Wage

The prevailing wage distributions across the years can be seen in the box plot below.

For the cases that were certified, the prevailing wage has generally risen over time, which makes sense because wages in general are expected to increase, even if only to account for inflation. The rise appears to be more rapid from 2008 to 2013 and then essentially plateaus.

For the cases that were denied, the prevailing wage is lower across the board than for those certified. This is expected, as positions with lower pay are more likely to be replaceable, and companies typically do not go through the effort of sponsoring H-1B and applying for PERM unless there is very good reason to do so. It is interesting to note that the prevailing wage for denied cases also rose along with that of certified cases up to 2013, but the median began decreasing after that and the interquartile range also seemed to expand. Put another way, the prevailing wage gap between certified and denied cases seems to widen year after year beginning in 2014. Given that the plateau in the prevailing wage of certified cases coincides with the decreasing median wage of denied cases, it seems probable that the standards for certification were lowered beginning in 2014, so that applicants with lower wages were certified when they would have been denied in previous years.

The area plot below further shows the increased likelihood of certification with increasing prevailing wage.

Prevailing wages below $15,000 and above $200,000 were filtered out because many of them reflect an incorrectly reported salary unit. This removes only about the top and bottom 0.3% of wages (leaving the 0.003-0.997 quantile range). As shown, applications towards the lower end of the wage range have a much higher likelihood of being denied. The vast majority of wages fall in the 70K-90K range, and the odds are good at any point beyond that.
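
The wage filter and the relationship between wage and certification can be illustrated with a small sketch. The $15,000 and $200,000 cut-offs come from the text above; the $10,000 band width, the function name and the `(wage, certified)` input layout are assumptions made for illustration.

```python
def certification_rate_by_wage_band(records, low=15_000, high=200_000,
                                    band_width=10_000):
    """Return {band_start: certification_rate} for wages within [low, high].

    `records` is an iterable of (annual_wage, certified) pairs, where
    `certified` is a boolean. Wages outside the range are dropped.
    """
    counts = {}
    for wage, certified in records:
        if wage < low or wage > high:
            continue  # likely a misreported salary unit
        # Start of the band this wage falls into, counted from `low`.
        band = low + int((wage - low) // band_width) * band_width
        total, ok = counts.get(band, (0, 0))
        counts[band] = (total + 1, ok + (1 if certified else 0))
    return {band: ok / total for band, (total, ok) in sorted(counts.items())}
```

Plotting the returned rates against the band starts gives a coarse version of the trend described above, with lower bands showing a larger share of denials.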

Job Title

As mentioned previously, SOC stands for Standard Occupational Classification. The SOC system is a federal statistical standard used by federal agencies to classify workers into occupational categories for the purpose of collecting, calculating, or disseminating data.

Since many codes have several associated job titles, the title that occurs most frequently is used for each code. Codes that were entered incorrectly, for example not in the correct form, are ignored. The table below shows the top 10 SOC codes or job titles by number of applicants.
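
Choosing one representative job title per SOC code, and ranking codes by application count, might look like the sketch below. The `NN-NNNN` code pattern, the upper-casing of titles before counting, and the function names are assumptions; the post only says that the most frequent title is used and malformed codes are ignored.

```python
import re
from collections import Counter, defaultdict

# Assumed SOC format: two digits, a hyphen, four digits.
SOC_PATTERN = re.compile(r"\d{2}-\d{4}")


def representative_titles(rows):
    """Map each well-formed SOC code to its most frequent job title.

    `rows` is an iterable of (soc_code, job_title) pairs.
    """
    titles = defaultdict(Counter)
    for soc, title in rows:
        soc = soc.strip()
        if not SOC_PATTERN.fullmatch(soc):
            continue  # ignore malformed codes
        # Assumption: titles are compared case-insensitively.
        titles[soc][title.strip().upper()] += 1
    return {soc: counter.most_common(1)[0][0] for soc, counter in titles.items()}


def top_codes(rows, n=10):
    """Return the n SOC codes with the most applications as (code, count)."""
    counts = Counter(
        soc.strip() for soc, _ in rows if SOC_PATTERN.fullmatch(soc.strip())
    )
    return counts.most_common(n)
```

Joining the output of `top_codes` with `representative_titles` would give a table of codes, counts and a readable title for each code.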

As shown, 7 of the 10 job titles are computer/software related, 2 are electronics and mechanical engineers, and 1 is a finance role. The job title with the most applications, software developers, has more than 3 times as many applications as the next highest job title, which is also technology related. Technology companies and their employees are the ones predominantly applying for PERM, and their certification rates are extremely high as well. Being in the technology field appears to be the best career choice for those looking to gain permanent residence status through their employer.

Conversely, jobs that are characterized by manual labor, or that do not require advanced degrees or great expertise, have much lower certification rates. This is also reflected in the median prevailing wage: the wages in the table above are significantly lower than those in the technology field. The scatter plot below illustrates how jobs with higher wages tend to have a higher certification rate.


The Shiny app provides visualizations for exploring the details of all PERM applications from 2008-2019, and lets users gain insight into the likelihood of their own application being certified by selecting the features that match their situation.

Many interesting observations were made using the app, some of which are listed below.

  • Applicants with India as their country of origin file by far the most applications, more than six times as many as the country with the second most applications, and account for almost half of all applications between 2008-2019.
  • It does not appear that there is blatant prejudice towards certain countries or regions based on their relationship with the U.S.
  • It appears that the certification standards for prevailing wage were loosened beginning in 2014.
  • Technology is king in terms of applications and certification rate. Companies in California file the most applications by far, while states whose industries are not technology focused tend to have noticeably lower certification rates. Being in the technology field is tied to higher prevailing wages, and higher wages are associated with a higher certification rate. Jobs that are characterized by manual labor, or that do not require advanced degrees or great expertise, tend to have much lower certification rates.

Future Work

There are many other features that can be studied, such as the education level of the applicant, specific companies (for instance Fortune 100 versus small companies), class of admission (student visa versus work visa) and the time to arrive at a decision, to name a few. Since the data from 2008-2014 had far fewer features than the data from later years, only a few features present in all years were studied in this project, so that year-over-year trends could be observed. As more data is released in subsequent years, it would be great to include it, along with more features, in the study.

It would also be valuable to incorporate machine learning into the project to determine quantitatively how much each feature contributes to the target variable, and to let users input their own conditions so the app can predict the likelihood of their application being certified.


Data Source

Project on GitHub


About Author

Cheng Zhao

Certified Data Analyst/Scientist with engineering background in semiconductor and electronics packaging. A detail-oriented problem solver with a passion for analytics and utilizing machine learning techniques to gain insights from data to drive business decisions and to advance automation...