Web-scraping Indeed: Exploring the US Job Market

Posted on Aug 1, 2020

When I started my job search before entering this bootcamp, I didn't know that to become a quantitative trader one must have programming skills. I had little knowledge of the skills required for this type of job.

The project was a great opportunity to explore in depth the skills required to enter the industry, and it was very exciting to build it from scratch, clean the data, and get hands-on experience with web scraping. There were many websites to look at, e.g. Indeed, Glassdoor, or LinkedIn. I decided to focus on Indeed. You can find the scraped data in this all_link, and the cleaning and analysis can be found in the GitHub repository.

Indeed is the number one job posting site worldwide, with over 250 million unique visitors every month and 10 jobs added every second. It's a free platform where both recruiters and job applicants can find what they need.


How do salaries differ across the USA?
Does money contribute to happiness in the workplace?
How do tech job salaries compare to those in other industries?
Who earns more: data scientists or data analysts?
Which type of developer has the highest salary?
What skills are needed to become a data scientist?

Data Gathering:

To answer my questions, I went to indeed.com and scraped around 2,000 pages of job postings. I searched for all types of jobs so that my sample would cover different industries. For each posting, I collected the job title, company, salary, reviews, location, remote option, and job description.

Biggest challenges scraping Indeed:

The salary format on Indeed varies depending on the position, and sometimes the salary is given as a range, which makes it imprecise, so I had to standardize my results.
Also, my sample is very small: I couldn't scrape all the job postings because of time limitations.
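
The standardization step can be illustrated with a minimal sketch. It is not the code from the repository: the function name `annual_salary`, the example strings, and the full-time conversion factors (40 hours a week, 52 weeks a year) are all assumptions made for illustration. A range is reduced to its midpoint and then converted to a yearly figure.

```python
import re

# Assumed conversion factors for a full-time schedule (illustrative, not from the original analysis).
PERIODS_PER_YEAR = {"hour": 40 * 52, "day": 5 * 52, "week": 52, "month": 12, "year": 1}


def annual_salary(text):
    """Return an approximate annual salary in dollars, or None if the text cannot be parsed."""
    # Collect every dollar amount, e.g. "$50,000" or "$15.50".
    amounts = [
        float(a.replace(",", ""))
        for a in re.findall(r"\$\s*(\d[\d,]*(?:\.\d+)?)", text)
    ]
    # Find the pay period the amount refers to.
    period = re.search(r"\b(hour|day|week|month|year)\b", text.lower())
    if not amounts or period is None:
        return None
    # A range such as "$50,000 - $70,000" is reduced to its midpoint.
    bounds = amounts[:2]
    midpoint = sum(bounds) / len(bounds)
    return midpoint * PERIODS_PER_YEAR[period.group(1)]
```

Taking the midpoint is only one possible choice; keeping the lower and upper bounds as separate columns would preserve more information.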

Data Analysis:

At first, I checked the distribution of salaries. The results are shown below:

As you can see, the distribution is skewed to the right, with a median salary of $43,000.
We can also see that there is inequality in the US, with the top 1% earning at least $126,000.
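
For readers who want to reproduce this kind of summary, here is a small sketch using only the Python standard library. The function name `salary_summary` is hypothetical, and comparing the mean to the median is just a rough heuristic for right skew, not a formal test.

```python
import statistics


def salary_summary(salaries):
    """Summarize a list of annual salaries (needs at least two values)."""
    median = statistics.median(salaries)
    mean = statistics.mean(salaries)
    # quantiles(n=100) returns 99 cut points; the last one is the 99th percentile,
    # i.e. the threshold for the top 1% of the sample.
    p99 = statistics.quantiles(salaries, n=100, method="inclusive")[-1]
    return {
        "median": median,
        "mean": mean,
        "p99": p99,
        # Heuristic: a mean above the median suggests a right-skewed distribution.
        "right_skewed": mean > median,
    }
```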

Next, a dive into job offerings per state. I had to group the job offerings by state.
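
The grouping can be sketched as follows. This assumes the location strings look like "Austin, TX 78701" (a city, a comma, then a two-letter state code), which may not match every posting; the function name `postings_per_state` is hypothetical.

```python
import re
from collections import Counter


def postings_per_state(locations):
    """Count postings per two-letter state code; locations without one are skipped."""
    counts = Counter()
    for location in locations:
        # Assumed format: "City, ST" optionally followed by a ZIP code.
        match = re.search(r",\s*([A-Z]{2})\b", location)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `postings_per_state(locations).most_common(5)` would then list the five states with the most postings.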

California is the state with the most job offerings, which makes sense because it has the largest population. I was surprised that Texas is almost equal to California.
This may be because of the low cost of living there and because remote jobs are soaring during COVID-19.
Now let's take a look at the proportion of remote jobs and the businesses that are hiring the most during this pandemic.

I looked up these businesses, and three of them are from the health sector, which makes sense in these times.

Then I wanted to investigate the claim that income is related to happiness. I drew a scatter plot and found little to no correlation between the two.
I was expecting some sort of positive correlation, but the Pearson correlation results showed that I was wrong.
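
As a reminder of what the Pearson coefficient measures, here is a minimal from-scratch sketch. The function name `pearson_r` is hypothetical, and the inputs are assumed to be two equal-length lists, for example salaries and some numeric happiness score per posting.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and sums of squared deviations for each variable.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 0 means little to no linear relationship, while values near 1 or -1 indicate a strong positive or negative one.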

Then I turned my focus to tech-related jobs. First of all, I wanted to verify the claim that data scientists earn more than data analysts.
On average, data scientists earn around $105,000 and data analysts $66,000.
One limitation of my study is that the wage figures are not very accurate.
Then I wanted to know which type of developer has the highest income.
Surprisingly, I found that quantitative developers rank first, followed by back-end developers.

I wanted to conclude by focusing on the FANG companies (Facebook, Amazon, Netflix, Google), so I scraped entry-level data scientist positions to see if a candidate like me has a chance there.

Python is the skill to have as a data scientist. You must also be proficient in another language like C++ or Java and, of course, SQL. We also notice that a data scientist must have soft skills and, for the FANG companies, a master's degree or a PhD.
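
One simple way to get this kind of ranking is to count how many job descriptions mention each skill. The sketch below is an illustration under assumptions, not the repository code: the skill list and the function name `skill_counts` are hypothetical, and the matching is plain keyword search.

```python
import re
from collections import Counter

# Illustrative skill list; extend as needed.
SKILLS = ["python", "sql", "java", "c++"]


def skill_counts(descriptions, skills=SKILLS):
    """Count, for each skill, how many descriptions mention it at least once."""
    counts = Counter()
    for description in descriptions:
        text = description.lower()
        for skill in skills:
            # Custom boundaries so that "java" does not match "javascript"
            # and "c++" is handled despite its non-word characters.
            pattern = r"(?<![\w+#])" + re.escape(skill) + r"(?![\w+#])"
            if re.search(pattern, text):
                counts[skill] += 1
    return counts
```

Keyword matching like this misses synonyms and misspellings, so the counts should be read as a rough indication only.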


In summary, we can say that the tech industry has a high income compared to other industries. This is likely because there is a lot of competition and many requirements.

Some limitations of my analysis:
No historical data, which I believe would be interesting for seeing how salaries have evolved over the years.
No info about shareholders or CEOs, which I believe would make the distribution even more right-skewed.

If I have time in the future, I will include other countries like France or my home country and compare income using PPP (purchasing power parity).
I would also like to build a machine learning model that reads a candidate's resume and matches the applicant with a suitable role.


