NYC Data Science Academy| Blog

LinkedIn: Exploring the Background of a Data Scientist

Lauren Taylor
Posted on Mar 12, 2019

Project GitHub | LinkedIn:   Niki   Moritz   Hao-Wei   Matthew   Oren

The skills demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Reason For Research:

When I first started changing career paths into coding, I heard about data science for the first time. After countless hours of research, I was still unclear about what the job consists of. What I did discover is that this job title is rapidly growing into an in-demand career option. With enormous amounts of data now available, more and more companies are looking to fill this position.

I began to ask: how do you become a data scientist, what degree or level of education do you need, and what skills are employers looking for? This led me to LinkedIn, where I could explore the backgrounds, experience, and skills that current data scientists possess. So the data analysis begins!

At first, I thought about scraping job sites such as Indeed, Glassdoor, and Monster. However, most of the information gathered from these websites would be job descriptions and salaries, whereas I was more interested in the individuals who actually land the job of Data Scientist.

LinkedIn is a social network for professionals, making it the Facebook for your career. The platform is well suited to networking and connecting with others within your industry or an industry you may be trying to enter. Not only is it great for social networking, it's also great for job searches! So I decided to do my web-scraping project on LinkedIn.

The Web-scrape:

Once I had chosen the website to scrape, I had to decide which company's current employees to pull information about. After some research, I decided to go with Uber because of the large number of open data scientist positions at that company.

Besides ridesharing, Uber has branched into new areas, including Uber Eats, Uber Freight, and Uber Health, and other modes of personal transportation such as bikes and buses. Recently, Uber has even started a project for a new mode of ridesharing... in the air! All of these projects rely on big data and create a big demand for data scientists, which makes this company a good fit for my project.

I used Selenium and Beautiful Soup to scrape Uber's LinkedIn profile, and I ran into some issues while building the script. When you search through the list of current employees on a company's profile, LinkedIn shows a number of pages with 10 employee profiles on each page. After the first page, you have to move to the next page to continue scraping the next set of 10 profiles.

The only way to do this is to click the "next" button located at the bottom right. Secondly, to gather the needed information about a current employee, you have to click on the employee's name, which links to their profile. Selenium helped me get around both issues because it can click through the site the way a user would. The trade-off is speed: scraping through a browser is much slower. On top of that, to avoid getting banned by LinkedIn I had to add "sleep" statements in several places in my code, which slowed it down further.

The second problem was the number of pages scraped. I had to rewrite my code so that the scraping stopped at the 100th page, before being "timed out", and no longer looked for the "next" button. Once this problem was solved, I was able to create a pandas table of the scraped information: each employee's name, job title, location, and profile link. I then saved the output to a CSV file.
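The paging logic described above can be sketched without a browser. This is a minimal illustration, not the project's actual script: `fetch_page` is a hypothetical callable standing in for the Selenium step that loads a results page and reports whether a "next" button exists, the 2-second delay is an assumed value, and the standard `csv` module is used in place of pandas so the sketch has no third-party dependencies.

```python
import csv
import io
import time


def scrape_pages(fetch_page, max_pages=100, delay_seconds=2.0, sleep=time.sleep):
    """Collect rows page by page until max_pages or the last page is reached.

    fetch_page is a hypothetical callable: fetch_page(page_number) returns
    (rows, has_next), where rows is a list of dicts scraped from that page.
    """
    rows = []
    for page_number in range(1, max_pages + 1):
        page_rows, has_next = fetch_page(page_number)
        rows.extend(page_rows)
        if not has_next or page_number == max_pages:
            break  # stop instead of looking for another "next" button
        sleep(delay_seconds)  # pause between pages to keep the request rate low
    return rows


def rows_to_csv(rows, fieldnames):
    """Serialize the scraped rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_fetch_page(page_number):
    """Stand-in for the browser: 250 pages with 10 made-up profiles each."""
    rows = [
        {
            "name": f"Employee {page_number}-{i}",
            "title": "Data Scientist",
            "location": "San Francisco",
            "profile_link": f"https://example.com/profile/{page_number}/{i}",
        }
        for i in range(10)
    ]
    return rows, page_number < 250


FIELDS = ["name", "title", "location", "profile_link"]
# The sleep is replaced with a no-op here so the demo finishes instantly.
profiles = scrape_pages(fake_fetch_page, max_pages=100, sleep=lambda seconds: None)
csv_text = rows_to_csv(profiles, FIELDS)
print(len(profiles))  # 1000 rows: 100 pages of 10 profiles
```

Passing the sleep function in as a parameter keeps the real delay in production runs while letting a test run swap in a no-op.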

Once this CSV file was created, I started a second scrape, which read the CSV file from the previous scrape and visited each profile link to grab the information needed for my analysis. This second scrape collected each employee's experience, education, and skills. With this information I could narrow the results down to "data scientist" type roles at Uber and make the observations needed for my conclusions.
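Narrowing the scraped rows to data science roles can be done with a simple keyword match on the job title. The sketch below is only an illustration: the keyword list and the sample rows are assumptions, since the post does not say exactly which titles were kept.

```python
import csv
import io

# Hypothetical keyword list; adjust it to the titles you want to keep.
ROLE_KEYWORDS = ("data scien", "machine learning", "data analyst", "data engineer")


def is_data_science_role(title):
    """Return True when the job title contains any of the role keywords."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in ROLE_KEYWORDS)


# Made-up rows in the same shape as the first scrape's output.
sample_csv = """name,title,location,profile_link
A,Senior Data Scientist,San Francisco,https://example.com/a
B,Account Executive,New York,https://example.com/b
C,Machine Learning Engineer,Seattle,https://example.com/c
D,Data Scientist II,San Francisco,https://example.com/d
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
data_science_rows = [row for row in rows if is_data_science_role(row["title"])]
print([row["name"] for row in data_science_rows])  # ['A', 'C', 'D']
```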

Results:

My first analysis covered the typical data scientist's educational background. I was curious about what education is needed to land a data science role, so I looked at the last type of education each employee completed. After categorizing the degrees into a separate table and taking value counts, it turns out the majority have a Master's degree as their last completed education, with Ph.D.s following.

Percentages of type of education completed
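Categorizing free-text degree names and taking value counts might look like the sketch below. The matching rules and sample degrees are assumptions made for illustration, and `collections.Counter` stands in for the pandas value counts mentioned above.

```python
from collections import Counter


def degree_level(degree_text):
    """Map a free-text degree name to a coarse education level."""
    text = degree_text.lower().replace(".", "")
    if "phd" in text or "doctor of philosophy" in text:
        return "Ph.D."
    if "master" in text or text.startswith(("ms ", "ma ", "mba", "msc", "meng")):
        return "Master"
    if "bachelor" in text or text.startswith(("bs ", "ba ", "bsc", "beng")):
        return "Bachelor"
    return "Other"


# Made-up degree strings, not the scraped data.
last_degrees = [
    "M.S. in Statistics",
    "Ph.D. in Physics",
    "Master of Science, Operations Research",
    "B.S. Computer Science",
    "Juris Doctor",
    "Master of Engineering",
]

counts = Counter(degree_level(degree) for degree in last_degrees)
total = sum(counts.values())
for level, count in counts.most_common():
    print(f"{level}: {count} ({count / total:.0%})")
```

Real profile data is messier than this, so the "Other" bucket is worth inspecting by hand before trusting the percentages.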

Looking at these results, I was curious about what types of Master's degrees these employees received. With the majority holding Master's degrees, this felt like a valuable analysis. Once I created a separate table and categorized the degree names, I was able to make the pie chart below. As you can see, most of the degrees completed were in either Engineering or some type of Mathematics.

Percentages of type of Master degree completed

Once my analysis of the last degree was complete, I wanted to take a deeper look at what degrees the typical data scientist at Uber started out with. These days, it's pretty common for people, myself included, to change career paths after completing their first degree.

So I was curious about where these employees started their careers. I created a separate table containing each profile's first (earliest) education entry. The majority started with a Bachelor's degree, so it made sense to analyze only the Bachelor's degrees completed. Comparing the two, you can see there isn't much difference from the Master's degree results, with Engineering, Mathematics, and Computer Science being the top 3 types received.

Percentages of Bachelor degree type received
Value counts of each type of Bachelor degree completed

Next, I wanted to look at which skill sets are most in demand from employers looking to fill these roles, and in particular which programming languages, since coding plays a big role in a data scientist's duties. First, I gathered each employee's skills and organized them into categories: coding, data analytics skills (data, research, analysis), statistics skills (machine learning, modeling, stats), and other.

Looking at the chart below, you can see that a coding language is a more common skill to have than the other categories, as suggested earlier. In the bar chart, 3 represents coding languages, 1 represents data analytics, 2 represents statistics or machine learning skills, and 0 represents other.

Skill types for data scientists
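The 0 to 3 coding of skill categories can be reproduced with a small lookup function. This is a sketch under assumptions: the keyword lists are built from the examples named in the text, the order in which the rules are checked is a guess, and the sample skills are made up.

```python
from collections import Counter

# Coding languages named in the post as appearing on the profiles.
CODING = {"python", "r", "c++", "c", "java", "sql"}
# Keyword fragments taken from the category descriptions (assumed matching rules).
STATISTICS_KEYWORDS = ("machine learning", "model", "stat")
ANALYTICS_KEYWORDS = ("data", "research", "analy")


def skill_category(skill):
    """Return 3 for coding, 2 for statistics/ML, 1 for data analytics, 0 for other."""
    text = skill.strip().lower()
    if text in CODING:
        return 3
    if any(keyword in text for keyword in STATISTICS_KEYWORDS):
        return 2
    if any(keyword in text for keyword in ANALYTICS_KEYWORDS):
        return 1
    return 0


# Made-up skills for illustration.
skills = ["Python", "Machine Learning", "Data Analysis", "SQL", "Leadership", "R", "Statistics"]
category_counts = Counter(skill_category(skill) for skill in skills)
print(sorted(category_counts.items()))  # [(0, 1), (1, 1), (2, 2), (3, 3)]
```

Coding languages are matched exactly rather than by substring, because short names such as "R" and "C" would otherwise match almost any skill.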

With coding skills clearly being very important in the data science community, I looked into which programming language is most popular and in demand by employers. At Uber, the coding skills listed on employees' LinkedIn profiles consist of Python, R, C++, C, Java, and SQL. Taking the same table and gathering only the coding skill value counts, you can see below that Python is clearly the most common coding skill.

Value counts of coding skill listed on LinkedIn profiles

The next set of information to analyze was the employees' experience. Here I decided to look at which companies current Uber employees typically worked for before their current position. After creating a separate table and cleaning up the data, I was able to compare the top 10 results. However, this comparison did not offer much insight.

Looking at the results below, you can see that the counts weren't large enough to give insight into the companies Uber most often hires from. The numbers are even across the board for the most part. I was a little shocked, as I had assumed the most common companies would be Microsoft, Amazon, or even Facebook.

Top 10 companies before Uber

Since this set of information wasn't the best for analysis, I looked into how many years of experience the average employee had when hired by Uber. To gather this information I had to create a table combining the education and experience data to see how many years passed between the year each employee completed their education and the year they were hired by Uber.

After cleaning and analyzing, the picture made a lot more sense: most of the current employees were hired within the first couple of years after finishing their education. In the chart below, the highest peaks fall between 0 and 3 years of experience.

Years of experience before hired at Uber ranging from under 1 year to 21 years
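The experience figure described above is the gap between two years per profile. A minimal sketch follows, using made-up records and hypothetical field names; treating a negative gap (hired before graduating) as 0 is an assumption, not something the post specifies.

```python
from collections import Counter


def years_before_hire(education_end_year, hire_year):
    """Years between finishing education and being hired; negatives become 0."""
    return max(hire_year - education_end_year, 0)


# Made-up records for illustration only.
records = [
    {"education_end": 2016, "uber_start": 2017},
    {"education_end": 2015, "uber_start": 2018},
    {"education_end": 2018, "uber_start": 2018},
    {"education_end": 2019, "uber_start": 2018},  # hired before graduating
    {"education_end": 2007, "uber_start": 2018},
]

experience = [years_before_hire(r["education_end"], r["uber_start"]) for r in records]
histogram = Counter(experience)
for years in sorted(histogram):
    print(years, histogram[years])
```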

As mentioned earlier in this blog, Uber, along with other tech companies, has recently ramped up several projects that require data science work. I was curious about when data scientist roles became more popular and in demand, so I looked at the number of hires per year.

The 2019 count is not yet meaningful, as it is still too early in the year. As suspected, the number of hires went up drastically in 2017 and 2018 compared to earlier years. Going from 4 hires to a total of 19 in 2017 is a pretty big jump, and it coincides with when these projects started.

Number of employees hired per year from 2014 - 2018 with 1 hire as of Jan. 2019
Percentage difference between years of hires
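The year-over-year percentage difference is a one-line calculation. The worked example below uses the two counts quoted in the text (4 hires, then 19 in 2017), assuming the 4 refers to the year just before 2017, which the post implies but does not state.

```python
def percent_change(previous, current):
    """Percentage change from the previous count to the current count."""
    if previous == 0:
        raise ValueError("percent change is undefined when the previous count is zero")
    return (current - previous) / previous * 100


print(percent_change(4, 19))  # 375.0
```

A jump from 4 to 19 hires is a 375 percent increase, which shows how quickly the percentages grow when the starting count is small.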

To take this analysis a step further, I looked into the current job titles of the Uber employees. With over 21 different job titles in the "Data Scientist" category, the results showed that over 60 percent were Data Scientist titles. After Data Scientist come Software Engineer, data analyst, and data research or data engineer titles at 8 - 10 percent. Last are Machine Learning Engineer and product titles at 4 - 7 percent.

Job titles within the Data Science positions

With the Data Scientist title accounting for more than 60 percent of the positions in this category, I wanted to look into which specific kinds of Data Scientist are most common at this company. So I took the data scientist titles and created a separate table with all the information needed to categorize them.

After cleaning, there are a total of 11 different data scientist titles within that 60 percent of positions. Data Scientist is the most common, with Data Scientist II and Senior Data Scientist coming in behind. The graphs are a little hard to read because there are so many title variants in this category.

Because the results were hard to read and compare, I decided to dig further into two titles: Data Scientist and Senior Data Scientist. I wanted to see what makes the difference between the two. What skill sets does a Senior Data Scientist have that a Data Scientist doesn't?

How many more years of experience do Senior Data Scientists have than Data Scientists? What is the highest level of education for each? To start this analysis I once again created a separate table with information for only these two titles. In total there are 17 Data Scientists and 9 Senior Data Scientists. First I looked into the education differences between the two titles; you can see my results in the graphs below.

Senior Data Science education level comparison
Data Science education level comparison

Comparing the education levels completed for each, there wasn't much of a difference between the two. For both positions the Master's degree was most common, with the Ph.D. following right behind. The only difference between the two graphs is that Senior Data Scientist includes an "other" type of education, which Data Scientist doesn't. This is only because one employee has a Juris Doctor (J.D.) degree. These results are not sufficient to distinguish between the titles.

Next, I compared the years of experience between the two job titles, and the results were more informative than the education comparison. For the Senior Data Scientist role the years of experience ranged from 3 - 11, whereas for the Data Scientist role they ranged from 0 - 5, which makes sense, as a "Senior" role should require more experience. From this I concluded that to qualify for a Senior level role, you need at least 3 - 5 years of prior experience.

Experience years needed to qualify for Senior role
Experience years for Data Scientists role

Lastly, I analyzed the difference in skill sets between the two job titles. I created separate tables for Senior Data Scientists and Data Scientists and gathered the total counts of skills across the profiles to see whether any comparisons could be made. Just as with education level, there wasn't much of a difference in skill sets; they are practically the same.

For both titles, Python and machine learning skills are the most common, which makes sense, as machine learning plays a big role in data science positions and Python is the most commonly used coding language. Data analysis appears in both, which also makes sense, as a big part of these roles is analyzing data.

The only difference between the two charts below is that Data Scientists list "R", another popular coding language, along with "matlab", while the Senior level lists "Algorithms" along with "Optimization Models", which are useful for every data scientist to know. The two charts show only the top 5 skills for each title, as employees added many different types of skills to their profiles. I wanted to capture only the most common ones and leave out skills that were not "data science" type skills.

Top 5 skill sets for Senior role
Top 5 skill sets for Data Science role
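Counting skills per title and keeping the top 5 could be sketched as follows. The profiles here are made up and the field names are hypothetical; skill names are lowercased so that "Python" and "python" count as the same skill, which is an assumed cleaning step.

```python
from collections import Counter


def top_skills(profiles, title, n=5):
    """Return the n most common skills among profiles with the given title."""
    counts = Counter(
        skill.strip().lower()
        for profile in profiles
        if profile["title"] == title
        for skill in profile["skills"]
    )
    return counts.most_common(n)


# Made-up profiles for illustration only.
sample_profiles = [
    {"title": "Data Scientist", "skills": ["Python", "Machine Learning", "R"]},
    {"title": "Data Scientist", "skills": ["python", "Data Analysis"]},
    {"title": "Senior Data Scientist", "skills": ["Python", "Algorithms", "Machine Learning"]},
    {"title": "Data Scientist", "skills": ["Python", "Machine Learning"]},
    {"title": "Senior Data Scientist", "skills": ["Python", "Machine Learning"]},
]

for title in ("Data Scientist", "Senior Data Scientist"):
    print(title, top_skills(sample_profiles, title))
```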

Comparing Data Scientists and Senior Data Scientists, the only major difference between the two titles is the years of experience, as the education and skill sets were basically the same. This suggests that to qualify for a senior role, you need at least 3 years of experience under your belt along with the skills covered in this analysis.

Conclusion:

Concerning web scraping, this project was pretty challenging. Because LinkedIn constantly updates its site, there are limits on run time, and the code has to be updated often for the analysis to run correctly. For this particular project, it would be interesting to continue gathering information on the current Uber employees to see where they end up in their next position.

It would also be interesting to gather more data to compare the salary jumps between positions. For example, we could compare data scientist, machine learning engineer, and analyst positions to investigate how salary, education, and skill requirements differ.

However, to compare these salaries, we would have to bring in another data source such as Glassdoor. I believe this could be a great idea for a future project: continue gathering information from LinkedIn and add salary comparisons from Glassdoor. These ideas could even lead to a machine learning project, for example, recommending which jobs a person should apply to based on their skills and educational background.

You can view my code, data visualizations, and CSV files on my GitHub page here.

About Author

Lauren Taylor

A Sam Houston State graduate in Business Administration. Looking to change career paths into more of an IT industry involving machine learning and algorithms. Love to continue learning python, SQL, and R code language.