Job satisfaction analysis

Posted on May 2, 2020

Being happy at work and satisfied with our position and responsibilities is a real issue. From the company's point of view, having happy employees increases productivity and improves the company's image, which helps attract talented workers at a lower cost. Employees, meanwhile, try to maximize their own job satisfaction.

The following analysis assumes that Glassdoor's company ratings are unbiased and locally independent.

Glassdoor provides information on each company, such as the list of benefits offered to employees, along with employee reviews and ratings. Each rating may include the employee's location and position. We will examine how the ratings depend on those features.


The data was scraped from Glassdoor using Scrapy. Over a million observations were collected across 2,500 companies. The data contains:

  • Name of the company
  • Industry
  • Revenue
  • List of benefits
  • Reviews:
    • Ratings:
      • Overall
      • Career opportunities
      • Compensation and benefits
      • Work-life balance
      • Senior management
      • Culture and values
    • Location (city, state)
    • Position
    • Former/Current employee
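
As a rough illustration, one scraped review could be represented by a record like the one below. The class name, field names, and example values are all hypothetical, since the scraper's actual schema is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical keys for the six rating categories listed above.
RATING_KEYS = (
    "overall",
    "career_opportunities",
    "compensation_and_benefits",
    "work_life_balance",
    "senior_management",
    "culture_and_values",
)


@dataclass
class Review:
    """One scraped review. Field names are illustrative assumptions."""
    company: str
    industry: str
    revenue: str                      # kept as the raw text shown on the site
    benefits: List[str] = field(default_factory=list)
    ratings: Dict[str, float] = field(default_factory=dict)
    city: Optional[str] = None        # reviewers may omit the location
    state: Optional[str] = None
    position: Optional[str] = None    # reviewers may omit the position
    current_employee: Optional[bool] = None

    def missing_ratings(self) -> List[str]:
        """Return the rating categories this review did not fill in."""
        return [key for key in RATING_KEYS if key not in self.ratings]


example = Review(
    company="ExampleCorp",            # made-up values, not real data
    industry="Software",
    revenue="unknown",
    benefits=["Professional development"],
    ratings={"overall": 4.0, "senior_management": 3.0},
    state="NY",
)
print(example.missing_ratings())
```

Keeping the ratings in a dictionary makes it easy to handle reviews where only some of the categories were filled in.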

What does the data look like?

The following graph shows the densities and boxplots of the mean ratings per company.

This graph shows a few things. First, people seem to give lower ratings to senior management. Second, the career opportunities rating seems to have a smaller variance, with very few extremely low values.
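
The per-company means behind such a graph can be computed with a simple group-by. The sketch below uses only the standard library; the function name, the input layout (pairs of company name and ratings dictionary), and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_per_company(reviews, rating):
    """Average one rating category per company.

    reviews: iterable of (company, ratings_dict) pairs (assumed layout).
    Reviews that left the requested category blank are skipped.
    """
    values = defaultdict(list)
    for company, ratings in reviews:
        if rating in ratings:
            values[company].append(ratings[rating])
    return {company: mean(vals) for company, vals in values.items()}


# Made-up sample, not real Glassdoor data.
sample = [
    ("A", {"overall": 4.0, "senior_management": 3.0}),
    ("A", {"overall": 2.0}),
    ("B", {"overall": 5.0, "senior_management": 4.0}),
]
overall_means = mean_rating_per_company(sample, "overall")
print(overall_means)
```

The resulting dictionary of means is what a density plot or boxplot would then be drawn from, one per rating category.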

Improving the overall rating by changing benefits

The most obvious strategy to improve a company's rating is by giving more benefits to their employees. Does this strategy have any impact on the different ratings? What is the relationship between the overall rating and the other ones?

The following graphs help us better understand the relationships between the different ratings and each benefit. I selected the professional development benefit because it shows a linear dependence with some of the ratings.

Although we can observe a linear dependence for the compensation and benefits and the career opportunities ratings, it might not be significant. As we will see in the correlation matrix, most of the benefits don't have any obvious relationship with the ratings. Another issue is that some of the benefits are unbalanced (most of the data sit around the same value), which biases the regression.

The benefits and the ratings are apparently not very correlated (all of these correlations are below 30%). Still, the compensation and benefits rating is more correlated with the benefits than the other ratings are, which has two possible explanations. Either there is a direct link between the two, so that more benefits lead to a better rating, or a company that offers more benefits can also afford to offer better salaries.
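
For reference, each cell of a correlation matrix like this one is a Pearson correlation coefficient between two columns. The sketch below computes it by hand with the standard library; the function name and the toy data (a 0/1 indicator for whether a company offers a benefit, paired with its mean compensation and benefits rating) are assumptions, not the data used in this post.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column (e.g. a benefit every company offers)
        # has no defined correlation.
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up toy data: 1 if the company offers the benefit, else 0.
has_benefit = [0, 0, 1, 1]
comp_rating = [3.0, 3.5, 3.5, 4.0]
r = pearson(has_benefit, comp_rating)
print(round(r, 3))
```

The constant-column check is related to the unbalanced benefits mentioned earlier: when almost every company has the same value for a benefit, the estimate rests on very few informative points.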

The correlations between the different ratings are all very high. This is an inherent property of the data, which comes from the psychology of rating something along several criteria: if one rating is low, the others may end up lower than they should be, for the sake of consistency. Unfortunately, I do not have any relevant data to confirm or refute that theory.

To summarize, the ratings overall have a very low correlation with the benefits. The compensation and benefits rating is slightly more correlated with them than the other ratings are.

Improving the overall rating differently

One observation we can make about the data is that the overall rating is not the mean of the other ratings; the user provides it "independently".

Knowing this, we might wonder: what is the relationship between the different ratings? Does one of them explain the overall rating better than the others?

Although some ratings show more variance around the regression line (e.g., compensation and benefits), there appears to be a very strong linear dependence between the different ratings, which is consistent with our previous theory.

The slopes of the career opportunities and senior management regression lines are greater than the others, which leads to the hypothesis that they explain the overall rating better.
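
The slope comparison can be sketched with a one-variable least-squares fit of the overall rating on each of the other ratings. The function below is a minimal standard-library version with made-up numbers; it is an assumption about how such lines could be fitted, not the code used to produce the graphs.

```python
def ols_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("cannot fit a line to a constant sequence")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up toy data: a sub-rating (x) against the overall rating (y).
career = [1.0, 2.0, 3.0, 4.0, 5.0]
overall = [1.0, 2.5, 3.0, 3.5, 5.0]
slope, intercept, r_squared = ols_fit(career, overall)
print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```

Because all ratings share the same scale, their slopes are directly comparable. Looking at R squared alongside the slope is a useful complement, since a steep line can still leave a lot of scatter unexplained.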

Ratings and locations

This map displays the mean overall rating per state. It could be improved by adding data, as the per-state means still have a high variance.
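
One way to make that variance visible is to report a standard error next to each state's mean. The sketch below is an assumption about how this could be done, with a hypothetical function name, a hypothetical minimum-review threshold, and made-up ratings.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def state_summary(rows, min_reviews=2):
    """Mean overall rating, standard error, and review count per state.

    rows: iterable of (state, overall_rating) pairs (assumed layout).
    States with fewer than min_reviews reviews are left out, since a
    standard error needs at least two observations.
    """
    threshold = max(min_reviews, 2)
    by_state = defaultdict(list)
    for state, rating in rows:
        by_state[state].append(rating)
    summary = {}
    for state, values in by_state.items():
        if len(values) < threshold:
            continue
        std_error = stdev(values) / sqrt(len(values))
        summary[state] = (mean(values), std_error, len(values))
    return summary


# Made-up sample, not real Glassdoor data.
rows = [("NY", 4.0), ("NY", 2.0), ("CA", 3.0), ("CA", 3.0), ("CA", 3.0), ("TX", 5.0)]
summary = state_summary(rows)
for state, (avg, err, count) in sorted(summary.items()):
    print(state, avg, round(err, 2), count)
```

States with a large standard error are the ones where additional reviews would change the map the most.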


Based on the collected data, we can conclude that a company's overall rating is most sensitive to the career opportunities and senior management ratings. A good strategy would therefore be to focus on improving those ratings. However, investing in the benefits the company provides to its employees also appears to have some effect on the overall rating. In particular, we observed a linear dependence of most of the ratings on the professional development benefit. The same holds for a few other benefits, such as company social events, diversity programs, and, surprisingly, gym membership.

Adding more data to the study could help reduce the selection bias induced by the fact that we scraped the first n pages of Glassdoor's company list, which is sorted by popularity. It would also balance our data by including more companies per industry and per state.

A next step toward a better understanding of this question would be to include the remaining features in the analysis.


About Author

Dan Toledano

Dan has a background in applied mathematics and quantitative finance, with a master's degree in applied mathematics from Sorbonne University in Paris. He specialized in random modeling, with relevant experience as a quantitative researcher. He is passionate...