Data Analysis on Job Satisfaction

Dan Toledano

Posted on May 2, 2020

The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Introduction

Being happy at work and satisfied with our position and responsibilities is a real issue. From the company's point of view, having happy employees increases productivity and improves the company's image, which helps attract talented workers at a lower cost. Meanwhile, employees try to maximize their own job satisfaction.

The following analysis assumes that Glassdoor's company ratings are unbiased and locally independent.

Glassdoor provides information on each company, such as the list of benefits offered to its employees, along with employee reviews and ratings. Each rating may include the location and position of the employee. We will examine how the ratings depend on those features.

Data

The data was scraped from Glassdoor using Scrapy. Over a million observations were collected across 2,500 companies. The dataset contains:

Name of the company
Industry
Revenue
List of benefits
Reviews:
- Ratings:
  - Overall
  - Career opportunities
  - Compensation and benefits
  - Work-life balance
  - Senior management
  - Culture and values
- Location (city, state)
- Position
- Former/Current employee
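
The fields listed above could be held in records like the following. This is only a sketch: the class and field names (`Ratings`, `Review`, `Company`) are hypothetical and are not the schema used by the original scraper. Marking the sub-ratings as optional is also an assumption, made because a review may omit some details.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Ratings:
    # Only the overall rating is required here; the sub-ratings are assumed
    # to be optional.
    overall: float
    career_opportunities: Optional[float] = None
    compensation_and_benefits: Optional[float] = None
    work_life_balance: Optional[float] = None
    senior_management: Optional[float] = None
    culture_and_values: Optional[float] = None


@dataclass
class Review:
    ratings: Ratings
    city: Optional[str] = None
    state: Optional[str] = None
    position: Optional[str] = None
    current_employee: Optional[bool] = None  # False means former employee


@dataclass
class Company:
    name: str
    industry: str
    revenue: str
    benefits: List[str] = field(default_factory=list)
    reviews: List[Review] = field(default_factory=list)


# Made-up example values, not taken from the scraped data.
example = Company(
    name="Example Corp",
    industry="Software",
    revenue="Unknown",
    benefits=["Professional Development", "Gym Membership"],
    reviews=[Review(Ratings(overall=4.0, senior_management=3.0), state="NY")],
)
```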

What does the data look like?

The following graph shows the densities and boxplots of the mean ratings per company.

This graph shows a few things. First, people tend to give senior management a lower rating than the other categories. Second, the career opportunities rating seems to have a smaller variance, with very few extremely low values.
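
The quantity plotted, the mean rating per company, could be computed along these lines. This is a sketch using only the standard library; the row layout `(company, rating_name, value)` and the function name are assumptions, and the numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Made-up (company, rating_name, value) rows for illustration only.
rows = [
    ("A", "senior_management", 2.0),
    ("A", "senior_management", 4.0),
    ("A", "career_opportunities", 4.0),
    ("B", "senior_management", 1.0),
    ("B", "senior_management", 3.0),
    ("B", "career_opportunities", 3.0),
    ("B", "career_opportunities", 5.0),
]


def mean_rating_per_company(rows):
    """Return {rating_name: {company: mean value}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for company, rating_name, value in rows:
        grouped[rating_name][company].append(value)
    return {
        rating_name: {c: mean(values) for c, values in by_company.items()}
        for rating_name, by_company in grouped.items()
    }


company_means = mean_rating_per_company(rows)

# Spread of the company means for each rating (population variance).
spread = {
    name: pvariance(list(means.values()))
    for name, means in company_means.items()
}
```

Comparing the entries of `spread` is one way to put a number on the observation that some ratings vary less across companies than others.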

Improving the overall rating by changing benefits

The most obvious strategy to improve a company's rating is to give more benefits to its employees. Does this strategy have any impact on the different ratings? What is the relationship between the overall rating and the other ones?

The following graphs help us better understand the relationships between the different ratings and each benefit. I selected the professional development benefit because it shows a linear dependence with some of the ratings.

Although we can observe a linear dependence for the compensation and benefits and career opportunities ratings, it might not be significant. As the correlation matrix shows, most of the benefits have no obvious relationship with the ratings. Another issue is that some of the benefits are unbalanced (most of the data sits around the same value), which biases the regression.

Findings

The benefits and the ratings are apparently not very correlated (all of the correlations are below 30%). Even so, the compensation and benefits rating is more correlated with the benefits than the other ratings are, which can be explained in two ways. Either there is a direct link between the two, so that the more benefits a company offers, the better this rating is; or a company that offers more benefits can also afford to offer better salaries.

The correlations between the different ratings are all very high. This is likely an inherent property of the data, stemming from the psychology of rating something on several scales at once. If one of the ratings is low, the others might be lower than they should be, for the sake of coherence. Unfortunately, I do not have any relevant data to confirm or refute that theory.

To summarize, the benefits seem to have a very low correlation with the ratings overall. Yet they are slightly more correlated with the compensation and benefits rating.
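
For reference, a correlation between one benefit and one rating could be computed as below. The post does not say which coefficient was used, so the Pearson coefficient here is an assumption. The 0/1 encoding of the benefit and all of the values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Happens for a fully unbalanced benefit (same value everywhere).
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up per-company values: 1 if the benefit is offered, else 0,
# paired with that company's mean compensation and benefits rating.
offers_benefit = [0, 0, 1, 1, 1, 0]
comp_rating = [3.0, 3.4, 3.6, 3.2, 3.8, 3.1]
r = pearson(offers_benefit, comp_rating)
```

The explicit error for a constant sequence reflects the unbalanced-benefit issue: when nearly every company has the same value for a benefit, the coefficient becomes unstable or undefined.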

Improving the overall rating differently

An observation we can make on the data is that the overall rating is not the mean of the other ratings but is provided by the user "independently".

Knowing this, we might wonder: What is the relation between the different ratings? Does one of them explain the overall rating better than the others?

Although some ratings have more variance around the regression line (e.g. compensation and benefits), there appears to be a very strong linear dependence between the different ratings, which supports our previous theory.

The slopes of the career opportunities and senior management regression lines are greater than the others, which leads to the hypothesis that these two ratings better explain the overall rating.
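
The slope comparison can be sketched with an ordinary least squares fit of the overall rating on each of the other ratings, one at a time. The function name `fit_line` and the numbers are hypothetical. The sketch also assumes all ratings share the same scale, which is what makes comparing slopes meaningful.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when x is constant")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up company-level mean ratings, for illustration only.
overall = [2.0, 3.0, 4.0, 5.0]
other_ratings = {
    "career_opportunities": [2.0, 3.0, 4.0, 5.0],
    "work_life_balance": [1.0, 3.0, 3.0, 5.0],
}

# Slope of the overall rating regressed on each other rating.
slopes = {
    name: fit_line(values, overall)[0]
    for name, values in other_ratings.items()
}
steepest = max(slopes, key=slopes.get)
```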

Ratings and locations

This map displays the mean overall rating per state. It could be improved by adding data, as the state means still have a high variance.
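
One way to make the variance of the state means visible is to report a standard error next to each mean. The sketch below assumes reviews are available as `(state, overall_rating)` pairs; the function name and the values are made up for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def state_summary(reviews):
    """Return {state: (mean rating, standard error, number of ratings)}."""
    by_state = defaultdict(list)
    for state, rating in reviews:
        by_state[state].append(rating)
    summary = {}
    for state, ratings in by_state.items():
        n = len(ratings)
        # The standard error needs at least two ratings; use None otherwise.
        se = stdev(ratings) / sqrt(n) if n > 1 else None
        summary[state] = (mean(ratings), se, n)
    return summary


# Made-up (state, overall rating) pairs.
reviews = [("NY", 3.0), ("NY", 5.0), ("NY", 4.0), ("NY", 4.0), ("WY", 2.0)]
summary = state_summary(reviews)
```

States with few ratings get a large or undefined standard error, which is consistent with the remark that more data would improve the map.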

Conclusion

Based on the collected data, we can conclude that the overall rating of a company is most sensitive to the career opportunities and senior management ratings. A good strategy would therefore be to focus on improving those ratings. However, investing in the benefits the company provides to its employees also has an effect on the overall rating. In particular, we observed a linear dependence of most of the ratings on the professional development benefit. The same holds for a few other benefits, such as company social events, diversity programs and, surprisingly, gym membership.

Adding more data to the study could help reduce the selection bias induced by the fact that we scraped the first n pages of Glassdoor's company list, which is sorted by popularity. It would also balance our data by including more companies per industry and per state.

A next step toward a better understanding of this question would be to include the remaining features in the analysis.

About Author

Dan Toledano

Dan has a background in applied mathematics and quantitative finance, with a master's degree in applied mathematics from Sorbonne University in Paris. He specialized in random modeling, with relevant experience as a quantitative researcher. He is passionate...

