Data Analysis on Job Satisfaction
The skills the author demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Being happy at work and satisfied with our position and responsibilities is a real issue. From the company's point of view, happy employees are more productive and improve the company's image, which attracts talented workers at a lower cost. Employees, meanwhile, try to maximize their own job satisfaction.
The following analysis assumes that Glassdoor company ratings are unbiased and locally independent.
Glassdoor provides information on each company, such as the benefits it offers its employees, along with employee reviews and ratings. Each rating may include the employee's location and position. We will examine how the ratings depend on these features.
The data was scraped from Glassdoor using Scrapy. Over a million observations were collected across 2,500 companies. The dataset contains:
- Name of the company
- List of benefits
- Career opportunities
- Compensation and benefits
- Work-life balance
- Senior management
- Culture and values
- Location (city, state)
- Former/Current employee
What does the data look like?
The following graph shows the density and boxplot of the mean rating per company for each rating category.
This graph shows a few things. First, people seem to give lower ratings to senior management. Second, the career opportunities rating seems to have a smaller variance, with very few extreme low values.
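The statistics behind this kind of figure can be sketched in plain Python. This is a minimal sketch that assumes each review has been flattened into (company, category, rating) tuples. The category names and toy records below are hypothetical and are not the scraped dataset. The sketch averages ratings within each company, then describes the spread of those company means per category. A lower senior management mean or a smaller career opportunities variance would appear in this summary.

```python
from collections import defaultdict
from statistics import mean, variance

# Hypothetical toy reviews: (company, rating category, star rating from 1 to 5).
REVIEWS = [
    ("A", "career_opportunities", 4), ("A", "career_opportunities", 4),
    ("A", "senior_management", 3), ("A", "senior_management", 2),
    ("B", "career_opportunities", 3), ("B", "career_opportunities", 4),
    ("B", "senior_management", 2),
    ("C", "career_opportunities", 4),
    ("C", "senior_management", 4), ("C", "senior_management", 3),
]


def company_means(reviews):
    """Average each rating category within each company."""
    buckets = defaultdict(list)
    for company, category, rating in reviews:
        buckets[(company, category)].append(float(rating))
    return {key: mean(values) for key, values in buckets.items()}


def summarize_by_category(means):
    """Describe the distribution of company-level means for each category."""
    per_category = defaultdict(list)
    for (_company, category), value in means.items():
        per_category[category].append(value)
    return {
        category: {
            "mean": mean(values),
            "variance": variance(values),  # sample variance, needs >= 2 companies
            "min": min(values),
        }
        for category, values in per_category.items()
    }


summary = summarize_by_category(company_means(REVIEWS))
for category, stats in sorted(summary.items()):
    print(category, stats)
```

On real data, the per-category lists of company means built inside `summarize_by_category` are the values a density plot or boxplot would be drawn from.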
Improving the overall rating by changing benefits
The most obvious strategy for improving a company's rating is to give its employees more benefits. Does this strategy have any impact on the different ratings? What is the relationship between the overall rating and the other ones?
The following graphs can help us better understand the relationship between each rating and each benefit. I selected the professional development benefit because it shows a linear dependence with some of the ratings.
Although we can observe a linear dependence for the compensation and benefits and the career opportunities ratings, it may not be statistically significant. As we will see in the correlation matrix, most benefits have no obvious relationship with the ratings. Another issue is that some benefits are unbalanced (most observations take nearly the same value), which can bias the regression.
The features do not appear to be strongly correlated (all correlations are below 0.3). Nevertheless, the compensation and benefits rating is more correlated with the benefits than the other ratings are, which could have two explanations. Either the benefits directly drive this rating (the more benefits, the higher the rating), or companies that offer more benefits can also afford to pay better salaries.
The correlations among the ratings themselves are all very high. This is likely an inherent property of the data, stemming from the psychology of rating one thing along several dimensions: if one rating is low, the others may be pulled lower than they should be, for the sake of coherence. Unfortunately, I do not have any relevant data to confirm or refute this theory.
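A correlation matrix like the one described can be computed as in the minimal sketch below. It assumes that each company has been reduced to one row of numeric features: mean ratings, plus benefits encoded numerically (here as 0/1). It also assumes Pearson correlation, because the post does not state which coefficient was used. The column names and values are hypothetical. The 0.3 threshold mirrors the "below 0.3" observation above.

```python
from math import sqrt

# Hypothetical per-company features: mean ratings plus one benefit encoded as 0/1.
COLUMNS = {
    "overall": [3.0, 3.5, 4.0, 4.5],
    "career_opportunities": [2.5, 3.0, 3.5, 4.0],
    "gym_membership": [1, 0, 0, 1],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Pairwise correlations keyed by (row name, column name)."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


def weak_pairs(matrix, threshold=0.3):
    """Distinct feature pairs whose absolute correlation is below the threshold."""
    return sorted(
        (a, b) for (a, b), r in matrix.items() if a < b and abs(r) < threshold
    )


matrix = correlation_matrix(COLUMNS)
print(weak_pairs(matrix))
```

A benefit that every company reports identically has zero variance, so `pearson` raises `ZeroDivisionError` for it. This is the extreme case of the unbalanced-benefit problem mentioned earlier.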
To summarize, the ratings seem to have a very low overall correlation with the benefits, although the benefits are slightly more correlated with the compensation and benefits rating.
Improving the overall rating differently
One observation about the data is that the overall rating is not the mean of the other ratings; it is provided by the user "independently".
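This observation can be checked review by review. The minimal sketch below assumes that each review is a dict holding the overall rating and all five sub-ratings as whole stars. The field names and the two sample reviews are hypothetical. If the overall rating were just the mean rounded to a whole star, it would lie within 0.5 of that mean. The function therefore counts the share of reviews that fall outside that band.

```python
from statistics import mean

# Hypothetical field names for the five rating categories.
SUB_RATINGS = (
    "career_opportunities",
    "compensation_benefits",
    "work_life_balance",
    "senior_management",
    "culture_values",
)


def gap_from_mean(review):
    """Overall rating minus the mean of the five sub-ratings for one review."""
    return review["overall"] - mean(review[key] for key in SUB_RATINGS)


def share_not_rounded_mean(reviews, tol=0.5):
    """Fraction of reviews whose overall rating is further than `tol` from the
    sub-rating mean, i.e. could not be that mean rounded to a whole star."""
    gaps = [gap_from_mean(r) for r in reviews]
    return sum(abs(g) > tol for g in gaps) / len(gaps)


# Two hypothetical reviews: the first clearly departs from its sub-rating mean.
reviews = [
    {"overall": 5, "career_opportunities": 3, "compensation_benefits": 3,
     "work_life_balance": 3, "senior_management": 3, "culture_values": 3},
    {"overall": 4, "career_opportunities": 4, "compensation_benefits": 4,
     "work_life_balance": 4, "senior_management": 3, "culture_values": 4},
]
print(share_not_rounded_mean(reviews))
```

Reviews that skip some sub-ratings would need to be filtered out first, because the sketch expects all five keys to be present.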
Knowing this, we might wonder: what is the relationship between the different ratings? Does one of them explain the overall rating better than the others?
Although some ratings show more variance around the regression line (e.g., compensation and benefits), there appears to be a strong linear dependence between the ratings, which supports our previous theory.
The regression lines for career opportunities and senior management have steeper slopes than the others, which leads to the hypothesis that these two ratings better explain the overall rating.
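Comparing slopes this way amounts to fitting one simple least-squares regression of the overall rating on each sub-rating. The minimal sketch below assumes paired observations stored as parallel lists, such as per-company means. The values and names are hypothetical.

```python
# Hypothetical paired observations (e.g. per-company means), same order in each list.
OVERALL = [2.0, 3.0, 4.0, 5.0]
PREDICTORS = {
    "career_opportunities": [1.0, 2.0, 3.0, 4.0],
    "compensation_benefits": [1.0, 3.0, 5.0, 7.0],
}


def ols_slope(xs, ys):
    """Least-squares slope b of the line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


def rank_by_slope(overall, predictors):
    """Fit one simple regression per sub-rating and sort by slope, steepest first."""
    slopes = {name: ols_slope(xs, overall) for name, xs in predictors.items()}
    return sorted(slopes.items(), key=lambda item: item[1], reverse=True)


for name, slope in rank_by_slope(OVERALL, PREDICTORS):
    print(f"{name}: {slope:.2f}")
```

The slopes are comparable because all ratings share the same 1 to 5 scale. However, the sub-ratings are strongly correlated with one another, so each one-variable slope also absorbs part of the others' effect. Separating their individual contributions would require a multiple regression that includes all the sub-ratings together.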
Ratings and locations
This map displays the mean overall rating per state. It could be improved with more data, since the state-level means still have high variance.
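The per-state aggregation behind such a map, together with a rough measure of how noisy each state's mean is, can be sketched as follows. The sketch assumes that each review's location has been reduced to a state code paired with the review's overall rating. The rows are hypothetical. The standard error of a mean is the standard deviation divided by the square root of the number of reviews, so it shrinks as data are added. This is why more data should stabilize the state means.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical (state, overall rating) pairs taken from reviews with a location.
ROWS = [("NY", 4), ("NY", 2), ("CA", 3)]


def state_summary(rows):
    """Mean overall rating, review count, and standard error of the mean per state."""
    by_state = defaultdict(list)
    for state, overall in rows:
        by_state[state].append(float(overall))
    summary = {}
    for state, values in by_state.items():
        n = len(values)
        # Standard error needs at least two reviews; a lone review gives no spread estimate.
        stderr = stdev(values) / sqrt(n) if n > 1 else None
        summary[state] = {"mean": mean(values), "n": n, "stderr": stderr}
    return summary


for state, stats in sorted(state_summary(ROWS).items()):
    print(state, stats)
```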
Based on the collected data, we can conclude that the overall rating of a company is most sensitive to the career opportunities and senior management ratings. A good strategy would therefore be to focus on improving those ratings. However, investing in the benefits the company provides to its employees may also have an effect on the overall rating. In particular, we observed a linear dependence between the professional development benefit and most of the ratings. The same holds for a few other benefits, such as company social events, diversity programs, and, surprisingly, gym membership.
Adding more data to the study could help reduce the selection bias introduced by scraping only the first n pages of Glassdoor's company list, which is sorted by popularity. It would also balance the data by providing more companies per industry and per state.
A next step toward a better understanding of this question would be to include the remaining features in the analysis.