Employee Attrition Analysis
Introduction
Employee productivity and efficiency are major areas of focus for many of today's top corporations. A company with highly efficient employees gains the benefit of greater output, better products, and superior customer relations. Due to these advantages, vast resources are often used to ensure rapid deployment of new hires into the roles they were brought on to fill. However, even with excellent training programs in place, management will always see a transitional loss in efficiency due to the new employee needing time to acclimate and learn the ins and outs of their new environment.
Because onboarding a new employee inevitably costs productivity, employers who want to maintain a consistent level of performance should aim to minimize the need for new hires and do whatever they can to retain the current employees who already know the ropes. But how can this be achieved? By looking at data provided by IBM data scientists, this project attempts to find trends that could help provide the insight employers need not only to hold on to the productive employees they do have, but to identify the traits of individuals that would provide the most stability for the company.
Dataset Background
The dataset analyzed for this project was found on Kaggle, and was developed as a study resource by IBM data engineers. The data is built on both personal employee information and information about the position each employee held, covering nearly 1500 individuals. The individuals surveyed represent numerous industries and roles to provide broad insights that could be applied to most industries. For this project, the data columns were separated into three buckets: employee background, employment information, and employee satisfaction.
The breakdown is as follows:
Employee Background:
- Age
- Gender
- Education Level
- Commute Length
Employment Information:
- Business travel
- Overtime Y/N
- Wage
Employee Satisfaction:
- Job Satisfaction (Composite Score)
Hypothesis
The initial hypothesis for this investigation is that an employee's wage will be the greatest factor in determining the likelihood of attrition. Making money to support themselves, their family, and their lifestyle is the main purpose of most people's employment. As such, the logical assessment would be that wage carries the greatest weight in predicting whether or not an individual will leave their role. Beyond this initial hypothesis, other interesting questions arise: Will an employee's background or overall satisfaction have as great an impact in their choice to remain with or leave a company as income? If so, what factors will prove to be the most important?
Analysis
Age

Figure 1: Age Distribution by Attrition
Figure 1 showcases the distribution of employee ages for both the attrition and non-attrition groups. The two distributions follow a similar trend, as the majority of employees surveyed were in the 20-50 year old range. This curve is expected given that younger people may still be in school, while older people will begin to retire past a certain age. A clear gap can be observed in the average age of individuals who did or did not leave their role: the attrition distribution is centered around 30 years old, while the non-attrition distribution is centered roughly 5 years older, at 35. This seems to indicate that younger people tend to leave their companies more often than somewhat older people.

Figure 2: Quantity and Percentage of Attrition by Age Groups
Figure 2 provides more insight into the quantity and percentage of attrition by age group. Consistent with Figure 1, Figure 2 shows that the percentage of attrition starts high for the youngest group, bottoms out in the 35-55 year old range, and starts increasing again in the final age group.
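The grouping behind a chart like Figure 2 can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name, the 5-year half-open bins, and the toy records are all assumptions made for the example, and the values are invented rather than taken from the dataset.

```python
from collections import defaultdict


def attrition_by_age_group(records, width=5):
    """Return {(lo, hi): (headcount, attrition_pct)} for half-open age bins [lo, hi).

    `records` is an iterable of (age, left_company) pairs. The bin width of
    5 years is an assumption chosen to mirror the age groups named in the text.
    """
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for age, left_company in records:
        lo = (age // width) * width  # lower edge of the bin this age falls in
        group = (lo, lo + width)
        totals[group] += 1
        if left_company:
            leavers[group] += 1
    return {
        group: (totals[group], 100.0 * leavers[group] / totals[group])
        for group in sorted(totals)
    }


# Toy records (age, left_company): made-up values, NOT from the IBM dataset.
toy_records = [(22, True), (24, False), (31, True), (33, False), (34, False), (41, False)]
age_group_summary = attrition_by_age_group(toy_records)
for (lo, hi), (count, pct) in age_group_summary.items():
    print(f"{lo}-{hi}: n={count}, attrition={pct:.1f}%")
```

Reporting the headcount next to each percentage matters here, because a percentage computed on a very small age group can swing widely with one or two employees.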
Gender
The next classifier to consider is gender. Figure 3 showcases the distribution of male and female employees surveyed, with respect to age. It appears the IBM data scientists did a decent job of obtaining closely matching age distributions for males and females in this survey. This is key to deriving a one-to-one comparison of attrition between the genders at any age group.

Figure 3: Age Distribution by Gender

Figure 4: Gender Proportion and Attrition Percentages
Figure 4 gives clarity on the proportion of men and women surveyed for this dataset. While Figure 3 shows that the age distribution of each gender is very similar, Figure 4 shows that significantly more men were surveyed than women. However, the overall attrition percentages for men and women are about the same, with the attrition rate for men only about 2% higher than that for women around the time of the survey.

Figure 5: Gender Attrition Percentage by Age Group
Figure 5 showcases the slight difference in attrition pattern by gender. Both genders follow a declining attrition pattern in the earliest years and are nearly identical until age group 35-40. It is worth noting that the sample size is much smaller in the first two age groups (as seen in Figure 2), which may account for the disparity in data between genders. Once the data reaches the 35-40 age group, the gender data no longer follows the same pattern. Males continue to follow a smooth decline in attrition that is only interrupted once the earliest retirement ages are reached in age group 45-50 and on.
On the other hand, females follow a much more complicated pattern, where attrition percentage is lowest in age groups 35-40 and 50-55 but bumps up to levels more similar to the men in the other groups. Without more specific data to back any claim of reasoning for this, one may attribute the low female attrition in the 35-40 age group to traditional family roles in which the mother is the primary caregiver for young children. Taking on this role, many women may look for stability in their careers and are therefore dissuaded from leaving their position.
Education Level
The dataset contains 5 levels of education representing the highest level of education achieved by each employee:
- 1: Below College
- 2: Associate's Degree
- 3: Bachelor's Degree
- 4: Master's Degree
- 5: Doctorate

Figure 6: Proportion of Attrition for All Education Levels
As can be seen in Figure 6, attrition is mostly even across all education levels. There is just a slight decline as education level increases. The only level that stands out from the rest is the doctorate education level. The first pie chart in Figure 6 shows that only 3% of employees surveyed had a doctorate. It is possible that roles that require doctorate degrees are so rare, or perhaps so specialized, that an individual would seek to acquire their doctorate for the specific purpose of qualifying for a particular role. Thus once the individual reaches their end goal and is employed in the position or field of their choosing, they are more likely to remain there for the long term.

Figure 7: Age Distribution by Education Level
Figure 7 provides slightly more insight into why attrition slowly decreases with education level. It is not surprising to see that the lowest level of education, below college, has the earliest attrition peak. One can surmise that as individuals without a college education age, they are more likely to return to finish their education and thus elevate themselves to higher status positions. This holds true for each subsequently higher level of education (except associate's degree).
Commuting Time

Figure 8: Percentiles of Commute Length for Attrition and Non-Attrition Employees

Table 1: Statistics of Commute Length for Attrition/Non-Attrition Employees
Figure 8 and Table 1 provide information on how commute length affects an employee's decision to leave their role. Both show a clear trend where employees with longer commutes are more likely to leave their role. This is not surprising, as a longer commute means more travel time as well as a higher cost of travel, which cuts into both the employee's income and free time. Employers looking to open additional office space may want to seriously consider their next location to find a spot close to large population centers or easily accessible via major highways or public transport systems. Such locations are more appealing to employees than those that involve a more arduous commute.
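A percentile comparison of the kind shown in Figure 8 and Table 1 could be produced roughly as follows. This is a sketch under stated assumptions: the helper name `commute_summary` is hypothetical, the quartiles use the "inclusive" method of Python's `statistics.quantiles` (the original analysis may have used a different percentile definition), and the commute values are invented toy numbers, not figures from the dataset.

```python
from statistics import mean, quantiles


def commute_summary(commute_lengths):
    """Return the mean and quartiles of a list of commute lengths."""
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as the whole population, so the cut points stay within min..max.
    q1, q2, q3 = quantiles(commute_lengths, n=4, method="inclusive")
    return {"mean": mean(commute_lengths), "q1": q1, "median": q2, "q3": q3}


# Toy commute lengths per group: made-up values, NOT from the IBM dataset.
stayed = [1, 2, 3, 4, 5]
left = [2, 4, 6, 8, 10]

stayed_stats = commute_summary(stayed)
left_stats = commute_summary(left)
for label, stats in (("non-attrition", stayed_stats), ("attrition", left_stats)):
    print(label, stats)
```

Comparing the same percentile across the two groups, rather than only the means, shows whether the gap holds across the whole distribution or is driven by a few very long commutes.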
Business Travel

Figure 9: Attrition by Frequency of Travel
Figure 9 does not show any convincing trend that the requirement of rare or even frequent business travel has a strong impact on an employee's likelihood to leave their position. On first consideration, one might think that frequent travel would have an impact on a person's decision to remain with a company, as having to be always on the move could be considered a negative aspect of a job to many people. This line of thinking, coupled with the results of Figure 9, would suggest there might be another hidden factor at play, such as additional compensation. However, Figure 10 shows that the frequent travel category actually has the lowest average monthly income. It appears the trend in Figure 9 can be trusted, and that there is no discernible connection between the requirement of business travel and employee attrition.

Figure 10: Monthly Income by Business Travel Requirement
Overtime

Figure 11: Attrition by Overtime Requirement
Figure 11 exhibits a strong relationship between the requirement of overtime and employee attrition. As a healthy work-life balance is very important to many, it is not surprising that being required to work beyond the standard 40-hour work week is a motive for many employees to leave a role.
Wage

Figure 12: Attrition by Monthly Income

Table 2: Attrition Statistics by Monthly Income
The original hypothesis prior to beginning this project was that income would be the single greatest factor in determining an employee's likelihood of remaining in their role. Figure 12 and Table 2 back up the prediction that lower wages would lead to more attrition. Every percentile of wage for employees who left their role is lower than their non-attrition counterparts. Clearly there is a correlation between wages and attrition.
Job Satisfaction
The position outlook feature was created by combining the scores of five other categories related to an employee's perspective on their role: environment satisfaction, job involvement, job satisfaction, relationship satisfaction, and work-life balance. Each of these factors was scored on a 1-5 scale, with 1 being the lowest score and 5 being the highest. The position outlook feature is simply an average of these five scores.
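The composite described above is a plain arithmetic mean, which can be sketched as follows. The field names and the sample employee are hypothetical placeholders chosen for this illustration; the actual column names in the dataset may differ.

```python
from statistics import mean

# Hypothetical field names for the five component scores (assumed, not
# necessarily the dataset's own column names).
OUTLOOK_FIELDS = (
    "environment_satisfaction",
    "job_involvement",
    "job_satisfaction",
    "relationship_satisfaction",
    "work_life_balance",
)


def position_outlook(employee):
    """Average the five component scores into a single outlook score."""
    return mean(employee[field] for field in OUTLOOK_FIELDS)


# Made-up example employee, NOT a record from the IBM dataset.
sample_employee = {
    "environment_satisfaction": 3,
    "job_involvement": 4,
    "job_satisfaction": 2,
    "relationship_satisfaction": 4,
    "work_life_balance": 3,
}
sample_outlook = position_outlook(sample_employee)
print(sample_outlook)
```

An unweighted mean treats all five components as equally important; a weighted average would be a reasonable alternative if some components were known to matter more.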

Figure 13: Attrition by Position Outlook

Table 3: Attrition Statistics by Position Outlook
Figure 13 and Table 3 support the hypothesis that the non-monetary aspects of a role have a discernible impact on an employee's decision to remain at or leave their position: employees with a higher position outlook are less likely to leave. This should not be surprising for anyone who has had the chance to work for companies such as Google or Facebook, which are famous for creating fun, comfortable working environments for their employees. If an employee feels happier at their place of work, they will develop more loyalty there.
Conclusions
The insights that came to light from this project support the hypothesis that wage is a critically impactful feature. Employees who left their role had a much lower average income than their counterparts. Additionally, position outlook and the requirement of overtime proved to be highly correlated with attrition. All of these features seem to be more important than most of the personal information categories. Age, gender, and education level either had no discernible correlation to attrition or were too weak/inconsistent to be considered a major factor for attrition. The only category investigated in the personal information group that showed a significant correlation with attrition was commuting time.
Employers can use the information found here to better prevent unnecessary loss in employee efficiency by taking a look at their own employment practices and comparing themselves to their competitors. By offering more competitive wages, as well as better employee amenities or services, a company can not only expect to retain more of its high-level employees but should also expect to continue to make hires that will remain loyal to the company moving forward.
Next Steps
While the hypothesis was partially confirmed, it could not be fully upheld using the methods employed in this project. Additionally, many of the categories analyzed in this project may be susceptible to multicollinearity, such as the requirement of overtime and job satisfaction. To properly cross-examine all of the features in this dataset by hand using the methods of this project would be impractical. If the multicollinearity could be accounted for, machine learning techniques could be utilized to discover much more information about this dataset, such as being able to predict an employee's likelihood of attrition or the salary level required to attain a desired level of certainty of retaining an employee. A feature importance list could also be generated to give an answer to the second half of the original hypothesis.
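As a first-pass screen for the multicollinearity concern raised above, pairwise Pearson correlation between two features can be computed directly. This is only a sketch: pairwise correlation does not catch every form of multicollinearity (a variance inflation factor analysis would be more thorough), the function below is a hand-written illustration rather than anything used in this project, and the overtime flags and satisfaction scores are invented toy values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


# Toy data, NOT from the IBM dataset: overtime encoded as 0/1 and a
# satisfaction score for the same four hypothetical employees.
overtime_flag = [0, 0, 1, 1]
satisfaction = [4, 3, 2, 1]

overtime_vs_satisfaction = pearson(overtime_flag, satisfaction)
print(f"r = {overtime_vs_satisfaction:.3f}")
```

A coefficient close to -1 or 1 between two candidate predictors would suggest they carry overlapping information and should be examined together before their individual effects on attrition are interpreted.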