I Told You So : Higher Education + Work Hard + Happy Marriage = Good Life !!!
Contributed by Conred Wang. He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from September 26th to December 23rd, 2016. This post is based on his first class project, the Exploratory Visualization Project (due in the 3rd week of the program).
Higher Education + Work Hard + Happy Marriage = Good Life.
Really?
Most Asian parents tell their children, "If you have higher education, work hard and are happily married, you will be blessed with a good life." Using R and ggplot2, we visually explored the Adult dataset, from the UC Irvine Machine Learning Repository, to find out whether what Asian parents say is a myth or a fact. In this study, an income of more than $50K is used as a proxy for "a good life".
Data Set
The Adult dataset was originally used as an exercise in predictive analytics, to predict whether a person's income exceeds $50K/year. It is based on the 1994 census data. In this study, the data is used to explore the relationship between education, work hours, marital status and earnings.
- Information can be found at [https://archive.ics.uci.edu/ml/datasets/Adult].
- Dataset can be downloaded at [https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data].
- There are 32,561 observations in the dataset.
- There are 15 attributes.
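The analysis in this post was done in R. As a rough illustration only, the sketch below shows one way the file could be read in Python using just the standard library. The column names are adapted from the dataset's documentation (with underscores in place of hyphens, which is a choice made here), the helper name `read_adult` is hypothetical, and the two sample rows are made up to mimic the file's layout: no header line, fields separated by a comma and a space, and "?" marking a missing value.

```python
import csv
import io

# The 15 attributes, in file order (names adapted; underscores are an assumption).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "income",
]

# Made-up rows in the same layout as adult.data (not real observations).
SAMPLE = """\
39, Private, 120000, Bachelors, 13, Married-civ-spouse, Sales, Husband, Asian-Pac-Islander, Male, 0, 0, 45, United-States, >50K
28, ?, 98000, HS-grad, 9, Never-married, ?, Own-child, White, Female, 0, 0, 40, United-States, <=50K

"""


def read_adult(handle):
    """Return one dict per non-empty line, keyed by COLUMNS."""
    reader = csv.reader(handle, skipinitialspace=True)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


rows = read_adult(io.StringIO(SAMPLE))
print(len(rows), "observations,", len(COLUMNS), "attributes")
```

To read the downloaded file instead of the sample, pass an open file handle, for example `read_adult(open("adult.data", newline=""))`.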
ETL (Extract - Transform - Load)
For the purpose of this investigation, we performed various ETL operations on the dataset:
- After removal of observations with missing values, 30,162 remained.
- We transformed the Marital Status from 7 groups into 5. [Appendix A]
- We transformed the Work Class from 8 groups into 5. [Appendix B]
- The original dataset has two education attributes, "Education Years" and "Education". We mapped Education into eight Education Levels [Appendix C], plotted Education Years against Education Levels, and found that they are related: more Education Years correspond to higher Education Levels. We therefore used Education Years as the education indicator.
- We singled out Asian adults (i.e., Race = "Asian-Pac-Islander") for our investigation. There are 895 Asian adults.
- We further singled out Asian adults who worked in the private sector (i.e., Work Class = "PRIV") in order to focus on a group that would be expected to have the highest variance in income.
- We plotted these Asian private-sector workers' education years, work hours per week and marital status in a scatterplot.
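The steps above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the R code used for the project: the recoding tables are copied from Appendix A and Appendix B, while the field names, the function names `transform` and `asian_private`, and the sample rows are hypothetical.

```python
# Recodings copied from Appendix A and Appendix B.
MARITAL_STATUS = {
    "Divorced": "Divor",
    "Married-AF-spouse": "Marri",
    "Married-civ-spouse": "Marri",
    "Married-spouse-absent": "Marri",
    "Never-married": "Never",
    "Separated": "Separ",
    "Widowed": "Widow",
}
WORK_CLASS = {
    "Federal-gov": "GOV",
    "Local-gov": "GOV",
    "State-gov": "GOV",
    "Never-worked": "NEVER",
    "Private": "PRIV",
    "Self-emp-inc": "SELF",
    "Self-emp-not-inc": "SELF",
    "Without-pay": "NOPAY",
}


def transform(rows):
    """Drop rows with a missing value ("?") and recode the two attributes."""
    cleaned = []
    for row in rows:
        if "?" in row.values():
            continue
        cleaned.append({
            **row,
            "marital_status": MARITAL_STATUS[row["marital_status"]],
            "workclass": WORK_CLASS[row["workclass"]],
        })
    return cleaned


def asian_private(rows):
    """Keep Asian adults working in the private sector (after recoding)."""
    return [
        r for r in rows
        if r["race"] == "Asian-Pac-Islander" and r["workclass"] == "PRIV"
    ]


# Made-up rows with only the fields needed here (field names are assumptions).
sample = [
    {"workclass": "Private", "marital_status": "Married-civ-spouse", "race": "Asian-Pac-Islander"},
    {"workclass": "State-gov", "marital_status": "Never-married", "race": "Asian-Pac-Islander"},
    {"workclass": "?", "marital_status": "Divorced", "race": "White"},
    {"workclass": "Private", "marital_status": "Widowed", "race": "White"},
]

cleaned = transform(sample)
subset = asian_private(cleaned)
print(len(sample), len(cleaned), len(subset))
```

Filtering after recoding keeps the subset condition identical to the one stated in the text (Work Class = "PRIV").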
Moment of truth...
Higher Education + Work Hard + Happy Marriage = ?
Higher Education
- Dad & Mom: You are right.
- The plot shows people who earn more do have a higher mean number of years of education:
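For readers who want to check the comparison numerically rather than from the plot, a group mean is enough. The following is a minimal Python sketch, assuming the data has been reduced to (income, education years) pairs. The values are made up for illustration and the helper name `mean_by_group` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (income, education_years) pairs; not taken from the dataset.
records = [
    (">50K", 16), (">50K", 14), (">50K", 13),
    ("<=50K", 9), ("<=50K", 10), ("<=50K", 13),
]


def mean_by_group(pairs):
    """Average the second element of each pair within each first-element group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


education_means = mean_by_group(records)
print(education_means)
```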
Work Hard
- Dad & Mom: You are right about boys.
- While there is no noticeable difference for females in the number of hours worked, there appears to be a clear difference for males. Men who make more money also work longer hours:
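One way to put a number on this contrast is to compute, separately for each sex, the gap in mean weekly hours between the two income groups. The sketch below is illustrative Python only: the records are invented to mirror the pattern described, and `hours_gap` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical (sex, income, hours_per_week) records; not taken from the dataset.
records = [
    ("Male", ">50K", 50), ("Male", ">50K", 46),
    ("Male", "<=50K", 40), ("Male", "<=50K", 38),
    ("Female", ">50K", 40), ("Female", ">50K", 42),
    ("Female", "<=50K", 40), ("Female", "<=50K", 41),
]


def hours_gap(rows, sex):
    """Mean hours of the >50K group minus mean hours of the <=50K group."""
    high = [h for s, income, h in rows if s == sex and income == ">50K"]
    low = [h for s, income, h in rows if s == sex and income == "<=50K"]
    return mean(high) - mean(low)


gaps = {sex: hours_gap(records, sex) for sex in ("Male", "Female")}
print(gaps)
```

A gap near zero corresponds to "no noticeable difference"; a large positive gap corresponds to higher earners working longer hours.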
Happy Marriage
- Dad & Mom: You are right again!
- The plot shows that, in general, married people make more money than people with any other marital status:
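Because income in this dataset is a two-valued label rather than an amount, "make more money" can be read as "a larger share earns more than $50K". A minimal Python sketch of that share, using the recoded statuses from Appendix A, follows. The records are made up and `high_income_share` is a hypothetical name.

```python
from collections import Counter

# Hypothetical (marital_status, income) records; not taken from the dataset.
records = [
    ("Marri", ">50K"), ("Marri", ">50K"), ("Marri", "<=50K"),
    ("Never", ">50K"), ("Never", "<=50K"), ("Never", "<=50K"), ("Never", "<=50K"),
    ("Divor", "<=50K"),
]


def high_income_share(rows):
    """Fraction of each marital status group whose income label is >50K."""
    totals = Counter(status for status, _ in rows)
    high = Counter(status for status, income in rows if income == ">50K")
    return {status: high[status] / totals[status] for status in totals}


shares = high_income_share(records)
print(shares)
```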
A few thoughts
Two out of three ain't bad, right? Marriage and education both appear highly related to income. However, "hard work" (i.e., length of hours worked) only appears related to income for males.
Appendices
Appendix A : Marital Status Transformation
| Raw Data | Loaded to R |
| --- | --- |
| Divorced | Divor |
| Married-AF-spouse | Marri |
| Married-civ-spouse | Marri |
| Married-spouse-absent | Marri |
| Never-married | Never |
| Separated | Separ |
| Widowed | Widow |
Appendix B : Work Class Transformation
| Raw Data | Loaded to R |
| --- | --- |
| Federal-gov | GOV |
| Local-gov | GOV |
| Never-worked | NEVER |
| Private | PRIV |
| Self-emp-inc | SELF |
| Self-emp-not-inc | SELF |
| State-gov | GOV |
| Without-pay | NOPAY |
Appendix C : Education to Education Level Mapping
| Raw Data Education | Loaded to R Education Level |
| --- | --- |
| Preschool | 1 |
| 1st-4th | 2 |
| 5th-6th | 3 |
| 7th-8th | 3 |
| 9th | 4 |
| 10th | 4 |
| 11th | 4 |
| 12th | 4 |
| HS-grad | 4 |
| Some-college | 4 |
| Assoc-acdm | 5 |
| Assoc-voc | 5 |
| Prof-school | 5 |
| Bachelors | 6 |
| Masters | 7 |
| Doctorate | 8 |
Appendix D : R code
- I gave my first presentation on this investigation on 10/11/2016.
- I used the knitr package with R Markdown to perform ETL, plot, and generate all the slides for the presentation.
- The R code is available at GitHub Gist conredwang / I.Told.You.So..Rmd
(end)