I Told You So : Higher Education + Work Hard + Happy Marriage = Good Life !!!

Posted on Oct 21, 2016

Contributed by Conred Wang.  He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, taking place from September 26th to December 23rd, 2016. This post is based on his first class project, the Exploratory Visualization Project (due in the 3rd week of the program).

 

 Table of Contents

  1. Really?
  2. Data Set
  3. ETL (Extract - Transform - Load)
  4. Moment of truth...

    1. Higher Education
    2. Work Hard
    3. Happy Marriage
  5. A few thoughts
  6. Appendices

    1. Appendix A : Marital Status Transformation
    2. Appendix B : Work Class Transformation
    3. Appendix C : Education to Education Level Mapping
    4. Appendix D : R code
  7. (end)

 

Higher Education + Work Hard + Happy Marriage = Good Life.

Really?


Most Asian parents tell their children, "If you have higher education, work hard and are happily married, you will be blessed with a good life."  Using R and ggplot2, we visually explored the Adult dataset from the UC Irvine Machine Learning Repository to find out whether what Asian parents say is myth or fact.  In this study, an income of more than $50K is used as a proxy for "a good life".

Higher Education + Work Hard + Happy Marriage = Good Life ?


Data Set

The Adult dataset was originally built for a predictive analytics exercise: determining whether a person's income exceeds $50K/year.  It is based on 1994 census data.  In this study, we use the data to explore how education, work hours and marital status relate to earnings.


 

ETL (Extract - Transform - Load)

For the purpose of this investigation, we performed various ETL operations on the dataset:

  • After removal of observations with missing values, 30,162 remained.
  • We transformed the Marital Status from 7 groups into 5.  [Appendix A]
  • We transformed the Work Class from 8 groups into 5.  [Appendix B]
  • The original dataset has two education attributes, "Education Years" and "Education". We mapped Education into eight Education Levels [Appendix C].  We plotted Education Years against Education Levels and found that they are related: more Education Years corresponds to a higher Education Level.  Thus, we decided to use Education Years as the education indicator.

Education Years can be used as education indicator

  • We singled out Asian adults (i.e., Race = "Asian-Pac-Islander") for our investigation.  There are 895 Asian adults.
  • We further singled out Asian adults who worked in the private sector (i.e., Work Class = "PRIV") in order to focus on a group that would be expected to have the highest variance in income.

Asian adults mostly worked in the private sector.

  • The following scatterplot shows these Asian private sector workers' education years, work hours per week and marital status:

Asian private sector workers
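
The filtering steps above can be sketched in a few lines. This is a minimal illustration in Python, not the author's R code (which is linked in Appendix D). The column labels are assumptions added here because the raw Adult file has no header row, the three sample rows are made up, and the sketch filters on the raw value "Private" rather than the recoded "PRIV".

```python
import csv
import io

# Labels for the 15 columns of the raw Adult file, in file order.
# The file itself has no header row, so these names are our own.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education_num",
    "marital_status", "occupation", "relationship", "race", "sex",
    "capital_gain", "capital_loss", "hours_per_week", "native_country",
    "income",
]

# Three made-up rows in the same layout, for illustration only.
SAMPLE = """\
39, Private, 100000, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, Asian-Pac-Islander, Male, 0, 0, 40, India, <=50K
50, ?, 100001, HS-grad, 9, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 13, China, <=50K
38, State-gov, 100002, Masters, 14, Divorced, Prof-specialty, Unmarried, White, Female, 0, 0, 50, United-States, >50K
"""


def load_rows(text):
    """Parse comma-separated text into a list of dicts keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def drop_missing(rows):
    """Remove observations in which any field is the missing marker '?'."""
    return [r for r in rows if "?" not in r.values()]


def asian_private(rows):
    """Keep Asian adults who worked in the private sector."""
    return [
        r for r in rows
        if r["race"] == "Asian-Pac-Islander" and r["workclass"] == "Private"
    ]


rows = load_rows(SAMPLE)
complete = drop_missing(rows)
selected = asian_private(complete)
print(len(rows), len(complete), len(selected))
```

On the real file, the same three steps would be applied to the full set of observations instead of the sample string.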


Moment of truth...

Higher Education + Work Hard + Happy Marriage = Good Life ?


Higher Education

Higher Education - Dad & Mom: You are right.

  • The plot shows people who earn more do have a higher mean number of years of education:

Plot about Higher Education
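
The comparison behind this plot is a mean per income group. A minimal Python sketch of that calculation follows (the original analysis used R). The helper name `mean_by_group`, the field names and the four rows are hypothetical, and the rows are made up purely to show the mechanics.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, key, value):
    """Return {group: mean of `value`} for the groups found under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {group: mean(values) for group, values in groups.items()}


# Made-up rows, not taken from the dataset.
rows = [
    {"income": "<=50K", "education_num": 9},
    {"income": "<=50K", "education_num": 11},
    {"income": ">50K", "education_num": 13},
    {"income": ">50K", "education_num": 15},
]

means = mean_by_group(rows, "income", "education_num")
print(means)
```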

Work Hard

Work Hard - Dad & Mom: You are right about boys.

  • While there is no noticeable difference for females in the number of hours worked, there appears to be a significant difference for males.  Men who make more money also work longer hours:

Plot about Work Hard
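
One way to put a number on this comparison is the gap in mean weekly hours between the two income groups, computed separately for each sex. The Python sketch below is an illustration under assumptions, not the author's method: the function name and field names are hypothetical, the rows are made up, and it assumes every sex has at least one row in each income group.

```python
from collections import defaultdict
from statistics import mean


def hours_gap_by_sex(rows):
    """Mean hours of >50K earners minus mean hours of <=50K earners, per sex.

    Assumes each sex has at least one row in both income groups.
    """
    hours = defaultdict(list)
    for row in rows:
        hours[(row["sex"], row["income"])].append(row["hours_per_week"])
    sexes = sorted({sex for sex, _ in hours})
    return {
        sex: mean(hours[(sex, ">50K")]) - mean(hours[(sex, "<=50K")])
        for sex in sexes
    }


# Made-up rows, chosen only to exercise the function.
rows = [
    {"sex": "Female", "income": "<=50K", "hours_per_week": 40},
    {"sex": "Female", "income": ">50K", "hours_per_week": 40},
    {"sex": "Male", "income": "<=50K", "hours_per_week": 40},
    {"sex": "Male", "income": ">50K", "hours_per_week": 50},
]

gaps = hours_gap_by_sex(rows)
print(gaps)
```

A gap near zero means the two income groups work similar hours; a positive gap means higher earners work longer.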

Happy Marriage

Happy Marriage - Dad & Mom: You are right again!

  • The plot shows that, in general, married people make more money than people with any other marital status:

Plot about Happy Marriage
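
A simple numeric counterpart to this plot is the share of people earning more than $50K within each marital status group. The following is a Python sketch with a hypothetical function name and made-up rows (the original analysis was done in R); the marital status codes are the ones listed in Appendix A.

```python
from collections import Counter


def share_over_50k(rows):
    """Return {marital status: fraction of its rows with income '>50K'}."""
    totals = Counter(row["marital_status"] for row in rows)
    high = Counter(
        row["marital_status"] for row in rows if row["income"] == ">50K"
    )
    return {status: high[status] / totals[status] for status in totals}


# Made-up rows, not taken from the dataset.
rows = [
    {"marital_status": "Marri", "income": ">50K"},
    {"marital_status": "Marri", "income": ">50K"},
    {"marital_status": "Marri", "income": "<=50K"},
    {"marital_status": "Marri", "income": "<=50K"},
    {"marital_status": "Never", "income": ">50K"},
    {"marital_status": "Never", "income": "<=50K"},
    {"marital_status": "Never", "income": "<=50K"},
    {"marital_status": "Never", "income": "<=50K"},
]

shares = share_over_50k(rows)
print(shares)
```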


A few thoughts

Two out of three ain't bad, right?  Marriage and education both appear highly related to income.  However, "hard work" (i.e., the number of hours worked) only appears related to income for males.



Appendices

Appendix A : Marital Status Transformation

Raw Data                 Loaded to R
Divorced                 Divor
Married-AF-spouse        Marri
Married-civ-spouse       Marri
Married-spouse-absent    Marri
Never-married            Never
Separated                Separ
Widowed                  Widow
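
The table can be written as a lookup. The Python sketch below is only an illustration (the author's transformation was done in R; see Appendix D). It also checks an observation about the table itself: every code equals the first five characters of the raw value. Whether the author's code actually truncates strings is an assumption and is not confirmed by the post.

```python
# Appendix A as a lookup table: raw value -> code used in the analysis.
MARITAL_STATUS = {
    "Divorced": "Divor",
    "Married-AF-spouse": "Marri",
    "Married-civ-spouse": "Marri",
    "Married-spouse-absent": "Marri",
    "Never-married": "Never",
    "Separated": "Separ",
    "Widowed": "Widow",
}


def recode_marital_status(raw):
    """Return the 5-group code for a raw marital status value."""
    return MARITAL_STATUS[raw.strip()]


# Observation about the table (not taken from the author's code):
# each code is the first five characters of its raw value.
matches_prefix = all(code == raw[:5] for raw, code in MARITAL_STATUS.items())
groups = sorted(set(MARITAL_STATUS.values()))
print(matches_prefix, groups)
```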



Appendix B : Work Class Transformation

Raw Data            Loaded to R
Federal-gov         GOV
Local-gov           GOV
Never-worked        NEVER
Private             PRIV
Self-emp-inc        SELF
Self-emp-not-inc    SELF
State-gov           GOV
Without-pay         NOPAY
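
As a small illustration of how this mapping collapses the eight raw values into five groups, the Python sketch below recodes a list of raw values and counts the result. The function name and the example list are hypothetical, and the original transformation was done in R.

```python
from collections import Counter

# Appendix B as a lookup table: raw value -> group code.
WORK_CLASS = {
    "Federal-gov": "GOV",
    "Local-gov": "GOV",
    "State-gov": "GOV",
    "Never-worked": "NEVER",
    "Private": "PRIV",
    "Self-emp-inc": "SELF",
    "Self-emp-not-inc": "SELF",
    "Without-pay": "NOPAY",
}


def count_by_group(raw_values):
    """Recode raw work class values and count observations per group."""
    return Counter(WORK_CLASS[value] for value in raw_values)


# Made-up list of raw values, for illustration only.
example = ["Private", "State-gov", "Local-gov", "Self-emp-inc", "Private"]
counts = count_by_group(example)
print(counts)
```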



Appendix C : Education to Education Level Mapping

Raw Data (Education)    Loaded to R (Education Level)
Preschool               1
1st-4th                 2
5th-6th                 3
7th-8th                 3
9th                     4
10th                    4
11th                    4
12th                    4
HS-grad                 4
Some-college            4
Assoc-acdm              5
Assoc-voc               5
Prof-school             5
Bachelors               6
Masters                 7
Doctorate               8
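
The same mapping can be held in a dictionary and inverted to see which raw values share a level. This is a Python sketch for illustration, with a hypothetical helper name; the author's mapping was implemented in R.

```python
from collections import defaultdict

# Appendix C as a lookup table: raw Education value -> Education Level.
EDUCATION_LEVEL = {
    "Preschool": 1,
    "1st-4th": 2,
    "5th-6th": 3,
    "7th-8th": 3,
    "9th": 4,
    "10th": 4,
    "11th": 4,
    "12th": 4,
    "HS-grad": 4,
    "Some-college": 4,
    "Assoc-acdm": 5,
    "Assoc-voc": 5,
    "Prof-school": 5,
    "Bachelors": 6,
    "Masters": 7,
    "Doctorate": 8,
}


def values_by_level(mapping):
    """Invert the mapping: {level: [raw values assigned to that level]}."""
    grouped = defaultdict(list)
    for raw, level in mapping.items():
        grouped[level].append(raw)
    return dict(sorted(grouped.items()))


by_level = values_by_level(EDUCATION_LEVEL)
for level, raw_values in by_level.items():
    print(level, raw_values)
```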



Appendix D : R code

  • I gave my first presentation on this investigation on 10/11/2016.
  • I used the knitr package with R Markdown to perform the ETL, create the plots and generate all the slides for the presentation.
  • The R code is available at GitHub Gist conredwang / I.Told.You.So..Rmd

https://gist.github.com/conredwang/5137fa2a6ee5addef4da582abc3f9f07


(end)

 

 

 

