I Told You So : Higher Education + Work Hard + Happy Marriage = Good Life !!!

Posted on Oct 21, 2016

Contributed by Conred Wang.  He is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program taking place from September 26th to December 23rd, 2016. This post is based on his first class project, the Exploratory Visualization Project (due in the 3rd week of the program).

 

 Table of Contents

  1. Really?
  2. Data Set
  3. ETL (Extract - Transform - Load)
  4. Moment of truth...

    1. Higher Education
    2. Work Hard
    3. Happy Marriage
  5. A few thoughts
  6. Appendices

    1. Appendix A : Marital Status Transformation
    2. Appendix B : Work Class Transformation
    3. Appendix C : Education to Education Level Mapping
    4. Appendix D : R code
  7. (end)

 

Higher Education + Work Hard + Happy Marriage = Good Life.

Really?


Most Asian parents tell their children, "If you have higher education, work hard and are happily married, you will be blessed with a good life."  Using R and ggplot2, we visually explored the Adult dataset, from the UC Irvine Machine Learning Repository, to find out whether what Asian parents say is myth or fact.  In this study, an income of more than $50K is used as a proxy for "a good life".

Higher Education + Work Hard + Happy Marriage = Good Life ?


Data Set

The Adult dataset was originally used as an exercise in predictive analytics to predict whether income exceeds $50K/year.  It is based on 1994 census data.  In this study, the data is used to explore the relationship between education, work hours, marital status and earnings.


 

ETL (Extract - Transform - Load)

For the purpose of this investigation, we performed various ETL operations on the dataset:

  • After removal of observations with missing values, 30,162 remained.
  • We transformed the Marital Status from 7 groups into 5.  [Appendix A]
  • We transformed the Work Class from 8 groups into 5.  [Appendix B]
  • There are two education attributes, "Education Years" and "Education", in the original dataset. We mapped Education into eight Education Levels [Appendix C].  We plotted Education Years against Education Levels and found they are related: more Education Years correspond to higher Education Levels.  Thus, we decided to use Education Years as the education indicator.

Education Years can be used as education indicator

  • We singled out Asian adults (i.e., Race = "Asian-Pac-Islander") for our investigation.  There are 895 Asian adults.
  • We further singled out Asian adults who worked in the private sector (i.e., Work Class = "PRIV") in order to focus on a group that would be expected to have the highest variance in income.

Asian adults mostly worked in the private sector.

  • The following scatterplot shows these Asian private sector workers' education years, work hours per week and marital status:

Asian private sector workers
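The original analysis was done in R with ggplot2. Purely as an illustration, the filtering steps above can be sketched in plain Python. The column names, the reduced column set, and the toy rows are assumptions made for this example, not the author's code; the real file has more columns. In the UCI file a missing value is written as "?", and the sketch filters on the raw label "Private", which the post recodes to "PRIV".

```python
import csv
import io

# Hypothetical, reduced column set (the real Adult file has more columns).
COLUMNS = ["age", "workclass", "education", "education_years",
           "marital_status", "race", "sex", "hours_per_week", "income"]

# Toy rows in the comma-separated layout of the Adult file.
# The values are made up for illustration; "?" marks a missing value.
SAMPLE = """\
39, Private, Bachelors, 13, Never-married, Asian-Pac-Islander, Male, 40, <=50K
50, ?, HS-grad, 9, Married-civ-spouse, White, Male, 45, >50K
28, Private, Masters, 14, Married-civ-spouse, Asian-Pac-Islander, Female, 40, >50K
44, State-gov, Doctorate, 16, Divorced, Asian-Pac-Islander, Male, 50, >50K
"""


def load_rows(text):
    """Parse the text into a list of dicts keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(COLUMNS, row)) for row in reader if row]


def drop_missing(rows):
    """Keep only observations without any "?" field."""
    return [r for r in rows if "?" not in r.values()]


def asian_private(rows):
    """Keep Asian-Pac-Islander adults who work in the private sector."""
    return [r for r in rows
            if r["race"] == "Asian-Pac-Islander" and r["workclass"] == "Private"]


rows = load_rows(SAMPLE)
complete = drop_missing(rows)
subset = asian_private(complete)
print(len(rows), len(complete), len(subset))
```

On the real data, the same two filters would correspond to the 30,162 complete observations and the Asian private-sector subgroup described above.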


Moment of truth...

Higher Education + Work Hard + Happy Marriage = Good Life ?


Higher Education

Higher Education - Dad & Mom: You are right.

  • The plot shows people who earn more do have a higher mean number of years of education:

Plot about Higher Education
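The comparison behind this plot is a group mean: average years of education within each income group. The post's own code is in R; what follows is a minimal Python sketch with made-up toy records and hypothetical field names.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(records, group_key, value_key):
    """Return {group: mean of value_key} for the given records."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Toy records, made up for illustration only.
toy = [
    {"income": "<=50K", "education_years": 9},
    {"income": "<=50K", "education_years": 11},
    {"income": ">50K", "education_years": 13},
    {"income": ">50K", "education_years": 15},
]

means = mean_by_group(toy, "income", "education_years")
print(means)
```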

Work Hard

Work Hard - Dad & Mom: You are right about boys.

  • While there is no noticeable difference for females in the number of hours worked, there appears to be a clear difference for males.  Men who make more money also work longer hours:

Plot about Work Hard
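One way to put a number on this comparison is the gap in mean weekly hours between the two income groups, computed separately for each sex. The Python sketch below is only an illustration with made-up toy records and hypothetical field names; the original plot was produced in R.

```python
from collections import defaultdict
from statistics import mean


def hours_gap_by_sex(records):
    """Return {sex: mean hours of >50K earners minus mean hours of <=50K earners}."""
    hours = defaultdict(list)
    for rec in records:
        hours[(rec["sex"], rec["income"])].append(rec["hours_per_week"])
    gaps = {}
    for sex in {s for s, _ in hours}:
        high = hours.get((sex, ">50K"))
        low = hours.get((sex, "<=50K"))
        if high and low:  # both income groups must be present
            gaps[sex] = mean(high) - mean(low)
    return gaps


# Toy records, made up for illustration only.
toy = [
    {"sex": "Female", "income": "<=50K", "hours_per_week": 40},
    {"sex": "Female", "income": "<=50K", "hours_per_week": 40},
    {"sex": "Female", "income": ">50K", "hours_per_week": 40},
    {"sex": "Female", "income": ">50K", "hours_per_week": 40},
    {"sex": "Male", "income": "<=50K", "hours_per_week": 40},
    {"sex": "Male", "income": "<=50K", "hours_per_week": 42},
    {"sex": "Male", "income": ">50K", "hours_per_week": 48},
    {"sex": "Male", "income": ">50K", "hours_per_week": 50},
]

gaps = hours_gap_by_sex(toy)
print(gaps)
```

A gap near zero for one sex and a clearly positive gap for the other would match the pattern described above.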

Happy Marriage

Happy Marriage - Dad & Mom: You are right again!

  • The plot shows that, in general, married people make more money than people with any other marital status:

Plot about Happy Marriage
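A simple numeric counterpart to this plot is the share of people earning more than $50K within each marital status group. The Python sketch below uses made-up toy records, with hypothetical field names and the shortened labels from Appendix A; it is an illustration, not the author's R code.

```python
from collections import Counter


def high_income_share(records):
    """Return {marital_status: fraction of records with income > $50K}."""
    totals = Counter(r["marital_status"] for r in records)
    high = Counter(r["marital_status"] for r in records if r["income"] == ">50K")
    return {status: high[status] / totals[status] for status in totals}


# Toy records, made up for illustration only.
toy = (
    [{"marital_status": "Marri", "income": ">50K"}] * 2
    + [{"marital_status": "Marri", "income": "<=50K"}] * 2
    + [{"marital_status": "Never", "income": ">50K"}] * 1
    + [{"marital_status": "Never", "income": "<=50K"}] * 3
)

shares = high_income_share(toy)
print(shares)
```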


A few thoughts

Two out of three ain't bad, right?  Marriage and education both appear strongly related to income.  However, "hard work" (i.e., number of hours worked) only appears related to income for males.



Appendices

Appendix A : Marital Status Transformation

Raw Data                 Loaded to R
Divorced                 Divor
Married-AF-spouse        Marri
Married-civ-spouse       Marri
Married-spouse-absent    Marri
Never-married            Never
Separated                Separ
Widowed                  Widow
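In the table above, each loaded value is the first five characters of the raw label, which is what merges the three "Married-..." categories into one. Assuming that is how the recoding works (the post does not say so explicitly), a minimal Python sketch of the idea looks like this; the original transformation was done in R.

```python
# Raw marital status labels as listed in the table above.
RAW_MARITAL_STATUS = [
    "Divorced",
    "Married-AF-spouse",
    "Married-civ-spouse",
    "Married-spouse-absent",
    "Never-married",
    "Separated",
    "Widowed",
]


def shorten(label, width=5):
    """Keep the first `width` characters (an assumed rule inferred from the table)."""
    return label[:width]


marital_map = {raw: shorten(raw) for raw in RAW_MARITAL_STATUS}
groups = sorted(set(marital_map.values()))
print(groups)
```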



Appendix B : Work Class Transformation

Raw Data                 Loaded to R
Federal-gov              GOV
Local-gov                GOV
Never-worked             NEVER
Private                  PRIV
Self-emp-inc             SELF
Self-emp-not-inc         SELF
State-gov                GOV
Without-pay              NOPAY
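The table above is a plain lookup, so it can be expressed as a dictionary. The Python sketch below mirrors the table; the function name is hypothetical, and stripping whitespace is an assumption made because fields in the raw UCI file follow a comma and a space. The original recoding was done in R.

```python
# Work class recoding, copied from the table above.
WORK_CLASS = {
    "Federal-gov": "GOV",
    "Local-gov": "GOV",
    "Never-worked": "NEVER",
    "Private": "PRIV",
    "Self-emp-inc": "SELF",
    "Self-emp-not-inc": "SELF",
    "State-gov": "GOV",
    "Without-pay": "NOPAY",
}


def recode_work_class(raw):
    """Map a raw work class label to its group; raises KeyError if unknown."""
    return WORK_CLASS[raw.strip()]


print(recode_work_class(" Private"))
print(len(set(WORK_CLASS.values())))
```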



Appendix C : Education to Education Level Mapping

Raw Data (Education)     Loaded to R (Education Level)
Preschool                1
1st-4th                  2
5th-6th                  3
7th-8th                  3
9th                      4
10th                     4
11th                     4
12th                     4
HS-grad                  4
Some-college             4
Assoc-acdm               5
Assoc-voc                5
Prof-school              5
Bachelors                6
Masters                  7
Doctorate                8
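To see which raw labels end up in each of the eight levels, the mapping above can be inverted. The Python sketch below copies the table into a dictionary and groups the labels by level; the variable names are hypothetical and the original mapping was done in R.

```python
from collections import defaultdict

# Education to Education Level, copied from the table above.
EDUCATION_LEVEL = {
    "Preschool": 1, "1st-4th": 2, "5th-6th": 3, "7th-8th": 3,
    "9th": 4, "10th": 4, "11th": 4, "12th": 4,
    "HS-grad": 4, "Some-college": 4,
    "Assoc-acdm": 5, "Assoc-voc": 5, "Prof-school": 5,
    "Bachelors": 6, "Masters": 7, "Doctorate": 8,
}

# Invert the mapping: level -> list of raw education labels.
labels_by_level = defaultdict(list)
for label, level in EDUCATION_LEVEL.items():
    labels_by_level[level].append(label)

for level in sorted(labels_by_level):
    print(level, labels_by_level[level])
```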



Appendix D : R code

  • I gave my first presentation on this investigation on 10/11/2016.
  • I used the knitr package with R Markdown to perform ETL, plot and generate all the slides for the presentation.
  • The R code is available as a GitHub Gist: conredwang / I.Told.You.So..Rmd

(end)

 

 

 

About Author

Conred

As a software engineer, scrum master and project management professional, Conred Wang believes in, "Worry less, smile more. Don't regret, just learn and grow.", which motivated him to study at NYCDSA and become a data scientist. His exposure...