Credit Card Defaults, Education and Age: Are They Related?

Gregory Domingo
Posted on Jul 22, 2016

Introduction

In the financial services industry, defaults on credit, and the search for ways to understand and control them, have been top of mind for centuries. In this blog post, I try to understand defaults using credit card data from Taiwan.

Even with limited data and a very basic analysis, there appears to be an inverse relationship between credit card default rates and education: people in Taiwan with a higher level of education had lower default rates. Another interesting finding is that there may also be a relationship between age and creditworthiness.

The Data

The data was downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/machine-learning-databases/00350/).

The data had records of 30,000 credit card customers over a period of six months and had 24 customer attributes, including gender, marital status, educational level (graduate, university, high school, others), and an indicator of whether the customer had defaulted at the end of the period.

I adjusted some of the data, removing some outliers and correcting values that were obviously erroneous, but overall the data was very clean.

The Sample

Gender. Females dominate the sample 60:40 (left chart below). The data source did not indicate whether the sample was drawn using random sampling. If it was, this might indicate that credit card companies tend to favor issuing cards to females over males (unless Taiwan's population really is 60% female).

[Figure: bar chart of the sample by gender (left)]

[Figure: bar chart of the sample by marital status (right)]

Marital Status. The chart above on the right shows that the sample has more single than married persons: 46% are married and 53% are single.

[Figure: bar chart of the sample by education level]

Education Level. 16% are high school graduates, 47% are university graduates, and an astounding 35% have graduate degrees.

[Figure: bar chart of the sample by age]

Age. The average age is 35.1 years, although, as the age chart shows, the distribution is skewed toward younger people, particularly those between 20 and 40.

Methodology

The Taiwan credit card data records whether or not each customer defaulted. We refer to this as the outcome. We then look at this outcome against the customer attributes - gender, marital status, education level, and age. Both our outcome and our attributes are, in statistics parlance, discrete - that is, they can be grouped into categories. In such cases, we are mostly limited to bar charts for the analysis.

So bar charts it is. However, I used a special type of bar chart that seems more appropriate for this analysis - the 100% stacked bar chart. A standard stacked bar chart would not suffice because the sample sizes differ across categories, which would make the result difficult to interpret.

For example, take the case of defaults by gender. There are 18,000 females and 12,000 males in the sample. A standard bar chart would likely show more females in default than males simply because there are 50% more females than males in our data. But is that really the information we want? No. What we want is the proportion of females who defaulted relative to all females. This is exactly what a 100% stacked bar chart shows.
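The arithmetic behind this point can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the charts in this post: the function name default_shares and the 30 records are made up for the example (they keep the 60:40 split but are not the Taiwan figures).

```python
from collections import Counter


def default_shares(records):
    """Map each group to (share defaulted, share not defaulted).

    `records` is an iterable of (group, defaulted) pairs.
    """
    totals = Counter()
    defaults = Counter()
    for group, defaulted in records:
        totals[group] += 1
        if defaulted:
            defaults[group] += 1
    return {
        group: (defaults[group] / totals[group], 1 - defaults[group] / totals[group])
        for group in totals
    }


# Made-up illustration data, NOT the Taiwan figures: 18 females, 12 males.
records = (
    [("female", True)] * 4 + [("female", False)] * 14
    + [("male", True)] * 3 + [("male", False)] * 9
)

# Raw counts of defaults: what a standard bar chart would display.
counts = Counter(group for group, defaulted in records if defaulted)
# Within-group shares: what a 100% stacked bar chart would display.
shares = default_shares(records)

for group in ("female", "male"):
    print(f"{group}: {counts[group]} defaults, {shares[group][0]:.1%} of group")
# female: 4 defaults, 22.2% of group
# male: 3 defaults, 25.0% of group
```

In this toy data the larger group has more defaults in absolute terms but the lower default rate, which is the distinction the raw counts hide. Each pair of shares sums to 1, which is why every bar in a 100% stacked chart has the same height.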

 

Findings

The two most interesting results - defaults by education level and by age - are shown and discussed below. The rest of the analyses did not exhibit visible relationships.

Credit Card Defaults and Education Level

[Figure: defaults by education level, shown as proportions]

From the bar chart, we can see that the default rate (the blue segment) decreases as the level of education increases. So there appears to be a relationship between education level and default rates.

We can ignore the category "Others", as these seem to be misclassifications and account for less than 2% of the sample.

Credit Card Defaults and Age

[Figure: defaults by age, shown as proportions]

For this run, I included a 3-point moving average, represented by the black line on the chart. The line smooths out the variation between individual age years.
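For readers unfamiliar with the technique, a 3-point moving average can be sketched as follows. The post does not say whether the window was centered or trailing, so the centered window here is an assumption, and the function name and the default rates are hypothetical values chosen for illustration.

```python
def moving_average_3(values):
    """Centered 3-point moving average.

    The first and last positions have no full window, so they are left as None.
    """
    smoothed = [None] * len(values)
    for i in range(1, len(values) - 1):
        smoothed[i] = (values[i - 1] + values[i] + values[i + 1]) / 3
    return smoothed


# Hypothetical default rates for six consecutive age years (not from the data).
rates = [0.30, 0.24, 0.27, 0.21, 0.18, 0.24]
smoothed = moving_average_3(rates)

for rate, avg in zip(rates, smoothed):
    label = "n/a" if avg is None else f"{avg:.3f}"
    print(f"raw={rate:.2f}  smoothed={label}")
```

Each smoothed point is the mean of an age year and its two neighbours, so a single unusually high or low year moves the line by only a third of its deviation.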

Default rates appear to decline from the early 20s to the early 30s, and then rise slowly but steadily from the early 30s to the early 50s.

The relationship between defaults and age becomes erratic for people aged 60 and older, as that group is quite small, totalling less than 1% of the sample.
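One way to see why a small group produces erratic rates is the standard error of a proportion, sqrt(p(1-p)/n), which grows as the group size n shrinks. The sketch below is a rough illustration that assumes customers default independently; the 20% rate and the group sizes are illustrative values, not figures from the data.

```python
import math


def proportion_standard_error(rate, n):
    """Approximate standard error of an observed default rate in a group of n customers."""
    return math.sqrt(rate * (1 - rate) / n)


assumed_rate = 0.20  # illustrative, not taken from the Taiwan data
for group_size in (1000, 100, 10):
    se = proportion_standard_error(assumed_rate, group_size)
    print(f"n={group_size:>4}: standard error is about {se:.3f}")
```

With only a handful of customers in an age year, the observed rate can swing widely from one year to the next even if the underlying rate is stable.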

Conclusion

It would be interesting to investigate whether the patterns observed in this data also hold outside Taiwan.

With a bigger sample size and monthly time series data covering at least three years, it may be interesting to do a more in-depth analysis of credit card consumer behavior. For example, one could investigate whether there is a relationship between defaults and historical payment behavior patterns such as rising credit card balances.

About Author

Gregory Domingo

Built his career in the financial services industry (fixed income research and fixed income portfolio management) in New York and moved back to the Philippines in 1995. Has been involved since then in senior management positions in both...
