Credit Card Defaults, Education and Age: Are They Related?
Introduction
In the financial services industry, credit defaults and the search for ways to understand and control them have been top of mind for centuries. In this blog post, I attempt to understand them using credit card data from Taiwan.
Even with limited data and a very basic analysis, there appears to be an inverse relationship between credit card default rates and education: people in Taiwan with a higher level of education defaulted less often. Another interesting finding is that there may also be a relationship between age and creditworthiness.
The Data
The data was downloaded from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/machine-learning-databases/00350/).
The data had records of 30,000 credit card customers over a period of six months, with 24 customer attributes that included gender, marital status, educational level (graduate, university, high school, others), and an indicator of whether the customer defaulted at the end of the period.
I adjusted some of the data, including removing some outliers and correcting values that were obviously erroneous, but overall the data was very clean.
The Sample
Gender. Females dominate the sample 60:40 (left chart below). The data source did not indicate whether the sample was generated by random sampling. If it was, this might indicate that credit card companies tend to favor issuing credit cards to females over males (unless females really do make up 60% of Taiwan's population).
Marital Status. The chart above on the right shows that the sample has more single persons: 46% are married and 53% are single.
Age. The average age is 35.1 years, although, as the graph on the right shows, the age distribution is skewed toward younger people, particularly those between 20 and 40.
Methodology
The Taiwan credit card data had information on whether or not each customer defaulted. We refer to this as the outcome. We then look at this outcome against the customer attributes: gender, marital status, education level, and age. Both our outcome and attributes are, in statistics parlance, discrete, meaning they can be grouped into categories. In such cases, we are mostly limited to bar charts for the analysis.
So bar charts it is. However, I used a special type of bar chart that is more appropriate for this analysis: the 100% stacked bar chart. A standard stacked bar chart would not suffice because the sample sizes differ across categories, which would make the result difficult to interpret.
For example, take the case of defaults by gender. There are 18,000 females and 12,000 males in the sample. A standard bar chart is likely to show more females in default than males simply because there are 50% more females than males in our data. But is that really the information we want? The answer is no. What we want is the proportion of females defaulting on their credit card relative to all females, and likewise for males. This is exactly what a 100% stacked bar chart shows.
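The difference between raw counts and within-group proportions can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the toy records are made up, and the column names `gender` and `default` are hypothetical, not the names used in the UCI file.

```python
import pandas as pd

# Made-up toy records (NOT the Taiwan data). Column names are assumptions.
df = pd.DataFrame({
    "gender": ["female"] * 10 + ["male"] * 5,
    "default": [1] * 3 + [0] * 7 + [1] * 2 + [0] * 3,
})

# Raw counts: what a standard stacked bar chart would plot.
counts = pd.crosstab(df["gender"], df["default"])

# Row-normalised percentages: what a 100% stacked bar chart plots.
# normalize="index" divides each row by its own total.
shares = pd.crosstab(df["gender"], df["default"], normalize="index") * 100
```

In this toy sample, females have more defaults in absolute terms (3 versus 2) but a lower default rate (30% versus 40%), which is the distinction the 100% stacked bar chart makes visible. With matplotlib installed, `shares.plot(kind="bar", stacked=True)` would draw the chart itself.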
Findings
The two most interesting results - defaults by education level and by age - are shown below.
Credit Card Defaults and Education Level
From the bar chart on the right, we can see that the rate of default (the blue box) decreases as the level of education increases. So it seems that there is a relationship between education level and rates of default.
We can ignore the category "Others" as these seem to be misclassifications and account for less than 2% of the sample.
Credit Card Defaults and Age
For this run, I included a 3-point moving average, represented by the black line on the chart to the right. The line smooths out the year-to-year variation in the default rate at each age.
Default rates appear to decline from the low 20s to the early 30s, followed by a slow but steady increase from the early 30s to the early 50s.
The relationship between defaults and age becomes erratic for people aged 60 and over, as that group is quite small, totalling less than 1% of the sample.
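As a rough sketch of the 3-point moving average mentioned above, the snippet below smooths a series of default rates indexed by age. The rates are made up for illustration, and the centered window is an assumption: the analysis does not say whether a centered or trailing average was used.

```python
import pandas as pd

# Made-up default rates by age (NOT the Taiwan results).
rate_by_age = pd.Series(
    [0.30, 0.24, 0.27, 0.21, 0.18],
    index=[21, 22, 23, 24, 25],
    name="default_rate",
)

# 3-point moving average: each value is the mean of an age and its two
# neighbours. center=True is an assumption; the first and last ages have
# no full window and come out as NaN.
smoothed = rate_by_age.rolling(window=3, center=True).mean()
```

On real records, a series like `rate_by_age` could be obtained by grouping a 0/1 default indicator by age and taking the mean of each group. Wider windows smooth more, at the cost of hiding genuine changes between neighbouring ages.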
Conclusion
It would be interesting to investigate whether the patterns observed in this data also hold outside Taiwan.
With a bigger sample and monthly time series data covering at least three years, it may be interesting to do a more in-depth analysis of credit card consumer behavior. For example, one could investigate whether there is a relationship between defaults and historical payment behavior patterns such as rising credit card balances.