Analyzing Data to Predict Credit Card Churner

Nixon Lim

Posted on May 31, 2021

The skills I demoed here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

Credit card churners mean lost money for the credit card company. In this analysis, I used a dataset of existing and past customers to find what the churners had in common. With that information, I could identify the existing customers who are most likely to churn their credit card.

Data

This dataset consists of customer information, with a total of 21 variables and 10,127 observations. On the dataset's Kaggle page, churning is defined simply as cancelling (attriting) the credit card. The variables are defined as follows:

Variable	Type	Description
Clientnum	Num	Client number. Unique identifier for the customer holding the account
Attrition_Flag	Char	Internal event (customer activity) variable - if the account is closed then 1 else 0
Customer_Age	Num	Demographic variable - Customer's Age in Years
Gender	Char	Demographic variable - M=Male, F=Female
Dependent_count	Num	Demographic variable - Number of dependents
Education_Level	Char	Demographic variable - Educational Qualification of the account holder (example: high school, college graduate, etc.)
Marital_Status	Char	Demographic variable - Married, Single, Unknown
Income_Category	Char	Demographic variable - Annual Income Category of the account holder (< $40K, $40K - $60K, $60K - $80K, $80K - $120K, > $120K, Unknown)
Card_Category	Char	Product Variable - Type of Card (Blue, Silver, Gold, Platinum)
Months_on_book	Num	Months on book (Time of Relationship)
Total_Relationship_Count	Num	Total no. of products held by the customer
Months_Inactive_12_mon	Num	No. of months inactive in the last 12 months
Contacts_Count_12_mon	Num	No. of Contacts in the last 12 months
Credit_Limit	Num	Credit Limit on the Credit Card
Total_Revolving_Bal	Num	Total Revolving Balance on the Credit Card
Avg_Open_To_Buy	Num	Open to Buy Credit Line (Average of last 12 months)
Total_Amt_Chng_Q4_Q1	Num	Change in Transaction Amount (Q4 over Q1)
Total_Trans_Amt	Num	Total Transaction Amount (Last 12 months)
Total_Trans_Ct	Num	Total Transaction Count (Last 12 months)
Total_Ct_Chng_Q4_Q1	Num	Change in Transaction Count (Q4 over Q1)
Avg_Utilization_Ratio	Num	Average Card Utilization Ratio

Data Exploration & Cleaning

Before any exploration or analysis, I converted every "Unknown" value to a null value so that Python would treat it as missing. I also removed 2 columns of irrelevant data. I then used the "Attrition_Flag" variable to split the data into 2 tables, one for attrited customers and one for existing customers, with 1,627 and 8,500 observations, respectively.
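
The cleaning steps can be sketched in plain Python. This is a minimal illustration, not the code used for the analysis. It assumes the rows have already been loaded as dictionaries and that attrited customers carry the label "Attrited Customer" in "Attrition_Flag". The post does not name the two removed columns, so "Unused_A" is a hypothetical stand-in, and the toy rows are made up.

```python
def clean(rows, drop_columns=()):
    """Return copies of the rows with "Unknown" replaced by None
    and the listed columns removed."""
    return [
        {
            key: (None if value == "Unknown" else value)
            for key, value in row.items()
            if key not in drop_columns
        }
        for row in rows
    ]


def split_by_attrition(rows, attrited_label="Attrited Customer"):
    """Split the rows into (attrited, existing) using Attrition_Flag.
    The label text is an assumption about how the flag is encoded."""
    attrited = [r for r in rows if r["Attrition_Flag"] == attrited_label]
    existing = [r for r in rows if r["Attrition_Flag"] != attrited_label]
    return attrited, existing


# Toy rows standing in for the real file; every value here is made up.
# "Unused_A" is a hypothetical name for one of the removed columns.
toy_rows = [
    {"Clientnum": 1, "Attrition_Flag": "Existing Customer",
     "Marital_Status": "Married", "Unused_A": 0.1},
    {"Clientnum": 2, "Attrition_Flag": "Attrited Customer",
     "Marital_Status": "Unknown", "Unused_A": 0.9},
    {"Clientnum": 3, "Attrition_Flag": "Existing Customer",
     "Marital_Status": "Single", "Unused_A": 0.2},
]

cleaned = clean(toy_rows, drop_columns=("Unused_A",))
attrited, existing = split_by_attrition(cleaned)
print(len(attrited), len(existing))
```

On the real data, the same split should give the 1,627 and 8,500 rows reported above.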

I created histograms for the 11 categorical variables and 8 numerical variables in the 2 new tables and compared the peaks and trends.

Analysis

The data visualizations showed that the categorical data did not differ considerably between the attrited and existing customers. However, the histograms of some numerical variables followed different trends and peaked at different points in the 2 tables. I treated these variables as the indicators of becoming an attrited customer. The variables and their peaks among the attrited customers are as follows:

Indicator Variables	Peaks
Total_Revolving_Bal	0
Total_Amt_Chng_Q4_Q1	0.701
Total_Trans_Amt	2,329
Total_Trans_Ct	43
Total_Ct_Chng_Q4_Q1	0.531
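
One way to read a peak off a histogram programmatically is to bin the values and take the fullest bin. The sketch below does that with equal-width bins. The post does not say how the peaks in the table were obtained, so the binning rule, the function name `histogram_peak`, and the toy values are all assumptions for illustration.

```python
def histogram_peak(values, bin_count=10):
    """Return the midpoint of the fullest equal-width bin."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo  # every value is identical, so that value is the peak
    width = (hi - lo) / bin_count
    counts = [0] * bin_count
    for value in values:
        # The maximum value would land in bin `bin_count`; clamp it
        # into the last bin instead.
        index = min(int((value - lo) / width), bin_count - 1)
        counts[index] += 1
    fullest = counts.index(max(counts))  # first bin wins on ties
    return lo + (fullest + 0.5) * width


# Made-up transaction counts for the two tables.
attrited_counts = [40, 41, 42, 43, 43, 44, 45, 60, 70, 90]
existing_counts = [30, 50, 66, 68, 70, 71, 72, 73, 75, 80]

print(histogram_peak(attrited_counts, bin_count=5))
print(histogram_peak(existing_counts, bin_count=5))
```

A variable is a candidate indicator when the two peaks are clearly apart, as they are for these toy values.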

To confirm that the attrited customers cluster around these values, I created scatterplots for each pair of these variables.

Results & Conclusion

With confirmation that the peaks marked where the attrited customers were concentrated, I reasoned that existing customers who fell within the same areas on the scatterplots would also have a high likelihood of attriting. I kept only the existing customers whose values fell within one standard deviation of every peak value. The result was a table of 202 customers who were the most likely to attrit their credit cards.
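
The filter can be sketched as follows. This is a minimal illustration under stated assumptions: the standard deviation of each variable is taken over the attrited customers (the post does not say which table it was computed from), the sample standard deviation is used, and the two-variable toy data is made up. The function name `likely_to_attrit` and the multiplier `k` are hypothetical.

```python
from statistics import stdev


def likely_to_attrit(existing, attrited, peaks, k=1.0):
    """Keep the existing customers that lie within k standard deviations
    of the peak on every variable listed in `peaks`."""
    # Assumption: the spread of each variable is measured on the
    # attrited customers, using the sample standard deviation.
    spreads = {col: stdev([row[col] for row in attrited]) for col in peaks}
    return [
        row for row in existing
        if all(abs(row[col] - peaks[col]) <= k * spreads[col] for col in peaks)
    ]


# Made-up data using two of the variables from the peaks table.
attrited = [
    {"Total_Trans_Ct": 40, "Total_Revolving_Bal": 0},
    {"Total_Trans_Ct": 43, "Total_Revolving_Bal": 0},
    {"Total_Trans_Ct": 46, "Total_Revolving_Bal": 600},
]
existing = [
    {"Clientnum": 1, "Total_Trans_Ct": 44, "Total_Revolving_Bal": 100},
    {"Clientnum": 2, "Total_Trans_Ct": 80, "Total_Revolving_Bal": 0},
    {"Clientnum": 3, "Total_Trans_Ct": 42, "Total_Revolving_Bal": 1500},
    {"Clientnum": 4, "Total_Trans_Ct": 47, "Total_Revolving_Bal": 300},
]
peaks = {"Total_Trans_Ct": 43, "Total_Revolving_Bal": 0}

at_risk = likely_to_attrit(existing, attrited, peaks)
print([row["Clientnum"] for row in at_risk])
```

Because a customer must be close to the peak on every variable at once, each extra variable in `peaks` can only shrink the result.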

This table allows the credit card company to use its resources more efficiently. By targeting these 202 cardholders instead of going through the entire list of 8,500 existing customers, the company has a higher chance of contacting customers before they attrit and of retaining them.

With this method, I can find a larger or smaller group of customers by filtering with more or less than one standard deviation. This can increase efficiency even more: the company can target the customers with the highest likelihood of attriting first and then contact more customers as it deems necessary.
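
Widening the filter step by step amounts to ranking each customer by the smallest number of standard deviations at which they would be included. The sketch below computes that score as the largest distance from any peak, measured in standard deviations. The names `inclusion_score` and `contact_order`, the spreads, and the toy rows are hypothetical, and the spreads are assumed to be nonzero.

```python
def inclusion_score(row, peaks, spreads):
    """Smallest multiple of the standard deviation at which this
    customer passes the filter on every variable."""
    # Assumes every spread is nonzero; a zero spread would divide by zero.
    return max(abs(row[col] - peaks[col]) / spreads[col] for col in peaks)


def contact_order(existing, peaks, spreads):
    """Sort customers so that those closest to the peaks come first."""
    return sorted(existing, key=lambda row: inclusion_score(row, peaks, spreads))


# Made-up peaks, spreads, and customers for illustration.
peaks = {"Total_Trans_Ct": 43, "Total_Revolving_Bal": 0}
spreads = {"Total_Trans_Ct": 3.0, "Total_Revolving_Bal": 300.0}
existing = [
    {"Clientnum": 1, "Total_Trans_Ct": 44, "Total_Revolving_Bal": 100},
    {"Clientnum": 2, "Total_Trans_Ct": 80, "Total_Revolving_Bal": 0},
    {"Clientnum": 3, "Total_Trans_Ct": 42, "Total_Revolving_Bal": 1500},
    {"Clientnum": 4, "Total_Trans_Ct": 47, "Total_Revolving_Bal": 300},
]

ordered = contact_order(existing, peaks, spreads)
print([row["Clientnum"] for row in ordered])
```

Cutting this list at a score of 1 gives the customers within one standard deviation of every peak; a higher or lower cutoff gives the larger or smaller group.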

Challenges & Future Work

Originally, I was going to find the credit card churners based on a stricter definition: customers who apply for credit cards only to get the rewards and bonuses. Once they receive and use them, they cancel the credit card. In an effort to spend as little money as possible, they would also keep a zero or near-zero total revolving balance.

A group of credit card churners would cost the company a lot of money. If I could compare the dataset of credit card churners to the other customer dataset, I could predict who in the existing customer group is a churner or even who is a churner among the credit card applicants. Unfortunately, the dataset does not provide sufficient information to find this group. I would need the total revolving balance for the entire relationship with the bank and all data on rewards and bonuses.

Moving forward, I would gather the required data and do a more focused credit card churner prediction.

About Author

Nixon Lim

I am a data science fellow at NYC Data Science Academy with a bachelor's degree in Mathematics and Psychology. I am looking for opportunities to improve efficiency and maximize resource utilization using data visualization and statistical analysis.
