Analyzing Data to Predict Credit Card Churners

Posted on May 31, 2021
The skills I demoed here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.


Credit card churners mean lost revenue for the credit card company. In this analysis, I used a dataset of existing and past customers to find what the churners had in common. With that information, I could identify the group within the existing client base that is most likely to cancel their credit cards.


This dataset consists of customer information, with a total of 21 variables and 10,127 observations. On the dataset's Kaggle page, churning is defined simply as cancelling, or attriting, the credit card. The variables are defined as follows:

Variable | Type | Description
Clientnum | Num | Client number; unique identifier for the customer holding the account
Attrition_Flag | Char | Internal event (customer activity) variable; 1 if the account is closed, else 0
Customer_Age | Num | Demographic variable; customer's age in years
Gender | Char | Demographic variable; M = Male, F = Female
Dependent_count | Num | Demographic variable; number of dependents
Education_Level | Char | Demographic variable; educational qualification of the account holder (e.g. high school, college graduate)
Marital_Status | Char | Demographic variable; Married, Single, Unknown
Income_Category | Char | Demographic variable; annual income category of the account holder (< $40K, $40K-$60K, $60K-$80K, $80K-$120K, > $120K, Unknown)
Card_Category | Char | Product variable; type of card (Blue, Silver, Gold, Platinum)
Months_on_book | Num | Months on book (length of the relationship)
Total_Relationship_Count | Num | Total number of products held by the customer
Months_Inactive_12_mon | Num | Number of months inactive in the last 12 months
Contacts_Count_12_mon | Num | Number of contacts in the last 12 months
Credit_Limit | Num | Credit limit on the credit card
Total_Revolving_Bal | Num | Total revolving balance on the credit card
Avg_Open_To_Buy | Num | Open-to-buy credit line (average of the last 12 months)
Total_Amt_Chng_Q4_Q1 | Num | Change in transaction amount (Q4 over Q1)
Total_Trans_Amt | Num | Total transaction amount (last 12 months)
Total_Trans_Ct | Num | Total transaction count (last 12 months)
Total_Ct_Chng_Q4_Q1 | Num | Change in transaction count (Q4 over Q1)
Avg_Utilization_Ratio | Num | Average card utilization ratio

Data Exploration & Cleaning

Before any exploration or analysis, I recoded every "Unknown" value as a null value in Python. I also removed 2 columns of irrelevant data. I then used the "Attrition_Flag" variable to split the data into two tables, one of attrited customers and one of existing customers. The attrited and existing customer tables had 1,627 and 8,500 observations, respectively.
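The cleaning and splitting steps above can be sketched in pandas as follows. This is a minimal sketch, not the author's code: the helper name `clean_and_split`, the `drop_cols` argument (the post does not name the 2 removed columns) and the label `"Attrited Customer"` are assumptions. The variable table describes the flag as 1/0, so pass `attrited_label=1` if your copy is coded that way.

```python
import numpy as np
import pandas as pd


def clean_and_split(df, drop_cols=None, flag_col="Attrition_Flag",
                    attrited_label="Attrited Customer"):
    """Hypothetical helper: null out "Unknown", drop columns, split by churn.

    `drop_cols` and `attrited_label` are assumptions; the post does not name
    the removed columns or show how the flag is coded in the raw file.
    """
    # Treat the literal string "Unknown" as missing so pandas sees it as null.
    cleaned = df.replace("Unknown", np.nan).drop(columns=drop_cols or [])
    is_attrited = cleaned[flag_col] == attrited_label
    attrited = cleaned[is_attrited].reset_index(drop=True)
    existing = cleaned[~is_attrited].reset_index(drop=True)
    return attrited, existing


if __name__ == "__main__":
    # Tiny made-up frame for illustration only, not rows from the real dataset.
    sample = pd.DataFrame({
        "Attrition_Flag": ["Existing Customer", "Attrited Customer",
                           "Existing Customer"],
        "Education_Level": ["Graduate", "Unknown", "Unknown"],
        "Unused": [0, 0, 0],
    })
    attrited, existing = clean_and_split(sample, drop_cols=["Unused"])
    print(len(attrited), len(existing))  # 1 2
```

On the full dataset, the two returned tables should have 1,627 and 8,500 rows if the flag label matches.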

I created histograms for the 11 categorical variables and 8 numerical variables in each of the two new tables and compared their peaks and trends.
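Because the two tables differ in size (1,627 vs 8,500 rows), raw-count histograms of a categorical variable differ in height even when the mix of categories is the same. A minimal sketch, assuming pandas and using a hypothetical helper name, compares each category's share within each table instead, which is one way to check whether a categorical variable really differs between the groups:

```python
import pandas as pd


def compare_category_shares(attrited, existing, column):
    """Hypothetical helper: share of each category in the two tables.

    Missing values (the former "Unknown") are excluded by value_counts.
    """
    shares = pd.DataFrame({
        "attrited": attrited[column].value_counts(normalize=True),
        "existing": existing[column].value_counts(normalize=True),
    }).fillna(0.0)  # a category absent from one table gets a share of 0
    # Large absolute gaps flag categories that look different between groups.
    shares["abs_diff"] = (shares["attrited"] - shares["existing"]).abs()
    return shares.sort_values("abs_diff", ascending=False)
```

Small values in `abs_diff` across all categories correspond to the "did not vary considerably" observation for a categorical variable.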


The visualizations showed that the categorical variables did not differ considerably between attrited and existing customers. However, some histograms of the numerical variables followed different trends and peaked at different points in the two tables. I treated these numerical variables as the ones most associated with a customer attriting (attrition itself is the dependent variable). Those variables and their peaks are as follows:

Variable | Peak (attrited customers)
Total_Revolving_Bal | 0
Total_Amt_Chng_Q4_Q1 | 0.701
Total_Trans_Amt | 2,329
Total_Trans_Ct | 43
Total_Ct_Chng_Q4_Q1 | 0.531
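The post does not say how each peak was read off the histograms. One plausible reading, used in this sketch, is the centre of the tallest histogram bin. The helper names and the default of 30 bins are assumptions, and the result shifts with the bin count, so this will not necessarily reproduce the exact values in the table.

```python
import numpy as np
import pandas as pd


def histogram_peak(values, bins=30):
    """Centre of the tallest histogram bin, ignoring missing values.

    `bins=30` is an arbitrary assumption; the post does not state a bin count.
    """
    clean = pd.Series(values, dtype=float).dropna().to_numpy()
    counts, edges = np.histogram(clean, bins=bins)
    tallest = int(np.argmax(counts))  # first tallest bin wins a tie
    return (edges[tallest] + edges[tallest + 1]) / 2


def peaks_for(table, columns, bins=30):
    """Peak of each listed numeric column in one table (e.g. attrited customers)."""
    return {col: histogram_peak(table[col], bins=bins) for col in columns}
```

If the peaks were instead taken as the most frequent exact value, `table[col].value_counts().idxmax()` would be the corresponding pandas call.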


To confirm that the attrited customers fell at these values, I created scatterplots for each pair of these variables.

Results & Conclusion

With confirmation that the peaks marked where the attrited customers were concentrated, I reasoned that existing customers falling in the same regions of the scatterplots would also be likely to attrite. I then selected the existing customers whose values fell within one standard deviation of the peak on every one of these variables. The result was a table of 202 customers who were the most likely to close their credit cards.
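The filtering step can be sketched as below, as a minimal sketch under assumptions. The post does not say which standard deviation was used (the attrited table's, the existing table's, or the full dataset's), so `spread` is passed in; `near_all_peaks` and its arguments are hypothetical names. A customer is kept only if every variable lies within `k` spreads of its peak, matching the "all the peak values" condition.

```python
import pandas as pd


def near_all_peaks(existing, peaks, spread, k=1.0):
    """Rows of `existing` within k * spread[col] of peaks[col] for every column.

    `spread` is an assumption: any mapping of column -> standard deviation,
    e.g. attrited[cols].std() or existing[cols].std().
    """
    mask = pd.Series(True, index=existing.index)
    for col, peak in peaks.items():
        # Missing values compare as False, so such rows are dropped.
        mask &= (existing[col] - peak).abs() <= k * spread[col]
    return existing[mask]
```

With `peaks` taken from the table above and `k=1.0`, this expresses the one-standard-deviation filter; the row count it returns depends on which spread is supplied.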

This table lets the credit card company use its resources more efficiently. By targeting these 202 cardholders (about 2.4% of the 8,500 existing customers) instead of working through the entire list, the company has a better chance of reaching customers before they attrite and retaining them.

The same method can produce a larger or smaller group of customers by widening or narrowing the window beyond or within one standard deviation. This can improve efficiency further: the company can contact the customers with the highest likelihood of attriting first, then widen the window to reach more customers as it deems necessary.
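To rank customers rather than apply a single cutoff, one option (an assumption, not something the post describes) is to compute, for each customer, the smallest multiplier k at which they would pass the all-peaks filter: the largest of their per-variable distances from the peaks, measured in spreads. `outreach_priority` is a hypothetical name, and `spread` is the same assumed per-column standard deviation as in a one-standard-deviation filter.

```python
import pandas as pd


def outreach_priority(existing, peaks, spread):
    """Sort customers by the smallest window multiplier k that would include them."""
    # For each column, distance from the peak in units of that column's spread.
    z = pd.DataFrame({col: (existing[col] - peak).abs() / spread[col]
                      for col, peak in peaks.items()})
    # Passing a k-window needs every column within k, so the smallest
    # qualifying k is the largest column distance. skipna=False gives NaN
    # to customers missing any variable, and sort_values puts NaN last.
    k_needed = z.max(axis=1, skipna=False)
    return existing.assign(k_needed=k_needed).sort_values("k_needed")
```

Customers with `k_needed <= 1` are exactly the ones a one-standard-deviation filter keeps; working down the sorted list is the "highest likelihood first" ordering.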

Challenges & Future Work

Originally, I planned to identify credit card churners using a stricter definition: customers who apply for credit cards only to collect the rewards and bonuses, then cancel the card once they have received and used them. To spend as little money as possible, such a customer would also keep a zero or near-zero total revolving balance.

A group of credit card churners like this would cost the company a lot of money. If I could compare a dataset of these churners with the rest of the customer data, I could predict which existing customers, or even which credit card applicants, are churners. Unfortunately, this dataset does not provide enough information to identify that group: I would need the total revolving balance over each customer's entire relationship with the bank and complete data on rewards and bonuses.

Moving forward, I would gather the required data and do a more focused credit card churner prediction.

About Author

Nixon Lim

I am a data science fellow at NYC Data Science Academy with a bachelor's degree in Mathematics and Psychology. I am looking for opportunities to improve efficiency and maximize resource utilization through data visualization and statistical analysis.