NYC Data Science Academy | Blog

Data-Driven Strategies to Boost Bank Term Deposits and Customer Loyalty

Nawaraj Paudel, PhD
Posted on Sep 28, 2024

Segmentation and Classification Strategies to Boost CD Deposits

Overview

A term deposit, also called a certificate of deposit (CD) or time deposit, is a type of savings account offered by financial institutions. It features a lock-up period that can range from as little as 1 month to several years and provides a better interest rate than traditional bank savings accounts. Short-term CDs (less than a year) generally provide lower interest rates than long-term CDs (more than a year). Despite the promise of better returns (exceeding 6% in 2023) and the same Federal Deposit Insurance Corporation (FDIC) security guarantee of up to $250,000, CDs have been far less popular than savings accounts.

The challenge of this project was to craft a marketing strategy that taps into the huge potential market for CDs. To that end, we studied Portuguese bank data collected from 2008 to 2012. Our goal was to reduce marketing campaign costs and optimize revenue by taking a strategic approach to contacting potential customers. That involved a dual strategy of segmentation and classification. Instead of contacting all customers or doing so randomly, we aimed to identify potential customers who are more likely to be interested in a CD.

Finding the Best Model

We used customer segmentation and binary classification techniques, employing models such as Logistic Regression, Random Forest Classifier, K-Neighbors Classifier, XGBoost Classifier, Voting Classifier, and Neural Networks to predict whether a customer will subscribe to a CD. The hyper-tuned XGBoost model performed the best. It achieved a recall of 77%, meaning that out of all the actual subscribers, it correctly identified 77% of them. It also achieved an F1 score of 0.662, indicating a good balance between precision and recall, especially given the highly imbalanced nature of our data. This performance was achieved with an optimized probability threshold of 0.29. In this blog post, we will lay out recommendations and strategies to boost marketing campaigns based on these data-driven results.

Understanding Savers' Motivation and Concerns

While CDs offer savers the advantage of secured, higher interest rates with no maintenance fee, the drawback that puts some of them off is the penalty for withdrawing money before the maturity date. In contrast, traditional savings accounts typically allow up to six convenient monthly withdrawals before charging a penalty, though this may vary by bank. While the S&P 500 annual return is over 7%, it comes with market volatility and the risk of capital loss. In contrast, CDs provide a secure return, especially appealing when federal interest rates are high. Given that the average year-to-year inflation is over 2%, the CD yield of 6% can be attractive for people who want to stay ahead of inflation but not to risk their capital in the stock market.

Motivation

According to a 2023 Forbes Advisor survey, only 3% of people have never opened a savings account compared to 41% who have never opened a CD. However, the survey also shows that 41% of people opening high-yield savings accounts did so to take advantage of recent interest rate hikes, compared to only 31% of those who opened CDs. This indicates that despite the fact that CDs offer a higher fixed rate of interest than savings accounts do, people still tend to prefer the savings accounts.

Understanding the motivation of banking customers is crucial for developing effective marketing strategies. This raises important questions: Is the lack of interest in CDs due to a lack of product knowledge, ineffective marketing, or simply a preference for more flexible savings options?

To address these questions, we should explore our data to identify the characteristics of potential CD customers and understand the barriers preventing others from considering CDs. For instance, educational marketing campaigns could target customer groups unfamiliar with CDs, such as younger customers or those primarily using digital banking services, by using channels like social media, online ads, and email newsletters to highlight the security and higher interest rates of CDs.

Additionally, product subscription campaigns should focus on risk-averse investors, retirees, and conservative savers. Such campaigns would highlight the stability and guaranteed returns of CDs, especially in a high-interest-rate environment. Personalized marketing through direct mail, in-person demos, and financial advisor recommendations can effectively reach these groups.

Conversely, identifying customer groups who prefer high-risk, high-reward investments, such as active stock market traders, is also important to avoid wasted effort and expense in targeting them with CD promotions. Such groups can be sent materials on products like investment accounts or mutual funds.

By addressing these factors, banks can better target potential customers and enhance the appeal of CDs, ultimately increasing subscriptions and customer loyalty. Strategic marketing campaigns can utilize various channels and mediums to maximize impact, ensuring the messaging and tone align with the business's brand. The four Ps of marketing (product, price, place, and promotion) are key elements in crafting these campaigns. Exploring our data to answer these questions lets us develop targeted marketing strategies that highlight the benefits of CDs and address customer concerns.

Challenges

Given that our data was collected between 2008 and 2012 and the communication method was primarily telephone calls, we faced several challenges in addressing the research questions posed earlier. On the upside, this also presented an exciting opportunity to adapt and innovate. User behavior has shifted significantly from learning about banking products in person to discovering them on digital platforms, where customers can easily access various investment options. This evolution means that, while our historical data may not fully capture the current dynamics and preferences of potential CD customers, it still provides valuable insights into the foundational aspects of bank marketing strategies.

Another issue with the dataset was class imbalance: the random calls yielded a subscription rate of only 11.3%. This imbalance poses a challenge for machine learning models, as it makes it difficult to accurately predict who will subscribe to a CD. Imbalanced datasets can lead to models that are biased towards the majority class, which in this case is those who did not open a CD. As a result, predictive accuracy tends to be lower for the minority class, the people who did open a CD.

Despite these challenges, we were optimistic about the new avenues to reach potential customers more effectively thanks to the evolution of marketing channels and strategies in the digital age. We can leverage social media platforms, educational content, and promotional materials on the homepage of bank apps to engage with customers. These modern channels allow for more targeted and personalized marketing efforts, increasing the likelihood that the right message reaches the right audience.

By combining the insights from our historical data with contemporary marketing techniques, we can still capture the essence of effective bank marketing strategies. This approach not only addresses the challenges posed by the data but also enhances our ability to connect with today's digital-savvy customers. By leveraging innovative and data-driven marketing strategies, we can showcase the advantages of CDs, address any customer concerns, and ultimately boost CD subscriptions and foster customer loyalty.

Methodologies

We collected data from the UCI Machine Learning Repository and performed several preprocessing steps, including handling missing values, encoding categorical variables, and normalizing numerical features. We then carried out model selection, cross-validation, and hyperparameter tuning.

Handling Missing Data

As mentioned above, we had to compensate for certain shortcomings in the data. Missing values in categorical variables were replaced with 'unknown'. The dataset is imbalanced, with 11.3% of customers subscribing to CDs and 88.7% not subscribing. The dataset contains 41,188 rows and 21 columns:

  • Age: The age of the customer.
  • Job: The type of job the customer has.
  • Marital: The marital status of the customer.
  • Education: The level of education of the customer.
  • Default: Indicates if the customer has credit in default.
  • Housing: Indicates if the customer has a housing loan.
  • Loan: Indicates if the customer has a personal loan.
  • Contact: The type of communication contact used.
  • Month: The last contact month of the year.
  • Day_of_week: The last contact day of the week.
  • Duration: The duration of the last contact in seconds.
  • Campaign: The number of contacts performed during this campaign and for this client.
  • Pdays: The number of days that passed by after the client was last contacted from a previous campaign.
  • Previous: The number of contacts performed before this campaign and for this client.
  • Poutcome: The outcome of the previous marketing campaign.
  • Emp.var.rate: Employment variation rate - quarterly indicator.
  • Cons.price.idx: Consumer price index - monthly indicator.
  • Cons.conf.idx: Consumer confidence index - monthly indicator.
  • Euribor3m: Euribor 3 month rate - daily indicator.
  • Nr.employed: Number of employees - quarterly indicator.
  • Subscription: Indicates whether a customer subscribed to CDs.

Customer Segmentation

We divided the features into two segments, demographic and behavioral.

Demographic Segmentation: This included the following categories: age, job, marital, and education.


Figure 1: CD subscription trends by age: Customers over the age of 60 are the most likely to subscribe

The demographic analysis shows that customers aged over 60 are the most likely to subscribe to CDs as shown in Figure 1.


Figure 2: CD subscription trends by job type: Retirees, administrative, and technical roles are most likely to subscribe, while blue-collar workers are less likely

Figure 2 shows that retirees, as well as individuals working in administrative and technician roles, are the most likely to subscribe to CDs. In contrast, blue-collar workers are less likely to subscribe.


Figure 3: CD subscription trends by education level: Individuals with university degrees and professional course certifications are more likely to subscribe

Additionally, single individuals and those with university or professional degrees show a higher likelihood of subscribing to CDs, as shown in Figure 3.

Behavioral Segmentation: This included the categories of default, housing, loan, poutcome, campaign, pdays, and previous.


Housing
Figure 4: CD subscription rates by housing status: House ownership does not influence CD subscription rates

Subscription type by loan
Figure 5: CD subscription rates by loan status: Having a loan does not affect CD subscription rates

Figures 4 and 5 show that housing and personal loans have no significant effect on the likelihood of subscribing.


Figure 6: CD subscription rates by default status: Customers without defaults show higher CD subscription rates

Figure 6 shows customers with no default history are the most likely to subscribe to CDs.


Figure 7: CD subscription rates by previous subscribers: Over 65% of past subscribers purchase again

People who have subscribed before have over a 65% chance of subscribing again as shown in Figure 7. Additionally, customers who were contacted recently have a higher likelihood of subscribing to CDs.


Subscription vs Campaign Number
Figure 8: CD subscription rates by campaign number: The rate of subscription drops exponentially with campaign number

The likelihood of subscribing to CDs decreases exponentially with the number of calls made. Most customers who are likely to subscribe do so within the first few contacts. After 15 contacts, the likelihood of subscription almost drops to zero as shown in Figure 8.

Outlier Handling


Call Duration
Figure 9: The distribution of call duration before and after handling outliers using Winsorization

Outliers in numerical variables were handled using the Winsorization method, capping between the 5th and 95th percentiles. These thresholds were determined using the non-parametric Wilcoxon Rank Sum Test to ensure the distribution of variables remained consistent before and after transformation as shown in Figure 9.
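As a rough illustration of the capping step, the sketch below winsorizes a list of values at the 5th and 95th percentiles using only the Python standard library. The helper names and the sample call durations are hypothetical, and the linear-interpolation percentile rule is an assumption. It does not reproduce the Wilcoxon rank-sum check, which could be run separately (for example with scipy.stats.ranksums).

```python
def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def winsorize(values, lower_q=5, upper_q=95):
    """Cap values below the lower percentile and above the upper percentile."""
    lo = percentile(values, lower_q)
    hi = percentile(values, upper_q)
    return [min(max(v, lo), hi) for v in values]


# Hypothetical call durations in seconds, with one extreme value.
durations = [30, 45, 60, 90, 120, 150, 180, 240, 300, 4000]
capped = winsorize(durations)
print(capped)
```

Values inside the two percentile bounds are left untouched; only the tails are pulled in, so the number of observations stays the same.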

Multicollinearity Check


Correlation among numerical features
Figure 10: Correlation analysis of numerical features: emp.var.rate, euribor3m, nr.employed, and cons.price.idx show high correlations with at least one other feature

We examined potential multicollinearity issues among numerical variables using correlation as shown in Figure 10. The variance inflation factor (VIF) for numerical features is shown in Table 1.

Table 1: Variance Inflation Factor (VIF) for Numerical Features

Numerical Features VIF
age 1.014473
duration 1.013629
campaign 1.040797
pdays 1.453027
previous 1.597798
emp.var.rate 34.874531
cons.price.idx 4.979912
cons.conf.idx 2.889608
euribor3m 68.186368
nr.employed 40.106793

Variables with a correlation higher than 0.8 and a VIF of about 5 or more, such as emp.var.rate, cons.price.idx, euribor3m, and nr.employed, were dropped.
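The VIF for a feature is 1 / (1 - R²), where R² comes from regressing that feature on the other numerical features. The short sketch below inverts this relationship for the values in Table 1 to show how much of each feature's variance the others explain. The cutoff of 5 is taken from the text; the function and variable names are illustrative only.

```python
def r_squared_from_vif(vif):
    """Invert VIF = 1 / (1 - R^2) to recover the implied R^2."""
    return 1.0 - 1.0 / vif


# VIF values copied from Table 1.
table_1_vif = {
    "age": 1.014473,
    "duration": 1.013629,
    "campaign": 1.040797,
    "pdays": 1.453027,
    "previous": 1.597798,
    "emp.var.rate": 34.874531,
    "cons.price.idx": 4.979912,
    "cons.conf.idx": 2.889608,
    "euribor3m": 68.186368,
    "nr.employed": 40.106793,
}

VIF_CUTOFF = 5.0  # cutoff quoted in the text

for name, vif in table_1_vif.items():
    flag = "high" if vif > VIF_CUTOFF else "ok"
    print(f"{name:15s} VIF={vif:9.4f}  implied R^2={r_squared_from_vif(vif):.3f}  {flag}")

# Features strictly above the cutoff.
high_vif = [name for name, vif in table_1_vif.items() if vif > VIF_CUTOFF]
```

With these numbers, euribor3m has an implied R² of about 0.985, meaning the other numerical features explain nearly all of its variance, while cons.price.idx sits just under the cutoff at 4.98.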

Categorical Feature Transformation

Categorical variables with rare subcategories were adjusted. These include housing, default, loan, and marital, where subcategories with low frequency were merged with the dominant category. Job and education contained a number of subcategories with less than 5% frequency, which were combined into a new category named 'Others'.


Barplot of Education Category
Figure 11: CD subscription rates by education level: Count and percentage of subscriptions for each education category. 'Illiterate' and 'Unknown' categories are rare and are merged into a new 'Others' category for predictive modeling.

The subcategories of the education feature are illustrated in Figure 11. For predictive modeling, the categories illiterate and unknown are merged into a new category labeled 'Others'. Because the data was collected over only 10 months of the calendar year, with Q1 and Q4 having lower proportions, we grouped the data into Q2, Q3, and 'Others'.
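The merging of rare subcategories can be sketched as follows. This is a minimal standard-library version under stated assumptions: the 5% cutoff and the 'Others' label come from the text, while the function name and the sample education values are made up for illustration.

```python
from collections import Counter


def merge_rare_categories(values, min_share=0.05, other_label="Others"):
    """Replace categories whose share of rows is below min_share with other_label."""
    counts = Counter(values)
    total = len(values)
    rare = {cat for cat, count in counts.items() if count / total < min_share}
    return [other_label if v in rare else v for v in values]


# Hypothetical education column: two frequent and two rare subcategories.
education = (
    ["university.degree"] * 60
    + ["high.school"] * 36
    + ["illiterate"] * 1
    + ["unknown"] * 3
)
merged = merge_rare_categories(education)
print(Counter(merged))
```

Frequent categories pass through unchanged, so the column keeps its length and only the sparse levels are collapsed.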

We assessed the association of categorical features with the target variable and among themselves using Cramér's V score and the chi-square test. The Cramér's V and chi-square p-values for the test of independence of categorical features with Subscription are shown in Table 2.

Table 2: Cramer's V and Chi-square p-value for test of independence of categorical features with Subscription

Column Cramér's V p-value
job 0.131109 1.399904e-147
marital 0.054133 6.222129e-27
education 0.068280 9.538740e-39
default 0.099219 3.624164e-90
housing 0.011023 2.529191e-02
loan 0.004374 3.747513e-01
contact 0.144709 1.452825e-189
month 0.274401 0.000000e+00
day_of_week 0.025188 2.981214e-05
poutcome 0.320483 0.000000e+00
quarter 0.110048 4.960959e-109

The day_of_week feature, which showed a weak association with the target variable 'Subscription' as illustrated in Table 2, was dropped to avoid sparsity in the model.
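For readers unfamiliar with Cramér's V, here is a minimal sketch of how it can be computed from a contingency table of counts. It uses the plain formula sqrt(chi² / (n * (min(rows, cols) - 1))) with no bias correction, which may differ from the exact implementation behind Table 2. The function name and the toy tables are hypothetical, and every row and column is assumed to have a non-zero total.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


# Toy tables: rows are feature categories, columns are (not subscribed, subscribed).
perfect_association = [[10, 0], [0, 10]]
no_association = [[10, 10], [10, 10]]
print(cramers_v(perfect_association), cramers_v(no_association))
```

The score ranges from 0 (no association) to 1 (perfect association), which is why values such as 0.025 for day_of_week are read as very weak.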

Encoding Categorical Features

Categorical features were one-hot encoded with scikit-learn's OneHotEncoder, dropping one subcategory per feature.

Handling Imbalanced Data

To address class imbalance, we employed several techniques:

  • Oversampling the Minority Class: We duplicated the minority class observations in the training dataset to balance it with the majority class.
  • Class Weight Adjustment: We assigned higher weights to the minority class during model training.
  • Threshold Tuning: The probability threshold for determining crisp labels was fine-tuned, rather than using the default threshold of 0.5.

For models such as Logistic Regression, Random Forest, SVM, and XGBoost, we used class_weight='balanced' provided by the scikit-learn library. For Neural Networks, we used two approaches: optimizing class weights using GridSearch and upsampling the minority class.
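To make the 'balanced' option concrete, the sketch below reproduces the heuristic scikit-learn documents for it, n_samples / (n_classes * class_count), in plain Python. The label vector is synthetic and only mirrors the 11.3% subscription rate; the function name is illustrative. XGBoost's own classifier exposes a scale_pos_weight parameter for a similar purpose, which is not shown here.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Synthetic labels mirroring the 11.3% subscription rate (1 = subscribed).
labels = [1] * 113 + [0] * 887
weights = balanced_class_weights(labels)
print(weights)
```

At this class ratio, an error on a subscriber is weighted roughly eight times more heavily than an error on a non-subscriber.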

Model Selection, Evaluation and Hyperparameter Tuning

We experimented with the following machine learning models:

  • Logistic Regression
  • Random Forest Classifier
  • Support Vector Machine (SVM)
  • K-Nearest Neighbors (KNN)
  • XGBoost Classifier
  • Voting Classifier
  • Neural Network

Evaluation Metrics

The models were evaluated using various sets of metrics, including accuracy, precision, recall, F1 score, and ROC AUC. Since our goal was to accurately predict whether a customer would subscribe to CDs, we focused on optimizing recall while maintaining the highest possible F1 score.
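Accuracy alone is misleading at this class balance: a model that predicts "no subscription" for every customer would already be 88.7% accurate while finding no subscribers. The sketch below shows how precision, recall, and F1 are derived from confusion-matrix counts. The counts in the example are hypothetical and are not taken from our results.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 80 subscribers found, 60 false alarms, 20 subscribers missed.
precision, recall, f1 = precision_recall_f1(tp=80, fp=60, fn=20)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it only rises when both are reasonably high, which suits a campaign where missed subscribers and wasted calls both carry a cost.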

Hyperparameter Tuning

We performed hyperparameter tuning using Grid Search with Cross-Validation to identify the optimal parameters for the best model, selected based on its ability to achieve optimal recall and F1 score, and generalize well to unseen data.

Discussion

Model Performance

We trained several models and evaluated them using the following metrics:

Table 3: Evaluation metrics of select classifier models.

Model                         Train Time (sec)  Train Score  Test Score  Train Precision  Test Precision  Train Recall  Test Recall  Test F1 Score  Test ROC AUC Score
Logistic                      0.1610            0.8381       0.8371      0.3934           0.3914          0.8051        0.8040       0.5265         0.9017
Random Forest                 0.9669            0.8505       0.8488      0.4207           0.4160          0.8681        0.8478       0.5582         0.9238
SVM                           125.6661          0.8453       0.8361      0.4149           0.3957          0.9099        0.8628       0.5426         0.9144
KNN                           0.0238            0.9249       0.8970      0.7512           0.5704          0.4992        0.3468       0.4314         0.8363
XGBoost                       0.3031            0.9539       0.9103      0.8501           0.6236          0.7170        0.5140       0.5636         0.9400
Hypertuned XGBoost            0.4043            0.9219       0.9109      0.6956           0.6333          0.5455        0.4964       0.5565         0.9447
Threshold-Tuned XGBoost*      0.4043            0.9219       0.9109      0.5710           0.5538          0.7883        0.7721       0.6623         0.9446
Hypertuned Voting Classifier  2.3032            0.9296       0.9001      0.6487           0.5467          0.8181        0.7082       0.6169         0.9354
Hypertuned Neural Network     76.1232           0.8157       0.7952      0.6123           0.3412          0.9045        0.8793       0.4917         0.7303

We used logistic regression as a base model and analyzed feature importance based on its odds ratios.


Feature importance
Figure 12: Feature Importance Analysis Using Logistic Regression: Call duration and cons.conf.idx emerged as the most significant features

Figure 12 shows that call duration emerges as the most significant feature, followed by the consumer confidence index, single status, retirees, and the success of previous outcomes. The least significant feature is the method of contact. These feature importances align well with observations from exploratory data analysis (EDA), highlighting the key factors influencing the model's predictions. The features poutcome_nonexistent, previous, loan_yes, and housing_yes were not statistically significant at the 5% level and showed a very weak association with the target variable. Therefore, these features were dropped from the model.


ROC-AUC
Figure 13: ROC-AUC curve comparison: Two best performing hyper-tuned models versus base logistic model

Our goal was to predict whether a customer will subscribe to CDs, focusing on optimizing recall while maintaining a balance with precision to achieve a higher F1 score. Among the models we trained, XGBoost stands out by providing a comparable F1 score to other models but with a better balance between precision and recall. XGBoost also generalizes well to unseen data. For those reasons, we decided to proceed with hyperparameter tuning for the XGBoost model. Although we also hyper-tuned voting classifiers and neural networks, they underperformed compared to the XGBoost model. The XGBoost model was fine-tuned using GridSearchCV, resulting in optimal parameters: {colsample_bytree: 1.0, gamma: 0.5, max_depth: 4, min_child_weight: 5, subsample: 1.0}, achieving a ROC-AUC score of 0.943 as shown in Figure 13.


Threshold tuning
Figure 14: Precision-Recall curve of fine-tuned XGBoost model. The red dot indicates the point where the optimized F1 score is achieved

Further optimization of the probability threshold significantly improved the model's performance on unseen data, boosting recall to 77.21%, precision to 55.4%, and the F1 score to 0.6623.
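One simple way to tune the threshold is to sweep candidate values and keep the one with the highest F1 score, ideally on a validation set rather than the test set. The sketch below illustrates the idea with made-up labels and probabilities. The function names, the 0.01 grid, and the data are all assumptions and do not reproduce our actual tuning run.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 score when predicting positive for probabilities >= threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, y_prob, candidates=None):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Made-up validation labels and predicted probabilities (1 = subscribed).
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.12, 0.20, 0.22, 0.25, 0.30, 0.35, 0.45, 0.80]

threshold = best_threshold(y_true, y_prob)
print(threshold, f1_at_threshold(y_true, y_prob, threshold))
```

On imbalanced data the best threshold often lands well below 0.5, because lowering it recovers subscribers that the default cutoff would miss.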


Confusion Matrix
Figure 15: Confusion matrix for hyper-tuned and threshold-optimized XGBoost model

Given the recall of 77.21%, the model correctly identifies 77.21% of actual subscribers. The remaining 22.79% of actual subscribers are missed. Precision of 55.4% indicates that out of all customers predicted as subscribers, 55.4% are correct, and the rest are false positives. Figure 15 presents the confusion matrix for the unseen data.

Model Deployment

After building the model, we deployed it to the cloud so that stakeholders and non-technical users can use it to inform marketing strategy. We chose Streamlit because of its lightweight design, active community, and effective generative AI integration. Streamlit apps can be hosted on cloud computing platforms such as AWS, Google Cloud, and Azure, enabling web-app capabilities and real-time predictions.


Streamlit Demo Image
Figure 16: The deployment of a machine learning model using Streamlit, showcasing the app interface and providing a comprehensive user guide for seamless interaction and result visualization

Figure 16 illustrates a static image of the Streamlit app, with a detailed user guide, ML model outputs in a human-readable report, and critical demographic and consumer behavioral data for the selected customer.

We should always monitor model drift, which is the loss of a model's capacity to predict CD subscription as a result of changes in real-world situations. Model drift is induced by two sources: Concept drift, where the properties of the dependent variable change, and data drift, where the underlying distributions of the features change over time. To detect model drift, we should continuously monitor the model and data distributions using methods like continuous evaluation, population stability index (PSI), and Z-score. Depending on the business case, we may need to retrain the model periodically, ranging from daily to quarterly.
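As an example of one of these checks, the population stability index compares the share of observations per bin between a baseline sample and a recent sample. The sketch below is a minimal standard-library version. The bin shares are invented for illustration, and the small epsilon used to avoid taking the log of zero is an assumption.

```python
import math


def population_stability_index(expected_share, actual_share, eps=1e-6):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    psi = 0.0
    for expected, actual in zip(expected_share, actual_share):
        expected = max(expected, eps)  # guard against empty bins
        actual = max(actual, eps)
        psi += (actual - expected) * math.log(actual / expected)
    return psi


# Invented shares of customers per age bin at training time vs. today.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.10, 0.20, 0.30, 0.40]

psi_same = population_stability_index(baseline, baseline)
psi_shifted = population_stability_index(baseline, current)
print(psi_same, psi_shifted)
```

Identical distributions give a PSI of zero, and the value grows as the distributions diverge. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, though the right alert level depends on the business case.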

Impact of Data Balancing Techniques

Given that only 11.3% of customers opened CDs, our data is heavily imbalanced, leading to a bias towards the majority class. To address this, we trained our models using balanced class weights, heavily penalizing errors in the minority class. For the Neural Network, class weights were optimized through Grid Search. We also employed oversampling of the minority class. While the Neural Network performed well on the training set, it significantly underperformed on the test set compared to the XGBoost model. The performance of the XGBoost model was further improved through probability threshold optimization. The optimized probability threshold was 0.29.

Challenges and Limitations

Despite the improvements, some challenges remain, such as the potential for overfitting and the need for continuous model updates to adapt to changing customer behavior. To address these challenges, we can continuously update the model with new data to capture evolving customer behavior and incorporate additional relevant features like bank balance, savings account status, and other financial behaviors. Including factors such as CD yield percentage and savings yield percentage can also enhance the model's accuracy and relevance.

This dataset includes only the month and day of the week, but not the year. For temporal data, it is recommended to split the training, validation, and test sets based on time to better mimic real-life scenarios and predict future outcomes. The earliest temporal data should be used for training and validation, while the later data should be reserved for testing. This approach ensures that the model is evaluated on its ability to generalize to future data.
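If a full timestamp were available, a time-based split could look like the sketch below. The contact_day field, the record layout, and the 70/15/15 proportions are hypothetical, since the dataset used here carries no year information.

```python
def chronological_split(records, time_key, train_frac=0.7, valid_frac=0.15):
    """Split records into train/validation/test sets in time order."""
    ordered = sorted(records, key=lambda r: r[time_key])
    n = len(ordered)
    train_end = int(n * train_frac)
    valid_end = int(n * (train_frac + valid_frac))
    return ordered[:train_end], ordered[train_end:valid_end], ordered[valid_end:]


# Hypothetical records, deliberately out of order; contact_day is a made-up field.
records = [{"contact_day": day, "subscribed": day % 2} for day in range(20, 0, -1)]
train, valid, test = chronological_split(records, "contact_day")
print(len(train), len(valid), len(test))
```

Every training record precedes every validation record, which in turn precedes every test record, so the evaluation never uses information from the future.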

Recommendations

Different marketing strategies should be developed for customers with varying likelihoods of subscribing to CDs. We have segmented customers into three tiers based on their predicted probability of subscription:

Tier 1: High Probability (0.8 - 1.0) Group

These customers have a high likelihood of subscribing to CDs. We recommend the following strategy for Tier 1:

  • Personalized Offers: Leverage customer data such as age, job, and education level to personalize offers. For instance, provide higher interest rates or exclusive benefits to retirees and professionals. Focus on young customers, as they offer greater lifetime value.
  • Direct Communication: Utilize the preferred contact method (e.g., telephone) to reach out directly with personalized messages.
  • Exclusive Deals: Highlight the benefits of subscribing now, such as limited-time offers or bonuses for early subscribers.
  • Financial Advisory: Offer personalized financial advice sessions to help them understand the benefits of CDs and how they fit into their financial plans.

Tier 2: Medium Probability (0.5 - 0.8) Group

These customers have a moderate likelihood of subscribing to CDs. We recommend the following strategy for Tier 2:

  • Targeted Marketing Campaigns: Use digital marketing and mobile banking apps to send targeted ads and notifications about CD benefits. For detailed information on compliance, please refer to the official CAN-SPAM Act Compliance Guide for Business on the FTC website.
  • Educational Content: Provide educational videos and articles on the banking homepage about the advantages of CDs and how they compare to other savings options.
  • Incentives: Offer incentives such as small bonuses or interest rate boosts for subscribing within a certain period.
  • Quarterly Promotions: Leverage the quarter information to run seasonal promotions that align with their financial planning cycles.

Tier 3: Low Probability (0.3 - 0.5) Group

These customers have a low likelihood of subscribing to CDs. We recommend the following strategy for Tier 3:

  • Awareness Campaigns: Use broad marketing strategies to raise awareness about CDs, focusing on their safety and reliability as an investment.
  • General Promotions: Offer general promotions that appeal to a wider audience, such as introductory rates or flexible terms.
  • Cross-Selling: Promote other banking products that might be of interest, such as savings accounts or loans, and subtly introduce CDs as a complementary product.
  • Customer Engagement: Engage with these customers through surveys or feedback forms to understand their financial needs and tailor future offers accordingly.
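The three tiers above can be expressed as a small lookup on the predicted probability. In this sketch the tier ranges come from the headings, while the function name, the treatment of the boundary values (0.8 goes to Tier 1, 0.5 to Tier 2), and the "Not targeted" label for probabilities below 0.3 are assumptions.

```python
def assign_tier(probability):
    """Map a predicted subscription probability to a marketing tier."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.8:
        return "Tier 1"
    if probability >= 0.5:
        return "Tier 2"
    if probability >= 0.3:
        return "Tier 3"
    return "Not targeted"  # assumption: below the lowest tier range


# Hypothetical model outputs for four customers.
for p in (0.92, 0.61, 0.34, 0.12):
    print(p, assign_tier(p))
```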

Practical Applications

Banks can use these predictive models for:

  • Targeted Marketing: Develop personalized marketing campaigns to attract potential customers.
  • Resource Allocation: Optimize resource allocation by focusing efforts on high-probability customers.

Please refer to this GitHub page to access the executive summary PowerPoint deck.

Future Work

For future advancements, we can improve our model by including other data sources such as client transaction history, investment activities, and savings account information. This will allow us to better forecast customer behavior, opportunity costs, acquisition expenses, and possible lifetime value. To determine how much a client would invest in a Certificate of Deposit (CD), we can look at previous CD subscription data. We can detect crucial indications and trends by analyzing their savings patterns, transaction histories, and investment activity. Customers with higher savings balances and frequent investment activity, for example, may be more likely to invest in CDs in larger amounts. We can also segment customers depending on their financial behavior and personalize our marketing campaigns to those with a higher propensity to invest.

Using our model predictions, we can run an A/B test on randomly selected customers. The control group will consist of customers selected at random, as was done before the ML model was available, and will receive the standard marketing approach. The treatment group will consist of customers identified by our ML model as having a higher probability of subscribing and will receive the new, targeted marketing approach based on our model's predictions.

By comparing the conversion rates between the control group and the treatment group, we can determine the effectiveness of our segmentation strategy. Conducting this A/B test will help us validate our approach and optimize our marketing campaigns for higher conversion rates, thus increasing revenue and decreasing marketing campaign costs.
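
One common way to compare the two conversion rates is a two-proportion z-test. The sketch below is a minimal version using only the standard library. The group sizes and conversion counts are made-up placeholders rather than results from this project, and the choice of test is an assumption, not something the analysis above prescribes.

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_control = conv_control / n_control
    rate_treatment = conv_treatment / n_treatment
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    z = (rate_treatment - rate_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical campaign results: 50 of 1,000 control customers converted,
# versus 80 of 1,000 treatment customers.
z, p_value = two_proportion_z_test(50, 1000, 80, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value would suggest the difference in conversion rates is unlikely to be due to chance alone. The required sample size per group should be decided before the campaign starts.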

Acknowledgement

I would like to express my gratitude to Jonathan Presley, Vinod Chugani, Philippe Heitzmann, Vivian S. Zhang, Cole Ingraham and Khuzaima Shahid for their support and constructive feedback.

Quick Links

LinkedIn
GitHub Repository
Watch my presentation at timestamp 28:57

About Author

Nawaraj Paudel, PhD

Data Science leader with a PhD in Quantitative Modeling and close to a decade of experience driving high-impact analytics initiatives. Proven track record of leveraging machine learning, deep learning, NLP, and data engineering to optimize business performance, improve...