NYC Data Science Academy| Blog

LendingClub ROI Improvement Using Machine Learning

Brian Perez Joseph
Posted on Mar 9, 2021
LendingClub High-Grade Loans ROI Improvement Using Machine Learning


Executive Summary

The central aim of the current project is to improve the return on investment in LendingClub loans. A major factor that affects the profitability of a loan is whether a borrower fails to pay the borrowed sum and the loan defaults, which often results in lenders losing out on their investment.

Therefore, a potential strategy to improve the overall profits from loans would be to predict loan outcomes before the commencement of a loan. To that end, a classification model was built using LendingClub's open-access dataset to predict whether a given loan would default based on the borrower's financial history.

The classification model predicted the loan status of LendingClub's highest-grade loans with a training accuracy of 75% and generated an average increase of approximately 3% in the percent return on investment. Overall, the model may prove useful in downstream decision-making about which loans are worth investing in.

Investing in Loans

In the world of finance, loans are sums of money lent to borrowers with the expectation that the initial amount borrowed be paid back with interest. The interest rate is how lenders are able to generate a potential profit from the loan and is determined by the borrower's credit and financial history: the less likely a borrower is to pay off a loan based on his history, the higher the interest rate on the loan, and in certain instances, the borrower may be denied the loan altogether.

In order to ensure that the loan is paid off in a timely manner, amortization schedules are set up that outline the timeframe of all the individual payments towards the loan. Should borrowers deviate from the set schedule of payments and make late payments, the loan will go into delinquency, and the borrower will likely face additional financial penalties on top of the loan payments.
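As a rough illustration of what an amortization schedule contains, the sketch below computes a level monthly payment with the standard annuity formula and then splits each payment into interest and principal. The function name, the loan terms (10,000 at 12% annual interest over 36 months), and the fixed-rate assumption are illustrative choices, not LendingClub's actual terms.

```python
def amortization_schedule(principal, annual_rate, months):
    """Return (monthly_payment, rows) for a fixed-rate, fully amortizing loan.

    Each row is (month, interest_paid, principal_paid, remaining_balance).
    Hypothetical helper for illustration only.
    """
    r = annual_rate / 12.0  # monthly interest rate
    if r == 0:
        payment = principal / months
    else:
        # Standard annuity formula for a level monthly payment.
        payment = principal * r / (1.0 - (1.0 + r) ** -months)

    rows = []
    balance = principal
    for month in range(1, months + 1):
        interest = balance * r               # interest accrued this month
        principal_paid = payment - interest  # remainder reduces the balance
        balance = max(balance - principal_paid, 0.0)
        rows.append((month, interest, principal_paid, balance))
    return payment, rows


# Assumed example terms: 10,000 borrowed at 12% per year for 36 months.
payment, rows = amortization_schedule(10_000, 0.12, 36)
print(f"monthly payment: {payment:.2f}")
print(f"interest in month 1: {rows[0][1]:.2f}")
```

Early payments are mostly interest and later payments are mostly principal, which is why a default late in the schedule costs the lender less than an early one.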

Persistent delinquencies will often result in the borrower failing to meet the conditions of his loan, causing the loan to go into default. Once a loan has defaulted, the borrower has been deemed unable to pay off the remaining balance on the loan, resulting in major credit damage towards the borrower and prompting the lender to pursue alternative legal methods towards collecting a portion of the remaining balance.

Generally, loans are obtained and overseen through banking intermediaries and other large financial institutions; however, companies such as LendingClub provide an alternative way of acquiring loans through person-to-person lending.

Overview of LendingClub

In person-to-person (P2P) lending, lenders/investors and borrowers are able to directly connect with one another without the presence of an intermediary. These direct transactions are typically carried out on large online platforms hosted by companies such as LendingClub, where the company charges a fee for the use of its services.

The advantage of this direct method of loan acquisition is that the absence of an intermediary allows the investor to collect more of the interest and earn larger profits, while the borrower is able to secure loans at lower interest rates than those offered by a bank. LendingClub was one of the first companies to find major financial success with this model during the early 2010s.

Figure 1: Line plot of the yearly total of issued loans by LendingClub from 2007 to 2018.

Figure 1 shows the total number of loans issued on LendingClub's platform each year from 2007 to 2018. The graph highlights the massive popularity of LendingClub's platform, which saw an explosion of over half a million issued loans after 2012 and exchanged a total of nearly fifteen billion dollars in loans. Given the company's success, analyzing its current model and performance can yield insights into how to improve outcomes for investors.

LendingClub's Loan Grade System - Pt. 1

A key feature of LendingClub's business model is that investors and borrowers connect directly with one another. A borrower seeking a loan goes through LendingClub's application process, where the borrower's financial background is evaluated along with the loan amount to determine whether the applicant is sufficiently creditworthy for the application to be posted on the platform.

Once approved, the borrower's loan receives a custom grade from LendingClub reflecting the quality of the loan; the loan grade system ranges from A to G, with A being the highest grade and G the lowest. The grade is based on the borrower's credit history and reflects how likely the borrower is to pay off the loan. Each loan grade has its own range of interest rates, which increase as the loan grade decreases. The loan grade, along with the applicant's financial history, is then available for viewing by potential investors, who make the final decision on whether or not to invest in a particular loan.

LendingClub's Loan Grade System - Pt. 2

LendingClub profits from the fees it charges for use of the platform; however, the profits made from the loan go directly to the investors, making P2P lending more financially worthwhile for the investor than going through a bank. Moreover, the interest rates assigned by LendingClub tend to be lower than those of traditional banks, allowing borrowers to secure loans at cheaper interest rates.

Figure 2: Bar plot of the total number of loans within each grade

Figure 2 shows the number of loans associated with each loan grade. The plot shows that B- and C-grade loans are the most common on LendingClub. A majority of the loans approved by LendingClub are high-grade, safer loans, while the more underperforming loans comprise less than ten percent of the total number of loans.

Evaluation of the Return on Investment in LendingClub Loans - Pt. 1

To better understand the profitability of different types of loans, the return on investment was examined across loan grades and among defaulted and paid-off loans.

Figure 3: Boxplots of the average return on investments across loan grades

Figure 3 shows a series of boxplots of the average percent return on investment (ROI) for each loan grade. The percent ROI was calculated by subtracting the initial loan amount from the total loan payment, dividing by the initial amount, and converting the value to a percentage. The boxplots show that the distribution of the ROI widens as the loan grade worsens.
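The percent ROI calculation described above can be sketched in a few lines. This is a minimal illustration rather than the project's actual code: the field names `funded_amnt` and `total_pymnt` and the two example loans are assumptions made up for the example.

```python
def percent_roi(total_payment, funded_amount):
    """Percent ROI = (total payment - initial amount) / initial amount * 100."""
    if funded_amount <= 0:
        raise ValueError("funded_amount must be positive")
    return 100.0 * (total_payment - funded_amount) / funded_amount


# Made-up example loans (not real LendingClub records).
loans = [
    {"status": "Fully Paid", "funded_amnt": 10_000.0, "total_pymnt": 11_500.0},
    {"status": "Charged Off", "funded_amnt": 10_000.0, "total_pymnt": 4_000.0},
]

for loan in loans:
    roi = percent_roi(loan["total_pymnt"], loan["funded_amnt"])
    print(f'{loan["status"]}: {roi:.1f}%')
# Fully Paid: 15.0%
# Charged Off: -60.0%
```

A loan that is repaid in full with interest yields a positive value, while a loan that stops paying partway through yields a negative one.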

The boxplots illustrate the high risk/high return aspect of loan investments: the safer, high-grade loans have lower profitability due to low interest rates but are more likely to result in a net positive return while the riskier low-grade loans have much higher returns due to high interest rates but are also likely to result in large losses due to their increased likelihood to default.

Evaluation of the Return on Investment in LendingClub Loans - Pt. 2

Figure 4: Boxplots of the distribution of ROI among fully paid and charged off loans

Figure 4 shows the distribution of ROI across fully paid and charged-off loans. A charged-off loan is a loan that has been in default for so long that the lender has declared the investment a loss and no longer pursues collection of the outstanding balance. Based on the distributions, defaulted loans tend strongly to result in a negative ROI and a subsequent loss for the investor.

Fully paid loans, however, are more likely to result in a positive ROI, with a few exceptions. Given the effect that loan defaults may have on an investor's ROI, better predicting loan outcomes can help improve and protect loan investments. Therefore, a classification model that predicts loan defaults could serve as an additional check for flagging loans at high risk of defaulting.

Implementation of the Machine Learning Model

The central aim in constructing the model was to be able to make accurate predictions on loan status prior to the commencement of the loan and improve the return on investment. Therefore, the data the model was trained on was restricted to the financial information and credit history of the loan applicant. Due to time constraints and limited resources, the model was constructed on the highest-grade loans (A & B) in LendingClub.

Furthermore, no loans after the year 2015 were included in the model because many of these were still ongoing. After a series of tests, the final model that was developed was an XGBoost model with a training accuracy of 76% and a testing accuracy of 68%.

Figure 5: Receiver operating characteristic (ROC) curve of the XGBoost model

Figure 5 shows the ROC curve for the XGBoost model. The area under the ROC curve was 0.75, indicating that the model has moderate overall classification performance. However, for the purposes of better detecting loan defaults, this performance was deemed sufficient.
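One way to interpret the area under the ROC curve is as the probability that a randomly chosen defaulted loan receives a higher predicted default score than a randomly chosen paid-off loan. The sketch below computes that quantity directly by comparing every positive/negative pair. It is an illustration in plain Python with made-up labels and scores, not the project's evaluation code; on real data a library routine such as scikit-learn's `roc_auc_score` would normally be used, since this pairwise version is quadratic in the number of loans.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for default, 0 for fully paid (assumed encoding).
    scores: predicted default scores, higher means riskier.
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: two paid loans and two defaults.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.75 sits between the two.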

The model was then evaluated to determine whether it could produce an improvement in ROI. This was done by having the model predict the loan status of all high-grade loans within the dataset, randomly selecting one hundred loans that the model predicted would not default, randomly selecting another set of one hundred loans regardless of predicted status, computing the average percent ROI for each group, and repeating this process.

After two thousand iterations, there was an average three percent increase in the ROI for the model's predicted loans when compared to the randomly selected loans.
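The resampling procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name is hypothetical, the lift is measured as a difference in percentage points, and the loan data is synthetic, generated only so the example runs (including the assumed 10% default rate and the assumed error rates of the stand-in model).

```python
import random
from statistics import fmean


def average_roi_lift(rois, predicted_default, sample_size=100,
                     iterations=2000, seed=0):
    """Mean difference in average percent ROI between loans the model
    predicts will not default and loans drawn at random from all loans."""
    rois = list(rois)
    approved = [r for r, flag in zip(rois, predicted_default) if not flag]
    if sample_size > len(approved):
        raise ValueError("not enough predicted non-default loans to sample")

    rng = random.Random(seed)
    lifts = []
    for _ in range(iterations):
        model_pick = rng.sample(approved, sample_size)  # model-approved loans
        random_pick = rng.sample(rois, sample_size)     # any loans
        lifts.append(fmean(model_pick) - fmean(random_pick))
    return fmean(lifts)


# Synthetic stand-in data (NOT LendingClub data).
data_rng = random.Random(42)
rois, flagged = [], []
for _ in range(5000):
    defaulted = data_rng.random() < 0.10
    rois.append(data_rng.uniform(-90, -10) if defaulted
                else data_rng.uniform(2, 15))
    # Hypothetical imperfect model: flags 60% of defaults, 10% of good loans.
    flagged.append(data_rng.random() < (0.6 if defaulted else 0.1))

lift = average_roi_lift(rois, flagged)
print(f"average ROI lift: {lift:.2f} percentage points")
```

Averaging over many iterations smooths out the luck of any single draw of one hundred loans, so the reported lift reflects the model's selections rather than sampling noise.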

Selection Strategy and Future Work on LendingClub

Overall, the model was able to improve the percent ROI among loans. A potential strategy utilizing this model would involve using the model to make predictions based on the applicants' financial history. Rather than relying directly on the model's predicted outcome, the predicted probability of default could be used as an indicator of the potential risk associated with a loan.
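A sketch of how predicted default probabilities might be used as a risk indicator instead of a hard yes/no prediction is shown below. The cutoff values, function names, and example loans are all hypothetical; in practice the thresholds would need to be tuned to an investor's risk tolerance.

```python
def risk_tier(default_prob, low=0.10, high=0.25):
    """Map a predicted default probability to a coarse risk tier.

    The `low` and `high` cutoffs are arbitrary illustrative values.
    """
    if not 0.0 <= default_prob <= 1.0:
        raise ValueError("default_prob must be between 0 and 1")
    if default_prob < low:
        return "low"
    if default_prob < high:
        return "medium"
    return "high"


def shortlist(loans, max_default_prob=0.10):
    """Keep loans below the cutoff and sort them from safest to riskiest."""
    kept = [loan for loan in loans if loan["default_prob"] < max_default_prob]
    return sorted(kept, key=lambda loan: loan["default_prob"])


# Made-up model outputs for three candidate loans.
candidates = [
    {"id": "loan_a", "default_prob": 0.32},
    {"id": "loan_b", "default_prob": 0.04},
    {"id": "loan_c", "default_prob": 0.08},
]

for loan in candidates:
    print(loan["id"], risk_tier(loan["default_prob"]))
print([loan["id"] for loan in shortlist(candidates)])
```

Ranking by probability lets an investor trade off expected return against risk loan by loan, rather than discarding every loan the model labels as a likely default.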

A future expansion of this project would involve constructing models for the mid-grade and low-grade loans and repeating the process of analysis with those models. Afterwards, the models would be combined into a series of models designed to predict loan outcomes across loan grades and incorporated into a broader strategy for default prediction.


About Author

Brian Perez Joseph

With a background in biomedical research and data science, Brian aims to utilize his quantitative background in the sciences and data programming skills to provide data-driven decision making strategies and key insights for real-world business problems.
