LendingClub ROI Improvement Using Machine Learning

Posted on Mar 9, 2021
LendingClub High-Grade Loans ROI Improvement Using Machine Learning

The skills the author demonstrated here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Executive Summary

The central aim of the current project is to improve the return on investment in LendingClub loans. A major factor that affects the profitability of a loan is whether a borrower fails to pay the borrowed sum and the loan defaults, which often results in lenders losing out on their investment.

Therefore, a potential strategy to improve the overall profits from loans would be to predict loan outcomes before the commencement of a loan. To that end, a classification model was built using LendingClub's open access dataset to predict whether a given loan would default based on the borrower's financial history.

The classification model predicted the loan status of LendingClub's highest-grade loans with a training accuracy of 75% and generated an average increase of approximately 3% in the percent return on investment. Overall, the model may prove useful in downstream decision-making about which loans are worth investing in.

Investing in Loans

In the world of finance, loans are sums of money lent to borrowers with the expectation that the initial amount borrowed be paid back with interest. The interest rate is how lenders generate a potential profit from the loan and is determined by the borrower's credit and financial history; the less likely a borrower is to pay off a loan based on that history, the higher the interest rate on the loan, and in certain instances, the borrower may be denied the loan altogether.

In order to ensure that the loan is paid off in a timely manner, amortization schedules are set up that outline the timeframe of all the individual payments towards the loan. Should borrowers deviate from the set schedule of payments and make late payments, the loan will go into delinquency, and the borrower will likely face additional financial penalties on top of the loan payments.

Persistent delinquencies will often result in the borrower failing to meet the conditions of the loan, causing the loan to go into default. Once a loan has defaulted, the borrower has been deemed unable to pay off the remaining balance, resulting in major damage to the borrower's credit and prompting the lender to pursue alternative legal methods of collecting a portion of the remaining balance.

Generally, loans are obtained and overseen through banking intermediaries and other large financial institutions; however, companies such as LendingClub provide an alternative way of acquiring loans through person-to-person lending.

Overview of LendingClub

In person-to-person (P2P) lending, lenders/investors and borrowers are able to connect directly with one another without the presence of an intermediary. These direct transactions are typically carried out on large online platforms hosted by companies such as LendingClub, where the company charges a fee for the use of its services.

The advantage of this direct method of loan acquisition is that, without an intermediary, the investor is able to collect more of the interest and gain larger profits, while the borrower is able to secure loans at lower interest rates than those from a bank. LendingClub was one of the first companies to find major financial success with the model during the early 2010s.

Figure 1: Line plot of the yearly total of issued loans by LendingClub from 2007 to 2018.

Figure 1 shows the total number of loans issued on LendingClub's platform from 2007 to 2018. The graph highlights the massive popularity of LendingClub's platform: loan issuance exploded after 2012, with over half a million loans issued and a total of nearly fifteen billion dollars lent through the platform. Given the company's success, analyzing its current model and performance can yield insights into how to improve outcomes for investors.

LendingClub's Loan Grade System - Pt. 1

A key feature of LendingClub's business model is that investors and borrowers connect directly with one another. A borrower seeking a loan goes through LendingClub's application process, in which the borrower's financial background is evaluated along with the loan amount to determine whether the applicant has sufficient creditworthiness for the application to be posted on the platform.

Once approved, the borrower's loan will receive a custom grade from LendingClub reflecting the quality of the loan; the loan grade system ranges from A to G, with A being the highest grade and G being the lowest. The grade is based on the borrower's credit history and reflects how likely the borrower is to pay off the loan. Each loan grade has its own range of interest rates, which increase as the loan grade decreases. The loan grade, along with the applicant's financial history, is then available for viewing by potential investors, who make the final decision on whether or not to invest in a particular loan.

LendingClub's Loan Grade System - Pt. 2

LendingClub profits from the fees the company charges for usage of the platform; however, the profits made from the loan go directly to the investors, making P2P lending more financially worthwhile for the investor than going through a bank. Moreover, the interest rates assigned by LendingClub tend to be lower than those of traditional banks, allowing borrowers to secure loans at cheaper interest rates.

Figure 2: Bar plot of the total number of loans within each grade

Figure 2 shows the number of loans associated with each loan grade. According to the plot, B- and C-grade loans are the most common on LendingClub. The majority of the loans approved by LendingClub tend to be high-grade, safer loans, while the lowest-grade loans comprise less than ten percent of the total number of loans.

Evaluation of the Return on Investment in LendingClub Loans - Pt. 1

To better understand the profitability of different types of loans, the return on investment was examined across loan grades and among defaulted and paid-off loans.

Figure 3: Boxplots of the average return on investment across loan grades

Figure 3 shows a series of boxplots of the average percent return on investment (ROI) for each loan grade. The percent ROI was calculated by finding the difference between the total loan payment and initial loan amount, dividing by the initial amount, and converting the value to a percentage. The boxplots show that the distribution of the ROI widens as the loan grade worsens.

The boxplots illustrate the high risk/high return aspect of loan investments: the safer, high-grade loans have lower profitability due to low interest rates but are more likely to result in a net positive return while the riskier low-grade loans have much higher returns due to high interest rates but are also likely to result in large losses due to their increased likelihood to default.
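
The percent ROI calculation described above can be sketched in a few lines of Python. This is only an illustration: the field names `grade`, `loan_amount`, and `total_payment` are hypothetical stand-ins for the corresponding dataset columns, and the example loans are made up.

```python
from collections import defaultdict
from statistics import mean


def percent_roi(total_payment, loan_amount):
    """Percent ROI: (total payment - initial amount) / initial amount * 100."""
    if loan_amount <= 0:
        raise ValueError("loan_amount must be positive")
    return (total_payment - loan_amount) / loan_amount * 100.0


def mean_roi_by_grade(loans):
    """Average the percent ROI of the loans within each grade."""
    rois = defaultdict(list)
    for loan in loans:
        rois[loan["grade"]].append(
            percent_roi(loan["total_payment"], loan["loan_amount"])
        )
    return {grade: mean(values) for grade, values in rois.items()}


# Made-up loans for illustration only; these are not rows from the dataset.
example_loans = [
    {"grade": "A", "loan_amount": 10000.0, "total_payment": 10800.0},
    {"grade": "A", "loan_amount": 5000.0, "total_payment": 2500.0},
    {"grade": "D", "loan_amount": 8000.0, "total_payment": 10000.0},
]

print(mean_roi_by_grade(example_loans))
```

A loan that is paid back with interest yields a positive value, while a loan that stops paying before the principal is recovered yields a negative one.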

Evaluation of the Return on Investment in LendingClub Loans - Pt. 2

Figure 4: Boxplots of the distribution of ROI among fully paid and charged off loans

Figure 4 shows the distribution of ROI across fully paid and charged-off loans. A charged-off loan is a loan that has been in default for so long that the lender has declared the investment a loss and no longer pursues collecting the outstanding balance. Based on the distributions, defaulted loans have a strong tendency to result in a negative ROI and a subsequent loss of investment for the investor.

Fully paid loans, however, are more likely to result in a positive ROI, with a few exceptions. Given the effect that loan defaults may have on an investor's ROI, better predicting loan outcomes can help improve and protect loan investments. Therefore, a classification model that predicts loan defaults could serve as an additional check, flagging loans that have a high risk of defaulting.

Implementation of the Machine Learning Model

The central aim in constructing the model was to be able to make accurate predictions on loan status prior to the commencement of the loan and improve the return on investment. Therefore, the data the model was trained on was restricted to the financial information and credit history of the loan applicant. Due to time constraints and limited resources, the model was constructed on the highest-grade loans (A & B) in LendingClub.

Furthermore, no loans after the year 2015 were included in the model because many of these were still ongoing. After a series of tests, the final model that was developed was an XGBoost model with a training accuracy of 76% and a testing accuracy of 68%.
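
The data restrictions described above can be sketched as a simple filter. This is a minimal illustration, not the project's actual preprocessing: the field names `grade`, `issue_year`, and `loan_status` are hypothetical, the year cutoff is assumed to apply to the issue year, and dropping loans without a final status is an assumption based on the remark about ongoing loans.

```python
HIGH_GRADES = {"A", "B"}   # highest-grade loans only
LAST_YEAR = 2015           # loans after this year are excluded

# Assumption: only loans with a final outcome are usable as labeled examples.
# 0 = fully paid, 1 = charged off (defaulted).
FINAL_STATUS_LABELS = {"Fully Paid": 0, "Charged Off": 1}


def select_training_loans(loans):
    """Keep high-grade loans up to LAST_YEAR and attach a binary default label."""
    selected = []
    for loan in loans:
        if loan["grade"] not in HIGH_GRADES:
            continue
        if loan["issue_year"] > LAST_YEAR:
            continue
        label = FINAL_STATUS_LABELS.get(loan["loan_status"])
        if label is None:
            continue  # ongoing or otherwise unresolved loan
        selected.append({**loan, "defaulted": label})
    return selected


# Made-up records for illustration only.
example_loans = [
    {"grade": "A", "issue_year": 2014, "loan_status": "Fully Paid"},
    {"grade": "B", "issue_year": 2015, "loan_status": "Charged Off"},
    {"grade": "C", "issue_year": 2014, "loan_status": "Fully Paid"},
    {"grade": "A", "issue_year": 2017, "loan_status": "Fully Paid"},
    {"grade": "A", "issue_year": 2013, "loan_status": "Current"},
]

print(len(select_training_loans(example_loans)))
```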

Figure 5: Receiver operating characteristic (ROC) curve of the XGBoost model

Figure 5 shows the ROC curve for the XGBoost model. The area under the ROC curve was 0.75, indicating that the model has moderate overall classification performance. However, for the purposes of better detecting loan defaults, this performance was deemed sufficient.
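
For readers unfamiliar with the metric, the area under the ROC curve can be read as the probability that a randomly chosen defaulted loan receives a higher predicted default score than a randomly chosen paid-off loan. The sketch below computes it directly from that definition; the labels and scores are toy values, not output from the model in this post.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a defaulted loan, 0 for a paid-off loan.
    scores: predicted default scores, higher meaning riskier.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two defaults and three paid-off loans.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
print(roc_auc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of loans, so it is meant for illustration rather than for a dataset of this size.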

The model was then evaluated to determine whether it could produce an improvement in ROI. This was done by having the model predict the loan status of all high-grade loans within the dataset, randomly selecting one hundred loans that the model predicted would not default, randomly selecting another set of one hundred loans regardless of status, computing the average percent ROI for each group, and repeating this process.

After two thousand iterations, there was an average three percent increase in the ROI for the model's predicted loans when compared to the randomly selected loans.
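
The repeated sampling comparison described above could look roughly like the following sketch. It rests on several assumptions that the post does not spell out: each loan record has hypothetical `roi` and `predicted_default` fields, each group is drawn without replacement, and the improvement is measured as the difference in mean percent ROI (in percentage points). The synthetic data stands in for the real loans and uses an unrealistically perfect model.

```python
import random
from statistics import mean


def average_roi_uplift(loans, sample_size=100, iterations=2000, seed=0):
    """Mean difference in average percent ROI between loans the model
    predicts will not default and loans drawn at random from all loans."""
    rng = random.Random(seed)
    predicted_safe = [l["roi"] for l in loans if l["predicted_default"] == 0]
    all_rois = [l["roi"] for l in loans]
    if len(predicted_safe) < sample_size:
        raise ValueError("not enough predicted non-default loans to sample")
    differences = []
    for _ in range(iterations):
        model_pick = rng.sample(predicted_safe, sample_size)
        random_pick = rng.sample(all_rois, sample_size)
        differences.append(mean(model_pick) - mean(random_pick))
    return mean(differences)


# Synthetic stand-in data: 300 loans returning +10% that the model keeps,
# and 100 loans returning -50% that the model flags as defaults.
synthetic_loans = (
    [{"roi": 10.0, "predicted_default": 0}] * 300
    + [{"roi": -50.0, "predicted_default": 1}] * 100
)

print(average_roi_uplift(synthetic_loans))
```

Fixing the seed makes the comparison repeatable; changing it gives a sense of how much the estimated uplift varies from run to run.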

Selection Strategy and Future Work on LendingClub

Overall, the model was able to improve the percent ROI amongst loans. A potential strategy utilizing this model would involve using the model to make predictions based on the applicants' financial history. Rather than relying directly on the model's prediction of the loan outcome, the predicted probability of defaulting could be used as an indicator of the potential risk associated with a loan.
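
As a rough illustration of the probability-based strategy, loans could be ranked by predicted default probability and split at a risk tolerance chosen by the investor. The `default_probability` field, the loan identifiers, and the 0.2 cutoff are all hypothetical; the post does not specify a threshold.

```python
def rank_by_default_risk(loans, max_default_probability=0.2):
    """Sort loans from lowest to highest predicted default probability and
    split them into candidates (at or below the cutoff) and flagged loans."""
    ranked = sorted(loans, key=lambda loan: loan["default_probability"])
    candidates = [
        l for l in ranked if l["default_probability"] <= max_default_probability
    ]
    flagged = [
        l for l in ranked if l["default_probability"] > max_default_probability
    ]
    return candidates, flagged


# Made-up model outputs for illustration only.
scored_loans = [
    {"id": "loan-1", "default_probability": 0.35},
    {"id": "loan-2", "default_probability": 0.05},
    {"id": "loan-3", "default_probability": 0.18},
]

candidates, flagged = rank_by_default_risk(scored_loans)
print([l["id"] for l in candidates], [l["id"] for l in flagged])
```

Treating the probability as a ranking rather than a yes/no answer lets an investor tighten or loosen the cutoff to match their own risk tolerance.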

A future expansion of this project would involve constructing models for the mid-grade and low-grade loans and repeating the process of analysis with those models. Afterwards, the models would be ensembled to generate a series of models designed to predict loan outcomes across loan grades and be incorporated into a broader strategy for default predictions.

 

About Author

Brian Perez Joseph

With a background in biomedical research and data science, Brian aims to utilize his quantitative background in the sciences and data programming skills to provide data-driven decision making strategies and key insights for real-world business problems.