Machine Learning Strategy to Improve ROI in Lending Club's High-Grade Loans

Brian Perez Joseph
Posted on Mar 9, 2021

Executive Summary

The central aim of the current project is to improve the return on investment in Lending Club loans. A major factor that affects the profitability of a loan is whether the borrower fails to repay the borrowed sum and the loan defaults, which often results in lenders losing part or all of their investment. Therefore, one strategy for improving overall profits would be to predict loan outcomes before a loan begins. To that end, a classification model was built using Lending Club's open-access dataset to predict whether a given loan would default based on the borrower's financial history. The classification model predicted the loan status of Lending Club's highest-grade loans with a training accuracy of 76% and a testing accuracy of 68%, and selecting loans with it produced an average increase of approximately 3% in percent return on investment compared with random selection. Overall, the model may prove useful in downstream decision-making about which loans are worth investing in.

Investment in Loans

In the world of finance, loans are sums of money lent to borrowers with the expectation that the amount borrowed will be paid back with interest. The interest rate is how lenders generate a profit from the loan, and it is determined by the borrower's credit and financial history: the less likely a borrower is to repay the loan based on their history, the higher the interest rate, and in some cases the borrower may be denied the loan altogether. To ensure that the loan is paid off in a timely manner, an amortization schedule is set up that lays out the timing of every individual payment toward the loan.

If borrowers deviate from the schedule and make late payments, the loan becomes delinquent, and the borrower will likely face financial penalties on top of the regular loan payments. Persistent delinquency often results in the borrower failing to meet the conditions of the loan, causing the loan to go into default. Once a loan has defaulted, the borrower is deemed unable to pay off the remaining balance, which causes major damage to the borrower's credit and prompts the lender to pursue legal or other collection methods to recover a portion of the remaining balance. Generally, loans are obtained and overseen through banks and other large financial institutions; however, companies such as Lending Club provide an alternative way of acquiring loans through person-to-person lending.
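To make the idea of an amortization schedule concrete, here is a minimal sketch assuming a fixed-rate, fully amortizing loan with equal monthly payments. The payment comes from the standard annuity formula, and each month's interest is charged on the remaining balance, with the rest of the payment reducing the principal. The loan amount, rate, and term are illustrative values, not figures from this project.

```python
def monthly_payment(principal: float, annual_rate: float, n_months: int) -> float:
    """Fixed monthly payment that fully repays `principal` over `n_months`."""
    r = annual_rate / 12  # monthly interest rate
    if r == 0:
        return principal / n_months
    return principal * r / (1 - (1 + r) ** -n_months)


def amortization_schedule(principal: float, annual_rate: float, n_months: int):
    """Yield (month, payment, interest, principal_paid, remaining_balance)."""
    payment = monthly_payment(principal, annual_rate, n_months)
    r = annual_rate / 12
    balance = principal
    for month in range(1, n_months + 1):
        interest = balance * r               # interest accrued on the current balance
        principal_paid = payment - interest  # the rest of the payment reduces the balance
        balance -= principal_paid
        yield month, payment, interest, principal_paid, max(balance, 0.0)


# Illustrative values only: a $10,000 loan at 10% APR over 36 months.
schedule = list(amortization_schedule(10_000, 0.10, 36))
for month, pay, interest, princ, bal in schedule[:3]:
    print(f"{month:2d}  payment={pay:8.2f}  interest={interest:7.2f}  "
          f"principal={princ:8.2f}  balance={bal:9.2f}")
```

Early payments are mostly interest; as the balance shrinks, a growing share of each fixed payment goes to principal.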

Overview of Lending Club

In person-to-person (P2P) lending, lenders/investors and borrowers connect directly with one another without an intermediary. These direct transactions are typically carried out on large online platforms hosted by companies such as Lending Club, which charge a fee for the use of their services. The advantage of this direct method is that, without an intermediary, the investor collects more of the interest and earns larger profits, while the borrower can secure loans at lower interest rates than those offered by a bank. Lending Club was one of the first companies to find major financial success with this model during the early 2010s.

Figure 1: Line plot of the yearly total of issued loans by Lending Club from 2007 to 2018.

Figure 1 shows the total number of loans issued on Lending Club's platform from 2007 to 2018. The graph highlights the platform's massive popularity, with issued loans exploding after 2012 to over half a million and a total of nearly fifteen billion dollars in loans exchanged through the platform. Given the company's success, analyzing its current model and performance can yield insights into how to improve outcomes for investors.
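Yearly totals like those in Figure 1 can be produced with a simple group-by. The sketch below is a minimal pandas example on a handful of made-up rows; it assumes the raw loan file stores the issue month in an `issue_d` column formatted like "Dec-2011" and the principal in a `loan_amnt` column. Both column names and the date format are assumptions about the dataset, not details stated in this post.

```python
import pandas as pd

# Made-up rows; `issue_d` and `loan_amnt` are assumed column names.
loans = pd.DataFrame({
    "issue_d": ["Jun-2007", "Dec-2011", "Mar-2012", "Mar-2012", "Jan-2018"],
    "loan_amnt": [5000, 12000, 8000, 20000, 15000],
})

# Parse "Mon-YYYY" strings and keep only the year.
loans["issue_year"] = pd.to_datetime(loans["issue_d"], format="%b-%Y").dt.year

# Number of loans and total dollars issued per year.
yearly = loans.groupby("issue_year").agg(
    n_loans=("loan_amnt", "size"),
    total_amount=("loan_amnt", "sum"),
)
print(yearly)
```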

Lending Club's Loan Grade System

A key feature of Lending Club's business model is that investors and borrowers connect directly with one another. A borrower seeking a loan goes through Lending Club's application process, where the borrower's financial background is evaluated along with the loan amount to determine whether the applicant is creditworthy enough for the application to be posted on the platform. Once approved, the loan receives a grade from Lending Club reflecting its quality; grades range from A to G, with A being the highest and G the lowest. The grade is based on the borrower's credit history and reflects how likely the borrower is to pay off the loan. Each loan grade has its own range of interest rates, which increase as the loan grade decreases. The loan grade, along with the applicant's financial history, is then available to potential investors, who make the final decision on whether to invest in a particular loan. Lending Club profits from the fees it charges for using the platform, while the profits made from the loan go directly to the investors, making P2P lending more financially worthwhile for investors than going through a bank. Moreover, the interest rates assigned by Lending Club tend to be lower than those of traditional banks, allowing borrowers to secure loans at cheaper rates.

Figure 2: Bar plot of the total number of loans within each grade

Figure 2 shows the number of loans in each loan grade. The plot shows that B- and C-grade loans are the most common on Lending Club. Most of the loans approved by Lending Club are high-grade, relatively safe loans, while the lowest-grade, riskiest loans make up less than ten percent of the total.
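Counting loans per grade, and the share held by the lowest grades, takes only a few lines of pandas. This is a minimal sketch on 100 made-up grade labels; in the real data the labels would come from an assumed `grade` column, and treating grades E through G as the "lowest-grade" group is an assumption for illustration, since the post does not define the cut-off.

```python
import pandas as pd

# Made-up grade labels for 100 loans.
grades = pd.Series(["A"] * 20 + ["B"] * 30 + ["C"] * 28 + ["D"] * 13
                   + ["E"] * 6 + ["F"] * 2 + ["G"] * 1)

# Count loans per grade in A-G order, keeping absent grades as 0.
counts = grades.value_counts().reindex(list("ABCDEFG"), fill_value=0)
share = counts / counts.sum()

# Assumption: grades E-G form the low-grade group.
low_grade_share = share[["E", "F", "G"]].sum()
print(counts)
print(f"share of E-G loans: {low_grade_share:.1%}")
```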

Evaluation of the Return on Investment in Lending Club Loans

To better understand the profitability of different types of loans, the return on investment was examined across loan grades and between defaulted and fully paid loans.

Figure 3: Boxplots of the percent return on investment across loan grades

Figure 3 shows boxplots of the percent return on investment (ROI) for each loan grade. Percent ROI was calculated by subtracting the initial loan amount from the total loan payment, dividing by the initial loan amount, and converting the result to a percentage. The boxplots show that the distribution of ROI widens as the loan grade worsens, illustrating the high-risk/high-return nature of loan investments: safer, high-grade loans have lower profitability because of their low interest rates but are more likely to produce a net positive return, while riskier low-grade loans can earn much higher returns because of their high interest rates but are also more likely to produce large losses because they default more often.
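The percent ROI definition above translates directly into code. This minimal pandas sketch applies it to six made-up loans and then summarizes the spread per grade with the same quantiles a boxplot draws. The column names `grade`, `loan_amnt`, and `total_pymnt` are assumptions about the dataset.

```python
import pandas as pd

# Made-up loans; column names are assumed.
loans = pd.DataFrame({
    "grade": ["A", "A", "A", "D", "D", "D"],
    "loan_amnt": [10000, 8000, 12000, 10000, 6000, 15000],
    "total_pymnt": [11000.0, 8700.0, 12900.0, 13500.0, 2500.0, 19000.0],
})

# Percent ROI = (total paid back - amount lent) / amount lent * 100
loans["roi_pct"] = (loans["total_pymnt"] - loans["loan_amnt"]) / loans["loan_amnt"] * 100

# Five-number summary per grade, i.e. what each boxplot summarizes.
summary = loans.groupby("grade")["roi_pct"].describe()[["min", "25%", "50%", "75%", "max"]]
print(summary)
```

In this toy data the D-grade loans show both the largest gain and a large loss, mirroring the wider spread of low-grade loans described above.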

Figure 4: Boxplots of the distribution of ROI among fully paid and charged off loans

Figure 4 shows the distribution of ROI for fully paid and charged-off loans. A charged-off loan is one that has been in default long enough that the lender has written off the outstanding balance as a loss. Based on the distributions, charged-off loans strongly tend to produce a negative ROI and a loss of investment for the investor, whereas fully paid loans, with a few exceptions, produce a positive ROI. Given the effect that defaults can have on an investor's ROI, better prediction of loan outcomes can help improve and protect loan investments. Therefore, a classification model that predicts loan defaults could serve as an additional check for flagging loans with a high risk of default.
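Turning this comparison into a classification problem requires a binary target. The sketch below shows one plausible way to do it, assuming a `loan_status` column with values such as "Fully Paid", "Charged Off", and "Current": it keeps only loans with a final outcome and labels charged-off loans as 1. The status strings and the decision to drop unfinished loans are assumptions, not the author's documented preprocessing.

```python
import pandas as pd

# Made-up loans; `loan_status` values and `roi_pct` are assumed.
loans = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Fully Paid", "Charged Off"],
    "roi_pct": [12.0, -45.0, 3.0, 8.0, -20.0],
})

# Keep only loans with a final outcome; "Current" loans are still running.
finished = loans[loans["loan_status"].isin(["Fully Paid", "Charged Off"])].copy()

# Binary target: 1 = defaulted (charged off), 0 = fully paid.
finished["default"] = (finished["loan_status"] == "Charged Off").astype(int)

# Median ROI per outcome, the center line of each boxplot in Figure 4.
med = finished.groupby("loan_status")["roi_pct"].median()
print(med)
```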

Implementation of the Machine Learning Model

The central aim in constructing the model was to make accurate predictions of loan status before a loan begins and thereby improve the return on investment. Therefore, the training data was restricted to the loan applicant's financial information and credit history. Due to time constraints and limited resources, the model was built only on Lending Club's highest-grade loans (grades A and B). Furthermore, loans issued after 2015 were excluded because many of them were still ongoing. After a series of tests, the final model selected was an XGBoost classifier with a training accuracy of 76% and a testing accuracy of 68%.
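The data restrictions described above can be expressed as a single boolean filter followed by a hold-out split. This is a minimal pandas sketch on made-up rows; the feature list, the `issue_year` column (assumed to be derived from the issue date beforehand), and the 75/25 split are hypothetical choices, since the post does not list its features or split ratio. The key point it illustrates is that columns only known after a loan ends, such as `total_pymnt`, must be left out of the features. A classifier such as `xgboost.XGBClassifier` would then be fit on `X_train` and `y_train`; that step is omitted to keep the sketch dependency-free.

```python
import pandas as pd

# Hypothetical applicant-side features; the post does not name its features.
APPLICANT_FEATURES = ["annual_inc", "dti", "fico_range_low"]

loans = pd.DataFrame({
    "grade": ["A", "B", "C", "A", "B", "A"],
    "issue_year": [2013, 2014, 2014, 2016, 2012, 2015],
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid",
                    "Fully Paid", "Fully Paid", "Charged Off"],
    "annual_inc": [85000, 52000, 40000, 120000, 67000, 58000],
    "dti": [12.1, 24.5, 30.2, 8.0, 15.3, 19.9],
    "fico_range_low": [760, 690, 670, 800, 720, 705],
    "total_pymnt": [11800.0, 4200.0, 9000.0, 15000.0, 7600.0, 3000.0],  # known only after the loan ends
})

# Grades A and B, issued in 2015 or earlier, with a final outcome.
mask = (
    loans["grade"].isin(["A", "B"])
    & (loans["issue_year"] <= 2015)
    & loans["loan_status"].isin(["Fully Paid", "Charged Off"])
)
subset = loans[mask]

X = subset[APPLICANT_FEATURES]  # post-issuance columns such as total_pymnt are excluded
y = (subset["loan_status"] == "Charged Off").astype(int)

# Random hold-out split (assumed 75/25).
test_idx = X.sample(frac=0.25, random_state=0).index
X_train, X_test = X.drop(test_idx), X.loc[test_idx]
y_train, y_test = y.drop(test_idx), y.loc[test_idx]
```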

Figure 5: Receiver operating characteristic (ROC) curve of the XGBoost model

Figure 5 shows the ROC curve for the XGBoost model. The area under the ROC curve (AUC) was 0.75, indicating moderate overall classification performance. For the purpose of flagging likely loan defaults, however, this performance was deemed sufficient.
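An AUC has a concrete reading: it is the probability that a randomly chosen defaulted loan receives a higher default score than a randomly chosen fully paid loan, with ties counted as one half. An AUC of 0.75 therefore means the model ranks such a pair correctly about three times out of four. The sketch below computes AUC directly from that pairwise definition with NumPy; it is meant for illustration on small arrays, not as a replacement for a library implementation.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]  # scores of defaulted loans
    neg = scores[y_true == 0]  # scores of fully paid loans
    # Compare every (positive, negative) pair; O(n*m), fine for small examples.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


y = [1, 1, 0, 0, 0, 1]
s = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
print(roc_auc(y, s))  # 7 of the 9 pairs are ranked correctly
```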

The model was then evaluated to determine whether it could improve ROI. The model predicted the loan status of every high-grade loan in the dataset; one hundred loans predicted not to default were then randomly selected, along with a second random set of one hundred loans regardless of predicted status, and the average percent ROI was computed for each group. This process was repeated two thousand times, and on average the loans selected by the model had a three percent higher ROI than the randomly selected loans.
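The repeated-sampling comparison above can be sketched in a few lines of NumPy. The function below draws 100 loans (without replacement) from those predicted not to default and 100 loans from the full pool, records the difference in mean percent ROI, repeats this 2,000 times, and averages the differences. The demo data is synthetic, and the "model" is simulated by flagging most true defaults and a few good loans; neither reflects the project's actual data or predictions, and the function name is hypothetical.

```python
import numpy as np


def simulate_roi_lift(roi_pct, predicted_default, n_pick=100, n_iter=2000, seed=0):
    """Average difference in mean ROI between model-picked and randomly picked loans."""
    rng = np.random.default_rng(seed)
    roi_pct = np.asarray(roi_pct, dtype=float)
    safe = roi_pct[np.asarray(predicted_default) == 0]  # loans predicted not to default
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        model_pick = rng.choice(safe, size=n_pick, replace=False)
        random_pick = rng.choice(roi_pct, size=n_pick, replace=False)
        diffs[i] = model_pick.mean() - random_pick.mean()
    return diffs.mean()


# Synthetic demo: 15% of loans default with large losses.
rng = np.random.default_rng(42)
n = 5000
defaulted = rng.random(n) < 0.15
roi = np.where(defaulted, rng.normal(-40, 20, n), rng.normal(10, 3, n))
# Imperfect toy "model": flags about 70% of defaults and 10% of good loans.
flag = np.where(defaulted, rng.random(n) < 0.7, rng.random(n) < 0.1).astype(int)

lift = simulate_roi_lift(roi, flag)
print(f"average ROI lift: {lift:.2f} percentage points")
```

Comparing against a random draw from the same pool, rather than against a fixed benchmark, isolates how much of the improvement comes from the model's filtering.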

Selection Strategy and Future Steps

Overall, the model was able to improve the percent ROI among loans. One strategy using this model would be to make predictions from each applicant's financial history. Rather than relying directly on the model's predicted loan outcome, the predicted probability of default could be used as an indicator of the risk associated with each loan. A future expansion of this project would involve building models for the mid-grade and low-grade loans and repeating this analysis with those models. The models could then be combined into a set of models that predict loan outcomes across all loan grades and incorporated into a broader strategy for predicting defaults.
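Using the predicted probability of default as a risk indicator could be as simple as bucketing loans into tiers. In practice the probabilities would come from something like the positive-class column of `predict_proba` on a fitted classifier; here they are made-up values. The cut-offs of 0.10 and 0.25 and the tier names are hypothetical, chosen only to illustrate the idea, since the post does not propose specific thresholds.

```python
import numpy as np

# Hypothetical cut-offs on P(default); the post does not specify any.
TIER_EDGES = [0.10, 0.25]
TIER_NAMES = np.array(["low", "medium", "high"])


def risk_tier(p_default):
    """Map predicted default probabilities to coarse risk tiers."""
    # digitize: 0 if p < 0.10, 1 if 0.10 <= p < 0.25, 2 if p >= 0.25
    idx = np.digitize(p_default, TIER_EDGES)
    return TIER_NAMES[idx]


probs = np.array([0.04, 0.12, 0.31, 0.25, 0.09])
print(risk_tier(probs))
```

Tiers keep the model's ranking information that a single yes/no prediction throws away, so an investor can choose how much risk to accept.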

About Author

Brian Perez Joseph

With a background in biomedical research and data science, Brian aims to use his quantitative background in the sciences and his data programming skills to provide data-driven decision-making strategies and key insights for real-world business problems.