LendingClub ROI Improvement Using Machine Learning
The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
Executive Summary
The central aim of the current project is to improve the return on investment in LendingClub loans. A major factor that affects the profitability of a loan is whether a borrower fails to pay the borrowed sum and the loan defaults, which often results in lenders losing out on their investment.
Therefore, a potential strategy to improve the overall profits from loans would be to predict loan outcomes before the commencement of a loan. To that end, a classification model was built using LendingClub's open access dataset to predict whether a given loan would default based on the borrower's financial history.
The classification model predicted the loan status of LendingClub's highest-grade loans with a training accuracy of 75% and generated an average increase of approximately 3% in the percent return on investment. Overall, the model may prove useful in downstream decision-making about which loans are worth investing in.
Investment in Loans
In the world of finance, loans are sums of money lent to borrowers with the expectation that the initial amount borrowed be paid back with interest. The interest rate is how lenders are able to generate a potential profit from the loan and is determined by the borrower's credit and financial history; the less likely a borrower is to pay off a loan based on that history, the higher the interest rate on the loan, and in certain instances, the borrower may be denied the loan altogether.
In order to ensure that the loan is paid off in a timely manner, amortization schedules are set up that outline the timeframe of all the individual payments towards the loan. Should borrowers deviate from the set schedule of payments and make late payments, the loan will go into delinquency, and the borrower will likely face additional financial penalties on top of the loan payments.
Persistent delinquencies will often result in the borrower failing to meet the conditions of the loan, causing the loan to go into default. Once a loan has defaulted, the borrower has been deemed unable to pay off the remaining balance on the loan, resulting in major damage to the borrower's credit and prompting the lender to pursue alternative legal methods to collect a portion of the remaining balance.
Generally, loans are obtained and overseen through banking intermediaries and other large financial institutions; however, companies such as LendingClub provide an alternative way of acquiring loans through person-to-person lending.
Overview of LendingClub
In person-to-person (P2P) lending, lenders/investors and borrowers are able to directly connect with one another without the presence of an intermediary. These direct transactions are typically carried out on large online platforms hosted by companies such as LendingClub, where the company charges a fee for the use of its services.
The advantage of this direct method of loan acquisition is that, without an intermediary, the investor is able to collect more of the interest and earn larger profits, while the borrower is able to secure loans at lower interest rates than those offered by a bank. LendingClub was one of the first companies to find major financial success with the model during the early 2010s.
Figure 1 shows the total number of loans issued on LendingClub's platform from 2007 to 2018. The graph highlights the massive popularity of LendingClub's platform, which saw an explosion of over half a million issued loans after 2012 and a total of nearly fifteen billion dollars in loans exchanged through the platform. Given the company's success, analyzing its current model and performance can yield insights into how to improve outcomes for investors.
LendingClub's Loan Grade System - Pt. 1
A key feature of LendingClub's business model is that investors and borrowers connect directly with one another. A borrower seeking a loan will go through LendingClub's application, where the borrower's financial background will be evaluated along with the loan amount to determine if the applicant has sufficient creditworthiness for the application to be posted on the platform.
Once approved, the borrower's loan will receive a custom grade from LendingClub reflecting the quality of the loan; the loan grade system ranges from A to G, with A being the highest score and G being the lowest. The grade is based on the borrower's credit history and reflects how likely the borrower is to pay off the loan. Each loan grade has its own range of interest rates, which increase as the loan grade decreases. The loan grade along with the applicant's financial history is then available for viewing by potential investors, who make the final decision on whether or not to invest in a particular loan.
LendingClub's Loan Grade System - Pt. 2
LendingClub profits from the fees the company charges for usage of the platform; however, the profits made from the loan go directly to the investors, making P2P lending more financially worthwhile for investors than if they were to go through a bank. Moreover, the interest rates assigned by LendingClub tend to be lower than those of traditional banks, allowing borrowers to secure loans at cheaper interest rates.
Figure 2 shows the number of loans associated with each loan grade. The plot shows that the most common loans on LendingClub are B- and C-grade loans. A majority of the loans approved by LendingClub tend to be high-grade/safe loans, while the riskier, low-grade loans comprise less than ten percent of the total number of loans.
Evaluation of the Return on Investment in LendingClub Loans - Pt. 1
To better understand the profitability of different types of loans, the return on investment was examined across loan grades and among defaulted and paid-off loans.
Figure 3 shows a series of boxplots of the average percent return on investment (ROI) for each loan grade. The percent ROI was calculated by finding the difference between the total loan payment and initial loan amount, dividing by the initial amount, and converting the value to a percentage. The boxplots show that the distribution of the ROI widens as the loan grade worsens.
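The percent ROI calculation described above can be written out as a short function. This is a minimal sketch: the function name, argument names, and the two example loans are hypothetical and are not taken from the LendingClub data.

```python
def percent_roi(total_payment: float, loan_amount: float) -> float:
    """Percent ROI: (total payment - initial amount) / initial amount * 100."""
    if loan_amount <= 0:
        raise ValueError("loan_amount must be positive")
    return (total_payment - loan_amount) / loan_amount * 100.0


# Hypothetical loans, for illustration only.
paid_off = percent_roi(total_payment=11500.0, loan_amount=10000.0)
charged_off = percent_roi(total_payment=4000.0, loan_amount=10000.0)

print(f"paid-off loan ROI:    {paid_off:.1f}%")
print(f"charged-off loan ROI: {charged_off:.1f}%")
```

A loan that returns more than the amount lent yields a positive value, while a loan that stops paying early yields a negative one.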
The boxplots illustrate the high risk/high return aspect of loan investments: the safer, high-grade loans have lower profitability due to low interest rates but are more likely to result in a net positive return while the riskier low-grade loans have much higher returns due to high interest rates but are also likely to result in large losses due to their increased likelihood to default.
Evaluation of the Return on Investment in LendingClub Loans - Pt. 2
Figure 4 shows the distribution of ROI across fully paid and charged-off loans. A charged-off loan is a loan that has been in default for so long that the lender has declared the investment a loss and no longer pursues collecting the outstanding balance. Based on the distributions, defaulted loans have a strong tendency to result in a negative ROI and subsequent loss of investment for the investor.
Fully paid loans, however, are more likely to result in a positive ROI, with a few exceptions. Given the effect that loan defaults may have on an investor's ROI, better predicting loan outcomes can help improve and protect loan investments. Therefore, a classification model that predicts loan defaults could serve as an additional check that flags loans with a high risk of defaulting.
Implementation of the Machine Learning Model
The central aim in constructing the model was to be able to make accurate predictions on loan status prior to the commencement of the loan and improve the return on investment. Therefore, the data the model was trained on was restricted to the financial information and credit history of the loan applicant. Due to time constraints and limited resources, the model was constructed on the highest-grade loans (A & B) in LendingClub.
Furthermore, no loans after the year 2015 were included in the model because many of these were still ongoing. After a series of tests, the final model that was developed was an XGBoost model with a training accuracy of 76% and a testing accuracy of 68%.
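The data restrictions described above (A- and B-grade loans only, nothing issued after 2015) can be sketched as a simple filter. The records below are made up, and the field names (`grade`, `issue_d`, `loan_status`) are assumed to follow the public LendingClub column naming. Keeping only loans with a final status is also an assumption, since the text states only that later loans were dropped because many were still ongoing.

```python
# Hypothetical records; field names and values are assumptions for illustration.
loans = [
    {"grade": "A", "issue_d": "Dec-2014", "loan_status": "Fully Paid"},
    {"grade": "B", "issue_d": "Mar-2015", "loan_status": "Charged Off"},
    {"grade": "C", "issue_d": "Jun-2013", "loan_status": "Fully Paid"},
    {"grade": "A", "issue_d": "Jan-2017", "loan_status": "Current"},
    {"grade": "B", "issue_d": "Aug-2012", "loan_status": "Current"},
]

HIGH_GRADES = {"A", "B"}
FINISHED = {"Fully Paid", "Charged Off"}  # assumed set of final outcomes
LAST_ISSUE_YEAR = 2015


def issue_year(record):
    """Extract the year from an issue date formatted like 'Dec-2014'."""
    return int(record["issue_d"].split("-")[1])


def keep(record):
    """True for high-grade loans issued through 2015 with a final status."""
    return (
        record["grade"] in HIGH_GRADES
        and issue_year(record) <= LAST_ISSUE_YEAR
        and record["loan_status"] in FINISHED
    )


training_pool = [r for r in loans if keep(r)]
# Binary target: 1 = charged off (default), 0 = fully paid.
labels = [1 if r["loan_status"] == "Charged Off" else 0 for r in training_pool]

print(len(training_pool), labels)
```

In practice the remaining columns would also be limited to information known before the loan starts, so that the model never sees payment history from the loan it is trying to predict.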
Figure 5 shows the ROC curve for the XGBoost model. The area under the ROC curve was 0.75, indicating that the model has overall moderate classification performance. However, for the purposes of better detecting loan defaults, this performance was deemed sufficient.
The model was then evaluated to determine whether it could produce an improvement in ROI. This was done by having the model predict the loan status of all high-grade loans within the dataset, randomly selecting one hundred loans that the model predicted would not default, randomly selecting another set of one hundred loans regardless of status, computing the average percent ROI for each group, and repeating this process.
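To make the area under the ROC curve more concrete, it can be read as the probability that a randomly chosen defaulted loan receives a higher predicted score than a randomly chosen paid-off loan. The sketch below computes it that way on a handful of made-up labels and scores; it is not the project's evaluation code, and the toy values were chosen only so that the result lands near the reported figure.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = default, 0 = fully paid; scores = predicted default risk.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why a value around 0.75 is usually described as moderate.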
After two thousand iterations, there was an average three percent increase in the ROI for the model's predicted loans when compared to the randomly selected loans.
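The resampling comparison described above can be sketched as follows. Everything in this example is synthetic: the default rate, the ROI values, and the behaviour of the stand-in "model" are assumptions chosen only to show the mechanics, and the function name is hypothetical.

```python
import random


def simulate_roi_lift(rois, predicted_default, sample_size=100,
                      iterations=2000, seed=0):
    """Average difference in mean percent ROI between loans the model
    predicts will not default and loans sampled regardless of status."""
    rng = random.Random(seed)
    approved = [r for r, flag in zip(rois, predicted_default) if not flag]
    if len(approved) < sample_size or len(rois) < sample_size:
        raise ValueError("not enough loans to draw a sample")
    total_lift = 0.0
    for _ in range(iterations):
        model_sample = rng.sample(approved, sample_size)
        random_sample = rng.sample(rois, sample_size)
        total_lift += (sum(model_sample) - sum(random_sample)) / sample_size
    return total_lift / iterations


# Synthetic portfolio (assumed numbers): 15% of loans default with a -50% ROI,
# the rest return +10%. The stand-in model flags 60% of true defaults and
# wrongly flags 10% of good loans.
data_rng = random.Random(42)
rois, flagged = [], []
for _ in range(5000):
    defaulted = data_rng.random() < 0.15
    rois.append(-50.0 if defaulted else 10.0)
    flagged.append(data_rng.random() < (0.6 if defaulted else 0.1))

lift = simulate_roi_lift(rois, flagged)
print(f"average ROI lift: {lift:.2f} percentage points")
```

Averaging over many iterations smooths out the noise in any single draw of one hundred loans, so the remaining difference reflects the model's selection rather than luck.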
Selection Strategy and Future Work on LendingClub
Overall, the model was able to improve the percent ROI amongst loans. A potential strategy utilizing this model would involve using the model to make predictions based on the applicants' financial history. Rather than relying directly on the model prediction for the loan outcome, the predicted probability of defaulting could be used as an indicator of the potential risk associated with a loan.
A future expansion of this project would involve constructing models for the mid-grade and low-grade loans and repeating the process of analysis with those models. Afterwards, the models would be ensembled to generate a series of models designed to predict loan outcomes across loan grades and be incorporated into a broader strategy for default predictions.
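One way such a probability-based strategy might look is sketched below: loans are screened against a maximum acceptable default probability and then ordered from least to most risky. The loan identifiers, probabilities, and the 0.2 cutoff are all hypothetical, and the cutoff in particular would need to be tuned against an investor's risk tolerance.

```python
def rank_by_risk(default_probs, max_default_prob=0.2):
    """Return (loan_id, probability) pairs at or below the cutoff,
    ordered from lowest to highest predicted default probability."""
    eligible = [
        (loan_id, prob)
        for loan_id, prob in default_probs.items()
        if prob <= max_default_prob
    ]
    return sorted(eligible, key=lambda item: item[1])


# Hypothetical predicted default probabilities for five candidate loans.
predicted = {
    "loan_a": 0.05,
    "loan_b": 0.32,
    "loan_c": 0.18,
    "loan_d": 0.11,
    "loan_e": 0.47,
}

for loan_id, prob in rank_by_risk(predicted):
    print(f"{loan_id}: {prob:.0%} predicted default risk")
```

Compared with a hard default/no-default label, a ranked list lets an investor decide how far down the list to go based on how much risk is acceptable.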