Lending Club investment simulator

Jean-Francois Darre

Posted on Nov 17, 2015

Contributed by Jean-Francois Darre. Jean took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from Sept 23 to Dec 18, 2015. This post is based on his second class project (due in the 4th week of the program).

Please see the app here! You can also find the code of the app here.

Introduction:

Lending Club (LC) is a peer-to-peer online lending platform. It is the world’s largest marketplace connecting borrowers and investors, where consumers and small business owners lower the cost of their credit and enjoy a better experience than traditional bank lending, and investors earn attractive risk-adjusted returns.

How it works:

Customers interested in a loan complete a simple application at LendingClub.com.
LC leverages online data and technology to quickly assess risk, determine a credit rating and assign appropriate interest rates.
Qualified applicants receive offers in just minutes and can evaluate loan options with no impact on their credit score.
Investors ranging from individuals to institutions select loans in which to invest and can earn monthly returns.
The entire process is online, using technology to lower the cost of credit and pass the savings back in the form of lower rates for borrowers and solid returns for investors.

Here is the link to more details about Lending Club.

The app:

We built two main tools to explore and run simulations on the data that Lending Club publishes quarterly.

The first analysis:

For our first project, we already did some analysis on this data.

You can find the blog post here and the full R publication here!

Data exploration and visualization:

The first tool's main focus is to let users explore the data both visually and through summary data frames. For the visuals, we chose bubble charts because they show four dimensions in a user-friendly and accessible way:

One discrete dimension (the groups, with one bubble per group) and three continuous variables: the abscissa, the ordinate and the size of the bubbles.

The groups available are:

LC_Grade,               the LC grades range from A to G
Home_Ownership,         the home ownership status of the applicant: owner, rent, mortgage or other
Purpose,                the purpose of the loan: education, small business, debt or purchase
Delinquencies_bucket,   the number of delinquencies of the applicant
Inquiries_bucket,       the number of inquiries the applicant has made in the past 6 months
Public_Record_bucket,   the number of public records
Annual_Income_qbucket,  the annual income has been bucketed in 10% quantiles
DTI_qbucket,            the DTI is the Debt To Income ratio also bucketed in 10% quantiles
Revol_Util_qbucket,     the utilization percentage of the revolving balance available to the applicant
Revol_Bal_qbucket,      the size of the revolving balance
Total_Accounts_qbucket, the total number of accounts the applicant has ever opened
Open_Accounts_qbucket,  the number of accounts still active
Credit_Age_qbucket,     the length of the applicant's credit history
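
The `_qbucket` groups above rely on cutting a continuous variable into 10% quantiles. Here is a minimal sketch of how such a bucketing could be done in base R. The helper name `qbucket()` and the synthetic incomes are assumptions made for illustration; this is not the app's actual code.

```r
# Hypothetical helper (not from the app): assign each value to one of
# n_buckets quantile buckets, i.e. deciles when n_buckets = 10.
qbucket <- function(x, n_buckets = 10) {
  # Bucket boundaries at the 0%, 10%, ..., 100% quantiles of x
  breaks <- quantile(x, probs = seq(0, 1, length.out = n_buckets + 1),
                     na.rm = TRUE)
  # unique() guards against duplicated boundaries when x has many ties
  breaks <- unique(breaks)
  # include.lowest = TRUE keeps the minimum value inside the first bucket;
  # labels = FALSE returns the bucket index instead of an interval label
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Synthetic annual incomes, evenly spaced for readability
annual_income <- seq(20000, 119000, by = 1000)
income_bucket <- qbucket(annual_income)

# Number of applicants per bucket
table(income_bucket)
```

Bucketing on quantiles rather than fixed-width ranges keeps the groups at comparable sizes, which makes the bubbles easier to compare.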

The continuous variables available are:

LC_score,               we converted the sub-grades that range from A1, A2, to G5 to numbers
FICO_score,             the usual FICO score ranging from 660 to 800+
Defaults,               the default percentage of the selected group
Avg_Loan_Amount,        the average loan amount
Loan_Amount_in_mm,      the total loan amount
Term,                   the average term
Interest,               the average interest rate
Employment_Length,      the average employment length
Annual_Income,          the average annual income
DTI,                    the average debt to income ratio
Delinquency_2yrs,       the number of delinquencies in the past 2 years
Credit_Age,             the average age of the credit history
Inquieries_6mths,       the average number of inquiries made in the 6 months before the application
Number_of_Accounts,     the average number of accounts
Public_Records,         the average number of public records
Revolving_Balance,      the average size of the revolving balance
Revolving_Utilized,     the average utilization of the revolving balance
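
As an illustration of the LC_score variable, here is one possible way to turn sub-grades into numbers. The exact scale used in the app is not stated in this post, so the mapping below (A1 = 1 up to G5 = 35) and the function name `sub_grade_to_score()` are assumptions.

```r
# Hypothetical mapping (an assumption, not the app's code):
# A1 -> 1, A2 -> 2, ..., A5 -> 5, B1 -> 6, ..., G5 -> 35
sub_grade_to_score <- function(sub_grade) {
  letter <- substr(sub_grade, 1, 1)              # "A" .. "G"
  digit  <- as.integer(substr(sub_grade, 2, 2))  # 1 .. 5
  # Each letter grade spans five consecutive scores
  (match(letter, LETTERS[1:7]) - 1L) * 5L + digit
}

sub_grade_to_score(c("A1", "A5", "B1", "G5"))
# returns 1 5 6 35
```

A numeric score like this lets the sub-grade be used on a continuous axis or averaged within a group.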

Finally the user can generate these graphs on different sub-groups of Lending Club's data:

ALL,                    the entire data set
Matured Only,           only loans that have matured or would have matured in case of default
Survived,               all loans that have survived
Defaulted,              all loans that have defaulted
Current,                all the current loans
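
The sub-groups above can be thought of as simple row filters. The sketch below shows one way to express them in base R. It reuses the `issue_ym` (YYYYMM), `term` and `loan_status_new` column names that appear in the simulation code further down, but the status values other than "Fully Paid", the reading of "Survived" as "Fully Paid", the toy data and the helper names are all assumptions.

```r
# Toy data; the status values "Default" and "Current" are assumed
loans <- data.frame(
  id = 1:5,
  issue_ym = c(201001, 201006, 201301, 201406, 201501),  # YYYYMM
  term = c(36, 36, 36, 60, 36),                           # months
  loan_status_new = c("Fully Paid", "Default", "Current", "Current", "Default"),
  stringsAsFactors = FALSE
)

# Number of months between two YYYYMM values
months_between <- function(from_ym, to_ym) {
  (to_ym %/% 100 - from_ym %/% 100) * 12 + (to_ym %% 100 - from_ym %% 100)
}

# Hypothetical helper: return the rows belonging to the requested sub-group
select_subset <- function(loans, subset_name, as_of_ym) {
  age <- months_between(loans$issue_ym, as_of_ym)
  keep <- switch(subset_name,
    "ALL"          = rep(TRUE, nrow(loans)),
    # matured, or would have matured had the loan not defaulted
    "Matured Only" = age >= loans$term,
    # assumption: "survived" is read here as "Fully Paid"
    "Survived"     = loans$loan_status_new == "Fully Paid",
    "Defaulted"    = loans$loan_status_new == "Default",
    "Current"      = loans$loan_status_new == "Current",
    stop("unknown subset: ", subset_name)
  )
  loans[keep, , drop = FALSE]
}

select_subset(loans, "Matured Only", as_of_ym = 201509)
```

Restricting to matured loans avoids comparing default rates between loans that have had very different amounts of time to default.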

Using these options the user can create this type of visualisation:

For the user's convenience, we added two more visualizations:

The first summarizes the total amount of loans issued for each group selected by the user, to put things in perspective, especially if the groups differ greatly in size:

The second shows the number of loans issued in each group selected by the user. This is particularly useful to detect a size bias, i.e. whether loans in a given group tend to be bigger. For example, here we can see that the higher the income, the bigger the loans, by up to a factor of 3 between the lowest quantile and the top quantile:

Finally, we added a summary in the form of a data frame to have access to the basic statistics of each group:
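
To make the idea of a per-group summary concrete, here is a minimal base R sketch that computes a few of the statistics listed earlier (Defaults, Avg_Loan_Amount, Loan_Amount_in_mm, Interest) for each group. The toy data, the `defaulted` column and the `group_summary()` helper are assumptions for illustration, not the app's code.

```r
# Toy loans; column names other than loan_amnt, rate and grade are assumed
loans <- data.frame(
  grade     = c("A", "A", "B", "B", "B", "C"),
  loan_amnt = c(10000, 20000, 5000, 15000, 10000, 12000),
  rate      = c(6.5, 7.0, 10.0, 11.0, 12.0, 15.0),
  defaulted = c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one summary row per group
group_summary <- function(loans, group_col) {
  groups <- split(loans, loans[[group_col]])
  rows <- lapply(names(groups), function(g) {
    d <- groups[[g]]
    data.frame(
      group             = g,
      n_loans           = nrow(d),
      Defaults          = 100 * mean(d$defaulted),  # default percentage
      Avg_Loan_Amount   = mean(d$loan_amnt),
      Loan_Amount_in_mm = sum(d$loan_amnt) / 1e6,   # total, in millions
      Interest          = mean(d$rate),
      stringsAsFactors  = FALSE
    )
  })
  do.call(rbind, rows)
}

by_grade <- group_summary(loans, "grade")
by_grade
```

Each row of such a table corresponds to one bubble: two columns give its position and a third gives its size.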

Investment simulation:

The second tool we built is the investment simulator. It allows the user to run investment scenarios based on LC's historical data. The user can tune the filters to select a subset of loans with specific properties that, based on the analysis done with the first tool, they suspect will outperform Lending Club's rating system and hence improve the performance of their portfolio.

The user can select loans by filtering on:

The loans' details (size, LC rating, FICO score, interest rate, term and purpose)
The borrower's personal information (number of inquiries in the 6 months prior to the application, annual income, DTI, employment length and home ownership status)
The borrower's financial details (number of delinquencies in the 2 years prior to the application, number of public records, the age of the credit history, revolving balance, the utilization of the revolving balance and the number of accounts)


The user can also set some parameters for the investments:

Amount to be invested
Maximum amount per loan
Start date of the simulation
Proportion of the revenues (interest + principal repayments) that should be re-invested to purchase more loans
The interest rate the user expects to earn on uninvested cash

Additional features:

The user can also set the seed of the randomization to ensure that assumptions are tested fairly.
The user can save the settings of the current investment strategy and load them again in the future.
If satisfied with a strategy, the user can also submit it to be published on a public leaderboard.
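
The role of the seed can be illustrated with a few lines of base R: fixing the seed before drawing makes the random selection reproducible, so two strategies can be compared on the same draw. The `pick_loans()` helper is hypothetical and only stands in for the random loan selection.

```r
# Hypothetical helper: draw n loan ids at random, reproducibly
pick_loans <- function(ids, n, seed) {
  set.seed(seed)   # same seed -> same sequence of random draws
  sample(ids, n)
}

a <- pick_loans(1:100, 5, seed = 42)
b <- pick_loans(1:100, 5, seed = 42)

identical(a, b)
# returns TRUE
```

Without a fixed seed, a difference in returns between two runs could come from the random draw rather than from the filters being tested.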

After running a simulation, the user has access to the following outputs:

Visualization of the investment over time:

Statistics on the portfolio:

This table contains statistics on almost all features of the loans and borrowers in the portfolio. It can be transposed to show more detail by checking the 'transpose summary' box just above the 'invest!' button:

Regular/compact summary:

Transposed summary:

Finally, the user has full access to the portfolio and can choose to display all the loans or filter them in any way:

Our results:

We managed to improve the returns from an average of ~6% with no selection to ~12% with our best strategy! This is a huge increase and well above the market standard in fixed income.

These are examples of funds boasting superior investment performance:

As we can see, the performance achievable on Lending Club's platform exceeds these industry averages, even on well-diversified portfolios. Indeed, one of our ~12% strategies ended with 584 loans!

For our next steps, we want to implement machine learning algorithms to build strategies automatically, while taking great care to avoid overfitting our data.

Additional details:

Here is the hall of fame:

The code is included in the app for the curious mind:

You can also find the code of the app here.

The code for the investment simulation is split over 2 functions:

One function builds and keeps track of the portfolio, and the second function is called on every cycle to simulate the purchase of loans with the available cash:

invest = function(my_data, to_invest, t, re_invest, max_amount, cash_rate, seed) {
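  # convert the start date to a YYYYMM number, add two columns to our data to
  # keep track of the investments, and build the list of periods to iterate over
  # NOTE: `range` on the right-hand side below is not defined inside this function;
  # it presumably refers to a vector of YYYYMM periods defined elsewhere in the app
  t = as.numeric(format(t, "%Y%m"))
  my_data = mutate(my_data, invested = 0, pymnt = 0)
  range = sort(unique(pmax(range,t)))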


  # initializing portfolio which will store the loans we buy
  # initializing the summary of our investments, it will help us keep track of 
  # our cash available on each period potential purchases
  portfolio = c()
  summary = c()
  summary$time = t
  summary$invested = 0
  summary$received = 0
  summary$Reinvested = 0
  summary$Cash = 0
  summary$Principal = 0
  summary = as.data.frame(summary)
  i = 0


  # This is just filling the loading bar in the top of the screen when running the function
  withProgress(message = 'Running simulation...', min = 0, max = length(range) + 10, {

    # main loop!
    for (t in range) {
      i = i + 1
      incProgress(1, detail = paste0("Period: ", i) )
      summary_temp = c()
      summary_temp[1] = i

      # updating the portfolio and summary
      if (!is.null(portfolio)) { 
        # payment received this period, pro-rated by our share (invested / loan_amnt):
        # - last payment month and loan fully paid: we get our remaining principal back
        #   (this covers prepayments, i.e. loans repaid before the scheduled end)
        # - last payment month otherwise: we get our share of the last payment and of
        #   the recoveries collected, and the loan is over
        # - before the last payment month: we get our share of the installment
        # the principal is reduced by our share of the installment minus the interest
        portfolio = mutate(
                      portfolio, 
                      pymnt =  ifelse(
                                 last_pymnt_ym == t, 
                                 ifelse(loan_status_new == "Fully Paid",
                                   prncpl + invested / loan_amnt * recoveries,
                                   invested / loan_amnt * (last_pymnt_amnt + recoveries)),
                                 ifelse(last_pymnt_ym  > t, invested / loan_amnt * installment, 0)),
                      prncpl = ifelse(last_pymnt_ym  > t, prncpl - 
                                 (installment*invested/loan_amnt - prncpl*(rate/1200)), 0)
                    )

        # to_invest is incremented by the amount collected this month (keeping in mind
        # that people might not want to 're_invest' everything). We also update the summary
        to_invest = to_invest + re_invest/100 * sum(portfolio$pymnt)
        summary_temp[2] = sum(portfolio$invested)
        summary_temp[3] = sum(portfolio$pymnt)
        summary_temp[4] = to_invest
        # earlier variant, kept for reference; it was already inactive because the
        # trailing comment cut the expression before the `+`:
        # (re_invest/100) * sum(portfolio$pymnt) +
        #   tail(summary$Reinvested, 1) * (1 + cash_rate / 100)^(1/12)
        summary_temp[5] = (1 - re_invest/100) * sum(portfolio$pymnt) + 
                            tail(summary$Cash,1) * (1 + cash_rate / 100)^(1/12)
        summary_temp[6] = sum(portfolio$prncpl)
        } else { 
        # first loop here, just initializing summary_temp
        summary_temp[2] = to_invest
        summary_temp[3] = 0
        summary_temp[4] = 0
        summary_temp[5] = 0
        summary_temp[6] = 0
      }

      
      # update the summary with this month's summary
      summary = rbind(summary, summary_temp)


      # we filter the data to only the loans available this month, if none are available we move on
      data = filter(my_data, issue_ym == t)
      if (nrow(data) == 0) { next }


      # now we buy some additional new loan with our to_invest money
      purchase = buy(data, to_invest, max_amount, seed)


      # after buying we update our to_invest and portfolio
      to_invest = purchase$to_invest_next
      portfolio = rbind(portfolio, purchase$purchased)
    }
  })


  # that's it, now we post the results!
  summary = summary[2:nrow(summary),]
  result = list(portfolio = portfolio,
                portfolio_short = portfolio[,c("id","issue_d","loan_amnt",
                                               "term","rate","grade","Purpose",
                                               "invested","loan_status_new")],
                summary = summary)
  return(result)
}
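
To check the payment and principal update used inside invest(), it can help to run it on a single loan. The sketch below follows a 25% share of one hypothetical loan over its full term and also checks the monthly cash compounding factor. All numbers are illustrative, and the installment is computed with the standard annuity formula, which is an assumption about how the `installment` column in the data is derived.

```r
# One hypothetical loan; all numbers are illustrative, not Lending Club data
loan_amnt <- 10000   # full loan size
invested  <- 2500    # our share of the loan
rate      <- 12      # annual interest rate, in percent
term      <- 36      # months

# Assumption: the installment follows the standard annuity formula
monthly_rate <- rate / 1200
installment  <- loan_amnt * monthly_rate / (1 - (1 + monthly_rate)^(-term))

share  <- invested / loan_amnt
prncpl <- invested
schedule <- data.frame(month = seq_len(term), pymnt = NA_real_,
                       interest = NA_real_, prncpl = NA_real_)

for (m in seq_len(term)) {
  pymnt    <- share * installment          # our portion of the installment
  interest <- prncpl * monthly_rate        # interest accrued on our balance
  prncpl   <- prncpl - (pymnt - interest)  # same update rule as in invest()
  schedule[m, c("pymnt", "interest", "prncpl")] <- c(pymnt, interest, prncpl)
}

head(schedule, 3)
tail(schedule, 1)   # remaining principal is ~0 after the last payment

# Cash earns cash_rate per year, applied monthly as in invest()
cash_rate      <- 2
monthly_factor <- (1 + cash_rate / 100)^(1/12)
monthly_factor^12   # compounds back to 1 + cash_rate / 100
```

The principal reaching zero at the end of the term confirms that subtracting "installment minus interest" each month amortizes the invested share correctly for a loan that pays on schedule.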

Here is the buy function that we called in our invest function:

# function made to process the purchase of loans for each period
buy = function(data, to_invest, max_amount, seed) {
  # initializing the variable that keeps track of the loans we purchased;
  # this is also where the 'seed' from the UI is used
  n = nrow(data)
  purchased_loans = c()
  set.seed(seed)

  # we just go thru the filtered data, pick a loan at random, buy it if we 
  # have enough money otherwise leave. If we buy reduce to_invest.
  # If we are out of loans, move on too... et voila!
  for (i in 1:n) {
    if (to_invest < max_amount) { break }
    select = ceiling(runif(1)*nrow(data))
    temp = data[select,]
    invest = pmin(max_amount, temp$loan_amnt)
    temp$invested = invest
    temp$prncpl = invest
    to_invest = to_invest - invest
    purchased_loans = rbind(purchased_loans, temp)
    data = data[-select,]
    if (nrow(data) == 0) { break }
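  }

  # NOTE: the original listing is cut off here. The return value below is
  # reconstructed from how invest() uses the result (purchase$to_invest_next
  # and purchase$purchased); treat it as an assumption.
  list(to_invest_next = to_invest, purchased = purchased_loans)
}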
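
For readers who want to experiment with the purchasing step on its own, here is a compact, self-contained variant of the same idea. It is not the app's code: it shuffles the loans once with sample() instead of drawing them one by one, so the individual draws differ from buy() for a given seed even though the stopping rule is the same. The function name `buy_sketch()` and the toy data are assumptions; the returned element names follow how invest() reads the result.

```r
# Hypothetical vectorized variant of the purchase step (not the app's code)
buy_sketch <- function(data, to_invest, max_amount, seed) {
  set.seed(seed)
  # Random purchase order: a shuffle is a draw without replacement
  shuffled <- data[sample(nrow(data)), , drop = FALSE]
  # Amount put into each loan: capped at max_amount
  ticket <- pmin(max_amount, shuffled$loan_amnt)
  # Cash available just before each potential purchase
  cash_before <- to_invest - c(0, cumsum(ticket))[seq_len(nrow(shuffled))]
  # Same stopping rule as buy(): stop once cash drops below max_amount.
  # Cash only decreases, so once this is FALSE it stays FALSE.
  bought <- cash_before >= max_amount
  purchased <- shuffled[bought, , drop = FALSE]
  purchased$invested <- ticket[bought]
  purchased$prncpl   <- ticket[bought]
  list(to_invest_next = to_invest - sum(ticket[bought]),
       purchased = purchased)
}

# Toy loans available in one period
available <- data.frame(
  id = 1:6,
  loan_amnt = c(1000, 5000, 300, 2500, 800, 10000)
)

purchase <- buy_sketch(available, to_invest = 1000, max_amount = 250, seed = 7)
purchase$purchased
purchase$to_invest_next
```

With 1000 to invest and at most 250 per loan, four loans are bought whatever the seed; only which four depends on the seed.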


About Author

Jean-Francois Darre

Jean-Francois holds an MSc in Statistics from Stanford University and an MSc in Applied Math with a minor in Physics from École des Mines de Nancy in France.
