Lending Club investment simulator

Jean-Francois Darre
Posted on Nov 17, 2015

Contributed by Jean-Francois Darre. Jean took the NYC Data Science Academy 12-week full-time Data Science Bootcamp program from Sept 23 to Dec 18, 2015. This post is based on his second class project (due in the 4th week of the program).

Please see the app here! You can also find the code of the app here.


Lending Club (LC) is a peer-to-peer online lending platform. It is the world’s largest marketplace connecting borrowers and investors, where consumers and small business owners lower the cost of their credit and enjoy a better experience than traditional bank lending, and investors earn attractive risk-adjusted returns.

How it works:

  1. Customers interested in a loan complete a simple application at LendingClub.com.
  2. LC leverages online data and technology to quickly assess risk, determine a credit rating and assign appropriate interest rates.
  3. Qualified applicants receive offers in just minutes and can evaluate loan options with no impact on their credit score.
  4. Investors ranging from individuals to institutions select loans in which to invest and can earn monthly returns.
  5. The entire process is online, using technology to lower the cost of credit and pass the savings back in the form of lower rates for borrowers and solid returns for investors.

Here is the link to more details about Lending Club.

The app:

We built two main tools to explore and run simulations on the data that Lending Club publishes quarterly.

The first analysis:

For our first project, we already did some analysis on this data.

You can find the blog post here and the full R publication here!

Data exploration and visualization:

The first tool's main focus is to let users explore the data both visually and through data frame summaries. For the visuals, we decided to use bubble charts, as they make it possible to visualise 4 dimensions in a user-friendly and accessible way:

One discrete dimension (the groups), represented by the bubbles, and three continuous variables: the abscissa, the ordinate and the size of the bubbles.
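
As a rough illustration of the four dimensions, the sketch below aggregates a toy data set into one row per group with three continuous summaries. The column names, the values and the `plot_bubbles` helper are assumptions made for this example, and it uses base R graphics rather than whatever charting library the app relies on.

```r
# Toy loan data; the column names and values are illustrative assumptions,
# not Lending Club's actual schema.
loans <- data.frame(
  grade     = c("A", "A", "B", "B", "C", "C"),
  int_rate  = c(6.0, 7.0, 10.0, 11.0, 14.0, 15.0),
  defaulted = c(0, 0, 0, 1, 1, 1),
  loan_amnt = c(10000, 20000, 8000, 12000, 5000, 15000)
)

# One row per group (the discrete dimension) and three continuous summaries:
# average interest rate, default percentage and total amount in millions.
bubbles <- data.frame(
  grade     = sort(unique(loans$grade)),
  interest  = as.numeric(tapply(loans$int_rate, loans$grade, mean)),
  defaults  = as.numeric(tapply(loans$defaulted, loans$grade, mean)) * 100,
  amount_mm = as.numeric(tapply(loans$loan_amnt, loans$grade, sum)) / 1e6
)

# Hypothetical helper: abscissa = interest, ordinate = defaults,
# bubble area proportional to the total amount lent.
plot_bubbles <- function(b) {
  symbols(b$interest, b$defaults, circles = sqrt(b$amount_mm), inches = 0.5,
          xlab = "Average interest rate (%)", ylab = "Defaults (%)")
  text(b$interest, b$defaults, labels = b$grade)
}
```

Calling `plot_bubbles(bubbles)` draws one labelled bubble per grade.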

Screen Shot 2015-11-17 at 6.06.24 AM

The groups available are:

LC_Grade,               the LC grades range from A to G
Home_Ownership,         the home ownership status of the applicant: owner, rent, mortgage or other
Purpose,                the purpose of the loan: education, small business, debt or purchase
Delinquencies_bucket,   the number of delinquencies of the applicant
Inquiries_bucket,       the number of inquiries the applicant has made in the past 6 months
Public_Record_bucket,   the number of public records
Annual_Income_qbucket,  the annual income has been bucketed in 10% quantiles
DTI_qbucket,            the DTI is the Debt To Income ratio also bucketed in 10% quantiles
Revol_Util_qbucket,     the utilization percentage of the revolving balance available to the applicant
Revol_Bal_qbucket,      the size of the revolving balance
Total_Accounts_qbucket, the total number of accounts the applicant has ever opened
Open_Accounts_qbucket,  the number of accounts still active
Credit_Age_qbucket,     the length of the applicant's credit history
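
The `_qbucket` groups rely on 10% quantile buckets. A minimal sketch of how such bucketing could be done in base R is shown here; the `qbucket` helper and the toy income vector are assumptions for illustration, not the app's actual code.

```r
# Hypothetical helper: assign each value to one of n quantile buckets
# (n = 10 gives the 10% quantiles mentioned above).
qbucket <- function(x, n = 10) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # unique() guards against repeated break points when the data has many ties
  cut(x, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

# Toy annual incomes: 100 evenly spaced values
income  <- seq(10000, 109000, by = 1000)
buckets <- qbucket(income)

# Number of applicants that fall in each bucket
table(buckets)
```

With evenly spaced toy data every bucket holds the same number of applicants; on real, skewed data the bucket boundaries would be unevenly spaced instead.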

The continuous variables available are:

LC_score,               we converted the sub-grades that range from A1, A2, to G5 to numbers
FICO_score,             the usual FICO score ranging from 660 to 800+
Defaults,               the default percentage of the selected group
Avg_Loan_Amount,        the average loan amount
Loan_Amount_in_mm,      the total loan amount
Term,                   the average term
Interest,               the average interest rate
Employment_Length,      the average employment length
Annual_Income,          the average annual income
DTI,                    the average debt to income ratio
Delinquency_2yrs,       the number of delinquencies in the past 2 years
Credit_Age,             the average age of the credit history
Inquieries_6mths,       the average number of inquiries made in the 6 months before the application
Number_of_Accounts,     the average number of accounts
Public_Records,         the average number of public records
Revolving_Balance,      the average size of the revolving balance
Revolving_Utilized,     the average utilization of the revolving balance
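
The post does not spell out how sub-grades are turned into the numeric LC_score. One plausible mapping, shown here purely as an assumption, numbers the 35 sub-grades consecutively from A1 = 1 to G5 = 35; `subgrade_to_score` is a hypothetical helper name.

```r
# Hypothetical helper: map "A1".."G5" to 1..35 (assumed mapping).
subgrade_to_score <- function(sub_grade) {
  grade_index <- match(substr(sub_grade, 1, 1), LETTERS[1:7])  # A = 1, ..., G = 7
  step        <- as.integer(substr(sub_grade, 2, 2))           # 1 to 5
  (grade_index - 1) * 5 + step
}

scores <- subgrade_to_score(c("A1", "A5", "B1", "G5"))
scores  # 1 5 6 35
```

A numeric score like this lets the grade be used as a continuous axis in the bubble charts.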

Finally the user can generate these graphs on different sub-groups of Lending Club's data:

ALL,                    the entire data set
Matured Only,           only loans that have matured or would have matured in case of default
Survived,               all loans that have survived
Defaulted,              all loans that have defaulted
Current,                all the current loans
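
The "Matured Only" subset is the least obvious one, since a defaulted loan still counts once its scheduled end date has passed. The sketch below shows one way such subsets could be derived from YYYYMM periods; the column names, the status labels other than "Fully Paid", the `as_of` cut-off and the `add_months` helper are assumptions for illustration.

```r
# Hypothetical helper: add a number of months to a YYYYMM period.
add_months <- function(ym, months) {
  total <- (ym %/% 100) * 12 + (ym %% 100 - 1) + months
  (total %/% 12) * 100 + total %% 12 + 1
}

# Toy loans (assumed column names and status labels)
loans <- data.frame(
  id          = 1:4,
  issue_ym    = c(201011, 201212, 201406, 201101),
  term        = c(36, 60, 36, 36),
  loan_status = c("Fully Paid", "Current", "Current", "Charged Off")
)

as_of <- 201509  # assumed date of the data snapshot
loans$maturity_ym <- add_months(loans$issue_ym, loans$term)

# A loan counts as matured when its scheduled end is in the past,
# whether it was repaid or defaulted along the way.
matured   <- loans[loans$maturity_ym <= as_of, ]
defaulted <- loans[loans$loan_status == "Charged Off", ]
current   <- loans[loans$loan_status == "Current", ]
```

Here loans 1 and 4 are matured: loan 4 defaulted, but it would have matured in January 2014.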

Using these options the user can create this type of visualisation:

Screen Shot 2015-11-17 at 7.03.32 AM

For the user's convenience we added 2 visualizations:

One visualisation summarizes the total amount of loans issued for each group selected by the user, to put things in perspective, especially if the groups differ greatly:

Screen Shot 2015-11-17 at 7.05.08 AM

Another visualization shows the number of loans issued in each group selected by the user. This is particularly useful to detect a size bias, i.e. whether loans in a given group tend to be bigger. For example, here we can see that the higher the income, the bigger the loans: there is up to a factor of 3 between the lowest quantile and the top quantile:

Screen Shot 2015-11-17 at 7.11.31 AM

Finally, we added a summary in the form of a data frame to have access to the basic statistics of each group:

Screen Shot 2015-11-17 at 7.14.55 AM

Investment simulation:

The second tool we built is the investment simulator. It allows the user to run investment scenarios based on LC's historical data. The user can tune the filters to select a subset of loans with specific properties that the user suspects, based on prior analysis with the first tool, will outperform Lending Club's rating system and hence improve the performance of their portfolio.

Screen Shot 2015-11-17 at 6.07.33 AM

The user can filter/select the loans using filters on:

  • The loans' details (size, LC rating, FICO score, interest rate, term and purpose)
  • The borrower's personal information (number of inquiries in the 6 months prior to the application, annual income, DTI, employment length and home ownership status)
  • The borrower's financial details (number of delinquencies in the past 2 years at the time of the application, number of public records, the age of the credit history,  revolving balance, the utilization of the revolving balance and the number of accounts)
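
Filters of this kind boil down to a logical condition on the loan table. The sketch below applies a few of them to toy data; the column names, the `filters` list and the thresholds are assumptions for illustration, not the app's actual inputs.

```r
# Toy loans (assumed column names)
loans <- data.frame(
  id             = 1:5,
  grade          = c("A", "B", "C", "D", "B"),
  fico           = c(780, 720, 690, 670, 705),
  dti            = c(8, 15, 22, 30, 12),
  inq_last_6mths = c(0, 1, 3, 2, 0)
)

# Hypothetical filter settings, as they might come from UI sliders
filters <- list(
  grades        = c("B", "C", "D"),
  min_fico      = 680,
  max_dti       = 25,
  max_inquiries = 1
)

# Keep only the loans that pass every filter
keep <- loans$grade %in% filters$grades &
  loans$fico >= filters$min_fico &
  loans$dti <= filters$max_dti &
  loans$inq_last_6mths <= filters$max_inquiries
selected <- loans[keep, ]

selected$id  # 2 5
```

The simulator then only buys loans from the filtered subset.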

Screen Shot 2015-11-17 at 9.50.09 AM Screen Shot 2015-11-17 at 9.50.19 AM Screen Shot 2015-11-17 at 9.50.30 AM

The user can also set some parameters for the investments:

  • Amount to be invested
  • Maximum amount per loan
  • Start date of the simulation
  • Proportion of the revenues (interest + principal repayments) that should be re-invested to purchase more loans
  • The interest rate the user expects to earn on uninvested cash
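
To make the last two parameters concrete, the sketch below reproduces one month of the bookkeeping used in the invest function listed later in this post: the re-invested share of the payments goes back into loan purchases, and the rest joins a cash balance that grows at the monthly equivalent of the annual cash rate. The numeric values are made up for the example.

```r
cash_rate    <- 2     # annual rate earned on cash, in percent (example value)
re_invest    <- 80    # percentage of payments re-invested (example value)
received     <- 500   # payments collected this month (example value)
cash_balance <- 1000  # cash set aside so far (example value)

# Monthly growth factor equivalent to the annual cash rate
monthly_factor <- (1 + cash_rate / 100)^(1 / 12)

# Share of this month's payments that goes back into buying loans
to_reinvest <- re_invest / 100 * received

# The remainder is added to the cash balance, which first earns one month of interest
new_cash <- (1 - re_invest / 100) * received + cash_balance * monthly_factor
```

Compounding `monthly_factor` over 12 months gives back the annual rate, which is why the exponent is 1/12.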

Screen Shot 2015-11-17 at 9.50.38 AM

Additional features:

  • The user can also set the seed of the randomization, so that different strategies are compared fairly on the same random draws.
  • The user can save the settings of the current investment strategy and load them again in the future.
  • If satisfied with a strategy, the user can also submit it to be published on a public leaderboard.
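
The point of the seed is reproducibility: with the same seed, the random loan selection is identical from one run to the next. A minimal sketch, where `pick_loans` is a hypothetical helper and not part of the app:

```r
# Hypothetical helper: draw n loan ids at random, reproducibly for a given seed.
pick_loans <- function(ids, n, seed) {
  set.seed(seed)
  sample(ids, n)
}

first  <- pick_loans(1:100, 5, seed = 42)
second <- pick_loans(1:100, 5, seed = 42)

identical(first, second)  # TRUE
```

Because the draws match, any difference between two runs comes from the filters and parameters rather than from luck.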

Screen Shot 2015-11-17 at 9.50.55 AM Screen Shot 2015-11-17 at 9.50.46 AM

After running a simulation, the user has access to the following outputs:

Visualization of the investment over time:

Screen Shot 2015-11-17 at 10.02.06 AM

Statistics on the portfolio:

This table contains statistics on almost all features of the loans and borrowers in the portfolio. It can be transposed to show more details by checking the box 'transpose summary' just above the 'invest!' button:

Regular/compact summary:

Screen Shot 2015-11-17 at 10.02.17 AM

Transposed summary:

Screen Shot 2015-11-17 at 10.02.27 AM

Finally, the user has full access to the portfolio and can choose to display all the loans or filter them in any way they want:

Screen Shot 2015-11-17 at 10.02.45 AM

Our results:

We managed to improve the returns from an average of ~6% with no selection to ~12% with our best strategy in simulations on LC's historical data! This is a large increase and comfortably outperforms the market standard in fixed income.

These are examples of funds boasting of their superior investment performance:

Screen Shot 2015-11-17 at 10.13.33 AM

Screen Shot 2015-11-17 at 10.12.14 AM

As we can see, the performance achievable on Lending Club's platform exceeds these industry averages, even on highly diversified portfolios. Indeed, one of our ~12% strategies ended with 584 loans!

For our next steps, we wish to implement machine learning algorithms to build strategies automatically, while taking great care to avoid overfitting our data.

Additional details:

Here is the hall of fame:

Screen Shot 2015-11-17 at 10.22.09 AM

The code is included in the app for the curious mind:

Screen Shot 2015-11-17 at 10.23.04 AM

You can also find the code of the app here.

The code for the investment simulation is split over 2 functions:

One function builds and keeps track of the portfolio, and the second function is called on every cycle to simulate the purchase of loans with the available cash:

invest = function(my_data, to_invest, t, re_invest, max_amount, cash_rate, seed) {
  # just transforming date format, adding 2 columns to our data to keep track of
  # the investments and create a range to iterate upon
  # (needs dplyr for mutate/filter and shiny for withProgress/incProgress;
  # `range` is assumed to be a vector of YYYYMM periods defined elsewhere in the app)
  t = as.numeric(format(t, "%Y%m"))
  my_data = mutate(my_data, invested = 0, pymnt = 0)
  range = sort(unique(pmax(range, t)))

  # initializing portfolio which will store the loans we buy
  # initializing the summary of our investments, it will help us keep track of
  # the cash available for potential purchases on each period
  portfolio = c()
  summary = c()
  summary$time = t
  summary$invested = 0
  summary$received = 0
  summary$Reinvested = 0
  summary$Cash = 0
  summary$Principal = 0
  summary = as.data.frame(summary)
  i = 0

  # This is just filling the loading bar in the top of the screen when running the function
  withProgress(message = 'Running simulation...', min = 0, max = length(range) + 10, {

    # main loop!
    for (t in range) {
      i = i + 1
      incProgress(1, detail = paste0("Period: ", i))
      summary_temp = c()
      summary_temp[1] = i

      # updating the portfolio and summary
      if (!is.null(portfolio)) {
        # payments are: if this is the last payment and the loan is fully paid then you get your principal
        # (you have to consider prepayments, i.e. people paying back the loan before the scheduled end)
        # if not you get the last payment and the recoveries collected and the loan is over
        # if before the last payment then you get the portion of the installment that is owed to you
        # for the principal, we just adjust it by the portion of the installment - the interest payment
        portfolio = mutate(
                      portfolio,
                      pymnt =  ifelse(
                                 last_pymnt_ym == t,
                                 ifelse(loan_status_new == "Fully Paid",
                                   prncpl + invested / loan_amnt * recoveries,
                                   invested / loan_amnt * (last_pymnt_amnt + recoveries)),
                                 ifelse(last_pymnt_ym  > t, invested / loan_amnt * installment, 0)),
                      prncpl = ifelse(last_pymnt_ym  > t, prncpl -
                                 (installment*invested/loan_amnt - prncpl*(rate/1200)), 0))

        # to_invest is incremented by the amount collected this month (keeping in mind people
        # might not want to 're_invest' everything). We also update the summary
        to_invest = to_invest + re_invest/100 * sum(portfolio$pymnt)
        summary_temp[2] = sum(portfolio$invested)
        summary_temp[3] = sum(portfolio$pymnt)
        summary_temp[4] = to_invest
        # previous formula, kept for reference:
        # (re_invest/100) * sum(portfolio$pymnt) +
        #   tail(summary$Reinvested, 1) * (1 + cash_rate / 100)^(1/12)
        summary_temp[5] = (1 - re_invest/100) * sum(portfolio$pymnt) +
                            tail(summary$Cash,1) * (1 + cash_rate / 100)^(1/12)
        summary_temp[6] = sum(portfolio$prncpl)
      } else {
        # first loop here, just initializing summary_temp
        summary_temp[2] = to_invest
        summary_temp[3] = 0
        summary_temp[4] = 0
        summary_temp[5] = 0
        summary_temp[6] = 0
      }

      # update the summary with this month's summary
      summary = rbind(summary, summary_temp)

      # we filter the data to only the loans available this month, if none are available we move on
      data = filter(my_data, issue_ym == t)
      if (nrow(data) == 0) { next }

      # now we buy some additional new loan with our to_invest money
      purchase = buy(data, to_invest, max_amount, seed)

      # after buying we update our to_invest and portfolio
      to_invest = purchase$to_invest_next
      portfolio = rbind(portfolio, purchase$purchased)
    } # end of the main loop
  }) # end of withProgress

  # that's it, now we post the results!
  summary = summary[2:nrow(summary),]
  result = list(portfolio = portfolio,
                # the column list is truncated in the source; only the visible columns are kept
                portfolio_short = portfolio[, c("id", "issue_d", "loan_amnt")],
                summary = summary)
  return(result)
}
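
The principal update in the loop is easier to follow on a single loan. The sketch below applies the same rule (principal minus the investor's share of the installment net of interest) to a fractional investment in one loan that is repaid on schedule. The loan values are made up, and the installment is computed here with the standard annuity formula as an assumption, whereas the simulator reads it from the data.

```r
loan_amnt <- 10000  # full size of the loan (example value)
rate      <- 12     # annual interest rate in percent (example value)
term      <- 36     # number of monthly payments (example value)
invested  <- 25     # our stake in the loan (example value)

monthly_rate <- rate / 1200

# Assumption: standard annuity formula for the monthly installment
installment <- loan_amnt * monthly_rate / (1 - (1 + monthly_rate)^(-term))

share    <- invested / loan_amnt  # fraction of the loan we own
prncpl   <- invested              # outstanding principal on our stake
received <- 0

for (month in seq_len(term)) {
  pymnt    <- share * installment      # our share of this month's installment
  interest <- prncpl * monthly_rate    # interest accrued on our outstanding principal
  prncpl   <- prncpl - (pymnt - interest)
  received <- received + pymnt
}
```

After the last installment the outstanding principal is zero up to rounding error, and `received` exceeds the 25 invested by the interest earned.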

Here is the buy function that we called in our invest function:

# function made to process the purchase of loans for each period
buy = function(data, to_invest, max_amount, seed) {
  # initializing variable to keep track of the loans we purchased and this
  # is where the 'seed' from the UI is used
  # (the set.seed call is missing from the listing; it is restored here as an assumption)
  set.seed(seed)
  n = nrow(data)
  purchased_loans = c()

  # we just go thru the filtered data, pick a loan at random, buy it if we
  # have enough money otherwise leave. If we buy reduce to_invest.
  # If we are out of loans, move on too... et voila!
  for (i in seq_len(n)) {
    if (to_invest < max_amount) { break }
    select = ceiling(runif(1)*nrow(data))
    temp = data[select,]
    invest = pmin(max_amount, temp$loan_amnt)
    temp$invested = invest
    temp$prncpl = invest
    to_invest = to_invest - invest
    purchased_loans = rbind(purchased_loans, temp)
    data = data[-select,]
    if (nrow(data) == 0) { break }
  }

  # hand back the remaining cash and the purchased loans under the names
  # that invest() reads (purchase$to_invest_next and purchase$purchased)
  list(to_invest_next = to_invest, purchased = purchased_loans)
}
About Author

Jean-Francois Darre

Jean-Francois holds a MSc in Statistics from Stanford University and MSc in Applied Math with a minor in Physics from École des Mines de Nancy in France.