EDA of Consumer Complaint data from the CFPB

Mark Schott
Posted on May 1, 2017

The Consumer Financial Protection Bureau (CFPB) was created as a result of the Dodd-Frank bill, a legislative response to the 2008 financial crisis. Unlike most other government agencies, the CFPB is funded directly by the Federal Reserve, and its head honcho, currently Richard Cordray, is appointed by the President. This has allowed the CFPB to carry out its duties as financial sector watchdog without going begging to Congress every fiscal year, which of course has made it controversial to some powerful people.

Its activities include levying fines, suing companies, providing financial advice to the public, and handling complaints from consumers. The CFPB collects these individual complaints, does some quick validation, and passes them along to the corresponding company so that it has an opportunity to respond. The company is given 15 days to respond to the claims made in the complaint, and the complaint is then added to the data. Since 2011, roughly 750,000 complaints have been collected and made publicly available. The data set is updated often.

Despite the politics, as a data scientist I am in support of open data endeavors because they give me something to explore. So I picked this data set as the focal point of my RShiny project and brief analysis. The data consists of 18 variables, including timestamps, location (ZIP code, state), company, company response, product, issue, and submission method.
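For readers who want to poke at the raw file themselves, here is a minimal sketch of reading complaint rows with Python's standard library. The original project was built in R, so this is an illustration rather than the code behind the app. The column names and the two sample rows are assumptions made up for the example; check them against the header of the file you actually download.

```python
import csv
import io

# Hypothetical two-row sample. The column names are assumptions about the
# CFPB export and only cover a subset of the 18 variables.
SAMPLE_CSV = """Date received,Product,Issue,Company,State,ZIP code,Submitted via,Timely response?
2016-09-12,Bank account or service,"Account opening, closing, or management",Wells Fargo & Company,CA,94105,Web,Yes
2016-09-13,Mortgage,"Loan servicing, payments, escrow account",Bank of America,NY,10001,Phone,No
"""

def load_complaints(handle):
    """Read complaint rows from an open CSV file handle into a list of dicts."""
    return list(csv.DictReader(handle))

# With a real download you would pass open("complaints.csv", newline="") instead.
complaints = load_complaints(io.StringIO(SAMPLE_CSV))
print(len(complaints), "rows;", len(complaints[0]), "columns")
for row in complaints:
    print(row["Date received"], row["Product"], row["Submitted via"])
```

Reading each row into a dict keyed by column name keeps later grouping steps readable, at the cost of holding the whole file in memory.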

To help visualize this data set, I created an interactive RShiny application, which can be viewed at mesnaround.shinyapps.io/consumer_complaints. There you can find features such as a choropleth (a fancy map), time series counts, weekday and month complaint frequencies, custom bar charts and mosaic plots, and a pretty useless word cloud (I still enjoyed making it and want to add some sentiment analysis to it).

Please explore this RShiny app and see if you can find some interesting or curious things.

As for my own analysis of the data set, I really only scratched the surface. First of all, you can see the number of complaints over time, colored by submission type, with web being an increasingly preferred medium. The CFPB is purportedly putting a lot of effort into improving its technology, and perhaps this is a simple indication of its success.
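The counts behind a plot like this can be sketched as a simple tally per month and submission method, plus the web share per year. This is a Python illustration under stated assumptions, not the R code used for the app: the records are invented, and the method labels ("Web", "Phone", "Referral") are assumed values.

```python
from collections import Counter
from datetime import date

# Hypothetical (date received, submission method) pairs, not real CFPB data.
records = [
    (date(2012, 1, 5), "Web"),
    (date(2012, 1, 20), "Phone"),
    (date(2012, 2, 3), "Phone"),
    (date(2016, 9, 12), "Web"),
    (date(2016, 9, 15), "Web"),
    (date(2016, 9, 30), "Referral"),
]

def monthly_counts_by_method(rows):
    """Count complaints per (year, month, submission method)."""
    return Counter((d.year, d.month, method) for d, method in rows)

def web_share_by_year(rows):
    """Fraction of each year's complaints that were submitted via the web."""
    totals = Counter(d.year for d, _ in rows)
    web = Counter(d.year for d, method in rows if method == "Web")
    return {year: web[year] / totals[year] for year in sorted(totals)}

counts = monthly_counts_by_method(records)
shares = web_share_by_year(records)
for year, share in shares.items():
    print(f"{year}: {share:.0%} of complaints came in via the web")
```

A rising web share across years is what "increasingly preferred medium" would look like in numbers.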

Next I wanted to see how the complaint frequency varied by day of the week and by month, to see if there were any trends. To the eye it does seem that there are more complaints in the middle of the week and in the first quarter. I ran a Bartlett test on each grouping, but the variances were deemed to be unequal, so a comparison of means that assumes equal variances (such as a standard ANOVA) would not be appropriate. I have not pursued it further. If it is a real trend, though, why would complaints spike during the first quarter of the year?

 

Complaint Frequency by Day

Complaint Frequency by Month
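For context on the Bartlett test mentioned above: it checks whether several groups share a common variance. Below is a minimal pure-Python sketch of the test statistic. R provides this test as bartlett.test, but the post does not say how the test was run, so treat this as an illustration only. The weekday counts are invented.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's test statistic for equal variances across groups.

    Under the null hypothesis of equal variances it is approximately
    chi-squared distributed with len(groups) - 1 degrees of freedom.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances (n - 1 denominator)

    # Pooled variance, weighting each group by its degrees of freedom.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)

    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical daily complaint counts grouped by weekday (not real CFPB numbers).
counts_by_weekday = {
    "Mon": [410, 395, 430, 402],
    "Wed": [520, 380, 610, 450],
    "Sat": [120, 118, 125, 121],
}
stat = bartlett_statistic(list(counts_by_weekday.values()))
print(f"Bartlett statistic: {stat:.2f} on {len(counts_by_weekday) - 1} degrees of freedom")
```

A large statistic relative to the chi-squared critical value (about 5.99 at the 5% level with 2 degrees of freedom) is evidence that the group variances differ. The test is known to be sensitive to non-normal data, which count data often is.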

Looking at the breakdown of the complaints by Product Type, one can see that the largest share of the complaints relates to Mortgages. In a distant second place is Debt Collection, and close to that is Credit Reporting. Furthermore, the vast majority of complaints receive a timely response from the company. A timely response just means that the complaint was acknowledged and the company picked one of a few choices to characterize its stance, such as agree, disagree, or needs further review. At the very least, companies do seem to take these complaints seriously. Notice the disproportionate number of untimely responses for the Debt Collection category. Also, for now I'll point out the fairly small untimely response fraction for the Bank Account or Service category. More on that later.

 

All the complaints in the data broken down by Product Type
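The timely versus untimely split per product can be sketched as a small tally. This is a Python illustration with invented rows; the "Yes"/"No" encoding of the timely-response field is an assumption.

```python
from collections import defaultdict

# Hypothetical (product, timely response?) pairs; real rows would come from the data set.
rows = [
    ("Mortgage", "Yes"), ("Mortgage", "Yes"), ("Mortgage", "Yes"), ("Mortgage", "No"),
    ("Debt collection", "Yes"), ("Debt collection", "No"), ("Debt collection", "No"),
    ("Bank account or service", "Yes"), ("Bank account or service", "Yes"),
]

def untimely_rate_by_product(pairs):
    """Return {product: (total complaints, fraction without a timely response)}."""
    tally = defaultdict(lambda: [0, 0])  # product -> [total, untimely]
    for product, timely in pairs:
        tally[product][0] += 1
        if timely != "Yes":
            tally[product][1] += 1
    return {p: (total, untimely / total) for p, (total, untimely) in tally.items()}

summary = untimely_rate_by_product(rows)
# Print products from most to fewest complaints.
for product, (total, rate) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
    print(f"{product:<25} n={total:<3} untimely={rate:.0%}")
```

Reporting the rate next to the raw count matters here, because a product with few complaints can show a high untimely rate from only a handful of cases.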

I then turned my focus to complaints concerning Wells Fargo, specifically how events in the news corresponded with the complaint frequency. I simply marked the controversial events listed on Wikipedia for Wells Fargo. One of these is the revelation in September 2016 that 2 million fake bank accounts were opened to boost numbers.

 

The vertical red lines represent various controversies that afflicted Wells Fargo. The second from the right corresponds to the revelation in September 2016 that 2 million fake bank accounts were opened to boost numbers. Each data point is the mean complaint count for the 7 day week.
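The weekly averaging described in the caption can be sketched as follows. This is a Python illustration, not the original R code. It assumes Monday-to-Sunday weeks and averages only over the days present in the data, both of which are guesses about how the plot was built. The daily counts are invented.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_mean_counts(daily_counts):
    """Average daily complaint counts within each Monday-to-Sunday week.

    daily_counts maps a date to that day's complaint count. The result maps
    each week's Monday to the mean count over the days present in that week.
    """
    buckets = defaultdict(list)
    for day, count in daily_counts.items():
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        buckets[week_start].append(count)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Hypothetical daily counts: 30 per day in the first week, 50 per day in the second.
start = date(2016, 9, 5)  # a Monday
daily = {start + timedelta(days=i): 30 + (20 if i >= 7 else 0) for i in range(14)}
weekly = weekly_mean_counts(daily)
for week_start, mean_count in weekly.items():
    print(week_start.isoformat(), mean_count)
```

Averaging by week smooths out the weekday pattern, so a jump between consecutive points reflects a change in overall volume rather than a weekend dip.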

 

Although there is some wildly interesting movement in this graph, establishing any real correlation would require further statistical analysis and research into the types of complaints and the subject of each controversy for the time before and after the event. There does seem to be quite a large spike after the September 2016 revelation, which suggests that the news gave people motivation to complain (or perhaps the confidence to speak up). At any rate, I wondered: what was the breakdown of complaints by product for Wells Fargo, both over the entire data set and for this specific time period?

For this specific time period, notice the huge share of untimely responses for the Bank Account or Service product: 48%, to be exact.

 

In the complaint counts broken down by product for Wells Fargo over the entire data set, notice the larger proportion of untimely responses in the Bank Account or Service category.

 

A quick by-the-numbers breakdown to help elucidate the disparity:

2186/(82778+2186) ~ 2.6% of all bank account complaints don't have a timely response

1495/(1495+11662) ~ 11.4% of Wells Fargo's bank account complaints are untimely

11662/82778 ~ 14.1% of bank account complaints with a timely response are Wells Fargo's

50/750 (in thousands) ~ 7% of total complaints are for Wells Fargo (2nd among all companies, behind Bank of America and ahead of Equifax)

662/(662+703) ~ 48% of bank account complaints for Wells Fargo in the specific time window of interest were untimely

662/2186 ~ 30% of all untimely responses for bank accounts come from this window
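The ratios above are easy to recompute from the quoted counts. Here is a short Python check. The variable names are mine, and the extra ratio wf_share_of_all (which counts untimely complaints in both numerator and denominator) is an addition for comparison, not a figure from the original list.

```python
# Counts quoted in the post for "Bank account or service" complaints.
all_timely, all_untimely = 82778, 2186        # every company, full data set
wf_timely, wf_untimely = 11662, 1495          # Wells Fargo, full data set
window_timely, window_untimely = 703, 662     # Wells Fargo, window of interest

def share(part, whole):
    """Return part / whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

overall_untimely = share(all_untimely, all_timely + all_untimely)
wf_untimely_rate = share(wf_untimely, wf_timely + wf_untimely)
wf_share_of_timely = share(wf_timely, all_timely)
wf_share_of_all = share(wf_timely + wf_untimely, all_timely + all_untimely)
window_untimely_rate = share(window_untimely, window_timely + window_untimely)
window_share_of_untimely = share(window_untimely, all_untimely)

for name, value in [
    ("untimely, all companies", overall_untimely),
    ("untimely, Wells Fargo", wf_untimely_rate),
    ("Wells Fargo share of timely", wf_share_of_timely),
    ("Wells Fargo share of all", wf_share_of_all),
    ("untimely, Wells Fargo in window", window_untimely_rate),
    ("window share of all untimely", window_share_of_untimely),
]:
    print(f"{name}: {value}%")
```

To one decimal place these come out to 2.6, 11.4, 14.1, 15.5, 48.5, and 30.3 percent. The key contrast is the Wells Fargo untimely rate inside the window (about 48%) against its rate over the whole data set (about 11%) and the rate across all companies (under 3%).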

In summation, there was a very strong uptick in Bank Account or Service complaints against Wells Fargo over this time period, and nearly half of those complaints did not receive timely responses. This slow response was extremely abnormal both for the entire data set and for Wells Fargo in general. The uptick immediately followed the headline news of Wells Fargo's illegal activity. Furthermore, there has been a near-linear increase in complaints since the inception of the CFPB, driven primarily by the increased number of web-entered complaints.

As for this data set, there is so much more to be extracted from it. I encourage you to play around with my web app at mesnaround.shinyapps.io/consumer_complaints.

Thank you very much for reading!

About Author

Mark Schott

Mark is originally from outside Detroit, MI. For college, he first attended the University of California at Santa Cruz before transferring to Wayne State University in Detroit where he graduated Cum Laude in General Physics. While an undergraduate,...