Data Analysis of Consumer Complaint Data from the CFPB

Posted on May 1, 2017
The skills the author demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.


The Consumer Financial Protection Bureau (CFPB) was created by the Dodd-Frank Act, a legislative response to the 2008 financial crisis. Unlike most other government agencies, the CFPB is funded directly by the Federal Reserve, and its head honcho (currently Richard Cordray) is appointed by the POTUS. This has allowed the CFPB to carry out its duties as a financial sector watchdog without going begging to Congress every fiscal year, which of course has made it controversial to some powerful people.

Its activities include levying fines, suing companies, providing financial advice to the public, and handling consumer complaints. The CFPB collects these individual complaints, performs some quick validation, and passes each one along to the company concerned so that it has an opportunity to respond. The company is given 15 days to respond, and its response is added to the data. Since 2011, roughly 750,000 complaints have been collected and made publicly available, and the data set is updated often.

Politics aside, as a data scientist I support open data endeavors because they give me something to explore. So I decided to make this data set the focal point of my RShiny project and a brief analysis. The data consists of 18 variables, such as Timestamps, Location (Zip Code, State), Company, Company Response, Product, Issue, Submission Method, and others.

Shiny App

To help visualize this data set, I created an interactive RShiny application, which can be viewed here. There you can find features such as a choropleth (a fancy map), time series counts, weekday and month complaint frequencies, custom bar charts and mosaic plots, and a pretty useless word cloud (I still enjoyed making it and want to add some sentiment analysis to it).

Please explore this RShiny app and see if you can find some interesting or curious things.

As for my own analysis of the data set, I really only scratched the surface. First, you can see the number of complaints over time, colored by submission method, with the web becoming an increasingly preferred medium. The CFPB is reportedly putting a lot of effort into improving its technology, and perhaps this is a simple indication of that effort's success.
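The counts behind a plot like this are a simple group-by. Here is a minimal sketch in Python (the app itself is written in R), assuming each complaint has been reduced to a (date received, submission channel) pair; the function names and toy records are hypothetical, and "Web" stands in for whatever label the data uses for web submissions.

```python
from collections import Counter
from datetime import date


def monthly_counts_by_channel(records):
    """Count complaints per (year, month, channel).

    `records` is an iterable of (date_received, channel) pairs, a
    hypothetical simplified view of the complaint table.
    """
    counts = Counter()
    for received, channel in records:
        counts[(received.year, received.month, channel)] += 1
    return counts


def channel_share(counts, year, month, channel="Web"):
    """Fraction of one month's complaints submitted through `channel`."""
    total = sum(n for (y, m, _), n in counts.items() if (y, m) == (year, month))
    # Counter returns 0 for a missing key, so absent channels give a 0 share.
    return counts[(year, month, channel)] / total if total else 0.0


# Toy records for illustration only (not real CFPB data).
toy = [
    (date(2016, 1, 5), "Web"),
    (date(2016, 1, 9), "Phone"),
    (date(2016, 1, 20), "Web"),
    (date(2016, 2, 1), "Referral"),
]
counts = monthly_counts_by_channel(toy)
print(channel_share(counts, 2016, 1))  # 2 of the 3 January complaints came via the web
```

Tracking `channel_share` month by month is one way to check whether the web's share is actually growing rather than just its raw count.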




Next, I wanted to see how complaint frequency varied by day of the week and by month, to check for any trends. To the eye, there do seem to be more complaints in the middle of the week and in the first quarter. I ran Bartlett's test for homogeneity of variance across the weekday groups and across the month groups, but it indicated that the variances were unequal (a standard one-way ANOVA assumes they are equal), so I have not pursued it further. If the first-quarter rise is a real trend, though, why would complaints spike then?

Complaint Frequency by Day

Complaint Frequency by Month
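For reference, Bartlett's statistic compares the pooled variance with the individual group variances. Below is a minimal standard-library sketch; the grouping (for example, daily complaint counts split by weekday) is an assumption about how the test would be set up, and in practice `bartlett.test` in R or `scipy.stats.bartlett` in Python computes both the statistic and its p-value.

```python
import math
from statistics import variance


def bartlett_statistic(*groups):
    """Bartlett's test statistic for equal variances across k groups.

    Each group is a sequence of observations (e.g. daily complaint counts
    for one weekday). Each group needs at least two values and a non-zero
    sample variance.
    """
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances (n - 1 denominator)
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(s2) for n, s2 in zip(sizes, variances)
    )
    # Small-sample correction factor from Bartlett's original formulation.
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction
```

Under the null hypothesis of equal variances, the statistic is approximately chi-square distributed with k - 1 degrees of freedom. With seven weekday groups, a value above about 12.59 (the 5% critical value for 6 degrees of freedom) rejects equal variances; for twelve month groups the corresponding cutoff is about 19.68.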

Looking at the breakdown of complaints by Product Type, one can see that the largest share of complaints relates to Mortgages. In a distant second place is Debt Collection, and close behind it is Credit Reporting. Furthermore, the vast majority of complaints receive a timely response from the company. A timely response just means that the company acknowledged the complaint in time and picked one of a few options to characterize its stance, such as agree, disagree, or needs further review.

At the very least, companies do seem to take these complaints seriously. Notice the disproportionate number of untimely responses in the Debt Collection category. Also, for now, I'll point out the fairly small fraction of untimely responses in the Bank Account or Service category. More on that later.

All the complaints in the data broken down by Product Type
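The untimely fractions in that chart reduce to a per-product ratio. A minimal sketch, assuming each complaint is reduced to a (product, timely) pair where `timely` is a boolean; the function name and toy rows are hypothetical.

```python
from collections import defaultdict


def untimely_fraction_by_product(rows):
    """Return {product: fraction of complaints without a timely response}.

    `rows` is an iterable of (product, timely) pairs, where `timely` is a
    bool (a hypothetical simplification of a Yes/No timeliness column).
    """
    totals = defaultdict(int)
    untimely = defaultdict(int)
    for product, timely in rows:
        totals[product] += 1
        if not timely:
            untimely[product] += 1
    return {product: untimely[product] / totals[product] for product in totals}


# Toy rows for illustration only (not real CFPB data).
toy_rows = [
    ("Mortgage", True),
    ("Mortgage", True),
    ("Debt collection", False),
    ("Debt collection", True),
]
print(untimely_fraction_by_product(toy_rows))
```

Reporting a fraction rather than a raw count keeps large categories such as Mortgages from visually swamping the smaller ones.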

I then turned my focus to complaints concerning Wells Fargo, specifically how events in the news corresponded with complaint frequency. I simply marked the controversial events listed on Wikipedia for Wells Fargo. One of these is the recent revelation, in September 2016, that 2 million fake bank accounts had been opened to boost sales numbers.

The vertical red lines represent various controversies that afflicted Wells Fargo. The second from the right corresponds to the revelation in September 2016 that 2 million fake bank accounts had been opened to boost numbers. Each data point is the mean daily complaint count for a 7-day week.
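One reading of "mean complaint count for a 7-day week" is the average daily count over consecutive, non-overlapping weeks. A minimal sketch under that assumption (a centered rolling average would be an equally reasonable choice); the function name is hypothetical, and days without complaints are counted as zero.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_mean_daily_counts(dates, start):
    """Mean daily complaint count for consecutive 7-day windows.

    `dates` holds the received date of each complaint; `start` is the first
    day of the first window. Complaints dated before `start` are ignored,
    and days with no complaints contribute zero to the mean.
    Returns a list of (window_start, mean_daily_count) pairs.
    """
    if not dates:
        return []
    daily = Counter(dates)
    last = max(dates)
    result = []
    window_start = start
    while window_start <= last:
        week_total = sum(daily[window_start + timedelta(days=i)] for i in range(7))
        result.append((window_start, week_total / 7))
        window_start += timedelta(days=7)
    return result


# Toy example (not real data): 14 complaints on one day, 7 on another.
toy_dates = [date(2016, 9, 5)] * 14 + [date(2016, 9, 13)] * 7
print(weekly_mean_daily_counts(toy_dates, start=date(2016, 9, 5)))  # two weeks: 2.0, then 1.0
```

Averaging over whole weeks smooths out the weekday pattern discussed earlier, so spikes around the marked events are easier to see.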

Although this graph shows some wildly interesting movement, establishing any real correlation would require further statistical analysis, plus research into the types of complaints and the subject of each controversy for the periods before and after the event. There does seem to be quite a large spike after the September 2016 revelation, which suggests that the news gave people the motivation to complain (or perhaps the confidence to speak up). At any rate, I wondered what the breakdown of complaints by product looked like for Wells Fargo, both over the entire data set and for this specific time period.

Notice the huge disparity between timely and untimely responses for the Bank Account or Service product during the time window of interest: 48% untimely, to be exact.

In the complaint counts by product for Wells Fargo over the entire data set, notice the larger proportion of untimely responses in the Bank Account or Service category.

A quick look at the numbers helps elucidate the disparity:

2186/(82778+2186) ~ 2.6% of bank account complaints (all companies) did not receive a timely response

1495/(1495+11662) ~ 11.4% of Wells Fargo's bank account complaints were untimely

11662/82778 ~ 14.1% of timely-answered bank account complaints are for Wells Fargo

50,000/750,000 ~ 7% of all complaints are for Wells Fargo (2nd among all companies, behind Bank of America and ahead of Equifax)

662/(662+703) ~ 48% of Wells Fargo's bank account complaints in the time window of interest were untimely

662/2186 ~ 30% of all untimely responses to bank account complaints come from this window
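These ratios can be reproduced directly from the raw counts quoted above. A minimal Python sketch; the variable names are my own, and the timely/untimely reading of each count is inferred from the formulas rather than taken from the data set's documentation.

```python
# Raw counts quoted in the post for the Bank Account or Service product.
# The timely/untimely labels are inferred from the post's own formulas.
ALL_TIMELY, ALL_UNTIMELY = 82778, 2186   # all companies, entire data set
WF_TIMELY, WF_UNTIMELY = 11662, 1495     # Wells Fargo, entire data set
WIN_TIMELY, WIN_UNTIMELY = 703, 662      # Wells Fargo, time window of interest


def pct(part, whole):
    """Return part / whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


ratios = {
    "untimely, all companies": pct(ALL_UNTIMELY, ALL_TIMELY + ALL_UNTIMELY),
    "untimely, Wells Fargo": pct(WF_UNTIMELY, WF_TIMELY + WF_UNTIMELY),
    "Wells Fargo share of timely complaints": pct(WF_TIMELY, ALL_TIMELY),
    "untimely, Wells Fargo, window": pct(WIN_UNTIMELY, WIN_TIMELY + WIN_UNTIMELY),
    "window share of all untimely": pct(WIN_UNTIMELY, ALL_UNTIMELY),
}

for label, value in ratios.items():
    print(f"{label}: {value}%")
```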


In summary, this shows a very strong uptick in Bank Account or Service complaints against Wells Fargo over this time period, and many of those complaints did not receive timely responses. This slow response rate was highly abnormal, both for the data set as a whole and for Wells Fargo in general. And this uptick was immediately preceded by the headline news of Wells Fargo's illegal activity. Furthermore, complaints have increased near-linearly since the inception of the CFPB, driven primarily by the growing number of complaints submitted via the web.
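Whether that growth is "near linear" can be quantified with an ordinary least-squares line through the monthly totals, together with its R-squared. A minimal sketch, assuming the totals are already in a list ordered by month; the helper name is hypothetical and the example numbers are toy values.

```python
def least_squares_line(ys):
    """Fit y = intercept + slope * t for t = 0, 1, ..., len(ys) - 1.

    Returns (slope, intercept, r_squared). A perfectly flat series is
    treated as a perfect fit (r_squared = 1.0).
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in enumerate(ys))
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Toy monthly totals for illustration only (not real CFPB data).
print(least_squares_line([1, 3, 2, 4]))
```

An R-squared close to 1 on the real monthly totals would support the "near linear" description; a clearly curved residual pattern would argue against it.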

There is so much more to be extracted from this data set. I encourage you to play around with my web app at

Thank you very much for reading!

About Author

Mark Schott

Mark is originally from outside Detroit, MI. For college, he first attended the University of California at Santa Cruz before transferring to Wayne State University in Detroit where he graduated Cum Laude in General Physics. While an undergraduate,...