Data Analysis of Consumer Complaint data from the CFPB

Mark Schott

Posted on May 1, 2017

The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

The Consumer Financial Protection Bureau (CFPB) was created as a result of the Dodd-Frank bill, a legislative response to the 2008 financial crisis. Unlike most other government agencies, the CFPB is funded directly by the Federal Reserve, and its head honcho, currently Richard Cordray, is appointed by the President. This has allowed the CFPB to carry out its duties as a financial sector watchdog without going begging to Congress every fiscal year, which of course has made it controversial to some powerful people.

Its activities include levying fines, suing various companies, providing financial advice to the public, and handling complaints from consumers. The bureau amasses these individual complaints, does some quick validation, and passes each one along to the corresponding company so that it has an opportunity to respond. The company is given 15 days to respond to the claims made in the complaint, and its response is added to the data. Since 2011, roughly 750,000 complaints have been collected and made publicly available. The data set is updated often.

Despite the politics, as a data scientist I support open data endeavors because they give me something to explore. So I picked this data set as the focal point of my RShiny project and brief analysis. The data consists of 18 variables, including Timestamps, Location (Zip Code, State), Company, Company Response, Product, Issue, and Submission Method.

Shiny App

To help visualize this data set, I created an interactive RShiny application, which can be viewed at mesnaround.shinyapps.io/consumer_complaints. There you can find features such as a choropleth (a fancy map), time series counts, weekday and month complaint frequencies, custom bar charts and mosaic plots, and a pretty useless word cloud (I still enjoyed making it and want to add some sentiment analysis to it).

Please explore this RShiny app and see if you can find some interesting or curious things.

As for my own analysis of the data set, I really only scratched the surface. First of all you can see the number of complaints over time colored by the submission type with web being an increasingly preferred medium. The CFPB is purportedly putting a lot of effort into improving their technology and perhaps this is a simple indication of its success.

Data

Next I wanted to see how the complaint frequency varied by day of the week and by month, to see if there were any trends there. To the eye it does seem that there are more complaints in the middle of the week and in the first quarter. I ran a Bartlett test on each grouping, but the variances were deemed to be unequal, and I have not pursued it further. If it is indeed a real trend, though, why would complaints spike during the first quarter of the year?
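For anyone who wants to reproduce the weekday and month tallies outside the app, here is a minimal sketch. It is written in Python rather than the R behind the Shiny app, the function name is hypothetical, and the sample dates are made up for illustration (they are not CFPB data).

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekday_and_month_counts(dates):
    """Tally complaints by weekday name and by calendar month number."""
    by_weekday = Counter(WEEKDAYS[d.weekday()] for d in dates)
    by_month = Counter(d.month for d in dates)
    return by_weekday, by_month

# Made-up "date received" values, for illustration only.
sample = [date(2017, 1, 3), date(2017, 1, 4), date(2017, 1, 4),
          date(2017, 2, 8), date(2017, 7, 1)]
by_weekday, by_month = weekday_and_month_counts(sample)

for day in WEEKDAYS:
    print(day, by_weekday[day])
```

Plotting these tallies as bar charts gives the kind of weekday and month frequency views described above.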

Looking at the breakdown of the complaints by the Product Type, one can see that the majority of the complaints are related to Mortgages. In a distant second place is Debt Collection and close to that is Credit Reporting. Furthermore, the vast majority of complaints receive a timely response from the company. A timely response just means that the complaint was acknowledged and the company picks one of a few choices to characterize their stance such as agree, disagree, or needs further review.

At the very least, the companies do seem to take these complaints seriously. Notice the disproportionate number of untimely responses for the Debt Collection category. For now I'll also point out the fairly small untimely response fraction for the Bank Account or Service category. More on that later.
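The untimely fraction per product can be computed with a simple tally. This is a sketch in Python, not the R code behind the app; the function name, the record layout of `(product, timely)` pairs, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict

def untimely_fraction_by_product(records):
    """Return {product: share of complaints without a timely response}.

    `records` is an iterable of (product, timely) pairs, where `timely`
    is True when the company responded in time.
    """
    totals = defaultdict(int)
    untimely = defaultdict(int)
    for product, timely in records:
        totals[product] += 1
        if not timely:
            untimely[product] += 1
    return {product: untimely[product] / totals[product] for product in totals}

# Made-up records, for illustration only (not CFPB data).
sample_records = [
    ("Mortgage", True), ("Mortgage", True), ("Mortgage", True), ("Mortgage", False),
    ("Debt collection", True), ("Debt collection", False),
    ("Bank account or service", True), ("Bank account or service", True),
]
fractions = untimely_fraction_by_product(sample_records)

for product, fraction in sorted(fractions.items()):
    print(f"{product}: {fraction:.0%} untimely")
```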

All the complaints in the data broken down by Product Type

I then turned my focus to complaints concerning Wells Fargo, specifically how events in the news corresponded with the complaint frequency. I simply marked the controversial events listed on Wikipedia for Wells Fargo. One of these is the revelation in September 2016 that 2 million fake bank accounts were opened to boost numbers.

The vertical red lines represent various controversies that afflicted Wells Fargo. The second from the right corresponds to the revelation in September 2016 that 2 million fake bank accounts were opened to boost numbers. Each data point is the mean complaint count for the 7 day week.
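A weekly mean like the one plotted can be computed by bucketing complaint dates into weeks and dividing each weekly total by 7. The sketch below is Python, not the R used for the app; grouping by ISO calendar week is an assumption (the post does not say how weeks were delimited), and the function name and sample dates are made up.

```python
from collections import Counter, defaultdict
from datetime import date

def weekly_mean_counts(dates):
    """Return {(iso_year, iso_week): mean complaints per day} for each week seen."""
    daily = Counter(dates)
    weekly_total = defaultdict(int)
    for day, n in daily.items():
        iso = day.isocalendar()
        weekly_total[(iso[0], iso[1])] += n
    # Divide by 7 so that days with zero complaints still count toward the mean.
    return {week: total / 7 for week, total in weekly_total.items()}

# Made-up complaint dates spanning two consecutive weeks, for illustration only.
sample_dates = ([date(2016, 9, 5)] * 3 + [date(2016, 9, 7)] * 4
                + [date(2016, 9, 12)] * 14)
weekly = weekly_mean_counts(sample_dates)

for week in sorted(weekly):
    print(week, weekly[week])
```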

Although there is some wildly interesting movement in this graph, establishing any real correlation would require further statistical analysis, plus research into the types of complaints and the subject of each controversy before and after the event. There does seem to be quite a large spike after the September 2016 revelation, which suggests that the news gave people motivation to complain (or perhaps the confidence to speak up). At any rate, I wondered: what is the breakdown of complaints by Product for Wells Fargo, over the entire data set and for this specific time period?

Notice the huge disproportion between timely and untimely responses for the Bank Account or Service product in this time period: 48% untimely, to be exact.

In the complaint counts broken down by product for Wells Fargo over the entire data set, notice the larger proportion of untimely responses in the Bank Account or Service category.

A quick run through the numbers to help elucidate the disparity:

2186/(82778+2186) ~ 2.6% of all bank account complaints don't have a timely response

1495/(1495+11662) ~ 11.4% of Wells Fargo's bank account complaints are untimely

11662/82778 ~ 14.1% of timely bank account complaints are Wells Fargo's

50/750 ~ 7% of total complaints (counts in thousands) are for Wells Fargo (2nd among all companies, behind Bank of America and ahead of Equifax)

662/(662+703) ~ 48% of Wells Fargo's bank account complaints in the specific time window of interest were untimely

662/2186 ~ 30% of all untimely responses for bank accounts come from this window.
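The ratios above are easy to recompute. This short Python sketch uses only the counts quoted in this post; the helper name `pct` and the variable names are made up for illustration.

```python
def pct(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Bank Account or Service counts quoted in the post.
ALL_TIMELY, ALL_UNTIMELY = 82778, 2186        # every company
WF_TIMELY, WF_UNTIMELY = 11662, 1495          # Wells Fargo, whole data set
WINDOW_TIMELY, WINDOW_UNTIMELY = 703, 662     # Wells Fargo, window of interest

untimely_all = pct(ALL_UNTIMELY, ALL_TIMELY + ALL_UNTIMELY)
untimely_wf = pct(WF_UNTIMELY, WF_TIMELY + WF_UNTIMELY)
untimely_window = pct(WINDOW_UNTIMELY, WINDOW_TIMELY + WINDOW_UNTIMELY)
window_share_of_untimely = pct(WINDOW_UNTIMELY, ALL_UNTIMELY)

for label, value in [
    ("Untimely, all companies", untimely_all),
    ("Untimely, Wells Fargo overall", untimely_wf),
    ("Untimely, Wells Fargo in window", untimely_window),
    ("Window share of all untimely responses", window_share_of_untimely),
]:
    print(f"{label}: {value}%")
```

Seen side by side, the untimely rate in the window is many times higher than the rate for Wells Fargo overall or for the data set as a whole.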

Conclusion

In summation, the data shows a very strong uptick in Bank Account or Service complaints against Wells Fargo over this time period, and many of those complaints did not receive timely responses. Such a slow response was extremely abnormal, both for the entire data set and for Wells Fargo in general. The uptick was immediately preceded by the headline news of Wells Fargo's illegal activity. Furthermore, there has been a near linear increase in complaints since the inception of the CFPB, driven primarily by the growing number of complaints submitted via the web.

As for this data set, there is so much more to be extracted from it. I encourage you to play around with my web app at mesnaround.shinyapps.io/consumer_complaints.

Thank you very much for reading!

About Author

Mark Schott

Mark is originally from outside Detroit, MI. For college, he first attended the University of California at Santa Cruz before transferring to Wayne State University in Detroit where he graduated Cum Laude in General Physics. While an undergraduate,...
