Data Analysis of Consumer Complaint Data from the CFPB
The skills the author demonstrated here can be learned by taking the Data Science with Machine Learning bootcamp at NYC Data Science Academy.
The Consumer Financial Protection Bureau (CFPB) was created as a result of the Dodd-Frank bill, a legislative response to the 2008 financial crisis. Unlike other government agencies, the CFPB is funded directly by the Federal Reserve, and its head honcho, currently Richard Cordray, is appointed by the President. This has allowed the CFPB to carry out its duties as financial sector watchdog without going begging to Congress every fiscal year, which of course has made it controversial to some powerful people.
The bureau's activities include levying fines, suing companies, providing financial advice to the public, and handling complaints from consumers. It collects these individual complaints, does some quick validation, and passes them along to the corresponding company so that the company has an opportunity to respond. The company is given 15 days to respond to the claims made in the complaint, after which the complaint is added to the data. Since 2011, roughly 750,000 complaints have been collected and made publicly available. The data set is updated often.
Despite the politics, as a data scientist I support open data endeavors because they give me something to explore. I therefore picked this data set as the focal point of my RShiny project and brief analysis. The data consists of 18 variables, including timestamps, location (ZIP code, state), company, company response, product, issue, and submission method.
To help visualize this data set, I created an interactive RShiny application, which can be viewed here. There you can find features such as a choropleth (a fancy map), time series counts, weekday and month complaint frequencies, custom bar charts and mosaic plots, and a pretty useless word cloud (I still enjoyed making it and want to add some sentiment analysis to it).
Please explore this RShiny app and see if you can find some interesting or curious things.
As for my own analysis of the data set, I really only scratched the surface. First of all, you can see the number of complaints over time, colored by submission type, with web being an increasingly preferred medium. The CFPB is purportedly putting a lot of effort into improving its technology, and perhaps this is a simple indication of that effort's success.
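The counting behind a plot like this can be sketched in a few lines. This is not the author's code (the original project was built in R and Shiny); it is a minimal Python illustration. The rows are made up, and the field names "Date received" and "Submitted via" as well as the MM/DD/YYYY date format are assumptions about the public CSV export.

```python
from collections import Counter
from datetime import datetime

# Made-up rows for illustration only; the real export has one row per complaint.
# Field names and the date format are assumptions about the CSV export.
sample_rows = [
    {"Date received": "03/14/2012", "Submitted via": "Phone"},
    {"Date received": "03/20/2012", "Submitted via": "Web"},
    {"Date received": "07/02/2012", "Submitted via": "Referral"},
    {"Date received": "01/09/2016", "Submitted via": "Web"},
    {"Date received": "01/21/2016", "Submitted via": "Web"},
    {"Date received": "02/03/2016", "Submitted via": "Phone"},
]


def counts_by_month_and_channel(rows):
    """Count complaints per (year-month, submission channel) pair."""
    counts = Counter()
    for row in rows:
        received = datetime.strptime(row["Date received"], "%m/%d/%Y")
        counts[(received.strftime("%Y-%m"), row["Submitted via"])] += 1
    return counts


def web_share_by_year(rows):
    """Fraction of each year's complaints that arrived via the web."""
    total, web = Counter(), Counter()
    for row in rows:
        year = datetime.strptime(row["Date received"], "%m/%d/%Y").year
        total[year] += 1
        if row["Submitted via"] == "Web":
            web[year] += 1
    return {year: web[year] / total[year] for year in sorted(total)}


monthly_counts = counts_by_month_and_channel(sample_rows)
web_share = web_share_by_year(sample_rows)
```

A rising web share from year to year would be the numeric counterpart of the trend visible in the plot.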
Next I wanted to see how complaint frequency varied by day of the week and by month, to see if there were any trends there. To the eye it does seem that there are more complaints in the middle of the week and in the first quarter. I ran a Bartlett test, which checks whether groups share the same variance, on each of these groupings, and the variances were deemed to be unequal. I have not pursued it further. If it is indeed a real trend, though, why would complaints spike during the first quarter of the year?
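For readers unfamiliar with Bartlett's test, here is a minimal sketch of the test statistic using only the Python standard library. It is not the author's code (R provides this as bartlett.test), and the daily counts below are invented purely to show the mechanics. Under the null hypothesis of equal variances, the statistic approximately follows a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Return Bartlett's test statistic and its degrees of freedom (k - 1)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]  # sample variances (n - 1 denominator)
    n_total = sum(sizes)
    # Pooled variance: weighted average of the group variances.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1


# Hypothetical daily complaint counts for three weekdays (invented numbers).
similar_spread = [[10, 12, 11, 13], [20, 22, 21, 23], [30, 32, 31, 33]]
different_spread = [[10, 12, 11, 13], [5, 40, 18, 60], [30, 32, 31, 33]]

stat_similar, df_similar = bartlett_statistic(similar_spread)
stat_different, df_different = bartlett_statistic(different_spread)

# With 2 degrees of freedom, the chi-square critical value at the 5% level is
# about 5.991; a statistic above it suggests the variances are not equal.
CRITICAL_5PCT_DF2 = 5.991
print(stat_similar > CRITICAL_5PCT_DF2, stat_different > CRITICAL_5PCT_DF2)
```

In practice a library routine such as SciPy's scipy.stats.bartlett also returns the p-value, which this sketch leaves out.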
Looking at the breakdown of the complaints by product type, one can see that the majority of the complaints are related to Mortgages. In a distant second place is Debt Collection, and close to that is Credit Reporting. Furthermore, the vast majority of complaints receive a timely response from the company. A timely response just means that the company acknowledged the complaint and picked one of a few choices to characterize its stance, such as agree, disagree, or needs further review.
At the very least, companies do seem to take these complaints seriously. Notice the disproportionate number of untimely responses for the Debt Collection category. For now I'll also point out the fairly small untimely response fraction for the Bank Account or Service category. More on that later.
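The untimely fraction per product is a simple grouped ratio. The sketch below is a Python illustration rather than the author's R code; the rows are made up, and the field names "Product" and "Timely response?" with values "Yes" and "No" are assumptions about the CSV export.

```python
from collections import defaultdict

# Made-up rows for illustration only; field names and values are assumptions.
sample_rows = [
    {"Product": "Mortgage", "Timely response?": "Yes"},
    {"Product": "Mortgage", "Timely response?": "Yes"},
    {"Product": "Mortgage", "Timely response?": "Yes"},
    {"Product": "Mortgage", "Timely response?": "No"},
    {"Product": "Debt collection", "Timely response?": "Yes"},
    {"Product": "Debt collection", "Timely response?": "No"},
]


def untimely_fraction_by_product(rows):
    """Share of each product's complaints that did not get a timely response."""
    totals = defaultdict(int)
    untimely = defaultdict(int)
    for row in rows:
        totals[row["Product"]] += 1
        if row["Timely response?"] == "No":
            untimely[row["Product"]] += 1
    return {product: untimely[product] / totals[product] for product in totals}


fractions = untimely_fraction_by_product(sample_rows)
for product, fraction in sorted(fractions.items()):
    print(f"{product}: {fraction:.0%} untimely")
```

Comparing fractions rather than raw counts is what makes a small category stand out when its response behavior is unusual.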
I then turned my focus to complaints concerning Wells Fargo, specifically how events in the news corresponded with the complaint frequency. I simply marked the controversial events listed on Wikipedia for Wells Fargo. One of these is the revelation in September 2016 that 2 million bank accounts were opened without customers' authorization to boost sales numbers.
Although there is some wildly interesting movement in this graph, establishing any real correlation would require further statistical analysis and research into the types of complaints and the subject of each controversy in the periods before and after the event. There does seem to be quite a large spike after the September 2016 revelation, which suggests that the news gave people motivation to complain (or perhaps the confidence to speak up). At any rate, I wondered what the breakdown of complaints by product was for Wells Fargo, both over the entire data set and for this specific time period.
A quick by-the-numbers summary to help elucidate the disparity:
2186/(82778+2186) ~ 2.6% of all bank account complaints don't have a timely response
1495/(1495+11662) ~ 11.4% of Wells Fargo's bank account complaints are untimely
11662/82778 ~ 14.1% of bank account complaints with a timely response are Wells Fargo's
50/750 (in thousands) ~ 7% of total complaints are for Wells Fargo (2nd among all companies, behind Bank of America and ahead of Equifax)
662/(662+703) ~ 48% of bank account complaints for Wells Fargo in the specific time window of interest were untimely
662/2186 ~ 30% of all untimely responses for bank accounts come from this window.
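The ratios above can be recomputed with a few lines of Python. The counts are copied from the list; the variable names and labels are my own, and reading 82778, 11662, and 703 as the timely counts is an inference from how the fractions are written.

```python
# Counts copied from the list above; variable names are hypothetical labels.
all_timely, all_untimely = 82778, 2186      # bank account complaints, all companies
wf_timely, wf_untimely = 11662, 1495        # bank account complaints, Wells Fargo
window_timely, window_untimely = 703, 662   # Wells Fargo, window of interest


def pct(part, whole):
    """Return part / whole as a percentage."""
    return 100 * part / whole


ratios = {
    "untimely, all companies": pct(all_untimely, all_timely + all_untimely),
    "untimely, Wells Fargo": pct(wf_untimely, wf_timely + wf_untimely),
    "Wells Fargo share of timely responses": pct(wf_timely, all_timely),
    "untimely, Wells Fargo in window": pct(window_untimely, window_timely + window_untimely),
    "window share of all untimely": pct(window_untimely, all_untimely),
}

for label, value in ratios.items():
    print(f"{label}: {value:.1f}%")
```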
In summation, the numbers show a very strong uptick in Bank Account or Service complaints against Wells Fargo over this time period, and many of those complaints did not receive timely responses. This slow response was extremely abnormal both for the data set as a whole and for Wells Fargo in general, and the uptick was immediately preceded by the headline news of Wells Fargo's illegal activity. Furthermore, there has been a near-linear increase in complaints since the inception of the CFPB, driven primarily by the increased number of web-submitted complaints.
As for this data set, there is so much more to be extracted from it. I encourage you to play around with my web app at mesnaround.shinyapps.io/consumer_complaints.
Thank you very much for reading!