Data Study on financial products and services Complaints

Wansang Lim

Posted on Jan 31, 2016

The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Contributed by Wansang Lim. He is currently in the NYC Data Science Academy 12 week full time Data Science Bootcamp program taking place between January 11th to April 1st, 2016. This post is based on his first class project - R visualization (due on the 2th week of the program).

Introduction

Each week, data shows the government sent thousands of consumers’ complaints about financial products and services to companies for response. Those complaints are published after the company responds or after 15 days, whichever comes first. By adding their voice, consumers help improve the financial marketplace.

The narrative of consumer complaints is like "The insurance company is responsible for this charge, I was and presently am a Federal employee and medical /dental insurance is extended 31 days after separation from employment. I left my job the same month I visited this dental office, they are aware of the 31 day extension and refuse to bill the correct party for this charge, the dental insurance company. I 've regained my Federal employment status and hold the same insurance. They sent this account to collections in error. They need to bill the insurance company and remove the fraudulent charge off of all my credit reports, they have caused me a tremendous hardship. "

The web site for reporting complaints is https://www.consumerfinance.gov/complaintdatabase/

Data

The data was downloaded from https://catalog.data.gov/dataset/consumer-complaint-database. The data is quite huge. It has 0.5 million rows and 16 columns.The time period is 2011/December ~ 2015/December.

First, the number of reports are grouped by year to see if there is any relationships.

```{r}
dateData <- select(Consumer_Complaints, year, month)
groupByYear % summarise(., yearNum = n())
groupByYear

<pre><code>Table 1. Consumer complaints data group by Year.

year yearNum
2012 72523
2013 108272
2014 153164
2015 161703

The table 1 shows that consumer increases by year. Then I decide to divide by month to look at the trends and regress it.

```{R}
ByYearMonth % summarise(., total = n())
ByYearMonth$seq <- c(1:nrow(ByYearMonth))

regre <- lm(total ~ seq, data=ByYearMonth)
summary(regre)

Data Study on financial products and services Complaints

The total number of reports grouped by Year

Figure 1. Regress plots of total reported number of complaints by each month from 2011/Dec to 2012/Dec

Then, I grouped it by company. The number of all reported companies is 3, 4111. The 20 reported companies are 66% of all complaints.

````````````{r}
top20Company <- companyRank[1:20,]
top20Company$Company <- factor(top20Company$Company, levels = top20Company$Company)
top20Company$percentage <- top20Company$total/nrow(Consumer_complainYearMonth)*100

library(ggplot2)
barCompany <- ggplot(top20Company, aes(x=Company, y=percentage)) + geom_bar(stat="identity")
barCompany + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + ylab("%")
````````````````

Figure. The top 20 reported companies

The complaints was grouped by products.
````````{r}
byProduct % summarise(., total= n() ) %>% arrange(., desc(total))
byProduct$Product <- factor(byProduct$Product, levels = byProduct$Product)

byProduct$Percentage <- byProduct$total/nrow(Consumer_complainYearMonth)*100
barPro <- ggplot(byProduct, aes(x=Product, y=Percentage)) + geom_bar(stat="identity")
barPro + theme(axis.text.x = element_text(angle = 90, hjust = 1), panel.background = element_blank()) + ylab("%")
`````````

The total number of reports are grouped by top 10 company and product.

```{r}
CcTop10Company <- filter(Consumer_complainYearMonth, Company %in% top10Company$Company)
CcTop10Company$Company <- factor(CcTop10Company$Company, levels = top10Company$Company)
byCompanyProduct <- group_by(CcTop10Company, Company, Product)
byCompanyProductTotal <- summarise(byCompanyProduct, total = n())
byCompanyProductArrange <- arrange(byCompanyProductTotal, desc(total))

library(ggplot2)
library(lattice)
groupXYplot <- xyplot(byCompanyProductArrange$total ~ byCompanyProductArrange$Product | byCompanyProductArrange$Company, data=byCompanyProductArrange, xlab = list(label = "Product", cex=1), scales=list(x=list(rot=45)), groups = Product)
groupXYplot

<a href="https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2016/01/ComPro.jpg"><img src="https://nycdsa-blog-files.s3.us-east-2.amazonaws.com/2016/01/ComPro-300x214.jpg" alt="ComPro" width="300" height="214" class="alignnone size-medium wp-image-8998" /></a>

The total complaints are grouped by state.
```{r}
onlyState <- select(Consumer_complainYearMonth, State)
byState % summarise(., total = n())
byState <- byState[-1,]

onlyState <- select(Consumer_complainYearMonth, State)
byState % summarise(., total = n())
byState <- byState[-1,]

stateAbb <- read.csv("/media/wan/64GB/R_CLASS/Project/data/stateAbb.csv", stringsAsFactors=FALSE)
fullName <- c()
total <- c()
byStateFullname <- data.frame(fullName, total)
for (i in 1: nrow(byState)) {
posi <- match(byState$State[i], stateAbb$abbreviation)
if (!is.na(posi)) {
lastPosi <- length(fullName) + 1
fullName[lastPosi] <- stateAbb$state.name[posi]
total[lastPosi] <- byState$total[i]
}
}

byState <- data.frame(fullName, total)

library(maps)
library(ggplot2)
all_state <-map_data("state")
byState$region <- byState$fullName
Total <- merge(all_state, byState, by="region")
Total <- Total[Total$region!="district of columbia",]
p <- ggplot()
p <- p + geom_polygon(data=Total, aes(x=long, y=lat, group = group, fill=Total$total),colour="white"
) + scale_fill_continuous(low = "thistle2", high = "darkred", guide="colorbar")
p1 <- p + theme_bw() + labs(fill = "Total Reports"
,title = "Total Reports by State", x="", y="")
p1 + scale_y_continuous(breaks=c()) + scale_x_continuous(breaks=c()) + theme(panel.border = element_blank())
</code></pre>

Then total reports are analyzed by reporting methods like mail, interent.
The result is checked by time passing.

```{r}
byMediaYearMonth % summarise(., total = n())
byMediaYearMonth <- arrange(byMediaYearMonth, total)

byMediaYearMonth$month <- as.character(byMediaYearMonth$month)
for (i in 1: nrow(byMediaYearMonth)) {
if ( nchar(byMediaYearMonth$month[i], type = "chars", allowNA = FALSE, keepNA = NA) == 1) {
byMediaYearMonth$month[i] <- paste(0, byMediaYearMonth$month[i], sep = "" )
}
}

byMediaYearMonth$yymm <- paste(byMediaYearMonth$year, byMediaYearMonth$month, sep = "/")
byMediaYearMonth <- filter(byMediaYearMonth, yymm != "2015/12")

graph <- qplot(yymm, total, data = byMediaYearMonth, color = Submitted.via)
graph45 <- graph + theme(axis.text.x = element_text(angle = 45,size = 14, hjust = 1), panel.background = element_blank())

yymm <- byMediaYearMonth$yymm
yymm <- unique(yymm)
YearMonth <- data.frame(yymm, stringsAsFactors=FALSE)

YearMonth$yymm <- as.character(YearMonth$yymm)
ticks5 <- c()
lastRow <- nrow(YearMonth)
for (i in 1: lastRow) {
if (i %% 5 == 0) {
ticks5[i] <- YearMonth$yymm[i]
cat("yymm", YearMonth$yymm[i])
} else {
ticks5[i] <- ""
}
}

graph45five <- graph45 + scale_x_discrete(breaks = ticks5,labels =ticks5)
graph45five+ theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

About Author

Wansang Lim

I recently completed MS computer science degree in New York University(Manhattan NY) concentrating on machine learning and big data . Before it, I studied software development and android development. Also, I am Ph.D of Agriculture with a lot...

View all posts by Wansang Lim >

Cancel reply

You must be logged in to post a comment.

No comments found.

Data Study on financial products and services Complaints

The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Contributed by Wansang Lim. He is currently in the NYC Data Science Academy 12 week full time Data Science Bootcamp program taking place between January 11th to April 1st, 2016. This post is based on his first class project - R visualization (due on the 2th week of the program).

Introduction

Data

Figure 1. Regress plots of total reported number of complaints by each month from 2011/Dec to 2012/Dec

The total number of reports are grouped by top 10 company and product.

Then total reports are analyzed by reporting methods like mail, interent.
The result is checked by time passing.

About Author

Wansang Lim

Leave a Comment

Cancel reply

View Posts by Categories

Our Recent Popular Posts

View Posts by Tags

NYC Data Science Academy

Get detailed curriculum information about our
amazing bootcamp!

Offerings

About

SOCIAL MEDIA

Data Study on financial products and services Complaints

The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.

Contributed by Wansang Lim. He is currently in the NYC Data Science Academy 12 week full time Data Science Bootcamp program taking place between January 11th to April 1st, 2016. This post is based on his first class project - R visualization (due on the 2th week of the program).

Introduction

Data

Figure 1. Regress plots of total reported number of complaints by each month from 2011/Dec to 2012/Dec

The total number of reports are grouped by top 10 company and product.

Then total reports are analyzed by reporting methods like mail, interent. The result is checked by time passing.

About Author

Wansang Lim

Leave a Comment

Cancel reply

View Posts by Categories

Our Recent Popular Posts

View Posts by Tags

NYC Data Science Academy

Get detailed curriculum information about our amazing bootcamp!

Offerings

About

SOCIAL MEDIA

Then total reports are analyzed by reporting methods like mail, interent.
The result is checked by time passing.

Get detailed curriculum information about our
amazing bootcamp!