How to Help Shelter Animals
Contributed by Chuan Hong. Chuan is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program taking place between September 26th and December 23rd, 2016. This post is based on her class project, R Visualization.
Each year, approximately 7.6 million companion animals enter animal shelters nationwide (ASPCA). Of those, approximately 3.9 million are dogs and 3.4 million are cats. About 2.7 million shelter animals are adopted each year (1.4 million dogs and 1.3 million cats). Meanwhile, about 649,000 animals who enter shelters as strays are returned to their owners (542,000 dogs and 100,000 cats). Unlike these lucky cats and dogs that find families to take them home, many shelter animals face an uncertain future: an estimated 2.7 million cats and dogs are euthanized in the US every year. Given these differences in outcomes, we can analyze which factors make some cats and dogs more likely to be adopted.
Two months ago, Kaggle hosted a competition to predict the outcomes of shelter animals, in order to help shelters focus their energy on specific animals who need a little extra help finding a new home. The dataset came from the Austin Animal Center.
The dataset contains ten variables: "AnimalID", "Name", "DateTime", "AnimalType" (Dog/Cat), "SexuponOutcome" (Neutered Male/Spayed Female/Intact Male/Intact Female), "AgeuponOutcome", "Breed", "Color", "OutcomeType" (Return_to_owner/Adoption/Transfer/Euthanasia/Died), and "OutcomeSubtype" (Other/Foster/Offsite/Partner/Barn/SCRP/Suffering/etc.).
After a quick check of these variables, I decided not to include "Color" and "OutcomeSubtype" in this visualization project. There were more than 300 unique colors in the dataset, far too many to visualize one level at a time. Meanwhile, the Sankey plot below shows that "OutcomeSubtype" is simply a more detailed breakdown of "OutcomeType".
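Counting the distinct levels of each variable is a quick way to spot columns like "Color" that are too fragmented to plot level by level. Below is a minimal Python sketch (the original project was done in R, so this is not the author's code). It assumes the data is a CSV whose header uses the column names listed above; the four sample rows are made up purely for illustration, and count_levels is a hypothetical helper name.

```python
import csv
import io


def count_levels(rows, columns):
    """Return the number of distinct non-empty values in each column."""
    levels = {col: set() for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col, "")
            if value:  # skip blank cells, e.g. animals without a name
                levels[col].add(value)
    return {col: len(values) for col, values in levels.items()}


# Hypothetical extract; in practice you would open the Kaggle CSV file instead.
SAMPLE = """AnimalID,AnimalType,Color,OutcomeType
A1,Dog,Brown/White,Return_to_owner
A2,Cat,Cream Tabby,Euthanasia
A3,Dog,Blue/White,Adoption
A4,Dog,Brown/White,Transfer
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(count_levels(rows, ["AnimalType", "Color", "OutcomeType"]))
# {'AnimalType': 2, 'Color': 3, 'OutcomeType': 4}
```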
Exploratory Data Analysis (EDA)
In this project, I used EDA to investigate potential relationships between various factors and animal outcomes, especially adoption.
Does animal type matter? Cats vs. Dogs
First, let's look at how many cats and dogs are in this dataset and how their outcomes are distributed. The two graphs below show that adoption was a common outcome for both cats and dogs, but dogs were much more likely to be returned to their owners, while cats were transferred between shelters more often. Very few animals died or were euthanized overall.
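Behind a count graph and a percentage graph like these is a cross-tabulation of "AnimalType" against "OutcomeType", with the percentages normalized within each animal type so that groups of different sizes can be compared. A minimal Python sketch, using hypothetical (AnimalType, OutcomeType) pairs rather than the real data:

```python
from collections import Counter, defaultdict


def outcome_shares(records):
    """Map each animal type to {outcome: share of that type's outcomes}."""
    counts = defaultdict(Counter)
    for animal_type, outcome in records:
        counts[animal_type][outcome] += 1
    shares = {}
    for animal_type, counter in counts.items():
        total = sum(counter.values())
        shares[animal_type] = {o: n / total for o, n in counter.items()}
    return shares


# Hypothetical (AnimalType, OutcomeType) pairs for illustration only.
records = [
    ("Dog", "Adoption"), ("Dog", "Return_to_owner"), ("Dog", "Return_to_owner"),
    ("Dog", "Transfer"), ("Cat", "Adoption"), ("Cat", "Transfer"),
    ("Cat", "Transfer"), ("Cat", "Died"),
]
for animal_type, share in sorted(outcome_shares(records).items()):
    print(animal_type, {k: round(v, 2) for k, v in sorted(share.items())})
# Cat {'Adoption': 0.25, 'Died': 0.25, 'Transfer': 0.5}
# Dog {'Adoption': 0.25, 'Return_to_owner': 0.5, 'Transfer': 0.25}
```

The raw counts in the inner Counter objects correspond to the count graph; the normalized shares correspond to the percentage graph.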
Does name matter?
Quite a few cats and dogs in this dataset sadly don't have names, and I was curious whether having a name affected their fate. The graphs below indicate that the situation differs between cats and dogs: cats with names were more likely to be adopted, while for dogs the adoption percentage was similar whether or not they had a name.
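Deriving a "hasName" flag and comparing adoption rates per group can be sketched in Python as follows. This assumes an unnamed animal has an empty "Name" field in the CSV; the rows are hypothetical, and adoption_rate_by and has_name are illustrative helper names, not part of the original project.

```python
from collections import defaultdict


def adoption_rate_by(records, key):
    """Share of records with OutcomeType == 'Adoption' in each group key(record)."""
    totals = defaultdict(int)
    adopted = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        if rec["OutcomeType"] == "Adoption":
            adopted[group] += 1
    return {g: adopted[g] / totals[g] for g in totals}


def has_name(rec):
    # Assumption: an unnamed animal has an empty or whitespace-only Name field.
    return bool(rec["Name"].strip())


records = [  # hypothetical rows
    {"AnimalType": "Cat", "Name": "Max", "OutcomeType": "Adoption"},
    {"AnimalType": "Cat", "Name": "", "OutcomeType": "Transfer"},
    {"AnimalType": "Cat", "Name": "Luna", "OutcomeType": "Transfer"},
    {"AnimalType": "Dog", "Name": "", "OutcomeType": "Adoption"},
    {"AnimalType": "Dog", "Name": "Rex", "OutcomeType": "Adoption"},
]
rates = adoption_rate_by(records, key=lambda r: (r["AnimalType"], has_name(r)))
print(rates)
```

Because the grouping is passed in as a function, the same helper can compare adoption rates for any other derived variable, such as neutered versus intact animals.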
Does sex matter?
The "SexuponOutcome" variable (Neutered Male/Spayed Female/Intact Male/Intact Female) contains two kinds of information: whether the cat or dog was male or female, and whether it was neutered/spayed or intact. Since these are really two distinct features, I split this variable into two, "sex" and "isNeutered". The adoption counts and percentages appear similar for males and females among both cats and dogs.
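One way to perform this split, sketched in Python: each value is read as "<status> <sex>", with "Neutered" and "Spayed" mapped to isNeutered = True and "Intact" to False. Anything that does not fit that pattern is returned as missing; the data may contain an "Unknown" level or blank cells, which is an assumption worth checking against the actual file. split_sex is a hypothetical helper name.

```python
def split_sex(sex_upon_outcome):
    """Split e.g. 'Neutered Male' into (sex, isNeutered).

    Values that do not follow the '<status> <sex>' pattern, such as an
    assumed 'Unknown' level or an empty cell, become (None, None).
    """
    parts = (sex_upon_outcome or "").split()
    if len(parts) != 2:
        return None, None
    status, sex = parts
    if sex not in ("Male", "Female"):
        return None, None
    if status in ("Neutered", "Spayed"):
        return sex, True
    if status == "Intact":
        return sex, False
    return sex, None  # known sex, unrecognized status


for value in ["Neutered Male", "Spayed Female", "Intact Male", "Intact Female", "Unknown", ""]:
    print(repr(value), split_sex(value))
```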
Does spaying/neutering matter?
The graphs below suggest that being neutered (or spayed) is potentially a strong factor: cats and dogs were more likely to be adopted if they had been neutered.
Does mixed breed matter?
The dataset also includes "Breed" information: some animals were purebred and others were mixed breeds. I wondered whether breed purity had a positive impact on an animal's fate, so I created three variables from the original "Breed" variable: "isMix", "primarybreed", and "secondarybreed". However, there were no obvious differences between pure and mixed breeds (see the percentage graph below).
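The post does not spell out the parsing rule, so here is one plausible way to derive the three variables, sketched in Python. It assumes that a trailing " Mix" marks an unspecified mix and that "/" separates two named breeds (as in "Lhasa Apso/Miniature Poodle"); both conventions, and the helper name parse_breed, are assumptions rather than the author's exact method.

```python
def parse_breed(breed):
    """Return (isMix, primarybreed, secondarybreed) for a Breed string.

    Assumed conventions: a trailing ' Mix' marks an unspecified mix, and
    '/' separates two named breeds. Extra breeds after the second are ignored.
    """
    name = breed.strip()
    is_mix = False
    if name.endswith(" Mix"):
        is_mix = True
        name = name[: -len(" Mix")]
    parts = [p.strip() for p in name.split("/")]
    if len(parts) > 1:
        is_mix = True  # two named breeds also count as a mix
    primary = parts[0]
    secondary = parts[1] if len(parts) > 1 else None
    return is_mix, primary, secondary


for b in ["Shetland Sheepdog Mix", "Lhasa Apso/Miniature Poodle", "Siamese"]:
    print(b, "->", parse_breed(b))
```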
Does breed matter?
The "Breed" variable has far too many levels, so for the breed analysis I selected only the eight most common cat breeds and the eight most common dog breeds in this dataset.
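Selecting the most common breeds separately for each animal type is a per-group frequency count followed by a filter. A minimal Python sketch using collections.Counter; the (AnimalType, Breed) pairs are hypothetical, and top_breeds and keep_top are illustrative names.

```python
from collections import Counter, defaultdict


def top_breeds(records, n=8):
    """Return the n most common breeds for each animal type."""
    by_type = defaultdict(Counter)
    for animal_type, breed in records:
        by_type[animal_type][breed] += 1
    return {t: [b for b, _ in c.most_common(n)] for t, c in by_type.items()}


def keep_top(records, top):
    """Keep only records whose breed is among the top breeds for its type."""
    return [(t, b) for t, b in records if b in top.get(t, [])]


# Hypothetical (AnimalType, Breed) pairs for illustration only.
records = (
    [("Cat", "Domestic Shorthair Mix")] * 3
    + [("Cat", "Siamese Mix")] * 2
    + [("Cat", "Manx")]
    + [("Dog", "Pit Bull Mix")] * 2
    + [("Dog", "Chihuahua Shorthair Mix")]
)
top = top_breeds(records, n=2)
print(top)
print(len(keep_top(records, top)))
```

With n=8 on the full data, the filtered records are what the breed count and percentage graphs would be drawn from.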
(1) For Top 8 cat breeds
The count graph shows that most cats fall into four breeds: Shorthair, Medium Hair, Longhair, and Siamese. However, the percentage graph shows that adoption percentages are similar across these four groups, so breed may not be a strong factor in the fate of cats.
(2) For Top 8 dog breeds
Likewise, adoption percentages are similar across the top eight dog breeds.
Does age matter?
Another potential factor is age, but the "AgeuponOutcome" variable is recorded in different units (years, months, weeks, and days). So we converted every age into two numeric variables, "Ageinyear" and "Ageinmonth", and then explored whether outcomes follow different trends with age.
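The unit conversion can be sketched in Python as below. It assumes values look like "2 years", "1 month", "3 weeks", or "5 days", and it uses 1/12 year per month and 7/365 and 1/365 year per week and day; these factors are assumptions, and the original analysis may have used slightly different ones.

```python
# Assumed conversion factors to years; the original analysis may differ.
YEARS_PER_UNIT = {"year": 1.0, "month": 1 / 12, "week": 7 / 365, "day": 1 / 365}


def age_in_years(age):
    """Convert strings like '2 years' or '3 weeks' to a float number of years."""
    if not age:
        return None  # missing age
    count, unit = age.split()
    unit = unit.rstrip("s")  # 'years' -> 'year', 'days' -> 'day'
    return int(count) * YEARS_PER_UNIT[unit]


def age_in_months(age):
    """Same conversion, expressed in months."""
    years = age_in_years(age)
    return None if years is None else years * 12


for a in ["2 years", "6 months", "3 weeks", "1 day"]:
    print(a, round(age_in_years(a), 3), round(age_in_months(a), 2))
```

For bar charts by whole year or whole month, the float results can be floored with int().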
Both pairs of graphs below (outcome by age in years and outcome by age in months) show that most animals in the shelter were 0 to 1 year old. Young cats and dogs appear to have much higher chances of being adopted, while among older cats and dogs the chance of adoption is roughly the same regardless of age.
(1) By year
(2) By month
Does outcome time matter?
Finally, one very important factor is "DateTime", the time at which the outcome occurred. The graph by month suggests that cats are more likely to be adopted in summer and winter, and dogs in winter as well. Meanwhile, the graphs by weekday and by hour suggest that adoptions peak on weekends and between 4:00 pm and 6:00 pm.
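Each of these three views needs a feature extracted from "DateTime". A minimal Python sketch, assuming timestamps look like "2014-02-12 18:22:00" (check this against the actual file); weekday names come from a fixed tuple so the result does not depend on the system locale. time_features is a hypothetical helper name.

```python
from datetime import datetime

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def time_features(stamp):
    """Split a DateTime string into (month, weekday name, hour)."""
    # Assumed timestamp layout, e.g. '2014-02-12 18:22:00'.
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return dt.month, WEEKDAYS[dt.weekday()], dt.hour


print(time_features("2014-02-12 18:22:00"))  # (2, 'Wednesday', 18)
```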
(1) By month
(2) By weekday
(3) By hour
Heat map of adoption: weekdays and hours
To explore these adoption peaks further, I created two heat maps of the number of adoptions by weekday and hour. Adoptions are most frequent on weekends and between 4:00 pm and 6:00 pm, and the pattern for cats is similar to that for dogs.
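The data behind each heat map is a 7 × 24 table of adoption counts. A Python sketch, assuming the input is the list of "DateTime" strings for adoption outcomes of a single animal type, in the same assumed "YYYY-MM-DD HH:MM:SS" layout; adoption_grid and peak_cell are illustrative names. The resulting grid can be passed to any heat-map routine, for example matplotlib's imshow.

```python
from datetime import datetime


def adoption_grid(stamps):
    """Count adoptions per (weekday, hour): 7 rows (Monday first) x 24 columns."""
    grid = [[0] * 24 for _ in range(7)]
    for stamp in stamps:
        dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        grid[dt.weekday()][dt.hour] += 1
    return grid


def peak_cell(grid):
    """Return the (weekday index, hour) with the most adoptions."""
    cells = ((d, h) for d in range(7) for h in range(24))
    return max(cells, key=lambda dh: grid[dh[0]][dh[1]])


# Hypothetical adoption timestamps (2014-02-15 was a Saturday).
grid = adoption_grid(["2014-02-15 17:10:00", "2014-02-15 17:45:00", "2014-02-12 09:00:00"])
print(peak_cell(grid))  # (5, 17): Saturday, 5 pm
```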
Conclusions
- "Age", "DateTime", and "isNeutered" might be driving factors.
- "sex" and "isMix" might not be important.
- "hasName" and "Breed" may affect cats and dogs differently.
Based on these findings, animal shelters may want to run special promotions that encourage potential owners to adopt older cats and dogs. For example, shelters could reduce the adoption fee for cats and dogs older than one year, and bring out only older cats and dogs during adoption peaks, such as weekends, to highlight them.
Future work
- Look more closely at the pattern of missing values and use appropriate imputation methods.
- Carry out statistical tests (e.g., chi-square tests, ANOVA).
- Apply multiclass classification (e.g., randomForest, XGBoost) to investigate which potential factors are the strongest.
You may also explore this project via Chuan's GitHub.
References
The American Society for the Prevention of Cruelty to Animals (ASPCA), Pet Statistics.