How to Help Shelter Animals

Chuan Hong
Posted on Oct 25, 2016

Contributed by Chuan Hong. Chuan is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which runs from September 26th to December 23rd, 2016. This post is based on her class project, R Visualization.


Each year, approximately 7.6 million companion animals enter animal shelters nationwide (ASPCA). Of those, approximately 3.9 million are dogs and 3.4 million are cats. About 2.7 million shelter animals are adopted each year (1.4 million dogs and 1.3 million cats). Meanwhile, about 649,000 animals that enter shelters as strays are returned to their owners (542,000 dogs and 100,000 cats). Unlike these lucky cats and dogs that find families to take them home, many shelter animals face an uncertain future: an estimated 2.7 million cats and dogs are euthanized in the US every year. Given these differences in outcomes, we can analyze the factors that make some cats and dogs more likely to be adopted.



Two months ago, Kaggle hosted a competition to predict the outcomes of shelter animals, in order to help shelters focus their energy on the animals that need a little extra help finding a new home. The dataset came from the Austin Animal Center.

This dataset has ten variables: "AnimalID", "Name", "DateTime", "AnimalType" (Dog/Cat), "SexuponOutcome" (Neutered Male/Spayed Female/Intact Male/Intact Female), "AgeuponOutcome", "Breed", "Color", "OutcomeType" (Return_to_owner/Adoption/Transfer/Euthanasia/Died), and "OutcomeSubtype" (Other/Foster/Offsite/Partner/Barn/SCRP/Suffering/etc.).

After a quick check of these variables, I decided not to include "Color" and "OutcomeSubtype" in this visualization project. The dataset contains 300+ unique colors, far too many to visualize factor by factor. Meanwhile, the Sankey plot below shows that "OutcomeSubtype" is simply a more detailed breakdown of "OutcomeType".


Sankey Plot of OutcomeType versus OutcomeSubtype


Exploratory Data Analysis (EDA)

In this project, I did some EDA to investigate the potential relationships between each factor and animal outcomes, especially adoption.

  • Does animal type matter? Cats vs. Dogs

First, let's look at how many cats and dogs are in this dataset and how their outcomes are distributed. The two graphs below show that both cats and dogs were commonly adopted, but dogs were much more likely than cats to be returned to their owners, and cats were transferred between shelters more often than dogs. It also appears that very few animals died or were euthanized overall.

Figures: outcome counts and outcome percentages for cats vs. dogs
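The count and percentage views used throughout this post can be computed with a small grouping step. The original project was done in R; the Python sketch below is only an illustration, and the function name `outcome_shares` and the toy rows are hypothetical (the rows are made up and not taken from the Kaggle data).

```python
from collections import Counter, defaultdict


def outcome_shares(rows):
    """Return {animal_type: {outcome: (count, percent_within_animal_type)}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["AnimalType"]][row["OutcomeType"]] += 1

    shares = {}
    for animal, by_outcome in counts.items():
        total = sum(by_outcome.values())
        shares[animal] = {
            outcome: (n, round(100.0 * n / total, 1))
            for outcome, n in by_outcome.items()
        }
    return shares


# Toy rows, invented purely to exercise the function.
toy_rows = [
    {"AnimalType": "Dog", "OutcomeType": "Adoption"},
    {"AnimalType": "Dog", "OutcomeType": "Return_to_owner"},
    {"AnimalType": "Cat", "OutcomeType": "Transfer"},
    {"AnimalType": "Cat", "OutcomeType": "Adoption"},
    {"AnimalType": "Cat", "OutcomeType": "Transfer"},
    {"AnimalType": "Cat", "OutcomeType": "Transfer"},
]
shares = outcome_shares(toy_rows)
```

Percentages are computed within each animal type, which is what makes cats and dogs comparable even when one group is much larger than the other.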

  • Does name matter?

Quite a few cats and dogs in this dataset sadly don't have names. I was curious whether having a name affected their fate. The graphs below indicate that the answer differed between cats and dogs: cats with names were more likely to be adopted, while for dogs the adoption percentage was similar with or without a name.

Figures: outcomes by whether the animal has a name

  • Does sex matter?

The "SexuponOutcome" variable (Neutered Male/Spayed Female/Intact Male/Intact Female) contains two pieces of information: whether the cat or dog was male or female, and whether it was neutered/spayed or intact. These are really two distinct features, so I split the variable into two, "sex" and "isNeutered". The adoption count and percentage were similar between males and females for both cats and dogs.

Figures: outcomes by sex
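Splitting "SexuponOutcome" into "sex" and "isNeutered" might look like the following. This is a minimal Python sketch, not the author's R code; the function name `split_sex_upon_outcome` is hypothetical, and treating empty or "Unknown" values as missing is an assumption about how the data marks missing entries.

```python
NEUTERED_WORDS = {"Neutered", "Spayed"}
KNOWN_STATUSES = NEUTERED_WORDS | {"Intact"}


def split_sex_upon_outcome(value):
    """Split e.g. 'Neutered Male' into (sex, is_neutered).

    Returns (None, None) when the value is empty, 'Unknown' (assumed
    missing marker), or not in the expected 'Status Sex' form.
    """
    if not value or value.strip().lower() == "unknown":
        return None, None
    status, _, sex = value.strip().partition(" ")
    if status not in KNOWN_STATUSES or sex not in ("Male", "Female"):
        return None, None
    return sex, status in NEUTERED_WORDS
```

For example, `split_sex_upon_outcome("Spayed Female")` yields the pair `("Female", True)`, so the two features can be plotted independently.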

  • Does spaying/neutering matter?

The graphs below show that being neutered (or spayed) was a potentially strong factor: cats and dogs were more likely to be adopted if they had been neutered.

Figures: outcomes by neutered/spayed status

  • Does mixed breed matter?

The dataset also includes "Breed". Some animals were purebred and others were mixed breed, and I wondered whether breed purity had a positive impact on an animal's fate. I therefore created three variables from the original "Breed" variable: "isMix", "primarybreed", and "secondarybreed". However, there were no obvious differences between pure and mixed breeds (see the percentage graph below).
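One way to derive "isMix", "primarybreed", and "secondarybreed" from a breed string is sketched below in Python (the project itself used R). The post does not say how mixes are marked, so this sketch assumes that a mixed breed either ends with the word "Mix" or lists two breeds separated by "/". Both rules, and the function name `split_breed`, are assumptions.

```python
MIX_SUFFIX = " Mix"


def split_breed(breed):
    """Return (is_mix, primary_breed, secondary_breed) from a breed string.

    Assumed conventions: a trailing ' Mix' or a '/' between two breed
    names marks a mixed-breed animal.
    """
    text = breed.strip()
    has_mix_suffix = text.endswith(MIX_SUFFIX)
    if has_mix_suffix:
        text = text[: -len(MIX_SUFFIX)]

    parts = [part.strip() for part in text.split("/") if part.strip()]
    primary = parts[0] if parts else None
    secondary = parts[1] if len(parts) > 1 else None
    is_mix = has_mix_suffix or secondary is not None
    return is_mix, primary, secondary
```

Under these assumptions, "Siamese Mix" becomes `(True, "Siamese", None)`, while a plain "Siamese" is treated as a pure breed.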


  • Does breed matter?

The breed variable has far too many levels, so for the breed analysis I selected only the eight most common breeds in this dataset for cats and for dogs, respectively.

(1) For Top 8 cat breeds

The count graph shows that the most common cat breeds are Shorthair, Medium Hair, Longhair, and Siamese. However, the percentage graph shows that the adoption percentage is similar across these top four groups, so breed may not be a strong factor affecting the fate of cats.



(2) For Top 8 dog breeds

Likewise, the adoption percentage is similar across the top eight dog breeds.


  • Does age matter?

Another potential factor is age, but "AgeuponOutcome" is recorded in different units (i.e., years, months, weeks, and days). So I converted every age into "Ageinyear" and "Ageinmonth", then explored whether there were any trends related to age.
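The unit conversion can be sketched as follows. This Python version is illustrative only (the project used R); it assumes age strings look like "2 years" or "3 weeks", and it uses the approximations of 30 days per month and 365 days per year. The function names are hypothetical.

```python
# Approximate day counts per unit (assumption: 30-day months, 365-day years).
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def age_to_days(age_text):
    """Convert a string such as '2 years' or '1 week' to days, or None."""
    if not age_text:
        return None
    parts = age_text.split()
    if len(parts) != 2 or not parts[0].isdigit():
        return None
    unit = parts[1].lower().rstrip("s")  # 'years' -> 'year'
    if unit not in DAYS_PER_UNIT:
        return None
    return int(parts[0]) * DAYS_PER_UNIT[unit]


def age_in_years(age_text):
    """Age in (fractional) years, or None if the text cannot be parsed."""
    days = age_to_days(age_text)
    return None if days is None else days / 365


def age_in_months(age_text):
    """Age in (fractional) months, or None if the text cannot be parsed."""
    days = age_to_days(age_text)
    return None if days is None else days / 30
```

Going through days keeps a single conversion table. The cost is small rounding differences, for example one year comes out slightly above 12 months.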

Based on the two pairs of graphs below (outcome by age in years and outcome by age in months), most of the animals in the shelter were 0-1 years old. Young cats and dogs appear to have a much higher chance of being adopted, while among older cats and dogs the chance of adoption is roughly equal across ages.

(1) By year

Figures: outcomes by age in years

(2) By month

Figures: outcomes by age in months

  • Does outcome time matter?

Finally, one very important factor is "DateTime", the time when the outcome occurred. Based on the graph by month, cats appear more likely to be adopted during summer and winter, and dogs are also more likely to be adopted during winter. Adoptions also appear to peak on weekends and from 4:00 pm to 6:00 pm (see the graphs by weekday and by hour).
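The month, weekday, and hour views all come from the same timestamp. The Python sketch below (the project used R) shows one way to derive them; the timestamp format "%Y-%m-%d %H:%M:%S" and the function name `time_features` are assumptions, not details given in the post.

```python
from datetime import datetime

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def time_features(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Derive month, weekday name, hour, and a weekend flag from a timestamp."""
    moment = datetime.strptime(timestamp, fmt)
    weekday_index = moment.weekday()  # Monday is 0, Sunday is 6
    return {
        "month": moment.month,
        "weekday": WEEKDAY_NAMES[weekday_index],
        "hour": moment.hour,
        "is_weekend": weekday_index >= 5,
    }


example = time_features("2016-10-22 17:05:00")
```

Each derived field can then be used as the grouping variable for a count or percentage graph.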

(1) By month


(2) By weekday


(3) By hour


  • Heat map of adoption: weekdays and hours

To better understand the adoption peaks, I created two heat maps of the number of adoptions by weekday and hour. They show that adoptions happen most often on weekends and from 4:00 pm to 6:00 pm, and the pattern for cats is similar to that for dogs.
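The data behind such a heat map is a 7 x 24 table of adoption counts. A minimal Python sketch follows (the project used R); the timestamp format, the function name `adoption_grid`, and the toy rows are assumptions made for illustration, and the rows are invented rather than taken from the dataset.

```python
from datetime import datetime


def adoption_grid(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Return grid[weekday][hour] of adoption counts (Monday is weekday 0)."""
    grid = [[0] * 24 for _ in range(7)]
    for row in rows:
        if row["OutcomeType"] != "Adoption":
            continue  # only adoptions are counted in this heat map
        moment = datetime.strptime(row["DateTime"], fmt)
        grid[moment.weekday()][moment.hour] += 1
    return grid


# Toy rows, invented purely to exercise the function.
toy_rows = [
    {"DateTime": "2016-10-22 17:05:00", "OutcomeType": "Adoption"},
    {"DateTime": "2016-10-22 17:40:00", "OutcomeType": "Adoption"},
    {"DateTime": "2016-10-23 11:00:00", "OutcomeType": "Adoption"},
    {"DateTime": "2016-10-25 17:30:00", "OutcomeType": "Transfer"},
]
grid = adoption_grid(toy_rows)
```

Building one grid per animal type gives the two heat maps, and the resulting nested list can be passed to any plotting library that draws a matrix as colored cells.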




Conclusions

  • "Age", "DateTime", and "isNeutered" might be driving factors.
  • "sex" and "isMix" might not be important.
  • "hasName" and "Breed" may result in different outcomes between cats and dogs.

Based on these findings, animal shelters may need unique promotions to encourage potential owners to take home relatively older cats and dogs. For example, shelters could reduce the adoption fee for cats and dogs older than one year, and they could bring only older cats and dogs to adoption events during peak times, such as weekends, to highlight them.


Future Work

  • Look deeper into the pattern of missingness and use proper imputation methods.
  • Do some statistical analysis (e.g., Chi-square test, ANOVA, etc.).
  • Apply multiclass classification (e.g., randomForest, XGBoost, etc.) to investigate which potential factor is the strongest.

You may also explore this project via Chuan's GitHub.



The American Society for the Prevention of Cruelty to Animals (ASPCA), Pet Statistics


About Author

Chuan Hong

Chuan Hong is a Ph.D. Candidate majoring in Public Health at the University of South Carolina. Her main research areas are environmental health sciences, with a focus on environmental epidemiology. By using a series of data collection, statistical...