Scraping petfinder.com for Popular Dog Breeds and Adoption Trends

Posted on Aug 13, 2018

Within 100 miles of New York City, over 8,000 dogs are waiting to be adopted today through petfinder.com. Many of them, from puppies to senior dogs, have been waiting for weeks, months, or even years to find a new home. With so many different breeds, sizes, ages, and temperaments, it’s hard to know which dogs will be adopted next.

Objectives

Web scraping lets us dig deeper into the data found online and learn more about trends. While Petfinder.com shows plenty of adorable pictures and allows its users to sort through the different pets available, we can also obtain a larger dataset from the website to learn more about the different types of dogs listed for adoption. From this data, we can find out which breeds are more common in a certain region, which get adopted more quickly, and whether there are common characteristics among the more popular dog breeds.

Method

Using Selenium to automate the scraping, I collected the listings for all dogs currently listed for adoption within 100 miles of White Plains, NY. The search included urban, suburban, and rural areas in five different states surrounding the NYC metro area. I was able to scrape the visible information for each dog (age, breed, size, color, description, location, etc.) as well as hidden information, including each animal’s posting date. All of these were combined into a dataset that I used to analyze the demographics of the dogs available for adoption in the area and to estimate the rate at which each category was getting adopted.
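A minimal sketch of how a scrape like this might be structured is given here. It is not the code used for this project: the CSS selectors, the field names, and the ISO date format are all placeholders chosen for illustration, and the real page markup would need to be inspected first. The sketch assumes the Selenium 4 API.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical CSS selectors: placeholders, not Petfinder's real markup.
SELECTORS = {
    "name": ".pet-name",
    "breed": ".pet-breed",
    "age": ".pet-age",
    "size": ".pet-size",
    "posted": ".pet-posted-date",  # assumed to be in the DOM but not displayed
}


@dataclass
class DogListing:
    name: str
    breed: str
    age: str
    size: str
    posted: date


def parse_listing(fields: dict) -> DogListing:
    """Convert raw scraped strings into a typed record."""
    clean = {key: value.strip() for key, value in fields.items()}
    return DogListing(
        name=clean["name"],
        breed=clean["breed"],
        age=clean["age"],
        size=clean["size"],
        # Assumes the posting date is an ISO string such as "2018-07-30".
        posted=datetime.strptime(clean["posted"], "%Y-%m-%d").date(),
    )


def scrape_listing(driver, url: str) -> DogListing:
    """Load one listing page with a Selenium WebDriver and parse its fields."""
    # Imported here so parse_listing can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    driver.get(url)
    fields = {}
    for key, selector in SELECTORS.items():
        element = driver.find_element(By.CSS_SELECTOR, selector)
        # textContent also returns text from elements that are not displayed.
        fields[key] = element.get_attribute("textContent")
    return parse_listing(fields)
```

Keeping the parsing step separate from the browser step means the cleaning logic can be checked on a plain dictionary before any pages are loaded.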

Analysis

From the data scraped for the NYC region, I was able to see there were 126 different types of purebred dogs as well as a multitude of mixed-breed combinations. The ten breeds with the most listings in the area are shown in the bar chart below, which shows that a significant number of adult Pit Bull Terriers are available for adoption. However, further analysis shows that only 50 are located within the five boroughs of NYC.
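As a rough illustration of this kind of tally, the sketch below counts listings per breed and counts one breed within a chosen set of cities. The dictionary keys "breed" and "city" are assumed field names, and the set of city names is left to the caller because the way locations appear in the listings is not specified here.

```python
from collections import Counter


def top_breeds(listings, n=10):
    """Return the n most frequently listed breeds as (breed, count) pairs."""
    return Counter(row["breed"] for row in listings).most_common(n)


def count_in_cities(listings, breed, cities):
    """Count listings of one breed whose city is in the given set."""
    return sum(
        1 for row in listings
        if row["breed"] == breed and row["city"] in cities
    )
```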

While the site does not show adoption dates for any of the animals, I assumed that most listings are removed from the site fairly soon after the dogs are adopted. The scraped data also provided publishing dates for each listing, so I was able to see how long the dogs currently listed on the site have been available for adoption. In the histogram below, each vertical bar represents approximately one week. Assuming a consistent rate of posting and adoption, it appears that 500 dogs were adopted within a week, and after three weeks only one third of the dogs listed were still looking for a home.
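The weekly bins behind a histogram like this can be sketched as follows. This is an illustrative example under the assumption that each listing's posting date is available as a `date` object. It only measures how long current listings have been up; reading adoption rates from that distribution relies on the steady posting and adoption assumption stated above.

```python
from collections import Counter
from datetime import date


def weekly_counts(posted_dates, today):
    """Count current listings by the number of full weeks since posting."""
    return Counter((today - posted).days // 7 for posted in posted_dates)


def share_listed_at_least(posted_dates, today, days):
    """Fraction of current listings that have been up for at least `days` days."""
    ages = [(today - posted).days for posted in posted_dates]
    return sum(age >= days for age in ages) / len(ages)
```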

The adoption rate for the most popular demographic of dogs, medium-size puppies, can also be seen in the following histogram. Female and male puppies were adopted at similar rates, which can be seen from the red (female) and blue (male) bars.

Among all dogs available for adoption, the median time listed before adoption was 55 days. For each pet, the website also provided information about whether the animal was good with other dogs, cats, and children. While these traits made little difference for puppies, they greatly improved the adoption rate for young, adult, and senior dogs, as shown by the box plot below for dogs good with children. Median time until adoption was approximately 60 days shorter for dogs that were good with children than for those that were not. The data showed a similar trend for dogs good with other dogs.
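A comparison of medians between groups can be sketched with the standard library alone. The keys "good_with_children" and "days_listed" below are assumed field names for illustration, not names taken from the project's dataset.

```python
from collections import defaultdict
from statistics import median


def median_days_by_group(listings, key):
    """Return the median of 'days_listed' for each distinct value of `key`."""
    groups = defaultdict(list)
    for row in listings:
        groups[row[key]].append(row["days_listed"])
    return {group: median(days) for group, days in groups.items()}
```

Calling the function with a different key, such as a hypothetical "good_with_dogs" field, would give the same comparison for another trait.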

Finally, when comparing the descriptions for several different breeds, I created word clouds to represent some of the key words used in their descriptions. The word clouds for Labrador retrievers, Chihuahuas, and Dachshunds are shown below. While some themes showed up for each breed, an even more common trend for all postings was to include “love” and “sweet” in the descriptions of each dog.
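A word cloud is drawn from word frequencies, and those counts can be sketched as follows. The stopword list is a small illustrative sample, not the one used for the clouds above, and the rendering step is left out because the plotting tool used is not stated here.

```python
import re
from collections import Counter

# A small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "with", "in",
             "she", "he", "her", "his"}


def word_frequencies(descriptions, stopwords=STOPWORDS):
    """Count lowercase words across descriptions, skipping stopwords."""
    counts = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in stopwords)
    return counts
```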

Continued Work & Future Applications

While NYC has a large number of dogs seeking homes, there are thousands more across the country. It would be more revealing to scrape a larger dataset to compare trends among different cities or regions across the country. Similar analyses can also be conducted for the many other pet species available on the site, such as cats, birds, reptiles, and farm animals.

One potential use for this study could be to assist volunteers in finding homes for abandoned pets who are being evacuated from areas that suffered due to natural disasters such as hurricanes or wildfires. If volunteers know of a region where a particular breed is more popular or a shelter that is successful at adopting certain groups of animals, they may be able to find new homes for these animals.

More information on this project and its related code is available on GitHub.

About Author

Erin Dugan

As an engineer with a strong background in R&D and acoustics, Erin enjoys finding creative ways to interpret and communicate complex information, whether it's for product development or project design. She holds a Master of Engineering Management (MEM)...