Scraping petfinder.com for Popular Dog Breeds and Adoption Trends
Within 100 miles of New York City, more than 8,000 dogs are waiting to be adopted today through petfinder.com. Many of them, from puppies to senior dogs, have been waiting for weeks, months, or even years to find a new home. With so many different breeds, sizes, ages, and temperaments, it’s hard to know which dogs will be adopted next.
Web scraping lets us dig deeper into the data found online and learn more about trends. While Petfinder.com shows plenty of adorable pictures and allows its users to sort through the different pets available, we can also obtain a larger dataset from the website to learn more about the different types of dogs listed for adoption. From this data, we can find out which dogs are more common in a certain region, which get adopted more quickly, and whether there are common characteristics among the more popular dog breeds.
Using Selenium to automate the scraping, I collected the listings for all dogs currently available for adoption within 100 miles of White Plains, NY. The search covered urban, suburban, and rural areas in five different states surrounding the NYC metro area. I was able to scrape the visible information for each dog (age, breed, size, color, description, location, etc.) as well as hidden information, including each animal’s posting date. All of this was combined into a single dataset, which I used to analyze the demographics of the dogs available for adoption in the area and to estimate the rate at which each category was getting adopted.
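As a rough illustration of the scraping step, here is a minimal sketch of how listing links could be collected with Selenium. It is not the project's actual code: the function names are hypothetical, Chrome is assumed as the browser, and the search URL and CSS selector are passed in by the caller because they depend on the site's current markup.

```python
def unique_links(hrefs):
    """Drop empty values and duplicates while keeping first-seen order."""
    seen = {}
    for href in hrefs:
        if href:
            seen.setdefault(href, None)
    return list(seen)


def fetch_listing_links(search_url, link_selector, wait_seconds=10):
    """Open a search page and return the unique listing URLs found on it.

    search_url and link_selector are assumptions supplied by the caller;
    inspect the live page to find values that match its current HTML.
    """
    # Imported lazily so unique_links() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(search_url)
        # Results are rendered by JavaScript, so wait until at least one
        # matching link is present (an empty list is falsy and keeps waiting).
        WebDriverWait(driver, wait_seconds).until(
            lambda d: d.find_elements(By.CSS_SELECTOR, link_selector)
        )
        anchors = driver.find_elements(By.CSS_SELECTOR, link_selector)
        return unique_links(a.get_attribute("href") for a in anchors)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Each returned link could then be visited in turn to read the fields for a single dog. Keeping the de-duplication in its own function makes that part easy to check without launching a browser.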
From the data scraped for the NYC region, I found 126 different purebred breeds as well as a multitude of mixed-breed combinations. The ten breeds with the most listings in the area are shown in the bar chart below, which reveals that a significant number of adult Pit Bull Terriers are available for adoption. However, further analysis shows that only 50 are located within the five boroughs of NYC.
While the site does not show adoption dates for any of the animals, I assumed that most listings are removed from the site fairly soon after the dogs are adopted. The scraped data also provided a publishing date for each listing, so I was able to see how long the dogs currently listed on the site have been available for adoption. In the histogram below, each vertical bar represents approximately one week. Assuming a consistent rate of posting and adoption, it appears that 500 dogs were adopted within a week and that, after three weeks, only one-third of the dogs listed were still looking for a home.
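The listing-age calculation behind a histogram like this can be sketched with the standard library alone. This is an illustrative sketch, not the project's code: the function names are hypothetical, and it assumes each posting date has already been parsed into a `datetime.date`.

```python
from collections import Counter
from datetime import date


def days_listed(published, today):
    """Number of days a listing has been on the site as of `today`."""
    return (today - published).days


def weekly_counts(published_dates, today):
    """Count listings by whole weeks since posting (0 = posted this week)."""
    return Counter(days_listed(p, today) // 7 for p in published_dates)


def share_older_than(published_dates, today, weeks):
    """Fraction of current listings that have been up for at least `weeks` weeks."""
    ages = [days_listed(p, today) for p in published_dates]
    return sum(age >= weeks * 7 for age in ages) / len(ages)


# Made-up dates for illustration only; these are not scraped values.
today = date(2024, 3, 1)
sample = [date(2024, 2, 28), date(2024, 2, 20), date(2024, 2, 10), date(2024, 1, 1)]
counts = weekly_counts(sample, today)
```

The drop in counts from one weekly bin to the next only approximates adoptions if new dogs are posted at a roughly steady rate, which is the assumption stated above.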
The adoption rate for the most popular demographic of dogs, medium-size puppies, can also be seen in the following histogram. Female and male puppies were adopted at similar rates, as shown by the red (female) and blue (male) bars.
Among all dogs available for adoption, the median time listed before adoption was 55 days. For each pet, the website also indicated whether the animal was good with other dogs, cats, and children. While these traits mattered little for puppies, they greatly improved the adoption rate for young, adult, and senior dogs, as shown by the box plot below for dogs good with children. The median time until adoption was approximately 60 days shorter for dogs that were good with children than for those that were not. The data showed a similar trend for dogs good with other dogs.
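A grouped median like the one behind this comparison can be computed in a few lines. The sketch below is illustrative only: the record layout and the field names `good_with_children` and `days_listed` are assumptions, and the numbers are made up rather than taken from the scraped dataset.

```python
from collections import defaultdict
from statistics import median


def median_days_by_group(records, key):
    """Median of `days_listed` for each distinct value of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["days_listed"])
    return {value: median(days) for value, days in groups.items()}


# Made-up records for illustration only.
records = [
    {"good_with_children": True, "days_listed": 10},
    {"good_with_children": True, "days_listed": 20},
    {"good_with_children": True, "days_listed": 30},
    {"good_with_children": False, "days_listed": 50},
    {"good_with_children": False, "days_listed": 90},
]
medians = median_days_by_group(records, "good_with_children")
gap = medians[False] - medians[True]
```

The same function works for any other categorical field, such as whether a dog is good with other dogs, by changing the `key` argument.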
Finally, when comparing the descriptions for several different breeds, I created word clouds to represent some of the key words used in their descriptions. The word clouds for Labrador retrievers, Chihuahuas, and Dachshunds are shown below. While some themes showed up for each breed, an even more common trend for all postings was to include “love” and “sweet” in the descriptions of each dog.
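A word cloud is essentially a picture of word frequencies, so the counts underneath one can be sketched with the standard library. This is an assumption-laden illustration, not the method used for the figures: the stop-word list is a small hypothetical one, and the sample descriptions are invented.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "with", "in", "for"}


def top_words(descriptions, n=10):
    """Return the n most frequent non-stop-words across all descriptions."""
    counts = Counter()
    for text in descriptions:
        # Lowercase and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


# Invented descriptions for illustration only.
sample_descriptions = [
    "Bella is a sweet girl who will love you",
    "Sweet and playful, full of love",
    "A sweet senior",
]
top_two = top_words(sample_descriptions, 2)
```

Running the function once per breed and comparing the results is one way to separate breed-specific themes from words that appear in nearly every posting.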
Continued Work & Future Applications
While NYC has a large number of dogs seeking homes, there are thousands more across the country. It would be more revealing to scrape a larger dataset to compare trends among different cities or regions across the country. Similar analyses can also be conducted for the many other pet species available on the site, such as cats, birds, reptiles, and farm animals.
One potential use for this study could be to assist volunteers in finding homes for abandoned pets who are being evacuated from areas that suffered due to natural disasters such as hurricanes or wildfires. If volunteers know of a region where a particular breed is more popular or a shelter that is successful at adopting certain groups of animals, they may be able to find new homes for these animals.
More information on this project, along with its code, is available on GitHub.