Scraping Data From Every Single ZocDoc Doctor

Posted on Feb 23, 2017
The skills the author demonstrated here can be learned through the Data Science with Machine Learning bootcamp at NYC Data Science Academy.

Introduction

Internet start-ups keep replacing the herculean task of picking up the phone and calling an actual human with a few taps on a screen. Instead of calling a cab, you tap a button and one arrives. Instead of calling for late-night pizza, you tap a few times and the order is on its way. Founded in 2007, ZocDoc allows patients to avoid the horror of a phone call and book their doctor's appointments with a few taps on their phone.

As a lazy millennial, I’ve always appreciated ZocDoc’s convenience and use it for the majority of my medical booking. ZocDoc allows me to book doctors based on other patients’ reviews, easily reschedule or cancel, and fill out forms before my appointment.

Curious about the size of their business, the best doctors, and basically everything about their service, I set to work web scraping every single doctor on their site. After spending 95 hours scraping approximately 1.2 million doctor profiles, I discovered that most of the doctors on the site are placeholders. See, for example, this profile. ZocDoc has populated its site with placeholder profiles using data from the American Board of Medical Specialties. After filtering out these placeholders, there are 47,363 doctors on the site. Let's dig in.

The Business of ZocDoc

Following the logic used here, given about 47k doctors on the site and ZocDoc's fee (as of 2016) of $3,000 per year from each doctor who uses its service, we can roughly estimate its yearly revenue at about $141 million.

In 2015, in a Series D funding round, ZocDoc raised $130 million at a $1.8 billion valuation. Measured against the revenue estimate above, that valuation works out to roughly 13x revenue.
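The back-of-the-envelope arithmetic can be written out as a short Python sketch. It uses only the figures quoted in this post and assumes, optimistically, that every non-placeholder doctor pays the full annual fee. The variable and function names are illustrative and do not come from the original analysis.

```python
# Figures quoted in the post; names are illustrative.
DOCTORS_ROUNDED = 47_000   # "about 47k doctors"
DOCTORS_EXACT = 47_363     # count after filtering out placeholders
ANNUAL_FEE_USD = 3_000     # per-doctor fee as of 2016
VALUATION_USD = 1.8e9      # 2015 Series D valuation


def estimated_revenue(doctors, fee_usd):
    """Upper-bound revenue estimate: assumes every doctor pays the full fee."""
    return doctors * fee_usd


revenue = estimated_revenue(DOCTORS_ROUNDED, ANNUAL_FEE_USD)
revenue_exact = estimated_revenue(DOCTORS_EXACT, ANNUAL_FEE_USD)
multiple = VALUATION_USD / revenue

print(f"Estimated revenue (47k doctors): ${revenue:,}")
print(f"Estimated revenue (47,363 doctors): ${revenue_exact:,}")
print(f"Valuation / estimated revenue: {multiple:.1f}x")
```

If some doctors are listed without paying, the true revenue would be lower and the implied multiple higher.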

Reviews

ZocDoc has 1.88 million total reviews.

A full review consists of text and a rating in three rating categories: overall, bedside manner, and wait time.  For overall reviews, 85% are five-star reviews,  9% are four-star reviews, 2% are three-star reviews, 2% are two-star reviews, and 2% are one-star reviews.

Overall ZocDoc Ratings

Bedside manner reviews follow a similar distribution.

Bedside Manner Reviews

While still mostly positive, wait time reviews are a little less positive with more reviews under 5 relative to the other review categories.

Wait Time Reviews

Bias of ZocDoc?

So ZocDoc's review system appears biased towards positive reviews. Why?

One explanation is that ZocDoc patients are really happy with their doctors. Patients have just given good reviews and that's that.

Another explanation could be their moderation policy. ZocDoc requires patients to visit a doctor in order to review that doctor, and all reviews are moderated. Reviews are removed if they contain profanity, personal information, pricing specifics, information about the accuracy of treatment or diagnosis, or promotional content. These moderation criteria (particularly profanity, pricing specifics, and accuracy of treatment) seem biased against negative reviews. If you swear at your overpriced doctor who misdiagnosed you, your review will not make it past their moderation.

Another explanation is that ZocDoc or its doctors are artificially inflating their ratings with fake positive reviews. Fake reviews are combated through moderation and by requiring patients to see a doctor before reviewing that doctor. However, fake reviews are common across the internet and are certainly possible for a determined actor.

ZocDoc has a financial incentive to keep their reviews mostly positive. The doctors pay them to use ZocDoc. If a doctor gets mostly negative reviews, then they will be unhappy, not get patients through the service, and eventually leave.

What is in the reviews?

Of all those reviews, 1.19 million (64%) do not have any text. This leaves 36%, or 685k reviews, with text. The text reviews follow a similar pattern to the non-text reviews. The vast majority of the overall ratings for reviews with text, 88% (599,779), are five-star reviews. Only 2% (14,672) of text reviews have one star.

We can look at some of the most common bigrams (two-word combinations) to identify patterns in the five-star and one-star reviews. Common words have been filtered out before counting so that the more interesting words stand out.
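As a rough illustration of this kind of bigram counting, here is a minimal Python sketch. The post does not say which tools, tokenizer, or list of common words were used, so the `STOP_WORDS` set, the regular expression, and the sample reviews below are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the post does not say which words were filtered.
STOP_WORDS = {
    "the", "a", "an", "and", "to", "of", "my", "me", "i",
    "is", "was", "all", "he", "she", "his", "her", "very",
}


def bigram_counts(reviews):
    """Count adjacent word pairs after dropping common words."""
    counts = Counter()
    for text in reviews:
        tokens = [
            token
            for token in re.findall(r"[a-z']+", text.lower())
            if token not in STOP_WORDS
        ]
        # Pair each remaining token with the one that follows it.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up reviews, for illustration only.
sample_reviews = [
    "She took the time to answer all of my questions. Highly recommend!",
    "Highly recommend. He took time and made me feel comfortable.",
]

counts = bigram_counts(sample_reviews)
for (first, second), n in counts.most_common(3):
    print(f"{first} {second}: {n}")
```

Because common words are removed before pairing, phrases such as "took the time" and "made me feel" collapse into bigrams like "took time" and "made feel", which matches the shape of the entries in the tables below.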

What are common bigrams in five-star reviews?

Below are some of the top bigrams. Patients like doctors who answer questions, take their time, make them feel comfortable, and have friendly staff.

Bigram                  Frequency
highly recommend        57,866
answers questions       31,846
took time               25,966
made feel               22,748
recommend dr            22,709
feel comfortable        21,555
bedside manner          21,331
explains everything     18,174
great doctor            16,589
takes time              16,328
office staff            16,053
wait time               14,561
staff friendly          13,455
visit dr                13,056
definitely recommend    12,918
make feel               12,439
make feel 12,439


Data on the common bigrams in one-star reviews

The following are the most common bigrams from one-star reviews. People seem to dislike doctors who keep them waiting in the waiting room and waste their time. Again, these top bigrams are most likely biased by ZocDoc's moderation policy, which prohibits reviews that complain about price or accuracy of diagnosis.

Bigram            Frequency
go back           1,287
will never        918
waste time        910
wait hour         872
never go          804
call back         693
bedside manner    680
front desk        644
doctor office     632
come back         626
see doctor        604
ask question      578
wait room         562
wait time         540
another doctor    452
office staff      439

Data on Doctors' Locations

ZocDoc was started and is headquartered in New York City. So, not surprisingly, New York state is home to the most ZocDoc doctors with 13,053, followed by Texas with 4,569 and California with 4,454. The top five states for ZocDoc doctors (New York, Texas, California, Florida, and New Jersey) account for 60% of the doctors on ZocDoc.

Doctors By State

State    Count     Percent
NY       13,053    27
TX       4,569     10
CA       4,454     9
FL       3,495     7
NJ       3,108     7
IL       3,033     6
MD       1,754     4
VA       1,301     3
GA       1,240     3
AZ       1,227     3

Cities

New York City, Brooklyn, and the Bronx account for 15% of the doctors on ZocDoc. Other major cities include Chicago, Houston, and Washington, D.C.

Doctors By City

City, State        Count    Percent
NEW YORK, NY       6,998    11
BROOKLYN, NY       2,022    3
CHICAGO, IL        1,204    2
HOUSTON, TX        1,068    2
WASHINGTON, DC     683      1
BRONX, NY          522      1
DALLAS, TX         503      1
SAN ANTONIO, TX    492      1
AUSTIN, TX         438      1
MINEOLA, NY        423      1

How Often Are ZocDoc Doctors Available?

On the night of Feb 11, I collected availability data for the next forty-five days (which includes 32 weekdays) for all the doctors on ZocDoc. The median ZocDoc doctor has 107 appointment slots available in the next 45 days, or about 2.4 per calendar day (roughly 3.3 per weekday).

ZocDoc Doctor Availability
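The per-day figures can be checked with a few lines of Python. This sketch assumes the 45-day window starts on Feb 12, 2017 (the day after collection, with the year taken from the post date). The helper name `count_weekdays` is illustrative, and public holidays are not excluded.

```python
from datetime import date, timedelta


def count_weekdays(start, days):
    """Count Monday-Friday dates in the window of `days` days starting at `start`."""
    return sum(
        1
        for offset in range(days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )


WINDOW_START = date(2017, 2, 12)  # assumption: day after the Feb 11 collection
WINDOW_DAYS = 45
MEDIAN_SLOTS = 107  # median open slots per doctor, from the post

weekdays = count_weekdays(WINDOW_START, WINDOW_DAYS)
slots_per_day = MEDIAN_SLOTS / WINDOW_DAYS
slots_per_weekday = MEDIAN_SLOTS / weekdays

print(f"Weekdays in window: {weekdays}")
print(f"Median slots per calendar day: {slots_per_day:.1f}")
print(f"Median slots per weekday: {slots_per_weekday:.1f}")
```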

Data on Types of Doctors on ZocDoc

ZocDoc is home to many different types of doctors. The most common type is the dentist (14% of all doctors), followed by the internist (9%) and the family physician (7%).

Doctor Types

What First Names Are Over-Represented Among Doctors?

By comparing the first names of doctors against the most common U.S. baby names from 1951 to 1992 (25 to 65 years old, a rough range for when doctors were born) and filtering for first names with at least 20 doctors on ZocDoc, we see names that are more common among doctors than in the general U.S.-born population. The top five names are Russian: Dmitry, Inna, Yelena, Alla, and Igor. Perhaps not popular baby names during the Cold War.

Doctor First Name    Baby Names Count    ZocDoc Count    Ratio
Dmitry               7                   21              3.00
Inna                 19                  33              1.74
Yelena               122                 36              0.30
Alla                 97                  27              0.28
Igor                 187                 42              0.22
Irina                362                 56              0.16
Babak                256                 23              0.09
Reza                 419                 28              0.07
Syed                 1,177               53              0.05
Rajesh               588                 23              0.04
Vijay                712                 26              0.04
Sanjay               935                 33              0.04
Seema                800                 27              0.03
Anil                 801                 26              0.03
Sunil                738                 23              0.03
Boris                1,941               45              0.02
Ravi                 1,401               29              0.02
Vladimir             1,554               32              0.02
Muhammad             1,949               40              0.02
Maryam               1,211               24              0.02
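The ratio column is the ZocDoc count divided by the baby-name count. A minimal Python sketch of that calculation, using a handful of rows copied from the table, might look like the following. The function name and data layout are illustrative, not the author's code.

```python
MIN_DOCTORS = 20  # threshold stated in the post

# name -> (baby-name count 1951-1992, ZocDoc doctor count); rows copied from the table
name_counts = {
    "Dmitry": (7, 21),
    "Inna": (19, 33),
    "Yelena": (122, 36),
    "Boris": (1941, 45),
}


def over_representation(counts, min_doctors=MIN_DOCTORS):
    """Return (name, ratio) pairs sorted from most to least over-represented."""
    ratios = {
        name: doctors / babies
        for name, (babies, doctors) in counts.items()
        if doctors >= min_doctors and babies > 0
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


ranked = over_representation(name_counts)
for name, ratio in ranked:
    print(f"{name}: {ratio:.2f}")
```

A ratio above 1 means more ZocDoc doctors carry the name than U.S. babies were recorded with it over those years, which hints at doctors born abroad.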

Doctor Languages

Some ZocDoc doctors list the languages they speak on their profile page. After English, which is by far the most common language spoken by doctors, here are the other languages ZocDoc doctors speak in the U.S.

Top Non-English Languages

About Author

Jake Bialer

During the past eight years, I’ve worked as a full-stack developer, data analyst, and journalist. I’ve a track record of finding unique datasets through web scraping and using them to help companies solve key business problems. My NYCDSA...

Comments

Larry Ding June 24, 2020
Fantastic analysis and it is really useful to me as I am building a competitor to ZocDoc and it's going to be only 99.99 a month.
CARL LONEY September 23, 2019
If you look at the reviews from vitals, healthgrades and google, the reviews often don't match with the data from zocdoc. The thing with zocdoc is, they are often flooded with fake positive reviews that people overlook the negative (real) ones. I was looking at 1 specific doctor today, and she was getting on avg of 2 star reviews on yelp, gugle, and healthgrades, while the had a 4.5 on zocdoc.
Camila B. February 12, 2019
Great analysis!
Taylor Archibald February 12, 2019
They absolutely scrub reviews. It's something like any review with a rating below 4 stars is automagically withheld. I think it's still possible to get it posted, but they require some additional verification, a phone interview, etc., a barrier large enough that a normal person doesn't bother with it.
Notta Victim September 12, 2018
As a physician with an established busy practice, I chose to pass on ZocDoc when it was first pitched to me because it basically amounted to paying for referrals and because it prioritized patients who did not present from established referral sources and were likely to "no show". Of course, they stacked their site with fake positive reviews in order to keep the patients coming and to keep their customers happy. Many of my colleagues decided to go with ZocDoc and considered it a good deal for $300/mo initially, however over time they came to over rely on ZocDoc for all of their reception duties and are now crying uncle while ZocDoc attempts to convert them to a $35 per patient referral fee with charges taken even for no shows. I am glad I grew my practice organically and have never had to rely on a site like ZocDoc which requires kickbacks for referrals.
Geeta Poptani March 25, 2018
I am one of those people who has worked in their back end office and seen how it works. All reviews get to the site. On it the profanities and other stuff are removed. The rest is kept. Every patient who visits writes the reviews and those are typed out by the back end.
Steve October 20, 2017
I tried ZocDoc for a time. I am a gastroenterologist in Illinois. I had 3 appointments scheduled over a year through the service, 2 didn't show, and 1 that came didn't need specialty services. Then I saw that ZocDoc was putting a list of other physicians that the patient might want to see on the page that was my practice. So, lousy referral experience and the company advertised my competitors on the page that was supposed to drive patients to me. What were they thinking. You may be interested in the following. After a few months of no referrals I called to cancel. They offered to leave my profile up for free! Well, I continued that for another 6 months, but after about 9-12 months I decided they were a negative on my practice for the above reasons, so I demanded that they drop my name entirely. So, I suspect a lot of the doctors on the site are not paying for the service at all.
Sam June 25, 2017
Hello Jake, Is this data available online on GitHub or is the code for scraping data available on Github? If possible, can you please share me the link
AdamP March 28, 2017
Hi. Looks like you put the same bigram table twice. Cheers.
Apna March 3, 2017
Wonderful analysis. I'd like you to share some details about how you scraped data from Zocdoc's website. Thanks.
