Data Study on Maternal Mortality

Posted on Aug 29, 2018


I recently read a shocking statistic: the U.S. has the worst maternal mortality rate during childbirth in the developed world. The CDC estimates that 60% of these deaths are preventable. How did this happen? Now that we are aware of the problem, what is the American hospital system doing about it?


Upon further reading, I discovered that in the 1950s, the maternal death rate in the U.S. had decreased to such a point that the Journal of the American Medical Association declared it “irreducible.” It then decreased from 1 in 100,000 in the following years. Since then, however, the focus has shifted towards the infant during childbirth and away from the mother. In the last twenty years, the maternal death rate has increased significantly in the U.S., even as it has decreased in other countries in the developed world.



So what’s being done about this? Since 2006, the California health system has introduced new protocols to help prevent the 60% of deaths deemed preventable. The hospitals that have adopted these protocols have been able to dramatically reduce their maternal mortality rate, and the state’s maternal mortality rate has decreased by 55%. The rate in the U.S. overall, however, has continued to increase.


I decided to investigate the question of progress on this front in New York State: Have we reduced our maternal mortality rate as California has? In order to examine this issue, I chose to look at medical malpractice among obstetrician-gynecologists licensed in New York City (I limited myself to the city due to time constraints).

I used Selenium to scrape the New York State Department of Health’s website for information about all the doctors licensed to practice in the five boroughs: their names, medical schools, graduation years, and any payments they made from either settlements or arbitrations of malpractice. The information about these lawsuits is only publicly available for 10 years, so I looked at the total number of lawsuits per month since 2009 (see the plot below). As you can see, there has been no significant downward trend.
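The monthly totals described above could be computed along these lines. This is only a minimal sketch: the `payment_dates` list and the `lawsuits_per_month` helper are hypothetical stand-ins, not the scraped data or the code actually used for this project.

```python
from collections import Counter
from datetime import date

# Hypothetical payment dates; the real values would come from the scraped records.
payment_dates = [
    date(2009, 1, 15),
    date(2009, 1, 30),
    date(2009, 3, 2),
    date(2010, 7, 19),
]

def lawsuits_per_month(dates):
    """Count payments per (year, month), returned in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items())

monthly_counts = lawsuits_per_month(payment_dates)
for (year, month), n in monthly_counts:
    print(f"{year}-{month:02d}: {n}")
```

Note that months with no payments simply do not appear in the result, so they would need to be filled in with zeros before plotting a time series.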



Comparing Doctors from Different Medical Schools

Next, I decided to compare doctors from different medical schools to see if that had an effect on the number of malpractice lawsuits. The doctors in my dataset graduated from over 200 different medical schools, so I first separated out the international schools and looked at those. The histogram below shows the number of doctors who have or have not been sued (successfully) in the last ten years.

The two surprising countries I saw below were Grenada and Mexico. Grenada, an island in the Caribbean, produces a surprisingly large portion of American doctors. This was less surprising after I did some research and found out that students who struggle to get accepted by American medical schools will apply to universities in Grenada and Mexico.

Comparing Doctors from Different States

Next, I looked at doctors from medical schools in different U.S. states. Below is the histogram of the raw number of doctors who have or have not been sued in the last ten years, grouped by the state of their medical school (all of them practicing in New York City):


There was nothing very interesting to be found by comparing the schools by state: they all have relatively similar proportions of malpractice, except states where fewer than 3 doctors studied (West Virginia, Arizona).
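The comparison of proportions by state can be sketched as follows. The records, the `sued_proportion_by_group` helper, and the `min_group_size` parameter are all hypothetical; the cutoff of 3 simply mirrors the "fewer than 3 doctors" caveat above.

```python
from collections import defaultdict

# Hypothetical (state, was_sued) pairs, one per doctor; not the author's data.
doctors = [
    ("New York", True), ("New York", False), ("New York", False), ("New York", False),
    ("Pennsylvania", True), ("Pennsylvania", False), ("Pennsylvania", False),
    ("Arizona", True),
]

def sued_proportion_by_group(records, min_group_size=3):
    """Return {group: (n_doctors, share_sued)}, skipping groups below min_group_size."""
    totals = defaultdict(int)
    sued = defaultdict(int)
    for group, was_sued in records:
        totals[group] += 1
        sued[group] += int(was_sued)
    return {
        group: (n, sued[group] / n)
        for group, n in totals.items()
        if n >= min_group_size
    }

proportions = sued_proportion_by_group(doctors)
```

Dropping very small groups keeps a single doctor from making a state look like an outlier at 0% or 100%.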

Comparing Elite Medical Schools

I next separated out a group of elite medical schools to compare them with the rest. They made up a much smaller portion of the dataset. I wanted to see if the elite schools would have a smaller proportion of doctors who had been sued than the rest. Again, find the plot with the raw number of doctors on the left and the normalized version on the right:

Once normalized, one can see that a slightly smaller proportion appears in the first group. I conducted a t-test between the elite schools and the rest of the schools from the U.S. and found no significant result (T-stat: -1.37, p=0.169).
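For readers who want to see how a test like this works, here is a rough sketch of a two-sample comparison on sued/not-sued indicators. It assumes Welch's unequal-variance form of the t statistic and approximates the p-value with a normal distribution, which is reasonable only for large samples; the exact variant I ran is not reproduced here, and the sample values are made up.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t_test(sample_a, sample_b):
    """Welch's t statistic with a normal-approximation two-sided p-value."""
    n_a, n_b = len(sample_a), len(sample_b)
    standard_error = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / standard_error
    # Large-sample assumption: treat the statistic as standard normal.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value

# Hypothetical indicators: 1 = doctor made a malpractice payment, 0 = did not.
elite = [1] * 20 + [0] * 80
other = [1] * 150 + [0] * 350

t_stat, p_value = welch_t_test(elite, other)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A negative statistic means the first group has the lower share of sued doctors, matching the sign convention of the result reported above.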

So if not education, what other parameters might predict which ob-gyn physicians commit malpractice? Since this is a web-scraping project, I decided to look at doctor ratings on two different physician databases: WebMD and Healthgrades.

I scraped the name and rating (out of five stars) for ob-gyns in the New York City area. As I was pressed for time, it was not comprehensive, but I was able to download about 400 doctors from each website (compared with ~1,000 from the NYS Department of Health dataset). Merging these datasets by name of physician, I was able to compare number of lawsuits with ratings on each website.
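Merging by physician name depends on the names being spelled the same way in each source. The sketch below shows one possible approach; the `normalize_name` and `merge_by_name` helpers and all of the records are hypothetical, not the cleaning I actually applied.

```python
def normalize_name(name):
    """Lowercase, drop punctuation and a leading 'dr' so spelling variants match."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    parts = cleaned.split()
    if parts and parts[0] == "dr":
        parts = parts[1:]
    return " ".join(parts)

# Hypothetical records; names and values are made up for illustration.
lawsuit_counts = {"Jane Q. Doe": 2, "John Smith": 0, "Alex Roe": 1}
ratings = {"Dr. Jane Q Doe": 4.5, "john smith": 5.0, "Sam Poe": 3.0}

def merge_by_name(lawsuits, star_ratings):
    """Inner join: keep only doctors present in both sources."""
    ratings_by_key = {normalize_name(n): r for n, r in star_ratings.items()}
    merged = {}
    for name, n_lawsuits in lawsuits.items():
        key = normalize_name(name)
        if key in ratings_by_key:
            merged[key] = (n_lawsuits, ratings_by_key[key])
    return merged

merged = merge_by_name(lawsuit_counts, ratings)
```

Joining on names alone can conflate two different doctors who share a name, so a second field such as medical school or graduation year would make the match safer where it is available.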

As a lot of our web-scraping presentations indicated, online ratings tend not to follow a normal distribution. Most rating distributions tend towards five stars. I found the same phenomenon on both WebMD and Healthgrades. Most of the doctors are rated between four and five stars, including doctors who have made payments for malpractice (on the x-axis). In the Healthgrades plot, there is a doctor with seven malpractice lawsuits in the last ten years, and he has a four-star rating. I computed a Pearson correlation coefficient for both datasets, and, needless to say, they were both close to 0 and insignificant.
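For reference, the Pearson coefficient is the covariance of the two variables divided by the product of their standard deviations. A minimal sketch, with a hypothetical `pearson_r` helper and made-up numbers:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: lawsuits per doctor and the matching star ratings.
lawsuits = [0, 0, 1, 2, 0, 7]
stars = [5.0, 4.5, 4.5, 5.0, 4.0, 4.0]

r = pearson_r(lawsuits, stars)
print(f"r = {r:.2f}")
```

Because ratings are bunched between four and five stars, there is little variation for the coefficient to pick up, which is one reason a value near zero is not surprising.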


As a reality check, I also plotted the two websites against each other, to see if their ratings matched for the doctors (on the right). As expected, most of the doctors are in a cluster around 4.5-5 stars.

I would hypothesize that these online ratings are mostly influenced by a physician’s bedside manner rather than by her/his competency. This would explain how a doctor who is very friendly could have a four- or five-star rating online despite having been charged for malpractice multiple times.


I have found that the number of medical malpractice lawsuits against obstetrician-gynecologists in NYC has not gone down since 2009. Then, I compared doctors from different medical schools to determine if that could be a predictor of malpractice. I did not find a significant difference between doctors from elite medical schools and other U.S. schools. I might have needed more samples, or maybe there is more to medical malpractice lawsuits that could be driving this negative result.

Comparing a doctor’s malpractice numbers with her/his rating on websites such as WebMD or Healthgrades reveals no correlation. It seems online ratings are not a predictor of malpractice. I hypothesize that they tend to function as a predictor of the doctor’s friendly bedside manner.

There are many parameters that would be interesting to add to this model to determine if it is possible to predict malpractice: graduate medical education (residencies), board certifications, any board sanctions, years in practice, patient load, etc.

As for the story that got me interested in this project: in California, where the maternal mortality rate has dropped by 55% since 2006, most of this change came from toolkits designed to handle those deaths that the CDC had determined preventable (due to hemorrhage). This change was mostly implemented through hospital-wide procedures, not by individual obstetrician-gynecologists. In hospitals in California that did not implement changes, the mortality rate only dropped by 1%, as compared with 21% lower mortality from hemorrhage. So while this project led me to some interesting questions about parameters that might predict malpractice and about the accuracy (or meaning) of online ratings, it did not address the need for changes to hospital protocol for handling complications of the mother during childbirth.

Further reading:

While I did not address the influence of race on maternal mortality in America, it is a large part of the issue, and I would encourage interested readers to explore it further.





About Author

Sophie Geoghan

Sophie is currently a NYC Data Science Fellow. She graduated from MIT with a BS in Brain and Cognitive Sciences in 2016 and spent one year working as a research assistant in neuroscience labs through MIT's MISTI program....