Visualizing Data on Diabetes Clinical Studies

Posted on May 7, 2018

Diseases are part of human life. Data shows we have been dealing with them throughout history: from the outbreak of smallpox in Athens, Greece in 430 BC, to the mosquito-borne Zika virus that plagued the Americas in recent years. Fortunately, scientific methods applied to modern medicine have helped alleviate the human condition by providing effective and efficient ways to fight various diseases. At the heart of the scientific method applied to assessing medications' effectiveness is a clinical study of human participants.

One of the diseases that clinical studies could help alleviate is diabetes. I present an R Shiny prototype called Diabetes Studies App, which was envisioned to provide information about diabetes clinical studies in the United States. The information may be useful in the following ways:

  • You, a loved one, a friend, or someone you know may be suffering from diabetes and may want to participate in a clinical study or learn more about treatments that are available now or could become available in the future.
  • You are interested in medicine and want to know more about real-life clinical studies.
  • You are part of the medical community and want to learn about clinical study sponsors and the state of medical treatments that may affect your organization's constituents directly or indirectly.
  • You are interested in investing in pharmaceutical products and want to learn about sponsor data and pipelines in product development.

You can find the R Shiny code for the app on my GitHub page.

Data Set

This project was inspired by previous work by Xiao Jia. The dataset was obtained from a repository for clinical studies in the United States and around the world, a resource provided by the U.S. National Library of Medicine.

To keep the prototype manageable, the scope of this work was limited to a single disease (diabetes), sponsors based in the United States, and the two main study types (clinical trials and observational studies). Briefly, a clinical trial is usually conducted to test, say, the efficacy of a drug or treatment before the drug is marketed to the public, while an observational study is usually conducted to assess safety once the drug is already on the market.

App Features and Data Insights

The app's main page shows five options in the left sidebar: Intro, Studies Info, Annual Data, Sponsor Data, and Map. The Intro presents videos that provide background information about diabetes and explain the difference between a clinical trial and an observational study. Below I describe selected insights from the remaining features.

The second feature, Studies Info, presents bar charts of the number of clinical studies broken down by a selected attribute: sponsor type, diabetes condition type, intervention type, study status, enrollment, phase (clinical trials only), or duration. Bar charts are presented separately for clinical trials and observational studies. Some insights from this feature are:

  • There are more diabetes clinical trials than observational studies.
  • The studies are sponsored mostly by Industry (pharmaceutical companies) and Others (academic and other institutions and non-profit organizations).
  • The most common intervention types (what the studies are investigating) are drugs, behavior, and devices.

Annual Data

The third feature is Annual Data, where you will find the number of studies started or completed per year. Each page presents line graphs by sponsor (left graph) and intervention type (right graph). If you select "Studies by Start Year," you will learn from the graph on the left that:

  • Most studies are sponsored by Industry and Others, which confirms our earlier insight in Studies Info.
  • There was a large increase in the number of clinical studies sponsored by Industry and Others (academic and other institutions and non-profit organizations) from 2000 to 2010. A possible reason is that diabetes became a serious public health concern during those years, which motivated investment in new diabetes treatments.
  • There appears to be a decline in the number of studies sponsored by Industry after 2010, while the number of studies sponsored by Others remains relatively stable. Perhaps Industry no longer sees diabetes as a profitable disease area for investment. It may be worth researching further why this is happening.
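
The counts behind these line graphs amount to tallying studies by start year and sponsor type. The app itself is written in R Shiny; the sketch below uses Python purely for illustration and is not the author's code. The field names (`start_year`, `sponsor_type`) and the records are hypothetical and are not values from the real dataset.

```python
from collections import Counter

# Hypothetical records for illustration only; not values from the real dataset.
studies = [
    {"start_year": 2005, "sponsor_type": "Industry"},
    {"start_year": 2005, "sponsor_type": "Other"},
    {"start_year": 2005, "sponsor_type": "Industry"},
    {"start_year": 2011, "sponsor_type": "Other"},
    {"start_year": 2011, "sponsor_type": "NIH"},
]


def count_by_year_and_sponsor(records):
    """Count studies per (start year, sponsor type) pair."""
    return Counter((r["start_year"], r["sponsor_type"]) for r in records)


counts = count_by_year_and_sponsor(studies)

# Print one line per (year, sponsor type) pair, ordered by year then sponsor type.
for (year, sponsor_type), n in sorted(counts.items()):
    print(year, sponsor_type, n)
```

Plotting one line per sponsor type, with the year on the horizontal axis and the count on the vertical axis, would give a graph of the kind described above.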

Sponsor Data

The fourth feature, Sponsor Data, presents a table of summary data by specific sponsor. The table includes the number of studies per sponsor as well as summary data on the number of participants (enrollment total, mean, minimum, and maximum) and duration (mean, minimum, and maximum in years). If you sort the table in descending order of number of studies per sponsor, you will learn that:

  • There are seven Industry sponsors in the top ten.
  • AstraZeneca has the largest number of studies (168).
  • The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), an institute within the National Institutes of Health (NIH), occupies third place.
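
A summary table of this kind can be built by grouping studies by sponsor and aggregating enrollment. The following is a minimal Python sketch of that idea, not the app's R code. The sponsor names, enrollment figures, and column names are all hypothetical, and duration is left out for brevity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; sponsor names and enrollments are made up for illustration.
studies = [
    {"sponsor": "Sponsor A", "enrollment": 120},
    {"sponsor": "Sponsor A", "enrollment": 80},
    {"sponsor": "Sponsor B", "enrollment": 300},
    {"sponsor": "Sponsor A", "enrollment": 40},
    {"sponsor": "Sponsor B", "enrollment": 100},
    {"sponsor": "Sponsor C", "enrollment": 50},
]


def summarize_by_sponsor(records):
    """Return one summary row per sponsor, sorted by number of studies (descending)."""
    # Collect the enrollment values of each sponsor's studies.
    grouped = defaultdict(list)
    for r in records:
        grouped[r["sponsor"]].append(r["enrollment"])

    rows = [
        {
            "sponsor": sponsor,
            "n_studies": len(values),
            "enrollment_total": sum(values),
            "enrollment_mean": mean(values),
            "enrollment_min": min(values),
            "enrollment_max": max(values),
        }
        for sponsor, values in grouped.items()
    ]
    # Mirror the descending sort by number of studies described in the text.
    return sorted(rows, key=lambda row: row["n_studies"], reverse=True)


summary = summarize_by_sponsor(studies)
for row in summary:
    print(row)
```

The same grouping could be extended with duration columns by adding a duration field to each record and aggregating it in the same way.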

Sponsor Locations

The fifth and last feature is Map, which presents a map of the sponsor locations within the United States. You can filter by study type or sponsor type and zoom in or out. The sponsor locations initially form large clusters, but if you zoom in, the clusters split into smaller ones. With this feature, you can determine whether there is a clinical study sponsor in your area.


The Diabetes Studies App was created in approximately two weeks. With additional time and effort, the app could be enhanced by including (1) other diseases, such as cancers, heart diseases, and skin diseases; (2) sponsors based outside the United States; and (3) other available study types, such as registry studies. Additional features could also be added, such as summaries of disease areas and clinical study phases by sponsor. These features could help pharmaceutical investors gauge long-term investment opportunities among the different sponsors, which would enhance the business utility of the app.

About Author

John Yap

A statistician with several years of experience working on clinical trials and observational studies. John attended NYCDSA to learn about this very exciting data science field. He hopes to use his experience and data science knowledge to help...