Visualizing Diabetes Clinical Studies Data

Posted on May 7, 2018


Diseases are part of human life. We have been dealing with them throughout history: from the epidemic that struck Athens, Greece, in 430 BC, sometimes attributed to smallpox, to the mosquito-borne Zika virus that spread through the Americas in recent years. Fortunately, scientific methods applied to modern medicine have helped alleviate the human condition by providing effective and efficient ways to fight various diseases. At the heart of the scientific method for assessing a medication's effectiveness is a clinical study of human participants.

One of the diseases that clinical studies could help alleviate is diabetes. I present an R Shiny prototype called Diabetes Studies App. This app was envisioned to provide information about diabetes clinical studies in the United States. The information may be useful in the following ways:

  • You, a loved one, a friend, or someone you know, may be suffering from diabetes and may be interested in participating in a clinical study or want to know more about the types of treatments available or that could be available in the future.
  • You are interested in medicine and want to know more about real-life clinical studies.
  • You are a part of the medical community and want to learn about clinical study sponsors and the state of medical treatments that may affect your organization's constituents directly or indirectly.
  • You are interested in investing in pharmaceutical products and want to learn about sponsor data and pipelines in product development.

You can find the R Shiny code that created the app on my GitHub page.


This project was inspired by previous work by Xiao Jia. The dataset was obtained from a repository for clinical studies in the United States and around the world, a resource provided by the U.S. National Library of Medicine.

To keep the prototype manageable, the scope of this work was limited to a particular disease (diabetes), sponsors based in the United States, and the two main study types (clinical trials and observational studies). Briefly, a clinical trial is usually conducted to test, say, the efficacy of a drug or treatment before the drug is marketed to the public, while an observational study is usually conducted to assess safety when the drug is already on the market.

App Features and Insights

The app's main page shows five features in the left sidebar: Intro, Studies Info, Annual Data, Sponsor Data, and Map. The Intro presents videos that provide background information about diabetes and the difference between a clinical trial and an observational study. Below, I describe selected insights from the other four features.

The second feature, Studies Info, presents bar charts of the number of clinical studies broken down by a selected attribute: sponsor type, diabetes condition type, intervention type, study status, enrollment, phase (clinical trials only), or duration. Bar charts are presented separately for clinical trials and observational studies. Some insights from this feature are:

  • There are more diabetes clinical trials than observational studies.
  • The studies are sponsored mostly by Industry (pharmaceutical companies) and Others (academic and other institutions and non-profit organizations).
  • The most common intervention types (what the studies are investigating) are drugs, behavior, and devices.

The third feature, Annual Data, shows the number of studies started or completed per year. Each page presents line graphs by sponsor (left graph) and intervention type (right graph). If you select "Studies by Start Year," you will learn from the graph on the left that:

  • Most studies are sponsored by Industry and Others, which confirms our earlier insight in Studies Info.
  • There was a large increase in the number of clinical studies sponsored by Industry and Others (academic and other institutions and non-profit organizations) from 2000 to 2010. A possible reason is that diabetes became a serious public health concern during those years, which motivated investment in new diabetes treatments.
  • There appears to be a decline in the number of studies sponsored by Industry after 2010, while the number of studies by Others remains relatively stable. Perhaps this means that Industry no longer sees diabetes as a profitable disease area to invest in? It may be worth researching further why this is happening.
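
The counts behind a chart like "Studies by Start Year" amount to a simple tally of studies per year and sponsor type. The snippet below is an illustrative Python sketch, not the app's R Shiny code. The record layout, the field names (`start_year`, `sponsor_type`), and the helper `count_by_year_and_sponsor` are assumptions made for this example, and the values are made-up placeholders rather than data from the app.

```python
from collections import Counter

# Hypothetical study records; the values are placeholders, not real data.
studies = [
    {"start_year": 2005, "sponsor_type": "Industry"},
    {"start_year": 2005, "sponsor_type": "Others"},
    {"start_year": 2005, "sponsor_type": "Industry"},
    {"start_year": 2011, "sponsor_type": "Others"},
    {"start_year": None, "sponsor_type": "NIH"},  # start date not reported
]


def count_by_year_and_sponsor(records):
    """Count studies per (start_year, sponsor_type), skipping missing years."""
    counts = Counter()
    for rec in records:
        if rec["start_year"] is None:
            continue  # a study without a start year cannot be placed on the x-axis
        counts[(rec["start_year"], rec["sponsor_type"])] += 1
    return counts


yearly_counts = count_by_year_and_sponsor(studies)

# Each (year, sponsor type) pair becomes one point on a line graph.
for (year, sponsor_type), n in sorted(yearly_counts.items()):
    print(year, sponsor_type, n)
```

Plotting one line per sponsor type from these pairs would give a graph in the spirit of the one on the left of the Annual Data page.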

The fourth feature, Sponsor Data, presents a table of summary data by specific sponsors. The table includes the number of studies per sponsor as well as summary data on the number of participants (enrollment total, mean, minimum, and maximum) and duration (duration mean, minimum, and maximum in years). If you sort the table in descending order of number of studies per sponsor, you will learn that:

  • There are seven Industry sponsors in the top ten.
  • AstraZeneca has the most studies (168).
  • The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), an institute within the National Institutes of Health (NIH), is in third place.
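
A per-sponsor summary table of this kind is a group-and-aggregate step followed by a sort. The following is a minimal Python sketch of that idea, not the code behind the app. The sponsor names, field names (`sponsor`, `enrollment`, `duration_years`), and the helper `summarize_by_sponsor` are hypothetical, and the numbers are placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; sponsor names and numbers are placeholders.
studies = [
    {"sponsor": "Sponsor A", "enrollment": 100, "duration_years": 1.0},
    {"sponsor": "Sponsor A", "enrollment": 300, "duration_years": 3.0},
    {"sponsor": "Sponsor B", "enrollment": 50, "duration_years": 2.0},
]


def summarize_by_sponsor(records):
    """Return one summary row per sponsor, sorted by number of studies (descending)."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["sponsor"]].append(rec)

    rows = []
    for sponsor, recs in grouped.items():
        enrollment = [r["enrollment"] for r in recs]
        duration = [r["duration_years"] for r in recs]
        rows.append({
            "sponsor": sponsor,
            "n_studies": len(recs),
            "enrollment_total": sum(enrollment),
            "enrollment_mean": mean(enrollment),
            "enrollment_min": min(enrollment),
            "enrollment_max": max(enrollment),
            "duration_mean": mean(duration),
            "duration_min": min(duration),
            "duration_max": max(duration),
        })

    # Sponsors with the most studies come first, as in the sorted table.
    rows.sort(key=lambda row: row["n_studies"], reverse=True)
    return rows


summary = summarize_by_sponsor(studies)
for row in summary:
    print(row["sponsor"], row["n_studies"], row["enrollment_total"])
```

Sorting on `n_studies` mirrors sorting the table by number of studies per sponsor; any other column could be used as the sort key in the same way.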

The fifth and last feature is Map, which presents a map of the sponsor locations within the United States. You can filter by study type or sponsor type and zoom in or out. The sponsor locations initially form large clusters, but if you zoom in, the clusters split into smaller ones. With this feature, you can determine whether there is a clinical study sponsor in your area.


The Diabetes Studies App was created in approximately two weeks. With additional time and effort, this app can be enhanced by including: (1) other diseases, such as cancers, heart diseases, and skin diseases; (2) sponsors based outside the United States; and (3) other available study types, such as registry studies. Additional features can also be added, such as summaries of disease areas and clinical study phases by sponsor. These additional features could help pharmaceutical investors gauge long-term investment opportunities among the different sponsors, which would enhance the business utility of the app.

About Author

John Yap

A statistician with several years of experience working on clinical trials and observational studies. John attended NYCDSA to learn about this very exciting data science field. He hopes to use his experience and data science knowledge to help...