Data Visualization of Cardiovascular Clinical Trials

Posted on Feb 21, 2021

GitHub | LinkedIn | R Shiny App


Clinical research is medical research involving human subjects that tests the safety and efficacy of a drug, therapy, or treatment intended to alleviate or cure a disease or injury. It is an essential part of our society and integral to scientific and medical breakthroughs. Thanks to clinical research studies, healthcare has steadily progressed and life expectancy has increased, especially with the application of modern technology.

Cardiovascular disease, the leading cause of death in the United States, is one of the most common subjects of clinical trials. To make the data on cardiovascular clinical trials easier to view, I developed an R Shiny application. The application visualizes these trials and provides useful information about them, and it may be useful for:

  • Anyone who has this disease, or knows someone who does, and may be interested in participating in these clinical trials.
  • Anyone who has a family history of this disease and may want more information on the prevention methods available.
  • Medical researchers interested in the specific drug, therapy, or treatment utilized in the research study.

Data set

The dataset was acquired from a repository for clinical trials in the United States provided by the U.S. National Library of Medicine. It was then filtered for cardiovascular-related trials for the purpose of developing the Shiny application.
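
The post does not say how the filtering was done, so the following is only a minimal sketch of one possible approach: a keyword match on each trial's condition text. It is written in Python for illustration (the app itself is built with R Shiny), and the `conditions` field name, the keyword list, and the sample records are all assumptions rather than details of the actual dataset.

```python
# Assumed keyword list; the real filter criteria are not stated in the post.
CARDIO_KEYWORDS = ("cardiovascular", "heart", "coronary")


def is_cardiovascular(record, keywords=CARDIO_KEYWORDS):
    """Return True if any keyword appears in the record's condition text."""
    text = record.get("conditions", "").lower()
    return any(keyword in text for keyword in keywords)


def filter_cardiovascular(records):
    """Keep only the records whose condition text matches a keyword."""
    return [record for record in records if is_cardiovascular(record)]


# Hypothetical sample records, not taken from the source data.
sample = [
    {"id": "A", "conditions": "Coronary Artery Disease"},
    {"id": "B", "conditions": "Type 2 Diabetes"},
    {"id": "C", "conditions": "Heart Failure"},
]

cardio = filter_cardiovascular(sample)
print([record["id"] for record in cardio])
```

A keyword match like this is easy to audit but can miss trials described with other terms, so the keyword list would need review against the real data.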

Data Features & Insights

In the Shiny application, there are five main tabs in the left sidebar of the main page. The first tab is the introduction, which gives the background and purpose of the application. It also displays short videos with more information about clinical trials and cardiovascular disease.

Locations of Trials

The second tab displays a map of the locations of cardiovascular-related clinical trials in the United States. The map includes a clustering feature that groups nearby locations into clusters as the user zooms in and out. When the user zooms in, the clusters split into smaller groups that are more specific to a location. This may be useful for anyone interested in a study who would like to check whether there are studies nearby. The map shows that New York, California, and Texas are the regions with the most cardiovascular-related clinical trials.
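
To illustrate why clusters split apart when zooming in, here is a minimal Python sketch of grid-based clustering. It is a simple stand-in and not necessarily the algorithm the app's mapping library uses. The cell-size formula, the `cluster_points` function, and the sample coordinates are assumptions made for illustration.

```python
import math
from collections import defaultdict


def cluster_points(points, zoom):
    """Group (lat, lon) points into grid cells whose width halves per zoom level."""
    cell_deg = 360.0 / (2 ** zoom)  # assumed cell width in degrees
    cells = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))

    clusters = []
    for members in cells.values():
        count = len(members)
        center = (
            sum(lat for lat, _ in members) / count,
            sum(lon for _, lon in members) / count,
        )
        clusters.append({"center": center, "count": count})
    return clusters


# Hypothetical trial sites: two in New York City, one in Los Angeles, one in Houston.
sites = [(40.71, -74.00), (40.73, -73.99), (34.05, -118.24), (29.76, -95.37)]

for zoom in (2, 16):
    print(zoom, len(cluster_points(sites, zoom)))
```

At a low zoom level the grid cells are large, so distant sites share a cell and appear as one cluster. At a high zoom level the cells shrink and each site ends up in its own cluster.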


Different Studies of the Trials

The third tab, Information, displays bar charts for four categories of these studies (Sponsor Type, Intervention Type, Patient Status, and Clinical Phase). Some visual insights gathered from looking at the cardiovascular-related clinical trials data are:

  • Most are sponsored either by Industry, such as pharmaceutical companies, or by Other, which covers academic institutions and non-profit organizations.
  • Drugs are the most common intervention, followed by medical devices.
  • Currently there are 1,611 cardiovascular-related trials that are recruiting patients.
  • Most studies are currently in phase 2, which emphasizes the effectiveness of a drug or medical device. The goal of this phase is to gather preliminary data on patients' progress and on the effectiveness of the drug or device. This phase is also intended to gather safety data on the intervention, such as short-term side effects.
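
The counts behind bar charts like these can be sketched as a simple tally per category. The example below is a Python illustration only (the app itself is built with R Shiny). The field names `sponsor_type` and `phase` and the sample records are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def category_counts(records, field):
    """Count records per value of `field`, most common first."""
    return Counter(record[field] for record in records).most_common()


# Hypothetical sample records for illustration.
trials = [
    {"sponsor_type": "Industry", "phase": "Phase 2"},
    {"sponsor_type": "Other", "phase": "Phase 2"},
    {"sponsor_type": "Industry", "phase": "Phase 3"},
]

print(category_counts(trials, "sponsor_type"))
print(category_counts(trials, "phase"))
```

Each returned pair is a bar label and its height, already sorted so the most frequent category comes first.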


Data Exploration

In the fourth tab, Exploration, there are interactive box plots of the enrollment number and duration of cardiovascular-related clinical trials. The plots let you see the median, interquartile range, and minimum and maximum values. The median patient enrollment for interventional studies is 535, while observational studies have a median of 516, which indicates that the two types of studies are roughly equal in terms of patient enrollment. Additionally, the median duration of interventional studies is 9 years, while observational studies tend to run longer, with a median duration of 11 years.
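
For readers unfamiliar with what a box plot summarizes, the values it draws can be sketched with Python's standard `statistics` module. This is an illustration only, not the app's R code. The `box_plot_summary` function and the sample enrollment numbers are hypothetical, and the quartile method (`inclusive`) is an assumption, since plotting libraries differ in how they compute quartiles.

```python
import statistics


def box_plot_summary(values):
    """Return the minimum, quartiles, median, maximum, and interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# Hypothetical enrollment numbers, not taken from the dataset.
enrollment = [100, 200, 300, 400, 500]
print(box_plot_summary(enrollment))
```

Comparing two groups, such as interventional and observational studies, then comes down to computing this summary once per group and comparing the medians and ranges.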


The last tab, Data, presents a table of summary data by sponsor. The table includes features such as organization, total number of studies, average/minimum/maximum enrollment, and the length of study. When this table is sorted in descending order, we can see that hospitals, government organizations, and academic institutions have the highest number of cardiovascular-related clinical trials, which is quite interesting to see. Among the sponsors, the medical company Abbott has the highest number of cardiovascular-related clinical trials.
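
A per-sponsor summary table of this kind can be sketched as a group-and-aggregate step followed by a descending sort. The Python example below is illustrative only (the app itself is built with R Shiny). The field names `sponsor` and `enrollment`, the output column names, and the sample records are assumptions.

```python
from collections import defaultdict


def summarize_by_sponsor(records):
    """Aggregate enrollment per sponsor, sorted by number of studies (descending)."""
    enrollments = defaultdict(list)
    for record in records:
        enrollments[record["sponsor"]].append(record["enrollment"])

    rows = [
        {
            "sponsor": sponsor,
            "studies": len(values),
            "avg_enrollment": sum(values) / len(values),
            "min_enrollment": min(values),
            "max_enrollment": max(values),
        }
        for sponsor, values in enrollments.items()
    ]
    return sorted(rows, key=lambda row: row["studies"], reverse=True)


# Hypothetical sample records for illustration.
records = [
    {"sponsor": "Sponsor A", "enrollment": 100},
    {"sponsor": "Sponsor B", "enrollment": 50},
    {"sponsor": "Sponsor A", "enrollment": 300},
]

for row in summarize_by_sponsor(records):
    print(row)
```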


It is important for patients to have access to information about available clinical trials and the opportunity to enroll, which can ultimately improve their quality of life. This application is a useful tool for anyone who may be interested in participating in cardiovascular-related clinical trials. It provides general information on what kinds of trials are currently available by organization and is a good starting point.

For the purpose of creating this application, the project was limited to visualizing cardiovascular-related clinical trials data. However, given additional time and research, I would explore further by pursuing these avenues:

  • Applying statistical inference and hypothesis testing to different variables, such as age and race demographics, to uncover further insights
  • Expanding the scope of the application to other leading causes of death, such as cancer, and exploring how the insights differ
  • Conducting additional research and validating the accuracy of the source data
  • Collecting global data points to further validate the trends and insights found in the United States


About Author

Marcus Choi

Marcus graduated from Rutgers University with a bachelor's degree in Kinesiology. Upon graduation, he worked in oncology clinical research in data management, which sparked his passion for utilizing data in order to gain valuable insights to ultimately make...