Data Visualizing Operations in Breast and Prostate Cancer

Posted on Feb 5, 2018
The skills the author demoed here can be learned through taking Data Science with Machine Learning bootcamp with NYC Data Science Academy.


Clinical trials are experiments or observational studies carried out in clinical research. Participants take part in medical, observational, or behavioral interventions. Trials test new treatments such as novel vaccines, drugs, dietary supplements, and medical devices, with the goal of finding better ways to treat disease and understand how it progresses.

The primary purpose of a clinical trial is to learn more about the risks and effectiveness of an experimental treatment in humans. Different types of people participate in clinical trials: some are healthy, and some have illnesses. Both groups play important roles.

People might ask: "Are clinical trials safe?" As with any type of medical care or therapy, clinical trials can carry risks. If you want to join a clinical trial, the staff will describe the risks it may pose to participants. You can then weigh the risk factors and decide whether or not you would like to participate.

Business Motivations

Developing a new drug, from the original idea to the launch of a finished product, often takes 12–15 years. Drug discovery begins in the laboratory.

When researchers create a new therapy or drug, it is first tested in the laboratory and on animals. If the initial lab research is successful, the researchers send the data to the Food and Drug Administration (FDA) for approval to continue testing in humans. Depending on the product type and development stage, investigators enroll volunteers in small studies first and then conduct progressively larger ones. Because breast cancer ranks highest among cancers in women and prostate cancer ranks highest among cancers in men, I chose to analyze clinical trials for these two cancers.


I got the data from The dataset contains:

  • The number of recruitments for breast cancer and prostate cancer
  • Locations, latitude, longitude
  • The total number of trials for breast cancer and prostate cancer
  • Phases
  • Durations of the clinical trials
  • Sponsor.Collaborators
  • Study types
  • Rate of recruitment

Shiny App Data Analysis

The map below shows the concentration of clinical trials around the world. As the legend indicates, the darker the color on the map, the more trials are recruiting or already completed. The United States, Europe, and China have the most breast and prostate cancer trials compared to other countries.


The two maps below show the breast and prostate cancer trials that are currently recruiting around the world. When you zoom in, you can see the trials in specific cities and locations.



I created histograms that show the number of trials by recruitment status (active but not recruiting, recruiting, completed, etc.), phase, and study type, as well as the top 10 sponsors and collaborators. The histograms make the differences in counts between the two cancers easy to see.


From the two histograms below, we can see that the three biggest cancer centers sponsor the most trials in both breast and prostate cancer. Among pharmaceutical companies, Roche ranks 4th in breast cancer and AstraZeneca ranks 6th in prostate cancer.
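The counts behind histograms like these can be sketched in a few lines of Python. This is only an illustration, not the author's Shiny code: the records, the field names (`condition`, `sponsor`, `phase`), and the helper functions `count_by` and `top_sponsors` are all hypothetical.

```python
from collections import Counter

# Hypothetical trial records; field names and values are illustrative only
# and are not taken from the author's dataset.
trials = [
    {"condition": "breast", "sponsor": "Sponsor A", "phase": "Phase 2"},
    {"condition": "breast", "sponsor": "Sponsor B", "phase": "Phase 3"},
    {"condition": "breast", "sponsor": "Sponsor A", "phase": None},
    {"condition": "prostate", "sponsor": "Sponsor C", "phase": "Phase 2"},
    {"condition": "prostate", "sponsor": "Sponsor C", "phase": "Phase 4"},
    {"condition": "prostate", "sponsor": "Sponsor A", "phase": "Phase 2"},
]


def count_by(records, condition, field):
    """Count trials for one cancer type, grouped by the given field.

    Missing values are counted under the label "NA" so they stay visible.
    """
    values = (
        r[field] if r[field] is not None else "NA"
        for r in records
        if r["condition"] == condition
    )
    return Counter(values)


def top_sponsors(records, condition, n=10):
    """Return the n sponsors with the most trials for one cancer type."""
    return count_by(records, condition, "sponsor").most_common(n)


breast_phases = count_by(trials, "breast", "phase")
prostate_top = top_sponsors(trials, "prostate", n=10)
```

Keeping missing values under an explicit "NA" label, rather than dropping them, makes unreported fields show up as their own bar.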

There are four phases in clinical trials:

The first phase assesses the safety of a drug or device.

The second phase tests the efficacy of a drug or device.

The third phase involves randomized and blinded testing in hundreds to thousands of patients.

The fourth phase, often called Post Marketing Surveillance, covers trials conducted after a drug or device has been approved for consumer sale.

The two histograms above contain a large number of NA values because some companies or research centers didn't submit the integrated report to the

Data on Rate of Recruitment

The boxplot above shows how fast the different sponsors and collaborators recruit people for the two cancers. ROR stands for rate of recruitment.

ROR = total number of volunteers / total number of months
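As a worked example of this formula, here is a minimal Python sketch. The function name `rate_of_recruitment` and the enrollment figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def rate_of_recruitment(total_volunteers, total_months):
    """ROR = total number of volunteers / total number of months."""
    if total_months <= 0:
        raise ValueError("total_months must be positive")
    return total_volunteers / total_months


# Hypothetical trial: 240 volunteers enrolled over 16 months.
ror = rate_of_recruitment(240, 16)
print(ror)  # 15.0 volunteers per month
```

A trial with no recorded duration has no defined ROR, so the sketch raises an error rather than dividing by zero.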

From the boxplot above, we can see that the majority of pharmaceutical companies recruit faster than the academic institutes. In breast cancer, Bayer ranks 1st in ROR. In prostate cancer, Ferring ranks 1st.

The boxplot above shows how fast people are recruited in different phases. Breast cancer recruitment is fastest in Phase 2&3, and prostate cancer recruitment is fastest in Phase 4.
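A boxplot compares groups by the distribution of their values, with the median as the center line. One way to rank groups such as phases or sponsors is by median ROR, sketched below in Python. The post does not say how "fastest" was determined, so ranking by median is an assumption, and the data and the helper `median_ror_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (group, ROR) pairs; the values are illustrative only.
ror_by_trial = [
    ("Phase 2", 4.0), ("Phase 2", 6.0), ("Phase 2", 11.0),
    ("Phase 3", 9.0), ("Phase 3", 15.0),
    ("Phase 4", 20.0),
]


def median_ror_by_group(pairs):
    """Collect ROR values per group and return each group's median."""
    groups = defaultdict(list)
    for group, ror in pairs:
        groups[group].append(ror)
    return {group: median(values) for group, values in groups.items()}


medians = median_ror_by_group(ror_by_trial)

# Assumption: the "fastest" group is the one with the highest median ROR.
fastest = max(medians, key=medians.get)
```

The same function works for sponsors if the first element of each pair is a sponsor name instead of a phase.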


Clinical trials have become one of the hottest topics in the healthcare industry. Pharmaceutical, biotech, and medical device companies, and even some governmental organizations, spend millions on clinical trials every year. Big data can be used to analyze recruitment and enroll people in different types of experiments, helping the industry reduce costs and improve recruitment efficiency. My Shiny app is available here. Source code is available on GitHub.


About Author

Xiao Jia

Xiao received a MS degree in Biomedical Informatics from Nova Southeastern University in Florida. She was working as a data analyst at a healthcare IT company in Fort Lauderdale, where she developed her passion and got to know...