Data Science in Drug Discovery Biological Characteristics

Posted on Aug 21, 2022



Drug Development Process Data

Thanks to a better understanding of the biological characteristics of various diseases and to technological advances in drug discovery, identifying biological targets and drug candidates is becoming less challenging. Even so, the drug development process remains highly time-consuming: on average, it takes 12-15 years for a new medication to be approved for use by the FDA.

In the early stage of drug discovery, thousands of chemical compounds are tested against multiple biological targets through automated High-Throughput Screening. Hits, compounds that show activity against a given target, are then studied further. Checking basic benchmarks of drug-likeness, such as Lipinski's Rule of Five, is essential for assessing a hit's potential. The Rule of Five is a rule of thumb that indicates how a drug's properties, in terms of size, lipophilicity, and intermolecular attraction (hydrogen-bond donors and acceptors), affect its absorption, distribution, metabolism, and excretion in the human body.
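As a minimal sketch of how the Rule of Five can be checked, the snippet below applies the four standard thresholds (molecular weight at most 500 Da, LogP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) to descriptors that are assumed to be precomputed. The `Descriptors` container, the function names, and the example values are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient (lipophilicity)
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return the Rule of Five criteria that the compound violates."""
    violations = []
    if d.mol_weight > 500:
        violations.append("mol_weight > 500")
    if d.logp > 5:
        violations.append("logp > 5")
    if d.h_donors > 5:
        violations.append("h_donors > 5")
    if d.h_acceptors > 10:
        violations.append("h_acceptors > 10")
    return violations


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Common convention: tolerate at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


# Illustrative values only, not taken from the PubChem data set.
example = Descriptors(mol_weight=300.0, logp=2.5, h_donors=1, h_acceptors=4)
print(lipinski_violations(example), passes_rule_of_five(example))
```

In practice the descriptors would come from a cheminformatics toolkit; this sketch only encodes the thresholds themselves.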

In this project, I explored the bioactivity libraries of the benzodiazepine family to find compounds with similar biological activity. I also studied the parameters that contribute significantly to compounds sharing similar bioactivity.




All bioactivity profiles for compounds related to Diazepam and Alprazolam were downloaded from the PubChem library using the Selenium package in Python. Out of 2800 compounds, only 389 had biological test results, and only 68 had been studied in more than 50 biological tests.
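The filtering step described above can be sketched as follows. This assumes the downloaded profiles have been flattened into `(compound ID, assay ID, outcome)` records; that layout, the function names, and the toy records are assumptions for illustration, not the project's actual data structures.

```python
from collections import defaultdict


def assays_per_compound(records):
    """Map each compound ID to the set of assay IDs it was tested in."""
    tested = defaultdict(set)
    for cid, aid, _outcome in records:
        tested[cid].add(aid)
    return dict(tested)


def well_tested(records, min_assays=50):
    """Keep compounds tested in more than `min_assays` distinct assays."""
    return {
        cid: aids
        for cid, aids in assays_per_compound(records).items()
        if len(aids) > min_assays
    }


# Made-up toy records: (compound ID, assay ID, outcome)
records = [
    ("C1", "A1", "Active"), ("C1", "A2", "Inactive"), ("C1", "A3", "Inactive"),
    ("C2", "A1", "Inactive"),
    ("C3", "A2", "Active"), ("C3", "A3", "Inactive"),
]
# A threshold of 1 is used here only because the toy data is tiny.
print(sorted(well_tested(records, min_assays=1)))
```

With the real data, the default threshold of 50 would correspond to the "more than 50 biological tests" cut described in the text.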


Using the pandas and RDKit packages, the data was then analyzed using two different approaches:


  • Compound Based Approach: Selecting compounds that were tested on similar bioassays only
  • Target Based Approach: Selecting a bioassay with the maximum number of hits



Compound Based Approach

Only 49 compounds were found to be tested on the maximum number of shared bioassays (114 shared bioassays). Fifteen compounds showed activity on 13 bioassays, and only two of them were active on the same bioassay.
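The post does not describe how the group of compounds was selected, so the following is only a sketch of the core idea: intersect the sets of assays each compound was tested in, then list which compounds were active in each shared assay. The record layout, function names, and toy data are assumptions for illustration.

```python
from collections import defaultdict


def shared_assays(tested, compounds):
    """Assays in which every compound in `compounds` was tested."""
    sets = [tested[c] for c in compounds]
    return set.intersection(*sets) if sets else set()


def actives_by_assay(records, compounds, assays):
    """Map each shared assay to the compounds reported active in it."""
    result = {aid: set() for aid in assays}
    for cid, aid, outcome in records:
        if cid in compounds and aid in assays and outcome == "Active":
            result[aid].add(cid)
    return result


# Made-up toy records: (compound ID, assay ID, outcome)
records = [
    ("C1", "A1", "Active"), ("C1", "A2", "Inactive"), ("C1", "A3", "Inactive"),
    ("C2", "A1", "Active"), ("C2", "A2", "Active"),
    ("C3", "A1", "Inactive"), ("C3", "A2", "Inactive"), ("C3", "A4", "Active"),
]

# Build the compound -> tested assays mapping from the records.
tested = defaultdict(set)
for cid, aid, _outcome in records:
    tested[cid].add(aid)

group = {"C1", "C2", "C3"}
common = shared_assays(tested, group)
print(sorted(common))
print(actives_by_assay(records, group, common))
```

Assays whose active set contains more than one compound are the ones where compounds share bioactivity.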






Target Based Approach

The bioassay “qHTS for Inhibitors of human tyrosyl-DNA phosphodiesterase 1 (TDP1): qHTS in cells in absence of CPT” (AID: 686978) showed the highest number of hits: 12 out of 50 compounds showed activity in this bioassay.
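Finding the bioassay with the most hits amounts to counting active compounds per assay and taking the maximum. A minimal sketch, again assuming `(compound ID, assay ID, outcome)` records with an `"Active"` outcome label (both assumptions, as are the function names and toy data):

```python
from collections import Counter


def hits_per_assay(records):
    """Count distinct active compounds for each assay."""
    # A set removes duplicate (compound, assay) pairs before counting.
    active = {(cid, aid) for cid, aid, outcome in records if outcome == "Active"}
    return Counter(aid for _cid, aid in active)


def top_assay(records):
    """Return (assay ID, hit count) for the assay with the most hits, or None."""
    counts = hits_per_assay(records)
    if not counts:
        return None
    return counts.most_common(1)[0]


# Made-up toy records: (compound ID, assay ID, outcome)
records = [
    ("C1", "A1", "Active"), ("C1", "A2", "Inactive"),
    ("C2", "A1", "Active"), ("C2", "A2", "Active"),
    ("C3", "A1", "Inactive"), ("C3", "A4", "Active"),
]
print(top_assay(records))
```

If several assays tie for the maximum, this sketch returns only one of them; a real analysis might want to list all tied assays.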

To study the drug-likeness of all the compounds that were tested in the AID 686978 bioassay, the Lipinski Rule of Five parameters were assessed. Molecular weight and the number of hydrogen-bond donors did not play a significant role in determining activity.



Although the data did not show similarities in biological activity, the results showed similar biological behavior, which makes benzodiazepine a high-quality core structure.

LogP and the number of hydrogen-bond acceptors played a significant role in determining hits against the target.
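The post does not say how significance was assessed. One generic way to check whether a descriptor such as LogP differs between hits and non-hits is a permutation test on the difference of group means, sketched below. This is a swapped-in illustration, not the author's method; the function name and the numbers are made up.

```python
import random
from statistics import fmean


def permutation_p_value(hits, non_hits, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(fmean(hits) - fmean(non_hits))
    pooled = list(hits) + list(non_hits)
    n_hits = len(hits)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_hits]) - fmean(pooled[n_hits:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up LogP values for illustration only.
logp_hits = [4.1, 4.3, 4.5, 4.2, 4.4]
logp_non_hits = [2.0, 2.2, 1.9, 2.1, 2.3]
print(permutation_p_value(logp_hits, logp_non_hits))
```

A small p-value suggests the descriptor is distributed differently between the two groups; with only a dozen hits, as in this project, any such result should be read cautiously.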


About Author

Layal Hammad

Having a 6-year experience at a medical distribution company, Dimensions Healthcare Company, gave me the opportunity to be exposed to all value chain elements in the medical industry and opened my eyes to various obstacles and challenges that...