Data Science in Drug Discovery Biological Characteristics
Introduction
Drug Development Process Data
Thanks to a better understanding of the biological characteristics of various diseases and to technological advances in drug discovery, identifying biological targets and drug candidates is becoming less challenging. The drug development process nevertheless remains highly time-consuming: on average, it takes 12-15 years for a new medication to be approved for use by the FDA.
In the early stage of drug discovery, thousands of chemical compounds are tested against multiple biological targets through automated high-throughput screening (HTS). Hits, compounds that show activity against a given target, are then studied further. Checking basic benchmarks of drug-likeness, such as Lipinski's Rule of 5, is essential for assessing a hit's potential. The Rule of 5 is a rule of thumb that indicates how a drug's properties, in terms of size, lipophilicity, and intermolecular attraction, affect its absorption, distribution, metabolism, and excretion in the human body.
In this project, I explored the bioactivity libraries of the benzodiazepine family to find compounds with similar biological activity. I also studied the parameters that contribute significantly to compounds sharing similar bioactivity.
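To make the Rule of 5 concrete, here is a minimal sketch of the check in plain Python. It uses the commonly cited thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen bond donors, at most 10 hydrogen bond acceptors) and the usual convention that one violation is tolerated. The `Descriptors` container, the function names, and the example values are assumptions made for illustration; in practice the descriptors would be computed with a cheminformatics toolkit such as RDKit.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Hypothetical container for the four Rule of 5 descriptors."""
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient (lipophilicity)
    h_bond_donors: int      # number of hydrogen bond donors
    h_bond_acceptors: int   # number of hydrogen bond acceptors


def lipinski_violations(d):
    """Return a list describing each Rule of 5 criterion the compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("molecular weight > 500 Da")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def passes_rule_of_five(d, max_violations=1):
    """A compound is usually considered drug-like with at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


# Approximate, illustrative values for a diazepam-like small molecule.
diazepam_like = Descriptors(mol_weight=284.7, logp=2.8,
                            h_bond_donors=0, h_bond_acceptors=2)

# A made-up large, lipophilic compound that breaks two criteria.
bulky_lipophilic = Descriptors(mol_weight=720.0, logp=6.3,
                               h_bond_donors=2, h_bond_acceptors=8)

print(passes_rule_of_five(diazepam_like))
print(lipinski_violations(bulky_lipophilic))
```

Returning the list of violated criteria, rather than a bare boolean, makes it easier to see which property is responsible when a hit fails the check.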
Methodology
Data
All bioactivity profiles for compounds related to diazepam and alprazolam were downloaded from the PubChem library using the Selenium package in Python. Out of 2,800 compounds, only 389 had biological test results, and only 68 had been studied in more than 50 biological tests.
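The filtering step described above (keeping only compounds studied in more than a given number of tests) can be sketched as follows. This is a dependency-free illustration, not the author's code: the `(compound_id, assay_id)` record layout, the function name, and the toy data are all assumptions.

```python
from collections import Counter


def compounds_tested_more_than(records, threshold):
    """Return the compounds that appear in more than `threshold` distinct assays.

    `records` is an iterable of (compound_id, assay_id) pairs, one per test result.
    """
    distinct_pairs = set(records)  # ignore repeated results for the same pair
    tests_per_compound = Counter(compound for compound, _ in distinct_pairs)
    return {c for c, n in tests_per_compound.items() if n > threshold}


# Hypothetical toy records: cmpd_A has 3 distinct assays, cmpd_B has 1, cmpd_C has 2.
toy_records = [
    ("cmpd_A", "assay_1"), ("cmpd_A", "assay_2"), ("cmpd_A", "assay_3"),
    ("cmpd_A", "assay_1"),  # duplicate result, counted once
    ("cmpd_B", "assay_1"),
    ("cmpd_C", "assay_1"), ("cmpd_C", "assay_2"),
]

# With the real data the threshold would be 50; 1 is used here for the toy set.
well_studied = compounds_tested_more_than(toy_records, threshold=1)
print(sorted(well_studied))
```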
Using the Pandas and RDKit packages, the data was then analyzed using two different approaches:
- Compound Based Approach: Selecting compounds that were tested on similar bioassays only
- Target Based Approach: Selecting a bioassay with the maximum number of hits
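The two approaches can be sketched on a toy data set as shown below. The post does not spell out the exact selection logic, so this is only one plausible reading: the compound-based view intersects the sets of assays each compound was tested in, and the target-based view counts active compounds per assay and takes the maximum. Plain Python is used to keep the sketch free of dependencies; the data layout, function names, and identifiers are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: compound -> {assay ID -> True if the compound was active}.
results = {
    "cmpd_A": {"assay_1": True,  "assay_2": False, "assay_3": False},
    "cmpd_B": {"assay_1": True,  "assay_2": False, "assay_3": True},
    "cmpd_C": {"assay_1": False, "assay_2": True},
}


def shared_assays(results, compounds):
    """Compound-based view: assays in which every listed compound was tested."""
    assay_sets = [set(results[c]) for c in compounds]
    return set.intersection(*assay_sets) if assay_sets else set()


def hits_per_assay(results):
    """Count how many compounds were active (hits) in each assay."""
    counts = Counter()
    for outcomes in results.values():
        for assay, active in outcomes.items():
            if active:
                counts[assay] += 1
    return counts


def top_assay(results):
    """Target-based view: the (assay, hit count) pair with the most hits, or None."""
    counts = hits_per_assay(results)
    return counts.most_common(1)[0] if counts else None


print(sorted(shared_assays(results, ["cmpd_A", "cmpd_B", "cmpd_C"])))
print(top_assay(results))
```

With a Pandas DataFrame of compounds by assays, the same two questions would reduce to dropping columns with missing values and summing the activity flags per column.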
Results
Compound Based Approach
Only 49 compounds were found to have been tested on the maximum number of shared bioassays (114 shared bioassays). Fifteen compounds showed activity across 13 bioassays, and only two of them were active in the same bioassay.
Target Based Approach
The bioassay "qHTS for Inhibitors of human tyrosyl-DNA phosphodiesterase 1 (TDP1): qHTS in cells in absence of CPT" (AID: 686978) showed the highest number of hits: 12 out of 50 compounds were active in this bioassay.
Conclusion
Although the data did not show similarities in biological activity, the results showed similar biological behavior, which makes benzodiazepine a high-quality core structure.
LogP and the number of hydrogen bond acceptors in the compounds played a significant role in determining hits against the target.