Drawing the Borders of Olfactory Space
Contributed by Wendy Yu. She is currently in the NYC Data Science Academy 12-week full-time Data Science Bootcamp, running from January 11th to April 1st, 2016. This post is based on her earlier R visualization project.
Chung Wen Yu, Katharine Prokop-Prigge, Lindsay Warrenburg, and Joel Mainland
37th Annual Conference of the Association for Chemoreception Sciences
A common refrain in the olfactory literature is that humans can detect 10,000 different odorants. However, both the source and the quality of this estimate are unclear. Here we set out to estimate this number quantitatively. We developed machine-learning models that distinguish odorous from odorless compounds based on their physicochemical properties, using logistic regression, random forest, support vector machine, and gradient boosting algorithms. In cross-validation, our best-performing model achieved 94% accuracy and an AUC of 0.96. To test the model further, we asked 15 participants to distinguish test molecules from blank jars, using five alternative forced-choice tests for each compound. In this external validation, the model distinguished odorous from odorless molecules with 72% accuracy and an AUC of 0.82.
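The abstract does not describe its features or pipeline. The Python below is a minimal sketch of the evaluation procedure the abstract names: k-fold cross-validation scored by accuracy and AUC. It trains a plain logistic regression, which is one of the four listed algorithms, on synthetic data. Several choices here are hypothetical and are not the authors' setup: the two features, the toy data generator, the 5-fold split, and the learning rate. Random forest, SVM, or gradient boosting would slot into the same loop in place of `train_logistic`.

```python
import math
import random

# Hypothetical toy data (NOT the study's data): two made-up, standardized
# physicochemical features per molecule; label 1 = odorous, 0 = odorless.
def make_toy_data(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.0 if label else -1.0  # move the two class means apart
        features = [rng.gauss(shift, 1.0), rng.gauss(-shift, 1.0)]
        data.append((features, label))
    return data

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def train_logistic(rows, epochs=200, lr=0.1):
    # Batch gradient descent on the mean log-loss.
    n_features = len(rows[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in rows:
            err = predict_proba((w, b), x) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / len(rows) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def auc(scores, labels):
    # AUC = probability that a random positive outscores a random negative
    # (ties count half), i.e. the normalized Mann-Whitney U statistic.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))

def cross_validate(data, k=5, seed=0):
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores, labels = [], []
    for i in range(k):
        # Train on k-1 folds, score the held-out fold.
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        model = train_logistic(train)
        for x, y in folds[i]:
            scores.append(predict_proba(model, x))
            labels.append(y)
    correct = sum((s >= 0.5) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels), auc(scores, labels)

acc, area = cross_validate(make_toy_data(200))
print(f"cross-validated accuracy={acc:.2f}, AUC={area:.2f}")
```

This sketch pools the out-of-fold scores before it computes AUC, which is one common convention. Averaging the per-fold AUCs is another.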
Next, we applied the model to the Chemical Universe Database, a collection of 166 billion molecules that are both chemically stable and synthetically feasible, with up to 17 heavy atoms (carbon, nitrogen, oxygen, sulfur, or halogens). Because existing catalogs of odorous molecules rarely contain compounds with more than 21 heavy atoms, we extrapolated the result to molecules with up to 21 heavy atoms. We estimate that there are approximately 2.7 trillion molecules with 21 or fewer heavy atoms, and we predict that over 27 billion of them will have an odor. Our findings define the borders of olfactory space and enable rational sampling of all volatile compounds. Such a sample can be used to build odor screening panels that will facilitate research in the field of olfaction.
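The headline numbers imply two ratios that are worth making explicit. The short Python check below uses only figures quoted in the abstract. First, the predicted odorous molecules make up at least 1% of the estimated 2.7 trillion. Second, they outnumber the commonly cited 10,000 odorants by a factor of at least 2.7 million. The variable names are illustrative.

```python
# Figures quoted in the abstract (variable names are illustrative).
total_molecules = 2.7e12   # estimated molecules with 21 or fewer heavy atoms
predicted_odorous = 27e9   # lower bound ("over 27 billion") predicted to have an odor
folk_estimate = 10_000     # the commonly cited number of detectable odorants

odorous_fraction = predicted_odorous / total_molecules
fold_increase = predicted_odorous / folk_estimate

print(f"odorous fraction: at least {odorous_fraction:.1%}")
print(f"at least {fold_increase:,.0f} times the 10,000 figure")
```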