Tetron Publications, focused on publishing scholarly works, is driven by the mission to ‘Empower Knowledge, Foster Growth, and Shape the Future’.


Article Title: Improving Compound Selection in Drug Discovery: A Quantitative Approach for Biased Data Modelling
Volume Number: 1
Issue: 1
Year: 2025
Article Type: Original Article
Author Names: Shema Shirley Mamachen
Page Number: 26-41
DOI: https://doi.org/10.64368/ejcmr.vol.1.issue1.5
Affiliations: PhD Scholar, Department of Business in Healthcare, University of the Cumberlands, Williamsburg, KY, USA. Email: shemashirley@gmail.com
Keywords: Drug Discovery; Antimalarial Compounds; Machine Learning; Bias Correction; Random Forest; Ridge Regression; Tanimoto Similarity; Multi-Parameter Optimisation (MPO); Computational Modelling
Abstract: Accurate prediction of antimalarial activity is critical for the discovery of new compounds to combat malaria. In this study, we introduce a quantitative modelling framework that corrects for bias in screening data to predict the potency and toxicity of candidate compounds. Our approach uses established cheminformatics techniques: Morgan fingerprints, Random Forest models, and Ridge Regression integrated with a Bayesian bias correction mechanism based on Tanimoto similarity. It provides robust predictions for structurally novel molecules. The framework is then applied in a multi-parameter optimisation (MPO) setting to select promising candidates from commercially available compound libraries.
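To make the role of Tanimoto similarity concrete, the following is a minimal sketch, not the article's implementation. It assumes fingerprints are represented as sets of on-bit indices (as a Morgan fingerprint could be), and it uses a simple similarity-weighted blend between a model prediction and a prior mean as a stand-in for the Bayesian bias correction, whose exact form is not given in the abstract. The names `tanimoto`, `max_similarity`, and `shrink_prediction` and all numeric values are hypothetical.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between a query and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def shrink_prediction(model_pred: float, prior_mean: float, similarity: float) -> float:
    """Hypothetical correction: trust the model in proportion to similarity.

    At similarity 1 the model prediction is returned unchanged; at similarity 0
    the prior mean is returned. This is an assumed illustration, not the
    article's Bayesian mechanism.
    """
    return similarity * model_pred + (1.0 - similarity) * prior_mean


# Illustrative (made-up) fingerprints and values.
training_fps = [{1, 2, 3, 4}, {2, 3, 5}]
query_fp = {1, 2, 3, 6}

nearest = max_similarity(query_fp, training_fps)
corrected = shrink_prediction(model_pred=7.0, prior_mean=5.0, similarity=nearest)
print(f"nearest-neighbour similarity: {nearest:.2f}, corrected prediction: {corrected:.2f}")
```

The design choice in this sketch is that a structurally novel query (low nearest-neighbour similarity) is pulled toward the prior, while a query close to the training set keeps most of the model's prediction.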
