Large-scale prediction of biological activities with Active-IT system
Almeida V.L.1, dos Santos O.D.H.2, Lopes J.C.D.3
1. Chemoinformatics Group, NEQUIM, Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil; Serviço de Fitoquímica e Prospecção Farmacêutica, Fundação Ezequiel Dias (FUNED), Belo Horizonte, Brazil
2. Departamento de Farmácia, Escola de Farmácia, Universidade Federal de Ouro Preto (UFOP), Brazil
3. Chemoinformatics Group, NEQUIM, Departamento de Química, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
Traditional testing methods in pharmaceutical development are time-consuming and costly, and in silico evaluation tools can reduce that burden. Our in-house Active-IT system, a Ligand-Based Virtual Screening (LBVS) tool, was developed to predict the biological and pharmacological activities of small organic molecules. It comprises four independent modules: a molecular descriptor generator (3D-Pharma), a machine learning modeling module (ExCVBA), a database of bioactivity models, and a prediction module. Activity data collected from the PubChem BioAssay database were used to train SVM and Naïve Bayes models. The models were built using a recursive stratified partition method and validated through an activity randomization (Y-random) process. Over 3500 bioassays were modeled, each with 30 SVM models, 30 Naïve Bayes models, and 60 randomized models. Bioassays with low performance, or with poor discrimination between the regular and the randomized models, were discarded. Using the Active-IT system, we evaluated three bioactive compounds of Ayahuasca tea. The predictions were validated against known targets described in several public databases. In this external validation, 16 of 33 known targets (48.5%, p-value < 0.0001) were correctly predicted. For a large-scale virtual screening method, this hit rate is statistically significant and supports the effectiveness of the Active-IT methodology in predicting the potential biological activities of small organic molecules.
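The Y-random check described above can be illustrated with a minimal sketch. It is not the Active-IT or ExCVBA code: the data are synthetic, a toy nearest-centroid classifier stands in for SVM and Naïve Bayes, repeated stratified random splits stand in for the recursive stratified partition, and all function names and the 0.1 margin are hypothetical choices made for illustration only.

```python
import random

def stratified_split(labels, test_fraction, rng):
    """Split indices into train/test, keeping the active/inactive ratio."""
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test

def fit_centroids(X, y):
    """Toy stand-in for SVM / Naive Bayes: nearest class centroid."""
    cents = {}
    for cls in set(y):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        cents[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    def predict(x):
        return min(cents, key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(x, cents[c])))
    return predict

def balanced_accuracy(predict, X, y):
    """Mean of per-class recall, robust to active/inactive imbalance."""
    recalls = []
    for cls in set(y):
        rows = [x for x, lab in zip(X, y) if lab == cls]
        recalls.append(sum(predict(x) == cls for x in rows) / len(rows))
    return sum(recalls) / len(recalls)

def mean_score(X, y, n_models, rng, randomize=False):
    """Average test score over n_models stratified splits."""
    scores = []
    for _ in range(n_models):
        labels = list(y)
        if randomize:
            # Y-randomization: shuffle activities to break the
            # structure-activity relationship.
            rng.shuffle(labels)
        train, test = stratified_split(labels, 0.25, rng)
        model = fit_centroids([X[i] for i in train],
                              [labels[i] for i in train])
        scores.append(balanced_accuracy(model,
                                        [X[i] for i in test],
                                        [labels[i] for i in test]))
    return sum(scores) / len(scores)

# Synthetic "bioassay": 60 actives, 140 inactives, 8 binary descriptors;
# the first four bits are enriched in actives (assumed, for illustration).
rng = random.Random(0)
y = [1] * 60 + [0] * 140
X = [[1.0 if rng.random() < (0.8 if lab == 1 and j < 4 else 0.3) else 0.0
      for j in range(8)] for lab in y]

real = mean_score(X, y, 30, rng)
shuffled = mean_score(X, y, 30, rng, randomize=True)
# Hypothetical acceptance rule: keep the bioassay only if the regular
# models clearly outperform the randomized ones.
keep_bioassay = real - shuffled > 0.1
```

With randomized activities the score is expected to fall toward chance level, so a bioassay whose regular models do not clearly beat the randomized ones carries no usable structure-activity signal and would be discarded under a rule of this kind.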
Keywords: ligand-based virtual screening, bioactivity prediction, machine learning modeling, recursive stratified random partition, pharmacophore fingerprint, 3D molecular structures
Citation:
Almeida V.L., dos Santos O.D.H., Lopes J.C.D. (2024) Large-scale prediction of biological activities with Active-IT system. Biomeditsinskaya Khimiya, 70(6), 435-441.