Molecular Diversity Assessment using Chemotypes

Hugo O. Villar; Raghav Mandayan; Mark R. Hansen

doi:10.2174/1573409917666210203092432

Abstract

Background: Many techniques to design chemical libraries for screening have been put forward over time. General use libraries are still important when screening against novel targets, and their design has relied on the use of molecular descriptors. In contrast, chemotype or scaffold analysis has been used less often.

Objective: We describe a simple method to assess chemical diversity based on chemotype counts, which offers an alternative to modeling chemical diversity from computed molecular properties. We show how chemotype counts can be used to evaluate the diversity of a library and to compare diversity selection algorithms. We demonstrate an efficient compound selection algorithm based on chemotype analysis.

Methods: We use automated chemotype perception algorithms and compare them to traditional techniques for diversity analysis to check their effectiveness in designing diverse libraries for screening.

Results: In our analysis, extended circular fingerprints are the best type of molecular fingerprint for diversity selection, but they can be outperformed by a chemotype diversity algorithm, which can also be more intuitive than traditional techniques based on molecular descriptors. When picking a subset of the chemicals in a collection, chemotype-based algorithms retrieve a larger share of the chemotypes contained in the library.
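The "share of chemotypes retrieved" by a subset can be illustrated with a short sketch. It assumes each compound has already been assigned a set of chemotype labels by some upstream perception step. The function name, compound identifiers, and labels below are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Set


def chemotype_coverage(library: Dict[str, Set[str]], subset: Iterable[str]) -> float:
    """Return the fraction of the library's distinct chemotypes found in the subset.

    `library` maps a compound ID to the set of chemotype labels assigned to it
    (assumed to come from an upstream chemotype perception step).
    """
    all_chemotypes: Set[str] = set().union(*library.values())
    if not all_chemotypes:
        return 0.0
    found: Set[str] = set()
    for compound_id in subset:
        found |= library[compound_id]
    return len(found) / len(all_chemotypes)


# Toy library with hypothetical chemotype labels.
library = {
    "cpd1": {"indole"},
    "cpd2": {"indole", "piperazine"},
    "cpd3": {"quinoline"},
    "cpd4": {"piperazine"},
}

# Two picks of equal size can differ in how many chemotypes they recover.
print(chemotype_coverage(library, ["cpd1", "cpd4"]))  # covers 2 of 3 chemotypes
print(chemotype_coverage(library, ["cpd2", "cpd3"]))  # covers all 3 chemotypes
```

Computing this fraction for subsets of the same size chosen by different selection methods gives one simple way to compare them on equal terms.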

Conclusions: Chemotype analysis offers an alternative for generating a general-purpose screening library, as it maximizes the number of chemotypes present in a subset with the smallest number of compounds. Applications of chemotype-based methods that do not resort to molecular descriptors are a very promising but seldom explored area of chemoinformatics.
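One way to picture "the most chemotypes with the fewest compounds" is a greedy set-cover heuristic. The abstract does not specify the selection algorithm, so the sketch below is an assumption and not the authors' method. The function name and the toy chemotype labels are hypothetical.

```python
from typing import Dict, List, Set


def greedy_chemotype_pick(library: Dict[str, Set[str]], max_compounds: int) -> List[str]:
    """Greedily pick compounds that each add the most not-yet-covered chemotypes.

    `library` maps a compound ID to its set of chemotype labels (assumed input).
    Stops when the budget is reached or no compound adds a new chemotype.
    """
    covered: Set[str] = set()
    picked: List[str] = []
    remaining = dict(library)
    while remaining and len(picked) < max_compounds:
        # Iterate IDs in sorted order so ties are broken deterministically.
        best = max(sorted(remaining), key=lambda cid: len(remaining[cid] - covered))
        if not remaining[best] - covered:
            break  # every chemotype in the library is already covered
        picked.append(best)
        covered |= remaining.pop(best)
    return picked


# Toy library with hypothetical chemotype labels.
library = {
    "cpd1": {"indole"},
    "cpd2": {"indole", "piperazine"},
    "cpd3": {"quinoline"},
    "cpd4": {"piperazine"},
}

print(greedy_chemotype_pick(library, max_compounds=10))
```

In this toy case two compounds are enough to cover all three chemotypes, so the loop stops before the budget is used. A greedy pick is not guaranteed to be the smallest possible covering subset; it is only a common, inexpensive approximation.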

Keywords: Chemical library design, scaffold analysis, chemical diversity, chemical diversity algorithms, molecular fingerprints, chemotype



Journal: Current Computer-Aided Drug Design
Publisher: Bentham Science Publishers
Print ISSN: 1573-4099
Online ISSN: 1875-6697
DOI: https://dx.doi.org/10.2174/1573409917666210203092432
