Title: Molecular Diversity Assessment using Chemotypes
Volume: 18
Issue: 1
Author(s): Hugo O. Villar*, Raghav Mandayan and Mark R. Hansen
Affiliation:
- Altoris, Inc. 7770 Regents Rd #113-557 San Diego, CA 92122, United States
Keywords:
Chemical library design, scaffold analysis, chemical diversity, chemical diversity algorithms, molecular fingerprints, chemotype
Abstract: Background: Many techniques to design chemical libraries for screening have been put
forward over time. General use libraries are still important when screening against novel targets,
and their design has relied on the use of molecular descriptors. In contrast, chemotype or scaffold
analysis has been used less often.
Objective: We describe a simple method to assess chemical diversity based on chemotype counts,
which offers an alternative to modeling chemical diversity from computed molecular properties.
We show how chemotype counts can be used to evaluate the diversity of a library and to compare
diversity selection algorithms. We demonstrate an efficient compound selection algorithm based
on chemotype analysis.
Methods: We use automated chemotype perception algorithms and compare them to traditional
techniques for diversity analysis to check their effectiveness in designing diverse libraries for
screening.
Results: The best-performing molecular fingerprints for diversity selection in our analysis are
extended circular fingerprints, but they can be outperformed by a chemotype diversity algorithm,
which can be more intuitive than traditional techniques based on molecular descriptors.
Chemotype-based algorithms retrieve a larger share of the chemotypes contained in a library when
picking a subset of the chemicals in a collection.
Conclusions: Chemotype analysis offers an alternative for the generation of a general-purpose
screening library as it maximizes the number of chemotypes present in a subset with the smallest
number of compounds. Methods based on chemotype analysis that do not resort to molecular
descriptors are a promising but seldom explored area of chemoinformatics.
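
As an illustration of the idea summarized above, the following is a minimal sketch of a chemotype-count diversity measure and a chemotype-first subset picker. It is not the authors' algorithm. The function names (`chemotype_coverage`, `pick_by_chemotype`), the compound identifiers, and the chemotype labels are all hypothetical, and the sketch assumes that each compound has already been assigned a single chemotype label by some perception step that is not shown here.

```python
from collections import OrderedDict


def chemotype_coverage(subset, chemotype_of, library):
    """Fraction of the library's chemotypes that appear in the subset."""
    all_types = {chemotype_of[c] for c in library}
    picked_types = {chemotype_of[c] for c in subset}
    return len(picked_types) / len(all_types)


def pick_by_chemotype(library, chemotype_of, n):
    """Pick up to n compounds, visiting chemotypes in round-robin order.

    Assumption: taking one compound per chemotype before taking a second
    from any chemotype is a reasonable stand-in for "maximize chemotypes
    per compound picked". The original method may differ.
    """
    # Group compounds by chemotype, preserving library order.
    groups = OrderedDict()
    for compound in library:
        groups.setdefault(chemotype_of[compound], []).append(compound)

    picked = []
    depth = 0  # index of the member taken from each group in this pass
    while len(picked) < n:
        added = False
        for members in groups.values():
            if depth < len(members):
                picked.append(members[depth])
                added = True
                if len(picked) == n:
                    break
        if not added:  # library exhausted
            break
        depth += 1
    return picked


# Hypothetical toy library: five compounds spread over three chemotypes.
library = ["c1", "c2", "c3", "c4", "c5"]
chemotype_of = {"c1": "A", "c2": "A", "c3": "B", "c4": "C", "c5": "C"}

chemotype_pick = pick_by_chemotype(library, chemotype_of, 3)
naive_pick = library[:3]  # baseline: first three compounds as listed
```

In this toy case the chemotype-first pick covers all three chemotypes with three compounds, while the naive pick covers only two of the three. The same coverage function could be applied to subsets chosen by a fingerprint-based method to compare selection strategies on equal terms.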