A primary goal of quantitative structure-activity relationships (QSARs) and
quantitative structure-property relationships (QSPRs) is to predict chemical activities
from chemical structure. Chemical structure can be quantified in many ways, resulting in
hundreds, if not thousands, of measurements for every chemical. Chemical activities
measure how a chemical interacts with other chemicals, e.g., toxicity,
biodegradability, boiling point, and vapor pressure. Typically there are more chemical
structure measurements than chemicals being measured: the so-called large-p, small-n
problem. Here we review some of the statistical procedures that have commonly been
used to explore these problems and provide several examples of their use.
Finally, we look ahead to two areas that we believe will see
dramatically increased attention in the near future: model averaging and Bayesian
techniques.
Keywords: AIC, Bayesian analysis, BIC, cross-validation, elastic net, k-means
clustering, LASSO, model averaging, model selection, modeling, partial least
squares, prediction, principal component analysis, principal component
regression, regression, ridge regression.
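
To make the large-p, small-n problem concrete, the following is a minimal sketch on simulated data, not an analysis from this review. The sample sizes, the random descriptors, the sparse coefficient vector, and the penalty value lam are all illustrative assumptions. It shows that when there are more structure measurements (p) than chemicals (n), the cross-product matrix is rank deficient, so ordinary least squares has no unique solution, whereas a ridge penalty yields a uniquely solvable system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 20 chemicals (n), 200 structure descriptors (p).
n, p = 20, 200
X = rng.normal(size=(n, p))      # simulated descriptor matrix
beta_true = np.zeros(p)
beta_true[:5] = 1.0              # assume only 5 descriptors matter
y = X @ beta_true + 0.1 * rng.normal(size=n)

# With p > n, the p x p matrix X'X has rank at most n, so it is
# singular and the least-squares normal equations have no unique solution.
rank = np.linalg.matrix_rank(X.T @ X)
print(f"rank of X'X: {rank} (p = {p})")

# Ridge regression adds lam * I to X'X, which makes the system
# positive definite and therefore uniquely solvable.
lam = 1.0                        # illustrative penalty, not a tuned value
A = X.T @ X + lam * np.eye(p)
beta_ridge = np.linalg.solve(A, X.T @ y)
print("ridge coefficient vector shape:", beta_ridge.shape)
```

In practice the penalty would be chosen by a data-driven criterion such as cross-validation rather than fixed in advance.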