How to Judge Predictive Quality of Classification and Regression Based QSAR Models?

Abstract

Quantitative structure-activity relationship (QSAR) modeling is a statistical approach used in drug discovery, environmental fate modeling, and the prediction of properties and activities of new, untested compounds. Validation is one of the key steps in checking the robustness and reliability of QSAR models, and various methodological aspects of QSAR validation have been the subject of strong debate within the academic and regulatory communities. Principle 4 of the Organisation for Economic Co-operation and Development (OECD) principles for QSAR validation refers to the need to establish “appropriate measures of goodness-of-fit, robustness and predictivity” for any QSAR model. Validation strategies are recognized as decisive steps for checking the statistical acceptability of the constructed models and their applicability to a new set of data, and thereby for judging the confidence of predictions. Validation is a holistic practice: in addition to statistical judgment, it comprises evaluation of the quality of the data, the applicability of the model for prediction purposes, and mechanistic interpretation. Validation strategies depend largely on validation metrics. Given the importance of validation approaches and validation parameters in developing successful and acceptable QSAR models, we present an overview of traditional as well as relatively new validation metrics used to judge the quality of regression-based and classification-based QSAR models.

Keywords: Applicability domain, OECD, QSAR, randomization, validation, virtual screening.
