Title: Boosting Granular Support Vector Machines for the Accurate Prediction of Protein-Nucleotide Binding Sites
Volume: 22
Issue: 7
Author(s): Yi-Heng Zhu, Jun Hu, Yong Qi, Xiao-Ning Song and Dong-Jun Yu*
Affiliation:
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Keywords:
Imbalance learning, granular computing, support vector machine, classifier ensemble, protein-nucleotide binding sites.
Abstract:
Aim and Objective: The accurate identification of protein-ligand binding sites helps
elucidate protein function and facilitate the design of new drugs. Machine-learning-based methods
have been widely used for the prediction of protein-ligand binding sites. Nevertheless, the severe
class imbalance phenomenon, where the number of nonbinding (majority) residues is far greater
than that of binding (minority) residues, has a negative impact on the performance of such
machine-learning-based predictors.
Materials and Methods: In this study, we aim to relieve the negative impact of class imbalance by
Boosting Multiple Granular Support Vector Machines (BGSVM). In BGSVM, each base SVM is
trained on a granular training subset consisting of all minority samples and some reasonably
selected majority samples. The efficacy of BGSVM for dealing with class imbalance was validated
by benchmarking it against several typical imbalance learning algorithms. We further implemented a
protein-nucleotide binding site predictor, called BGSVM-NUC, with the BGSVM algorithm.
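
The construction of granular training subsets described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how majority samples are "reasonably selected", so the sketch assumes plain random sampling without replacement and balanced subsets, and the function name `granular_subsets` and the toy data are hypothetical. Each subset would then be used to train one base SVM (for example with scikit-learn's `SVC`); the boosting step that combines the base classifiers is not shown.

```python
import random

def granular_subsets(X, y, n_subsets, seed=0):
    """Build subsets holding all minority samples plus a draw of majority samples.

    Assumption: random sampling stands in for the selection strategy,
    which the abstract does not specify.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]  # binding residues
    majority = [i for i, label in enumerate(y) if label == 0]  # non-binding residues
    k = min(len(minority), len(majority))  # balanced subsets (assumption)
    subsets = []
    for _ in range(n_subsets):
        chosen = minority + rng.sample(majority, k)
        subsets.append(([X[i] for i in chosen], [y[i] for i in chosen]))
    return subsets

# Toy data: 3 binding residues (label 1) and 12 non-binding residues (label 0).
X = [[float(i)] for i in range(15)]
y = [1] * 3 + [0] * 12
subsets = granular_subsets(X, y, n_subsets=4)
```

In this toy setting every subset contains all 3 minority samples and 3 sampled majority samples, so each base learner would see balanced classes even though the full data set has a 1:4 ratio.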
Results: Rigorous cross-validation and independent validation tests for five types of protein-nucleotide
interactions demonstrated that the proposed BGSVM-NUC achieves promising
prediction performance and outperforms several popular sequence-based protein-nucleotide
binding site predictors. The BGSVM-NUC web server is freely available at
http://csbio.njust.edu.cn/bioinf/BGSVM-NUC/ for academic use.