Title: Prediction Model of Thermophilic Protein Based on Stacking Method
Volume: 16
Issue: 10
Author(s): Xian-Fang Wang*, Fan Lu, Zhi-Yong Du and Qi-Meng Li
Affiliation:
- School of Computer Science and Technology, Henan Institute of Technology, Henan, China
Keywords:
Thermophilic proteins, stacking, amino acid composition, g-gap, entropy density, autocorrelation coefficient.
Abstract:
Background: In-depth study of the heat-resistance principle of thermophilic proteins is of great
significance for understanding the folding, structure, function, and evolution of proteins, and for
the directed design and modification of protein molecules in protein processing.
Objective: To address the low accuracy and low efficiency of thermophilic protein prediction,
a thermophilic protein prediction model based on the Stacking method is proposed.
Methods: Following the idea of Stacking, five feature extraction methods (amino acid composition,
g-gap dipeptide, encoding based on grouped weight, entropy density, and autocorrelation coefficient)
are used to characterize the protein sequences of the selected standard data set. An SVM based on
the Gaussian kernel function is then used to design the classification prediction model. The
prediction results of the five methods are taken as the second-layer input, and a logistic regression
model integrates them to build a thermophilic protein prediction model based on the Stacking method.
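
As an illustration of the first two descriptors named in the Methods, the following is a minimal sketch of amino acid composition and g-gap dipeptide composition. It is not the authors' implementation: the function names, the alphabet ordering, and the handling of non-standard residues are assumptions made for this example, and the SVM and logistic regression layers are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is an assumption)


def amino_acid_composition(seq):
    """Return the 20-dimensional vector of residue frequencies in seq."""
    # Assumption: symbols outside the 20 standard residues are dropped.
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def g_gap_dipeptide(seq, g=0):
    """Return the 400-dimensional frequency vector of residue pairs
    separated by g intervening positions (g=0 gives adjacent dipeptides)."""
    # Assumption: non-standard symbols are dropped before pairing,
    # which shifts the positions of the residues that follow them.
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(residues) - g - 1  # number of pairs available at this gap
    if total <= 0:
        return [0.0] * len(pairs)
    for i in range(total):
        counts[residues[i] + residues[i + g + 1]] += 1
    return [counts[p] / total for p in pairs]
```

Each function maps a sequence of any length to a fixed-length vector, which is what allows sequences of different lengths to be passed to a single classifier.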
Results: The accuracy of the proposed method reached 93.75% when verified by the Jackknife
method. Several performance evaluation indexes were higher than those of other models, and the
overall performance was better than that of most of the reported methods.
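
The Jackknife verification mentioned in the Results holds out each sample once and predicts it from all the others. A minimal sketch of that procedure follows; the `nearest_centroid` classifier is a hypothetical stand-in chosen to keep the example self-contained, not the Stacking model described in this paper, and all names are assumptions.

```python
def jackknife_accuracy(X, y, fit_predict):
    """Leave-one-out accuracy: hold out each sample once, train on the
    rest, and count how often the held-out label is predicted correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def nearest_centroid(train_X, train_y, x):
    """Hypothetical stand-in classifier: return the label whose mean
    feature vector is closest to x in squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, l in zip(train_X, train_y) if l == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Any model can be plugged in through `fit_predict`, provided it is retrained from scratch on each reduced training set so that the held-out sample never influences its own prediction.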
Conclusion: The model presented in this paper shows strong robustness and can significantly improve
prediction performance for thermophilic proteins.