Title: Prediction and Identification of Krüppel-Like Transcription Factors by Machine Learning Method
Volume: 20
Issue: 7
Author(s): Zhijun Liao, Xinrui Wang, Xingyong Chen and Quan Zou*
Affiliation:
- School of Computer and Technology, Tianjin University, Tianjin, China
Keywords:
Krüppel-like factor, binary-class classification, phylogenetic analysis, motif, a library for support vector
machine, machine learning method.
Abstract: Aim and Objective: The Krüppel-like factors (KLFs) are a family of transcription factors
containing zinc finger (ZF) motifs, with 18 members in the human genome; among them, KLF18 is
predicted by bioinformatics. KLFs have various physiological functions and are involved in a number
of cancers and other diseases. Here we perform a binary classification of KLFs and non-KLFs using
machine learning methods.
Material and Method: The protein sequences of KLFs and non-KLFs were retrieved from UniProt
and randomly separated into a training dataset (containing positive and negative sequences) and a
test dataset (containing only negative sequences). After extracting 188-dimensional (188D) feature
vectors, we carried out classification with four classifiers (GBDT, libSVM, RF, and k-NN). For the
human KLFs, we further examined the evolutionary relationships and motif distribution, and finally
analyzed the conserved amino acid residues of the three zinc fingers.
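The feature-extraction and classification steps can be illustrated with a minimal sketch. It does not reproduce the 188D extraction or any of the four classifiers as configured in the study; as a stand-in assumption, it uses only the 20 amino acid frequencies as features and a plain Euclidean k-NN vote. The function names `composition` and `knn_predict`, the label convention (1 = KLF, 0 = non-KLF), and the toy sequences are hypothetical and are not taken from UniProt.

```python
from collections import Counter
import math

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20 amino acid frequencies of a protein sequence.

    Assumption: this is a simplified stand-in for the 188D feature vector.
    Characters outside the 20 standard residues are ignored.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def knn_predict(train_X, train_y, x, k=3):
    """Predict a label by majority vote among the k nearest training vectors."""
    nearest = sorted(
        (math.dist(tx, x), label) for tx, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy sequences, invented for illustration only.
    train_seqs = [
        "CPECGKSFSRSDHLTRHQRTH",   # toy "positive", Cys/His-rich
        "CKICGKAFTRSDHLKRHERTH",   # toy "positive"
        "MKVLAAGIVGLLLAQWALA",     # toy "negative"
        "MSTLLPPVVDEEAGGSSLLK",    # toy "negative"
    ]
    train_y = [1, 1, 0, 0]
    train_X = [composition(s) for s in train_seqs]

    query = composition("CAECGKRFTRSDELQRHKRTH")
    print("predicted label:", knn_predict(train_X, train_y, query, k=3))
```

Replacing `composition` with a fuller feature extractor leaves the rest of the sketch unchanged, since the classifier only sees fixed-length numeric vectors.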
Results: The classifier models built from the training dataset were well constructed. The highest
specificity (Sp) was 99.83%, obtained with a library for support vector machines (libSVM), and all
correctly classified rates were over 70% for 10-fold cross-validation on the test dataset. The 18
human KLFs can be further divided into 7 groups, and the zinc finger domains are located at the
carboxyl terminus. Many amino acid residues are conserved, including cysteine and histidine, and
the span and interval between them are consistent across the three ZF domains.
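For readers unfamiliar with the metrics, the sketch below shows how specificity and the correctly classified rate are conventionally computed from true and predicted labels, using the standard definition Sp = TN / (TN + FP). The function names and the label convention (0 = non-KLF, 1 = KLF) are assumptions made for illustration, and the example labels are invented rather than taken from the study. Under these definitions, the two values coincide on a dataset that contains only negative sequences.

```python
def specificity(y_true, y_pred, negative=0):
    """Sp = TN / (TN + FP): fraction of true negatives predicted as negative."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == negative and p == negative)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == negative and p != negative)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative samples")
    return tn / (tn + fp)


def correctly_classified_rate(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true:
        raise ValueError("no samples given")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


if __name__ == "__main__":
    # Invented labels: ten negative samples, one of them misclassified.
    y_true = [0] * 10
    y_pred = [0] * 9 + [1]
    print("Sp:", specificity(y_true, y_pred))
    print("correctly classified rate:", correctly_classified_rate(y_true, y_pred))
```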
Conclusion: Two classification models for KLF prediction have been built using novel machine
learning methods.