Title: Predicting LncRNA Subcellular Localization Using Unbalanced Pseudo-k Nucleotide Compositions
Volume: 15
Issue: 6
Author(s): Xiao-Fei Yang, Yuan-Ke Zhou, Lin Zhang, Yang Gao*, Pu-Feng Du*
Affiliation:
- School of Medicine, Nankai University, Tianjin 300071, China
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
Keywords:
Long non-coding RNA, subcellular localization, sequence order correlated factors, feature selection, analysis of
variance, support vector machine.
Abstract:
Background: Long non-coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides
that function in the regulation of gene expression. Growing evidence shows
that the biological functions of lncRNAs are closely related to their subcellular localizations.
Therefore, it is important to determine the subcellular localization of lncRNAs.
Methods: In this paper, we propose a novel method to predict the subcellular localization of
lncRNAs. To utilize lncRNA sequence information more comprehensively, we used both the k-mer
nucleotide composition and the sequence order correlated factors of each lncRNA to represent
lncRNA sequences. A feature selection technique based on the Analysis of
Variance (ANOVA) was then applied to obtain the optimal feature subset. Finally, we used a support
vector machine (SVM) to perform the prediction.
Results: The proposed method reaches an AUC value of 0.9695, which indicates that the proposed
predictor is an efficient and reliable tool for determining lncRNA subcellular localization. Furthermore,
the predictor reaches a maximum overall accuracy of 90.37% in leave-one-out
cross-validation, which clearly outperforms the existing state-of-the-art method.
Conclusion: The proposed predictor is shown to be feasible and powerful for the prediction
of lncRNA subcellular localization. To facilitate subsequent genetic sequence research, we have shared the
source code at https://github.com/NicoleYXF/lncRNA.
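
As a rough illustration of the first two steps named in the Methods (k-mer nucleotide composition and ANOVA-based feature ranking), the following is a minimal sketch and not the authors' implementation. The function names (`kmer_composition`, `anova_f_score`, `rank_features`), the RNA alphabet "ACGU", and the choice to skip windows containing other characters are all assumptions made for this example. The sequence order correlated factors are not shown, since they depend on details not given in the abstract.

```python
from itertools import product


def kmer_composition(seq, k=2, alphabet="ACGU"):
    """Normalized k-mer frequencies, ordered by the alphabet (hypothetical helper)."""
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA-style T as U
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    if g < 2 or n <= g:
        return 0.0
    grand_mean = sum(values) / n
    means = {y: sum(vs) / len(vs) for y, vs in groups.items()}
    ss_between = sum(len(vs) * (means[y] - grand_mean) ** 2 for y, vs in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, vs in groups.items() for v in vs)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (g - 1)) / (ss_within / (n - g))


def rank_features(X, y):
    """Feature indices sorted by descending F score (ties keep original order)."""
    scores = [anova_f_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])
```

In a full pipeline, the top-ranked features would then be passed to an SVM classifier (for example, scikit-learn's `SVC`) and evaluated with leave-one-out cross-validation; those steps are omitted here.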