Title: iRSpotH-TNCPseAAC: Identifying Recombination Spots in Human by Using Pseudo Trinucleotide Composition With an Ensemble of Support Vector Machine Classifiers
Volume: 14
Issue: 9
Author(s): Zhao-Chun Xu, Wang-Ren Qiu*, Xuan Xiao*
Affiliation:
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
- Gordon Life Science Institute, Boston, MA 02478, USA
Keywords:
Pseudo amino acid composition, support vector machine, web-server, iRSpotH-TNCPseAAC, meiosis, coldspots,
hotspots.
Abstract: Background: Meiotic recombination is crucial for the formation of human gametes. Because it is a
defining event in the formation of human sperm and eggs, it also plays an important role in generating
genetic diversity. However, recombination does not occur randomly across the genome: it occurs with
higher probability in some genomic regions, the so-called "hotspots", and with lower probability in
others, the so-called "coldspots". Research has shown that recombination can provide new combinations
of genetic variation. Information about coldspots and hotspots can therefore provide useful insights for
in-depth study of genome evolution and of the mechanism of recombination. Currently, recombination
regions are determined experimentally, but this is tedious work that generally requires expensive
instruments and takes a long time. To address these problems, this study develops a computational
predictive model.
Method: In this paper, a new predictor called 'iRSpotH-TNCPseAAC' was developed to identify human
recombination coldspots and hotspots. In the new predictor, a discrete feature vector called 'pseudo
trinucleotide composition' (PseTNC) is proposed to formulate a given DNA segment while preserving as
much of its sequence-order information as possible.
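The PseTNC idea can be sketched as follows, assuming a Chou-style pseudo k-nucleotide composition with k = 3: the vector holds the 64 normalized trinucleotide frequencies followed by λ sequence-order correlation factors, where θ_j averages the squared difference of a physicochemical property between trinucleotides j positions apart. The physicochemical properties, λ, and weight used in the paper are not stated here, so `gc_fraction` (GC content of each trinucleotide) is a hypothetical stand-in property, and `pse_tnc` with defaults `lam=2` and `w=0.5` is an illustrative helper, not the published implementation.

```python
from itertools import product

# All 64 trinucleotides in lexicographic order: AAA, AAC, ..., TTT.
TRINUCS = ["".join(p) for p in product("ACGT", repeat=3)]


def gc_fraction(tri: str) -> float:
    """Hypothetical stand-in property: fraction of G/C bases in a trinucleotide."""
    return sum(base in "GC" for base in tri) / 3.0


def pse_tnc(seq: str, lam: int = 2, w: float = 0.5, prop=gc_fraction) -> list:
    """Return a (64 + lam)-dimensional pseudo trinucleotide composition vector.

    Entries 0..63: trinucleotide frequencies; entries 64..: weighted theta_1..theta_lam.
    All entries share one normalizing denominator, so the vector sums to 1.
    """
    seq = seq.upper()
    tris = [seq[i:i + 3] for i in range(len(seq) - 2)]  # overlapping trinucleotides
    if len(tris) <= lam:
        raise ValueError("sequence too short for the chosen lambda")

    # Occurrence frequency of each of the 64 trinucleotides.
    counts = {t: 0 for t in TRINUCS}
    for t in tris:
        counts[t] += 1  # raises KeyError on non-ACGT characters
    freqs = [counts[t] / len(tris) for t in TRINUCS]

    # theta_j: mean squared property difference between trinucleotides j steps apart.
    thetas = []
    for j in range(1, lam + 1):
        pairs = len(tris) - j
        total = sum((prop(tris[i + j]) - prop(tris[i])) ** 2 for i in range(pairs))
        thetas.append(total / pairs)

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * th / denom for th in thetas]
```

For example, `pse_tnc("ACGTACGTAC")` returns 66 values summing to 1. A real predictor would typically average several standardized physicochemical indices inside each correlation term; a single unstandardized property keeps the sketch short.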
Results: In the rigorous jackknife test, the overall success rate of iRSpotH-TNCPseAAC in identifying
human recombination spots is higher than 93%, and the mean success rate over the 18 chromosomes
concerned is 76.07%. This indicates that the predictor can become a useful complementary tool in this
area. Moreover, the PseTNC method may be used to explore many other DNA-related problems. Finally, a
user-friendly web server, iRSpotH-TNCPseAAC, has been established and is freely accessible at
http://www.jci-bioinfo.cn/iRSpotH-TNCPseAAC.
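The jackknife (leave-one-out) test used for these success rates can be sketched as below: each sample is held out once, a model is trained on the remaining N − 1 samples, and the success rate is the fraction of held-out samples predicted correctly. So that the sketch runs without third-party libraries, the SVM ensemble named in the title is replaced by a hypothetical nearest-centroid classifier; `jackknife_success_rate` and `nearest_centroid_fit` are illustrative names, not part of the published tool.

```python
from typing import Callable, Hashable, List

Vector = List[float]
Predictor = Callable[[Vector], Hashable]


def nearest_centroid_fit(X: List[Vector], y: List[Hashable]) -> Predictor:
    """Hypothetical stand-in for the paper's SVM ensemble: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> Hashable:
        # Assign x to the class whose centroid is nearest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def jackknife_success_rate(
    X: List[Vector],
    y: List[Hashable],
    fit: Callable[[List[Vector], List[Hashable]], Predictor] = nearest_centroid_fit,
) -> float:
    """Leave-one-out success rate: fraction of held-out samples predicted correctly."""
    correct = 0
    for i in range(len(X)):
        # Train on every sample except the i-th ...
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        # ... then test on the held-out i-th sample.
        if predict(X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

Any function that takes training vectors and labels and returns a predictor, for example a wrapper around an SVM ensemble applied to PseTNC vectors, can be passed as `fit`. The jackknife needs N separate training runs, which is why it is considered rigorous but costly on large datasets.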
Conclusion: Timely acquisition of recombination-spot information from DNA sequences is very significant
for in-depth study of epigenetic inheritance and for analyzing human diseases; it will also facilitate
drug development. It can be concluded that the iRSpotH-TNCPseAAC predictor may become a very practical
high-throughput online tool for identifying recombination spots.