Machine Learning Sequence Classification Techniques: Application To Cysteine Protease Cleavage Prediction

ISSN: 2212-392X (Online)
ISSN: 1574-8936 (Print)

Volume 10, 5 Issues, 2015


Current Bioinformatics


Ranking and Category:
  • 20th of 52 in Mathematical & Computational Biology


Alessandro Giuliani
Istituto Superiore di Sanità (Italian NIH), Environment and Health Dept



Current: 1.726
5-Year: 1.577


Author(s): David A. duVerle and Hiroshi Mamitsuka


Sequence classification is one of the most fundamental machine learning tasks in computational biology. With the wide availability of large corpora of annotated sequences, supervised learning techniques can greatly speed up the identification of new sequences that share a given function or property. Many methods have been proposed over the years, and we hope to provide an introduction to some of the more prominent ones by focussing on protease cleavage prediction, a typical representative of this class of problems. The proteolytic action modes of cysteine proteases cover a broad range of complexity levels and feature specificities, illustrating the strengths and limitations of the different machine learning techniques applied to them.

This review briefly introduces the particulars of predicting cleavage by calpains and caspases. We then offer some general practical considerations on preparing sequences for use with machine learning algorithms, before covering specific methods. The methods presented range from basic position-based statistical models to more technically advanced approaches such as Markov models and kernel-based algorithms, as well as methods with more restricted goals, such as decision trees. For each family of algorithms, we introduce example implementations and compare their performance, along with their particular strengths and weaknesses.

With this review, we aim to provide useful guidance for choosing an existing method or developing a new one, based on the complexity and specific needs of a given sequence classification problem.
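
As a rough illustration of the simplest family named in the abstract, the position-based statistical models, the Python sketch below builds a log-odds position-specific scoring matrix from aligned windows and slides it along a sequence. It is not taken from the review: the function names, the uniform background distribution, the pseudocount, and the toy training windows are all assumptions made purely for illustration.

```python
import math
from collections import Counter

# The 20 standard amino acids; non-standard residues are not handled here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_pssm(windows, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per window position.

    `windows` are equal-length strings around known cleavage sites.
    Scores are log(p_observed / p_background), with a uniform background
    (an assumption; a real model would likely use measured frequencies).
    """
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all training windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        pssm.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def scan(pssm, sequence):
    """Score every window of the model's width along `sequence`."""
    width = len(pssm)
    return [
        (start, score_window(pssm, sequence[start:start + width]))
        for start in range(len(sequence) - width + 1)
    ]


# Toy, made-up training windows (not real annotated data).
toy_windows = ["DEVD", "DEVD", "DETD", "DEVD"]
toy_pssm = train_pssm(toy_windows)
best_start, best_score = max(scan(toy_pssm, "GGDEVDGG"), key=lambda hit: hit[1])
print(best_start, round(best_score, 3))
```

A model of this kind treats every position independently, which is the limitation that Markov models and kernel-based methods are meant to relax.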



Article Details

Volume: 8
First Page:
Page Count:
DOI: 10.2174/15748936113089990010


Copyright © 2015 Bentham Science