Abstract
Today, in various areas of molecular biology, sequence alignment has become an essential tool for studying the structure-function relationships of proteins. With the rapid increase in the number of available sequences, alignments provide a substantial amount of information through various computational methods. These approaches have become a crucial tool for putting forward working hypotheses for time-consuming bench work, such as protein engineering and site-directed mutagenesis. However, alignment methods still leave much room for improvement. All methods are severely limited in the twilight zone, at around 25% identity between pairs of sequences. More worrying is the very high rate of false positive results generated by most algorithms, which depends on empirical parameters and is hard to validate by statistical criteria. After reviewing the main methods, this paper draws users' attention to the fact that evaluations of algorithm performance are limited entirely to alignment power (sensitivity). In reference to a given truth defined from the alignment of known structures, the power is defined as the proportion of the truth recovered in the solution. The power may be overestimated because of a lack of independent sets of poorly related sequences, and its value depends entirely on the criterion used to define the truth. On the other hand, confidence (selectivity) represents the proportion of the solution that is true. Depending on the method and the parameters used, confidence may be much lower than power, and it is almost never evaluated. For non-trivial alignments, when the power is high, confidence is low, which means that correctly aligned positions are embedded in large regions that are wrongly aligned. One possible solution to these problems is to use the consensus of several multiple alignment methods, which will increase the confidence of the results.
The addition of external information, such as the prediction of secondary structure and/or solvent accessibility, is another way that should increase the performance of existing multiple alignment methods.
Keywords: sequence alignment, scoring matrix, twilight zone, alignment methods, matrices specificity
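The definitions of power and confidence given in the abstract can be made concrete with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: an alignment is reduced to its set of aligned residue pairs, the function names (`aligned_pairs`, `power`, `confidence`, `consensus`) are hypothetical, any pair absent from the reference is counted as false, and the toy sequences are invented. The consensus shown is a simple vote over pairs, which is only one possible way to combine several methods.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# An aligned pair: (residue index in sequence A, residue index in sequence B).
Pair = Tuple[int, int]


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Set[Pair]:
    """Collect the residue pairs that share a column in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Set[Pair] = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def power(truth: Set[Pair], solution: Set[Pair]) -> float:
    """Proportion of the truth that is recovered in the solution."""
    return len(truth & solution) / len(truth) if truth else 0.0


def confidence(truth: Set[Pair], solution: Set[Pair]) -> float:
    """Proportion of the solution that is true (assumes unlisted pairs are false)."""
    return len(truth & solution) / len(solution) if solution else 0.0


def consensus(solutions: Iterable[Set[Pair]], min_votes: int) -> Set[Pair]:
    """Keep only the pairs proposed by at least `min_votes` of the solutions."""
    votes = Counter(pair for solution in solutions for pair in solution)
    return {pair for pair, count in votes.items() if count >= min_votes}


# Invented toy data: the reference covers only the first three residue pairs.
truth: Set[Pair] = {(0, 0), (1, 1), (2, 2)}
method_1 = aligned_pairs("ACDEFG", "ACDQFG")    # hypothetical output of one method
method_2 = aligned_pairs("ACDEFG-", "ACD-QFG")  # hypothetical output of another
agreed = consensus([method_1, method_2], min_votes=2)

for name, solution in [("method 1", method_1), ("method 2", method_2), ("consensus", agreed)]:
    print(f"{name}: power={power(truth, solution):.2f}, "
          f"confidence={confidence(truth, solution):.2f}")
```

In this toy case both methods recover every reference pair (power 1.0) but also propose pairs that are not in the reference, so their confidence is only 0.5 and 0.6. Keeping just the pairs on which the two methods agree removes those extra pairs here. With real alignments a stricter vote can also discard true pairs, so power and confidence should be reported together.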
Current Genomics
Title: Review of Common Sequence Alignment Methods: Clues to Enhance Reliability
Volume: 4 Issue: 2
Author(s): Christophe Lambert, Jean-Marc Van Campenhout, Xavier DeBolle and Eric Depiereux
Affiliation:
Cite this article as:
Lambert Christophe, Van Campenhout Jean-Marc, DeBolle Xavier and Depiereux Eric, Review of Common Sequence Alignment Methods: Clues to Enhance Reliability, Current Genomics 2003; 4(2). https://dx.doi.org/10.2174/1389202033350038
DOI: https://dx.doi.org/10.2174/1389202033350038
Print ISSN: 1389-2029
Publisher Name: Bentham Science Publisher
Online ISSN: 1875-5488