A Partial Least Squares Algorithm for Microarray Data Analysis Using the VIP Statistic for Gene Selection and Binary Classification

ISSN: 2212-392X (Online)
ISSN: 1574-8936 (Print)


Volume 9, 5 Issues, 2014


Current Bioinformatics

Ranking and Category:
  • 20th of 52 in Mathematical & Computational Biology

Editor-in-Chief:
Alessandro Giuliani
Istituto Superiore di Sanità (Italian NIH) Environment and Health Dept
Roma
Italy


Author(s): Francisco J. Burguillo, Luis A. Corchete, Javier Martin, Inmaculada Barrera and William G. Bardsley

Affiliation: Departamento de Química Física, Facultad de Farmacia, Universidad de Salamanca, 37080-Salamanca, Spain.

Abstract

An important application of microarray technology is the assignment of new subjects to known clinical groups (class prediction), but the huge number of screened genes and the small number of samples make this task difficult. To overcome this problem, the usual approach has been either to extract a small subset of significant genes (gene selection) or to use the whole set of genes to build latent components (dimension reduction), and then to apply a standard multivariate classification procedure. Alternatively, both aims, gene selection and class prediction, can be achieved at the same time by using methods based on Partial Least Squares (PLS), as reported in the present work.

We present an iterative PLS algorithm based on backward variable elimination through the “Variable Influence on Projection” (VIP) statistic, which finds an optimal PLS model using training and test sets. It simultaneously reduces the number of selected genes through an iterative procedure and finds the number of PLS factors that gives the best classification performance. It is a simple approach that uses only one mathematical method, retains the identification of discriminatory genes, and builds an optimal predictive model with fast computation. The algorithm runs as a module of the SIMFIT statistical package, where the optimal model and datasets can be re-run to further interpret the system through additional PLS options, such as scores and loadings plots, or class assignment of new samples.
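
The general idea of VIP-driven backward elimination can be illustrated with a minimal sketch in plain Python. This is not the SIMFIT module and not necessarily the authors' exact procedure: the abstract does not give the elimination rule, so the VIP cut-off of 1.0, the cap of three PLS factors, the 0/1 class coding with a 0.5 decision threshold, and all function names (pls1_fit, vip_scores, vip_backward_elimination, simulate) are assumptions made for illustration. The sketch fits a single-response PLS model by NIPALS, chooses the number of factors by test-set accuracy, drops genes whose VIP falls below the cut-off, and repeats until no more genes are removed.

```python
import math
import random

def pls1_fit(X, y, n_factors):
    """Fit single-response PLS (NIPALS) on mean-centred data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [v - y_mean for v in y]
    W, P, q, ss = [], [], [], []
    for _ in range(n_factors):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # residual response no longer covaries with X
        w = [v / norm for v in w]                      # unit-length weights
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        qa = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - qa * t[i] for i in range(n)]
        W.append(w)
        P.append(load)
        q.append(qa)
        ss.append(qa * qa * tt)  # response sum of squares explained by this factor
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "q": q, "ss": ss}

def pls1_predict(model, x):
    """Predict one sample by repeating the training deflation steps."""
    e = [xj - mj for xj, mj in zip(x, model["x_mean"])]
    yhat = model["y_mean"]
    for w, load, qa in zip(model["W"], model["P"], model["q"]):
        t = sum(ej * wj for ej, wj in zip(e, w))
        yhat += qa * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return yhat

def vip_scores(model):
    """VIP_j = sqrt(p * sum_a SS_a * w_ja^2 / sum_a SS_a), with unit-length w_a."""
    p = len(model["x_mean"])
    total = sum(model["ss"])
    return [math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(model["ss"], model["W"])) / total)
            for j in range(p)]

def accuracy(model, X, y):
    hits = sum((pls1_predict(model, x) > 0.5) == (lab == 1) for x, lab in zip(X, y))
    return hits / len(y)

def vip_backward_elimination(Xtr, ytr, Xte, yte, max_factors=3, vip_cut=1.0):
    """Assumed rule: drop genes with VIP < vip_cut until the gene set stops shrinking."""
    genes = list(range(len(Xtr[0])))
    best = None
    while genes:
        sub_tr = [[row[g] for g in genes] for row in Xtr]
        sub_te = [[row[g] for g in genes] for row in Xte]
        fits = []
        for a in range(1, min(max_factors, len(genes)) + 1):
            m = pls1_fit(sub_tr, ytr, a)
            fits.append((accuracy(m, sub_te, yte), a, m))
        # Highest test accuracy wins; ties go to the model with fewer factors.
        acc, _, model = max(fits, key=lambda item: (item[0], -item[1]))
        if best is None or acc >= best["accuracy"]:  # ties favour fewer genes
            best = {"genes": genes, "n_factors": len(model["W"]), "accuracy": acc}
        keep = [g for g, v in zip(genes, vip_scores(model)) if v >= vip_cut]
        if len(keep) == len(genes) or not keep:
            break
        genes = keep
    return best

def simulate(n, n_genes, n_informative, shift, rng):
    """Toy data: class 1 is shifted upward in the first n_informative genes."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(0, 1) + (shift * label if j < n_informative else 0.0)
                  for j in range(n_genes)])
        y.append(label)
    return X, y

rng = random.Random(0)
Xtr, ytr = simulate(30, 40, 4, 2.0, rng)
Xte, yte = simulate(20, 40, 4, 2.0, rng)
result = vip_backward_elimination(Xtr, ytr, Xte, yte)
print(result)
```

Because the weight vectors are normalised to unit length, the squared VIP values average to one across genes, which is why 1.0 is a common (though not universal) cut-off. Note that in this sketch the test set is used for model selection, so the reported accuracy is optimistic; an independent validation set would be needed for an unbiased estimate.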

The proposed algorithm was tested on simulated data under different scenarios that occur in microarray analysis. The results were also compared against those of other classification methods such as KNN, PAM, SVM, RF and standard PLS.


Keywords: Classification, gene selection, microarray, partial least squares, PLS, VIP statistic.

Article Details

Volume: 9
Issue Number: 3
First Page: 348
Last Page: 359
Page Count: 12
DOI: 10.2174/15748936113086660011
Copyright © 2014 Bentham Science