Stem cell laboratories around the world routinely generate whole-genome
expression data to study systems-level processes in stem cell biology, and
computational clustering methods are critical for the genome-wide analysis of such
large data sets. To address major limitations of commonly used clustering
approaches, we developed a novel computational method called AutoSOME to
automatically cluster large, high-dimensional data sets, such as whole-genome
microarray expression data, without prior assumptions about cluster number or data
structure. In previous work, we demonstrated that AutoSOME clustering is an effective
method for studying genome-wide expression patterns in stem cells. Here we present a
primer that describes how to use this method to perform comprehensive cluster analyses
of stem cell gene expression data. We include two detailed protocols illustrating the
identification of gene co-expression modules and clusters of cellular phenotypes in a
single step (Protocol 1), and the visualization of transcriptome variation among stem
cells using an intuitive network display (Protocol 2). The workflow described in this
chapter is sufficiently general for use with a wide variety of in-house and publicly
available genomics data sets.
Keywords: Gene clustering, whole-genome expression data, cellular phenotypes,
AutoSOME, gene co-expression modules, machine learning, cartography, graph
theory.