Serial Analysis of Gene Expression (SAGE) is a sequence-based method for measuring
gene expression that provides quantitative information on the population of transcripts
through the generation and counting of specific sequence tags. Many SAGE datasets are
publicly available for analysis, constituting a valuable resource for the study of gene
expression. These datasets contain tags that are not obviously derived from known
transcripts and thus hint at the existence of a large number of novel transcripts;
however, the prioritization of candidates for further experimental verification is
difficult. Here we demonstrate a method to identify non-coding antisense transcripts
that may be implicated in stem cell differentiation by combining SAGE data with
gene expression data derived by a complementary method. We produced SAGE
libraries and paired microarray gene expression data pre- and post-differentiation of
three mouse stem cell types (embryonic, mammary and neural). We found 1,674 SAGE
tags antisense to 1,351 protein coding genes. A majority of these antisense tags overlap
the 3’UTRs of sense genes; their abundance correlates with the expression of the
corresponding sense genes and appears to be tissue specific. We did not find significant
association between the expression of these tags and alternative splicing. We measured
the expression of three genes expressed in the mouse embryo (Zfp42/Rex1,
Ywhag/14-3-3g and Pspr1) and their corresponding putative antisense transcripts by
qPCR before and after differentiation of mouse embryonic stem cells (mESC).
We conclude that it is possible to identify putative novel
antisense transcripts with a potential role in ES cell differentiation by integrating data
from existing SAGE libraries with expression data derived by a complementary method.
All data used in this work are available from the Gene Expression Omnibus (GEO) and
StemBase databases.
Keywords: Serial Analysis of Gene Expression, DNA microarray profiling of
gene expression, Embryonic stem cells, Neural stem cells, Mammospheres, Stem
cell differentiation, Antisense transcripts, Non-coding RNA, Alternative splicing,
Expressed Sequence Tag libraries.