Testing the Significance of Ranked Gene Sets in Genome-wide Transcriptome
Profiling Data Using Weighted Rank Correlation Statistics

Min Yao; Hao He; Binyu Wang; Xinmiao Huang; Sunli Zheng; Jianwu Wang; Xuejun Gao; Tinghua Huang

doi:10.2174/0113892029280470240306044159

Abstract

Background: Popular gene set enrichment analysis approaches assume that all genes in a gene set contribute equally to the test statistic. However, genes in transcription factor (TF)-derived gene sets, or in gene sets built from TF targets identified by ChIP-Seq experiments, carry a rank attribute: each gene is assigned a p-value that indicates how likely it is to truly belong to the gene set.
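To make the idea of a rank attribute concrete, the following minimal sketch contrasts a flat gene set with a ranked one. The gene names, the p-values, and the helper `rank_gene_set` are hypothetical and are not taken from the Flaver software; the only assumption is that a smaller p-value means stronger evidence of membership.

```python
# Hypothetical TF-target p-values (e.g. from motif scanning or ChIP-Seq peak calling).
target_p_values = {"GENE_A": 1e-8, "GENE_B": 3e-4, "GENE_C": 2e-6, "GENE_D": 0.01}


def rank_gene_set(p_values):
    """Return (gene, rank) pairs; rank 1 is the smallest p-value (most confident member)."""
    ordered = sorted(p_values, key=p_values.get)
    return [(gene, rank) for rank, gene in enumerate(ordered, start=1)]


# A conventional enrichment test sees only membership ...
flat_set = set(target_p_values)
# ... whereas a ranked gene set also keeps the order of confidence.
ranked_set = rank_gene_set(target_p_values)
```

In the flat representation GENE_A and GENE_D are interchangeable, while the ranked representation records that GENE_A is a far more confident target.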

Objectives: Ignoring this rank information during enrichment analysis leads to improper statistical inference. We address this issue by developing a new method to test the significance of ranked gene sets in genome-wide transcriptome profiling data.

Methods: The proposed method first creates ranked gene sets and ranked gene lists and then applies a weighted Kendall's tau rank correlation statistic to test their association. Top-down weights were introduced for the genes in the gene set, and the method was implemented in a new software tool called "Flaver".
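The general idea of a weighted Kendall's tau can be illustrated with a short sketch. This is not the Flaver implementation: the reciprocal-rank weights in `top_down_weights`, the pairwise product weighting, and the permutation test in `permutation_p_value` are all assumptions chosen for illustration, and the abstract does not specify the weights or the null distribution the authors use. Genes are assumed to be listed in gene-set rank order, most confident first.

```python
import random
from itertools import combinations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def top_down_weights(n):
    """Hypothetical top-down weights: the gene at rank r gets weight 1/r."""
    return [1.0 / r for r in range(1, n + 1)]


def weighted_kendall_tau(set_scores, expr_scores, weights):
    """Weighted Kendall's tau over all gene pairs.

    Each pair (i, j) contributes w_i * w_j times +1 (concordant),
    -1 (discordant) or 0 (tied), normalised by the sum of w_i * w_j.
    """
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(set_scores)), 2):
        w = weights[i] * weights[j]
        num += w * sign(set_scores[i] - set_scores[j]) * sign(expr_scores[i] - expr_scores[j])
        den += w
    return num / den


def permutation_p_value(set_scores, expr_scores, weights, n_perm=2000, seed=0):
    """Two-sided permutation p-value (an assumed, generic significance test)."""
    rng = random.Random(seed)
    observed = weighted_kendall_tau(set_scores, expr_scores, weights)
    shuffled = list(expr_scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(weighted_kendall_tau(set_scores, shuffled, weights)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: membership confidence (higher = more confident) and expression.
confidence = [6, 5, 4, 3, 2, 1]
expression = [9.1, 8.4, 7.7, 5.0, 5.2, 3.3]
tau, p_value = permutation_p_value(confidence, expression, top_down_weights(6))
```

With these weights, a disagreement between the two highest-ranked genes lowers the statistic more than the same disagreement between the two lowest-ranked genes, which is the behaviour that top-down weighting is meant to provide.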

Results: Theoretical properties of the proposed method were established, and its differences from the GSEA approach were demonstrated by analyzing transcriptome profiling data across 55 human tissues and 176 human cell lines. The results indicated that the TFs identified by our method have a higher tendency to be differentially expressed across the tissues analyzed than those identified by competing methods. The method significantly outperformed the well-known gene set enrichment analysis tools GOStats (9%) and GSEA (17%) in analyzing well-documented human RNA transcriptome datasets.

Conclusions: The method excels at detecting gene sets whose gene ranks are correlated with the expression levels of the genes in the transcriptome data.

Keywords: Flaver, ranked gene set, enrichment analysis, weighted rank correlation, GSEA, GOStats, transcription factor.




DOI https://dx.doi.org/10.2174/0113892029280470240306044159
Print ISSN 1389-2029
Publisher Name Bentham Science Publisher
Online ISSN 1875-5488

Current Genomics
