The Power of Matrix Factorization: Methods for Deconvoluting Genetic Heterogeneous Data at Expression Level

Yuan Liu; Zhining Wen; Menglong Li
Abstract

Background: Genetic data have recently become a vital resource for investigating biological problems. However, the heterogeneity of the original samples at the biological level is usually ignored when such data are used. Differences in the cellular composition of a sample can alter its expression profile and introduce considerable bias into downstream research. Matrix factorization (MF), which originated as a set of mathematical methods, has contributed substantially to deconvoluting genetic profiles in silico, especially at the expression level.
Objective: With the development of artificial intelligence algorithms and machine learning, the number of computational methods for addressing heterogeneity problems is growing rapidly. However, structured overviews of the use of MF to deconvolute genetic data remain limited. This study reviews the use of MF methods for heterogeneity problems in genetic data at the expression level.
Methods: MF methods involved in deconvolution were reviewed according to their individual strengths. The review is presented in three sections: application scenarios, method categories, and a summary of tools. Specifically, the application scenarios section defines the deconvolution problem and the scenarios in which it arises. The method categories section summarizes the MF algorithms that have been applied to the different scenarios. The summary of tools lists the functions and web servers developed over the last decade. Additionally, challenges and opportunities in related fields are discussed.
Results and Conclusion: Based on this investigation, the study aims to present a relatively global picture that helps researchers get started more quickly with deconvoluting genetic data in silico and select suitable MF methods for different scenarios.
Keywords: Matrix factorization, heterogenization, gene expression, deconvolution, computational method, cell type.
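
To make the idea of MF-based deconvolution concrete, the following is a minimal sketch under stated assumptions, not the method of any specific tool covered by the review. It assumes a linear mixing model in which a non-negative genes-by-samples expression matrix is approximated by the product of a signature matrix and a mixing-weight matrix, and it uses non-negative matrix factorization with multiplicative updates as one example of an MF technique. The function names (`nmf_deconvolve`, `to_proportions`, `relative_error`) and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf_deconvolve(X, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate a non-negative genes x samples matrix X as W @ H.

    W (genes x k) plays the role of cell-type signatures and
    H (k x samples) the role of per-sample mixing weights.
    Uses multiplicative updates for the Frobenius-norm objective.
    """
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("X must be non-negative")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    # Random strictly positive initialization (an assumption of this sketch).
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Update the weights first, then the signatures; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def to_proportions(H, eps=1e-10):
    """Heuristic: rescale each sample's weights so that they sum to 1."""
    return H / (H.sum(axis=0, keepdims=True) + eps)


def relative_error(X, W, H):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


# Simulated example (hypothetical data): 50 genes, 3 cell types, 20 mixed samples.
rng = np.random.default_rng(42)
true_signatures = rng.random((50, 3))
true_proportions = rng.dirichlet(np.ones(3), size=20).T  # shape 3 x 20
mixed = true_signatures @ true_proportions

W, H = nmf_deconvolve(mixed, k=3)
proportions = to_proportions(H)
print("relative reconstruction error:", relative_error(mixed, W, H))
```

The recovered factors are determined only up to a permutation and rescaling of the components, so matching the columns of `W` to actual cell types would still require marker genes or reference profiles.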
Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1574893615666200120110205
Print ISSN: 1574-8936
Online ISSN: 2212-392X
