Representation Learning of Biological Concepts: A Systematic Review

Yuntao Yang; Xu Zuo; Avisha Das; Hua Xu; Wenjin Zheng

doi:10.2174/1574893618666230612161210

Abstract

Objective: Representation learning of biological concepts involves deriving numerical representations of those concepts from sources of biological information such as sequences, interactions, and literature. This systematic review analyzes both quantitative and qualitative data to provide an overview of the field.

Methods: We searched the PubMed and EMBASE databases for articles on the representation learning of biological concepts. Of the 507 articles published between 2015 and 2022, we screened and selected 65 papers for inclusion. We then developed a structured workflow that involved identifying relevant biological concepts and data types, reviewing representation learning techniques, and evaluating the downstream applications used to assess the quality of the learned representations.

Results: This review focused primarily on the development of numerical representations for gene/DNA/RNA entities. Word2Vec was the most commonly used method for biological representation learning. A growing number of studies also use state-of-the-art large language models to learn numerical representations of biological concepts. We further observed that representations learned from a specific source were typically used for a single downstream application relevant to that source.
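As an illustration of how Word2Vec-style methods are commonly applied to biological sequences, the sketch below splits a sequence into overlapping k-mers ("words") and builds the (center, context) pairs that a skip-gram model would be trained on. The function names, the toy sequence, and the choices of k and window size are assumptions made for this example; the abstract does not specify them, and no embedding model is actually trained here.

```python
def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers, treated as 'words'."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def skipgram_pairs(tokens, window=2):
    """Build (center, context) pairs within a symmetric context window."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical toy DNA sequence, for illustration only.
tokens = kmer_tokens("ATGCGATG", k=3)
pairs = skipgram_pairs(tokens, window=1)
```

In practice, pairs like these (or the token lists themselves) would be passed to an embedding library, and the resulting k-mer vectors would be aggregated into a representation of the whole sequence.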

Conclusion: Existing methods for biological representation learning focus primarily on learning representations from a single data type, with the output fed into predictive models for downstream applications. Although some studies have explored using multiple data types to improve the learned representations, such research remains relatively scarce. This systematic review summarizes the data types, models, and downstream applications used in this task.
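To make the idea of combining data types concrete, the following sketch concatenates two embeddings of the same gene (one imagined as sequence-derived, one as network-derived) and compares genes by cosine similarity. Concatenation is only one simple fusion strategy and is an assumption of this example, not a method reported in the review; the gene names and vectors are hypothetical toy values.

```python
import math


def concat_embeddings(*vectors):
    """Fuse embeddings from several data types by simple concatenation."""
    combined = []
    for vector in vectors:
        combined.extend(vector)
    return combined


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings from two data sources.
seq_emb = {"GENE_A": [1.0, 0.0], "GENE_B": [0.0, 1.0]}
net_emb = {"GENE_A": [0.5, 0.5], "GENE_B": [0.5, 0.5]}

combined = {g: concat_embeddings(seq_emb[g], net_emb[g]) for g in seq_emb}
similarity = cosine(combined["GENE_A"], combined["GENE_B"])
```

The combined vectors could then serve as input features for a downstream predictive model in place of a single-source representation.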

Keywords: Machine learning, biological concepts, representation learning, embedding, natural language processing, graph neural networks.




DOI https://dx.doi.org/10.2174/1574893618666230612161210	Print ISSN 1574-8936
Publisher Name Bentham Science Publisher	Online ISSN 2212-392X

Current Bioinformatics
