Title: Recent Trends on the Development of Machine Learning Approaches
for the Prediction of Lysine Acetylation Sites
Volume: 29
Issue: 2
Author(s): Shaherin Basith, Hye Jin Chang, Saraswathy Nithiyanandam, Tae Hwan Shin, Balachandran Manavalan*, Gwang Lee*
Affiliation:
- Department of Physiology, Ajou University School of Medicine, Suwon, Korea
Keywords:
Protein, post-translational modification, lysine, acetylation, machine learning, feature encoding, prediction model.
Abstract: Acetylation on lysine residues is considered one of the most potent protein
post-translational modifications, owing to its crucial role in cellular metabolism and regulatory
processes. Recent advances in experimental techniques have revealed several lysine
acetylation substrates and sites. However, because experimental identification is costly,
cumbersome, time-consuming, and labor-intensive, considerable effort has been
directed towards the development of computational tools. In particular, machine learning
(ML)-based approaches hold great promise for the rapid discovery of lysine acetylation
sites, as evidenced by the growing number of time- and cost-effective prediction tools
developed in recent years. In this review, we present a complete
survey of the state-of-the-art ML predictors for lysine acetylation. We discuss a variety
of key aspects for developing a successful predictor, including the underlying ML algorithms,
feature selection methods, validation techniques, and software utility. First, we
review lysine acetylation site databases, current ML approaches, their working principles, and
their performance. Finally, we discuss the shortcomings and future directions of ML approaches
in the prediction of lysine acetylation sites. This review may serve as a useful
guide for experimentalists in choosing the right ML tool for their research. Moreover,
it may help bioinformaticians develop more accurate and advanced ML-based
predictors in protein research.