Title: An Efficient Approach for Diabetes Classification Using Feature Selection and Hyperparameter Tuning
Volume: 17
Author(s): Bhanu Prakash Lohani*, Arvind Dagur, and Dhirendra Shukla
Affiliation:
- Department of SCSE, Galgotias University, Yamuna Expressway, Greater Noida, India
Keywords:
Diabetes, machine learning, support vector machine, feature selection, PSO, hyperparameter.
Abstract: Background: Diabetes mellitus, which stems from insulin deficiency or insulin resistance,
causes acute and chronic health problems and is driven by factors such as age, obesity, genetics,
and lifestyle. It can lead to conditions such as heart disease, vision problems, and kidney
dysfunction, and the WHO reported a notable mortality rate in 2019. Modern diets have increased
diabetes risk. Machine learning techniques can play a pivotal role in disease prediction and so
support timely interventions.
Objective: The primary aim of this work is to compare the effectiveness of existing
machine learning models for diabetes classification and to identify the model that yields
the highest accuracy.
Methods: We first pre-processed the data and then applied a range of machine learning methods
to classify diabetes mellitus. We then compared the algorithms using both the complete feature
set and the subset of features selected through Particle Swarm Optimization (PSO). The assessment
covered metrics such as accuracy, precision, F1 score, and log loss for Support Vector
Classifier (SVC), K-Nearest Neighbours (KNN), Random Forest (RF), AdaBoost, XGBoost,
Extra Tree, and Decision Tree. Finally, hyperparameter tuning was applied to improve
performance and attain the highest accuracy.
Results: The proposed model, HSVC, combines PSO-based feature selection with optimized
hyperparameters and achieves an accuracy of 98.66%.
Conclusion: The models developed in this study could potentially be applied to the
classification of other health conditions, such as Parkinson’s disease and heart disease.
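
Illustrative sketch: The PSO-based feature selection step named in the Methods can be pictured with a minimal binary PSO, in which each particle is a 0/1 mask over the features and a fitness function scores each mask. This is a sketch under stated assumptions and not the authors' implementation: the function name binary_pso, the parameter values (w, c1, c2, v_max), and the toy fitness with its RELEVANCE weights and PENALTY are all hypothetical. In a wrapper setup like the one the abstract describes, the fitness would more plausibly be the validation accuracy of a classifier such as SVC trained on the selected columns.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iter=30,
               w=0.7, c1=1.5, c2=1.5, v_max=6.0, seed=0):
    """Return (best_mask, best_score) found while maximising fitness(mask)."""
    rng = random.Random(seed)
    # Each particle's position is a 0/1 mask: 1 keeps a feature, 0 drops it.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_score[i])
    gbest, gbest_score = pbest[g][:], pbest_score[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard PSO velocity update: inertia + personal + global pull.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to keep exp() stable
                vel[i][d] = v
                # Sigmoid of the velocity is the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob else 0
            score = fitness(pos[i])
            if score > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], score
                if score > gbest_score:
                    gbest, gbest_score = pos[i][:], score
    return gbest, gbest_score


# Hypothetical per-feature usefulness and per-feature cost (not from the paper).
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02, 0.6, 0.03]
PENALTY = 0.2


def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward useful features, penalise size."""
    if not any(mask):
        return float("-inf")  # an empty feature set cannot train a model
    return sum(r for r, bit in zip(RELEVANCE, mask) if bit) - PENALTY * sum(mask)


best_mask, best_score = binary_pso(toy_fitness, n_features=len(RELEVANCE))
print(best_mask, best_score)
```

The fixed seed makes the search repeatable. Because PSO is a stochastic heuristic, the returned mask is the best one visited and is not guaranteed to be the global optimum.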