This chapter explores the construction of a detailed machine
learning (ML) framework for predicting diabetes using diverse real-world
datasets. The alarming rise in diabetes prevalence globally, and particularly in
developing nations such as India, necessitates innovative approaches for early
detection and intervention. Traditional diagnostic techniques, though
clinically established, often fall short in scalability and adaptability. This
study focuses on bridging this gap by integrating ML methodologies that not
only offer superior prediction accuracy but also provide transparency through
interpretability tools.
Key contributions of this work include comprehensive data
preprocessing (missing value treatment, normalisation, encoding, and
SMOTE-based class balancing), a comparative evaluation of three widely used
classifiers (Logistic Regression, Random Forest, and XGBoost), and the use of
SHAP values to enhance model interpretability. Among the models tested,
XGBoost achieved the highest performance, with an accuracy of 97.93%, an AUC of
0.9974, and high sensitivity and specificity, supporting its
suitability for real-world healthcare applications. The chapter concludes with
discussions on model performance, interpretability, clinical relevance,
limitations, and avenues for future research.
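The evaluation metrics named above (accuracy, sensitivity, specificity, and AUC) can be illustrated with a minimal sketch in plain Python. This is not the chapter's code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the toy numbers are unrelated to the reported results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 = diabetic, 0 = non-diabetic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of true positive cases that the model detects (recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of true negative cases that the model correctly rules out."""
    return tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case; tied scores count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the chapter).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, fp, tn, fn))
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("AUC:", roc_auc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC is computed from the raw scores and summarises ranking quality across all thresholds. That difference is one reason to report AUC alongside the other three.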
Author(s) Details
Mounika Panjala
Department of Statistics, Osmania University, Hyderabad-7, Telangana, India.
Bhatracharyulu N.Ch.
Department of Statistics, Osmania University, Hyderabad-7, Telangana, India.
Please see the book here: https://doi.org/10.9734/bpi/nhstc/v3/5950