A COMPARATIVE ANALYSIS ON DIABETES USING ENSEMBLE MACHINE LEARNING MODELS ON PIMA INDIAN DATASETS

Authors

  • Paul, R. Uzoamaka Department of Computer Science, Nnamdi Azikiwe University, Awka, Anambra State.
  • Mbeledogu, N. Njideka Department of Computer Science, Faculty of Physical Sciences, Nnamdi Azikiwe University, Awka, Anambra State.
  • Iduh B. Nwamaka Department of Computer Science, Nnamdi Azikiwe University, Awka, Anambra State.
  • Okechukwu O. Patience Department of Computer Science, Nnamdi Azikiwe University, Awka, Anambra State.

Keywords:

Diabetes Prediction, Machine Learning, Random Forest, XGBoost, Support Vector Machine (SVM), Medical Diagnosis, Pima Indians Diabetes Dataset, Early Detection, Ensemble Learning

Abstract

Diabetes is a major global health concern that affects millions of people worldwide and can lead to severe complications if it is not detected early. Timely and accurate prediction can greatly improve patient outcomes. In this article, we propose a diabetes prediction system built on several machine learning (ML) models: Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost. The models were evaluated on the Pima Indians Diabetes Dataset using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). Our findings show that ensemble methods such as Random Forest and XGBoost outperformed the traditional classifiers, achieving prediction accuracy above 88.0%. This work highlights the potential of machine learning models for the early detection of diabetes and provides insights for developing scalable, real-time prediction systems.
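
The five evaluation metrics named above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' evaluation code: the function names (`confusion_counts`, `classification_metrics`, `roc_auc`) are hypothetical, labels are assumed to be 0/1 with 1 meaning diabetic, and ROC-AUC is computed from continuous model scores using the rank-sum (Mann-Whitney) identity. In practice, scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` provide the same quantities.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, assuming 1 = diabetic, 0 = not."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # diabetic, predicted diabetic
        elif t == 0 and p == 1:
            fp += 1  # healthy, predicted diabetic
        elif t == 1 and p == 0:
            fn += 1  # diabetic, missed by the model
        else:
            tn += 1  # healthy, predicted healthy
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random
    negative (ties count as one half). O(P * N) pairwise comparison."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and model scores, not data from the study.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]
    # Assumed decision threshold of 0.5 to turn scores into class predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Note that accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC summarizes ranking quality across all thresholds. This distinction matters on the Pima data, where the positive (diabetic) class is the minority.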

Published

2025-06-09