A COMPARATIVE ANALYSIS ON DIABETES USING ENSEMBLE MACHINE LEARNING MODELS ON PIMA INDIAN DATASETS
Keywords:
Diabetes Prediction, Machine Learning, Random Forest, XGBoost, Support Vector Machine (SVM), Medical Diagnosis, Pima Indians Diabetes Dataset, Early Detection, Ensemble Learning
Abstract
Diabetes is a major global health concern, affecting millions of people worldwide and leading to severe complications if not detected early. Timely and accurate prediction of diabetes can greatly improve patient outcomes. In this article, we propose a diabetes prediction system that uses several machine learning (ML) models: Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost. The models were trained and evaluated on the Pima Indians Diabetes Dataset. Performance was measured using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). Our findings show that the ensemble methods, Random Forest and XGBoost, outperformed the traditional classifiers, achieving prediction accuracy above 88.0%. This work highlights the potential of machine learning models for the early detection of diabetes and provides insights for developing scalable, real-time prediction systems.
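To make the evaluation metrics named above concrete, the following is a minimal, dependency-free sketch of how accuracy, precision, recall, F1-score, and ROC-AUC can be computed for a binary classifier. It is not the code used in this study: the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and serve only as illustration.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the Pima Indians Diabetes Dataset.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]  # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

The pairwise ROC-AUC above is quadratic in the number of samples, which is acceptable for a dataset of this size. Unlike the other four metrics, it is computed from the predicted scores rather than the thresholded labels, so it does not depend on the choice of decision threshold.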
