A Comprehensive Analysis of Diabetes Risk Prediction Using Logistic Regression

© 2024 by IJCTT Journal
Volume-72 Issue-11
Year of Publication : 2024
Authors : Asish Pradhan
DOI : 10.14445/22312803/IJCTT-V72I11P122

How to Cite?

Asish Pradhan, "A Comprehensive Analysis of Diabetes Risk Prediction Using Logistic Regression," International Journal of Computer Trends and Technology, vol. 72, no. 11, pp. 192-219, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I11P122

Abstract
This study develops a predictive model for diabetes risk from a combination of demographic, examination, dietary, and laboratory data. The dataset was prepared through ETL (Extract, Transform, Load) and examined with EDA (Exploratory Data Analysis) to identify potential correlations. A logistic regression model was then built and evaluated using several metrics, achieving an accuracy of approximately 93%. These results suggest that the model can predict diabetes risk accurately enough to serve as a useful tool for healthcare professionals. The study demonstrates a comprehensive approach to building a diabetes risk model from a multidimensional dataset, with potential applications in healthcare.
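
As a rough illustration of the modeling step described above, the following is a minimal sketch of fitting and evaluating a logistic regression classifier. It is not the author's implementation: the data are synthetic, the two standardized features and their coefficients are hypothetical, and the model is fitted with plain batch gradient descent in pure Python rather than with any particular library. The function names (`make_synthetic`, `fit`, `accuracy`) are assumptions introduced only for this example, and the accuracy it produces has no relation to the figure reported in the study.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic(n, rng):
    # Hypothetical data: two standardized features, with labels drawn
    # from a known logistic model (coefficients chosen arbitrarily).
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(-1.0 + 3.0 * x[0] + 2.0 * x[1])
        y = 1 if rng.random() < p else 0
        rows.append((x, y))
    return rows


def fit(rows, lr=0.5, epochs=300):
    # Batch gradient descent on the mean log-loss.
    w = [0.0, 0.0]
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(b + w[0] * x[0] + w[1] * x[1]) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[j] - lr * grad_w[j] / n for j in range(2)]
        b -= lr * grad_b / n
    return w, b


def accuracy(rows, w, b):
    # Fraction of rows whose predicted class (threshold 0.5) matches the label.
    correct = 0
    for x, y in rows:
        pred = 1 if sigmoid(b + w[0] * x[0] + w[1] * x[1]) >= 0.5 else 0
        correct += int(pred == y)
    return correct / len(rows)


rng = random.Random(42)
data = make_synthetic(1000, rng)
train, test = data[:700], data[700:]  # simple 70/30 hold-out split

w, b = fit(train)
test_acc = accuracy(test, w, b)
print(f"weights={w}, bias={b:.3f}, test accuracy={test_acc:.3f}")
```

Accuracy is computed on the held-out 30% rather than on the training rows, so the figure reflects performance on data the model has not seen. On an imbalanced outcome such as diabetes status, accuracy alone can be misleading, which is one reason to report several metrics as the study does.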

Keywords
Diabetes prediction, Exploratory data analysis, Healthcare data analysis, Logistic regression, Machine learning, Monte Carlo simulation.
