Prediction of the Likelihood of Early-Stage Diabetes Using the Random Forest Classification Algorithm

Widya Apriliah, Ilham Kurniawan, Muhamad Baydhowi, Tri Haryati

Abstract

Diabetes is one of the fastest-growing life-threatening chronic diseases, affecting 422 million people worldwide according to a 2018 report by the World Health Organization (WHO). It is considered one of the deadliest chronic diseases and is characterized by elevated blood sugar. Many complications arise when diabetes remains unidentified and untreated. The growing use of machine learning offers a way to address this problem. The aim of this study is to design a model that predicts the likelihood of diabetes in patients with the highest possible accuracy. Classification is a data mining technique that assigns categories to a collection of data to support more accurate prediction and analysis. Three machine learning classification algorithms, namely Support Vector Machine, Naive Bayes, and Random Forest, were therefore used in this experiment to detect diabetes at an early stage. The experiments were conducted on the Diabetes Hospital in Sylhet, Bangladesh dataset, sourced from the UCI repository. The performance of the three algorithms was evaluated using Precision, Accuracy, F-Measure, and Recall, with accuracy measured from the numbers of correctly and incorrectly classified instances. The results show that Random Forest outperformed the other algorithms, achieving the highest accuracy of 97.88%. These results were verified using Receiver Operating Characteristic (ROC) curves.

Keywords: diabetes, naive Bayes, random forest, accuracy, machine learning, support vector machine
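
As an illustration of the evaluation measures named in the abstract, the following minimal Python sketch computes Accuracy, Precision, Recall, and F-Measure from correctly and incorrectly classified instances, and estimates the area under the ROC curve from ranking scores. The labels and scores are invented for illustration only; they are not taken from the Sylhet dataset and do not reproduce the reported 97.88%. The names `ConfusionCounts`, `count_outcomes`, and `roc_auc`, and the 0.5 decision threshold, are assumptions of this sketch rather than part of the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier (1 = diabetic, 0 = non-diabetic)."""
    tp: int  # diabetic, predicted diabetic
    fp: int  # non-diabetic, predicted diabetic
    fn: int  # diabetic, predicted non-diabetic
    tn: int  # non-diabetic, predicted non-diabetic


def count_outcomes(y_true, y_pred):
    """Tally correctly and incorrectly classified instances."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def accuracy(c):
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c):
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c):
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def f_measure(c):
    """Harmonic mean of precision and recall (F1)."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def roc_auc(y_true, scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive instance receives the higher score;
    ties count as half. Requires at least one instance of each class."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: ten patients, true labels and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6, 0.95, 0.35]
# Assumed decision threshold of 0.5 turns scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = count_outcomes(y_true, y_pred)
print(counts)
print("accuracy :", accuracy(counts))
print("precision:", precision(counts))
print("recall   :", recall(counts))
print("f-measure:", f_measure(counts))
print("ROC AUC  :", roc_auc(y_true, scores))
```

Accuracy alone can be misleading when one class dominates the data, which is one reason to report Precision, Recall, and F-Measure alongside it. The ROC-based check does not depend on the choice of threshold.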


DOI: https://doi.org/10.32520/stmsi.v10i1.1129



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.