Comparison of Ensemble Learning Methods: Random Forest, Support Vector Machine, and AdaBoost for Classification of the Human Development Index (HDI)
Abstract
Classification in supervised learning is a way of finding patterns in data whose classes are already known. In machine learning classification, an ensemble classifier combines several models with the aim of improving accuracy and optimizing classification performance. This study compares algorithms that work with ensemble learning, namely Random Forest, Support Vector Machine (SVM), and AdaBoost. The data are the Human Development Index (HDI) values of districts/cities in Indonesia. They are combined with other variables that are strongly related to human development: GRDP per capita, gross enrollment rate, net enrollment rate, labor force participation rate, unemployment rate, poverty rate, poverty depth, poverty severity, and average consumption per capita. HDI was chosen for two reasons. It is an important macroeconomic variable describing the condition of human resources in Indonesia, and it already has a clear classification defined by Badan Pusat Statistik (BPS), so supervised learning can be applied. The models are compared using accuracy, specificity, sensitivity, and the kappa statistic. The analysis starts with data preprocessing, followed by resampling and cross-validation, and then modeling with the Random Forest, SVM, and AdaBoost algorithms. The final stage evaluates the models and compares them to find the best model for classifying districts/cities by HDI. The results show that Random Forest performed best, outperforming SVM and AdaBoost with an accuracy of 85.23%, specificity of 71.63%, sensitivity of 95.05%, and a kappa coefficient of 0.7698. These findings suggest that an ensemble classifier can be developed to help classify Human Development Index scores in Indonesia.
Keywords: AdaBoost, Random Forest, Support Vector Machine, Ensemble Learning, Human Development Index
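The evaluation stage described above compares models by accuracy, sensitivity, specificity, and the kappa statistic. The abstract does not state how sensitivity and specificity were aggregated across the HDI classes. The Python sketch below therefore assumes one-vs-rest rates macro-averaged over the classes. It also assumes the class labels follow the BPS HDI cut-offs as we understand them: low below 60, medium from 60 to below 70, high from 70 to below 80, and very high at 80 and above. The function names `bps_hdi_category` and `evaluate` are hypothetical. The HDI values and predictions are illustrative only and are not data from the study.

```python
from collections import Counter


def bps_hdi_category(hdi):
    """Map an HDI value to a category using assumed BPS cut-offs."""
    if hdi < 60:
        return "low"
    if hdi < 70:
        return "medium"
    if hdi < 80:
        return "high"
    return "very high"


def evaluate(y_true, y_pred, labels):
    """Accuracy, macro one-vs-rest sensitivity/specificity, and Cohen's kappa.

    The macro-averaging scheme is an assumption, not the paper's stated method.
    """
    n = len(y_true)
    # cm[actual][predicted] holds the confusion-matrix counts.
    cm = {a: {p: 0 for p in labels} for a in labels}
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1

    accuracy = sum(cm[c][c] for c in labels) / n

    sens, spec = [], []
    for c in labels:
        tp = cm[c][c]
        fn = sum(cm[c][p] for p in labels if p != c)
        fp = sum(cm[a][c] for a in labels if a != c)
        tn = n - tp - fn - fp
        # Skip a class when its rate is undefined (zero denominator).
        if tp + fn:
            sens.append(tp / (tp + fn))
        if tn + fp:
            spec.append(tn / (tn + fp))

    # Chance agreement p_e from the marginal frequencies of both label vectors.
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    p_e = sum(true_freq[c] * pred_freq[c] for c in labels) / (n * n)
    # Degenerate case (a single class everywhere): treated as perfect agreement.
    kappa = 1.0 if p_e == 1 else (accuracy - p_e) / (1 - p_e)

    return {
        "accuracy": accuracy,
        "sensitivity": sum(sens) / len(sens),
        "specificity": sum(spec) / len(spec),
        "kappa": kappa,
    }


labels = ["low", "medium", "high", "very high"]
# Illustrative HDI values and hypothetical model predictions.
hdi_values = [55.2, 63.4, 68.9, 71.5, 74.0, 79.9, 81.2, 66.1]
y_true = [bps_hdi_category(v) for v in hdi_values]
y_pred = ["low", "medium", "high", "high", "high", "medium", "very high", "medium"]
print(evaluate(y_true, y_pred, labels))
```

With these toy vectors, six of the eight predictions are correct, so accuracy is 0.75. Chance agreement is 20/64, which gives a kappa of about 0.64. Kappa is reported alongside accuracy because it discounts agreement expected by chance. This matters when some HDI classes are much rarer than others.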
DOI: https://doi.org/10.32520/stmsi.v12i1.2501
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.