Handling of Data Imbalance in Classification of Regencies/Municipalities in Eastern Indonesia

Adham Malay Japany, Yuliagnis Transver Wijaya

Abstract


Class imbalance can lead a classifier to make incorrect predictions, which in turn can mislead decision making. Eastern Indonesia (Kawasan Timur Indonesia, KTI) is one of the regions whose Human Development Index (HDI) is below the national HDI, so raising human potential in the production process in KTI needs focused attention. The HDI categories of regencies/municipalities in KTI are imbalanced, which indicates that human development is still uneven across the region. This study therefore classifies regencies/municipalities into HDI categories, with the aim of doing so accurately and quickly. The classification results are expected to help the government determine future strategic steps to improve the quality of human resources in KTI. Data imbalance is handled with the Synthetic Minority Over-sampling Technique (SMOTE), combined with three classification algorithms: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF). With imbalance handling and k-fold cross-validation applied, all three algorithms showed a significant increase in accuracy. Handling data imbalance therefore improved the performance of the classification algorithms applied.
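The abstract does not describe the implementation, so the following is only a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration; they are not the authors' code or data. Each synthetic point is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours.
    Hypothetical helper for illustration only."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up feature vectors (not data from the study).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_size = 10  # assumed size of the majority class

# Oversample the minority class until it matches the majority class size.
new_points = smote(minority, n_new=majority_size - len(minority), k=3)
balanced_minority = minority + new_points
```

When oversampling is combined with k-fold cross-validation, a common precaution is to oversample only the training folds, so that synthetic points derived from a sample never appear in the fold used to evaluate it. Whether the study followed this ordering is not stated in the abstract.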






DOI: https://doi.org/10.32520/stmsi.v13i1.2862



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.