Implementation of the Synthetic Minority Over-sampling Technique Algorithm for Handling Class Imbalance in Classification Datasets
Abstract
This study addresses class imbalance affecting the minority class by means of a resampling technique, namely oversampling. The oversampling algorithm used is the Synthetic Minority Over-sampling Technique (SMOTE). The results are compared with those of classification without resampling. The evaluation measures are accuracy, geometric mean (G-mean), and the confusion matrix (CM). Handling the imbalanced class distribution in the dataset with SMOTE can increase both the accuracy and the G-mean of the Naïve Bayes, SVM, KNN, and Decision Tree algorithms. This indicates that treating class imbalance at the data pre-processing stage affects the accuracy and G-mean of these classifiers. In the experimental scenarios carried out, the best accuracy was 96.43% for Naïve Bayes, 99.02% for SVM, 97.29% for KNN, and 97.29% for Decision Tree, all on the ecoli 15.8 dataset after SMOTE with 10-fold cross-validation. The best G-mean was 96.42% for Naïve Bayes, 99.37% for SVM, 99.53% for KNN, and 96.29% for Decision Tree, also on the ecoli 15.8 dataset after SMOTE with 10-fold cross-validation.
Keywords: Data Mining, Classification, Imbalance Ratio (IR), Oversampling, Synthetic Minority Over-sampling Technique (SMOTE)
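The two ideas the abstract relies on, SMOTE's synthetic interpolation and the G-mean measure, can be illustrated with a short sketch. This is not the authors' implementation. The function names (`smote`, `g_mean`, `accuracy`), the toy data, and the parameter values are assumptions chosen for illustration, and the sampling loop is a simplified variant that picks a random minority sample for each synthetic point.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE-style oversampling (illustrative sketch).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy minority class with two features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6, k=2)

# A classifier that always predicts the majority class on a 10-vs-90 split:
# accuracy looks high, but G-mean exposes the ignored minority class.
acc_majority_only = accuracy(tp=0, fn=10, tn=90, fp=0)
gm_majority_only = g_mean(tp=0, fn=10, tn=90, fp=0)
```

In the last two lines, accuracy is 0.9 while G-mean is 0, which is why a measure such as G-mean is reported alongside accuracy when the classes are imbalanced.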
DOI: https://doi.org/10.32520/stmsi.v10i2.1303
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.