Handling of Data Imbalance in Classification of Regencies/Municipalities in Eastern Indonesia

Adham Malay Japany, Yuliagnis Transver Wijaya


Class imbalance can lead to incorrect predictions in classification, which in turn can cause problems in decision making. Eastern Indonesia (KTI) is one of the regions whose Human Development Index (HDI) is below the national HDI, so efforts to increase human potential in the production process need to focus on KTI. The categorization of regencies/municipalities in KTI yields imbalanced data, which shows that human development across regions in KTI is still uneven. For this reason, a classification of regencies/municipalities into HDI-based categories is needed that is both accurate and fast. The classification results are expected to help the government determine future strategic steps to improve the quality of human resources in KTI. One method for handling data imbalance is the Synthetic Minority Over-sampling Technique (SMOTE), applied here with three classification algorithms: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Random Forest (RF). With data imbalance handled and k-fold cross-validation applied, all three algorithms showed a significant increase in accuracy. Handling data imbalance is therefore shown to improve the performance of the applied classification algorithms.
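To illustrate the oversampling idea behind SMOTE, the following is a minimal sketch in plain Python, not the implementation used in the study. It creates each synthetic minority sample by interpolating between a minority point and one of its k nearest minority neighbours. The function names `smote` and `balance`, the default `k=5`, and the choice to oversample every class up to the size of the largest class are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Illustrative sketch only: `minority` is a list of numeric tuples that all
    belong to the same (minority) class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (excluding itself),
        # ranked by Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class (assumption)."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(tuple(row))
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [tuple(r) for r in X], list(y)
    for label, rows in by_class.items():
        if len(rows) < target:
            new_rows = smote(rows, target - len(rows), k=k, seed=seed)
            X_out.extend(new_rows)
            y_out.extend([label] * len(new_rows))
    return X_out, y_out
```

When oversampling is combined with k-fold cross-validation, a common precaution is to call something like `balance` on the training folds only, so that synthetic points derived from test samples do not leak into training. Whether the study followed this order is not stated in the abstract.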

DOI: https://doi.org/10.32520/stmsi.v13i1.2862

