Comparison of Machine Learning Methods (Linear Regression, Random Forest, and XGBoost) for Predicting Poverty in Central Java in 2024

Zahwa Bunga Putri Pratama, Yani Parti Astuti

Abstract


Poverty remains a major issue in Central Java Province, with rates that fluctuate from year to year. Addressing it effectively calls for a predictive, data-driven approach. This study applies machine learning to forecast the number of people living in poverty in 2024 at the district/city level, using socio-economic data from 2019 to 2023 published by the Central Bureau of Statistics (BPS). Seven indicators serve as predictor variables: the poverty line, the number of people living in poverty, the percentage of people living in poverty, the open unemployment rate, average years of schooling, the Human Development Index, and the regional minimum wage. The data were standardized with StandardScaler and split into training (80%) and testing (20%) sets. Three regression algorithms (Linear Regression, Random Forest, and XGBoost) are compared to evaluate how well each models the complexity of socio-economic data. XGBoost performs best, with a Mean Absolute Error (MAE) of 6,665 and an R² score of 0.978, ahead of Random Forest (MAE: 9,209; R²: 0.947) and Linear Regression (MAE: 10,917; R²: 0.896). By comparing these models, the study addresses a gap in the literature on the effectiveness of machine learning for local-level poverty prediction. The findings suggest that XGBoost has strong potential as a data-driven policy support tool, particularly for poverty alleviation planning and decision-making at the regional level.
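For readers unfamiliar with the two evaluation metrics reported in the abstract, the following is a minimal sketch of how MAE and R² are defined. It is not the authors' code: the function names and the toy values are illustrative assumptions and are unrelated to the BPS data or to the reported scores.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values (made up for illustration only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print("MAE:", mean_absolute_error(observed, predicted))
print("R2:", round(r2_score(observed, predicted), 3))
```

A lower MAE means predictions are closer to the observed counts on average, in the same unit as the target (here, number of people), while an R² closer to 1 means the model explains more of the variation across districts and cities.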

Keywords


Poverty; Prediction; Machine Learning; Central Java






DOI: https://doi.org/10.32520/stmsi.v14i5.5494



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.