House Price Prediction using the Random Forest Regression Algorithm

Fika Halimah Balqis, Qurrotul Aini

Abstract


House price prediction is a complex problem because prices depend on many interacting factors, such as building quality, location, and living area. As a result, conventional estimation methods are often inaccurate. This study applies the Random Forest Regression (RFR) algorithm to predict house prices using the House Prices - Advanced Regression Techniques dataset from Kaggle, which contains 1,460 property records. The SEMMA (Sample, Explore, Modify, Model, Assess) methodology was adopted because its systematic, staged workflow improves the reliability of the resulting model. In the modeling stage, RFR was chosen because it can capture non-linear patterns and maintains stable performance even with a large number of features. In evaluation, the model achieved a Root Mean Squared Error (RMSE) of 28,452.75 and a coefficient of determination (R²) of 0.89 (89%). A subsequent robustness test yielded an RMSE of 30,665.40, close to the original value, indicating that the model is stable. Feature importance analysis showed that OverallQual (the dataset's rating of overall material and finish quality) had the greatest influence on the predicted price. These findings indicate that Random Forest Regression is a reliable method for predicting house prices and has strong potential for further development in price recommendation systems, automated property valuation, and integration into digital platforms within the real estate industry.
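The two evaluation metrics reported above can be illustrated with a minimal sketch. It assumes the standard definitions, RMSE as the square root of the mean squared residual and R² as 1 - SS_res / SS_tot. The function names and the four sample prices are hypothetical, chosen only for illustration; they are not the study's code or data.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical sale prices and predictions (illustrative values, not from the study).
actual = [200000.0, 150000.0, 320000.0, 275000.0]
predicted = [210000.0, 140000.0, 300000.0, 280000.0]

print(f"RMSE: {rmse(actual, predicted):,.2f}")  # RMSE: 12,500.00
print(f"R2:   {r2_score(actual, predicted):.4f}")
```

RMSE is expressed in the same unit as the target variable, so it reads as a typical prediction error in price terms, while R² is unitless and describes the share of price variance the model explains.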

Keywords


house prices; random forest regression; R²; RMSE


DOI: https://doi.org/10.32520/stmsi.v15i2.5726


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.