Performance Analysis of Random Forest Using Attribute Normalization

Arie Nugroho, Abdullah Husin


Data mining extracts patterns from historical data to support future human activities. Its methods fall into several groups: classification, clustering, association, and forecasting. This study uses classification to learn a pattern from a dataset so that the pattern can be used to predict decisions for new data. A dataset used for classification must have a label, or class, for each record. Datasets in which the classes are unevenly represented (imbalanced datasets) can distort the resulting model and its predictions for new data. To address this problem, this research combines an ensemble method with pre-processing. The ensemble learning algorithm used is random forest, and the pre-processing step is attribute normalization, which converts nominal data to numeric data. Random forest is an extension of the decision tree, which produces a tree-shaped pattern showing the flow of the classification process. Here, random forest is trained on the data after attribute normalization has been applied. The study aims to apply attribute normalization together with the random forest algorithm to handle imbalanced datasets and to measure the resulting accuracy. It uses a public dataset from the UCI Repository, namely Car Evaluation. With 100 trees, the method reaches an accuracy of approximately 99% using 90% of the data for training and 10% for testing, and approximately 95.95% using 8-fold cross-validation.
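The attribute normalization step described above can be illustrated with a minimal Python sketch. It assumes an ordinal mapping in which each nominal value of the Car Evaluation attributes is replaced by its position in a natural ordering; the abstract does not state which numeric codes the authors assigned, so the codes, the `ORDERED_VALUES` table, and the `encode_row` helper are hypothetical and serve only as an illustration.

```python
from itertools import product

# Attribute values of the UCI Car Evaluation dataset, in the dataset's column order.
# ASSUMPTION: the integer codes (0, 1, 2, ...) follow the order listed here;
# the source does not say which codes the authors actually used.
ORDERED_VALUES = {
    "buying":   ["low", "med", "high", "vhigh"],
    "maint":    ["low", "med", "high", "vhigh"],
    "doors":    ["2", "3", "4", "5more"],
    "persons":  ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety":   ["low", "med", "high"],
}

# One lookup table per attribute: nominal value -> integer code.
ENCODERS = {
    name: {value: code for code, value in enumerate(values)}
    for name, values in ORDERED_VALUES.items()
}


def encode_row(row):
    """Convert one record of nominal strings into a list of integers."""
    return [ENCODERS[name][value] for name, value in zip(ORDERED_VALUES, row)]


# A single record, given in the dataset's column order.
example = ["vhigh", "med", "5more", "4", "big", "high"]
encoded = encode_row(example)
print(encoded)  # [3, 1, 3, 1, 2, 2]

# The dataset contains every combination of attribute values exactly once.
all_rows = [encode_row(r) for r in product(*ORDERED_VALUES.values())]
print(len(all_rows))  # 1728
```

Rows encoded this way are purely numeric and could then be passed to any random forest implementation configured with 100 trees, for example scikit-learn's `RandomForestClassifier(n_estimators=100)`; that library choice is also an assumption, since the abstract does not name the tool used.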





This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.