Optimizing Feature Selection in Sentiment Analysis of Bank Saqu: A Comparative Study of SVM and Random Forest using Information Gain and Chi-Square

Anelta Tirta Putri Subandono, Dhani Ariatmanto

Abstract


Choosing an appropriate feature selection method is crucial to the accuracy and efficiency of text classification models. Irrelevant features can degrade model performance, increase computational cost, and lead to overfitting. Although various feature selection techniques have been used in sentiment analysis, systematic comparisons of how well Information Gain and Chi-Square improve classification performance remain limited. This study evaluates the impact of these feature selection methods on the performance of Support Vector Machine (SVM) and Random Forest (RF) in sentiment analysis. Experiments were conducted under eight testing schemes, covering models without feature selection, with Information Gain, with Chi-Square, and with a combination of both. SVM with Chi-Square achieved the highest accuracy overall at 93%, while Random Forest achieved its best result, 91%, also with Chi-Square. These findings indicate that Chi-Square is more effective than Information Gain in improving accuracy, and that SVM outperforms Random Forest in this text classification task. In conclusion, the choice of feature selection method contributes substantially to the accuracy of text classification models. This research can serve as a reference for optimizing feature selection in the development of more accurate and efficient machine learning-based systems.
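To make the two scoring criteria concrete, the following minimal sketch computes textbook Chi-Square and Information Gain scores for individual terms from a 2x2 term/class contingency table. It is an illustration only, not the implementation used in this study: the toy corpus `DOCS`, the helper names (`contingency`, `chi_square`, `information_gain`, `top_k`), and the binary positive/negative labelling are all assumptions, and library implementations may differ in detail.

```python
import math

# Hypothetical toy reviews (NOT the study's data): (set of tokens, label).
DOCS = [
    ({"app", "fast", "good"}, "pos"),
    ({"good", "service"}, "pos"),
    ({"app", "easy", "good"}, "pos"),
    ({"app", "slow", "error"}, "neg"),
    ({"error", "login", "bad"}, "neg"),
    ({"app", "bad", "slow"}, "neg"),
]


def contingency(term, docs, positive="pos"):
    """Count documents by (term present/absent) x (positive/other class)."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in docs:
        present = term in tokens
        is_pos = label == positive
        if present and is_pos:
            n11 += 1
        elif present:
            n10 += 1
        elif is_pos:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def chi_square(term, docs):
    """Chi-Square statistic of the 2x2 term/class table (0.0 if degenerate)."""
    n11, n10, n01, n00 = contingency(term, docs)
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs):
    """Reduction in class entropy from knowing whether the term is present."""
    n11, n10, n01, n00 = contingency(term, docs)
    n = n11 + n10 + n01 + n00
    prior = entropy([n11 + n01, n10 + n00])
    conditional = ((n11 + n10) / n) * entropy([n11, n10]) \
        + ((n01 + n00) / n) * entropy([n01, n00])
    return prior - conditional


def top_k(score_fn, docs, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    vocab = sorted(set().union(*(tokens for tokens, _ in docs)))
    return sorted(vocab, key=lambda t: (-score_fn(t, docs), t))[:k]


if __name__ == "__main__":
    for name, fn in (("chi-square", chi_square), ("info gain", information_gain)):
        print(name, top_k(fn, DOCS, 3))
```

In this toy corpus a term such as "good", which appears only in positive documents, scores highly under both criteria, while "app", which is spread evenly across classes, scores zero. In a pipeline like the one described above, the terms kept by `top_k` would form the reduced vocabulary passed to the SVM or Random Forest classifier.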

Keywords


Sentiment Analysis; Support Vector Machine (SVM); Random Forest (RF); Chi-Square; Information Gain


DOI: https://doi.org/10.32520/stmsi.v14i3.5106


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.