Word Embedding Features to Improve Machine Learning Performance in Sentiment Analysis of the Honor of Kings Game

Abdul Harris, Agus Nugroho, Yudi Novianto, Jasmir Jasmir, Dhea Fatma

Abstract


The rapid growth of social media has prompted a growing number of sentiment analysis studies aimed at understanding public perceptions and opinions. This study evaluates the performance of three machine learning algorithms, Naïve Bayes, K-Nearest Neighbor (KNN), and Random Forest, in classifying the sentiment of user reviews of the game Honor of Kings. The dataset consists of 900 reviews collected from the Google Play Store. The data were preprocessed through cleaning, case folding, tokenization, stopword removal, and stemming, and then labeled as positive or negative. Three word embedding techniques, Word2Vec, GloVe, and FastText, were each tested with all three algorithms. The experimental results indicate that word embedding features significantly improve classification accuracy compared with models that do not use them. KNN combined with FastText achieved the best performance, with an accuracy of 87.55%, while Random Forest combined with FastText produced the lowest accuracy. FastText's strong performance is attributed to its ability to represent words through subword information, which makes it more effective at handling rare vocabulary and large-scale datasets. This study confirms that combining machine learning classifiers with word embedding features plays a crucial role in improving sentiment analysis performance. Future research may focus on hyperparameter optimization, more advanced preprocessing techniques, and dataset expansion to develop more robust models that generalize better.
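The subword idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes FastText's published scheme of character n-grams of length 3 to 6 wrapped in `<` and `>` boundary markers. The function names `char_ngrams` and `subword_overlap` are hypothetical, the example words are invented, and this is not the authors' code.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the set of character n-grams for one word, FastText style."""
    token = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    grams = {
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    }
    grams.add(token)  # the whole word is also kept as its own unit
    return grams


def subword_overlap(word_a, word_b):
    """Jaccard overlap of two words' n-gram sets (illustrative measure only)."""
    a, b = char_ngrams(word_a), char_ngrams(word_b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Trigrams only, to keep the printed set small.
    print(sorted(char_ngrams("where", 3, 3)))
    # A rare or informal variant still shares n-grams with a common word.
    print(subword_overlap("lagging", "laggy"))
    # Unrelated words share none.
    print(subword_overlap("lagging", "skins"))
```

FastText itself does not compute a Jaccard score; it learns a vector per n-gram and sums them to form the word vector. The overlap here only shows why a rare or misspelled review word can still receive a meaningful representation: it reuses n-grams seen in more frequent words.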

Keywords


FastText; GloVe; Machine Learning; Sentiment Analysis; Word2Vec



DOI: https://doi.org/10.32520/stmsi.v15i2.5850



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.