Accuracy Evaluation of the Naïve Bayes Classifier for Sentiment Classification of Diploma Authenticity Issues using Orange

Eunike Loise Laapen, Indrastanti Ratna Widiasari

Abstract


The rapid growth of digital text data has increased the demand for effective methods of extracting meaningful information, particularly for understanding public opinion. Sentiment analysis is widely used to classify opinions as positive, negative, or neutral. However, linguistic ambiguity, subjectivity, and class imbalance often degrade classification performance. This study evaluates the Naïve Bayes algorithm for sentiment classification on the issue of diploma authenticity using a publicly available dataset, and examines how class distribution affects model performance. A quantitative experimental approach was applied to two versions of the data: the original dataset of 1,014 instances and an oversampled dataset of 1,767 instances. The data went through preprocessing, Bag-of-Words feature extraction, and sentiment classification in Orange Data Mining with 10-fold cross-validation. Model performance was measured by accuracy and the Area Under the Receiver Operating Characteristic Curve (AUC). On the original imbalanced dataset, the Naïve Bayes model achieved an accuracy of 37.2% and an AUC of 0.704, reflecting relatively poor classification performance. After oversampling was applied to balance the class distribution, accuracy rose to 82.1% and AUC to 0.970. These findings show that class distribution strongly affects the performance of the Naïve Bayes algorithm in sentiment classification and highlight the importance of addressing class imbalance to obtain more reliable results.
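The oversampling step described in the abstract can be illustrated with a minimal sketch of random oversampling, in which minority-class items are duplicated until every class matches the largest one. This is an assumption about the technique: the abstract does not say which oversampling method or Orange widget was used. The function name `random_oversample` and the per-class counts are hypothetical; only the totals of 1,014 and 1,767 come from the abstract. Since 1,767 = 3 × 589, the totals are consistent with three classes each raised to 589 instances, which is an inference rather than a reported figure.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class items until every class
    has as many instances as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)

    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels


# Hypothetical per-class split: only the totals (1,014 and 1,767) are from
# the abstract; which class is largest and by how much is assumed here.
counts = {"negative": 589, "neutral": 300, "positive": 125}
texts, labels = [], []
for label, n in counts.items():
    texts.extend(f"{label} text {i}" for i in range(n))
    labels.extend([label] * n)

balanced_texts, balanced_labels = random_oversample(texts, labels)
print(len(labels), "->", len(balanced_labels))
print(Counter(balanced_labels))
```

With the assumed split, the sketch grows the data from 1,014 to 1,767 instances. One design point worth noting: if duplicates are created before the cross-validation folds are drawn, copies of the same text can land in both training and test folds, which may inflate accuracy and AUC. The abstract does not state whether oversampling was applied before or inside the folds.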

Keywords


imbalanced dataset; naïve bayes; oversampling; sentiment analysis; text mining

DOI: https://doi.org/10.32520/stmsi.v15i6.6585

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.