Feature Extraction Optimization to Improve Naïve Bayes Accuracy in Sentiment Analysis of Bulukumba Tourism Objects

Darmawan Setiawan, Najirah Umar, M. Adnan Nur

Abstract


This research applies sentiment analysis to social media (Twitter) data to ascertain the degree of public satisfaction with tourist attractions in Bulukumba. Unstructured text data is a major challenge in sentiment analysis, and the Naïve Bayes algorithm is an effective approach to this challenge because it handles text data well. This study evaluates the accuracy of Multinomial Naïve Bayes under different combinations of the minimum document frequency (min-df) and maximum document frequency (max-df) parameters, which discard terms that appear in too few or too many documents, respectively. The analysis begins with collecting tweets related to Bulukumba tourist attractions. Preprocessing comprises data cleaning, case folding, text normalization, tokenization, stopword removal, and stemming. Features are extracted with a Count Vectorizer and weighted with TF-IDF. Finally, 10-fold cross-validation splits the data into training and test sets for sentiment classification, and the results are evaluated with a confusion matrix. The ten test scenarios combine min-df values of 0.001, 0.002, 0.005, 0.01, and 0.02 with max-df values of 0.5 and 0.8. The results show that the classification accuracy of Multinomial Naïve Bayes improves when min-df and max-df are set appropriately. The highest accuracy, 0.7910, was obtained with min-df 0.001 and max-df 0.8, while the highest average accuracy, 0.7272, was obtained with min-df 0.002 combined with max-df of either 0.5 or 0.8.
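
The abstract describes a pipeline of preprocessing, document-frequency filtering, TF-IDF weighting, Multinomial Naïve Bayes and k-fold cross-validation. As a minimal from-scratch sketch of how min-df and max-df act on the vocabulary, the Python below implements that pipeline with the standard library only. It is not the authors' code: the stopword list, the toy tweets and all function names are hypothetical; text normalization and stemming are omitted; min-df and max-df are treated as proportions of training documents (as float values are in scikit-learn's CountVectorizer); and the IDF formula is the smoothed one that scikit-learn's TfidfTransformer uses by default.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; the study's Indonesian stopword list is not given.
STOPWORDS = {"yang", "di", "dan", "ke", "sangat"}

# Hypothetical toy tweets about Bulukumba attractions (not the study's data).
TOY_TEXTS = [
    "Pantai Bira sangat indah dan bersih",
    "Pemandangan Tanjung Bira indah sekali",
    "Pantai Bira bersih dan nyaman",
    "Wisata Bulukumba indah",
    "Jalan ke pantai rusak dan kotor",
    "Pantai Bira kotor, banyak sampah",
    "Harga tiket mahal dan jalan rusak",
    "Tempat wisata kotor",
]
TOY_LABELS = ["positive"] * 4 + ["negative"] * 4


def preprocess(text):
    """Cleaning, case folding, tokenization, stopword removal (normalization and stemming omitted)."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())  # case fold, drop URLs and mentions
    text = re.sub(r"[^a-z\s]", " ", text)  # replace everything except a-z and whitespace
    return [tok for tok in text.split() if tok not in STOPWORDS]


def fit_vocab(docs, min_df, max_df):
    """Keep terms whose document frequency, as a proportion of docs, lies in [min_df, max_df].
    Returns {term: smoothed IDF} with idf = ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vocab = sorted(t for t, c in df.items() if min_df <= c / n <= max_df)
    return {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}


def tfidf(doc, idf):
    """Raw counts times IDF, L2-normalised; out-of-vocabulary terms are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def train_mnb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes with additive (Laplace) smoothing on TF-IDF weights."""
    priors, cond = {}, {}
    for c in sorted(set(labels)):  # sorted for a deterministic class order
        rows = [v for v, y in zip(vectors, labels) if y == c]
        priors[c] = math.log(len(rows) / len(labels))
        totals = defaultdict(float)
        for v in rows:
            for t, w in v.items():
                totals[t] += w
        denom = sum(totals.values()) + alpha * len(vocab)
        cond[c] = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
    return priors, cond


def predict(vec, priors, cond):
    """Return the class with the highest log posterior score."""
    scores = {c: priors[c] + sum(w * cond[c][t] for t, w in vec.items()) for c in priors}
    return max(scores, key=scores.get)


def cross_validate(texts, labels, min_df, max_df, k=10):
    """Interleaved (unshuffled, unstratified) k-fold CV. Vocabulary, IDF and model are
    fitted on the training folds only. Returns (mean fold accuracy, confusion counts)."""
    docs = [preprocess(t) for t in texts]
    if len(docs) < k:
        raise ValueError("need at least k documents")
    accs, confusion = [], Counter()
    for f in range(k):
        test_idx = list(range(f, len(docs), k))
        train_idx = [i for i in range(len(docs)) if i % k != f]
        idf = fit_vocab([docs[i] for i in train_idx], min_df, max_df)
        priors, cond = train_mnb([tfidf(docs[i], idf) for i in train_idx],
                                 [labels[i] for i in train_idx], list(idf))
        hits = 0
        for i in test_idx:
            pred = predict(tfidf(docs[i], idf), priors, cond)
            confusion[(labels[i], pred)] += 1  # (true, predicted) cell of the confusion matrix
            hits += pred == labels[i]
        accs.append(hits / len(test_idx))
    return sum(accs) / k, confusion


if __name__ == "__main__":
    # The ten min-df x max-df scenarios from the abstract; k=4 only because the toy set is tiny.
    for min_df in (0.001, 0.002, 0.005, 0.01, 0.02):
        for max_df in (0.5, 0.8):
            acc, _ = cross_validate(TOY_TEXTS, TOY_LABELS, min_df, max_df, k=4)
            print(f"min_df={min_df} max_df={max_df} mean accuracy={acc:.4f}")
```

On a corpus this small every min-df value in the grid is below 1/n, so only max-df can change the vocabulary; a proportional min-df only starts pruning rare terms once min-df × n exceeds 1. Fitting the vocabulary and IDF inside each fold, rather than once on all the data, keeps test tweets from influencing the features.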


DOI: https://doi.org/10.32520/stmsi.v13i5.4580



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.