TF-IDF Weighting to Detect Spammer Accounts on Twitter based on Tweets and Retweet Representation of Tweets

Arif Mudi Priyatno; Lidya Ningsih

doi:10.32520/stmsi.v11i3.1995

Abstract

Twitter is a popular social media service used for communication between users. Its popularity attracts spammers, who spam for personal purposes and gain. Bot spammers are a form of user abuse on Twitter: they spread spam repeatedly to other users, with the aim of reaching the trending topics. Spam activity imitates the behavior patterns of real users so that it is not detected as Twitter abuse. This paper proposes TF-IDF weighting to detect spammer accounts on Twitter based on tweets and the retweet representation of tweets. The purpose of the study is to classify accounts as bot spammers or humans using the Naive Bayes algorithm. The best experimental result, obtained with a split of 70% training data and 30% test data, was 92% accuracy, with precision and recall of 100% and 87.5%, respectively. This indicates that the approach successfully detects spammer accounts on Twitter.
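The pipeline named in the abstract (TF-IDF weighting followed by Naive Bayes classification, evaluated by precision and recall) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each account is represented as one list of tokens drawn from its tweets and retweets, uses a smoothed IDF and a multinomial Naive Bayes with Laplace smoothing, and runs on invented toy data. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_idf(docs):
    """Inverse document frequency per term (the +1 smoothing is an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log(n / df[t]) + 1.0 for t in df}


def transform(doc, idf):
    """TF-IDF vector for one tokenized document; unseen terms are ignored."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items() if t in idf}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with Laplace smoothing."""
    vocab = {t for v in vectors for t in v}
    class_docs = Counter(labels)
    weight = defaultdict(Counter)
    for v, y in zip(vectors, labels):
        weight[y].update(v)  # accumulate per-class term weights
    model = {}
    for y in class_docs:
        total = sum(weight[y].values()) + alpha * len(vocab)
        log_prior = math.log(class_docs[y] / len(labels))
        log_lik = {t: math.log((weight[y][t] + alpha) / total) for t in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, vector):
    """Return the class with the highest log-posterior score."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(w * log_lik[t] for t, w in vector.items())
    return max(model, key=score)


def precision_recall(y_true, y_pred, positive="spammer"):
    """Precision and recall for the positive class (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy accounts, for illustration only (not data from the paper).
train_docs = [
    "win free followers click link now".split(),
    "free followers free giveaway click now".split(),
    "having coffee with friends this morning".split(),
    "reading a good book this weekend".split(),
]
train_labels = ["spammer", "spammer", "human", "human"]

idf = fit_idf(train_docs)
model = train_nb([transform(d, idf) for d in train_docs], train_labels)

test_docs = [
    "click now for free followers".split(),
    "coffee with friends this weekend".split(),
]
predictions = [predict_nb(model, transform(d, idf)) for d in test_docs]
print(predictions)
```

In this sketch the IDF values are fitted on the training accounts only and reused for the test accounts, which mirrors a train/test split such as the 70%/30% division described above.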

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
