ETLE Sentiment Analysis Performance Increasement with TF-IDF, MDI Feature Selection, and SVM

Muhammad Syiarul Amrullah, Aji Gautama Putrada, Mohamad Nurkamal Fauzan, Nur Alamsyah

Abstract


In Indonesia, the government, through the Indonesian National Police (POLRI), recently introduced a new regulation, Electronic Traffic Law Enforcement (ETLE). Under ETLE, traffic tickets are issued electronically through camera monitoring connected directly to the vehicle registration certificate (STNK) database. The government can measure public approval or disapproval of such policies through sentiment analysis. Previous studies have applied sentiment analysis to gauge the public response to ETLE; however, the existing model achieves an accuracy of only 0.42. This study proposes using a support vector machine (SVM), term frequency-inverse document frequency (TF-IDF), and mean decrease in impurity (MDI) for sentiment polarity analysis of the ETLE policy. First, we retrieve tweets about ETLE from Twitter. We then pre-process the text and remove stop words. Next, we apply TF-IDF. We compare two feature selection methods, MDI and recursive feature elimination (RFE), and two classification models, naïve Bayes and SVM. Among the metrics we use to evaluate the pre-processing stage are the probability density function (PDF) and the t-test. We use the bag of words (BoW) to evaluate the stop-word removal stage. Finally, we use sensitivity, specificity, and the receiver operating characteristic (ROC) curve to evaluate the feature selection and classification methods. The test results show that TF-IDF produces 1,022 new features. Combining these methods yields six models for comparison. SVM+TF-IDF+MDI performs best among the six, with accuracy and area under the curve (AUC) scores of 0.99 and 0.97, respectively.
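The evaluation metrics named in the abstract can be illustrated with a minimal sketch. The labels, scores, and function names below are hypothetical and chosen only for illustration; they are not the study's data or code. The sketch assumes binary labels (1 = positive sentiment, 0 = negative sentiment) and computes AUC as the fraction of positive-negative pairs that the classifier ranks correctly, which is equivalent to the area under the ROC curve.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): true labels and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

# Turn scores into hard predictions with an assumed threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens}, specificity={spec}, AUC={auc}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds. This is one reason the two kinds of metric are typically reported together when comparing classifiers.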



DOI: https://doi.org/10.32520/stmsi.v13i4.2701



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.