Optimization of the Naive Bayes Algorithm with SMOTETomek Combination for Imbalance Class Fraud Detection

Arief Tri Arsanto; Arif Faizin; Moch Lutfi; Zulfatun Nikmatus Saadah

doi:10.32520/stmsi.v13i6.4719

Abstract

Credit card use continues to grow, and fraud must be prevented with technologies such as address verification systems (AVS), card verification methods (CVM), and personal identification numbers (PIN). Transaction histories also need to be analyzed to detect fraudulent activity. In the fraud detection dataset, the class distribution is imbalanced: one class (the majority) contains many more samples than the other (the minority). Class imbalance is a significant problem in machine learning because it can degrade overall model performance. This research focuses on card fraud classification and uses Naive Bayes as the classification algorithm. Because the performance of Naive Bayes suffers under class imbalance, a combined resampling method, SMOTETomek, is proposed to improve it. SMOTETomek pairs SMOTE oversampling, which generates synthetic minority samples, with Tomek-link cleaning, which removes samples that form pairs of adjacent points from the minority and majority classes. Judged by the ROC curve and its AUC, SMOTETomek improved the performance of the Naive Bayes algorithm; a higher ROC-AUC score indicates a better ability to distinguish between the two classes. Accuracy, however, did not change significantly.
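The resampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (smote, tomek_links, smote_tomek), the toy data, and the choice to remove both members of each Tomek link are assumptions made for illustration, and only the Python standard library is used. In practice a library such as imbalanced-learn, which provides a SMOTETomek class, would normally be used instead.

```python
import math
import random


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to point (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by interpolating between a
    minority point and one of its k nearest minority neighbors.
    Assumes at least two minority points."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbors
    and carry different class labels."""
    def nn(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    links = []
    for i in range(len(X)):
        j = nn(i)
        if i < j and y[i] != y[j] and nn(j) == i:
            links.append((i, j))
    return links


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority size, then drop
    every sample that takes part in a Tomek link (assumption: both
    members of a link are removed)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, rng)
    X_res = list(X) + new_points
    y_res = list(y) + [minority_label] * len(new_points)
    drop = {idx for pair in tomek_links(X_res, y_res) for idx in pair}
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]


# Hypothetical toy data: six "legitimate" points (0) and two "fraud" points (1).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.0),
     (1.0, 1.0), (1.1, 0.9)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_res, y_res = smote_tomek(X, y)
print("class counts after resampling:", y_res.count(0), y_res.count(1))
```

The resampled data would then be used to train the classifier, with the test set left untouched so that ROC-AUC is measured on the original class distribution.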

DOI: https://doi.org/10.32520/stmsi.v13i6.4719

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
