Optimization of Phishing Detection Performance with Variable Correlation Analysis and Imbalance Learning

Samsul Arifin; Fandy Setyo Utomo

doi:10.32520/stmsi.v15i2.4671

Abstract

Phishing is a common cybersecurity threat in which attackers try to deceive users into disclosing sensitive information such as passwords and credit card numbers. As technology advances, phishing techniques have become more sophisticated and harder to detect with traditional methods, so detection techniques that identify phishing websites with high accuracy are needed. This study aims to optimize phishing detection performance by combining variable correlation analysis for feature selection with imbalanced learning techniques that address class imbalance in the data. The research stages are Data Collection, Data Preprocessing, Data Exploration, Model Building and Training, and Model Evaluation. Data Exploration covers correlation analysis, removal of low-correlation features, and data visualization. In the Model Building and Training stage, the dataset is split into features and labels, followed by training and the application of data balancing techniques. The evaluated algorithms are Logistic Regression, Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Multi-Layer Perceptron, Decision Tree, Random Forest, Gradient Boosting, and CatBoost. The results show that KNN performs best, with an accuracy of 91.25%, Precision of 0.906943, Recall of 0.927858, F1-Score of 0.922141, and the lowest Hamming Loss, 0.0875. SVM recorded the lowest performance among the tested models. This method is expected to contribute to the development of more reliable and accurate phishing detection systems.
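The abstract does not give the correlation threshold, the balancing technique, or the averaging scheme behind the reported metrics. As a rough illustration only, the following Python sketch shows one way the three ideas could fit together: dropping features whose absolute Pearson correlation with the label is low, balancing classes by random oversampling, and computing the evaluation metrics. The function names, the 0.1 threshold, the choice of random oversampling, the toy data, and the treatment of phishing (label 1) as the positive class are all assumptions made for this sketch, not details taken from the study.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_features(rows, labels, threshold=0.1):
    """Return column indices whose |correlation with the label| >= threshold.

    The threshold value is an assumption; the abstract does not state one.
    """
    keep = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if abs(pearson(column, labels)) >= threshold:
            keep.append(j)
    return keep


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes have equal size.

    Random oversampling is an assumed stand-in for the unnamed balancing technique.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


def evaluate(y_true, y_pred, positive=1):
    """Binary metrics with `positive` as the positive class (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # For single-label data, Hamming loss is the fraction of wrong predictions.
        "hamming_loss": (len(pairs) - correct) / len(pairs),
    }


# Toy data (invented for illustration): 3 features, 2 phishing vs 6 legitimate rows.
rows = [[1, 5, 0], [1, 5, 1], [0, 5, 0], [0, 5, 1],
        [0, 5, 0], [0, 5, 1], [0, 5, 0], [0, 5, 1]]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

kept = select_features(rows, labels)            # low-correlation columns are dropped
reduced = [[row[j] for j in kept] for row in rows]
balanced_rows, balanced_labels = random_oversample(reduced, labels)
metrics = evaluate(labels, [1, 0, 0, 0, 0, 0, 0, 0])  # hypothetical predictions
print(kept, len(balanced_rows), metrics)
```

For single-label classification, Hamming loss equals one minus accuracy, which is consistent with the reported KNN figures: 1 − 0.9125 = 0.0875. In practice, oversampling is usually applied only to the training split so that duplicated rows do not leak into the evaluation data.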

Keywords

phishing detection; model performance optimization; machine learning

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.