Comparison of Logistic Regression and Random Forest using Correlation-based Feature Selection for Phishing Website Detection
Abstract
Information technology is developing rapidly, especially during the current pandemic, which requires people to learn and even work online. This shift has triggered a great deal of crime on the internet. One example is the theft of internet user data through a fake website built to resemble the original, known as a phishing website. To counter the rise of phishing websites, this research builds a classification model for detecting them and compares two classification algorithms, logistic regression and random forest, to find the better performer. Classification performance can be improved by using the correlation-based feature selection (CFS) method to select the attributes most influential in detecting phishing websites. In testing, logistic regression and random forest achieved accuracies of 93.035% and 96.834%, respectively, on the complete attribute set. After feature selection with CFS, the accuracies were 92.718% and 97.015%, respectively: random forest improved by 0.181 percentage points, while logistic regression showed a small decrease of 0.317 percentage points. The results indicate that CFS can eliminate redundant attributes while keeping classification accuracy close to that obtained with the complete attribute set, and that random forest, the more accurate of the two algorithms, performs slightly better after CFS.
Keywords: website phishing, classification, logistic regression, random forest, correlation-based
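The idea behind CFS can be illustrated with a minimal sketch. It assumes the commonly cited merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. Pearson correlation, greedy forward search, and the toy data are assumptions made for illustration; the abstract does not state which correlation measure, search strategy, or tool was used, and the classifier training step is omitted.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(columns, labels, subset):
    """Merit of a feature subset: k * r_cf / sqrt(k + k*(k-1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(pearson(columns[i], labels)) for i in subset) / k
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(columns, labels):
    """Greedy forward search: add the feature that raises the merit most, stop when none does."""
    selected, best = [], 0.0
    remaining = set(range(len(columns)))
    while remaining:
        scored = [(cfs_merit(columns, labels, selected + [i]), i) for i in sorted(remaining)]
        merit, idx = max(scored, key=lambda t: t[0])  # ties go to the lowest index
        if merit <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = merit
    return selected, best


# Hypothetical toy data: 8 websites, label 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # feature 0: informative
    [1, 1, 1, 0, 0, 0, 0, 0],  # feature 1: exact copy of feature 0 (redundant)
    [1, 0, 1, 0, 1, 0, 1, 0],  # feature 2: uncorrelated with the label
    [0, 1, 1, 1, 0, 0, 0, 0],  # feature 3: informative, only partly overlaps feature 0
]

selected, merit = cfs_forward_select(columns, labels)
print("selected features:", selected, "merit:", round(merit, 4))
```

On this toy data the search keeps the two informative features and leaves out both the duplicate and the uncorrelated column, which is the behavior the abstract attributes to CFS. The reduced column set would then be passed to each classifier for comparison.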
DOI: https://doi.org/10.32520/stmsi.v12i1.1832
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.