Phishing Website Detection Using Machine Learning Classification Methods

Azzam Fawwaz Mahmud, Setia Wirawan

Abstract


Phishing websites are a criminal mechanism that uses social engineering and technical subterfuge to steal personal identity data and financial account credentials from customers. In Indonesia, the Indonesian Internet Domain Name Registry (Pengelola Nama Domain Internet Indonesia, Pandi) recorded 34,622 phishing cases over the last five years. A total of 7,988 unique phishing attacks were reported in Q3 2022. This study aims to identify the machine learning classification algorithm that performs best at detecting phishing websites from URL features. The algorithms compared are decision tree, random forest, and K-Nearest Neighbors (KNN). The first model, based on decision tree, achieved an accuracy of 0.833, a precision of 0.86, a recall of 0.83, and an F1-score of 0.83. The second model, based on random forest, achieved an accuracy of 0.834, a precision of 0.86, a recall of 0.83, and an F1-score of 0.83. The last model, based on KNN, achieved an accuracy of 0.482, a precision of 0.24, a recall of 0.50, and an F1-score of 0.48. Of the three algorithms, random forest therefore performed best at detecting phishing websites, although its accuracy exceeds that of decision tree by only 0.001.
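The four metrics reported above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `classification_metrics`, the label encoding (1 for phishing, 0 for legitimate), and the toy labels are all assumptions made for illustration, and the sketch scores a single positive class, whereas the abstract does not state which averaging scheme the study used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Assumption: labels are encoded as 1 (phishing) and 0 (legitimate).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count every (true label, predicted label) pair once.
    pairs = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in pairs.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in pairs.items() if t == p)

    total = len(y_true)
    accuracy = correct / total if total else 0.0
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical, not taken from the study's dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can hide a model that favors one class, which is why precision, recall, and F1 are reported alongside it when comparing classifiers.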


DOI: https://doi.org/10.32520/stmsi.v13i4.3456


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.