Performance Analysis of Feature Selection for Intrusion Detection Systems with the Random Forest Classifier Algorithm

Setiawan Budiman; Andi Sunyoto; Asro Nasiri

doi:10.32520/stmsi.v10i3.1550

Abstract

As data connections over the Internet become more important, the need for network security grows with them. One important tool is the intrusion detection system (IDS). A key problem with IDS is detection speed: an ever larger volume of data must be inspected in a short time. In this study we compare IDS performance with and without feature selection using the Random Forest Classifier algorithm, simulated on the UNSW-NB15 dataset, a simulated network attack dataset developed by Nour Moustafa and Jill Slay of the University of New South Wales at the Australian Defence Force Academy. The aim is to speed up the processing time of machine-learning-based intrusion detection. The research was conducted in two stages: the first without feature selection, and the second with feature selection using ExtraTreesClassifier. Each stage was tested several times with different percentages of training and testing data. The results show that feature selection speeds up detection with the Random Forest Classifier, at the cost of a slight decrease in accuracy of less than 1%.

Keywords: feature selection, random forest, IDS, machine learning
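
The two-stage comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, replaces UNSW-NB15 with a small synthetic dataset from make_classification, and wraps ExtraTreesClassifier in SelectFromModel with its default mean-importance threshold, which the abstract does not specify. The function names run_stage and compare, the 40-feature width, the tree counts, and the split percentages are all hypothetical choices made for the example.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_stage(X_train, X_test, y_train, y_test):
    """Train a Random Forest and return (accuracy, seconds for fit + predict)."""
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    return accuracy_score(y_test, predictions), elapsed


def compare(test_size):
    """Run both stages on one train/test split and return their results."""
    # Synthetic stand-in for UNSW-NB15 (assumption): 40 features, 8 informative.
    X, y = make_classification(
        n_samples=2000, n_features=40, n_informative=8, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0, stratify=y
    )

    # Stage 1: all features, no feature selection.
    baseline = run_stage(X_train, X_test, y_train, y_test)

    # Stage 2: keep features whose ExtraTrees importance is above the mean.
    # The selector is fitted on the training split only to avoid leakage.
    selector = SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))
    selector.fit(X_train, y_train)
    X_train_sel = selector.transform(X_train)
    X_test_sel = selector.transform(X_test)
    selected = run_stage(X_train_sel, X_test_sel, y_train, y_test)

    return {
        "n_features": (X.shape[1], X_train_sel.shape[1]),
        "baseline": baseline,
        "selected": selected,
    }


if __name__ == "__main__":
    # Illustrative test-set fractions; the abstract does not list the ones used.
    for test_size in (0.2, 0.3, 0.4):
        result = compare(test_size)
        print(test_size, result)
```

Timings on a toy dataset of this size are noisy, so the sketch is only meant to show the shape of the experiment: the same classifier and split are used in both stages, and only the feature set changes.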

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
