Learning From Few Cyber-Attacks: Addressing the Class Imbalance Problem in Machine Learning-Based Intrusion Detection in Software-Defined Networking

Seyed Mohammad Hadi Mirsadeghi, Hayretdin Bahsi, Risto Vaarandi, Wissem Inoubli

Research output: Contribution to journalArticlepeer-review

4 Scopus citations

Abstract

The class imbalance problem negatively impacts learning algorithms' performance in minority classes which may constitute more severe attacks than the majority ones. This study investigates the benefits of balancing strategies and imbalanced learning approaches on intrusion data from Software Defined Networking (SDN). Although the research community has covered the imbalance problem in machine learning-based intrusion detection, addressing this problem in SDN is novel and powerful. Addressing the class imbalance problem over InSDN (the only publicly available SDN intrusion detection dataset as of recent) is of significant impact on future research in the area of intrusion detection in SDN. We address the class imbalance problem through data-level and classifier-level techniques. Our research objective is to determine suitable methods of addressing the class imbalance problem in machine learning-based intrusion detection in SDN. We propose custom deep learning architectures based on GANs and Siamese Neural Networks for generative modeling and similarity-based intrusion detection. This paper provides benchmarking results from classification with Random Oversampling (ROS), SMOTE, GANs, weighted Random Forest, and Siamese-based one-shot learning. We have found that Random Forest (RF) outperforms deep learning models in the classification of minority class instances. This supports the notion that RF can handle class imbalance well. We also observe that widely-used balancing techniques, ROS and SMOTE, drastically decrease the False Positive Rate (FPR) but increase the False Negative Rate (FNR) in the classification of minority classes. Conclusively, while data-level methods improve classification performance over deep learning models, they, in fact, degrade RF's performance, i.e. cause higher numbers of false predictions. Therefore, RF does not need additional balancing strategies to get higher performance. Although this work addresses the class imbalance problem in SDN intrusion data, it provides a well-designed benchmark that can be exemplary for any network intrusion detection data. Thus, it may have a significant impact on future studies in this respective domain.

Original languageEnglish (US)
Pages (from-to)140428-140442
Number of pages15
JournalIEEE Access
Volume11
DOIs
StatePublished - 2023
Externally publishedYes

Keywords

  • Class imbalance problem
  • cyber intrusion detection
  • deep learning
  • machine learning
  • software-defined networking

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering

Fingerprint

Dive into the research topics of 'Learning From Few Cyber-Attacks: Addressing the Class Imbalance Problem in Machine Learning-Based Intrusion Detection in Software-Defined Networking'. Together they form a unique fingerprint.

Cite this