Performance Evaluation of Parametric and Non-Parametric Machine Learning Models using Statistical Analysis for RT-IoT2022 Dataset

Sharmila B S; Nandini B M; Kavitha S S; Anand Srivatsa

doi:10.56042/jsir.v83i8.7437

Authors

Sharmila B S, Department of Electronics and Communication Engineering, The National Institute of Engineering, Mysuru 570 008, Karnataka, India
Nandini B M, Department of Information Science and Engineering, The National Institute of Engineering, Mysuru 570 008, Karnataka, India
Kavitha S S, Department of Electronics and Communication Engineering, The National Institute of Engineering, Mysuru 570 008, Karnataka, India
Anand Srivatsa, Department of Electronics and Communication Engineering, The National Institute of Engineering, Mysuru 570 008, Karnataka, India

Keywords:

Dataset, Feature extraction, IDS, Internet of things, Machine learning

Abstract

With the rapid growth of Internet of Things (IoT) applications, the number of connected devices is increasing exponentially. At the same time, cybercriminals are developing sophisticated new attacks to exploit IoT devices, and conventional Intrusion Detection Systems (IDS) that rely on an alert-based approach fail to detect these novel attacks. Machine Learning (ML) based IDS have the potential to spot even small mutations and new threats. This study applies several statistical techniques to the recently introduced RT-IoT2022 dataset: the Kolmogorov-Smirnov (KS) test, skewness and kurtosis tests, Pearson’s Correlation Coefficient (PCC), and Information Gain Ratio. The aim is to determine which machine learning algorithms are best suited to detecting vulnerabilities within this dataset. The skewness and kurtosis tests identify the features that contain outliers, and the KS test indicates that the dataset exhibits non-parametric characteristics. PCC and Information Gain Ratio are then applied to analyze the correlation between features and the target attack categories. Finally, parametric and non-parametric ML models are tested to validate the results of the statistical tests. The non-parametric Decision Tree algorithm achieves the highest accuracy among the models tested, 99.85%, and we conclude that non-parametric ML models are optimal for detecting mutant vulnerabilities.
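
As a rough illustration of the distribution checks named in the abstract, the following Python sketch computes skewness, excess kurtosis, and a one-sample KS test against a fitted normal distribution. It is not the authors' code: the helper names (skewness, excess_kurtosis, ks_normal) and the synthetic samples are assumptions made for illustration, and no RT-IoT2022 data is used. The KS p-value uses the asymptotic Kolmogorov series, which is only approximate when the mean and standard deviation are estimated from the same sample.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def skewness(xs):
    """Third standardized moment; close to 0 for symmetric data."""
    m, s = fmean(xs), pstdev(xs)
    return fmean([((x - m) / s) ** 3 for x in xs])


def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; close to 0 for normal data."""
    m, s = fmean(xs), pstdev(xs)
    return fmean([((x - m) / s) ** 4 for x in xs]) - 3.0


def ks_normal(xs):
    """One-sample KS test against a normal fitted to the sample.

    Returns (D, approximate p-value). Because the normal's parameters are
    estimated from the data, the p-value tends to be too large
    (conservative); the Lilliefors variant corrects for this.
    """
    n = len(xs)
    dist = NormalDist(fmean(xs), pstdev(xs))
    d = 0.0
    for i, x in enumerate(sorted(xs)):
        cdf = dist.cdf(x)
        # Largest gap between the empirical CDF steps and the fitted CDF.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = math.sqrt(n) * d
    if lam < 0.1:
        return d, 1.0  # series converges too slowly here; p is essentially 1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2.0 * series, 0.0), 1.0)


# Synthetic stand-ins (hypothetical, not RT-IoT2022 features).
rng = random.Random(0)
heavy = [rng.expovariate(1.0) for _ in range(2000)]  # right-skewed feature
bell = [rng.gauss(0.0, 1.0) for _ in range(2000)]    # roughly normal feature

for name, sample in (("heavy", heavy), ("bell", bell)):
    d_stat, p_value = ks_normal(sample)
    print(name, round(skewness(sample), 3), round(excess_kurtosis(sample), 3),
          round(d_stat, 4), p_value)
```

A feature with large skewness or excess kurtosis and a KS p-value below the chosen significance level would be treated as non-normal, which is the kind of evidence the abstract cites in favor of non-parametric models. In practice, scipy.stats provides skew, kurtosis, and kstest for the same purpose.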