Feature Selection for Biomedical Data Classification: Statistical vs. Swarm Intelligence Methods

Ulfeta Marovac; Aldina Avdić; Irfan Fetahović; Lejlija Memić; Nataša Đorđević; Zana Dolićanin; Goran Babić

doi:10.56042/jsir.v84i6.13842

Authors

Ulfeta Marovac, Department of Technical and Technological Sciences, State University of Novi Pazar, Vuka Karadžića 9, 36300 Novi Pazar, Serbia
Aldina Avdić, Department of Technical and Technological Sciences, State University of Novi Pazar, Vuka Karadžića 9, 36300 Novi Pazar, Serbia
Irfan Fetahović, Department of Technical and Technological Sciences, State University of Novi Pazar, Vuka Karadžića 9, 36300 Novi Pazar, Serbia
Lejlija Memić, Department of Technical and Technological Sciences, State University of Novi Pazar, Vuka Karadžića 9, 36300 Novi Pazar, Serbia
Nataša Đorđević, Department of Natural and Mathematical Sciences, State University of Novi Pazar, Vuka Karadžića 9, 36300 Novi Pazar, Serbia
Zana Dolićanin, Department of Biomedical Sciences, State University of Novi Pazar, Vuka Karadžića 9, 36300 Novi Pazar, Serbia
Goran Babić, Faculty of Medical Sciences, University of Kragujevac, Svetozara Markovića 69, 34000 Kragujevac, Serbia

Keywords:

Biomedical data classification, Feature selection, Machine learning, Swarm intelligence

Abstract

Applying machine learning methods to large datasets with numerous features presents challenges in terms of training time and model complexity. Feature selection is crucial for reducing data dimensionality, improving classification accuracy, and improving model interpretability. This study aims to improve the classification of integrated biomedical data for thrombophilia diagnosis. The dataset consists of 71 features from 35 women (22 healthy, 13 with thrombophilia), and three classification algorithms (K-Nearest Neighbors, Random Forest, Support Vector Machine) are used to evaluate model performance. Key features related to thrombophilia diagnosis are identified using both filter methods and wrapper methods based on swarm intelligence algorithms, and the two approaches are analyzed and compared as candidates for the feature selection process. For clinical and biological data, the wrapper method outperformed the filter methods, achieving a classification accuracy of 0.97 compared to 0.91 while selecting only 4 key features compared to 10. For demographic data, both methods produced the same classification accuracy (0.83), but the wrapper method selected fewer features. These findings indicate that wrapper methods based on swarm intelligence algorithms improve model performance and facilitate more efficient data management, which has practical applications for thrombophilia diagnostics. Additionally, the study highlights the advantage of applying the Bat Algorithm in the feature selection process for thrombophilia prediction, contributing to both the novelty and utility of the approach.
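
To make the filter versus wrapper distinction concrete, the following is a minimal sketch and not the authors' pipeline. Everything in it is an assumption made for illustration: the data are synthetic (only the shape, 35 samples with 71 features and a 22/13 class split, mirrors the abstract), the filter score is a simple t-like statistic, the wrapper evaluates subsets with leave-one-out 1-nearest-neighbour accuracy, and the search is a plain bit-flip hill climb standing in for a swarm intelligence optimizer. The Bat Algorithm itself is not implemented here, and all function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def make_toy_data(n_features=71, n_informative=4, seed=0):
    """Synthetic stand-in data: 22 rows of class 0 and 13 rows of class 1."""
    rng = random.Random(seed)
    y = [0] * 22 + [1] * 13
    X = []
    for label in y:
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):  # only the first few features carry signal
            row[j] += 2.0 * label
        X.append(row)
    return X, y


def filter_scores(X, y):
    """Filter step: score each feature on its own, without any classifier."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(a) + pstdev(b) + 1e-12
        scores.append(abs(mean(a) - mean(b)) / spread)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def loo_knn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on a subset."""
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in features)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, subset, penalty=0.01):
    """Wrapper objective: classifier accuracy minus a small cost per feature."""
    feats = sorted(subset)
    return loo_knn_accuracy(X, y, feats) - penalty * len(feats)


def wrapper_search(X, y, start, n_iter=200, seed=1):
    """Bit-flip hill climb over feature subsets (stand-in for a swarm search)."""
    rng = random.Random(seed)
    current = set(start)
    best_fit = fitness(X, y, current)
    for _ in range(n_iter):
        candidate = current ^ {rng.randrange(len(X[0]))}  # flip one feature in or out
        fit = fitness(X, y, candidate)
        if fit >= best_fit:
            current, best_fit = candidate, fit
    return sorted(current)


X, y = make_toy_data()
filter_sel = top_k(filter_scores(X, y), 10)
wrapper_sel = wrapper_search(X, y, filter_sel)
for name, sel in (("filter", filter_sel), ("wrapper", wrapper_sel)):
    print(name, len(sel), "features, LOO accuracy", round(loo_knn_accuracy(X, y, sel), 2))
```

The design point the sketch is meant to show is that a filter ranks features independently of any classifier, whereas a wrapper scores whole subsets by the classifier's own accuracy, which allows it to trade accuracy against subset size. A swarm method such as the Bat Algorithm would replace the hill climb with a population of candidate subsets, and the numbers printed on synthetic data say nothing about the results reported in the abstract.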