Feature Selection for Biomedical Data Classification: Statistical vs. Swarm Intelligence Methods
DOI: https://doi.org/10.56042/jsir.v84i6.13842
Keywords: Biomedical data classification, Feature selection, Machine learning, Swarm intelligence
Abstract
Applying machine learning methods to large datasets with numerous features presents challenges in training time and model complexity. Feature selection is crucial for reducing data dimensionality, improving classification accuracy, and improving model interpretability. This study aims to improve the classification of integrated biomedical data for thrombophilia diagnosis. The dataset consists of 71 features from 35 women (22 healthy, 13 with thrombophilia), and three classification algorithms (K Nearest Neighbors, Random Forest, Support Vector Machine) are used to evaluate model performance. Key features related to thrombophilia diagnosis are identified using both filter methods and wrapper methods based on swarm intelligence algorithms. These methods are analyzed and compared as candidate approaches to feature selection. The wrapper method outperformed the filter methods for clinical and biological data, achieving a classification accuracy of 0.97 versus 0.91 while selecting only 4 key features instead of 10. For demographic data, both methods produced the same classification accuracy (0.83), but the wrapper method selected fewer features. These findings demonstrate that wrapper methods based on swarm intelligence algorithms improve model performance and facilitate more efficient data management, which has practical value for thrombophilia diagnostics. Additionally, the study highlights the advantage of applying the Bat Algorithm in the feature selection process for thrombophilia prediction, contributing to both the novelty and utility of the approach.
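To make the wrapper idea concrete, the following is a minimal sketch of a simplified binary bat-style search that scores feature subsets by the leave-one-out accuracy of a k-nearest-neighbors classifier. It is not the study's implementation: the abstract does not give the Bat Algorithm variant, its parameters, or the fitness function. The data are synthetic (only the class sizes 22 and 13 mirror the abstract), and every name and default here (`make_synthetic`, `fitness`, `binary_bat_select`, `alpha`, `loudness`, `pulse_rate`) is a hypothetical choice for illustration.

```python
import math
import random


def make_synthetic(n_neg=22, n_pos=13, n_feat=8, seed=1):
    """Synthetic stand-in data: only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label, count in ((0, n_neg), (1, n_pos)):
        for _ in range(count):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
            row[0] += 3.0 * label
            row[1] += 3.0 * label
            X.append(row)
            y.append(label)
    return X, y


def loo_knn_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only features with mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        neighbours = sorted(
            (sum((X[i][j] - X[t][j]) ** 2 for j in cols), y[t])
            for t in range(len(X)) if t != i
        )
        votes = sum(label for _, label in neighbours[:k])
        pred = 1 if 2 * votes > k else 0  # majority vote over binary labels
        correct += int(pred == y[i])
    return correct / len(X)


def fitness(X, y, mask, alpha=0.01):
    """Assumed objective: accuracy minus a small penalty per selected feature."""
    return loo_knn_accuracy(X, y, mask) - alpha * sum(mask)


def binary_bat_select(X, y, n_bats=10, n_iter=30, loudness=0.9,
                      pulse_rate=0.5, seed=0):
    """Simplified binary bat-style search over 0/1 feature masks."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    bats = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_bats)]
    vel = [[0.0] * n_feat for _ in range(n_bats)]
    fit = [fitness(X, y, b) for b in bats]
    best_i = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = bats[best_i][:], fit[best_i]
    for _ in range(n_iter):
        for i in range(n_bats):
            freq = rng.random()  # pulse frequency drawn from [0, 1)
            cand = []
            for j in range(n_feat):
                # Velocity pulls each bit toward the best mask found so far.
                vel[i][j] += (best[j] - bats[i][j]) * freq
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))  # sigmoid transfer
                cand.append(1 if rng.random() < prob_one else 0)
            if rng.random() > pulse_rate:
                # Local move: copy the best mask and flip one random bit.
                cand = best[:]
                flip = rng.randrange(n_feat)
                cand[flip] = 1 - cand[flip]
            cand_fit = fitness(X, y, cand)
            if cand_fit > fit[i] and rng.random() < loudness:
                bats[i], fit[i] = cand, cand_fit
            if cand_fit > best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit


X, y = make_synthetic()
mask, score = binary_bat_select(X, y)
print("selected features:", [j for j, m in enumerate(mask) if m])
print("LOO accuracy:", round(loo_knn_accuracy(X, y, mask), 3))
```

The penalty term `alpha` is what pushes the search toward small subsets. A filter method would instead rank each feature by a statistic computed independently of the classifier and keep the top-ranked ones.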