Enhancing Environmental Sound Classification with EcoOptiNet: A CNN-based Approach for Low-Frequency Signals

Kumud Patel; J.P. Pandey; Malay Kishore Dutta

doi:10.56042/jsir.v84i10.17030

Authors

Kumud Patel, Research Scholar
J.P. Pandey, Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India
Malay Kishore Dutta, Amity Centre for Artificial Intelligence University, Noida, Uttar Pradesh, India

Keywords:

Acoustic recognition, Deep-learning, Optimization algorithms, Signal processing, Time series analysis

Abstract

Time–frequency analysis is widely used to extract meaningful features from raw signals, but it is particularly challenging for low-frequency, non-stationary audio data. Deep neural networks offer strong feature-learning capabilities, yet their effectiveness depends on optimizers that can manage temporal variability and guide models toward stable convergence.
This research introduces EcoOptiNet, an improved optimization algorithm designed for low-frequency time-series audio signals. EcoOptiNet incorporates bias correction, an adaptive learning rate, and a learning warm-up phase (ζ) to prevent abrupt weight updates during early training. In addition, a cubed-gradient update strategy enhances learning for non-stationary signals. Audio signals are transformed into Mel-spectrograms and delta features, capturing both spectral and temporal characteristics, and a Convolutional Neural Network (CNN) is employed for classification.
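The abstract names the ingredients of the optimizer but does not give its update equations. The sketch below is therefore only one hypothetical reading, not the authors' algorithm: an Adam-style step in which the warm-up phase is assumed to be a linear ramp over the first `zeta` steps, and the cubed gradient is assumed to enter through a running average of |g|^3 that is normalized by its cube root. The function names, hyperparameter defaults, and the placement of the cubed term are all assumptions made for illustration.

```python
def init_state(n_params):
    """Create optimizer state for n_params scalar weights (hypothetical layout)."""
    return {"t": 0, "m": [0.0] * n_params, "v": [0.0] * n_params}


def sketch_step(w, grad, state, lr=0.01, beta1=0.9, beta2=0.999, zeta=10, eps=1e-8):
    """One update of an assumed EcoOptiNet-like rule; returns the new weights.

    Assumptions (not taken from the paper):
      * warm-up: the learning rate ramps linearly from lr/zeta to lr over zeta steps
      * cubed gradient: v tracks a running mean of |g|**3, normalized by its cube root
      * bias correction: the same form as in Adam
    """
    state["t"] += 1
    t = state["t"]
    step_size = lr * min(1.0, t / zeta)  # assumed warm-up schedule
    new_w = []
    for i, (wi, g) in enumerate(zip(w, grad)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * abs(g) ** 3
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = state["v"][i] / (1 - beta2 ** t)  # bias-corrected cubed moment
        new_w.append(wi - step_size * m_hat / (v_hat ** (1.0 / 3.0) + eps))
    return new_w


if __name__ == "__main__":
    # Toy problem: minimize f(w) = w^2, whose gradient is 2w.
    w, state = [1.0], init_state(1)
    for _ in range(50):
        w = sketch_step(w, [2.0 * w[0]], state)
    print(w)
```

Under these assumptions the cube-root normalization keeps the step roughly independent of the gradient scale, as in Adam, while the warm-up keeps the first few updates small.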
The algorithm is evaluated on two benchmark environmental sound datasets using standard splits: ESC-50 with 5-fold cross-validation and UrbanSound8K with 10-fold cross-validation. Experimental results indicate that EcoOptiNet achieves an average accuracy of 98.83% on ESC-50 and 91% on UrbanSound8K, outperforming commonly used optimizers such as Adam, RMSprop, and SGD, while maintaining low variance across folds. These findings demonstrate that EcoOptiNet provides an efficient and robust approach for optimizing deep neural networks on low-frequency, real-world audio signals. The study highlights the algorithm's ability to reliably extract discriminative features from challenging datasets, offering a practical solution for environmental sound recognition applications where non-stationarity and low-frequency components can hinder traditional training approaches.
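The fold-based evaluation described above can be sketched as follows. Both datasets ship with a predefined fold number per clip, so a standard split holds out one fold at a time and reports the mean and spread of the per-fold accuracies. The helper names `leave_one_fold_out` and `summarize` are hypothetical, and no accuracy values from the paper are reproduced here.

```python
from statistics import mean, pstdev


def leave_one_fold_out(fold_ids):
    """Yield (held_out_fold, train_indices, test_indices) for each predefined fold.

    fold_ids[i] is the fold number assigned to clip i by the dataset metadata.
    """
    for held_out in sorted(set(fold_ids)):
        train = [i for i, f in enumerate(fold_ids) if f != held_out]
        test = [i for i, f in enumerate(fold_ids) if f == held_out]
        yield held_out, train, test


def summarize(fold_accuracies):
    """Return (mean, population standard deviation) of the per-fold accuracies."""
    return mean(fold_accuracies), pstdev(fold_accuracies)


if __name__ == "__main__":
    # Ten placeholder clips spread over five folds, as in a 5-fold protocol.
    fold_ids = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
    for held_out, train, test in leave_one_fold_out(fold_ids):
        print(held_out, len(train), len(test))
```

A training loop would fit the model on the `train` indices, score it on the `test` indices, collect one accuracy per fold, and pass that list to `summarize`.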

Author Biographies

J.P. Pandey, Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India

Research Interests: Neural Network Applications in Power Systems, Electric Drive ERP Solution, Automation Process

Malay Kishore Dutta, Amity Centre for Artificial Intelligence University, Noida, Uttar Pradesh, India

Research Interests: Machine Learning, Computer Vision, Image Processing, Advanced Machine Learning, Object Recognition, Feature Selection, Digital Image