Innovations in water quality management using machine learning approaches
DOI: https://doi.org/10.56042/ijems.v32i05.20867
Keywords: Groundwater contamination, K-fold cross validation, Machine learning models, Regional hydrology, Water quality index
Abstract
Groundwater contamination poses a significant threat to sustainable water resource management, particularly in industrialized and urbanized regions. This research introduces a novel, data-driven framework that integrates machine learning, statistical data analysis, and feature optimization to evaluate and forecast groundwater quality. Analytical results from 488 groundwater samples were examined, and four feature reduction scenarios were implemented using Pearson correlation to evaluate predictive performance with minimal input variables. Statistical analysis showed that parameters such as Electrical Conductivity, Chloride, Magnesium, and Total Hardness exceeded permissible limits, rendering most samples unsuitable for consumption without treatment. To enhance groundwater monitoring and reduce laboratory testing costs, six machine learning algorithms (K-Nearest Neighbors, Support Vector Machine, Decision Tree, Random Forest, XGBoost, and Artificial Neural Network) were used to predict the Weighted Arithmetic Water Quality Index. Model accuracy was assessed with the statistical metrics R², RMSE, MAE, MAPE, and CRMSE, and model effectiveness was compared using Taylor diagrams. The ANN achieved the highest accuracy even with a single input (potassium, K), while the SVM remained consistently reliable with only two inputs (magnesium, Mg, and potassium, K), offering a cost-effective monitoring solution. Validation with 70 independent datasets confirmed the robustness and applicability of the proposed methodology. The study presents an innovative modeling strategy that substantially reduces laboratory testing requirements while preserving predictive reliability, and it offers practical implications for scalable, cost-effective deployment in water-scarce regions or areas with insufficient data.
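The abstract names the Weighted Arithmetic Water Quality Index as the prediction target but does not give its formula. The sketch below follows the commonly cited formulation, assuming a quality rating q_i = 100 (V_i - V_o) / (S_i - V_o), unit weights w_i = K / S_i with K = 1 / Σ(1 / S_i), and WQI = Σ q_i w_i / Σ w_i. The function name, its parameters, and any permissible limits passed in are illustrative assumptions, not the study's own code or standards.

```python
from typing import Mapping, Optional


def weighted_arithmetic_wqi(
    measured: Mapping[str, float],
    standard: Mapping[str, float],
    ideal: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted Arithmetic WQI for one sample (illustrative sketch).

    measured: observed value V_i of each parameter in the sample
    standard: permissible limit S_i of each parameter (assumed inputs)
    ideal:    ideal value V_o of each parameter; defaults to 0
              (7.0 is commonly used for pH in this formulation)
    """
    ideal = ideal or {}
    # Proportionality constant K = 1 / sum(1 / S_i), so the unit weights sum to 1.
    k = 1.0 / sum(1.0 / standard[p] for p in measured)
    weighted_sum = 0.0
    weight_total = 0.0
    for p, v in measured.items():
        s = standard[p]
        v0 = ideal.get(p, 0.0)
        q = 100.0 * (v - v0) / (s - v0)  # quality rating q_i
        w = k / s                         # unit weight w_i
        weighted_sum += q * w
        weight_total += w
    return weighted_sum / weight_total
```

Parameters with stricter (smaller) limits receive larger weights: with hypothetical limits of 10 and 40, a sample sitting exactly at the first limit and at zero for the second scores 80 rather than the unweighted average of 50.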
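The feature reduction step is described only as using Pearson correlation. One plausible reading, sketched here purely as an assumption, ranks each candidate parameter by the absolute value of its Pearson coefficient with the WQI and keeps the top k; running the selection for several values of k would produce reduced-input scenarios. The function names and the top-k selection rule are hypothetical and are not taken from the study.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def rank_features(
    columns: Dict[str, Sequence[float]], target: Sequence[float]
) -> List[Tuple[str, float]]:
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    scored = [(name, pearson_r(vals, target)) for name, vals in columns.items()]
    # sorted() is stable, so ties keep the input order of the columns.
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


def top_k_features(
    columns: Dict[str, Sequence[float]], target: Sequence[float], k: int
) -> List[str]:
    """Names of the k parameters most strongly correlated with the target."""
    return [name for name, _ in rank_features(columns, target)[:k]]
```

Parameter names such as "Mg" or "K" would simply be keys of `columns`, each mapped to that parameter's measured values across the samples.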
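K-fold cross validation is listed among the keywords but not described in the abstract. The generator below is a minimal, dependency-free sketch of shuffled k-fold splitting; the function name, the default of 5 folds, and the seeding are assumptions, and in practice a library splitter such as scikit-learn's `KFold` would typically be used instead.

```python
import random
from typing import Iterator, List, Optional, Tuple


def kfold_indices(
    n: int, k: int = 5, seed: Optional[int] = None
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for each of k shuffled folds over n samples."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra sample so every index is tested exactly once.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

In each fold a model would be fitted on `train` and scored on `test`, with the fold scores averaged. For illustration, splitting 488 samples into k = 5 folds gives test folds of 98, 98, 98, 97, and 97 samples.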
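The evaluation metrics named in the abstract can be computed from paired observed and predicted WQI values as sketched below. This is an illustrative implementation, not the study's code: R² is assumed to be the coefficient of determination (1 - SS_res / SS_tot), MAPE is expressed in percent and assumes no observed value is zero, and CRMSE is the RMSE after subtracting each series' mean. The function also returns the standard deviations and correlation coefficient that position a model on a Taylor diagram; with population statistics these satisfy CRMSE² = σ_p² + σ_o² - 2 σ_p σ_o r.

```python
import math
from typing import Dict, Sequence


def regression_metrics(obs: Sequence[float], pred: Sequence[float]) -> Dict[str, float]:
    """Error metrics and Taylor-diagram statistics for observed vs predicted WQI."""
    n = len(obs)
    if n == 0 or n != len(pred):
        raise ValueError("obs and pred must be non-empty and of equal length")
    mo, mp = sum(obs) / n, sum(pred) / n
    err = [p - o for o, p in zip(obs, pred)]
    ss_res = sum(e * e for e in err)
    ss_tot = sum((o - mo) ** 2 for o in obs)
    # Population standard deviations and covariance (divide by n).
    sd_o = math.sqrt(ss_tot / n)
    sd_p = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    return {
        "R2": 1.0 - ss_res / ss_tot,  # coefficient of determination
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in err) / n,
        # MAPE in percent; raises ZeroDivisionError if any observation is 0.
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(err, obs)) / n,
        # Centred RMSE: RMSE of the mean-removed series (bias excluded).
        "CRMSE": math.sqrt(
            sum(((p - mp) - (o - mo)) ** 2 for o, p in zip(obs, pred)) / n
        ),
        "sd_obs": sd_o,
        "sd_pred": sd_p,
        "r": cov / (sd_o * sd_p),  # Pearson correlation (undefined if a series is constant)
    }
```

Because the mean bias is removed, predictions offset from the observations by a constant have a CRMSE of 0 but a nonzero RMSE, which is why the two are reported together.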