Evaluation of Predictive Models for Air Quality Index Prediction in an Indian Urban Area
Keywords:
Air Pollution, Air Quality Index, Air Quality Prediction, Machine Learning Models, Random Forest, Linear Regression
Abstract
With rapid urbanization, air quality in cities has deteriorated because of increased emissions, and the growing load of pollutants in the atmosphere severely affects urban life. Ambient air quality in a city is reported as an Air Quality Index (AQI) value by the Continuous Ambient Air Quality Monitoring Stations (CAAQMS) located there. This study examines the prediction of AQI using machine learning models. The data used in this study were obtained from the Central Pollution Control Board (CPCB) for Gorakhpur City, Uttar Pradesh, India. Particulate matter (PM10 and PM2.5), SO2, and NO2 were considered as the primary pollutant parameters for the AQI. The study presents a statistical comparison of two machine learning models, namely Linear Regression and Random Forest, for predicting AQI. The prediction accuracy of the models was validated and evaluated using the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and coefficient of determination (R²). In addition, t-statistics, 95% confidence intervals, and p-values were computed to determine whether the differences between the models are statistically significant. The R² of Random Forest (0.99895) was significantly higher than that of Linear Regression (0.91848), indicating that Random Forest predicts AQI with high accuracy and explains nearly all of the variance in the observed values. Random Forest was also more accurate than Linear Regression by the error metrics, as shown by the higher MAE, MSE, and RMSE of the latter. The t-statistics, p-values, and 95% confidence intervals calculated for MAE, MSE, RMSE, and R² demonstrate statistically significant differences between the two models, and the confidence intervals for all evaluation metrics indicate that Random Forest outperforms Linear Regression.
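The abstract does not state how the t-statistics, p-values, and 95% confidence intervals were obtained. The sketch below shows one plausible reading, assuming a paired t-test over per-fold scores from k-fold cross-validation; this is an assumption, not the authors' stated procedure. It uses only the Python standard library: the four evaluation metrics follow their standard definitions, and the two-sided p-value and critical value come from the closed-form Student's t distribution for integer degrees of freedom. The helper names (`regression_metrics`, `t_abs_cdf`, `t_critical`, `paired_t_test`) and every number in the usage block are hypothetical. Model fitting is not shown; it could be done with, e.g., scikit-learn's `LinearRegression` and `RandomForestRegressor`.

```python
import math
from statistics import mean, stdev


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R^2 for paired lists of observed and predicted AQI."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # 1 - SS_res / SS_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


def t_abs_cdf(t, df):
    """P(|T| <= t) for Student's t with integer df, via the finite-series
    form given in Abramowitz & Stegun, Chapter 26."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        # Even df: sin(theta) * sum of cos^(2k)(theta) * (1*3*...)/(2*4*...)
        term, total = 1.0, 1.0
        for k in range(1, (df - 2) // 2 + 1):
            term *= c * c * (2 * k - 1) / (2 * k)
            total += term
        return s * total
    if df == 1:
        return 2.0 * theta / math.pi  # Cauchy case
    # Odd df > 1: (2/pi) * (theta + sin(theta) * sum of odd cosine powers)
    term, total = c, c
    for k in range(1, (df - 3) // 2 + 1):
        term *= c * c * (2 * k) / (2 * k + 1)
        total += term
    return 2.0 / math.pi * (theta + s * total)


def t_critical(df, level=0.95):
    """Two-sided critical value t* with P(|T| <= t*) = level, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_abs_cdf(hi, df) < level:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if t_abs_cdf(mid, df) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def paired_t_test(scores_a, scores_b, level=0.95):
    """Paired t-test on per-fold scores of two models evaluated on the same folds.
    Returns the t-statistic, two-sided p-value and CI for the mean difference (a - b).
    Assumes the differences are not all identical (otherwise the standard error is 0)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    t_stat = d_bar / se
    df = n - 1
    p_value = 1.0 - t_abs_cdf(t_stat, df)
    half_width = t_critical(df, level) * se
    return {"t": t_stat, "p": p_value, "ci": (d_bar - half_width, d_bar + half_width)}


if __name__ == "__main__":
    # Toy, hypothetical values for illustration only (not data from this study).
    observed = [120.0, 95.0, 180.0, 60.0, 150.0]
    predicted = [118.0, 99.0, 172.0, 65.0, 149.0]
    print(regression_metrics(observed, predicted))

    # Hypothetical per-fold RMSE of two models from 5-fold cross-validation.
    rmse_model_a = [14.2, 15.1, 13.8, 16.0, 14.7]
    rmse_model_b = [3.1, 2.9, 3.4, 3.0, 3.2]
    print(paired_t_test(rmse_model_a, rmse_model_b))
```

Pairing the scores by fold is the key design choice: both models are scored on the same splits, so the paired test removes fold-to-fold variation that an unpaired comparison would treat as noise. SciPy's `scipy.stats.ttest_rel` performs the same paired test if SciPy is available.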