Air Quality Index Classification for Imbalanced Data using Machine Learning Approach

Bryan Valentino Jayadi, Manatap Dolok Lauro, Zyad Rusdi, Teny Handhayani


Air pollution is one of the problems facing society; it harms both human health and the environment. In Indonesia, the air quality index is determined from the levels of particulate matter (PM10), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), and nitrogen dioxide (NO2). This research evaluates the performance of five machine learning algorithms, namely Support Vector Machine (SVM), Naïve Bayes, Logistic Regression, Decision Tree, and AdaBoost, in classifying the air quality index from the levels of PM10, CO, SO2, O3, and NO2 when the classes are imbalanced. The air quality index is classified into three categories: Good, Moderate, and Unhealthy. The dataset was downloaded from Open Data Jakarta and covers 2010 to 2021. It contains 4383 samples: 1155 Good, 3087 Moderate, and 141 Unhealthy. The experimental results show that Decision Tree outperforms the other methods, achieving accuracy, precision, recall, and F1-score of 99%, 98%, 99%, and 98%, respectively.
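To make the imbalance concrete, the following is a minimal sketch, not the authors' code, that uses only the class counts reported above. It computes the share of each class, the accuracy of a trivial classifier that always predicts Moderate (about 70.4%), and inverse-frequency class weights. The weight formula, `n_samples / (n_classes * n_c)`, is the same heuristic scikit-learn applies for `class_weight="balanced"`; whether the study used any reweighting is not stated here, so treat it as an assumption. The helper `macro_scores` is a hypothetical name, and macro averaging is likewise an assumption about how precision, recall, and F1-score might be aggregated across the three classes.

```python
# Class counts as reported in the abstract (Open Data Jakarta, 2010 to 2021).
counts = {"Good": 1155, "Moderate": 3087, "Unhealthy": 141}
labels = list(counts)
total = sum(counts.values())

# Fraction of the dataset that belongs to each class.
shares = {c: n / total for c, n in counts.items()}

# Accuracy of a trivial classifier that always predicts the majority class.
baseline_accuracy = max(counts.values()) / total

# Inverse-frequency weights (assumption: not confirmed as the authors' method).
class_weights = {c: total / (len(counts) * n) for c, n in counts.items()}


def macro_scores(confusion):
    """Return macro-averaged (precision, recall, F1).

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    precisions, recalls, f1s = [], [], []
    for i in range(k):
        tp = confusion[i][i]
        predicted = sum(confusion[r][i] for r in range(k))  # column sum
        actual = sum(confusion[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return sum(precisions) / k, sum(recalls) / k, sum(f1s) / k


# Confusion matrix of the always-Moderate baseline, built from the counts above.
majority = labels.index("Moderate")
baseline_confusion = [
    [counts[c] if j == majority else 0 for j in range(len(labels))]
    for c in labels
]
baseline_macro = macro_scores(baseline_confusion)

if __name__ == "__main__":
    print("shares:", shares)
    print("baseline accuracy:", baseline_accuracy)
    print("class weights:", class_weights)
    print("baseline macro precision/recall/F1:", baseline_macro)
```

The always-Moderate baseline reaches roughly 70% accuracy while its macro-averaged recall is only one third, which is why precision, recall, and F1-score are reported alongside accuracy for data this skewed.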


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.