Missing values imputation tool using imputex algorithm
https://doi.org/10.54596/2958-0048-2024-4-195-203
Abstract
Missing data is a prevalent issue that affects data quality across numerous fields. One frequent cause is data lost during the input stage. Numerous studies have proposed methods for imputing missing values in data from many fields. However, certain domains present unique challenges because their attributes come from multiple scientific disciplines, such as biology, chemistry, and medicine, which complicates the imputation process. The purpose of this study is to design an application that handles missing values and maintains accuracy in large datasets, with a focus on minimizing processing time. The application's performance is evaluated by classification accuracy across various imputation methods. The proposed application outperforms current software tools such as R packages, Statistical Package for the Social Sciences (SPSS), Stata, and Microsoft Excel. This study helps to improve data quality and contributes to data science by improving data cleaning, a step in the data pre-processing stage.
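The evaluation idea in the abstract (impute first, then compare methods by downstream classification accuracy) can be illustrated with a minimal sketch. This is not the ImputeX algorithm, whose internals are not described here. Mean and median imputation stand in as placeholder methods, a 1-nearest-neighbour classifier with leave-one-out scoring stands in for the classifier, and the toy data and the function names `impute` and `loo_accuracy` are hypothetical.

```python
from statistics import mean, median


def impute(rows, strategy=mean):
    """Fill None entries column by column with strategy(observed values).

    Assumes every column has at least one observed value.
    """
    columns = list(zip(*rows))
    fills = [strategy([v for v in col if v is not None]) for col in columns]
    return [
        [fills[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


def loo_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, (x, y) in enumerate(zip(rows, labels)):
        # Index of the closest other row by squared Euclidean distance.
        nearest = min(
            (k for k in range(len(rows)) if k != i),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, rows[k])),
        )
        correct += labels[nearest] == y
    return correct / len(rows)


# Hypothetical toy data: two features, two classes, None marks a missing value.
features = [
    [1.0, 2.0], [1.2, None], [0.8, 1.9], [None, 2.2],
    [5.0, 6.0], [5.2, None], [4.9, 6.1], [None, 5.8],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Compare the placeholder imputation methods by classification accuracy.
for name, strategy in [("mean", mean), ("median", median)]:
    completed = impute(features, strategy)
    print(name, loo_accuracy(completed, labels))
```

Any other imputation method can be compared in the same way by passing a different `strategy` or by replacing `impute` altogether. The scoring step stays unchanged.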
About the Authors
Fatimah Sidi
Malaysia
Corresponding author, PhD, Associate Professor, Department of Computer Science, Faculty
of Computer Science and Information Technology
Serdang, Selangor
Lili Nurliyana Abdullah
Malaysia
PhD, Associate Professor, Department of Multimedia, Faculty of Computer
Science and Information Technology
Serdang, Selangor
Mustafa Alabadla
Malaysia
PhD Candidate, Department of Computer Science, Faculty of Computer Science and
Information Technology
Serdang, Selangor
Iskandar Ishak
Malaysia
PhD, Associate Professor, Department of Computer Science, Faculty of Computer
Science and Information Technology
Serdang, Selangor
For citations:
Sidi F., Abdullah L.N., Alabadla M., Ishak I. Missing values imputation tool using imputex algorithm. Vestnik of M. Kozybayev North Kazakhstan University. 2024;(4 (64)):195-203. https://doi.org/10.54596/2958-0048-2024-4-195-203