Vestnik of M. Kozybayev North Kazakhstan University

Missing values imputation tool using imputex algorithm

https://doi.org/10.54596/2958-0048-2024-4-195-203

Abstract

Missing data is a prevalent issue affecting data quality across numerous fields. One frequent challenge arises when data is lost during the input stage. Numerous studies have proposed methods to impute missing values across multiple fields. However, certain domains present unique challenges because their data involve attributes from multiple scientific disciplines, such as biology, chemistry, and medicine, which complicates the imputation process. The purpose of this study is to design an application that addresses missing values and maintains accuracy in large datasets, with a focus on minimizing processing time. The application's performance is evaluated by classification accuracy using various imputation methods. The proposed application outperforms current software tools such as R packages, Statistical Package for the Social Sciences (SPSS), Stata, and Microsoft Excel. This study helps to improve data quality and contributes to data science by improving the data cleaning procedure, which is a step in the data pre-processing stage.
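The evaluation idea in the abstract, scoring imputation methods by the classification accuracy obtained afterwards, can be illustrated with a minimal sketch. It is not the ImputeX algorithm, which the abstract does not describe. Simple mean and median imputation are swapped in as stand-ins, and the 1-nearest-neighbour classifier, the toy data, and the names `impute` and `one_nn_accuracy` are all assumptions made for illustration.

```python
from statistics import mean, median


def impute(rows, strategy=mean):
    """Fill None entries with a per-column statistic of the observed values.

    Stand-in for a real imputation method; assumes every column has at
    least one observed value (otherwise `strategy` raises an error).
    """
    columns = list(zip(*rows))
    fills = [strategy([v for v in col if v is not None]) for col in columns]
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def one_nn_accuracy(train_X, train_y, test_X, test_y):
    """Accuracy of a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    correct = 0
    for x, y in zip(test_X, test_y):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
        predicted = train_y[dists.index(min(dists))]
        correct += predicted == y
    return correct / len(test_y)


# Hypothetical toy data: two features, two classes, some training values missing.
train_X = [[1.0, 2.0], [1.2, None], [None, 1.8],
           [8.0, 9.0], [8.5, None], [None, 9.5]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[1.1, 2.1], [8.2, 9.1]]
test_y = [0, 1]

# Score each imputation strategy by the downstream classification accuracy.
results = {
    name: one_nn_accuracy(impute(train_X, strategy), train_y, test_X, test_y)
    for name, strategy in (("mean", mean), ("median", median))
}
print(results)
```

In a fuller comparison, the same loop would run over each candidate imputation method and a held-out test set, with processing time recorded alongside accuracy.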

About the Authors

Fatimah Sidi
Universiti Putra Malaysia
Malaysia

Corresponding author, PhD, Associate Professor, Department of Computer Science, Faculty of Computer Science and Information Technology

Serdang, Selangor 



Lili Nurliyana Abdullah
Universiti Putra Malaysia
Malaysia

PhD, Associate Professor, Department of Multimedia, Faculty of Computer Science and Information Technology

Serdang, Selangor 



Mustafa Alabadla
Universiti Putra Malaysia
Malaysia

PhD Candidate, Department of Computer Science, Faculty of Computer Science and Information Technology

Serdang, Selangor 



Iskandar Ishak
Universiti Putra Malaysia
Malaysia

PhD, Associate Professor, Department of Computer Science, Faculty of Computer Science and Information Technology

Serdang, Selangor 



For citations:


Sidi F., Abdullah L.N., Alabadla M., Ishak I. Missing values imputation tool using imputex algorithm. Vestnik of M. Kozybayev North Kazakhstan University. 2024;(4 (64)):195-203. https://doi.org/10.54596/2958-0048-2024-4-195-203

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2958-003X (Print)
ISSN 2958-0048 (Online)