A Review of Missing Data Handling Techniques for Machine Learning

Authors

  • Luke Oluwaseye Joel, Institute for Intelligent Systems, University of Johannesburg, Johannesburg, South Africa
  • Wesley Doorsamy, Institute for Intelligent Systems, University of Johannesburg, Johannesburg, South Africa
  • Babu Sena Paul, Institute for Intelligent Systems, University of Johannesburg, Johannesburg, South Africa

DOI:

https://doi.org/10.15157/IJITIS.2022.5.3.971-1005

Keywords:

Machine learning, Missing Data, Data, Data Techniques, Classification model

Abstract

Real-world data commonly contain missing values, which adversely affect the performance of most machine learning algorithms applied to such datasets. Missing values are one of several challenges that arise in real-world data. Since the accuracy and efficiency of machine learning models depend on the quality of the data used, data analysts and researchers need suitable techniques for handling these unavoidable missing values. This paper reviews state-of-the-art practices from the literature for handling missing data in machine learning and lists evaluation metrics used to measure the performance of these techniques. The study sets out these techniques and evaluation metrics in clear terms, accompanied by mathematical equations. It also provides recommendations to consider when choosing a missing data handling technique.

Published

2022-09-09

How to Cite

Oluwaseye Joel, L., Doorsamy, W., & Sena Paul, B. (2022). A Review of Missing Data Handling Techniques for Machine Learning. International Journal of Innovative Technology and Interdisciplinary Sciences, 5(3), 971–1005. https://doi.org/10.15157/IJITIS.2022.5.3.971-1005