Estimating missing values is an essential data pre-processing step that improves data quality for subsequent data mining. It matters particularly for industrial data sets, because different operational situations cannot be described properly when the data contain missing values. In this paper, Expectation Conditional Maximization (ECM) is used to fit an approximate model to the data under a Gaussian distribution assumption. In the Expectation step, the sweep operator is then used to obtain the regression model of the missing values on the observed values, and the missing values are estimated from the observed data. To evaluate the results, a process data set from film production is considered, with missing values simulated by randomly removing data from the variables. Finally, the accuracy of the proposed method in estimating missing values is discussed, together with the effect of imputation on further data analysis.
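
The role of the sweep operator in the Expectation step can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that a mean vector and covariance matrix of the Gaussian model are already available (for example from a previous maximization step), that missing entries are encoded as NaN, and that the sweep follows the common textbook convention. The names `sweep`, `impute_row`, `mu` and `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def sweep(g, k):
    """Sweep the symmetric matrix g on index k (textbook convention, assumed here)."""
    g = np.asarray(g, dtype=float)
    pivot = g[k, k]
    # Entries outside row/column k: g_jl - g_jk * g_kl / g_kk
    h = g - np.outer(g[:, k], g[k, :]) / pivot
    # Row and column k: g_jk / g_kk
    h[:, k] = g[:, k] / pivot
    h[k, :] = g[k, :] / pivot
    # Pivot entry: -1 / g_kk
    h[k, k] = -1.0 / pivot
    return h


def impute_row(x, mu, sigma):
    """Replace NaN entries of x by their conditional mean given the observed entries.

    mu and sigma are the assumed Gaussian mean vector and covariance matrix.
    """
    x = np.asarray(x, dtype=float)
    mis = np.isnan(x)
    obs = ~mis
    s = np.asarray(sigma, dtype=float).copy()
    # Sweeping on every observed variable turns the (observed, missing) block
    # into the regression coefficients of the missing on the observed variables.
    for k in np.flatnonzero(obs):
        s = sweep(s, k)
    beta = s[np.ix_(obs, mis)]
    out = x.copy()
    out[mis] = mu[mis] + (x[obs] - mu[obs]) @ beta
    return out


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the film production data set.
    mu = np.array([0.0, 1.0, 2.0])
    sigma = np.array([[2.0, 0.5, 0.3],
                      [0.5, 1.0, 0.2],
                      [0.3, 0.2, 1.5]])
    print(impute_row(np.array([1.0, np.nan, 3.0]), mu, sigma))
```

In a full EM-type procedure, this imputation would alternate with re-estimation of `mu` and `sigma` until convergence. The sketch covers only the conditional-mean step for a single record.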