Original Research Article Enhancing Missing Data Imputation with Improved DAE Training and Input Recombination

Yean Chung Liu; Yen-Liang Chen

doi:10.9734/jamcs/2024/v39i61905

Published: 2024-06-11

Page: 101-127

Issue: 2024 - Volume 39 [Issue 6]

Yean Chung Liu *

Department of Information Management, National Central University, Chung-Li, 32001, R.O.C, Taiwan.

Yen-Liang Chen

Department of Information Management, National Central University, Chung-Li, 32001, R.O.C, Taiwan.

*Author to whom correspondence should be addressed.

Abstract

The growing importance of handling data loss has sharpened the focus on missing data imputation (MDI). Autoencoder (AE) models, known for their ability to learn from data and impute missing values without supervision, are gaining prominence in MDI. These models adapt to diverse datasets, and their unsupervised nature makes them robust when data lack clear labels. This study explores the scope and objectives of AE training, which encompass critical elements such as optimization algorithms, loss functions, and training epochs. We specifically investigate the impact of updating the input data during AE training, a topic that existing research has not sufficiently explored. Traditionally, AEs are trained on the original data, on the assumption that it contains latent information. In MDI, however, the data may be corrupted, so it becomes necessary to evaluate whether updating the input data leads to better results. The objective of this research is to introduce and evaluate two methods inspired by Gradient Boosting Machines: Short-Term Reconstruction with Iterative Updates (STR-IU) and Long-Term Reconstruction with a Single Update (LTR-SU). We use Denoising Autoencoder (DAE) models and examine how different optimization mechanisms affect the proposed methods. We compare Stochastic Gradient Descent (SGD) with the Adam optimization algorithm, and we transform three distinct datasets into synthetic datasets with varying levels of missing data (5%, 15%, 25%). The results indicate that, although updating the input data does not improve performance at every training epoch setting, it yields a noticeable overall improvement with both SGD and Adam. In addition, LTR-SU outperforms STR-IU, and DAE models trained with SGD show greater improvement than those trained with Adam.
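The two experimental ingredients mentioned in the abstract, synthetic missingness at fixed rates and updating the input with reconstructed values, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that entries are removed completely at random and that "recombination" means keeping observed entries while overwriting only the missing ones. The names `mask_at_random`, `recombine`, and `column_mean_reconstruct` are hypothetical, and the column-mean function is a deliberately simple stand-in for a trained DAE.

```python
import numpy as np

def mask_at_random(X, rate, rng):
    """Boolean mask (True = missing) hiding roughly `rate` of the entries.
    Assumption: values go missing completely at random."""
    return rng.random(X.shape) < rate

def column_mean_reconstruct(X_filled, missing):
    """Hypothetical stand-in for a trained DAE: predicts, for every cell,
    the mean of the observed values in that cell's column."""
    observed_only = np.where(missing, np.nan, X_filled)
    col_means = np.nanmean(observed_only, axis=0)
    return np.broadcast_to(col_means, X_filled.shape)

def recombine(X_filled, X_hat, missing):
    """Keep observed entries; overwrite only the missing ones with X_hat."""
    return np.where(missing, X_hat, X_filled)

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))          # toy complete dataset

results = {}
for rate in (0.05, 0.15, 0.25):        # missing rates used in the study
    missing = mask_at_random(X, rate, rng)
    X_filled = np.where(missing, 0.0, X)            # placeholder initial fill
    X_hat = column_mean_reconstruct(X_filled, missing)
    X_updated = recombine(X_filled, X_hat, missing)
    # Imputation error is measured on the hidden entries only.
    rmse = np.sqrt(np.mean((X_updated[missing] - X[missing]) ** 2))
    results[rate] = (missing, X_updated, rmse)
```

In a setting like the one the abstract describes, `X_updated` would replace `X_filled` as the training input, either once after a long training run or repeatedly after short ones. How the study schedules those updates is not specified here, so the sketch stops at a single recombination.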

Keywords: Missing data imputation, denoising autoencoder, updating input data

How to Cite

Liu, Yean Chung, and Yen-Liang Chen. 2024. “Original Research Article Enhancing Missing Data Imputation With Improved DAE Training and Input Recombination”. Journal of Advances in Mathematics and Computer Science 39 (6):101-27. https://doi.org/10.9734/jamcs/2024/v39i61905.
