Generating synthetic data enables research on data that must not be published for privacy reasons, such as patient data. Synthetic data can also augment existing (real) data when these are scarce. Commonly used methods for synthetic data generation, such as generative adversarial networks (GANs) (Goodfellow et al. (2014)) or variational autoencoders (VAEs) (Kingma and Welling (2013)), are based on neural networks. This makes their training computationally intensive and, especially in the case of GANs, difficult (Arjovsky and Bottou (2017)). We use vine copulas as a synthetic data generator and focus on a setting where the analysis to be performed on the true and the synthetic data is classification. The synthetic data should allow the user to estimate a classification rule that is similar to the one that would be estimated on the true data. We compare three vine estimation methods in a simulation study as well as in real-data applications in astronomy and cancer genomics. We find that the best vine estimation method depends on the properties of the true data, but the differences between methods are generally small.
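To make the evaluation idea concrete, here is a minimal sketch of the pipeline "fit a copula to the true data, sample synthetic data, and compare the classification rules estimated on each". It is not the authors' method. It swaps the vine for a Gaussian copula, which is the special case of a vine in which every pair copula is Gaussian, combined with empirical marginals. It also assumes one copula is fitted per class, uses two-class linear discriminant analysis as the classifier, and measures similarity of the two rules as the share of true observations on which their predictions agree. All function names (`fit_gaussian_copula`, `sample_gaussian_copula`, `synthesize`, `fit_lda`, `predict`) are hypothetical.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def fit_gaussian_copula(x):
    """Fit a Gaussian copula with empirical marginals to the rows of x (n x d, d >= 2)."""
    n, _ = x.shape
    # Rank-transform each column to pseudo-observations strictly inside (0, 1).
    u = (x.argsort(axis=0).argsort(axis=0) + 1) / (n + 1)
    # Map pseudo-observations to normal scores and estimate their correlation matrix.
    z = np.vectorize(_STD_NORMAL.inv_cdf)(u)
    corr = np.corrcoef(z, rowvar=False)
    # Keep sorted columns so that samples can be mapped back via empirical quantiles.
    return {"corr": corr, "sorted": np.sort(x, axis=0)}


def sample_gaussian_copula(model, m, rng):
    """Draw m synthetic rows from a fitted Gaussian copula model."""
    corr, sorted_x = model["corr"], model["sorted"]
    n, d = sorted_x.shape
    z = rng.multivariate_normal(np.zeros(d), corr, size=m)
    u = np.vectorize(_STD_NORMAL.cdf)(z)
    # Empirical quantile function: take the order statistic at position floor(u * n).
    idx = np.clip((u * n).astype(int), 0, n - 1)
    return sorted_x[idx, np.arange(d)]


def synthesize(x, y, rng):
    """Assumption: one copula per class, class sizes copied from the true data."""
    xs, ys = [], []
    for label in np.unique(y):
        xc = x[y == label]
        model = fit_gaussian_copula(xc)
        xs.append(sample_gaussian_copula(model, len(xc), rng))
        ys.append(np.full(len(xc), label))
    return np.vstack(xs), np.concatenate(ys)


def fit_lda(x, y):
    """Two-class LDA with labels 0/1; predicts class 1 when x @ w + b > 0."""
    x0, x1 = x[y == 0], x[y == 1]
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    pooled = (np.cov(x0, rowvar=False) * (len(x0) - 1)
              + np.cov(x1, rowvar=False) * (len(x1) - 1)) / (len(x) - 2)
    w = np.linalg.solve(pooled, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2 + np.log(len(x1) / len(x0))
    return w, b


def predict(rule, x):
    w, b = rule
    return (x @ w + b > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "true" data: two classes with shifted means and one skewed feature.
    n = 500
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, 3)) + y[:, None] * np.array([1.0, 0.5, 0.0])
    x[:, 2] = np.exp(x[:, 2])  # non-Gaussian marginal
    x_syn, y_syn = synthesize(x, y, rng)
    rule_true, rule_syn = fit_lda(x, y), fit_lda(x_syn, y_syn)
    agreement = (predict(rule_true, x) == predict(rule_syn, x)).mean()
    print(f"agreement between classification rules: {agreement:.2f}")
```

A vine copula replaces the single correlation matrix with a tree-structured set of bivariate copulas, which can capture tail dependence and asymmetry that a Gaussian copula misses; the surrounding evaluation loop stays the same.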