Modeling multivariate data appropriately is an important and active topic in statistics. Bayesian Networks, following Pearl (1988), have become a useful and intuitive tool for this task. A common approach is to model each conditional density as a conditional normal density, which yields a Linear Gaussian Bayesian Network (Koller and Friedman, 2009) that represents a multivariate Gaussian distribution. This is inherently a disadvantage if the underlying data are not jointly normally distributed.
In this thesis we present an approach that addresses this shortcoming: we model the conditional density of each node as a D-vine on the set of its parents, following the work of Kraus and Czado (2017). We illustrate this approach with the biological data set from Sachs et al. (2005), which describes levels of different phosphoproteins and phospholipids in individual cells. We allow for different marginals and copulas in the model and compare the results to a Linear Gaussian Bayesian Network. The comparison uses different goodness-of-fit measures and the models' ability to reproduce the original data. Further, we analyze how the conditional density of each node, given its parents, behaves when conditioning on specific values.