In many cases, linear or generalized linear regression is an adequate method for describing the association between one variable and another. A major assumption of linear models is that the error terms are independent of each other. In practice, however, there are many examples where this assumption is not fulfilled. One example is a comparison of the results of a math test taken at different schools. When modelling the results with covariates such as "sex" or "belonging to a minority", it is not clear that all observations are independent of each other. For instance, assume that one class has a good teacher. Pupils from that class may
achieve better results on average than pupils from another class with a bad teacher. Another example is longitudinal data, e.g. data from a growth process. Here one has to take into account that subjects may start from different prerequisites and hence may behave differently. In both examples, linear regression will probably not lead to good results because the observations are not independent. This is where linear mixed models come into play. They relax the assumptions on the error term by allowing a dependency structure among the errors. Linear mixed models allow grouping of data into clusters as well
as grouping data by connecting observations at different points in time to one single subject. Thereby, different behaviour of clusters or subjects, so-called cluster-specific or individual-specific effects, can be included in the model. Hence, in many cases where there is evidence for such effects, linear or generalized linear mixed models will yield better results than linear or generalized linear models, respectively. The approach is to model cluster-specific or individual-specific effects not by fixed parameters, as for other covariates, but as random variables. Thus, a major advantage of linear mixed models is that they can still be used when there are only few observations relative to the number of parameters that would otherwise have to be estimated. By assumption, these so-called random effects follow a normal distribution with mean zero. Hence, trends shared by all observations are captured by the parameters of the so-called fixed effects. Different variances for each cluster or individual allow the special properties of each cluster or individual, respectively, to be included. Because random effects are described by a distribution instead of fixed parameters, parameter estimation differs slightly from that for linear models. Afterwards, not only the distribution of the random errors but also that of the random effects has to be checked in order to confirm the model assumptions. An application of linear mixed models is given in the second part of this thesis. The driving behaviour of households, more precisely the mean distance covered per day, is modelled using covariates such as horsepower or the mean price per litre of fuel. Since the data were collected from all over Germany, they were clustered to account for different infrastructural and cultural aspects. The analysis shows that a linear mixed model indeed fits the data better than a linear model.
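
The school example from the beginning of this abstract can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions: each pupil's score is an overall mean plus a class-specific random effect (normal, mean zero) plus an individual error. The function names `simulate_scores` and `intraclass_correlation`, the mean of 50 and both standard deviations are hypothetical choices for illustration and are not taken from the thesis. The sketch estimates the intraclass correlation, i.e. the share of the total variance that is due to the class, using the simple one-way ANOVA estimator for equally sized classes.

```python
import random
import statistics


def simulate_scores(n_classes=200, pupils_per_class=20,
                    sd_class=2.0, sd_error=3.0, seed=0):
    """Simulate test scores from a random-intercept model (illustrative values).

    score = 50 + class_effect + error, where class_effect ~ N(0, sd_class^2)
    is shared by all pupils of a class and error ~ N(0, sd_error^2).
    """
    rng = random.Random(seed)
    classes = []
    for _ in range(n_classes):
        class_effect = rng.gauss(0.0, sd_class)  # random effect, mean zero
        scores = [50.0 + class_effect + rng.gauss(0.0, sd_error)
                  for _ in range(pupils_per_class)]
        classes.append(scores)
    return classes


def intraclass_correlation(classes):
    """One-way ANOVA estimate of the intraclass correlation (balanced data)."""
    k = len(classes[0])  # pupils per class, assumed equal for all classes
    # Average within-class variance estimates the error variance.
    within = statistics.mean([statistics.variance(c) for c in classes])
    # Variance of class means estimates sd_class^2 + sd_error^2 / k.
    class_means = [statistics.mean(c) for c in classes]
    between = max(statistics.variance(class_means) - within / k, 0.0)
    return between / (between + within)


if __name__ == "__main__":
    icc = intraclass_correlation(simulate_scores())
    print(f"estimated intraclass correlation: {icc:.3f}")
```

With the assumed standard deviations, the theoretical intraclass correlation is 4 / (4 + 9), roughly 0.31, so two pupils from the same class are clearly correlated. Setting `sd_class=0.0` removes the class effect and should give an estimate close to zero, which is the situation an ordinary linear model assumes.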