This work describes the integrative application of various bioinformatic approaches to the family of 17beta-hydroxysteroid dehydrogenases (17beta-HSDs). This protein family is involved in the activation and inactivation of steroid hormones by reduction and oxidation at position 17 of the steroid backbone. Several family members are known to be involved in human disorders (e.g. pseudohermaphroditism or Zellweger-like syndrome) and are implicated in carcinogenesis and tumor proliferation. A reliable evolutionary scenario for the large family of HSD-related proteins was constructed by combining structure-based and sequence-based comparisons into a single phylogenetic tree. The tree indicates a fundamental subdivision of the 17beta-HSDs into two major groups. Determining the root of the phylogenetic tree made it possible to evaluate the time course of the evolutionary events leading to the diversity of 17beta-HSDs. A comprehensive bioinformatic study was performed to assess the physiological function of human 17beta-HSD type 7. By integrating expression data, phylogenetics and structural analyses, it was possible to show that this estrogenic enzyme probably has an ancestral function in cholesterol metabolism, namely the reduction of 3-ketosteroids. Adjacent steps in the pathway are catalysed by proteins that are mutated in developmental defects (CHILD syndrome and CDPX2 syndrome), making 17beta-HSD7 a likely candidate for similar disorders. The combination of protein structure and statistical evaluation enabled the identification of unexpected conserved structural elements in 17beta-HSD type 5 and its relatives. A 3D model of the enzyme explained the inhibitory capacity of phytoestrogens acting on 17beta-HSD5. The computer algorithm developed for that purpose is universally applicable and might be used for the structural analysis of other large protein families.
The techniques necessary for the evolutionary analysis of large sequence datasets were established and validated in a pilot study on paired-box proteins. The ancestors of the paired-box family were identified among Tc1-transposase proteins. It is shown that paired-box proteins originated at the beginning of metazoan evolution by a single fusion event between a transposase DNA-binding domain and a homeobox protein. The fusion event was followed by a rapid diversification of the protein family.