This thesis describes an approach to the functional and metabolic analysis of biological whole-genome data. It focusses on three fields of biotechnology and bioinformatics: gene expression, protein-protein interactions and metabolic pathways. For each of them, an introduction describes the potential benefits of the technologies and highlights the computational challenges that arise from their analysis. Computational analysis methods have been developed for the respective data sets. These bioinformatic methods (self-organizing map (SOM) clustering for gene expression data, graph modeling of protein-protein interactions, and three methods for the dynamic modeling of metabolic pathways) prepare the ground for the integrative methods. For the further analysis and interpretation of the high-throughput data sets, a knowledge-based integrative analysis approach has been elaborated. The combinatorial and integrative analysis methods make use of existing knowledge in order to achieve high-quality and reliable results. Data sets are analyzed in the context of systematic, previously assembled facts, leading to a more holistic view of the subjects of analysis. Protein-protein interaction data is combined with systematic functional annotations, which focusses the analysis of the interaction data on a specific biological context. This makes it possible to scale down the complexity of the large protein-protein interaction data sets and makes the results comprehensible. Moreover, it makes it possible to form hypotheses about the functional context of previously uncharacterized genes and proteins. Biochemical reactions and textbook metabolic pathways are employed for the analysis of clustered gene expression data, which makes it possible to analyze the metabolic properties and the changes in metabolism captured by the respective gene expression experiment. Interesting features such as co-regulated or inversely regulated pathways are highlighted by the integrative methods.
Besides working with the established schemes and categories of textbook metabolic pathways, the elaborated methods make it possible to construct hypothetical pathways dynamically, based on the gene expression profiles. From the structure of a hypothetical pathway, relations can be inferred between parts of an organism's metabolic network that are conceptually separate in the textbook pathways. A method for the integration of gene expression data with functional annotations has been developed. An expression data set can be analyzed in the context of each category of a functional classification scheme. This functional projection is capable of identifying functionally related sets of genes that exhibit similar, correlated or anti-correlated expression profiles. Cellular processes that are switched on or off in a coordinated manner during a biological experiment are revealed. The relation between the experiment, e.g. a systematic variation of environmental conditions, and the genetic response of the analyzed organism becomes apparent. The analysis of overlapping groups of functionally related genes reveals how the genes of different functional categories relate, highlighting a larger biological context. The combination of the functional projection with the metabolic analysis methods makes it possible to investigate the identified co-regulated gene groups further, with a specific focus on aspects of intermediary metabolism. The developed integrative methods are generically applicable to other types of high-throughput data, e.g. protein complexes, and to other systematically annotated facts about genes and proteins, e.g. subcellular localization, mutant phenotypes, protein classes, PROSITE motifs, signal transduction pathways and regulatory pathways. The developed bioinformatic methods are compared with other approaches to a combinatorial and integrative analysis of whole-genome data sets.
The benefit and the great potential of integrative methods are pointed out: the combination of different types of data can contribute significantly towards a true systems biology that may allow the understanding and simulation of reasonably complex cellular processes or even whole cells.