E-science communities have put tremendous effort into providing global access to their distributed scientific data sets to foster data and knowledge sharing. Beyond the already huge existing data volumes, researchers face major challenges in managing the anticipated data deluge of forthcoming projects, with expected data rates of several terabytes a day. This thesis presents community-driven data grids, which target domain-specific federations and provide scalable, distributed, and collaborative data management. Our infrastructure optimizes overall query throughput by exploiting dominant data characteristics and query patterns. By combining well-established techniques for data partitioning and replication with P2P technologies, we address several challenging issues: data load balancing, efficient data dissemination and query processing, and adaptation to short-term query hot spots as well as to long-term load redistributions.