Abstract:
The analysis of large networks with up to billions of entities (e.g., the Internet, biochemical pathways, or huge social structures) is often based on abstract representations. Density-based clustering is an important technique for deriving these abstractions.
In this thesis we dicuss the complexity of finding density-based substructures in large-scale networks, represented as simple, undirected graphs. The main focus is on the analysis of a corresponding fixed parameter dense subgraph problem. We derive a classification for the decision problem that decides whether a given graph contains a subgraph on exactly k vertices and a given minimum number of edges (calculated by some fixed function f of k). Based on the magnitude of function f we state an upper bound for membership in the class of polynomially solvable problems and a lower bound for completeness in NP.
A large fraction of real world networks can be sufficiently described in terms of so called power-law graphs, which are characterized by their typical degree sequence. Based on this property, we also discuss the complexity of the above decision problem, when restricted to this class of input graphs. Similarly to the previous classification, we derive corresponding bounds.
Finally, we investigate the approximability of an underlying optimization problem. Further, in order to outline some possibilities to overcome the poor best known approximation ratio, we discuss some commonly used heuristics for networks representing the hyperlink structure of the World Wide Web.