zur Startseite

Bachelorarbeit

Distributed Graph Partitioning for Large-scale Graph Analytics
Bearbeiter Lukas Rieger
Betreuer Dipl.-Inf. Christian Mayer
Dr. rer. nat. Muhammad Adnan Tariq
Prüfer Prof. Dr. rer. nat. Dr. h. c. Kurt Rothermel
Ende2016/12/01
Beschreibung

The trend for Big Data analytics continues to attract academia and industry to find feasible solutions for the challenge of processing huge data sets. Large companies such as Google, Facebook and Yahoo are seeking data analysts that are able to extract value out of web, user or financial data.

Because of the size and complexity of the data, data analytics are usually performed in the cloud (and hence in large data centers) and distributed across many computers in order to parallelize execution and improve latency perceived by the data analysts. An enormous amount of data comes in the form of interconnected graph data (e.g. the WWW of linked websites or social networks of linked users). To analyze these billion-scale graphs in parallel, distributed graph processing systems such as Pregel [1], GraphLab [2], and PowerGraph [3] have emerged in the last years. These frameworks parallelize graph computation by dividing the graph onto different machines using a graph partitioning algorithm. To minimize information exchange across machines during graph processing, an effective partitioning algorithm of the graph is vital.

Goal and Tasks

In this thesis, a distributed partitioning algorithm should be developed addressing real-world graphs with billions of edges. In particular, the goals and tasks of this work are the following:

  1. Develop and implement a distributed vertex-cut graph partitioning algorithm especially designed for processing very large graphs.
  2. Evaluate the partitioning algorithm on a computing cluster and compare it with state-of-the-art graph partitioning benchmarks.

The results have to be presented and documented in a written report (English or German language).

Bibliography

[1] Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: Distributed
graph-parallel computation on natural graphs. In OSDI, volume 12, page 2, 2012.
[2] Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M Hellerstein.
Distributed graphlab: a framework for machine learning and data mining in the cloud. Proceedings of the
VLDB Endowment, 5(8):716-727, 2012.
[3] Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and
Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM
SIGMOD International Conference on Management of data, pages 135-146. ACM, 2010.