Large-scale Graph Analysis: System, Algorithm and Optimization PDF Download

Large-scale Graph Analysis: System, Algorithm and Optimization

Large-scale Graph Analysis: System, Algorithm and Optimization PDF Author: Yingxia Shao
Publisher: Springer Nature
ISBN: 9811539286
Category : Computers
Languages : en
Pages : 154

Book Description
This book introduces readers to a workload-aware methodology for large-scale graph algorithm optimization in graph-computing systems, and proposes several optimization techniques that can enable these systems to handle advanced graph algorithms efficiently. More concretely, it proposes a workload-aware cost model to guide the development of high-performance algorithms. On the basis of the cost model, the book subsequently presents a system-level optimization resulting in a partition-aware graph-computing engine, PAGE. In addition, it presents three efficient and scalable advanced graph algorithms – the subgraph enumeration, cohesive subgraph detection, and graph extraction algorithms. This book offers a valuable reference guide for junior researchers, covering the latest advances in large-scale graph analysis; and for senior researchers, sharing state-of-the-art solutions based on advanced graph algorithms. In addition, all readers will find a workload-aware methodology for designing efficient large-scale graph algorithms.

Large-Scale Graph Processing Using Apache Giraph

Large-Scale Graph Processing Using Apache Giraph PDF Author: Sherif Sakr
Publisher: Springer
ISBN: 3319474316
Category : Computers
Languages : en
Pages : 214

Book Description
This book takes its reader on a journey through Apache Giraph, a popular distributed graph processing platform designed to bring the power of big data processing to graph data. Designed as a step-by-step self-study guide for everyone interested in large-scale graph processing, it describes the fundamental abstractions of the system, its programming model, and various techniques for using the system to process graph data at scale, including the implementation of several popular and advanced graph analytics algorithms. The book is organized as follows: Chapter 1 provides general background on the big data phenomenon and introduces the Apache Giraph system, its abstractions, programming model, and design architecture. Chapter 2 focuses on Giraph as a platform and how to use it; based on a sample job, it also covers more advanced topics such as the Giraph application lifecycle and different methods for monitoring Giraph jobs. Chapter 3 then provides an introduction to Giraph programming, introduces the basic Giraph graph model, and explains how to write Giraph programs. In turn, Chapter 4 discusses in detail the implementation of several popular graph algorithms, including PageRank, connected components, shortest paths, and triangle closing. Chapter 5 focuses on advanced Giraph programming, discussing common algorithmic optimizations, tunable Giraph configurations that determine the system's utilization of the underlying resources, and how to write custom graph input and output formats. Lastly, Chapter 6 highlights two other systems that tackle the challenge of large-scale graph processing, GraphX and GraphLab, and explains the main commonalities and differences between these systems and Apache Giraph. This book serves as an essential reference guide for students, researchers, and practitioners in the domain of large-scale graph processing. It offers step-by-step guidance, with several code examples and the complete source code available in the related GitHub repository. Students will find a comprehensive introduction to and hands-on practice with tackling large-scale graph processing problems using the Apache Giraph system, while researchers will discover thorough coverage of emerging and ongoing advancements in big graph processing systems.
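
To give a flavor of the vertex-centric programming model covered in Chapters 3 and 4, below is a minimal sketch of a PageRank computation written against Giraph's BasicComputation API. The class name and the iteration bound are illustrative choices, not the book's own example code:

```java
import java.io.IOException;

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

// Illustrative sketch: each vertex holds its PageRank value and sends
// rank / out-degree to its neighbors in every superstep.
public class PageRankSketch extends
    BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  private static final int MAX_SUPERSTEPS = 30;  // assumed iteration budget
  private static final double DAMPING = 0.85;    // standard damping factor

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable msg : messages) {
        sum += msg.get();
      }
      vertex.setValue(new DoubleWritable(
          (1 - DAMPING) / getTotalNumVertices() + DAMPING * sum));
    }
    if (getSuperstep() < MAX_SUPERSTEPS) {
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();  // halt once the iteration budget is exhausted
    }
  }
}
```

In each superstep a vertex sums the incoming contributions, updates its own rank, and forwards its rank divided by its out-degree to its neighbors; voteToHalt() ends the computation once the iteration budget is reached.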

Power-constrained Performance Optimization of GPU Graph Traversal

Power-constrained Performance Optimization of GPU Graph Traversal PDF Author: Adam Thomas McLaughlin
Publisher:
ISBN:
Category : Graph algorithms
Languages : en
Pages :

Book Description
Graph traversal represents an important class of graph algorithms that forms the nucleus of many large-scale graph analytics applications. While improving the performance of such algorithms using GPUs has received attention, understanding and managing performance under power constraints has not yet received similar attention. This thesis first explores the power and performance characteristics of breadth-first search (BFS) via measurements on a commodity GPU. We use this analysis to address the problem of minimizing execution time below a predefined power limit, or power cap, exposing key relationships between graph properties and power consumption. We modify the firmware on a commodity GPU to measure power usage and use the GPU as an experimental system to evaluate future architectural enhancements for the optimization of graph algorithms. Specifically, we propose and evaluate power management algorithms that scale (i) the GPU frequency or (ii) the number of active GPU compute units for a diverse set of real-world and synthetic graphs. Compared to scaling either frequency or compute units individually, our proposed schemes reduce execution time by an average of 18.64% by adjusting the configuration based on inter- and intra-graph characteristics.
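
The selection problem at the heart of the thesis can be illustrated with a toy sketch: given measurements of a BFS run at several frequency and compute-unit settings, choose the fastest configuration whose power draw stays under the cap. Everything below (the GpuConfig record, field names, and the sample numbers) is hypothetical and not taken from the thesis, which instead adapts the configuration based on graph characteristics:

```java
import java.util.List;
import java.util.Optional;

// Hypothetical record of one measured GPU configuration for a given graph.
record GpuConfig(int coreMhz, int activeComputeUnits, double watts, double runtimeMs) {}

public class PowerCapSelector {
  /** Return the fastest configuration that respects the power cap, if any. */
  static Optional<GpuConfig> fastestUnderCap(List<GpuConfig> measured, double powerCapWatts) {
    return measured.stream()
        .filter(c -> c.watts() <= powerCapWatts)
        .min((a, b) -> Double.compare(a.runtimeMs(), b.runtimeMs()));
  }

  public static void main(String[] args) {
    // Toy measurements; real values depend on the GPU and the input graph.
    List<GpuConfig> measured = List.of(
        new GpuConfig(705, 13, 140.0, 820.0),
        new GpuConfig(614, 13, 118.0, 905.0),
        new GpuConfig(705, 10, 121.0, 960.0));
    fastestUnderCap(measured, 125.0)
        .ifPresent(c -> System.out.println("Chosen config: " + c));
  }
}
```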

Practical Graph Analytics with Apache Giraph

Practical Graph Analytics with Apache Giraph PDF Author: Roman Shaposhnik
Publisher: Apress
ISBN: 1484212517
Category : Computers
Languages : en
Pages : 320

Book Description
Practical Graph Analytics with Apache Giraph helps you build data mining and machine learning applications using the Apache Foundation's Giraph framework for graph processing. This is the same framework used by Facebook, Google, and other social media analytics operations to derive business value from vast amounts of interconnected data points. Graphs arise in a wealth of data scenarios and describe the connections that are naturally formed in both digital and real worlds. Examples of such connections abound in online social networks such as Facebook and Twitter, among users who rate movies on services like Netflix and Amazon Prime, and even in biological networks for scientific research. Whether in the context of business or science, viewing data as connected adds value by increasing the amount of information available to be drawn from that data and put to use in generating new revenue or scientific opportunities. Apache Giraph offers a simple yet flexible programming model targeted at graph algorithms and designed to scale easily to accommodate massive amounts of data. Originally developed at Yahoo!, Giraph is now a top-level project at the Apache Foundation, and it enlists contributors from companies such as Facebook, LinkedIn, and Twitter. Practical Graph Analytics with Apache Giraph brings the power of Apache Giraph to you, showing how to harness graph processing for your own data by building sophisticated graph analytics applications using the very same framework relied upon by some of the largest players in the industry today.

Massive Graph Analytics

Massive Graph Analytics PDF Author: David A. Bader
Publisher: CRC Press
ISBN: 1000538613
Category : Business & Economics
Languages : en
Pages : 632

Book Description
"Graphs. Such a simple idea. Map a problem onto a graph then solve it by searching over the graph or by exploring the structure of the graph. What could be easier? Turns out, however, that working with graphs is a vast and complex field. Keeping up is challenging. To help keep up, you just need an editor who knows most people working with graphs, and have that editor gather nearly 70 researchers to summarize their work with graphs. The result is the book Massive Graph Analytics." — Timothy G. Mattson, Senior Principal Engineer, Intel Corp Expertise in massive-scale graph analytics is key for solving real-world grand challenges from healthcare to sustainability to detecting insider threats, cyber defense, and more. This book provides a comprehensive introduction to massive graph analytics, featuring contributions from thought leaders across academia, industry, and government. Massive Graph Analytics will be beneficial to students, researchers, and practitioners in academia, national laboratories, and industry who wish to learn about the state-of-the-art algorithms, models, frameworks, and software in massive-scale graph analytics.

Systems for Big Graph Analytics

Systems for Big Graph Analytics PDF Author: Da Yan
Publisher: Springer
ISBN: 3319582178
Category : Computers
Languages : en
Pages : 93

Book Description
There has been surging interest in developing systems for analyzing big graphs generated by real applications, such as online social networks and knowledge graphs. This book aims to help readers get familiar with the computation models of various graph processing systems with minimal time investment. It is organized into three parts, addressing three popular computation models for big graph analytics: think-like-a-vertex, think-like-a-graph, and think-like-a-matrix. While vertex-centric systems have gained great popularity, the latter two models are currently being actively studied to solve graph problems that cannot be efficiently solved in the vertex-centric model, and they are promising next-generation models for big graph analytics. For each part, the authors introduce the state-of-the-art systems, emphasizing both their technical novelties and hands-on experience of using them. The systems introduced include Giraph, Pregel+, Blogel, GraphLab, GraphChi, X-Stream, Quegel, SystemML, etc. Readers will learn how to design graph algorithms in various graph analytics systems and how to choose the most appropriate system for a particular application at hand. The target audience for this book includes beginners who are interested in using a big graph analytics system, as well as students, researchers, and practitioners who would like to build their own graph analytics systems with new features.
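
As a concrete illustration of the think-like-a-vertex model discussed in the first part, here is a minimal sketch of connected components in a Giraph-style vertex-centric API (propagation of the minimum vertex id). The class name is an illustrative choice and the code is a simplified sketch rather than the book's own material:

```java
import java.io.IOException;

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// "Think like a vertex": every vertex repeatedly adopts the smallest component
// id it has seen and forwards any change to its neighbors.
public class ConnectedComponentsSketch extends
    BasicComputation<LongWritable, LongWritable, NullWritable, LongWritable> {

  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<LongWritable> messages) throws IOException {
    long current = getSuperstep() == 0
        ? vertex.getId().get()        // start with the vertex's own id
        : vertex.getValue().get();
    long smallest = current;
    for (LongWritable msg : messages) {
      smallest = Math.min(smallest, msg.get());
    }
    if (getSuperstep() == 0 || smallest < current) {
      vertex.setValue(new LongWritable(smallest));
      sendMessageToAllEdges(vertex, new LongWritable(smallest));
    }
    vertex.voteToHalt();  // woken up again only if a neighbor sends a smaller id
  }
}
```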

Improving Distributed Graph Processing by Load Balancing and Redundancy Reduction

Improving Distributed Graph Processing by Load Balancing and Redundancy Reduction PDF Author: Shuang Song (Ph. D.)
Publisher:
ISBN:
Category :
Languages : en
Pages : 294

Book Description
The amount of data generated every day is growing exponentially in the big data era. A significant portion of this data is stored as graphs in various domains, such as online retail and social networks. Analyzing large-scale graphs provides important insights that are widely used in areas such as recommendation systems, banking systems, and medical diagnosis. To support analysis of large-scale graphs, developers in industry and academia have designed distributed graph processing systems. However, processing graphs in a distributed manner suffers from performance inefficiencies caused by workload imbalance and redundant computation. For instance, while data centers are trending towards large numbers of heterogeneous processing machines, graph partitioners still operate under the assumption of uniform computing resources, leading to load imbalance that degrades overall performance. Even with a balanced data distribution, the irregularity of graph applications can result in different amounts of dynamic work on each machine in the cluster; such imbalanced work distribution slows down execution. Redundancy also impacts the performance of distributed graph analysis: to exploit the available parallelism of computing clusters, distributed graph systems deploy a repeated-relaxing computation model (e.g., Bellman-Ford algorithm variants) rather than processing in a sequential but work-optimal order. Studies performed in this dissertation show that redundant computations are pervasive and significantly impact the efficiency of distributed graph processing. This dissertation explores novel techniques to reduce the workload imbalance and redundant computations of analyzing large-scale graphs in a distributed setting. It evaluates the proposed techniques on both pre-processing and execution modules to enable fair data distribution, lightweight workload balancing, and redundancy optimization for future distributed graph processing systems.

The first contribution of this dissertation is Heterogeneity-aware Partitioning (HAP), which aims to balance the load of distributed graph processing in heterogeneous clusters. HAP proposes several methodologies for estimating machines' computational power on graph analytics and extends several state-of-the-art partitioning algorithms for heterogeneity-aware data distribution. Using these capability estimates to guide the extended partitioning algorithms reduces load imbalance when processing a large-scale graph in a heterogeneous cluster, resulting in significant performance improvement.

Another contribution is Hula, which optimizes the workload balance of distributed graph analytics on the fly. Hula offers a hybrid graph partitioning algorithm that splits a large-scale graph in a locality-friendly manner and generates metadata for lightweight dynamic workload balancing. To track machines' work intensity, Hula inserts hardware timers to count the time spent on important operations (e.g., computational operations and atomic operations). This information guides Hula's workload scheduler in arranging work migration. With the support of the metadata generated by the hybrid partitioner, Hula's migration scheme requires only a minimal amount of data to transfer work between machines in the cluster, achieving lightweight imbalance reduction on the fly.
Finally, this dissertation focuses on improving the computational efficiency of distributed graph processing. To do so, it reveals the root cause and the amount of redundant computation in distributed graph processing. SLFE is proposed as a system solution to reduce these redundant operations. SLFE develops a lightweight pre-processing technique to obtain the maximum propagation order of each vertex in a given graph. This information is defined as Redundancy Reduction Guidance (RRG) and is utilized by SLFE's Redundancy Reduction (RR)-aware computing model to prune redundant operations on the fly. Moreover, SLFE provides RR-aware APIs to maintain high programmability. These techniques make the redundancy optimizations of distributed graph processing transparent to users.
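
As a rough illustration of the idea behind heterogeneity-aware partitioning (not HAP's actual algorithm, which also accounts for graph structure and locality), the sketch below assigns each machine a share of vertices proportional to its estimated graph-processing capability; all names and numbers are hypothetical:

```java
import java.util.List;

// Toy illustration: each machine receives a share of vertices proportional to
// its estimated capability, e.g. edges traversed per second in a probe run.
public class CapacityWeightedPartitioner {
  /** Assign vertex id -> machine index, weighting shares by capability. */
  static int[] assign(int numVertices, List<Double> capability) {
    double total = capability.stream().mapToDouble(Double::doubleValue).sum();
    int[] owner = new int[numVertices];
    int next = 0;
    for (int m = 0; m < capability.size(); m++) {
      // Last machine takes the remainder to avoid rounding gaps.
      int share = (m == capability.size() - 1)
          ? numVertices - next
          : (int) Math.round(numVertices * capability.get(m) / total);
      for (int i = 0; i < share && next < numVertices; i++) {
        owner[next++] = m;
      }
    }
    return owner;
  }

  public static void main(String[] args) {
    // Three machines whose measured capabilities differ by roughly 1 : 2 : 3.
    int[] owner = assign(12, List.of(1.0, 2.0, 3.0));
    System.out.println(java.util.Arrays.toString(owner));
    // -> [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
  }
}
```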

Graphs, Algorithms, and Optimization

Graphs, Algorithms, and Optimization PDF Author: William Kocay
Publisher: CRC Press
ISBN: 135198912X
Category : Mathematics
Languages : en
Pages : 504

Book Description
Graph theory offers a rich source of problems and techniques for programming and data structure development, as well as for understanding computing theory, including NP-completeness and polynomial reduction. A comprehensive text, Graphs, Algorithms, and Optimization features clear exposition of modern algorithmic graph theory presented in a rigorous yet approachable way. The book covers major areas of graph theory, including discrete optimization and its connection to graph algorithms. The authors explore surface topology from an intuitive point of view and include detailed discussions on linear programming that emphasize graph theory problems useful in mathematics and computer science. Many algorithms are provided, along with the data structures needed to program them efficiently. The book also provides coverage of algorithm complexity and efficiency, NP-completeness, linear optimization, and linear programming and its relationship to graph algorithms. Written in an accessible and informal style, this work covers nearly all areas of graph theory. Graphs, Algorithms, and Optimization provides a modern discussion of graph theory applicable to mathematics, computer science, and crossover applications.

The Design and Implementation of Large-scale Graph Analysis Language

The Design and Implementation of Large-scale Graph Analysis Language PDF Author: Ji Won Seo
Publisher:
ISBN:
Category :
Languages : en
Pages :

Book Description
With the rise of worldwide social networking services, large-scale graph analysis has become important. Because of a lack of high-level programming models, frameworks such as MapReduce or Pregel are used for large-scale graph analysis, but they require considerable effort to implement even simple analyses. For greater ease of use and efficiency, we propose SociaLite, a high-level graph query language based on Datalog. As a logic programming language, Datalog allows many graph algorithms to be expressed succinctly; however, its performance has not been competitive with imperative programming languages. With SociaLite, users can provide simple annotations on the data layout and evaluation order; they can also define recursive aggregate functions, which can be evaluated incrementally and efficiently. Moreover, SociaLite is extended to provide high-level abstractions for distributed computation. These extensions allow users to simply annotate how data is to be distributed; the SociaLite compiler then automatically generates parallel code for a cluster of multi-core machines. The evaluation of recursive aggregate functions is optimized with a delta-stepping technique on a distributed cluster. In addition, approximate computation is supported in SociaLite, allowing users to trade accuracy for reduced execution time and storage space. We evaluated SociaLite with core graph algorithms, including shortest paths and PageRank, that are commonly used in graph analyses. With its optimizations to Datalog evaluation, SociaLite is shown to be much faster than plain Datalog and as fast as highly optimized Java. When evaluated on a multi-core machine, SociaLite programs scaled linearly up to 16 cores, except for the shortest-paths program, which showed a speedup of 10 on 16 cores. In our experiments with 64 Amazon EC2 instances, SociaLite programs tracked the ideal weak-scaling curve within a factor of two. Compared to Giraph, an open-source version of Pregel, SociaLite programs are 4 to 12 times faster across benchmark algorithms and 22 times more succinct on average. We also extensively evaluated state-of-the-art graph systems, GraphLab, Combinatorial BLAS, and Giraph, and compared their programmability and performance with SociaLite. From this evaluation, we found that SociaLite provides the simplest programming model; for the BFS (breadth-first search) algorithm, the SociaLite program is 10 times more succinct than the equivalent programs in the other systems. SociaLite also demonstrated competitive performance; among the compared systems, it is the second fastest after Combinatorial BLAS. Most importantly, as a high-level query language, SociaLite makes it easy for users with little programming background to write sophisticated graph applications that are automatically optimized to run efficiently on a cluster of machines.
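
As a conceptual illustration (not SociaLite's actual syntax or implementation), a recursive min-aggregate for shortest paths, written in SociaLite roughly as Path(t, $min(d)) with a base case at the source and a recursive case over Edge, can be evaluated incrementally by re-deriving only the facts whose aggregate value improved. The Java sketch below shows that evaluation strategy on an in-memory adjacency list; all names are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Incremental evaluation of a recursive min-aggregate: a vertex's distance is
// re-propagated to its neighbors only when its minimum improves.
public class ShortestPathAggregate {
  record Edge(int to, int weight) {}

  static int[] shortestPaths(List<List<Edge>> adj, int source) {
    int[] dist = new int[adj.size()];
    Arrays.fill(dist, Integer.MAX_VALUE);
    dist[source] = 0;                       // base case: Path(source, 0)

    Deque<Integer> worklist = new ArrayDeque<>(List.of(source));
    while (!worklist.isEmpty()) {
      int u = worklist.poll();
      for (Edge e : adj.get(u)) {
        int candidate = dist[u] + e.weight();
        if (candidate < dist[e.to()]) {     // the min-aggregate improved: propagate
          dist[e.to()] = candidate;
          worklist.add(e.to());
        }
      }
    }
    return dist;
  }
}
```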

Frontiers in Massive Data Analysis

Frontiers in Massive Data Analysis PDF Author: National Research Council
Publisher: National Academies Press
ISBN: 0309287812
Category : Mathematics
Languages : en
Pages : 191

Book Description
Data mining of massive data sets is transforming the way we think about crisis response, marketing, entertainment, cybersecurity, and national intelligence. Collections of documents, images, videos, and networks are being thought of not merely as bit strings to be stored, indexed, and retrieved, but as potential sources of discovery and knowledge, requiring sophisticated analysis techniques that go far beyond classical indexing and keyword counting, aiming to find relational and semantic interpretations of the phenomena underlying the data. Frontiers in Massive Data Analysis examines the frontier of analyzing massive amounts of data, whether in a static database or streaming through a system. Data at that scale (terabytes and petabytes) is increasingly common in science (e.g., particle physics, remote sensing, genomics), Internet commerce, business analytics, national security, communications, and elsewhere. The tools that work to infer knowledge from data at smaller scales do not necessarily work, or work well, at such massive scale. New tools, skills, and approaches are necessary, and this report identifies many of them, plus promising research directions to explore. Frontiers in Massive Data Analysis discusses pitfalls in trying to infer knowledge from massive data, and it characterizes seven major classes of computation that are common in the analysis of massive data. Overall, this report illustrates the cross-disciplinary knowledge (from computer science, statistics, machine learning, and application disciplines) that must be brought to bear to make useful inferences from massive data.