
Processing-in-memory (PIM)-based Manycore Architecture for Training Graph Neural Networks

School of Electrical Engineering & Computer Science, Washington State University


Graph Neural Networks (GNNs) enable comprehensive predictive analytics over graph-structured data and have become popular in diverse real-world applications. A key challenge in facilitating such analytics is learning good representations of nodes, edges, and graphs. Unlike traditional Deep Neural Networks (DNNs), which work over regular structures (images or sequences), GNNs operate on graphs. The computations associated with GNNs can be divided into two parts: 1) vertex-centric computations, which involve trainable weights as in conventional DNNs, and 2) edge-centric computations, which accumulate information from neighboring vertices along the edges of the graph. Hence, GNN training exhibits characteristics of both DNN training, which is compute-intensive, and graph computation, which involves heavy data exchange. Conventional CPU- or GPU-based systems are not tailor-made for applications that exhibit such traits. This necessitates the development of new and efficient hardware architectures tailored for GNN training and inference. Both the vertex- and edge-centric computations in GNNs can be represented as multiply-and-accumulate (MAC) operations, which can be implemented efficiently using resistive random-access memory (ReRAM)-based architectures. In addition, ReRAMs allow for processing in memory, which helps reduce the amount of communication (data transfers) between the computing cores and main memory. This is particularly useful for GNN training, as it involves repeated feature aggregation along the graph edges. The in-memory nature of ReRAM computation significantly reduces on-chip traffic, leading to better performance. However, existing ReRAM-based architectures are designed specifically to accelerate either DNNs or graph computations. As GNN training exhibits characteristics of both, these tailor-made architectures are not well suited for efficient GNN training.
In this talk, we will present the design and performance evaluation of a novel ReRAM-based manycore architecture that caters to the specific characteristics exhibited by GNN training.
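To make the two kinds of computation described above concrete, the following is a minimal sketch, not the architecture presented in the talk. It assumes a simple GCN-style layer with an unnormalized adjacency matrix and a ReLU activation; the abstract does not specify a GNN variant, so these choices, along with all names (`matmul`, `gnn_layer`) and the toy graph, are illustrative assumptions. The point is only that both the edge-centric aggregation and the vertex-centric weight transform reduce to the same MAC loop.

```python
# Illustrative sketch only: a tiny GCN-style layer written as explicit
# multiply-and-accumulate (MAC) loops. All names and values are hypothetical.


def matmul(a, b):
    """Dense matrix product built from scalar multiply-and-accumulate steps."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for k in range(inner):
                acc += a[i][k] * b[k][j]  # one MAC operation
            out[i][j] = acc
    return out


def gnn_layer(adjacency, features, weights):
    """One layer: edge-centric aggregation, then vertex-centric transform."""
    # Edge-centric: each vertex sums the features of its neighbors.
    aggregated = matmul(adjacency, features)
    # Vertex-centric: apply the trainable weights, as in a dense DNN layer.
    transformed = matmul(aggregated, weights)
    # ReLU activation (an assumption; the abstract does not name one).
    return [[max(0.0, v) for v in row] for row in transformed]


# Toy 3-vertex path graph 0-1-2 with self-loops (assumed for illustration).
ADJACENCY = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0],
]
FEATURES = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
WEIGHTS = [
    [1.0, -1.0],
    [0.5, 2.0],
]

OUTPUT = gnn_layer(ADJACENCY, FEATURES, WEIGHTS)
print(OUTPUT)
```

In this sketch the aggregation step reads a feature row for every edge, which is the source of the heavy data exchange mentioned above, while the weight transform is the compute-intensive part shared with DNN training.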