New Memory Architecture for Deep Learning Applications
Recently, a number of neural network processors have been developed to process deep neural networks efficiently. Most of these processors include a memory system with large-capacity on-chip SRAM as well as high-bandwidth off-chip DRAM. To fully utilize the hardware resources of neural processors, efficient access to both the on-chip SRAM and the off-chip DRAM is essential. This tutorial presents traditional and state-of-the-art techniques for optimizing memory access in neural network applications.

First, state-of-the-art neural processors are introduced and the common characteristics of their memory systems are explained. Next, optimization techniques for on-chip SRAM access are presented: the scheduling, parallelization, and data allocation of various deep learning algorithms are discussed, along with the pros and cons of each optimization for a given memory system. Additional data optimizations for efficient off-chip DRAM access are then explained: the basic organization of a DRAM is introduced, followed by data access scheduling techniques for efficient DRAM access.

As the last subject of this tutorial, future memory systems are introduced. Processing-in-memory (PIM) and approximate memory (AM) architectures are briefly described, and data optimizations for PIM and AM in deep neural processing are presented. Storage-class memory (SCM), a potential new level in the memory hierarchy, is also introduced, together with data access techniques for it.
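To make the on-chip SRAM discussion concrete, the sketch below shows loop tiling, a classic scheduling and data-allocation technique of the kind the tutorial covers: the output is computed in small blocks so that the sub-blocks touched by the inner loops fit in an on-chip buffer and are reused before being evicted. This is an illustrative example, not code from the tutorial; the tile size is an assumed value standing in for a real SRAM capacity.

```python
def matmul_tiled(A, B, tile=4):
    """Blocked matrix multiply: C = A @ B computed tile-by-tile.

    The tile parameter models the on-chip SRAM budget (assumed here);
    each inner-loop pass touches only tile x tile sub-blocks of A, B,
    and C, so the working set stays small and is reused many times.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # All data touched below fits in ~3 * tile^2 words,
                # the working set a scheduler would pin in SRAM.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        s = 0.0
                        for kk in range(k0, min(k0 + tile, k)):
                            s += A[i][kk] * B[kk][j]
                        C[i][j] += s
    return C
```

The result is identical to an untiled multiply; only the access order changes, which is exactly the degree of freedom that scheduling and data-allocation optimizations exploit.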
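For the off-chip DRAM part, a toy model can illustrate why access scheduling matters: a DRAM bank holds one open row in its row buffer, and switching rows costs an activation. The sketch below is a simplified illustration under assumed parameters (a single bank, an open-page policy, a 1 KiB row), not a model from the tutorial; it counts row activations for the same set of addresses visited in two different orders.

```python
ROW_SIZE = 1024  # bytes per DRAM row buffer (assumed for illustration)

def count_activations(addresses):
    """Count row activations under a simple single-bank, open-page
    policy: an activation occurs whenever the accessed row differs
    from the currently open row."""
    activations = 0
    open_row = None
    for addr in addresses:
        row = addr // ROW_SIZE
        if row != open_row:
            activations += 1
            open_row = row
    return activations

# The same 8 KiB of 4-byte words, visited in two orders:
# sequential order stays within each row before moving on,
# while the strided order hops to a different row on every access.
seq = [i * 4 for i in range(2048)]
strided = [(i % 8) * 1024 + (i // 8) * 4 for i in range(2048)]
```

In this model the sequential order activates each of the 8 rows once, while the strided order triggers an activation on every access, even though both visit exactly the same addresses; reordering accesses to preserve row-buffer locality is the essence of DRAM access scheduling.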