Circuit Design and Silicon Prototypes for Compute-in-Memory for Deep Learning Inference Engine
2021-2022 CASS Distinguished Lecturer Roster

Compute-in-memory (CIM) is a new computing paradigm that addresses the memory-wall problem in deep learning inference engines. SRAM and resistive random-access memory (RRAM) have been identified as two promising embedded memories for storing the weights of deep neural network (DNN) models. In this seminar, I will first review recent progress on SRAM- and RRAM-based CIM macros that integrate peripheral analog-to-digital converters (ADCs). Bit-cell variants (e.g., 6T SRAM, 8T SRAM, 1T1R, 2T2R) and array architectures that enable a parallel weighted sum are discussed, and state-of-the-art silicon prototypes are surveyed using normalized metrics such as energy efficiency (TOPS/W). Second, I will discuss array-level characterization of non-ideal RRAM device behavior, e.g., the variability and reliability of multilevel states, which may degrade inference accuracy. Third, I will discuss general challenges in CIM chip design with regard to imperfect device properties, ADC overhead, and chip-to-chip variations. Finally, I will discuss future research directions, including monolithic 3D integration of a memory tier on top of the peripheral logic tier, as well as enhancing the inference engine's security against DNN model leaking and reverse engineering.
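To make the parallel weighted-sum concept concrete, the following is a minimal NumPy sketch, not an implementation from the talk: all parameter values (conductance range, number of levels, ADC resolution, variability) are illustrative assumptions. Binary wordline voltages drive a column of multilevel RRAM conductances, the bitline current is the analog dot product, and a low-resolution peripheral ADC quantizes it; multiplicative Gaussian noise on the conductances models the device variability that degrades inference accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical device/array parameters (illustrative assumptions only)
LEVELS = 4                   # 2-bit multilevel RRAM cell
G_MIN, G_MAX = 1e-6, 10e-6   # conductance range in siemens
ADC_BITS = 4                 # peripheral ADC resolution
SIGMA = 0.05                 # relative device-to-device conductance variability
ROWS = 128                   # wordlines activated in parallel

def weights_to_conductance(w_levels):
    """Map integer weight levels (0..LEVELS-1) to ideal conductances."""
    return G_MIN + (G_MAX - G_MIN) * w_levels / (LEVELS - 1)

def cim_column_sum(v_in, g_ideal, sigma, adc_bits=ADC_BITS):
    """One bitline: analog weighted sum I = sum(V_i * G_i), then ADC readout."""
    g_real = g_ideal * (1 + sigma * rng.standard_normal(g_ideal.shape))
    i_bl = v_in @ g_real                        # bitline current (dot product)
    i_max = v_in.max() * G_MAX * len(g_ideal)   # ADC full-scale current
    return int(np.round(i_bl / i_max * (2**adc_bits - 1)))

w = rng.integers(0, LEVELS, size=ROWS)            # stored weight levels
v = rng.integers(0, 2, size=ROWS).astype(float)   # binary input voltages
ideal = cim_column_sum(v, weights_to_conductance(w), sigma=0.0)
noisy = cim_column_sum(v, weights_to_conductance(w), sigma=SIGMA)
print("ADC code (ideal):", ideal, " ADC code (with variability):", noisy)
```

Comparing the ideal and noisy ADC codes over many columns is one way such array-level variability studies quantify the accuracy impact before full-chip inference.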