Intelligent Memory for Efficient Hardware Accelerator
Memory architecture and design are critical to achieving large storage capacity, low latency, fast access time, and energy efficiency, especially for battery-operated devices. The growth of data generated by mobile devices, sensors, communication systems, and security applications has not only increased the requirements on memory capacity but also introduced challenges in memory access and energy. The memory interface has limited throughput and high latency, and it has not scaled at the same rate as data size or processing speed. This limits the performance of data access, a problem referred to as the memory wall. In addition to the negative impact on latency and performance, large data movement results in high energy consumption. Research has focused on alleviating the memory wall by adding levels to the memory hierarchy and increasing local on-chip memory. This has partially reduced the timing issue but has not addressed the high leakage and active energy consumption. It is estimated that more than 60% of the energy consumed by most computing platforms is spent on data movement and memory access. The new era of big data and artificial intelligence-based applications has increased the urgency of solving the memory capacity, data movement energy, and memory wall issues. Some solutions have moved processing to centralized cloud computing, where high-performance hardware and large memory capacity are available. However, this has introduced new challenges in communication, privacy, security, and latency, especially for real-time applications. The goal of this lecture is to highlight the aforementioned challenges and to present a new paradigm of computing beyond the von Neumann architecture that enables processing as close to the data source as possible. This includes in-memory and near-memory computing architectures. Both existing and emerging memory technologies will be explored.
Because the new computing paradigm is data-centric rather than processing-centric, a single architecture for all applications is no longer feasible; domain-specific architectures and hardware solutions need to be adopted instead. Popular compute-intensive functions such as query, MAC (multiply-accumulate), Hamming distance, and scale-to-max will be presented as examples of in-memory hardware accelerators.
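To make three of the functions named above concrete, the following is a minimal software sketch of what such accelerators compute. It is a functional reference model only, not the hardware design covered in the lecture. The function names `mac`, `hamming_distance`, and `query` are hypothetical, and reading "query" as an associative search that returns the addresses of stored words within a given Hamming distance of a key is an assumption made for illustration.

```python
from typing import List, Sequence


def mac(weights: Sequence[int], inputs: Sequence[int]) -> int:
    """Multiply-accumulate: sum of element-wise products of two vectors."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x  # one multiply and one accumulate per element
    return acc


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two non-negative integer words."""
    # XOR leaves a 1 in every position where the two words differ.
    return bin(a ^ b).count("1")


def query(memory: Sequence[int], key: int, max_distance: int = 0) -> List[int]:
    """Assumed query semantics: addresses of words close enough to the key.

    With max_distance=0 this is an exact-match search over all words.
    """
    return [
        addr
        for addr, word in enumerate(memory)
        if hamming_distance(word, key) <= max_distance
    ]


if __name__ == "__main__":
    stored = [0b1010, 0b1000, 0b0111]
    print(mac([1, 2, 3], [4, 5, 6]))
    print(hamming_distance(0b1011, 0b0010))
    print(query(stored, 0b1010, max_distance=1))
```

In this software model every stored word is read out and compared in a loop. The motivation for an in-memory accelerator is to perform the same comparison or accumulation where the words are stored, so that only the result crosses the memory interface.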