Memory Access Optimization for Neural Network Processing
Deep learning applications demand massive data movement between processors and memory devices. To reduce the time and power consumed by this data movement, extensive research has been carried out to develop new memory architectures suitable for deep learning. This talk introduces recent trends in memory architecture for deep learning applications. Among them, Processing-near-Memory (PNM) and Approximate Memory (AM) architectures have attracted wide attention. PNM architecture reduces data movement by placing computation near DRAM devices. AM architecture, on the other hand, reduces the precision of deep learning data and consequently reduces memory traffic. AM is especially suitable for deep learning applications whose accuracy is not significantly degraded by a loss of data precision. Recent developments in PNM and AM architectures are briefly reviewed, and data access optimizations for these memory architectures are explained. A new memory architecture combining the two approaches is then proposed. This architecture is evaluated by modifying a GPU simulator, and simulation results demonstrating its effectiveness are presented.