Efficient Processing of Multi-Model Deep Learning Applications


Seok-Bum Ko

University of Saskatchewan


With the development of deep learning technologies, deep neural networks are being adopted in a growing number of application fields. Cloud servers now need to process different computation workloads from multiple tenants simultaneously. In addition, multiple models are often applied together to handle complex tasks, and multi-modal deep learning methods are widely adopted. As a result, efficient processing of multiple deep learning models is becoming more important. Because different models need accelerators with different datapaths to achieve their optimal energy efficiency, conventional single-model accelerators may not handle multi-model tasks well. In this lecture, efficient processing technologies for multi-model deep learning applications will be discussed. First, current single-model accelerators will be briefly discussed. Then, the inter-model parallelization strategies, computing engine microarchitecture designs, and model scheduling algorithms available in the literature will be reviewed. Finally, challenges and future trends for multi-model processing will be discussed.