Advances in Design and Implementation of End-to-End Learned Image and Video Compression

National Yang Ming Chiao Tung University

The DCT-based transform coding technique has been adopted by international standards (ISO JPEG, ITU-T H.261/H.264/H.265/H.266, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers continue to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades. The recent arrival of deep learning has spurred a new wave of developments in end-to-end learned image and video compression. The seminal work by Ballé et al. connects the learning of an image compression system to the learning of a variational generative model, known as the variational autoencoder (VAE), opening up a new direction for constructing high-efficiency image/video coding systems based on advanced deep generative models. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of compression technology achieves compression performance comparable or superior to VVC Intra (the state-of-the-art codec standardized and published in 2020) and delivers superior subjective quality, especially at very low bit rates.

In this talk, I shall (1) briefly summarize the progress of this topic over the past three or so years; (2) introduce the design concepts of VAE-based image compression; (3) switch gears to explore the emerging area of end-to-end learned video compression, which has attracted considerable research interest and many publications since the advent of the first such system in 2019; and (4) address the circuits and systems aspects of learned image/video codecs by examining recent efforts to create hardware-friendly, low-complexity models. The talk will conclude with open issues and recent standardization initiatives by the IEEE and other communities.
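To make the VAE connection concrete, the training objective of a learned image codec is typically a rate-distortion Lagrangian, L = R + λ·D, where the rate R is the expected code length of the quantized latents under a learned entropy model and D is the reconstruction distortion. The sketch below (an illustrative assumption, not the speaker's actual model) evaluates this loss with a simple factorized Gaussian entropy model; the function name, the Gaussian choice, and the λ weighting are all hypothetical simplifications of Ballé-style systems.

```python
import math
import numpy as np

def _normal_cdf(x):
    # Standard normal CDF, evaluated elementwise via math.erf.
    return 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

def rate_distortion_loss(x, x_hat, y_hat, mu, sigma, lam=0.01):
    """Hypothetical rate-distortion loss L = R + lam * D.

    x, x_hat : original and reconstructed image (same shape)
    y_hat    : integer-quantized latent values
    mu,sigma : parameters of a factorized Gaussian entropy model
    """
    # Rate: probability mass of each quantized latent over its unit bin,
    # p(y_hat) = CDF(y_hat + 0.5) - CDF(y_hat - 0.5), summed as bits.
    p = _normal_cdf((y_hat + 0.5 - mu) / sigma) - _normal_cdf((y_hat - 0.5 - mu) / sigma)
    rate_bpp = -np.log2(np.clip(p, 1e-9, 1.0)).sum() / x.size  # bits per pixel
    # Distortion: mean squared error of the reconstruction.
    mse = np.mean((x - x_hat) ** 2)
    return rate_bpp + lam * mse
```

Sweeping λ traces out the codec's rate-distortion curve: small λ favors low bit rate, large λ favors high fidelity, which is how a single architecture is trained for multiple quality points.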