Image and video coding algorithms create compact representations of an image by exploiting its spatial redundancy and perceptual irrelevance, thus exploiting the characteristics of the human visual system. Recently, data-driven algorithms such as neural networks have attracted a lot of attention and become a popular area of research and development. This interest is driven by several factors, such as recent advances in processing power (cheap and powerful hardware), the availability of large data sets (big data), and several algorithmic and architectural advances (e.g. generative adversarial networks).
Nowadays, neural networks are the state-of-the-art for several computer vision tasks, such as those requiring a high-level understanding of image semantics, e.g. image classification, object segmentation, saliency detection, but also low-level image processing tasks, such as image denoising, inpainting, and super-resolution. These advances have led to an increased interest in applying deep neural networks to image and video coding, which is now the main focus of the JPEG AI and the JVET NN activities within the JPEG and MPEG standardization committees.
The aim of these novel image and video coding solutions is to design a compact representation model that has been obtained (learned) from a large amount of visual data and can efficiently represent the wide variety of images and videos that are consumed today. Some of the available learning-based image coding solutions already show very promising experimental results in terms of rate-distortion (RD) performance, notably in comparison with conventional standard image codecs (especially HEVC Intra and VVC Intra) which code the image data with hand-crafted transforms, entropy coding, and quantization schemes.
This special session on Learning-based Image and Video Coding gathers technical contributions that demonstrate the efficient coding of image and video content based on a learning-based approach. This topic has received many contributions in recent years and is considered critical for the future of both image and video coding, especially solutions adopting end-to-end training as well as for solutions where learning-based tools replace previous conventional tools.