Compact Visual Token Representation and Compression
The rapid proliferation of Foundation Models (FMs), encompassing Large Language Models (LLMs), Diffusion Models, and Multimodal Large Language Models (MLLMs), has fundamentally transformed multi-modal processing. However, the memory and computational costs of these transformer-based architectures create significant latency barriers and severely restrict their real-world deployment on edge devices. Unlike conventional model compression, which often demands expensive retraining, token compression has emerged as a promising, training-efficient paradigm that accelerates inference by exploiting information redundancy within input sequences. This tutorial presents a systematic review of this emerging field, moving beyond traditional weight reduction to focus on data-centric optimization. We first dissect the quadratic complexity of self-attention in FMs with respect to sequence length and introduce four progressive compression categories: token pruning, merging, routing, and compact tokenization. The tutorial then analyzes the underlying scoring and selection mechanisms across diverse tasks, ranging from language processing to vision generation. Furthermore, we explore the integration of these techniques into critical downstream applications, such as embodied AI, autonomous driving, and medical analysis. The session concludes by identifying key challenges and future directions, including hardware-aware optimization, interpretability, and the search for scalable, generalizable compression frameworks.
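To make the first of these categories concrete, the following is a minimal sketch of score-based token pruning. It is an illustration only, not a method from the tutorial: the function names `prune_tokens` and `attention_pair_count`, the `keep_ratio` parameter, and the importance scores are all hypothetical, and real systems typically derive scores from attention weights or a learned predictor.

```python
from typing import List, Sequence


def prune_tokens(tokens: Sequence[Sequence[float]],
                 scores: Sequence[float],
                 keep_ratio: float) -> List[int]:
    """Return indices of the highest-scoring tokens, in original order."""
    if len(tokens) != len(scores):
        raise ValueError("one score per token is required")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank by score, highest first; ties are broken by position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    # Restore the original ordering so positional structure is preserved.
    return sorted(ranked[:n_keep])


def attention_pair_count(n_tokens: int) -> int:
    """Query-key pairs in one self-attention layer (quadratic in n_tokens)."""
    return n_tokens * n_tokens


# Toy input: four 2-dimensional token embeddings with assumed scores.
tokens = [[0.1, 0.2], [0.9, 0.4], [0.3, 0.3], [0.8, 0.7]]
scores = [0.05, 0.60, 0.10, 0.25]  # hypothetical importance scores

kept = prune_tokens(tokens, scores, keep_ratio=0.5)
pruned = [tokens[i] for i in kept]

print(kept)                                  # indices of surviving tokens
print(attention_pair_count(len(tokens)))     # pairs before pruning
print(attention_pair_count(len(pruned)))     # pairs after pruning
```

Because the number of query-key pairs grows with the square of the sequence length, keeping half of the tokens in this toy case reduces the pair count to a quarter, which is the basic reason token-level reduction can pay off without touching model weights.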