Compact Visual Token Representation and Compression

The rapid proliferation of Foundation Models (FMs), encompassing Large Language Models (LLMs), Diffusion Models, and Multimodal Large Language Models (MLLMs), has fundamentally transformed multi-modal processing. However, the prohibitive memory and computational costs of these transformer-based architectures create significant latency barriers and severely restrict their real-world deployment on edge devices. Unlike conventional model compression, which often demands expensive retraining, token compression has emerged as a promising, training-efficient paradigm that accelerates inference by exploiting information redundancy within input sequences. This tutorial presents a systematic review of this emerging field, moving beyond traditional weight reduction to focus on data-centric optimization. We first dissect the quadratic complexity of FMs with respect to sequence length and introduce four progressive compression categories: token pruning, merging, routing, and compact tokenization. The tutorial then analyzes the underlying scoring and selection mechanisms across diverse tasks, ranging from language processing to visual generation. Furthermore, we explore the integration of these techniques into critical downstream applications, such as embodied AI, autonomous driving, and medical analysis. The session concludes by identifying key challenges and future directions, including hardware-aware optimization, interpretability, and the quest for scalable, generalizable compression frameworks.
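As a rough illustration of the first category, token pruning, the sketch below keeps the highest-scoring tokens and estimates the resulting drop in pairwise attention cost. It is a minimal example under stated assumptions, not the tutorial's method: the function names `prune_tokens` and `attention_cost_ratio`, the `keep_ratio` parameter, and the importance scores are all hypothetical, and the cost model simply assumes self-attention work grows with the square of the token count.

```python
from typing import List, Sequence


def prune_tokens(tokens: Sequence[Sequence[float]],
                 scores: Sequence[float],
                 keep_ratio: float) -> List[Sequence[float]]:
    """Keep the highest-scoring tokens, preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Rank token indices by score (descending), then restore sequence order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


def attention_cost_ratio(n_before: int, n_after: int) -> float:
    """Relative pairwise self-attention cost, assumed to scale as n^2."""
    return (n_after ** 2) / (n_before ** 2)


# Toy input: four 2-dimensional token embeddings with hypothetical scores.
tokens = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.1], [0.7, 0.6]]
scores = [0.05, 0.60, 0.02, 0.33]
kept = prune_tokens(tokens, scores, keep_ratio=0.5)
ratio = attention_cost_ratio(len(tokens), len(kept))
```

Under the quadratic assumption, keeping half of the tokens leaves roughly a quarter of the pairwise attention work, which is why pruning can pay off without retraining the model weights.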

Zhibo Chen