Brain-Inspired Low-Power Language Models

Abstract

This talk unveils the transformative potential of sub-10-watt language models (LMs), drawing inspiration from the brain's energy efficiency. We introduce a new approach to language model design: a matrix-multiplication-free architecture that scales to billions of parameters. To validate this paradigm, we developed custom FPGA hardware and also leveraged existing neuromorphic hardware (Intel Loihi 2), both optimized for the lightweight operations that conventional GPUs handle inefficiently. Our system achieves throughput surpassing human reading speed on billion-parameter models at just 13 watts, setting a new benchmark for energy-efficient AI. This work not only redefines what is possible for low-power LMs but also highlights the critical operations that future accelerators must prioritize to enable the next wave of sustainable AI innovation.
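
The abstract does not spell out the architecture, but a minimal sketch can illustrate the core idea behind matrix-multiplication-free layers under one common assumption: if weights are constrained to {-1, 0, +1} (ternary quantization), every dense projection reduces to additions and subtractions of activations, which is exactly the kind of lightweight operation that FPGAs and neuromorphic chips such as Loihi 2 are well suited for. The function names (ternary_quantize, matmul_free_linear) and the thresholding scheme below are illustrative assumptions, not the presenters' implementation.

```python
import numpy as np

def ternary_quantize(w, threshold=0.05):
    """Map full-precision weights to {-1, 0, +1} plus a per-matrix scale.

    Illustrative scheme (an assumption, not the talk's exact method):
    weights close to zero are pruned, the rest keep only their sign.
    """
    scale = np.mean(np.abs(w)) + 1e-8
    q = np.zeros_like(w)
    q[w > threshold * scale] = 1.0
    q[w < -threshold * scale] = -1.0
    return q, scale

def matmul_free_linear(x, w_ternary, scale):
    """Dense projection with ternary weights.

    Because each weight is -1, 0, or +1, every output feature is just a
    sum of some inputs minus a sum of others: no activation-by-weight
    multiplications are required.
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]))
    for j in range(w_ternary.shape[1]):
        pos = x[:, w_ternary[:, j] == 1].sum(axis=1)    # pure additions
        neg = x[:, w_ternary[:, j] == -1].sum(axis=1)   # pure subtractions
        out[:, j] = scale * (pos - neg)
    return out

# Usage sketch: quantize a random weight matrix and apply the layer.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))          # batch of 4 activation vectors
w = rng.standard_normal((16, 8)) * 0.1    # full-precision weights
w_q, s = ternary_quantize(w)
y = matmul_free_linear(x, w_q, s)
print(y.shape)  # (4, 8)
```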