
[2026-Mar-11] Toward Efficient LLM Inference - Challenges and Solutions

Institute of Information Systems and Applications

Speaker:

Prof. Kai-Chiang Wu

Professor of the Department of Computer Science, National Yang Ming Chiao Tung University

Topic:

Toward Efficient LLM Inference - Challenges and Solutions

Date:

13:20-15:00 Wednesday 11-Mar-2026

Location:

Delta 103

Hosted by:

Prof. Yun-Chih Chen

Abstract

The inference cost of large language models (LLMs) has become a crucial bottleneck as LLMs continue to grow in scale and capability. This talk explores the key challenges of efficient LLM inference, spanning memory bandwidth, latency, compute utilization, and system scalability, and discusses recent advances that address these bottlenecks from both algorithmic and system perspectives. More specifically, I will introduce emerging techniques such as model compression, KV-cache compression, and speculative decoding, and discuss how these techniques speed up LLM inference and reduce its memory footprint. Finally, I will discuss open problems and future directions toward building an efficient LLM inference framework for the next generation of AI applications.
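As a primer on one of the techniques mentioned above, here is a minimal toy sketch of greedy speculative decoding: a cheap draft model proposes several tokens, and the expensive target model verifies them, accepting the longest agreeing prefix plus one corrected token. The two "models" below are hypothetical arithmetic stand-ins (real systems verify drafts probabilistically, in a single batched forward pass of the target model):

```python
def target_model(ctx):
    # Hypothetical "expensive" model: next token = (sum of context) mod 10
    return sum(ctx) % 10

def draft_model(ctx):
    # Hypothetical "cheap" model: only looks at the last token, so it
    # sometimes disagrees with the target model
    return (ctx[-1] * 2) % 10

def speculative_decode(ctx, num_new, k=4):
    """Generate num_new tokens; output is identical to greedy decoding
    with target_model alone, but the target verifies k draft tokens at a
    time instead of being called strictly one token per step."""
    ctx = list(ctx)
    generated = 0
    while generated < num_new:
        # 1) Draft proposes up to k tokens autoregressively (cheap).
        proposal, tmp = [], list(ctx)
        for _ in range(min(k, num_new - generated)):
            t = draft_model(tmp)
            proposal.append(t)
            tmp.append(t)
        # 2) Target verifies the proposals; accept the matching prefix,
        #    and on the first mismatch substitute the target's own token.
        accepted = []
        for t in proposal:
            expect = target_model(ctx + accepted)
            if expect == t:
                accepted.append(t)       # draft token accepted
            else:
                accepted.append(expect)  # corrected token; stop this round
                break
        ctx.extend(accepted)
        generated += len(accepted)
    return ctx

out = speculative_decode([1, 2, 3], num_new=5)  # → [1, 2, 3, 6, 2, 4, 8, 6]
```

The key property is that the output is exactly what greedy decoding with the target model alone would produce; the speedup comes from verifying several draft tokens per target-model pass whenever the draft agrees.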

Bio.

Kai-Chiang Wu received his Ph.D. from CMU in 2011. After graduation he worked at Intel for over two years before returning to Taiwan to join the Department of Computer Science at NCTU. Until 2017, his research focused on IC/SoC design and design automation, with particular expertise in IC/SoC verification, testing, and reliability. After achieving strong results in the Low-Power Computer Vision (LPCV) Challenge held at the ICCV conference in 2020 and in several similar subsequent competitions, he shifted fully to machine learning, deep learning, and artificial intelligence. His research expertise now covers ML/DL systems, DL efficiency, efficient LLM deployment/serving, edge AI, and other areas related to ML/DL/AI efficiency.

Kai-Chiang Wu's awards and honors include:

Semiconductor Industry Association New Researcher Award

International Test Conference (ITC) 2003 Best Paper Award nomination

VLSI Test Symposium (VTS) 2023 Best Paper Award

VLSI Test Symposium (VTS) 2017 Best Paper Award nomination

International Conference on Computer Design (ICCD) 2008 Best Paper Award

All faculty and students are welcome to join.
