Shang Yang

Ph.D. Student
MIT EECS
Cambridge, MA
shangy [at] mit [dot] edu


I am a second-year Ph.D. student in the MIT HAN Lab (EECS), advised by Prof. Song Han. My long-term goal is to build efficient machine learning systems for applications at different scales, especially large language models (LLMs). I am currently working on efficient inference systems for LLMs and VLMs.

News

Selected Publications

  1. MLSys
    LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
    Shang Yang*, Junxian Guo*, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han.
    The Eighth Annual Conference on Machine Learning and Systems (MLSys), 2025.

  2. MLSys
    QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
    Yujun Lin*, Haotian Tang*, Shang Yang*, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han.
    The Eighth Annual Conference on Machine Learning and Systems (MLSys), 2025.

  3. MLSys
    AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
    Ji Lin*, Jiaming Tang*, Haotian Tang†, Shang Yang†, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han.
    The Seventh Annual Conference on Machine Learning and Systems (MLSys), 2024.

  4. MICRO
    TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs
    Haotian Tang*, Shang Yang*, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han.
    56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023.

Blogs

  1. Explore the latest advancement in TinyChat: version 2.0, which brings significant improvements in prefilling speed for edge LLMs and VLMs. On top of the 3-4x decoding speedups achieved with AWQ quantization, TinyChat 2.0 now delivers state-of-the-art time-to-first-token, 1.5-1.7x faster than the legacy version of TinyChat.

  2. Explore the latest advancement in TinyChat and AWQ: the integration of visual language models (VLMs) on the edge! These exciting advances allow LLMs to comprehend visual inputs, enabling image-understanding tasks such as caption generation and question answering. With the latest release, TinyChat supports leading VLMs such as VILA, which can be easily quantized with AWQ, empowering users with a seamless experience for image-understanding tasks.

  3. Running large language models (LLMs) on the edge is of great importance. In this blog, we introduce TinyChat, an efficient and lightweight system for LLM deployment on the edge. It runs Meta's latest LLaMA-2 model at 30 tokens/second on NVIDIA Jetson Orin and can easily support different models and hardware.


© Copyright 2024 Shang Yang. Powered by Jekyll and Minimal Light theme.