Shih-Yang (Sean) Liu

I'm a PhD student at HKUST, where I work on efficient deep learning (compression and parameter-efficient fine-tuning). I am a member of the Vision and System Design Lab (VSDL), advised by Prof. Tim Kwang-Ting Cheng. I am also mentored by Zechun Liu, with whom I work closely on model quantization.

I am currently a research intern at NVIDIA Research, working on efficient deep learning.

Email  /  Google Scholar  /  Twitter(X)  /  Github

profile photo

Research

I'm interested in model compression and efficient deep learning. Most of my research focuses on accelerating either the inference or the training of deep learning models. Representative papers are highlighted.

DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen
Proceedings of the 41st International Conference on Machine Learning (ICML), 2024
project page / arXiv / code

We present DoRA, a new parameter-efficient fine-tuning approach that consistently outperforms LoRA when fine-tuning LLMs, without incurring any additional inference cost. The improvements are particularly notable at smaller ranks, with a 37.2% improvement over LoRA at rank 8 and a 22.4% improvement at rank 4.
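For intuition, here is a minimal PyTorch sketch of the weight-decomposed update: the pretrained weight is split into a magnitude vector and a direction, and the low-rank (LoRA) update is applied only to the direction. This is an illustrative sketch based on the paper's formulation, not the released code; the layer and variable names (DoRALinear, magnitude, lora_A/lora_B) are placeholders.

```python
# Illustrative sketch of a DoRA-style layer (not the released implementation).
import torch
import torch.nn as nn

class DoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)  # frozen W0
        self.bias = base.bias
        out_f, in_f = self.weight.shape
        # Magnitude vector m, initialized to the column-wise norm of W0.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=0, keepdim=True))
        # Low-rank update applied to the direction component.
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        delta = self.lora_B @ self.lora_A * self.scaling          # low-rank direction update
        directional = self.weight + delta                         # W0 + BA
        column_norm = directional.norm(p=2, dim=0, keepdim=True)  # per-column norm
        adapted = self.magnitude * directional / column_norm      # rescale by learned magnitude
        return nn.functional.linear(x, adapted, self.bias)
```

After fine-tuning, the magnitude, direction, and low-rank update can be folded back into a single weight matrix, which is why no extra inference cost is incurred.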

Oscillation-free Quantization for Low-bit Vision Transformers
Shih-Yang Liu, Zechun Liu, Kwang-Ting Cheng
Proceedings of the 40th International Conference on Machine Learning (ICML), 2023
Paper / Code

In this work, we address weight oscillation in quantization-aware training and its negative impact on model performance. We propose three techniques: statistical weight quantization (StatsQ), confidence-guided annealing (CGA), and query-key reparameterization (QKR). Together, they improve quantization robustness and accuracy for ViT models. The proposed 2-bit DeiT-T/DeiT-S models outperform the previous state-of-the-art by 9.8% and 7.7%, respectively.
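As a rough illustration of the first of these ideas, the sketch below quantizes weights with a scale derived from weight statistics rather than a separately learned parameter, which is what makes the forward pass less prone to oscillation. The particular statistic (mean absolute value) and the straight-through estimator are assumptions made for illustration, not the paper's exact formulation.

```python
# Hedged sketch of statistics-based weight quantization: the scale comes from
# the weights themselves instead of a learned parameter. The choice of statistic
# here is an illustrative assumption.
import torch

def stats_quantize_weights(w: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    qmax = 2 ** (n_bits - 1) - 1                  # e.g. 1 for symmetric 2-bit quantization
    scale = w.abs().mean() / qmax + 1e-12         # scale from weight statistics, not learned
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    # Straight-through estimator: forward uses quantized weights, backward sees identity.
    return w + (w_q * scale - w).detach()
```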

LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Shih-Yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Main), 2023
Paper / Code

We introduce LLM-FP4, a post-training quantization framework that, for the first time, quantizes both the activations and weights of LLMs to 4-bit floating point without substantial loss in accuracy, outperforming previous methods by up to 13.1%.
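For readers unfamiliar with floating-point quantization, the sketch below rounds a tensor onto a 4-bit floating-point (E2M1: 1 sign, 2 exponent, 1 mantissa bits) grid. It only illustrates what FP4 values look like; the simple max-based scaling is an assumption, and the calibration procedure used in the paper is not reproduced here.

```python
# Hedged sketch of round-to-nearest quantization onto an FP4 (E2M1) grid.
import torch

# Positive values representable in E2M1 (including subnormals), before scaling.
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quantize(x: torch.Tensor) -> torch.Tensor:
    scale = x.abs().max() / FP4_GRID.max()                    # simple max-based scaling (an assumption)
    grid = torch.cat([-FP4_GRID.flip(0), FP4_GRID]) * scale
    grid = grid.to(x.device, x.dtype)
    # Round each element to the nearest representable grid point.
    idx = (x.unsqueeze(-1) - grid).abs().argmin(dim=-1)
    return grid[idx]
```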


Thanks to Barron's website template.