Research
I am interested in model compression and efficient deep learning. Most of my research focuses on accelerating either the inference or the training of deep learning models. Representative papers are highlighted.
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen
Proceedings of the 41st International Conference on Machine Learning (ICML), 2024
project page / arXiv / code
We present DoRA, a new parameter-efficient fine-tuning approach that consistently outperforms LoRA when fine-tuning LLMs without incurring any additional inference cost. The gains are especially pronounced at smaller ranks, with a 37.2% improvement over LoRA at rank 8 and a 22.4% improvement at rank 4.
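Below is a minimal PyTorch sketch of the weight-decomposition idea behind DoRA: the pretrained weight is split into a magnitude vector and a direction matrix, and only the magnitude plus a LoRA-style low-rank update on the direction are trained. This is an illustration only; the class name, initialization, and normalization details are simplified assumptions, so please refer to the code linked above for the actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DoRALinear(nn.Module):
    """Illustrative DoRA-style layer: frozen pretrained weight, trainable
    magnitude vector, and a LoRA-style low-rank update on the direction."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.register_buffer("weight", base.weight.detach().clone())  # frozen W0
        self.register_buffer(
            "bias", base.bias.detach().clone() if base.bias is not None else None
        )
        out_f, in_f = self.weight.shape
        # trainable magnitude, initialized to the per-output-row norm of W0
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=1, keepdim=True))
        # LoRA factors; B starts at zero so the initial update is a no-op
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))

    def forward(self, x):
        # low-rank update on the direction, then renormalize and rescale
        direction = self.weight + self.lora_B @ self.lora_A
        direction = direction / direction.norm(p=2, dim=1, keepdim=True)
        return F.linear(x, self.magnitude * direction, self.bias)


layer = DoRALinear(nn.Linear(128, 64), rank=8)
y = layer(torch.randn(2, 128))  # shape (2, 64)

At inference time the magnitude and normalized direction can be merged back into a single dense weight, which is why no extra inference cost is incurred.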
Oscillation-free Quantization for Low-bit Vision Transformers
Shih-Yang Liu, Zechun Liu, Kwang-Ting Cheng
Proceedings of the 40th International Conference on Machine Learning (ICML), 2023
Paper / Code
In this work, we address weight oscillation in quantization-aware training and its negative impact on model performance. We propose three techniques: statistical weight quantization (StatsQ), confidence-guided annealing (CGA), and query-key reparameterization (QKR), which together improve quantization robustness and accuracy for vision transformers. The resulting 2-bit DeiT-T and DeiT-S models outperform the previous state-of-the-art by 9.8% and 7.7%, respectively.
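The sketch below illustrates the StatsQ intuition only: the quantization scale is recomputed from the weight statistics at every forward pass instead of being a separately learned parameter, so the weights and the scale cannot oscillate against each other. The specific scale formula (an LSQ-style 2·mean(|W|)/√Qp) and the straight-through estimator are illustrative assumptions, and CGA/QKR are not shown; see the code link above for the reference implementation.

import torch


def quantize_weight_statsq(w: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    """Quantize weights with a scale derived from their own statistics."""
    qmax = 2 ** (n_bits - 1) - 1      # e.g. 1 for 2-bit, 7 for 4-bit
    qmin = -(2 ** (n_bits - 1))       # e.g. -2 for 2-bit
    # statistics-based scale (illustrative choice, not the paper's exact formula)
    scale = 2.0 * w.abs().mean() / (max(qmax, 1) ** 0.5)
    w_int = torch.clamp(torch.round(w / scale), qmin, qmax)
    # straight-through estimator: quantized values in the forward pass,
    # identity gradient in the backward pass
    return (w_int * scale - w).detach() + w


w = torch.randn(64, 64, requires_grad=True)
w_q = quantize_weight_statsq(w, n_bits=2)  # values in {-2, -1, 0, 1} * scale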
LLM-FP4: 4-Bit Floating-Point Quantized Transformers
Shih-Yang Liu, Zechun Liu, Xijie Huang, Pingcheng Dong, Kwang-Ting Cheng
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP Main), 2023
Paper / Code
We introduce LLM-FP4, a post-training quantization framework that, for the first time, quantizes both the activations and weights of LLMs to 4-bit floating point without substantial loss in accuracy, outperforming previous methods by up to 13.1%.
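As a toy illustration of what 4-bit floating-point quantization means, the sketch below snaps values onto an E2M1-style FP4 grid under a per-channel max-based scale. The grid and scaling rule here are assumptions for illustration; LLM-FP4 itself additionally searches the exponent bias and reparameterizes per-channel activation scales, none of which is shown.

import torch

# positive magnitudes of a simple E2M1-style FP4 format (sign handled separately)
FP4_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])


def quantize_fp4(x: torch.Tensor, channel_dim: int = -1) -> torch.Tensor:
    """Snap each value onto the FP4 grid under a per-channel max-based scale."""
    max_val = x.abs().amax(dim=channel_dim, keepdim=True).clamp(min=1e-8)
    scale = max_val / FP4_GRID[-1]
    scaled = x / scale
    # nearest grid point per element, sign restored afterwards
    idx = (scaled.abs().unsqueeze(-1) - FP4_GRID).abs().argmin(dim=-1)
    return torch.sign(scaled) * FP4_GRID[idx] * scale


x = torch.randn(4, 16)
x_q = quantize_fp4(x)  # same shape, values on the per-channel FP4 grid

Unlike integer grids, the FP4 grid is non-uniform (denser near zero), which is why floating-point formats can be a better fit for the long-tailed distributions of transformer activations.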
Thanks to Jon Barron for the website template.