[ICML 2024] DoRA: Weight-Decomposed Low-Rank Adaptation


1NVIDIA Research
2HKUST

*Work done during internship at NVIDIA Research


DoRA consistently outperforms LoRA across the LLaMA family of models on commonsense reasoning tasks.

Abstract

Among the widely used parameter-efficient fine-tuning (PEFT) methods, LoRA and its variants have gained considerable popularity because they avoid additional inference costs. However, there still often exists an accuracy gap between these methods and full fine-tuning (FT). In this work, we first introduce a novel weight decomposition analysis to investigate the inherent differences between FT and LoRA. Aiming to resemble the learning capacity of FT based on our findings, we propose Weight-Decomposed Low-Rank Adaptation (DoRA). DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.

Method Overview


An overview of our proposed DoRA, which decomposes the pre-trained weight into magnitude and direction components for fine-tuning, using LoRA to efficiently update the direction component.
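Concretely, DoRA re-parameterizes a pre-trained weight W0 as W' = m · (W0 + BA) / ||W0 + BA||, where m is a trainable magnitude vector initialized to the norm of W0 and the direction W0 + BA is updated through a low-rank LoRA adapter. The sketch below illustrates this re-parameterization in plain PyTorch; the variable names and sizes are illustrative, not taken from the official implementation.

  import torch

  d_out, d_in, r = 16, 32, 4
  W0 = torch.randn(d_out, d_in)                               # frozen pre-trained weight ([out_features, in_features])
  A = (torch.randn(r, d_in) * 0.01).requires_grad_()          # trainable LoRA down-projection
  B = torch.zeros(d_out, r, requires_grad=True)               # trainable LoRA up-projection, zero-initialized
  m = W0.norm(dim=1, keepdim=True).clone().requires_grad_()   # trainable magnitude, initialized to the row-wise norm of W0

  def dora_weight(W0, A, B, m):
      V = W0 + B @ A                                          # direction component, updated by LoRA
      return m * V / V.norm(dim=1, keepdim=True)              # rescale each row to the learned magnitude

  x = torch.randn(8, d_in)
  y = x @ dora_weight(W0, A, B, m).T                          # equivalent to F.linear(x, W')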

Results

Fine-tuning LLaMA-7B/13B, LLaMA2-7B, and LLaMA3-8B on commonsense reasoning tasks



Instruction tuning LLaMA-7B and LLaMA2-7B with the cleaned Alpaca dataset



Fine-tuning VL-BART on image/video-text understanding tasks



Visual instruction tuning (LLaVA-1.5-7B)

QDoRA vs. QLoRA vs. FT

Implementation of DoRA compared to LoRA in PyTorch-like code


  ## LoRA forward pass
  def forward(self, x: torch.Tensor):
    # Frozen pre-trained weight (and bias) applied to the input
    base_result = F.linear(x, transpose(self.weight, self.fan_in_fan_out), bias=self.bias)
    dropout_x = self.lora_dropout(x)

    # Add the low-rank update B(A(x)), scaled by alpha / r
    result = base_result + (self.lora_B(self.lora_A(dropout_x.to(self.lora_A.weight.dtype)))) * self.scaling
    return result

  ## DoRA forward pass
  def forward(self, x: torch.Tensor):
    # Frozen pre-trained weight applied to the input (bias is added at the end)
    base_result = F.linear(x, transpose(self.weight, self.fan_in_fan_out))
    dropout_x = self.lora_dropout(x)

    # Direction component: pre-trained weight plus the low-rank LoRA update
    new_weight_v = self.weight + (self.lora_B.weight @ self.lora_A.weight) * self.scaling
    # Learned magnitude divided by the (detached) norm of the direction component
    norm_scale = self.weight_m_wdecomp.weight.view(-1) / (torch.linalg.norm(new_weight_v, dim=1)).detach()

    # Apply the magnitude/norm rescaling to both the frozen-weight path and the LoRA path
    result = base_result + (norm_scale - 1) * (F.linear(dropout_x, transpose(self.weight, self.fan_in_fan_out)))
    result += (norm_scale * (self.lora_B(self.lora_A(dropout_x.to(self.lora_A.weight.dtype))))) * self.scaling
    if self.bias is not None:
      result += self.bias.view(1, -1).expand_as(result)
    return result
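Because the magnitude and the normalized direction can be folded back into a single dense weight once training is finished, DoRA adds no inference overhead relative to the original model. Below is a hedged sketch of that merge; it reuses the attribute names from the forward pass above, and merge_weights itself is illustrative rather than the repository's method.

  ## Merging the DoRA adapter into the frozen weight after training (illustrative sketch)
  def merge_weights(self):
    new_weight_v = self.weight + (self.lora_B.weight @ self.lora_A.weight) * self.scaling
    norm_scale = self.weight_m_wdecomp.weight.view(-1) / torch.linalg.norm(new_weight_v, dim=1)
    self.weight.data = norm_scale.unsqueeze(1) * new_weight_v   # one dense matrix; inference cost is unchanged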
  

Quick Start with FSDP/QDoRA from Answer.AI


QDoRA + Fully Sharded Data Parallel (FSDP) is now supported by Answer.AI:
git clone https://github.com/AnswerDotAI/fsdp_qlora
Start fine-tuning LLMs on consumer GPUs!

Quick Start with Hugging Face PEFT and Diffusers


Hugging Face PEFT


DoRA is now supported by the Hugging Face PEFT package. You can install PEFT with

pip install git+https://github.com/huggingface/peft.git -q

After PEFT is installed, you can simply set the use_dora argument of LoraConfig to True to apply DoRA. An example is as follows:


    from peft import LoraConfig, get_peft_model

    # Initialize DoRA configuration
    config = LoraConfig(
        ...
        use_dora=True,
        ...
    )
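
For a fuller picture, a minimal end-to-end sketch might look as follows; the checkpoint name and hyperparameter values are illustrative placeholders, not the settings used in the paper:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Illustrative example: wrap a base model with a DoRA adapter via PEFT.
    # The checkpoint and hyperparameter values are placeholders.
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        use_dora=True,    # enable DoRA on top of the LoRA adapter
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()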
  

Please refer to the official documentation for more details.

Hugging Face Diffusers


You can also experiment with DoRA for fine-tuning diffusion models. See huggingface/diffusers. Another good tutorial is this Colab notebook from Linoy Tsaban.
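As a rough sketch, the same use_dora switch applies when attaching an adapter to a diffusion model through the PEFT integration; the checkpoint and target modules below are illustrative, and the linked examples above contain complete training scripts.

    from diffusers import UNet2DConditionModel
    from peft import LoraConfig

    # Illustrative sketch: attach a DoRA adapter to a UNet via the Diffusers/PEFT integration.
    # The checkpoint and target modules are placeholders.
    unet = UNet2DConditionModel.from_pretrained(
        "runwayml/stable-diffusion-v1-5", subfolder="unet"
    )
    unet_lora_config = LoraConfig(
        r=8,
        lora_alpha=8,
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
        use_dora=True,
    )
    unet.add_adapter(unet_lora_config)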

Some DoRA vs. LoRA diffusion fine-tuning results

Example from Linoy Tsaban (images generated by DoRA on the left, LoRA on the right):


Example from merve



BibTeX

@article{liu2024dora,
  title={DoRA: Weight-Decomposed Low-Rank Adaptation},
  author={Liu, Shih-Yang and Wang, Chien-Yi and Yin, Hongxu and Molchanov, Pavlo and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Chen, Min-Hung},
  journal={arXiv preprint arXiv:2402.09353},
  year={2024}
}