# candle-lora
LoRA (low-rank adaptation) implemented in Rust for use with Candle. This technique replaces a model's fully trainable layers with new LoRA layers. These LoRA layers act as wrappers over the original layers, which they freeze; because the wrappers contain far fewer trainable parameters, LoRA allows for more efficient fine-tuning.

Using a fine-tuned LoRA model for inference naively has a performance cost, because both the original layer and the low-rank layers must be evaluated to produce each output. However, an algorithm known as weight merging nullifies this added cost by merging the LoRA weights into the original weights. Merged weights may also be unmerged.
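To illustrate the idea, here is a minimal sketch of the merge and unmerge math (not candle-lora's internal API; the function names are hypothetical), assuming low-rank factors `a` of shape `(rank, in)` and `b` of shape `(out, rank)` with the usual LoRA scale `alpha / rank`:

```rust
use candle_core::{Result, Tensor};

// Sketch: fold the LoRA update into the base weight.
// W' = W + scale * (B A): one matmul at merge time, zero extra cost at
// inference time, since W' is a drop-in replacement for W.
fn merge(w: &Tensor, a: &Tensor, b: &Tensor, scale: f64) -> Result<Tensor> {
    w.add(&b.matmul(a)?.affine(scale, 0.0)?)
}

// Sketch: undo the merge to recover the frozen base weight.
// W = W' - scale * (B A).
fn unmerge(w_merged: &Tensor, a: &Tensor, b: &Tensor, scale: f64) -> Result<Tensor> {
    w_merged.sub(&b.matmul(a)?.affine(scale, 0.0)?)
}
```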
## Features
- Convert `Linear`, `Conv1d`, `Conv2d`, and `Embedding` layers into LoRA layers
- All conversions are implemented in accordance with HuggingFace's official LoRA implementation
- Weight merging is implemented to improve inference performance
- Weight unmerging
- Easy-to-use APIs
- Extensible trait-based layer swapping mechanism, sketched below
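The swapping mechanism works because converted model fields hold trait objects rather than concrete layers. The following is an illustration of the pattern only (the trait `LayerLike` is hypothetical; candle-lora's real traits, such as `LinearLayerLike`, carry additional methods):

```rust
use candle_core::{Module, Result, Tensor};

// Hypothetical marker trait standing in for candle-lora's layer traits.
trait LayerLike: Module {}

// Because the field is a trait object, any conforming layer (plain, LoRA,
// merged, ...) can be swapped in without touching the model definition.
struct SwappableModel {
    layer: Box<dyn LayerLike>,
}

impl Module for SwappableModel {
    fn forward(&self, x: &Tensor) -> Result<Tensor> {
        self.layer.forward(x)
    }
}
```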
## candle-lora-macro
This library makes using `candle-lora` as simple as adding two macros to your model structs and calling a method! It is inspired by the simplicity of the Python `peft` library's `get_peft_model` method. Together, these macros mean that `candle-lora` can be added to any `candle` model with minimal code changes! To see the benefits, compare the example below (or here) to this equivalent example. See a precise diff here.
## How to use
- Derive `AutoLoraConvert` from `candle-lora-macro` on each model struct and add the `replace_layer_fields` attribute macro.
- Call `get_lora_model` on each model struct.
- Enjoy your new LoRA model!
## Examples
See a training example with Llama + LoRA here.
```rust
use candle_core::{DType, Device, Module, Result, Tensor};
use candle_lora::{LinearLayerLike, LoraConfig, LoraLinearConfig};
use candle_lora_macro::{replace_layer_fields, AutoLoraConvert};
use candle_nn::{init, Linear, VarBuilder, VarMap};

// `replace_layer_fields` rewrites the `Linear` field into a
// `Box<dyn LinearLayerLike>`, and `AutoLoraConvert` derives the
// `get_lora_model` method used below.
#[replace_layer_fields]
#[derive(AutoLoraConvert, Debug)]
struct Model {
    layer: Linear,
}

impl Module for Model {
    fn forward(&self, input: &Tensor) -> Result<Tensor> {
        self.layer.forward(input)
    }
}

fn main() {
    let device = Device::Cpu;
    let dtype = DType::F32;

    // Initialize the base (non-LoRA) weight.
    let map = VarMap::new();
    let layer_weight = map
        .get(
            (10, 10),
            "layer.weight",
            init::DEFAULT_KAIMING_NORMAL,
            dtype,
            &device,
        )
        .unwrap();

    let mut model = Model {
        layer: Box::new(Linear::new(layer_weight.clone(), None)),
    };

    // The LoRA parameters will be created through this VarBuilder.
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, dtype, &device);

    // rank = 1, alpha = 1.0, no dropout.
    let loraconfig = LoraConfig::new(1, 1., None);

    // Swap the plain `Linear` layer for a LoRA layer in place.
    model.get_lora_model(
        loraconfig,
        &vb,
        Some(LoraLinearConfig::new(10, 10)),
        None,
        None,
        None,
    );

    let dummy_input = Tensor::zeros((10, 10), DType::F32, &device).unwrap();
    let output = model.forward(&dummy_input).unwrap();
    println!("Output: {output:?}");
}
```
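Note that `get_lora_model` registers the new LoRA parameters in `varmap` (via `vb`), so for fine-tuning you would presumably hand `varmap.all_vars()` to an optimizer such as `candle_nn::SGD`, while the base weight held in `map` stays frozen.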
## Resources
`candle-lora`'s LoRA conversion implementations are based on HuggingFace's `peft` library. See the original paper here, as well as Microsoft's implementation.