# candle-lora
LoRA (low-rank adaptation) implemented in Rust for use with Candle. This technique replaces a model's fully trainable layers with new LoRA layers. These LoRA layers act as wrappers over the original layers, which they freeze; because the wrappers contain far fewer trainable parameters, LoRA allows for more efficient fine-tuning.

Using a fine-tuned LoRA model for inference naively has a performance cost, because both the original layer and the low-rank layers must be evaluated to produce each output. However, an algorithm known as weight merging nullifies this added cost by merging the LoRA weights into the original weights. Merged weights may also be unmerged.
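To illustrate the idea, here is a minimal sketch of the merge and unmerge math (not candle-lora's internal API; the function names are hypothetical), assuming low-rank factors `a` of shape `(rank, in)` and `b` of shape `(out, rank)` with the usual LoRA scale `alpha / rank`:

```rust
use candle_core::{Result, Tensor};

// Sketch: fold the LoRA update into the base weight.
// W' = W + scale * (B A): one matmul at merge time, zero extra cost at
// inference time, since W' is a drop-in replacement for W.
fn merge(w: &Tensor, a: &Tensor, b: &Tensor, scale: f64) -> Result<Tensor> {
    w.add(&b.matmul(a)?.affine(scale, 0.0)?)
}

// Sketch: undo the merge to recover the frozen base weight.
// W = W' - scale * (B A).
fn unmerge(w_merged: &Tensor, a: &Tensor, b: &Tensor, scale: f64) -> Result<Tensor> {
    w_merged.sub(&b.matmul(a)?.affine(scale, 0.0)?)
}
```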
## Features
- Convert `Linear`, `Conv1d`, `Conv2d`, and `Embedding` layers into LoRA layers
- All conversions are implemented in accordance with HuggingFace's official LoRA implementation
- Weight merging is implemented to improve inference performance
- Weight unmerging
- Easy-to-use APIs
- Extensible trait-based layer swapping mechanism, sketched below
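The swapping mechanism works because converted model fields hold trait objects rather than concrete layers. The following is an illustration of the pattern only (the trait `LayerLike` is hypothetical; candle-lora's real traits, such as `LinearLayerLike`, carry additional methods):

```rust
use candle_core::{Module, Result, Tensor};

// Hypothetical marker trait standing in for candle-lora's layer traits.
trait LayerLike: Module {}

// Because the field is a trait object, any conforming layer (plain, LoRA,
// merged, ...) can be swapped in without touching the model definition.
struct SwappableModel {
    layer: Box<dyn LayerLike>,
}

impl Module for SwappableModel {
    fn forward(&self, x: &Tensor) -> Result<Tensor> {
        self.layer.forward(x)
    }
}
```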
## candle-lora-macro
This library makes using `candle-lora` as simple as adding two macros to your model structs and calling a method! It is inspired by the simplicity of the Python `peft` library's `get_peft_model` method. Together, these macros mean that `candle-lora` can be added to any `candle` model with minimal code changes! To see the benefits, compare the example below (or here) to this equivalent example. See a precise diff here.
## How to use
- Derive `AutoLoraConvert` from `candle-lora-macro` on each model struct and add the `replace_layer_fields` attribute macro.
- Call `get_lora_model` on each model struct.
- Enjoy your new LoRA model!
## Examples
See a training example with Llama + LoRA here.
```rust
use candle_core::{DType, Device, Module, Result, Tensor};
use candle_lora::{LinearLayerLike, LoraConfig, LoraLinearConfig};
use candle_lora_macro::{replace_layer_fields, AutoLoraConvert};
use candle_nn::{init, Linear, VarBuilder, VarMap};

// `replace_layer_fields` rewrites the `Linear` field into a
// `Box<dyn LinearLayerLike>`, and `AutoLoraConvert` derives the
// `get_lora_model` method used below.
#[replace_layer_fields]
#[derive(AutoLoraConvert, Debug)]
struct Model {
    layer: Linear,
}

impl Module for Model {
    fn forward(&self, input: &Tensor) -> Result<Tensor> {
        self.layer.forward(input)
    }
}

fn main() {
    let device = Device::Cpu;
    let dtype = DType::F32;

    // Initialize the base (non-LoRA) weight.
    let map = VarMap::new();
    let layer_weight = map
        .get(
            (10, 10),
            "layer.weight",
            init::DEFAULT_KAIMING_NORMAL,
            dtype,
            &device,
        )
        .unwrap();

    let mut model = Model {
        layer: Box::new(Linear::new(layer_weight.clone(), None)),
    };

    // The LoRA parameters will be created through this VarBuilder.
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, dtype, &device);

    // rank = 1, alpha = 1.0, no dropout.
    let loraconfig = LoraConfig::new(1, 1., None);

    // Swap the plain `Linear` layer for a LoRA layer in place.
    model.get_lora_model(
        loraconfig,
        &vb,
        Some(LoraLinearConfig::new(10, 10)),
        None,
        None,
        None,
    );

    let dummy_input = Tensor::zeros((10, 10), DType::F32, &device).unwrap();
    let output = model.forward(&dummy_input).unwrap();
    println!("Output: {output:?}");
}
```
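Note that `get_lora_model` registers the new LoRA parameters in `varmap` (via `vb`), so for fine-tuning you would presumably hand `varmap.all_vars()` to an optimizer such as `candle_nn::SGD`, while the base weight held in `map` stays frozen.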
## Resources
`candle-lora`'s LoRA conversion implementations are based on HuggingFace's `peft` library. See the original paper here, as well as Microsoft's implementation.