Low rank adaptation (LoRA) for Candle.

Overview

candle-lora

MIT License Continuous integration

LoRA (low-rank adaptation) implemented in Rust for use with Candle. This technique replaces the fully trainable layers of a model with new LoRA layers. Each LoRA layer wraps an original layer and freezes its weights, training only a small pair of low-rank matrices instead. Because these matrices contain far fewer trainable parameters, LoRA allows for more efficient fine-tuning.
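To make the parameter saving concrete, here is a minimal sketch that counts trainable parameters for one dense layer versus a rank-r LoRA update. It uses only the standard library; the function names and the 4096 x 4096 layer size are illustrative assumptions, not part of candle-lora.

```rust
/// Trainable parameters of a dense `d_out x d_in` weight matrix.
fn full_params(d_in: usize, d_out: usize) -> usize {
    d_in * d_out
}

/// Trainable parameters of a rank-`rank` LoRA update:
/// A has shape `rank x d_in` and B has shape `d_out x rank`.
fn lora_params(d_in: usize, d_out: usize, rank: usize) -> usize {
    rank * d_in + d_out * rank
}

fn main() {
    // Illustrative sizes only (assumption): a square 4096-wide layer, rank 8.
    let (d_in, d_out, rank) = (4096, 4096, 8);
    let full = full_params(d_in, d_out);
    let lora = lora_params(d_in, d_out, rank);
    println!(
        "full: {full}, lora: {lora}, ratio: {:.4}",
        lora as f64 / full as f64
    );
}
```

For these sizes the LoRA update trains 65,536 parameters against 16,777,216 for the full matrix, which is 1/256 of the original count.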

However, using a fine-tuned LoRA model for inference adds overhead, because the original layer must still be evaluated alongside the LoRA update to compute the outputs. Weight merging removes this added cost by folding the LoRA weights into the original weights, so that only a single layer is evaluated. Merged weights may also be unmerged.
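The idea behind weight merging can be sketched with plain matrices. Following the LoRA paper, the update is scale * B * A with scale = alpha / rank, so merging computes W' = W + scale * B * A and unmerging subtracts the same term. The code below is a standard-library sketch under that assumption; the `Matrix` alias and all function names are hypothetical and do not reflect candle-lora's internals.

```rust
type Matrix = Vec<Vec<f64>>;

/// Plain matrix product of an `n x k` matrix and a `k x m` matrix.
fn matmul(a: &Matrix, b: &Matrix) -> Matrix {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut out = vec![vec![0.0; m]; n];
    for i in 0..n {
        for j in 0..m {
            for p in 0..k {
                out[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    out
}

/// Element-wise `w + factor * delta` for two matrices of equal shape.
fn add_scaled(w: &Matrix, delta: &Matrix, factor: f64) -> Matrix {
    let mut out = w.clone();
    for i in 0..w.len() {
        for j in 0..w[i].len() {
            out[i][j] += factor * delta[i][j];
        }
    }
    out
}

/// Merge: fold the low-rank update into the frozen weight.
fn merge(w: &Matrix, a: &Matrix, b: &Matrix, scale: f64) -> Matrix {
    add_scaled(w, &matmul(b, a), scale)
}

/// Unmerge: subtract the same update to recover the original weight.
fn unmerge(w_merged: &Matrix, a: &Matrix, b: &Matrix, scale: f64) -> Matrix {
    add_scaled(w_merged, &matmul(b, a), -scale)
}

/// Unmerged forward pass: the base layer plus the separate LoRA path.
fn forward_unmerged(w: &Matrix, a: &Matrix, b: &Matrix, scale: f64, x: &Matrix) -> Matrix {
    let base = matmul(w, x);
    let update = matmul(b, &matmul(a, x));
    add_scaled(&base, &update, scale)
}

fn main() {
    let w = vec![vec![1.0, 2.0], vec![3.0, 4.0]]; // frozen 2 x 2 weight
    let a = vec![vec![1.0, -1.0]]; // rank-1 A, shape 1 x 2
    let b = vec![vec![2.0], vec![0.5]]; // rank-1 B, shape 2 x 1
    let x = vec![vec![1.0], vec![2.0]]; // input column vector
    let scale = 2.0; // alpha / rank, here 2.0 / 1

    let merged = merge(&w, &a, &b, scale);
    println!("unmerged forward: {:?}", forward_unmerged(&w, &a, &b, scale, &x));
    println!("merged forward:   {:?}", matmul(&merged, &x));
    println!("unmerged weight:  {:?}", unmerge(&merged, &a, &b, scale));
}
```

The merged path needs one matrix product per forward pass instead of three, which is where the inference saving comes from. With general floating-point values the two paths agree only up to rounding, and unmerging recovers the original weight only approximately.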

Features

  • Convert Linear, Conv1d, Conv2d, Embedding layers into LoRA layers
    • All conversions are implemented in accordance with HuggingFace's official LoRA implementation
  • Weight merging is implemented to improve inference performance
  • Weight unmerging
  • Easy-to-use APIs
  • Extensible trait-based layer swapping mechanism
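As a rough illustration of what a trait-based layer swapping mechanism can look like, the sketch below stores a layer behind a trait object and replaces it in place with a wrapper that adds a LoRA-style update. Everything here (`LayerLike`, `Scale`, `LoraWrapped`, `to_lora`) is a hypothetical, scalar-valued stand-in written for this example; it is not candle-lora's actual trait or API.

```rust
/// Hypothetical layer abstraction (assumption, not candle-lora's trait).
trait LayerLike {
    fn forward(&self, x: f64) -> f64;
}

/// A toy "dense" layer that multiplies its input by one frozen weight.
struct Scale {
    weight: f64,
}

impl LayerLike for Scale {
    fn forward(&self, x: f64) -> f64 {
        self.weight * x
    }
}

/// Wrapper that keeps the original layer and adds `scale * b * a * x`.
struct LoraWrapped {
    base: Box<dyn LayerLike>,
    a: f64,
    b: f64,
    scale: f64,
}

impl LayerLike for LoraWrapped {
    fn forward(&self, x: f64) -> f64 {
        self.base.forward(x) + self.scale * self.b * self.a * x
    }
}

/// The model only knows about the trait, so the concrete layer can change.
struct Model {
    layer: Box<dyn LayerLike>,
}

impl Model {
    /// Swap the current layer for a LoRA wrapper around it.
    fn to_lora(&mut self, a: f64, b: f64, scale: f64) {
        // Park a placeholder so the original layer can be moved into the wrapper.
        let base = std::mem::replace(&mut self.layer, Box::new(Scale { weight: 0.0 }));
        self.layer = Box::new(LoraWrapped { base, a, b, scale });
    }
}

fn main() {
    let mut model = Model {
        layer: Box::new(Scale { weight: 3.0 }),
    };
    println!("before swap: {}", model.layer.forward(2.0));

    // With b = 0.0 (the usual LoRA initialization of B) the output is unchanged.
    model.to_lora(1.0, 0.0, 1.0);
    println!("after swap:  {}", model.layer.forward(2.0));
}
```

Because the model's `forward` path only depends on the trait, supporting another layer kind in this style means implementing the trait for a new wrapper rather than editing the model.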

candle-lora-macro

This library makes using candle-lora as simple as adding 2 macros to your model structs and calling a method! It is inspired by the simplicity of the Python peft library's get_peft_model method. Together, these macros mean that candle-lora can be added to any Candle model with minimal code changes! To see an example of the benefits, compare the example below (or here) to this equivalent example. See a precise diff here.

How to use

  1. Derive AutoLoraConvert from candle-lora-macro on each model struct and add the replace_layer_fields attribute macro.
  2. Call get_lora_model on each model struct.
  3. Enjoy your new LoRA model!

Examples

See a training example with Llama + LoRA here.

use candle_core::{DType, Device, Module, Result, Tensor};
use candle_lora::{LinearLayerLike, LoraConfig, LoraLinearConfig};
use candle_lora_macro::{replace_layer_fields, AutoLoraConvert};
use candle_nn::{init, Linear, VarBuilder, VarMap};

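// `replace_layer_fields` rewrites supported layer fields (here `Linear`) into boxed
// trait objects, which is why `main` wraps the layer in `Box::new`.
#[replace_layer_fields]
#[derive(AutoLoraConvert, Debug)]
struct Model {
    layer: Linear,
}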

impl Module for Model {
    fn forward(&self, input: &Tensor) -> Result<Tensor> {
        self.layer.forward(input)
    }
}

fn main() {
    let device = Device::Cpu;
    let dtype = DType::F32;

    // Create the weight of the original 10 x 10 linear layer.
    let map = VarMap::new();
    let layer_weight = map
        .get(
            (10, 10),
            "layer.weight",
            init::DEFAULT_KAIMING_NORMAL,
            dtype,
            &device,
        )
        .unwrap();

    let mut model = Model {
        layer: Box::new(Linear::new(layer_weight.clone(), None)),
    };

    // A separate VarMap holds the new LoRA weights.
    let varmap = VarMap::new();
    let vb = VarBuilder::from_varmap(&varmap, dtype, &device);

    // Rank 1, alpha 1.0, no dropout.
    let loraconfig = LoraConfig::new(1, 1., None);
    // Convert the model's layers in place. Only a linear config is given;
    // the three `None`s are the Conv1d, Conv2d and Embedding configs.
    model.get_lora_model(
        loraconfig,
        &vb,
        Some(LoraLinearConfig::new(10, 10)),
        None,
        None,
        None,
    );

    let dummy_image = Tensor::zeros((10, 10), DType::F32, &device).unwrap();

    // Run a forward pass through the LoRA model.
    let digit = model.forward(&dummy_image).unwrap();
    println!("Output: {digit:?}");
}

Resources

candle-lora's LoRA conversion implementations are based on HuggingFace's peft library. See the original paper here, as well as Microsoft's implementation.
