llama_cpp-rs

High-level, optionally asynchronous Rust bindings to llama.cpp

Overview

Safe, high-level Rust bindings to the C++ project of the same name, meant to be as user-friendly as possible. Run GGUF-based large language models directly on your CPU in fifteen lines of code, no ML experience required!

use llama_cpp::LlamaModel;

// Create a model from anything that implements `AsRef<Path>`:
let model = LlamaModel::load_from_file("path_to_model.gguf").expect("Could not load model");

// A `LlamaModel` holds the weights shared across many _sessions_; while your model may be
// several gigabytes in size, a session is typically a few dozen to a hundred megabytes!
let mut ctx = model.create_session();

// You can feed anything that implements `AsRef<[u8]>` into the model's context.
ctx.advance_context("This is the story of a man named Stanley.").unwrap();

// LLMs are typically used to predict the next word in a sequence. Let's generate some tokens!
let max_tokens = 1024;
let mut decoded_tokens = 0;

// `ctx.get_completions` creates a worker thread that generates tokens. When the completion
// handle is dropped, tokens stop generating!
let mut completions = ctx.get_completions();

while let Some(next_token) = completions.next_token() {
    println!("{}", String::from_utf8_lossy(&*next_token.detokenize()));
    decoded_tokens += 1;
    // `>=` so generation stops after exactly `max_tokens` tokens.
    if decoded_tokens >= max_tokens {
        break;
    }
}
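
Prefer the asynchronous side? The v0.1.1 release notes below add `LlamaModel::load_from_file_async`, and later releases add more async function variants. Here is a minimal sketch of the async loading path, assuming a tokio runtime; the session calls mirror the synchronous example above, and the exact async completion API is not shown here.

use llama_cpp::LlamaModel;

#[tokio::main]
async fn main() {
    // `load_from_file_async` is listed under v0.1.1's new features below.
    let model = LlamaModel::load_from_file_async("path_to_model.gguf")
        .await
        .expect("Could not load model");

    // Sessions work the same as in the synchronous example.
    let mut ctx = model.create_session();
    ctx.advance_context("This is the story of a man named Stanley.").unwrap();
}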

This repository hosts the high-level bindings (crates/llama_cpp) as well as automatically generated bindings to llama.cpp's low-level C API (crates/llama_cpp_sys). Contributions are welcome; just keep the UX clean!

License

MIT or Apache-2.0, at your option (the "Rust" license). See LICENSE-MIT and LICENSE-APACHE.

Releases
  • llama_cpp_sys-v0.2.2(Nov 8, 2023)

    Bug Fixes

    • do not rerun build on changed header files: this restores the previous behavior, which the latest upgrade to bindgen lost by enabling reruns whenever a header changes (see the sketch just below)
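
    A hedged sketch of what this fix can look like in a build.rs, assuming bindgen 0.69's `CargoCallbacks` API; the header path and output location are illustrative, not the crate's actual layout:

    // build.rs
    fn main() {
        let bindings = bindgen::Builder::default()
            .header("thirdparty/llama.cpp/llama.h")
            // `CargoCallbacks::new()` emits `cargo:rerun-if-changed` for every
            // header bindgen parses; turning that off restores the old behavior.
            .parse_callbacks(Box::new(
                bindgen::CargoCallbacks::new().rerun_on_header_files(false),
            ))
            .generate()
            .expect("failed to generate bindings");

        let out = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
        bindings
            .write_to_file(out.join("bindings.rs"))
            .expect("failed to write bindings");
    }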

    Commit Statistics

    • 2 commits contributed to the release.
    • 1 commit was understood as conventional.
    • 0 issues like '(#ID)' were seen in commit messages.

    Commit Details

    • Uncategorized
      • Do not rerun build on changed header files (674f395)
      • Release llama_cpp_sys v0.2.1, llama_cpp v0.1.1 (a9e5813)
  • llama_cpp_sys-v0.2.1(Nov 8, 2023)

    Chore

    • Update to bindgen 0.69.1

    Bug Fixes

    • start_completing should not be invoked on a per-iteration basis. There's still some UB that can be triggered due to llama.cpp's threading model, which needs patching up.

    Commit Statistics

    • 2 commits contributed to the release.
    • 13 days passed between releases.
    • 2 commits were understood as conventional.
    • 0 issues like '(#ID)' were seen in commit messages.

    Commit Details

    • Uncategorized
      • Update to bindgen 0.69.1 (ccb794d)
      • start_completing should not be invoked on a per-iteration basis (4eb0bc9)
  • llama_cpp-v0.1.3(Nov 8, 2023)

    New Features

    • more async function variants
    • add LlamaSession.model

    Other

    • typo

    Commit Statistics

    • 5 commits contributed to the release.
    • 3 commits were understood as conventional.
    • 0 issues like '(#ID)' were seen in commit messages.

    Commit Details

    • Uncategorized
      • Typo (0a0d5f3)
      • Release llama_cpp v0.1.2 (4d0b130)
      • More async function variants (1019402)
      • Add LlamaSession.model (c190df6)
      • Release llama_cpp_sys v0.2.1, llama_cpp v0.1.1 (a9e5813)
  • llama_cpp-v0.1.2(Nov 8, 2023)

    New Features

    • more async function variants
    • add LlamaSession.model

    Commit Statistics

    • 2 commits contributed to the release.
    • 2 commits were understood as conventional.
    • 0 issues like '(#ID)' were seen in commit messages.

    Commit Details

    • Uncategorized
      • More async function variants (dcfccdf)
      • Add LlamaSession.model (56285a1)
  • llama_cpp-v0.1.1(Nov 8, 2023)

    Chore

    • Remove debug binary from Cargo.toml

    New Features

    • add LlamaModel::load_from_file_async

    Bug Fixes

    • require that llama_context is accessed from behind a mutex: this solves a race condition when several get_completions threads are spawned at the same time (see the sketch after this list)
    • start_completing should not be invoked on a per-iteration basis. There's still some UB that can be triggered due to llama.cpp's threading model, which needs patching up.
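
    A hedged illustration of that locking pattern; the type and method names below are simplified stand-ins rather than the crate's actual internals:

    use std::sync::{Arc, Mutex};

    // Hypothetical stand-in for the raw llama.cpp context handle that
    // llama_cpp_sys exposes.
    struct LlamaContextInner;

    #[derive(Clone)]
    struct SessionContext {
        inner: Arc<Mutex<LlamaContextInner>>,
    }

    impl SessionContext {
        // Every completion worker goes through the lock, so threads spawned
        // at the same time serialize their access to the raw context instead
        // of racing on it.
        fn with_context<R>(&self, f: impl FnOnce(&mut LlamaContextInner) -> R) -> R {
            let mut guard = self.inner.lock().expect("context mutex poisoned");
            f(&mut guard)
        }
    }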

    Commit Statistics

    • 5 commits contributed to the release.
    • 13 days passed between releases.
    • 4 commits were understood as conventional.
    • 0 issues like '(#ID)' were seen in commit messages.

    Commit Details

    • Uncategorized
      • Add LlamaModel::load_from_file_async (3bada65)
      • Remove debug binary from Cargo.toml (3eddbab)
      • Require llama_context is accessed from behind a mutex (b676baa)
      • start_completing should not be invoked on a per-iteration basis (4eb0bc9)
      • Update to llama.cpp 0a7c980 (94d7385)
  • llama_cpp_sys-v0.2.0(Oct 25, 2023)

    Chore

    • Release
    • latest fixes from upstream

    Bug Fixes

    • set clang to use c++ stl
    • use SPDX license identifiers

    Other

    • use link-cplusplus, enable build+test on all branches (see the note after this list)
      • ci: disable static linking of llama.o
      • ci: build+test on all branches/prs
      • ci: use link-cplusplus
    • configure for cargo-release
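
    For context, wiring in link-cplusplus usually amounts to a dependency line plus a single crate reference; a hedged sketch, not necessarily this crate's exact setup:

    // In the -sys crate's Cargo.toml (version illustrative):
    //     [dependencies]
    //     link-cplusplus = "1"
    //
    // Then reference the crate once, e.g. in lib.rs, so the C++ standard
    // library is pulled in at link time:
    extern crate link_cplusplus;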

    Commit Statistics

    • 10 commits contributed to the release over the course of 5 calendar days.
    • 6 commits were understood as conventional.
    • 3 unique issues were worked on: #1, #2, #3

    Commit Details

    • #1
      • Use link-cplusplus, enable build+test on all branches (2d14d8d)
    • #2
      • Prepare for publishing to crates.io (f35e282)
    • #3
      • Release (116fe8c)
    • Uncategorized
      • Use SPDX license identifiers (2cb06ae)
      • Release llama_cpp_sys v0.2.0 (85f21a1)
      • Add CHANGELOG.md (0e836f5)
      • Set clang to use c++ stl (b9cde4a)
      • Latest fixes from upstream (96548c8)
      • Configure for cargo-release (a5fb194)
      • Initial commit (6f672ff)
  • llama_cpp-v0.1.0(Oct 25, 2023)

    Chore

    • remove include from llama_cpp
    • Release
    • latest fixes from upstream
    • add CHANGELOG.md

    Bug Fixes

    • use SPDX license identifiers

    Other

    • configure for cargo-release

    Commit Statistics

    • 8 commits contributed to the release over the course of 5 calendar days.
    • 6 commits were understood as conventional.
    • 1 unique issue was worked on: #3

    Commit Details

    • #3
      • Release (116fe8c)
    • Uncategorized
      • Add CHANGELOG.md (aa5eed4)
      • Remove include from llama_cpp (702a6ff)
      • Use SPDX license identifiers (2cb06ae)
      • Release llama_cpp_sys v0.2.0 (d1868ac)
      • Latest fixes from upstream (96548c8)
      • Configure for cargo-release (a5fb194)
      • Initial commit (6f672ff)
Owner

Binedge.ai: Enabling GenAI on the Edge