# llama_cpp-rs
Safe, high-level Rust bindings to the C++ project [of the same name](https://github.com/ggerganov/llama.cpp), meant to be as user-friendly as possible. Run GGUF-based large language models directly on your CPU in fifteen lines of code, no ML experience required!
```rust
use llama_cpp::LlamaModel;

// Create a model from anything that implements `AsRef<Path>`:
let model = LlamaModel::load_from_file("path_to_model.gguf").expect("Could not load model");

// A `LlamaModel` holds the weights shared across many _sessions_; while your model may be
// several gigabytes large, a session is typically a few dozen to a hundred megabytes!
let mut ctx = model.create_session();

// You can feed anything that implements `AsRef<[u8]>` into the model's context.
ctx.advance_context("This is the story of a man named Stanley.").unwrap();

// LLMs are typically used to predict the next word in a sequence. Let's generate some tokens!
let max_tokens = 1024;
let mut decoded_tokens = 0;

// `ctx.get_completions` creates a worker thread that generates tokens. When the completion
// handle is dropped, tokens stop generating!
let mut completions = ctx.get_completions();

while let Some(next_token) = completions.next_token() {
    println!("{}", String::from_utf8_lossy(&*next_token.detokenize()));

    decoded_tokens += 1;
    if decoded_tokens >= max_tokens {
        break;
    }
}
```
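Because generation stops as soon as the completion handle is dropped, you don't have to count tokens by hand; an ordinary scope (or an early `return`) is enough to cancel a completion. Here's a minimal sketch reusing only the calls from the example above, with an illustrative cap of 16 tokens:

```rust
{
    let mut completions = ctx.get_completions();

    for _ in 0..16 {
        let Some(token) = completions.next_token() else { break };
        println!("{}", String::from_utf8_lossy(&*token.detokenize()));
    }
} // `completions` is dropped here, so the worker thread stops generating.
```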
This repository hosts the high-level bindings ([`crates/llama_cpp`](crates/llama_cpp)) as well as automatically generated bindings to llama.cpp's low-level C API ([`crates/llama_cpp_sys`](crates/llama_cpp_sys)). Contributions are welcome; just keep the UX clean!
## License
MIT or Apache-2.0, at your option (the "Rust" license). See [LICENSE-MIT](LICENSE-MIT) and [LICENSE-APACHE](LICENSE-APACHE).