A Rust interface for the OpenAI API and the llama.cpp ./server API
- A unified API for testing and integrating OpenAI and Hugging Face LLM models.
- Load models from Hugging Face with just a URL.
- Uses the llama.cpp server API rather than bindings, so this project remains usable as long as that server API stays stable.
- Prebuilt agents - not chatbots - to unlock the true power of LLMs.
Easily switch between models and APIs
// Use an OpenAI model
let llm_definition = LlmDefinition::OpenAiLlm(OpenAiLlmModels::Gpt35Turbo);

// Or use a model from Hugging Face
// `url` is a Hugging Face link to a GGUF model file (see Getting Started below)
let zephyr_7b_chat = LlamaLlmModel::new(
    url,
    LlamaPromptFormat::Mistral7BChat,
    Some(2000), // Max tokens for the model, AKA context size
);

let response = basic_text_gen::generate(
    &LlmDefinition::LlamaLlm(zephyr_7b_chat),
    Some("Howdy!"),
)
.await?;
eprintln!("{}", response);
Get deterministic responses from LLMs
// `hopefully_a_list` holds text that a previous LLM call should have split into a list
if !boolean_classifier::classify(
    llm_definition,
    Some(hopefully_a_list),
    Some("Is the attached feature a list of content split into discrete entries?"),
)
.await?
{
    panic!("{} was not properly split into a list!", hopefully_a_list)
}
Dependencies
async-openai is used to interact with the OpenAI API, and a modified version of the async-openai crate is used for the llama.cpp server. If you just need an OpenAI API interface, I suggest using the async-openai crate directly.
Hugging Face's Rust client is used for model downloads from the Hugging Face hub.
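For reference, a standalone download from the Hugging Face hub looks roughly like this, assuming the Rust client mentioned above is the hf-hub crate (its sync API may require the crate's default features); this is a sketch of the dependency itself, not necessarily the exact code path llm_client takes.

```rust
// Standalone sketch: fetch a GGUF file from the Hugging Face hub with hf-hub.
// The repo and file names are the ones used in Getting Started below.
use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = Api::new()?;
    let repo = api.model("TheBloke/Mistral-7B-Instruct-v0.2-GGUF".to_string());
    // Downloads into the local Hugging Face cache (or reuses it) and returns the path.
    let model_path = repo.get("mistral-7b-instruct-v0.2.Q8_0.gguf")?;
    println!("model downloaded to {}", model_path.display());
    Ok(())
}
```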
Getting Started
Step-by-step guide
- Clone repo:
git clone https://github.com/ShelbyJenkins/llm_client.git
cd llm_client
- Optional: Build the dev container from llm_client/.devcontainer/devcontainer.json. This will build a dev container with NVIDIA dependencies installed.
- Add llama.cpp:
git submodule init
git submodule update
- Build llama.cpp (this depends on your hardware; see the full build instructions in the llama.cpp repository):
# Example build for NVIDIA GPUs
cd llm_client/src/providers/llama_cpp/llama_cpp
make LLAMA_CUBLAS=1
- Test llama.cpp ./server
cargo run -p llm_client --bin server_runner start --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"
This will download and load the given model, and then start the server. When you see "llama server listening at http://localhost:8080", you can load the llama.cpp UI in your browser. (For a way to query the server directly over HTTP, see the first sketch after this list.) Stop the server with:
cargo run -p llm_client --bin server_runner stop
- Using OpenAI: Add a .env file in the llm_client directory containing the variable OPENAI_API_KEY=<key>. (A sketch for checking that the key is picked up follows this list.)
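Once the server reports it is listening, you can also query llama.cpp's /completion endpoint directly, the same server API this crate builds on. A minimal sketch using reqwest (with its json feature), tokio, and serde_json, none of which you need for normal llm_client usage:

```rust
// Standalone sketch: poke the running llama.cpp server directly over HTTP.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "prompt": "Hello, my name is",
        "n_predict": 32
    });
    let response: serde_json::Value = reqwest::Client::new()
        .post("http://localhost:8080/completion")
        .json(&body)
        .send()
        .await?
        .json()
        .await?;
    // llama.cpp returns the generated text in the "content" field.
    println!("{}", response["content"]);
    Ok(())
}
```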
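To double-check that your .env file is being picked up, something like the following works; the dotenvy crate here is an assumption for the sketch, not necessarily what llm_client itself uses to load the key.

```rust
// Standalone sketch: confirm OPENAI_API_KEY is reachable from the environment.
fn main() {
    // Load variables from a .env file in the current directory, if present.
    dotenvy::dotenv().ok();

    match std::env::var("OPENAI_API_KEY") {
        Ok(key) => println!("OPENAI_API_KEY loaded ({} characters)", key.len()),
        Err(_) => eprintln!("OPENAI_API_KEY is not set; check your .env file"),
    }
}
```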
Examples
Roadmap
- Automate starting the llama.cpp server with a specified model
- Handle the various prompt formats of LLM models more gracefully
- Unit tests
- Add additional classifier agents:
  - many from many
  - one from many
- Implement all OpenAI functionality with llama.cpp
- More external APIs (Claude, etc.)
Contributing
This is my first Rust crate. All contributions and feedback are more than welcome!
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
Shelby Jenkins - Here or LinkedIn