A Rust interface for the OpenAI API and the llama.cpp ./server API
- A unified API for testing and integrating OpenAI and Hugging Face LLM models.
- Load models from Hugging Face with just a URL.
- Uses the llama.cpp server API rather than bindings, so this project remains usable as long as that server API stays stable.
- Prebuilt agents - not chatbots - to unlock the true power of LLMs.
Easily switch between models and APIs
// Use an OpenAI model
let llm_definition = LlmDefinition::OpenAiLlm(OpenAiLlmModels::Gpt35Turbo);

// Or use a model from Hugging Face
// `url` is a Hugging Face link to a GGUF model file (see Getting Started below)
let zephyr_7b_chat = LlamaLlmModel::new(
    url,
    LlamaPromptFormat::Mistral7BChat,
    Some(2000), // Max tokens for the model, AKA context size
);

let response = basic_text_gen::generate(
    &LlmDefinition::LlamaLlm(zephyr_7b_chat),
    Some("Howdy!"),
)
.await?;
eprintln!("{}", response);
Get deterministic responses from LLMs
// `hopefully_a_list` holds text that a previous LLM call should have split into a list
if !boolean_classifier::classify(
    llm_definition,
    Some(hopefully_a_list),
    Some("Is the attached feature a list of content split into discrete entries?"),
)
.await?
{
    panic!("{} was not properly split into a list!", hopefully_a_list)
}
Dependencies
async-openai is used to interact with the OpenAI API, and a modified version of the async-openai crate is used for the llama.cpp server. If you just need an OpenAI API interface, I suggest using the async-openai crate directly.
Hugging Face's Rust client is used for model downloads from the Hugging Face hub.
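For reference, a standalone download from the Hugging Face hub looks roughly like this, assuming the Rust client mentioned above is the hf-hub crate (its sync API may require the crate's default features); this is a sketch of the dependency itself, not necessarily the exact code path llm_client takes.

```rust
// Standalone sketch: fetch a GGUF file from the Hugging Face hub with hf-hub.
// The repo and file names are the ones used in Getting Started below.
use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = Api::new()?;
    let repo = api.model("TheBloke/Mistral-7B-Instruct-v0.2-GGUF".to_string());
    // Downloads into the local Hugging Face cache (or reuses it) and returns the path.
    let model_path = repo.get("mistral-7b-instruct-v0.2.Q8_0.gguf")?;
    println!("model downloaded to {}", model_path.display());
    Ok(())
}
```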
Getting Started
Step-by-step guide
- Clone repo:
git clone https://github.com/ShelbyJenkins/llm_client.git
cd llm_client
- Optional: Build the dev container from llm_client/.devcontainer/devcontainer.json. This will build a dev container with NVIDIA dependencies installed.
- Add llama.cpp:
git submodule init
git submodule update
- Build llama.cpp (this depends on your hardware; see the full build instructions in the llama.cpp repository):
# Example build for NVIDIA GPUs
cd llm_client/src/providers/llama_cpp/llama_cpp
make LLAMA_CUBLAS=1
- Test llama.cpp ./server
cargo run -p llm_client --bin server_runner start --model_url "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/blob/main/mistral-7b-instruct-v0.2.Q8_0.gguf"
This will download and load the given model, and then start the server. When you see "llama server listening at http://localhost:8080", you can load the llama.cpp UI in your browser. (For a way to query the server directly over HTTP, see the first sketch after this list.) Stop the server with:
cargo run -p llm_client --bin server_runner stop
- Using OpenAI: Add a .env file in the llm_client directory containing the variable OPENAI_API_KEY=<key>. (A sketch for checking that the key is picked up follows this list.)
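Once the server reports it is listening, you can also query llama.cpp's /completion endpoint directly, the same server API this crate builds on. A minimal sketch using reqwest (with its json feature), tokio, and serde_json, none of which you need for normal llm_client usage:

```rust
// Standalone sketch: poke the running llama.cpp server directly over HTTP.
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "prompt": "Hello, my name is",
        "n_predict": 32
    });
    let response: serde_json::Value = reqwest::Client::new()
        .post("http://localhost:8080/completion")
        .json(&body)
        .send()
        .await?
        .json()
        .await?;
    // llama.cpp returns the generated text in the "content" field.
    println!("{}", response["content"]);
    Ok(())
}
```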
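To double-check that your .env file is being picked up, something like the following works; the dotenvy crate here is an assumption for the sketch, not necessarily what llm_client itself uses to load the key.

```rust
// Standalone sketch: confirm OPENAI_API_KEY is reachable from the environment.
fn main() {
    // Load variables from a .env file in the current directory, if present.
    dotenvy::dotenv().ok();

    match std::env::var("OPENAI_API_KEY") {
        Ok(key) => println!("OPENAI_API_KEY loaded ({} characters)", key.len()),
        Err(_) => eprintln!("OPENAI_API_KEY is not set; check your .env file"),
    }
}
```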
Examples
Roadmap
- Automate starting the llama.cpp server with a specified model
- Handle the various prompt formats of LLM models more gracefully
- Unit tests
- Add additional classifier agents:
  - many from many
  - one from many
- Implement all OpenAI functionality with llama.cpp
- More external APIs (Claude, etc.)
Contributing
This is my first Rust crate. All contributions and feedback are more than welcome!
License
Distributed under the MIT License. See LICENSE.txt for more information.
Contact
Shelby Jenkins - Here or LinkedIn