A Rust LLaMA project to load, serve, and extend LLMs.
- Support both GGML and HF (Hugging Face) models
- Support a standard web server for inference
- Support downloading HF models through hf-hub (see the download sketch after this list)
- Support Nvidia GPUs
- Support AMD GPUs
- Support macOS, Linux, Windows, etc.
- OpenAI-compatible API spec (see the request sketch after this list)
- Support more GPUs
- Support LPCP (Large-scale Parallel Central Processing)
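
Model downloads go through the hf-hub crate. Below is a minimal sketch of that flow using hf-hub's blocking API; the repository id and file name are placeholders for illustration, not defaults shipped with this project.

```rust
// Minimal sketch: download (or reuse a cached copy of) a single model file
// from the Hugging Face Hub via the hf-hub crate's sync API.
// The repo id and file name below are placeholders.
use hf_hub::api::sync::Api;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let api = Api::new()?;
    // Point at a model repository on the Hub (placeholder id).
    let repo = api.model("TheBloke/Llama-2-7B-GGUF".to_string());
    // Fetch one weight file; hf-hub caches it locally and returns the path.
    let path = repo.get("llama-2-7b.Q4_K_M.gguf")?;
    println!("model cached at {}", path.display());
    Ok(())
}
```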
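
Because the server speaks the OpenAI API spec, any OpenAI-style client can talk to it. The sketch below sends a chat completion request with reqwest and serde_json; the host, port, and model name are assumptions, not values documented by this project.

```rust
// Minimal sketch: send a chat completion request to an OpenAI-compatible server.
// The URL and model name are assumptions; adjust them to your deployment.
use reqwest::blocking::Client;
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = json!({
        "model": "llama-2-7b", // placeholder model name
        "messages": [{ "role": "user", "content": "Hello!" }]
    });
    let resp = Client::new()
        .post("http://localhost:8080/v1/chat/completions") // assumed address
        .json(&body)
        .send()?
        .error_for_status()?
        .text()?;
    println!("{resp}");
    Ok(())
}
```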
OpenLLaMA is licensed under the MIT License. For details, see LICENSE.
The master branch may be unstable or even broken during development. Please use releases instead of the master branch to get a stable set of binaries.