A localized open-source AI server that is better than ChatGPT.

Overview

💯AI00 RWKV Server

All Contributors

English | 中文 | 日本語


AI00 RWKV Server is an inference API server based on the RWKV model.

It supports VULKAN inference acceleration and can run on all GPUs that support VULKAN. No need for Nvidia cards!!! AMD cards and even integrated graphics can be accelerated!!!

No need for bulky pytorch, CUDA and other runtime environments, it's compact and ready to use out of the box!

Compatible with OpenAI's ChatGPT API interface.

100% open source and commercially usable, under the MIT license.

If you are looking for a fast, efficient, and easy-to-use LLM API server, then AI00 RWKV Server is your best choice. It can be used for various tasks, including chatbots, text generation, translation, and Q&A.

Join the AI00 RWKV Server community now and experience the charm of AI!

QQ Group for communication: 30920262

💥Features

  • Based on the RWKV model, it has high performance and accuracy
  • Supports VULKAN inference acceleration, you can enjoy GPU acceleration without the need for CUDA! Supports AMD cards, integrated graphics, and all GPUs that support VULKAN
  • No need for bulky pytorch, CUDA and other runtime environments, it's compact and ready to use out of the box!
  • Compatible with OpenAI's ChatGPT API interface

⭕Usages

  • Chatbots
  • Text generation
  • Translation
  • Q&A
  • Any other tasks that LLM can do

👻Other

Installation, Compilation, and Usage

📦Direct Download and Installation

  1. Directly download the latest version from Release

  2. After downloading the model, place the model in the assets/models/ path, for example, assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st

  3. Run in the command line

    $ ./ai00_rwkv_server --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st
  4. Open the browser and visit the WebUI http://127.0.0.1:65530

📜Compile from Source Code

  1. Install Rust

  2. Clone this repository

    $ git clone https://github.com/cgisky1980/ai00_rwkv_server.git $ cd ai00_rwkv_server
  3. After downloading the model, place the model in the assets/models/ path, for example, assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st

  4. Compile

    $ cargo build --release
  5. After compilation, run

    $ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st
  6. Open the browser and visit the WebUI http://127.0.0.1:65530

📝Supported Arguments

  • --model: Model path
  • --tokenizer: Tokenizer path
  • --port: Running port
  • --quant: Specify the number of quantization layers
  • --adepter: Adapter (GPU and backend) selection options

Example

The server listens on port 3000, loads the full-layer quantized (32 > 24) 0.4B model, and selects adapter 0 (to get the specific adapter number, you can first not add this parameter, and the program will enter the adapter selection page).

$ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st --port 3000 --quant 32 --adepter 0

📙Currently Available APIs

The API service starts at port 65530, and the data input and output format follow the Openai API specification.

  • /v1/models
  • /models
  • /v1/chat/completions
  • /chat/completions
  • /v1/completions
  • /completions
  • /v1/embeddings
  • /embeddings

📙WebUI Screenshots

image

image

📝TODO List

  • Support for text_completions and chat_completions
  • Support for sse push
  • Add embeddings
  • Integrate basic front-end
  • Parallel inference via batch serve
  • Support for int8 quantization
  • Support for SpQR quantization
  • Support for LoRA model
  • Hot loading and switching of LoRA model

👥Join Us

We are always looking for people interested in helping us improve the project. If you are interested in any of the following, please join us!

  • 💀Writing code
  • 💬Providing feedback
  • 🔆Proposing ideas or needs
  • 🔍Testing new features
  • ✏Translating documentation
  • 📣Promoting the project
  • 🏅Anything else that would be helpful to us

No matter your skill level, we welcome you to join us. You can join us in the following ways:

  • Join our Discord channel
  • Join our QQ group
  • Submit issues or pull requests on GitHub
  • Leave feedback on our website

We can't wait to work with you to make this project better! We hope the project is helpful to you!

Thank you to these awesome individuals who are insightful and outstanding for their support and selfless dedication to the project

顾真牛
顾真牛

📖 💻 🖋 🎨 🧑‍🏫
研究社交
研究社交

💻 💡 🤔 🚧 👀 📦
josc146
josc146

🐛 💻 🤔 🔧
l15y
l15y

🔧 🔌 💻

Stargazers over time

Stargazers over time

Comments
  • The server crashed

    The server crashed

    Hi, Since the precompiled version doesn't work as mentioned in https://github.com/cgisky1980/ai00_rwkv_server/issues/15, I compiled the source code and run it, but it crashed after I submit any text in web user interface. Here is the messages in the console:

    $ cargo run --release -- --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st 
        Finished release [optimized] target(s) in 1.53s
         Running `target/release/ai00_server --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st`
    2023-08-01T01:24:06.717Z WARN  [wgpu_core::instance] Missing downlevel flags: DownlevelFlags(SURFACE_VIEW_FORMATS)
    The underlying API or device in use does not support enough features to be a fully compliant implementation of WebGPU. A subset of the features can still be used. If you are running this program on native and not in a browser and wish to limit the features you use to the supported subset, call Adapter::downlevel_properties or Device::downlevel_properties to get a listing of the features the current platform supports.
    WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
    2023-08-01T01:24:06.768Z INFO  [ai00_server] AdapterInfo {
        name: "llvmpipe (LLVM 12.0.0, 256 bits)",
        vendor: 65541,
        device: 0,
        device_type: Cpu,
        driver: "llvmpipe",
        driver_info: "Mesa 21.2.6 (LLVM 12.0.0)",
        backend: Vulkan,
    }
    2023-08-01T01:24:08.293Z INFO  [ai00_server] ModelInfo {
        num_layers: 24,
        num_emb: 1024,
        num_vocab: 65536,
    }
    2023-08-01T01:24:08.554Z INFO  [ai00_server] server started at http://0.0.0.0:65530
    2023-08-01T01:24:21.065Z TRACE [ai00_server] Sampler {
        top_p: 0.5,
        temperature: 1.0,
        presence_penalty: 0.3,
        frequency_penalty: 0.3,
    }
    2023-08-01T01:24:21.065Z TRACE [ai00_server] state cache miss
    2023-08-01T01:24:21.065Z TRACE [ai00_server] User: 现在的时间是2023 8月 1日 星期二 早上
    
    Assistant: 好的我知道了!
    
    User: 你是谁?
    
    Assistant: Hello, I am your AI assistant. If you have any questions or instructions, please let me know!
    
    User: test
    
    Assistant:
    Segmentation fault
    
    
    opened by cahya-wirawan 3
  • 前端界面文字标准化的一些意见和建议

    前端界面文字标准化的一些意见和建议

    H311ORMG1LS1Z(TMGK3ZDNQ

    1. 建议使用标准现代汉语“愉快地聊天吧!”
    2. 此处可使用/models接口获取真实模型名称
    3. 此处也可标准化成"Max Tokens"、"Top P"、“Presence Penalty”以及"Frequency Penalty",体现专业性
    good first issue 
    opened by cryscan 3
  • `GLIBC_2.33' not found

    `GLIBC_2.33' not found

    Hi, I tried the compiled version and run ./ai00_server --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st but I get following error message:

    $ ./ai00_server --model assets/models/RWKV-4-World-0.4B-v1-20230529-ctx4096.st
    ./ai00_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by ./ai00_server)
    ./ai00_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by ./ai00_server)
    ./ai00_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by ./ai00_server)
    
    opened by cahya-wirawan 0
  • wgpu库进行计算着色器(Compute Shader)编译时出现了错误

    wgpu库进行计算着色器(Compute Shader)编译时出现了错误

    2023-08-18T04:24:01.957Z WARN [wgpu::backend::direct] Shader translation error for stage ShaderStages(COMPUTE): HLSL: Unimplemented("write_expr_math Unpack4x8unorm") 2023-08-18T04:24:01.958Z WARN [wgpu::backend::direct] Please report it to https://github.com/gfx-rs/naga 2023-08-18T04:24:01.958Z ERROR [wgpu::backend::direct] Handling wgpu errors as fatal by default thread 'main' panicked at 'wgpu error: Validation Error

    Caused by: In Device::create_compute_pipeline note: label = matmul Internal error: HLSL: Unimplemented("write_expr_math Unpack4x8unorm")

    ', C:\Users\runneradmin.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-0.16.3\src\backend\direct.rs:3019:5 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

    以下为openai给出的解释: 根据您提供的错误消息,看起来是在使用wgpu库进行计算着色器(Compute Shader)编译时出现了错误。

    错误消息中指出了一个HLSL(High-Level Shading Language)编译错误,具体是关于未实现的功能:"Unimplemented("write_expr_math Unpack4x8unorm")"。这可能表示您的计算着色器中使用了HLSL中尚未实现的操作或函数。

    解决此问题的步骤如下:

    1. 报告问题:根据错误消息中的提示,您可以将此问题报告给wgpu库的开发者,以便他们了解到该功能尚未实现,并可能提供修复或解决方案。您可以访问https://github.com/gfx-rs/naga并提交一个新的issue。

    2. 检查计算着色器代码:检查您的计算着色器代码,特别关注使用了"Unpack4x8unorm"操作或函数的地方。如果可能,尝试使用其他可用的操作或函数替代。

    3. 版本更新:确保您正在使用wgpu库的最新版本。可能已经有人报告了这个问题,并且在更新的版本中可能已经得到修复。

    4. 回溯信息:根据错误消息中的提示,您可以设置环境变量RUST_BACKTRACE为1,以显示完整的回溯信息。这可能会提供更多关于错误发生位置的信息,帮助您进行故障排除。

    我已经是最新驱动了,请问该怎么解决😔

    opened by TralMac 1
  • GPTQ quantification support.

    GPTQ quantification support.

    For some low performance devices, using GPTQ for 4Bit quantization is very important.

    Do you have any plans to add GPTQ quantification support?

    Here is a GPTQ quantification project that can be run on a computer: https://github.com/3outeille/GPTQ-for-RWKV

    opened by Pevernow 1
  • Mobile platform support?

    Mobile platform support?

    This project seems to use Vulkan to provide hardware acceleration, which means it can be easily ported to mobile platforms. Do you currently support mobile platforms, or do you have plans to support them in the future?

    opened by Pevernow 2
  • Add a text conversion API  增加文本转换API

    Add a text conversion API 增加文本转换API

    增加文本转换API,把常见的docx、pdf等文档转化为txt, 方便后续 对文档对话、个人知识库等功能的开发。

    可以说一下 还有那些文档格式的支持, 欢迎Pr

    Add a text conversion API to convert docx and PDF documents into txt,

    Facilitate the development of functions such as document chat and personal knowledge base in the future.

    Needs Pr

    good first issue 
    opened by cgisky1980 0
Releases(v0.1.11)
Owner
顾真牛
顾真牛
H2O Open Source Kubernetes operator and a command-line tool to ease deployment (and undeployment) of H2O open-source machine learning platform H2O-3 to Kubernetes.

H2O Kubernetes Repository with official tools to aid the deployment of H2O Machine Learning platform to Kubernetes. There are two essential tools to b

H2O.ai 16 Nov 12, 2022
ChatGPT-rs is a lightweight ChatGPT client with a graphical user interface, written in Rust

ChatGPT-rs is a lightweight ChatGPT client with a graphical user interface, written in Rust. It allows you to chat with OpenAI's GPT models through a simple and intuitive interface.

null 7 Apr 2, 2023
Over-simplified, featherweight, open-source and easy-to-use authentication and authorization server.

concess ⚠️ Early Development: This is not production ready, yet. Do not use it for anything important. Introduction concess is a over-simplified, feat

Dustin Frisch 3 Nov 25, 2022
A `nix` and `nix-shell` wrapper for shells other than `bash`

nix-your-shell A nix and nix-shell wrapper for shells other than bash. nix develop and nix-shell use bash as the default shell, so nix-your-shell prin

Mercury 15 Apr 10, 2023
Scriptable tool to read and write UEFI variables from EFI shell. View, save, edit and restore hidden UEFI (BIOS) Setup settings faster than with the OEM menu forms.

UEFI Variable Tool (UVT) UEFI Variable Tool (UVT) is a command-line application that runs from the UEFI shell. It can be launched in seconds from any

null 4 Dec 11, 2023
More than safe rust abstractions over rytm-sys, an unofficial SDK for writing software for Analog Rytm running on firmware 1.70.

rytm-rs More than safe rust abstractions over rytm-sys, an unofficial SDK for writing software for Analog Rytm running on firmware 1.70. On top of CC

Ali Somay 5 Dec 22, 2023
An open source artifact manager. Written in Rust back end and an Vue front end to create a fast and modern experience

nitro_repo Nitro Repo is an open source free artifact manager. Written with a Rust back end and a Vue front end to create a fast and modern experience

Wyatt Jacob Herkamp 30 Dec 14, 2022
zigfi is an open-source stocks, commodities and cryptocurrencies price monitoring CLI app, written fully in Rust, where you can organize assets you're watching easily into watchlists for easy access on your terminal.

zigfi zigfi is an open-source stocks, commodities and cryptocurrencies price monitoring CLI app, written fully in Rust, where you can organize assets

Aldrin Zigmund Cortez Velasco 18 Oct 24, 2022
This utility traverses through your filesystem looking for open-source dependencies that are seeking donations by parsing README.md and FUNDING.yml files

This utility traverses through your filesystem looking for open-source dependencies that are seeking donations by parsing README.md and FUNDING.yml files

Mufeed VH 38 Dec 30, 2022
A blazing fast command line license generator for your open source projects written in Rust🚀

Overview This is a blazing fast ⚡ , command line license generator for your open source projects written in Rust. I know that GitHub

Shoubhit Dash 43 Dec 30, 2022
An open source, programmed in rust, privacy focused tool for reading programming resources (like stackoverflow) fast, efficient and asynchronous from the terminal.

Falion An open source, programmed in rust, privacy focused tool for reading programming resources (like StackOverFlow) fast, efficient and asynchronou

Obscurely 17 Dec 20, 2022
Open-source compiler for the Papyrus scripting language of Bethesda games.

Open Papyrus Compiler This project is still WORK IN PROGRESS. If you have any feature requests, head over to the Issues tab and describe your needs. Y

erri120 22 Dec 5, 2022
A modern high-performance open source file analysis library for automating localization tasks

?? Filecount Filecount is a modern high-performance open source file analysis library for automating localization tasks. It enables you to add file an

Babblebase 4 Nov 11, 2022
Horus is an open source tool for running forensic and administrative tasks at the kernel level using eBPF, a low-overhead in-kernel virtual machine, and the Rust programming language.

Horus Horus is an open-source tool for running forensic and administrative tasks at the kernel level using eBPF, a low-overhead in-kernel virtual mach

null 4 Dec 15, 2022
Open source email client written in Rust and Dioxus. Under 🏗️

Blazemail A full-featued, beautiful, mail client that doesn't suck. Works on mac, windows, linux, mobile, web, etc. Features, status Blazemail is curr

Jon Kelley 13 Dec 19, 2022
Open-source Fortnite launcher, built in Rust.

Instigator Instigator is a basic command-line Fortnite launcher I've been working on for the last day and a bit. It is extremely basic. It injects con

jacksta 9 Feb 3, 2023
Open-source Rust framework for building event-driven live-trading & backtesting systems

Barter Barter is an open-source Rust framework for building event-driven live-trading & backtesting systems. Algorithmic trade with the peace of mind

Barter 157 Feb 18, 2023
A free and open-source DNA Sequencing/Visualization software for bioinformatics research.

DNArchery ?? A free and open-source cross-platform DNA Sequencing/Visualization Software for bioinformatics research. A toolkit for instantly performi

null 21 Mar 26, 2023
botwork is a single-binary, generic and open-source automation framework written in Rust for acceptance testing & RPA

botwork botwork is a single-binary, generic and open-source automation framework written in Rust for acceptance testing, acceptance test driven develo

Nitimis 8 Apr 17, 2023