pyke Diffusers is a modular Rust library for optimized Stable Diffusion inference 🔮

Overview
pyke Diffusers

pyke Diffusers is a modular Rust library for pretrained diffusion model inference, generating images, videos, or audio using ONNX Runtime as a backend for highly optimized generation on both CPU & GPU.

Prerequisites

You'll need Rust v1.62.1+ to use pyke Diffusers.

  • If using CPU: a recent x86-64 CPU (no earlier than Haswell/Zen) is recommended for best results. ARM64 is supported but not recommended. For acceleration, see the notes for OpenVINO, oneDNN, ACL, and SNPE
  • If using CUDA: CUDA v11.x and cuDNN v8.2.x (more info)
  • If using TensorRT: CUDA v11.x and TensorRT v8.4 (more info)
  • If using ROCm: ROCm v5.2 (more info)
  • If using DirectML: a DirectX 12 compatible GPU and Windows 10 v1903+ (more info)

Only generic CPU, CUDA, and TensorRT have prebuilt binaries available. Other execution providers will require you to build them manually; see the ONNX Runtime docs for more info. Additionally, you'll need to make ml2 link to your custom-built binaries.
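
As a sketch only: if the bindings crate follows the linking convention of pyke's ort crate (an assumption; check the docs of the version you depend on), pointing the build at custom-built ONNX Runtime binaries might look like this:

# assumption: the bindings honor ORT_STRATEGY/ORT_LIB_LOCATION as in pyke's ort crate
export ORT_STRATEGY=system
export ORT_LIB_LOCATION=/path/to/onnxruntime/build/output
cargo build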

LMS notes

Note: By default, the LMS scheduler is not enabled, and this section can simply be skipped.

If you plan to enable the all-schedulers or scheduler-lms feature, you will need to install binaries for the GNU Scientific Library. See the installation instructions for rust-GSL to set up GSL.
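
For example, a Cargo.toml that opts into the LMS scheduler (and therefore requires GSL at build time) would look like this:

[dependencies]
pyke-diffusers = { version = "0.1", features = [ "scheduler-lms" ] }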

Installation

[dependencies]
pyke-diffusers = "0.1"
# if you'd like to use CUDA:
pyke-diffusers = { version = "0.1", features = [ "cuda" ] }

The default features enable some commonly used schedulers and pipelines.

Usage

use std::sync::Arc;

use pyke_diffusers::{Environment, EulerDiscreteScheduler, StableDiffusionOptions, StableDiffusionPipeline, StableDiffusionTxt2ImgOptions};

// Create the shared ONNX Runtime environment.
let environment = Arc::new(Environment::builder().build()?);
let mut scheduler = EulerDiscreteScheduler::default();
let pipeline = StableDiffusionPipeline::new(&environment, "./stable-diffusion-v1-5", &StableDiffusionOptions::default())?;

// Generate an image from a text prompt and save the first result.
let imgs = pipeline.txt2img("photo of a red fox", &mut scheduler, &StableDiffusionTxt2ImgOptions::default())?;
imgs[0].clone().into_rgb8().save("result.png")?;
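
To run on a GPU instead, devices can be assigned per-model through StableDiffusionOptions. A minimal sketch, assuming the cuda feature is enabled; DiffusionDevice and DiffusionDeviceControl are taken from the crate's docs, so treat the exact shape as an assumption:

use pyke_diffusers::{DiffusionDevice, DiffusionDeviceControl};

// Place the UNet (the heaviest model) on the first CUDA device;
// all other models keep the default device. Reuses `environment` from above.
let pipeline = StableDiffusionPipeline::new(&environment, "./stable-diffusion-v1-5", &StableDiffusionOptions {
    devices: DiffusionDeviceControl {
        unet: DiffusionDevice::CUDA(0, None),
        ..Default::default()
    },
    ..Default::default()
})?;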

See the docs for more detailed information & examples.

Converting models

To convert a model from a HuggingFace diffusers model:

  1. Create and activate a virtual environment.
  2. Install script requirements: python3 -m pip install -r requirements.txt
  3. If you are converting a model directly from HuggingFace, log in to HuggingFace Hub with huggingface-cli login - this can be skipped if you have the model on disk
  4. Convert your model with scripts/hf2pyke.py:
    • To convert a float32 model from HF (recommended for CPU): python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/
    • To convert a float32 model from disk: python3 scripts/hf2pyke.py ~/stable-diffusion-v1-5/ ~/pyke-diffusers-sd15/
    • To convert a float16 model from HF (recommended for GPU): python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5@fp16 ~/pyke-diffusers-sd15-fp16/
    • To convert a float16 model from disk: python3 scripts/hf2pyke.py ~/stable-diffusion-v1-5-fp16/ ~/pyke-diffusers-sd15-fp16/ -f16

Float16 models are faster on GPUs but are not hardware-independent (due to an ONNX Runtime issue), so they must be converted on the same hardware they will run on. Float32 models are hardware-independent, but are recommended only for x86 CPU inference or older NVIDIA GPUs.
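
Putting the steps together, a complete float32 conversion session on Linux or macOS might look like this (paths are illustrative):

# 1. create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# 2. install the conversion script's requirements
python3 -m pip install -r requirements.txt
# 3. log in to HuggingFace Hub (skip if converting from disk)
huggingface-cli login
# 4. convert the model
python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/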

ONNX Runtime binaries

On Windows (or other platforms), you may want to copy the ONNX Runtime dylibs to the target folder by enabling the onnx-copy-dylibs Cargo feature.

When running the examples in this repo on Windows, you'll need to also manually copy the dylibs from target/debug/ to target/debug/examples/ on first run. You'll also need to copy the dylibs to target/debug/deps/ if your project uses pyke Diffusers in a Cargo test.
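
For reference, a manual copy on Windows might look like the following; the exact DLL names are an assumption and depend on your ONNX Runtime build:

copy target\debug\onnxruntime*.dll target\debug\examples\
copy target\debug\onnxruntime*.dll target\debug\deps\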

Comments
  • Low RAM for model conversion

    File "/home/CS/Documents/test/rust/venv/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 627, in load_checkpoint_in_model
    raise ValueError(
    ValueError: At least one of the model submodule will be offloaded to disk, please pass along an `offload_folder`.
    
    enhancement p: medium 
    opened by ClashSAN 9
  • Cannot create pipeline

    StableDiffusionPipeline::new returns an error when I try to create a pipeline: "invalid type: map, expected u32" (possibly something about the diffusers.json file). I converted a model with the Python script hf2pyke.py; here is the diffusers.json from the converted model (formatted with VS Code):

    {
        "pipeline": "stable-diffusion",
        "framework": "onnx",
        "tokenizer": {
            "type": "CLIPTokenizer",
            "path": "tokenizer.json",
            "model-max-length": 77,
            "bos-token": 49406,
            "eos-token": 49407
        },
        "feature-extractor": {
            "resample": 3,
            "size": {
                "shortest_edge": 224
            },
            "crop": [
                {
                    "height": 224,
                    "width": 224
                },
                {
                    "height": 224,
                    "width": 224
                }
            ],
            "crop-center": true,
            "rgb": true,
            "normalize": true,
            "resize": true,
            "image-mean": [
                0.48145466,
                0.4578275,
                0.40821073
            ],
            "image-std": [
                0.26862954,
                0.26130258,
                0.27577711
            ]
        },
        "text-encoder": {
            "path": "text_encoder.onnx"
        },
        "unet": {
            "path": "unet.onnx"
        },
        "vae": {
            "encoder": "vae_encoder.onnx",
            "decoder": "vae_decoder.onnx"
        },
        "hashes": {
            "text-encoder": "edf2f0ea013cd652905d00e50e89db1a",
            "unet": "ffbefc01413f4e2e1fb01ef8767d53d9",
            "vae-encoder": "9ceb9c41e1fee3c67f13e74c6e65570b",
            "vae-decoder": "a0c8bb5e2fe170feb647d38602db3a63",
            "safety-checker": "b88096c404ad8d6f52daba2995c87fe8"
        },
        "safety-checker": {
            "path": "safety_checker.onnx"
        }
    }
    

    My code is here:

    let env = match OrtEnvironment::builder().with_name("Stable Diffusion").build() {
        Ok(e) => Arc::new(e),
        Err(e) => {
            simple_message_box::create_message_box(&format!("Cannot init ORT environment: {}", e), "Error");
            panic!("Cannot init ORT environment: {}", e);
        }
    };
    let scheduler = match DDIMScheduler::stable_diffusion_v1_optimized_default() {
        Ok(s) => Arc::new(s),
        Err(e) => {
            simple_message_box::create_message_box(&format!("Cannot init scheduler: {}", e), "Error");
            panic!("Cannot init scheduler: {}", e);
        }
    };
    let pipeline = match StableDiffusionPipeline::new(
        &env,
        "./models/",
        &StableDiffusionOptions::default()
    ) {
        Ok(p) => p,
        Err(e) => {
            simple_message_box::create_message_box(&format!("Cannot init pipeline: {}", e), "Error");
            panic!("Cannot init pipeline: {}", e);
        }
    };
    

    It panics with the message: "Cannot init pipeline: invalid type: map, expected u32".

    bug p: high 
    opened by sakura6264 3
  • Not running on GPU

    When I run the example from https://github.com/pykeio/diffusers/blob/main/examples/stable-diffusion.rs, it uses my CPU and system memory; I don't see the GPU being utilized.


    I have the following setup:

    My Cargo.toml looks like this:

    [dependencies]
    pyke-diffusers = "0.1"
    

    Output of nvidia-smi

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 527.56       Driver Version: 527.56       CUDA Version: 12.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ... WDDM  | 00000000:0A:00.0  On |                  N/A |
    |  0%   50C    P5    38W / 220W |    988MiB /  8192MiB |     11%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    

    Output of nvcc --version:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Fri_Dec_17_18:28:54_Pacific_Standard_Time_2021
    Cuda compilation tools, release 11.6, V11.6.55
    Build cuda_11.6.r11.6/compiler.30794723_0
    

    From C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include\cudnn_version.h, the cuDNN version is 8.6.0:

    #define CUDNN_MAJOR 8
    #define CUDNN_MINOR 6
    #define CUDNN_PATCHLEVEL 0
    


    I have successfully run CUDA workloads on this machine outside of this library.
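
    Note: per the Installation section above, the CUDA execution provider is only compiled in when the cuda Cargo feature is enabled; a Cargo.toml listing only pyke-diffusers = "0.1" builds CPU-only. A likely first step (an assumption based on the Installation section, not a confirmed fix for this issue):

    [dependencies]
    pyke-diffusers = { version = "0.1", features = [ "cuda" ] }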

    opened by geocine 2
  • Error: invalid type: integer `224`, expected an array of length 2

    Following the documentation, I converted the SD1.5 model as follows:

    To convert a float32 model from HF (recommended for CPU inference):

     python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/
    

    Then I ran https://github.com/pykeio/diffusers/blob/main/examples/stable-diffusion.rs

    My Cargo.toml looks like this:

    [dependencies]
    pyke-diffusers = "0.1"
    
    opened by geocine 2
  • RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

    I ran the command from the documentation:

    To convert a float16 model from HF (recommended for GPU inference):

    python3 scripts/hf2pyke.py --fp16 runwayml/stable-diffusion-v1-5@fp16 ~/pyke-diffusers-sd15-fp16/
    
    opened by geocine 2
  • Port remaining schedulers

    • [x] DDIM
    • [x] DDPM
    • [x] DPM-solver, multistep & singlestep (mostly complete: dynamic thresholding not implemented)
    • [x] Euler a
    • [x] Euler
    • [ ] IPNDM
    • [ ] k-DPM2-a
    • [ ] k-DPM2
    • [ ] Karras VE
    • [x] LMS
    • [ ] PNDM/PLMS
    • [ ] ~~SDE-VE~~
    • [ ] ~~SDE-VP~~
    • [ ] VQ Diffusion
    enhancement p: high 
    opened by sudo-carson 0
  • More pipelines

    • [x] Stable Diffusion text-to-image
    • [x] "Safe" Stable Diffusion
    • [ ] Stable Diffusion image-to-image
    • [ ] Stable Diffusion inpainting
    • [ ] Stable Diffusion v2 upscaling
    • [ ] Stable Diffusion v2 image variation

    Pipelines to explore

    May or may not be implemented depending on complexity.

    • [ ] Stable Diffusion outpainting
    • [ ] Latent Diffusion
    • [ ] Latent Diffusion super-resolution
    • [ ] Unconditional Latent Diffusion
    • [ ] Cycle Diffusion
    • [ ] Versatile Diffusion
    • [ ] Dance Diffusion
    enhancement help wanted p: medium 
    opened by sudo-carson 0