pyke Diffusers is a modular Rust library for pretrained diffusion model inference to generate images, videos, or audio, using ONNX Runtime as a backend for extremely optimized generation on both CPU & GPU.
Prerequisites
You'll need Rust v1.62.1+ to use pyke Diffusers.
- If using CPU: recent (no earlier than Haswell/Zen) x86-64 CPU for best results. ARM64 supported but not recommended. For acceleration, see notes for OpenVINO, oneDNN, ACL, SNPE
- If using CUDA: CUDA v11.x, cuDNN v8.2.x more info
- If using TensorRT: CUDA v11.x, TensorRT v8.4 more info
- If using ROCm: ROCm v5.2 more info
- If using DirectML: DirectX 12 compatible GPU, Windows 10 v1903+ more info
Only generic CPU, CUDA, and TensorRT have prebuilt binaries available. Other execution providers will require you to build them manually; see the ONNX Runtime docs for more info. Additionally, you'll need to link to your custom-built binaries.
LMS notes
Note: By default, the LMS scheduler is not enabled, and this section can simply be skipped.
If you plan to enable the all-schedulers or scheduler-lms feature, you will need to install binaries for the GNU Scientific Library. See the installation instructions for rust-GSL to set up GSL.
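For example, once GSL is installed, opting into the LMS scheduler is just a matter of enabling the feature named above in Cargo.toml:
[dependencies]
pyke-diffusers = { version = "0.1", features = [ "scheduler-lms" ] }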
Installation
[dependencies]
pyke-diffusers = "0.1"
# if you'd like to use CUDA:
pyke-diffusers = { version = "0.1", features = [ "cuda" ] }
The default features enable some commonly used schedulers and pipelines.
Usage
use std::sync::Arc;
use pyke_diffusers::{Environment, EulerDiscreteScheduler, StableDiffusionOptions, StableDiffusionPipeline, StableDiffusionTxt2ImgOptions};
let environment = Arc::new(Environment::builder().build()?);
let mut scheduler = EulerDiscreteScheduler::default();
let pipeline = StableDiffusionPipeline::new(&environment, "./stable-diffusion-v1-5", &StableDiffusionOptions::default())?;
let imgs = pipeline.txt2img("photo of a red fox", &mut scheduler, &StableDiffusionTxt2ImgOptions::default())?;
imgs[0].clone().into_rgb8().save("result.png")?;
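The pipeline may return more than one image; a minimal sketch for saving every element of the returned collection, using only the calls shown above:
// Save every generated image; the index keeps the filenames unique.
for (i, img) in imgs.iter().enumerate() {
    img.clone().into_rgb8().save(format!("result-{i}.png"))?;
}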
See the docs for more detailed information & examples.
Converting models
To convert a model from a HuggingFace diffusers model:
- Create and activate a virtual environment.
- Install script requirements:
python3 -m pip install -r requirements.txt
- If you are converting a model directly from HuggingFace, log in to HuggingFace Hub with huggingface-cli login (this can be skipped if you have the model on disk)
- Convert your model with scripts/hf2pyke.py:
- To convert a float32 model from HF (recommended for CPU):
python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/
- To convert a float32 model from disk:
python3 scripts/hf2pyke.py ~/stable-diffusion-v1-5/ ~/pyke-diffusers-sd15/
- To convert a float16 model from HF (recommended for GPU):
python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5@fp16 ~/pyke-diffusers-sd15-fp16/
- To convert a float16 model from disk:
python3 scripts/hf2pyke.py ~/stable-diffusion-v1-5-fp16/ ~/pyke-diffusers-sd15-fp16/ -f16
Float16 models are faster on GPUs, but are not hardware-independent (due to an ONNX Runtime issue). Float16 models must be converted on the hardware they will be run on. Float32 models are hardware-independent, but are recommended only for x86 CPU inference or older NVIDIA GPUs.
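Putting the steps above together, a typical conversion session might look like this (the virtual-environment commands are standard Python tooling, and the output path is just an example):
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt
huggingface-cli login
python3 scripts/hf2pyke.py runwayml/stable-diffusion-v1-5 ~/pyke-diffusers-sd15/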
ONNX Runtime binaries
On Windows (or other platforms), you may want to copy the ONNX Runtime dylibs to the target folder by enabling the onnx-copy-dylibs Cargo feature.
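For example, in Cargo.toml:
[dependencies]
pyke-diffusers = { version = "0.1", features = [ "onnx-copy-dylibs" ] }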
When running the examples in this repo on Windows, you'll need to also manually copy the dylibs from target/debug/ to target/debug/examples/ on first run. You'll also need to copy the dylibs to target/debug/deps/ if your project uses pyke Diffusers in a Cargo test.
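On Windows, that manual copy might look something like this from the project root (a sketch; it assumes the only .dll files in target\debug\ are the ONNX Runtime DLLs):
:: copy the ONNX Runtime DLLs next to the example and test binaries
copy target\debug\*.dll target\debug\examples\
copy target\debug\*.dll target\debug\deps\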