Accel: GPGPU Framework for Rust

Overview


  • accel: CUDA-based GPGPU framework
  • accel-core: Helper for writing device code
  • accel-derive: Procedural macro for generating kernel code

Requirements

  • Minimum Supported Rust Version (MSRV) is 1.42.0
  • Install CUDA on your system
    • accel depends on CUDA Device APIs through rust-cuda/cuda-sys
    • accel does not depend on CUDA Runtime APIs. This means a compiled binary requires only libcuda.so at runtime, which is far lighter than the entire CUDA development toolkit.
  • Set up the NVPTX target of Rust:

    curl -sSL https://gitlab.com/termoshtt/accel/raw/master/setup_nvptx_toolchain.sh | bash

Alternatively, you can use the Docker container.

Limitations

This project is still in an early stage and has several limitations.

Contribution

This project is developed on GitLab and mirrored to GitHub.

Sponsors

Links

Projects that accel depends on:

Related Projects:

Comments
  • accel and nvptx64 linker issues

    Hi

    I can build the nvptx64 sub-crate, but cargo test fails on it due to a "Link" issue (when compiling ptx-builder v0.1.0). I have LLVM 6.0 and CUDA 8.0 installed, and I tried to change the linker in nvptx64-nvidia-cuda.json to llvm-linker(?!), but it didn't help.

    Is it because of my GPU (Titan Xp) architecture, or something else?

    Also, when I try to cargo build the root accel crate, the error is:

    error: linking with cc failed: exit code: 1
    .... OMITTED ....
    note: /usr/bin/ld: cannot find -lcudart
    /usr/bin/ld: cannot find -lcublas
    collect2: error: ld returned 1 exit status

    But I have them in my /usr/local/cuda/lib64 and /usr/local/cuda/include/.

    Any idea how to resolve it? Thanks
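
    For reference, a typical fix for this class of linker error (an editor's sketch, not from this thread) is a build script that adds the CUDA library directory to the host linker's search path; the path assumes a default CUDA install:

    // build.rs (hypothetical workaround sketch, not from the thread)
    fn main() {
        // tell rustc where libcudart.so and libcublas.so live
        println!("cargo:rustc-link-search=native=/usr/local/cuda/lib64");
    }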

    good first issue question 
    opened by ehsanmok 11
  • Build core with nvptx target

    Experiment to use nvptx-enabled rustc at https://github.com/rust-accel/rust

    Features

    • Use libstd (based on wasm implementations)
    • Linking LLVM bitcodes by llvm-link in rustc
    • Compile into a linked LLVM bitcode

    Links

    • Enable nvptx target https://github.com/rust-accel/rust/pull/1
    • llvm-link in rustc https://github.com/rust-accel/rust/pull/3

    See also #31

    opened by termoshtt 6
  • Auto-generate CUDA bindings using Bindgen

    This pull request addresses the issue that CUDA headers change between versions. Thus, over time, cuda-sys becomes incompatible with newer CUDA versions. See also #56.

    For example, Nvidia introduced a breaking API change going from version 9.2 to 10.0: they added a new field "uuid" to the beginning of cuda_sys::cudart::cudaDeviceProp, making all the fields after "uuid" refer to invalid memory locations.

    We solve such compatibility issues by generating new bindings every time that cuda-sys is built. This ensures that we are source-compatible with the installed CUDA version.
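
    A minimal sketch of that build-time generation (an assumed shape using the bindgen crate; the header path and output file name are placeholders, not the PR's actual code):

    // build.rs
    fn main() {
        let bindings = bindgen::Builder::default()
            .header("/usr/local/cuda/include/cuda_runtime.h")
            .generate()
            .expect("failed to generate CUDA bindings");
        let out = std::path::PathBuf::from(std::env::var("OUT_DIR").unwrap());
        bindings
            .write_to_file(out.join("cuda_bindings.rs"))
            .expect("failed to write bindings");
    }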

    opened by LutzCle 5
  • core::slice in kernel

    This PR tries to use core::slice in a kernel function:

    #[kernel]
    #[depends("accel-core" = "0.2.0-alpha")]
    pub unsafe fn add(a: &[f64], b: &[f64], c: &mut [f64]) {
        let i = accel_core::index() as usize;
        let n = c.len();
        if i < n {
            c[i] = a[i] + b[i]
        }
    }
    

    Problem

    This kernel yields an error about PTX:

    called `Result::unwrap()` on an `Err` value: cudaError(CUDA_ERROR_INVALID_PTX)
    

    PTX-builder links Rust code into LLVM bitcode (kernel.bc) while the #[kernel] macro of accel-derive runs, and the result is loaded into accel::Module. The "invalid PTX" error occurs during this loading. To get more information about how this PTX is invalid, I tried to build and load the PTX file manually:

    llc -mcpu=sm_20 kernel.bc -o kernel.ptx  # compile LLVM bitcode to PTX
    nvcc --cubin kernel.ptx  # Convert PTX to cubin
    

    llc works, but nvcc raises an error:

    ptxas fatal   : Unresolved extern function '_ZN4core9panicking18panic_bounds_check17h476c69b1512db11aE'
    

    This function seems to be core::panicking::panic_bounds_check, which should be contained in libcore. (#11 was opened about this in January, and I am still stuck here.)
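
    For reference, one way to sidestep the unresolved panic symbol (an editor's sketch, not from the issue) is unchecked indexing, which keeps the length guard but avoids referencing core::panicking from the kernel:

    #[kernel]
    #[depends("accel-core" = "0.2.0-alpha")]
    pub unsafe fn add(a: &[f64], b: &[f64], c: &mut [f64]) {
        let i = accel_core::index() as usize;
        if i < c.len() {
            // get_unchecked elides the bounds check, so no panic path is emitted
            *c.get_unchecked_mut(i) = *a.get_unchecked(i) + *b.get_unchecked(i);
        }
    }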

    opened by termoshtt 3
  • Can't build the sample code

    I tried your accel as follows:

    https://github.com/zacky1972/test_cuda

    ( the code is from https://qiita.com/termoshtt/items/41b4e23c4ce5e822319c )

    My environment is as follows:

    Mac Pro (Mid 2010)
    Processor: 2.8 GHz Quad-Core Intel Xeon
    Memory: 16 GB
    Graphics: NVIDIA GeForce GTX 680 2047 MB

    But, the following error occurred:

    $ cargo build --verbose
           Fresh unicode-xid v0.1.0
           Fresh unicode-xid v0.0.4
           Fresh quote v0.3.15
           Fresh glob v0.2.11
           Fresh proc-macro2 v0.4.3
       Compiling proc-macro2 v0.1.10
           Fresh synom v0.11.3
           Fresh quote v0.6.2
           Fresh serde v1.0.62
         Running `rustc --crate-name proc_macro2 ~/.cargo/registry/src/github.com-1ecc6299db9ec823/proc-macro2-0.1.10/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="nightly"' --cfg 'feature="unstable"' -C metadata=94d016008ad7fd85 -C extra-filename=-94d016008ad7fd85 --out-dir ~/github/test_cuda/target/debug/deps -L dependency=~/github/test_cuda/target/debug/deps --extern unicode_xid=~/github/test_cuda/target/debug/deps/libunicode_xid-2d487da95ef31948.rlib --cap-lints allow`
           Fresh syn v0.11.11
           Fresh cuda-sys v0.1.0
           Fresh syn v0.14.0
           Fresh toml v0.4.6
           Fresh procedurals v0.2.3
           Fresh serde_derive v1.0.62
           Fresh accel v0.1.0
    error[E0554]: #![feature] may not be used on the stable release channel
      --> ~/.cargo/registry/src/github.com-1ecc6299db9ec823/proc-macro2-0.1.10/src/lib.rs:22:34
       |
    22 | #![cfg_attr(feature = "nightly", feature(proc_macro))]
       |                                  ^^^^^^^^^^^^^^^^^^^^
    
    error: aborting due to previous error
    
    For more information about this error, try `rustc --explain E0554`.
    error: Could not compile `proc-macro2`.
    
    Caused by:
      process didn't exit successfully: `rustc --crate-name proc_macro2 ~/.cargo/registry/src/github.com-1ecc6299db9ec823/proc-macro2-0.1.10/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg feature="nightly" --cfg feature="unstable" -C metadata=94d016008ad7fd85 -C extra-filename=-94d016008ad7fd85 --out-dir ~/github/test_cuda/target/debug/deps -L dependency=~/github/test_cuda/target/debug/deps --extern unicode_xid=~/github/test_cuda/target/debug/deps/libunicode_xid-2d487da95ef31948.rlib --cap-lints allow` (exit code: 101)
    $
    
    bug 
    opened by zacky1972 2
  • nvptx toolchain

    The nvptx crate uses only a small part of xargo, and xargo development is in maintenance mode.

    There are two ways to enable the nvptx64-nvidia-cuda target in rustc (ref: How do I bootstrap Rust to cross-compile for a new platform?):

    • out-of-tree: Add nvptx64-nvidia-cuda.json and compile libcore
      • xargo uses this way
    • in-tree: Add nvptx64_nvidia_cuda.rs into librustc_target
      • distribute as toolchain https://github.com/termoshtt/rust/pull/1

    TODO

    • [x] Add nvptx64-nvidia-cuda target into rustc
    • [x] Distribute them as nvptx toolchain
    • [x] replace xargo rustc and llvm-link by cargo +accel-nvptx build
    • [x] Rewrite nvptx crate to manage nvptx-toolchain and execute build.
    • [x] Switch to rust-accel/nvptx
    opened by termoshtt 2
  • Link PTX assemblies

    The rust2ptx crate compiles Rust to PTX using the xargo rustc command, which compiles only lib.rs. To avoid undefined references, accel-core uses #[inline(always)], but this is a poor workaround :<
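
    A minimal sketch of that workaround (the helper body is elided and its shape is an assumption; the attribute is the point): inlining device helpers into the single compiled lib.rs leaves no external symbols to resolve.

    #[inline(always)]
    pub fn index() -> isize {
        // reads thread/block indices via device intrinsics (elided here);
        // forced inlining means no external symbol is emitted for this helper
        unimplemented!()
    }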

    opened by termoshtt 2
  • #[kernel] ought to work in the context of a module.

    Passing a single function is a huge limitation with respect to writing helper functions. One should be able to compile an entire module and mark a single function (or multiple functions) as kernels.

    My proposal is as follows:

    #[kernel_module]
    mod add {
        use algebra::ComplexBigInt;
        #[kernel] 
        fn add(a: *mut ComplexBigInt) {..}
    }
    

    etc.

    opened by jon-chuang 1
  • crate has N accel functions == N+1 cargo builds when recompiling crate?

    I have a crate that has N rust accel functions.

    When I rebuild this crate, even if I don't touch those N functions, it seems to trigger N nvptx-accel rebuilds.

    Is this avoidable?

    enhancement 
    opened by zeroexcuses 1
  • Improve Error Message When Kernel Compilation Fails

    The main effect of these changes is to provide the user with clearer error messages when the kernel compilation fails.

    There are also some changes to make the kernel compilation more robust: the arguments to llvm-link are now converted to absolute paths (which makes it possible to put the auto-generated kernel crate at a relative path rather than an absolute one), and the build will no longer fail if cargo fmt fails (since it is not really critical to compilation, a warning is printed instead).
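
    A minimal sketch of the absolute-path conversion described above (assumed, not the PR's actual code):

    use std::io;
    use std::path::{Path, PathBuf};

    /// Canonicalize every llvm-link argument so relative crate locations
    /// still resolve if the working directory changes.
    fn absolutize<P: AsRef<Path>>(args: &[P]) -> io::Result<Vec<PathBuf>> {
        args.iter().map(std::fs::canonicalize).collect()
    }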

    opened by bheisler 1
  • Independent accel-derive

    • Drop the accel dependency from accel-derive by moving ptx_builder into accel-derive
    • New attributes #[build_path(..)] and #[build_path_home(...)] for #[kernel]
    • Move the accel-examples crate into accel/examples
    opened by termoshtt 1
  • Compile entire crate by nvptx64-nvidia-cuda target

    A proposition to resolve #61

    Motivation

    A #[kernel] function cannot use any outside variables, functions, and so on, because it is compiled as standalone device code:

    fn add_2(a: &mut f32) {
        *a = *a + 2.0;
    }
    
    #[kernel] 
    pub fn add_2_all(a: *mut f32, n: usize) {
        let i = ::accel_core::index();
        unsafe { add_2(&mut *a.offset(i)) };  // error: cannot find function `add_2`
    }
    

    Resolution

    Compile the entire crate for both the x86_64 and nvptx64 targets.

    • rust-ptx-linker will eliminate non-kernel code that is not called from a PTX kernel
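
    A sketch of the idea (an assumption about the design, not implemented code): once the whole crate is compiled for nvptx64, helpers like add_2 resolve normally, and host-only items can be gated out with cfg.

    // compiled for both x86_64 and nvptx64; the device build sees add_2
    // as an ordinary function in the same crate
    fn add_2(a: &mut f32) {
        *a += 2.0;
    }

    #[cfg(not(target_arch = "nvptx64"))]
    fn host_only_setup() {
        // CUDA context creation, memory allocation, kernel launch, ...
    }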

    Problems

    • [ ] std must be compiled with nvptx
    • [ ] Compile flow (How to trigger nvptx build instead of proc-macro?)
    enhancement 
    opened by termoshtt 0
  • Restart accel

    CI on GPU

    GitHub Actions with self-hosted runner works?? #9

    Stable Rust

    Stabilize host-side code. Device-side code is out of scope because a large number of issues remain for the nvptx backend. See the links.

    • [x] proc-macro has been stabilized (#63)
    • [x] cargo check runs on stable Rust #66

    Update dependencies

    • [x] syn, quote, proc-macro2 1.0 #67
    • [x] rust-cuda/cuda-sys 0.3.0 #66

    rust-ptx-linker

    A linker flavor using rust-ptx-linker has been merged into rustc: https://github.com/rust-lang/rust/pull/57937

    • [x] Rewrite accel-derive with rust-ptx-linker #69
    • [x] Archive the nvptx crate

    Document

    • [ ] Needs a guide book #68

    Links

    rust-lang/rust

    • NVPTX backend metabug https://github.com/rust-lang/rust/issues/38789

    rust-cuda/wg

    • Overview about existing solutions and approaches https://github.com/rust-cuda/wg/issues/2
    • Are we CUDA yet? https://github.com/rust-cuda/wg/issues/16
    opened by termoshtt 0
  • mismatch in enum error values

    Hi there,

    I am running a program that contains an error, and the CUDA runtime keeps returning MissingConfiguration.

    I can see in the cuda_sys code that this enum value maps to 2; nevertheless, MissingConfiguration in the CUDA runtime maps to 52, while 2 is:

        /**
         * The API call failed because it was unable to allocate enough memory to
         * perform the requested operation.
         */
        cudaErrorMemoryAllocation             =      2,
    
        /**
         * The device function being invoked (usually via ::cudaLaunchKernel()) was not
         * previously configured via the ::cudaConfigureCall() function.
         */
        cudaErrorMissingConfiguration         =      52
    

    In the Rust code there is neither a 52 nor a 53 value:

        PeerAccessAlreadyEnabled = 50,
        PeerAccessNotEnabled = 51,
        DeviceAlreadyInUse = 54,
        ProfilerDisabled = 55,
    

    I am using CUDA 10.1.

    • Is this version supported?
    • Is this a version mismatch?
    • Should I generate the bindings myself from the CUDA header?

    Cheers

    opened by LuisAyuso 0
  • include raw *.cu file

    Is there a way to mix rust-accel with raw *.cu files?

    I would like to write some device helper functions in *.cu, then call them from #[kernel] functions defined in rust-accel.

    Is there an example of how to do this?

    enhancement 
    opened by zeroexcuses 1
Releases(v0.1.0)
  • v0.1.0 (Nov 20, 2017)

    Accel v0.1.0

    CUDA-based GPGPU framework for Rust

    Compile PTX kernels from Rust using the NVPTX backend of LLVM

    Since February 2017, Rust can be compiled into PTX assembly using the NVPTX backend of LLVM, as demonstrated in japaric/nvptx; however, this requires a complicated setup. Accel generates this setup automatically using the procedural macro feature.

    proc-macro-attribute-based approach like futures-await

    The accel-derive crate introduces a proc-macro #[kernel], which generates two functions: one is compiled into PTX code (the "kernel"), and the other launches it from CPU code using cudaLaunchKernel (the "caller"). While the proc-macro runs, a support crate rust2ptx is created in the $HOME/.rust2ptx directory, and it compiles the generated function (saved as lib.rs) using xargo. The generated PTX assembly is inserted into the source code of the "caller" and is thus embedded into the executable binary.

    Simple memory management using Unified Memory

    Unified memory (UM) is a feature introduced in CUDA 6 and extended in CUDA 8. It lets us manage memory without considering whether it resides on the CPU or the GPU. Accel introduces the accel::UVec struct, which manages UM with RAII, and you can use it as a slice through Deref.
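
    A hypothetical sketch of how that design reads in user code (the constructor name and mutable access are assumptions, not the exact accel v0.1.0 API):

    use accel::UVec;

    fn main() {
        // allocate unified memory, visible to both CPU and GPU
        let mut v: UVec<f32> = UVec::new(128).unwrap();
        // Deref to a slice lets the host fill it like ordinary memory
        for (i, x) in v.iter_mut().enumerate() {
            *x = i as f32;
        }
        // a generated "caller" could now launch a kernel on this buffer;
        // the unified allocation is freed when v drops (RAII)
        println!("first element = {}", v[0]);
    }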

    Source code(tar.gz)
    Source code(zip)
Owner
Toshiki Teramura
Ph.D. in Turbulence/Dynamical Systems
A high level, easy to use gpgpu crate based on wgpu

A high level, easy to use gpgpu crate based on wgpu. It is made for very large computations on powerful gpus

null 18 Nov 26, 2022
A Rust machine learning framework.

Linfa linfa (Italian) / sap (English): The vital circulating fluid of a plant. linfa aims to provide a comprehensive toolkit to build Machine Learning

Rust-ML 2.2k Jan 2, 2023
A fast, safe and easy to use reinforcement learning framework in Rust.

RSRL (api) Reinforcement learning should be fast, safe and easy to use. Overview rsrl provides generic constructs for reinforcement learning (RL) expe

Thomas Spooner 139 Dec 13, 2022
A Machine Learning Framework for High Performance written in Rust

polarlight polarlight is a machine learning framework for high performance written in Rust. Key Features TBA Quick Start TBA How To Contribute Contrib

Chris Ohk 25 Aug 23, 2022
Stable Diffusion v1.4 ported to Rust's burn framework

Stable-Diffusion-Burn Stable-Diffusion-Burn is a Rust-based project which ports the V1 stable diffusion model into the deep learning framework, Burn.

null 156 Aug 8, 2023
Stable Diffusion XL ported to Rust's burn framework

Stable-Diffusion-XL-Burn Stable-Diffusion-XL-Burn is a Rust-based project which ports stable diffusion xl into the Rust deep learning framework burn.

null 194 Sep 4, 2023
Open Machine Intelligence Framework for Hackers. (GPU/CPU)

Leaf • Introduction Leaf is a open Machine Learning Framework for hackers to build classical, deep or hybrid machine learning applications. It was ins

Autumn 5.5k Jan 1, 2023
Xaynet represents an agnostic Federated Machine Learning framework to build privacy-preserving AI applications.

xaynet Xaynet: Train on the Edge with Federated Learning Want a framework that supports federated learning on the edge, in desktop browsers, integrate

XayNet 196 Dec 22, 2022
Orkhon: ML Inference Framework and Server Runtime

Orkhon: ML Inference Framework and Server Runtime Latest Release License Build Status Downloads Gitter What is it? Orkhon is Rust framework for Machin

Theo M. Bulut 129 Dec 21, 2022
Tangram is an automated machine learning framework designed for programmers.

Tangram Tangram is an automated machine learning framework designed for programmers. Run tangram train to train a model from a CSV file on the command

Tangram 1.4k Dec 30, 2022
zenoh-flow aims at providing a zenoh-based data-flow programming framework for computations that span from the cloud to the device.

Eclipse Zenoh-Flow Zenoh-Flow provides a zenoh-based dataflow programming framework for computations that span from the cloud to the device. ⚠️ This s

null 35 Dec 12, 2022
High performance distributed framework for training deep learning recommendation models based on PyTorch.

PERSIA (Parallel rEcommendation tRaining System with hybrId Acceleration) is developed by AI platform@Kuaishou Technology, collaborating with ETH. It

null 340 Dec 30, 2022
Multi-agent (path finding) planning framework

multi-agent (path finding) planning framework Mapf is a (currently experimental) Rust library for multi-agent planning, with a focus on cooperative pa

null 17 Dec 5, 2022
Machine learning framework for building object trackers and similarity search engines

Similari Similari is a framework that helps build sophisticated tracking systems. The most frequently met operations that can be efficiently implement

In-Sight 71 Dec 28, 2022
Framework and Language for Neuro-Symbolic Programming

Scallop Scallop is a declarative language designed to support rich symbolic reasoning in AI applications. It is based on Datalog, a logic rule-based q

null 27 Dec 14, 2022
☁ Puff ☁ - The deep stack framework.

☁ Puff ☁ Python with an async runtime built-in Rust for GraphQL, ASGI, WSGI, Postgres, PubSub, Redis, Distributed Tasks, and HTTP2 Client. What is Puf

Kyle Hanson 290 Jan 8, 2023
ThRust is a software framework for thermodynamic and probabilistic computing.

ThRust ThRust is a Rust crate that provides a framework for thermodynamic and probabilistic computing. This package currently supports the following f

Chase Zimmerman 6 Apr 26, 2023
A Framework for Production-Ready Continuous Machine Learning

CML "Domain generalization is dead, Continuous Machine Learning lives forever." —— an iKun CML is a framework for production-ready continuous machine

Yu Sun 3 Aug 1, 2023
Msgpack serialization/deserialization library for Python, written in Rust using PyO3, and rust-msgpack. Reboot of orjson. msgpack.org[Python]

ormsgpack ormsgpack is a fast msgpack library for Python. It is a fork/reboot of orjson It serializes faster than msgpack-python and deserializes a bi

Aviram Hassan 139 Dec 30, 2022