RustFFT is a high-performance FFT library written in pure Rust.

Overview

RustFFT can compute FFTs of any size, including prime-number sizes, in O(n log n) time.

RustFFT supports the AVX instruction set for increased performance. No special code is needed to activate AVX: simply plan an FFT using the FftPlanner on a machine that supports the avx and fma CPU features, and RustFFT will automatically switch to faster AVX-accelerated algorithms.

RustFFT 5.0 has several breaking changes compared to RustFFT 4.0. Check out the Upgrade Guide for a walkthrough of the changes RustFFT 5.0 requires.

Usage

// Perform a forward FFT of size 1234
use rustfft::{FftPlanner, num_complex::Complex};

let mut planner = FftPlanner::<f32>::new();
let fft = planner.plan_fft_forward(1234);

let mut buffer = vec![Complex{ re: 0.0, im: 0.0 }; 1234];

fft.process(&mut buffer);
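For small sizes it can be handy to sanity-check results against a direct DFT. Below is a minimal O(n²) reference in plain Rust (no external crates; the `Complex` struct is a local stand-in for the `num_complex::Complex` that rustfft re-exports):

```rust
// A reference O(n^2) DFT in plain Rust, useful for sanity-checking small
// outputs. `Complex` here is a minimal stand-in for num_complex::Complex.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

fn naive_dft(input: &[Complex]) -> Vec<Complex> {
    let n = input.len();
    (0..n)
        .map(|k| {
            let mut sum = Complex { re: 0.0, im: 0.0 };
            for (j, x) in input.iter().enumerate() {
                // e^{-2*pi*i*j*k/n}, the forward-FFT convention
                let angle = -2.0 * std::f64::consts::PI * (j * k) as f64 / n as f64;
                let (s, c) = angle.sin_cos();
                sum.re += x.re * c - x.im * s;
                sum.im += x.re * s + x.im * c;
            }
            sum
        })
        .collect()
}

fn main() {
    // The DFT of an impulse is all-ones in every bin
    let mut input = vec![Complex { re: 0.0, im: 0.0 }; 4];
    input[0].re = 1.0;
    let out = naive_dft(&input);
    for bin in &out {
        assert!((bin.re - 1.0).abs() < 1e-12 && bin.im.abs() < 1e-12);
    }
}
```

Note that, like FFTW, rustfft performs no normalization, so a forward transform followed by an inverse transform scales the input by n.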

Supported Rust Version

RustFFT requires rustc 1.37 or newer. Minor releases of RustFFT may upgrade the MSRV (minimum supported Rust version) to a newer version of rustc. However, if we need to increase the MSRV, the new Rust version must have been released at least six months ago.

Stability/Future Breaking Changes

Version 5.0 contains several breaking API changes. In the interest of stability, we're committing to making no more breaking changes for 3 years, i.e., until 2024.

This policy has one exception: We currently re-export pre-1.0 versions of the num-complex and num-traits crates. If those crates release new major versions, we will upgrade as soon as possible, which will require a major version change of our own. If this happens, the version increase of num-complex/num-traits will be the only breaking change.

License

Licensed under either of

• Apache License, Version 2.0
• MIT License

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Before submitting a PR, please make sure to run cargo fmt.

Comments
  • SSE support

This is still very much work in progress, but I thought I would show what I'm up to. Looking at the AVX code, it's quite big and many of the things there don't look like they would work well with the much smaller SSE instruction sets. Instead I started out from the scalar code, and have made a hybrid solution where I use SSE code whenever it's ready, and fall back to the scalar one otherwise. For now I have butterflies of length 2, 3, 4, 5, 8, 16, 32, and Radix4 working. The other algorithms will use these as inners, to also get some speedup. Surprisingly I get about the same speedup for f32 as for f64. For Radix4, that varies between +30% and +90%. There may be more to gain by tweaking a bit here and there. For the individual butterflies the gain increases with the length, and for 16 and 32 they are about 3x as fast as the scalars (both for f32 and f64). I will continue now with implementing the rest of the butterflies. (I haven't abandoned the estimating planner; that work got caught up in this when it started working well.)

    opened by HEnquist 24
  • Neon

Now that all the needed intrinsics are available, it's time for some Neon! This is basically a direct translation of the SSE code. Running on a Cortex-A72 (a Raspberry Pi 4), I get a speedup of about 50% for f32, and none for f64. The Neon unit of the A72 can only execute a single 128-bit operation at a time, but it can do two f64 operations in parallel, meaning there isn't really any advantage to Neon here. More advanced cores should do better. To build this, you need a compiler that has this merged: https://github.com/rust-lang/rust/pull/89145 The reason is here: https://github.com/rust-lang/stdarch/issues/1220 Once the latest nightly can be used, I'll add a CI job.

    opened by HEnquist 17
  • benchmarks

    Hi! I am porting some audio stuff to R and decided to benchmark rustfft against some other known implementations.

I used the extendr crate to port rustfft to R and the reticulate package to run numpy from R. To my surprise, the torch (no GPU) implementation seems to do better than all the others for n > 5000 points. Do you have any idea why Torch's implementation would be faster than the others?


    code:

    library(bench)
    library(torch)
    library(reticulate)
    library(ggplot2)
    
    results <- bench::press(
        n = seq(25, 50000, 200),
        {
        x = rep(1+1i, n)
        py$x = x
        y = torch_tensor(x)
        py_run_string("import numpy as np", convert = FALSE)
    
        mark(
            rustfft::fft(x),
            stats::fft(x),
            torch::torch_fft_fft(y),
            py_eval("np.fft.fft(x)", convert = FALSE),
            iterations = 100,
            check = FALSE
            )
        }
    )
    
    ggplot(results) + geom_point(aes(x = n, y = median, color = as.character(expression))) +
        scale_colour_discrete(name = "implementation", labels=c("rustfft", "numpy.fft.fft (via reticulate)", "stats::fft", "torch::torch_fft_fft"))
    
    opened by daniellga 11
  • support for higher precision types

    Is it possible to support types with higher precision than f64? Converting from f64 to custom type might lose precision, as done in https://github.com/ejmahler/RustFFT/blob/master/src/twiddles.rs#L13.

I'm thinking of converting only from usize to FftNum, and using num::traits::{Float, FloatConsts} to compute pi and do arithmetic operations (e.g., sqrt, cos, sin).
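The proposal can be sketched with a minimal local trait standing in as a substitute for num-traits' Float + FloatConst (all names below are illustrative, not RustFFT's actual FftNum API):

```rust
// Sketch of the proposal: compute twiddle factors generically, converting
// only the integer index/length into the float type, so a type wider than
// f64 would keep its full precision. `Flt` is a local stand-in for
// num_traits::Float + FloatConst; rustfft's real FftNum trait differs.
trait Flt: Copy {
    fn from_usize(v: usize) -> Self;
    fn pi() -> Self;
    fn sin_cos(self) -> (Self, Self);
    fn mul(self, other: Self) -> Self;
    fn div(self, other: Self) -> Self;
    fn neg(self) -> Self;
}

impl Flt for f64 {
    fn from_usize(v: usize) -> Self { v as f64 }
    fn pi() -> Self { std::f64::consts::PI }
    fn sin_cos(self) -> (Self, Self) { f64::sin_cos(self) }
    fn mul(self, o: Self) -> Self { self * o }
    fn div(self, o: Self) -> Self { self / o }
    fn neg(self) -> Self { -self }
}

// Twiddle factor e^{-2*pi*i*k/n}, computed entirely in T, never through f64
fn twiddle<T: Flt>(k: usize, n: usize) -> (T, T) {
    let angle = T::from_usize(2 * k).mul(T::pi()).div(T::from_usize(n)).neg();
    let (s, c) = angle.sin_cos();
    (c, s) // (re, im)
}

fn main() {
    let (re, im) = twiddle::<f64>(1, 4); // e^{-i*pi/2} = (0, -1)
    assert!(re.abs() < 1e-15 && (im + 1.0).abs() < 1e-15);
}
```

A higher-precision type would then only need to implement the trait; no conversion from f64 ever happens.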

    opened by roosephu 11
  •  Make AvxFftPlanner: Send+Sync

Why make FftPlanner: Send+Sync?

Making FftPlanner: Send+Sync simplifies the use of a single FftPlanner across a large chunk of a multi-threaded application, benefiting from the planner's cache across threads. This allows, for instance, putting the FftPlanner in a global variable (e.g. using lazy_static!), which may be useful even if the application is not actually multi-threaded.

There is currently no performance cost to making FftPlanner: Send+Sync, and other parts of the API already pay some cost in order to allow multi-threading (e.g. FftPlanner::plan_fft returns an Arc).
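The kind of usage this enables can be sketched with only std (Plan is a dummy stand-in for the Arc<dyn Fft<T>> the real planner returns; the function name mirrors the planner's method but is hypothetical here):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Illustration of why a Send + Sync planner is convenient: a process-wide
// plan cache behind OnceLock + Mutex. `Plan` is a dummy stand-in for the
// Arc<dyn Fft<T>> that rustfft's real planner returns.
#[derive(Debug)]
struct Plan {
    len: usize,
}

static PLANNER: OnceLock<Mutex<HashMap<usize, Arc<Plan>>>> = OnceLock::new();

fn plan_fft_forward(len: usize) -> Arc<Plan> {
    let cache = PLANNER.get_or_init(|| Mutex::new(HashMap::new()));
    let mut cache = cache.lock().unwrap();
    Arc::clone(cache.entry(len).or_insert_with(|| Arc::new(Plan { len })))
}

fn main() {
    // Two lookups for the same length share one cached plan,
    // callable from any thread because the cache is Send + Sync.
    let a = plan_fft_forward(1024);
    let b = plan_fft_forward(1024);
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(a.len, 1024);
}
```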

    Changes

    • make AvxFftPlanner: Send+Sync by adding a Send+Sync bound on the trait AvxPlannerInternalAPI declaration.
    • make FftPlannerScalar: Send+Sync by replacing the internal Rc<Recipe> by an arena (implemented as a simple Vec, but this can be easily changed to any of the arena crates).
    • add a unit test that FftPlanner: Send+Sync
    opened by cassiersg 11
  • Experiments with sse, very slow (what am I doing wrong?)

    This PR isn't meant to be merged! I made a little attempt at using sse to speed up the length 4 butterfly. I made it as a quite ugly hack in the scalar butterfly, mostly for playing around. It works and gives correct results, but it's slow like a turtle! The best way to handle sse would probably be to start off from the avx code instead, but that thing is huge and I wanted to just play a little first. Is there anything obviously wrong with my code?

    opened by HEnquist 9
  • get_inplace_scratch_len results compared to the input size

    Hi,

    Thanks for the rustfft update :). I'm wondering about the results of get_inplace_scratch_len. I'm not claiming it's wrong, but I would like to double check with you.

    I naively would have expected that the required scratch size never exceeds the length of the input array, but it does for some values, e.g.:

            let points = 5466;
            let mut input : Vec<Complex<f32>> = vec![Complex::zero(); points];
    
            let fft = {
                let mut planner = FftPlanner::new();
                planner.plan_fft_forward(points)
            };
    
            let mut scratch = vec![Complex::zero(); fft.get_inplace_scratch_len()];
            fft.process_with_scratch(&mut input, &mut scratch);
            assert!(scratch.len() <= input.len()); // Fails as 9354 > 5466
    

For the same input length, get_outofplace_scratch_len returns 0.

As the required scratch space is a lot larger than the input size, I'm wondering if that is correct/really needed?

    opened by liebharc 9
  • Arm ci

This adds a CI job to run check and test on aarch64. For simplicity it only runs on stable, which should be fine since the nightly/beta/etc matrix is run in the other job on x86_64. The test_planned_fft_forward_f32() etc. tests in accuracy.rs take a fairly long time to run in emulation; if this becomes a problem we can probably skip them for ARM.

    opened by HEnquist 8
  • 5.0 release

    Things that need to happen before a 5.0 release

    • [x] Upgrade guide from 4.0 to 5.0, since this release has nontrivial breaking changes
    • [x] Update readme to have AVX performance tips, correct example, etc
    • [x] Feature flag to disable AVX? This would improve compile times for users who know their target CPU doesn't support avx
    • [x] Determine MSRV
• [x] Document MSRV update policy (Basically, copy tokio 1.0's policy of requiring that a version of rustc be out for 6 months before depending on it)
    • [x] Document policy on future major releases (No breaking changes for 3 years, EXCEPT for if we need to upgrade our version of num-traits/num-complex)
    • [x] Get ARM CI going
    • [x] Hide the FftPlanner enum from the public API by wrapping it in a wrapper struct
    • [x] Add MSRV to CI

    This release will have a lot of breaking changes, and I'd like to avoid making any for a long time afterwards, so if we have any other breaking changes, we need to get them in soon.

    opened by ejmahler 8
  • Const Generics to avoid panics

    Currently the process and related methods panics when the input or output buffers are of incorrect lengths:

    [Panics](https://docs.rs/rustfft/latest/rustfft/trait.Fft.html#panics-2)
    This method panics if:
    
    buffer.len() % self.len() > 0
    buffer.len() < self.len()
    

This can be made into compile-time failures using const generics. Unfortunately this would require a major version bump and would also increase the rustc version dependency. Depending on the complexity of the constraints required, it may even require generic_const_exprs, which is still unstable. If that is the case, I'd suggest waiting to release until it's stabilised - but there's no reason not to start fiddling.

    Just posting this issue to see if there are any major objections before I look further into it.
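A minimal sketch of the idea, with hypothetical names (FixedFft and its process method are not rustfft's API): encoding the length as a const parameter turns a length mismatch into a type error instead of a panic.

```rust
// Sketch of the const-generics idea: encode the FFT length in the type so a
// wrong buffer length is a compile error instead of a runtime panic.
// `FixedFft` is a hypothetical name, not part of rustfft.
struct FixedFft<const N: usize>;

impl<const N: usize> FixedFft<N> {
    // The [f64; N] parameter makes `N` part of the signature: passing an
    // array of any other length simply does not type-check.
    fn process(&self, buffer: &mut [f64; N]) {
        buffer.reverse(); // placeholder for the actual transform
    }
}

fn main() {
    let fft = FixedFft::<4>;
    let mut buf = [1.0, 2.0, 3.0, 4.0];
    fft.process(&mut buf);
    // fft.process(&mut [1.0, 2.0, 3.0]); // <- would fail to compile
    assert_eq!(buf, [4.0, 3.0, 2.0, 1.0]);
}
```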

    opened by WalterSmuts 6
  • Save memory by skipping the shuffle map from Radix4 and Radix3

I was looking into how to make the bit reversal in Radix4 and Radix3 more friendly to SIMD. I was working under the assumption that the bit reversals were too expensive to do in the outer loop of bitreversed_transpose(), but during my experiments, I stumbled across something that made me challenge that assumption.

    I discovered that there was little or no performance difference between

    • Using the shuffle map as-is
    • Unrolling one step of the shuffle map, so that it only stores some of the values, with the rest being reconstructed via simple arithmetic
    • Entirely eliminating the shuffle map, and computing one bit reversal per outer loop, with the rest of the bit reversals being reconstructed
    • Just computing all the bit reversals in the outer loop, with no fancy reconstruction.

    As a result, this PR changes Radix 4 and Radix 3 to the last bullet point, completely eliminating the shuffle map. This makes radix4 and radix3 simpler, and creates a much more obvious path for SIMD-ification of the bit reversal algorithm. Although after my experiments here, I'm not too confident that SIMD bit reversal will make much of a difference.
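The digit reversal being discussed can be sketched in a few lines (an illustrative helper, not the PR's actual bitreversed_transpose code):

```rust
// The digit reversal at the heart of Radix4's reordering: reverse the
// base-4 digits of an index. Computing this on the fly in the outer loop is
// what turned out to be about as fast as a precomputed shuffle map.
fn reverse_base4_digits(mut index: usize, num_digits: u32) -> usize {
    let mut reversed = 0;
    for _ in 0..num_digits {
        reversed = (reversed << 2) | (index & 0b11); // take the low base-4 digit
        index >>= 2;
    }
    reversed
}

fn main() {
    // For a size-16 FFT there are 2 base-4 digits:
    // index 1 = 01 in base 4 -> 10 in base 4 = 4
    assert_eq!(reverse_base4_digits(1, 2), 4);
    // The mapping is an involution: reversing twice gives the index back
    for i in 0..16 {
        assert_eq!(reverse_base4_digits(reverse_base4_digits(i, 2), 2), i);
    }
}
```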

    opened by ejmahler 6
  • Different outputs than FFTW

Thank you for your library!

I am comparing the output of this library with the output from FFTW, to see if I can port some code using the latter, with the following code:

    use rustfft::{num_complex::Complex64, num_traits::Zero, FftPlanner};
    
    fn fft_rustfft(input: Vec<Complex64>) -> Vec<Complex64> {
        let mut planner = FftPlanner::<f64>::new();
        let fft = planner.plan_fft_forward(input.len());
        let mut binding = input;
        let mut buffer = binding.as_mut_slice();
        fft.process(buffer);
        buffer.to_vec()
    }
    
    fn fft_fftw(input: Vec<Complex64>) -> Vec<Complex64> {
        use concrete_fftw::array::AlignedVec;
        use concrete_fftw::plan::*;
        use concrete_fftw::types::*;
        let n = input.len();
        let plan: C2CPlan64 = C2CPlan::aligned(&[n], Sign::Forward, Flag::MEASURE).unwrap();
        let mut binding = input.clone();
        let mut a = binding.as_mut_slice();
        let mut b = AlignedVec::new(n);
        plan.c2c(a, &mut b).unwrap();
        b.as_slice().to_owned()
    }
    
    fn main() {
        const size: usize = 750;
    
        let mut test_array = [Complex64::default(); size];
    
        // Zero vectors
        assert_eq!(
            fft_fftw(test_array.to_vec()),
            fft_rustfft(test_array.to_vec())
        );
    
        for (i, value) in test_array.iter_mut().enumerate() {
            *value = Complex64 {
                re: (-((i as f64 - size as f64 / 2.0) / (size / 10) as f64).powi(2)).exp(),
                im: 0.0,
            };
        }
    
        // Gaussian vectors
        assert_ne!(
            fft_fftw(test_array.to_vec()),
            fft_rustfft(test_array.to_vec())
        );
    }
    

    I am getting different outputs on the second assertion (no panics if you run it).

Is this behavior expected or am I doing something unexpected?

    opened by TheSirC 4
  • Consider relaxing initialisation requirement of scratch buffers

    The Fft trait requires scratch buffers to be initialised by defining their type as &mut [Complex<T>]. This type carries an implicit requirement that the memory be initialised. Since the fft algorithm should ALWAYS write to the scratch buffer before reading from it, it does not really require the memory to be initialised. To capture this in the type, I think we'd need &mut MaybeUninit<[Complex<T>]> or perhaps &mut [MaybeUninit<Complex<T>>].

The benefit would be a slight performance improvement for applications: they would no longer be required to fill the scratch buffer with a needless default value. It would also allow wrapping libraries such as easyfft to implement fft operations on slices and arrays without requiring the Default trait bound on the elements.

    I haven't looked at the implementation of the fft algorithms to see if it's practical to express. It could be that it gets in the way of the logic in the way it's currently expressed. It would also carry another major version change.
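A minimal sketch of the proposed pattern, using plain f64 in place of Complex<T> and a hypothetical function name (not rustfft's API):

```rust
use std::mem::MaybeUninit;

// Sketch of the proposed signature change: take scratch as
// &mut [MaybeUninit<f64>], write every element before reading it, and only
// then treat it as initialized. Hypothetical function, not rustfft's API.
fn process_with_uninit_scratch(buffer: &mut [f64], scratch: &mut [MaybeUninit<f64>]) {
    assert!(scratch.len() >= buffer.len());
    // Write-before-read: the algorithm fills scratch first...
    for (s, &b) in scratch.iter_mut().zip(buffer.iter()) {
        s.write(b * 2.0); // placeholder for real intermediate results
    }
    // ...so reading back what was just written is sound.
    for (b, s) in buffer.iter_mut().zip(scratch.iter()) {
        *b = unsafe { s.assume_init() };
    }
}

fn main() {
    let mut buffer = [1.0, 2.0, 3.0];
    // The caller allocates scratch without paying for initialization
    let mut scratch: Vec<MaybeUninit<f64>> = Vec::with_capacity(3);
    scratch.resize_with(3, MaybeUninit::uninit);
    process_with_uninit_scratch(&mut buffer, &mut scratch);
    assert_eq!(buffer, [2.0, 4.0, 6.0]);
}
```

The unsafe read is only sound because every scratch element is written first; that write-before-read invariant is exactly what each algorithm would have to uphold.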

    opened by WalterSmuts 28
  • Unclear position of coefficients

    Currently, the output order of the coefficients is documented as:

    Elements in the output are ordered by ascending frequency, with the first element corresponding to frequency 0.

    But this is a very unspecific definition, and could lead to various conclusions, like for example (for an input buffer of size n):

    • Starts at zero, and goes up to n (which means no negative frequencies)
• Index zero has the zero frequency, then goes from -n/2 up to n/2 (in which case the ordering difference between even- and odd-sized buffers is unspecified)
    • Numpy-like representation (index zero is frequency zero, then goes from 1 up to n/2, then from -n/2 up to -1)

    A better explanation in the documentation would be very helpful. Failing that, a couple examples would also serve the purpose of showing in greater detail the output order.
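For reference, the numpy-style interpretation from the third bullet can be written down explicitly (a sketch; bin_to_frequency is not part of rustfft):

```rust
// The numpy/FFTW-style reading of DFT output order: bin 0 is DC, the lower
// bins are positive frequencies, and the upper bins wrap around to negative
// frequencies. Frequencies here are in cycles per whole buffer.
fn bin_to_frequency(bin: usize, n: usize) -> i64 {
    assert!(bin < n);
    if bin <= (n - 1) / 2 {
        bin as i64
    } else {
        bin as i64 - n as i64
    }
}

fn main() {
    // Even length 8: frequencies 0, 1, 2, 3, -4, -3, -2, -1
    let freqs: Vec<i64> = (0..8).map(|b| bin_to_frequency(b, 8)).collect();
    assert_eq!(freqs, vec![0, 1, 2, 3, -4, -3, -2, -1]);
    // Odd length 5: frequencies 0, 1, 2, -2, -1
    let freqs: Vec<i64> = (0..5).map(|b| bin_to_frequency(b, 5)).collect();
    assert_eq!(freqs, vec![0, 1, 2, -2, -1]);
}
```

Worked examples like these in the docs would pin down the even/odd behavior precisely.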

    opened by Aandreba 1
  • actions-rs is unmaintained, will stop working soon

The GitHub actions use actions-rs. This is unmaintained and will stop working once Node 12 support is removed.

I have started updating my projects and it's quite easy. I'll do the same here and submit a PR soon (if nobody beats me to it). Opening this issue so I don't forget to actually do it.

    opened by HEnquist 1
  • error: Undefined Behavior: attempting a read access using <981690> at alloc412474[0x0], but that tag does not exist in the borrow stack for this location.

The rustfft::accuracy test test_planned_fft_forward_f32 reports undefined behavior, and it appears to be coming from butterflies.rs. Here is the full stacktrace generated by rust-lang/miri:

    FAIL [   3.042s] rustfft::accuracy test_planned_fft_forward_f32         
    
    --- STDOUT:              rustfft::accuracy test_planned_fft_forward_f32 ---
    
    running 1 test
    test test_planned_fft_forward_f32 ... 
    --- STDERR:              rustfft::accuracy test_planned_fft_forward_f32 ---
    error: Undefined Behavior: attempting a read access using <981690> at alloc412474[0x0], but that tag does not exist in the borrow stack for this location
       --> /root/build/src/array_utils.rs:64:9
        |
    64  |         *self.ptr.add(index)
        |         ^^^^^^^^^^^^^^^^^^^^
        |         |
        |         attempting a read access using <981690> at alloc412474[0x0], but that tag does not exist in the borrow stack for this location
        |         this error occurs as part of an access at alloc412474[0x0..0x8]
        |
        = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
        = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
    help: <981690> was created by a SharedReadOnly retag at offsets [0x0..0x20]
       --> /root/build/src/array_utils.rs:37:18
        |
    37  |             ptr: slice.as_ptr(),
        |                  ^^^^^^^^^^^^^^
    help: <981690> was later invalidated at offsets [0x0..0x20] by a Unique FnEntry retag inside this call
       --> /root/build/src/algorithm/butterflies.rs:221:1
        |
    221 | boilerplate_fft_butterfly!(Butterfly4, 4, |this: &Butterfly4<_>| this.direction);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        = note: BACKTRACE:
        = note: inside `rustfft::array_utils::RawSlice::<rustfft::num_complex::Complex<f32>>::load` at /root/build/src/array_utils.rs:64:9
    note: inside `rustfft::algorithm::butterflies::Butterfly4::<f32>::perform_fft_contiguous` at /root/build/src/algorithm/butterflies.rs:240:26
       --> /root/build/src/algorithm/butterflies.rs:240:26
        |
    240 |         let mut value0 = input.load(0);
        |                          ^^^^^^^^^^^^^
    note: inside `rustfft::algorithm::butterflies::Butterfly4::<f32>::perform_fft_butterfly` at /root/build/src/algorithm/butterflies.rs:17:17
       --> /root/build/src/algorithm/butterflies.rs:221:1
        |
    221 | boilerplate_fft_butterfly!(Butterfly4, 4, |this: &Butterfly4<_>| this.direction);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside closure at /root/build/src/algorithm/butterflies.rs:61:21
       --> /root/build/src/algorithm/butterflies.rs:221:1
        |
    221 | boilerplate_fft_butterfly!(Butterfly4, 4, |this: &Butterfly4<_>| this.direction);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `rustfft::array_utils::iter_chunks::<rustfft::num_complex::Complex<f32>, [closure@<rustfft::algorithm::butterflies::Butterfly4<f32> as rustfft::Fft<f32>>::process_with_scratch::{closure#0}]>` at /root/build/src/array_utils.rs:155:9
       --> /root/build/src/array_utils.rs:155:9
        |
    155 |         chunk_fn(head);
        |         ^^^^^^^^^^^^^^
    note: inside `<rustfft::algorithm::butterflies::Butterfly4<f32> as rustfft::Fft<f32>>::process_with_scratch` at /root/build/src/algorithm/butterflies.rs:60:30
       --> /root/build/src/algorithm/butterflies.rs:221:1
        |
    221 | boilerplate_fft_butterfly!(Butterfly4, 4, |this: &Butterfly4<_>| this.direction);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `rustfft::algorithm::Radix4::<f32>::perform_fft_out_of_place` at /root/build/src/algorithm/radix4.rs:117:9
       --> /root/build/src/algorithm/radix4.rs:117:9
        |
    117 |         self.base_fft.process_with_scratch(spectrum, &mut []);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside closure at /root/build/src/common.rs:126:21
       --> /root/build/src/algorithm/radix4.rs:145:1
        |
    145 | boilerplate_fft_oop!(Radix4, |this: &Radix4<_>| this.len);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `rustfft::array_utils::iter_chunks::<rustfft::num_complex::Complex<f32>, [closure@<rustfft::algorithm::Radix4<f32> as rustfft::Fft<f32>>::process_with_scratch::{closure#0}]>` at /root/build/src/array_utils.rs:155:9
       --> /root/build/src/array_utils.rs:155:9
        |
    155 |         chunk_fn(head);
        |         ^^^^^^^^^^^^^^
    note: inside `<rustfft::algorithm::Radix4<f32> as rustfft::Fft<f32>>::process_with_scratch` at /root/build/src/common.rs:125:30
       --> /root/build/src/algorithm/radix4.rs:145:1
        |
    145 | boilerplate_fft_oop!(Radix4, |this: &Radix4<_>| this.len);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `rustfft::algorithm::BluesteinsAlgorithm::<f32>::new` at /root/build/src/algorithm/bluesteins_algorithm.rs:85:9
       --> /root/build/src/algorithm/bluesteins_algorithm.rs:85:9
        |
    85  |         inner_fft.process_with_scratch(&mut inner_fft_input, &mut inner_fft_scratch);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `ControlCache::<f32>::plan_fft` at tests/accuracy.rs:103:18
       --> tests/accuracy.rs:103:18
        |
    103 |         Arc::new(BluesteinsAlgorithm::new(len, inner_fft))
        |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `test_planned_fft_forward_f32` at tests/accuracy.rs:117:23
       --> tests/accuracy.rs:117:23
        |
    117 |         let control = cache.plan_fft(len);
        |                       ^^^^^^^^^^^^^^^^^^^
    note: inside closure at tests/accuracy.rs:112:1
       --> tests/accuracy.rs:112:1
        |
    111 |   #[test]
        |   ------- in this procedural macro expansion
    112 | / fn test_planned_fft_forward_f32() {
    113 | |     let direction = FftDirection::Forward;
    114 | |     let cache: ControlCache<f32> = ControlCache::new(TEST_MAX, direction);
    115 | |
    ...   |
    123 | |     }
    124 | | }
        | |_^
        = note: this error originates in the macro `boilerplate_fft_butterfly` which comes from the expansion of the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
    
    note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
    
    error: aborting due to previous error
    
    opened by michaelgrigoryan25 10
Releases(6.1.0)
  • 6.1.0(Nov 8, 2022)

    [6.1]

Released 7 November 2022

    Added

    • Implemented a code path for Neon-optimized FFTs on AArch64 (Thanks to Henrik Enquist!) (#84 and #78)

    Changed

    • Improved performance of power-of-3 FFTs when not using SIMD-accelerated code paths (#80)
    • Reduced memory usage for some FFT sizes (#81)
  • 6.0.1(May 10, 2021)

    [6.0.1]

    Released 10 May 2021

    Fixed

    • Fixed a compile-time divide by zero error on nightly Rust in stdarch\crates\core_arch\src\macros.rs (#75)
    • Increased the minimum version of strength_reduce to 0.2.3
  • 5.1.1(May 10, 2021)

    [5.1.1]

    Released 10 May 2021

    Fixed

    • Fixed a compile-time divide by zero error on nightly Rust in stdarch\crates\core_arch\src\macros.rs (Backported from v6.0.1)
    • Increased the minimum version of strength_reduce to 0.2.3 (Backported from v6.0.1)
  • 6.0.0(May 10, 2021)

    Released 16 April 2021

    Breaking Changes

    • Increased the version of the num-complex dependency to 0.4.
      • This is a breaking change because we have a public dependency on num-complex.
      • See the num-complex changelog for a list of breaking changes in num-complex 0.4
      • As a high-level summary, most users will not need to do anything to upgrade to RustFFT 6.0: num-complex 0.4 re-exports a newer version of rand, and that's num-complex's only documented breaking change.
  • 5.1.0(May 10, 2021)

    Released 16 April 2021

    Added

    • Implemented a code path for SSE-optimized FFTs (Thanks to Henrik Enquist!) (#60)
• Plan an FFT using the FftPlanner (or the new FftPlannerSse) on a machine that supports SSE4.1 (but not AVX) and you'll see a 2-3x performance improvement over the default scalar code.

    Fixed

    • Fixed underflow when planning an AVX FFT of size zero (#56)
    • Fixed the FFT planner not being Send, due to internal use of Rc<> (#55)
    • Fixed typo in documentation (#54)
    • Slightly improved numerical precision of Rader's Algorithm and Bluestein's Algorithm (#66, #68)
    • Minor optimizations to Rader's Algorithm and Bluestein's Algorithm (#59)
    • Minor optimizations to MixedRadix setup time (#57)
    • Optimized performance of Radix4 (#65)
  • 5.0.1(Jan 9, 2021)

    [5.0.1]

    Released 8 January 2021

    Fixed

    • Fixed the FFT planner not choosing an obviously faster plan in some rare cases (#46)
• Documentation fixes and clarifications (#47, #48, #51)
  • 5.0(Jan 5, 2021)

    Released 4 January 2021

    Breaking Changes

    Added

• Added support for the AVX instruction set. Plan an FFT with the FftPlanner on a machine that supports AVX, and you'll get a 5x-10x speedup in FFT performance.

    Changed

    • Even though the main focus of this release is on AVX, most users should see moderate performance improvements due to a new internal architecture that reduces the amount of internal copies required when computing a FFT.
  • 4.1(Dec 24, 2020)

    Released 24 December 2020

    Added

• Added a blanket impl of FFTnum for any type that implements the required traits (#7)
• Added butterflies for many prime sizes, up to 31, and optimized the size-3, size-5, and size-7 butterflies (#10)
    • Added an implementation of Bluestein's Algorithm (#6)

    Changed

    • Improved the performance of GoodThomasAlgorithm re-indexing (#20)
  • 4.0(Oct 9, 2020)

Owner
Elliott Mahler