RustFFT is a high-performance FFT library written in pure Rust.

Overview

RustFFT can compute FFTs of any size, including prime-number sizes, in O(n log n) time.

RustFFT supports the AVX instruction set for increased performance. No special code is needed to activate AVX: simply plan an FFT using the FftPlanner on a machine that supports the avx and fma CPU features, and RustFFT will automatically switch to faster AVX-accelerated algorithms.

RustFFT 5.0 has several breaking changes compared to RustFFT 4.0. Check out the Upgrade Guide for a walkthrough of the changes RustFFT 5.0 requires.

Usage

// Perform a forward FFT of size 1234
use rustfft::{FftPlanner, num_complex::Complex};

let mut planner = FftPlanner::<f32>::new();
let fft = planner.plan_fft_forward(1234);

let mut buffer = vec![Complex{ re: 0.0, im: 0.0 }; 1234];

fft.process(&mut buffer);
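For small sizes it can be handy to sanity-check results against a direct DFT. Below is a minimal O(n²) reference in plain Rust (no external crates; the `Complex` struct is a local stand-in for the `num_complex::Complex` that rustfft re-exports):

```rust
// A reference O(n^2) DFT in plain Rust, useful for sanity-checking small
// outputs. `Complex` here is a minimal stand-in for num_complex::Complex.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f64,
    im: f64,
}

fn naive_dft(input: &[Complex]) -> Vec<Complex> {
    let n = input.len();
    (0..n)
        .map(|k| {
            let mut sum = Complex { re: 0.0, im: 0.0 };
            for (j, x) in input.iter().enumerate() {
                // e^{-2*pi*i*j*k/n}, the forward-FFT convention
                let angle = -2.0 * std::f64::consts::PI * (j * k) as f64 / n as f64;
                let (s, c) = angle.sin_cos();
                sum.re += x.re * c - x.im * s;
                sum.im += x.re * s + x.im * c;
            }
            sum
        })
        .collect()
}

fn main() {
    // The DFT of an impulse is all-ones in every bin
    let mut input = vec![Complex { re: 0.0, im: 0.0 }; 4];
    input[0].re = 1.0;
    let out = naive_dft(&input);
    for bin in &out {
        assert!((bin.re - 1.0).abs() < 1e-12 && bin.im.abs() < 1e-12);
    }
}
```

Note that, like FFTW, rustfft performs no normalization, so a forward transform followed by an inverse transform scales the input by n.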

Supported Rust Version

RustFFT requires rustc 1.37 or newer. Minor releases of RustFFT may upgrade the MSRV (minimum supported Rust version) to a newer version of rustc. However, if we need to increase the MSRV, the new Rust version must have been released at least six months ago.

Stability/Future Breaking Changes

Version 5.0 contains several breaking API changes. In the interest of stability, we're committing to making no more breaking changes for 3 years, i.e., until 2024.

This policy has one exception: We currently re-export pre-1.0 versions of the num-complex and num-traits crates. If those crates release new major versions, we will upgrade as soon as possible, which will require a major version change of our own. If this happens, the version increase of num-complex/num-traits will be the only breaking change.

License

Licensed under either of

• Apache License, Version 2.0
• MIT License

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Before submitting a PR, please make sure to run cargo fmt.

Comments
  • SSE support

This is still very much work in progress, but I thought I would show what I'm up to. Looking at the AVX code, it's quite big and many of the things there don't look like they would work well with the much smaller SSE instruction sets. Instead I started out from the scalar code, and have made a hybrid solution where I use SSE code whenever it's ready, and fall back to the scalar one otherwise. For now I have butterflies of length 2, 3, 4, 5, 8, 16, 32, and Radix4 working. The other algorithms will use these as inners, to also get some speedup. Surprisingly I get about the same speedup for f32 as for f64. For Radix4, that varies between +30% and +90%. There may be more to gain by tweaking a bit here and there. For the individual butterflies the gain increases with the length, and for 16 and 32 they are about 3x as fast as the scalars (both for f32 and f64). I will continue now with implementing the rest of the butterflies. (I haven't abandoned the estimating planner; that work got caught up in this when it started working well.)

    opened by HEnquist 24
  • Neon

Now that all the needed intrinsics are available, it's time for some Neon! This is basically a direct translation of the SSE code. Running on a Cortex-A72 (a Raspberry Pi 4), I get a speedup of about 50% for f32, and none for f64. The Neon unit of the A72 can only execute a single 128-bit operation at a time, but it can do two f64 operations in parallel, meaning there isn't really any advantage to Neon here. More advanced cores should do better. To build this, you need a compiler that has this merged: https://github.com/rust-lang/rust/pull/89145 The reason is here: https://github.com/rust-lang/stdarch/issues/1220 Once the latest nightly can be used, I'll add a CI job.

    opened by HEnquist 17
  • benchmarks

    Hi! I am porting some audio stuff to R and decided to benchmark rustfft against some other known implementations.

I used the extendr crate to port rustfft to R and the reticulate package to run numpy from R. To my surprise, the torch (no GPU) implementation seems to do better than all the others for n > 5000 points. Do you have any idea why Torch's implementation would be faster than the others?


    code:

    library(bench)
    library(torch)
    library(reticulate)
    library(ggplot2)
    
    results <- bench::press(
        n = seq(25, 50000, 200),
        {
        x = rep(1+1i, n)
        py$x = x
        y = torch_tensor(x)
        py_run_string("import numpy as np", convert = FALSE)
    
        mark(
            rustfft::fft(x),
            stats::fft(x),
            torch::torch_fft_fft(y),
            py_eval("np.fft.fft(x)", convert = FALSE),
            iterations = 100,
            check = FALSE
            )
        }
    )
    
    ggplot(results) + geom_point(aes(x = n, y = median, color = as.character(expression))) +
        scale_colour_discrete(name = "implementation", labels=c("rustfft", "numpy.fft.fft (via reticulate)", "stats::fft", "torch::torch_fft_fft"))
    
    opened by daniellga 11
  • support for higher precision types

    Is it possible to support types with higher precision than f64? Converting from f64 to custom type might lose precision, as done in https://github.com/ejmahler/RustFFT/blob/master/src/twiddles.rs#L13.

I'm thinking of converting only from usize to FftNum, and using num::traits::{Float, FloatConsts} to compute pi and do arithmetic operations (e.g., sqrt, cos, sin).
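The proposal can be sketched with a minimal local trait standing in as a substitute for num-traits' Float + FloatConst (all names below are illustrative, not RustFFT's actual FftNum API):

```rust
// Sketch of the proposal: compute twiddle factors generically, converting
// only the integer index/length into the float type, so a type wider than
// f64 would keep its full precision. `Flt` is a local stand-in for
// num_traits::Float + FloatConst; rustfft's real FftNum trait differs.
trait Flt: Copy {
    fn from_usize(v: usize) -> Self;
    fn pi() -> Self;
    fn sin_cos(self) -> (Self, Self);
    fn mul(self, other: Self) -> Self;
    fn div(self, other: Self) -> Self;
    fn neg(self) -> Self;
}

impl Flt for f64 {
    fn from_usize(v: usize) -> Self { v as f64 }
    fn pi() -> Self { std::f64::consts::PI }
    fn sin_cos(self) -> (Self, Self) { f64::sin_cos(self) }
    fn mul(self, o: Self) -> Self { self * o }
    fn div(self, o: Self) -> Self { self / o }
    fn neg(self) -> Self { -self }
}

// Twiddle factor e^{-2*pi*i*k/n}, computed entirely in T, never through f64
fn twiddle<T: Flt>(k: usize, n: usize) -> (T, T) {
    let angle = T::from_usize(2 * k).mul(T::pi()).div(T::from_usize(n)).neg();
    let (s, c) = angle.sin_cos();
    (c, s) // (re, im)
}

fn main() {
    let (re, im) = twiddle::<f64>(1, 4); // e^{-i*pi/2} = (0, -1)
    assert!(re.abs() < 1e-15 && (im + 1.0).abs() < 1e-15);
}
```

A higher-precision type would then only need to implement the trait; no conversion from f64 ever happens.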

    opened by roosephu 11
  •  Make AvxFftPlanner: Send+Sync

Why make FftPlanner: Send+Sync?

Making FftPlanner: Send+Sync simplifies the use of a single FftPlanner across a large chunk of a multi-threaded application, benefiting from the planner's cache across threads. This allows, for instance, putting the FftPlanner in a global variable (e.g. using lazy_static!), which may be useful even if the application is not actually multi-threaded.

There is currently no performance cost to making FftPlanner: Send+Sync, and other parts of the API already pay some cost in order to allow multi-threading (e.g. FftPlanner::plan_fft returns an Arc).
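The kind of usage this enables can be sketched with only std (Plan is a dummy stand-in for the Arc<dyn Fft<T>> the real planner returns; the function name mirrors the planner's method but is hypothetical here):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

// Illustration of why a Send + Sync planner is convenient: a process-wide
// plan cache behind OnceLock + Mutex. `Plan` is a dummy stand-in for the
// Arc<dyn Fft<T>> that rustfft's real planner returns.
#[derive(Debug)]
struct Plan {
    len: usize,
}

static PLANNER: OnceLock<Mutex<HashMap<usize, Arc<Plan>>>> = OnceLock::new();

fn plan_fft_forward(len: usize) -> Arc<Plan> {
    let cache = PLANNER.get_or_init(|| Mutex::new(HashMap::new()));
    let mut cache = cache.lock().unwrap();
    Arc::clone(cache.entry(len).or_insert_with(|| Arc::new(Plan { len })))
}

fn main() {
    // Two lookups for the same length share one cached plan,
    // callable from any thread because the cache is Send + Sync.
    let a = plan_fft_forward(1024);
    let b = plan_fft_forward(1024);
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(a.len, 1024);
}
```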

    Changes

    • make AvxFftPlanner: Send+Sync by adding a Send+Sync bound on the trait AvxPlannerInternalAPI declaration.
    • make FftPlannerScalar: Send+Sync by replacing the internal Rc<Recipe> by an arena (implemented as a simple Vec, but this can be easily changed to any of the arena crates).
    • add a unit test that FftPlanner: Send+Sync
    opened by cassiersg 11
  • Experiments with sse, very slow (what am I doing wrong?)

    This PR isn't meant to be merged! I made a little attempt at using sse to speed up the length 4 butterfly. I made it as a quite ugly hack in the scalar butterfly, mostly for playing around. It works and gives correct results, but it's slow like a turtle! The best way to handle sse would probably be to start off from the avx code instead, but that thing is huge and I wanted to just play a little first. Is there anything obviously wrong with my code?

    opened by HEnquist 9
  • get_inplace_scratch_len results compared to the input size

    Hi,

    Thanks for the rustfft update :). I'm wondering about the results of get_inplace_scratch_len. I'm not claiming it's wrong, but I would like to double check with you.

    I naively would have expected that the required scratch size never exceeds the length of the input array, but it does for some values, e.g.:

            let points = 5466;
            let mut input : Vec<Complex<f32>> = vec![Complex::zero(); points];
    
            let fft = {
                let mut planner = FftPlanner::new();
                planner.plan_fft_forward(points)
            };
    
            let mut scratch = vec![Complex::zero(); fft.get_inplace_scratch_len()];
            fft.process_with_scratch(&mut input, &mut scratch);
            assert!(scratch.len() <= input.len()); // Fails as 9354 > 5466
    

For the same input length, get_outofplace_scratch_len returns 0.

As the required scratch space is a lot larger than the input size, I'm wondering if that is correct/really needed?

    opened by liebharc 9
  • Arm ci

This adds a CI job to run check and test on aarch64. For simplicity it only runs on stable, which should be fine since the nightly/beta/etc matrix is run in the other job on x86_64. The test_planned_fft_forward_f32() etc. tests in accuracy.rs take a fairly long time to run in emulation; if this becomes a problem we can probably skip them for ARM.

    opened by HEnquist 8
  • 5.0 release

    Things that need to happen before a 5.0 release

    • [x] Upgrade guide from 4.0 to 5.0, since this release has nontrivial breaking changes
    • [x] Update readme to have AVX performance tips, correct example, etc
    • [x] Feature flag to disable AVX? This would improve compile times for users who know their target CPU doesn't support avx
    • [x] Determine MSRV
• [x] Document MSRV update policy (Basically, copy tokio 1.0's policy of requiring that a version of rustc be out for 6 months before depending on it)
    • [x] Document policy on future major releases (No breaking changes for 3 years, EXCEPT for if we need to upgrade our version of num-traits/num-complex)
    • [x] Get ARM CI going
    • [x] Hide the FftPlanner enum from the public API by wrapping it in a wrapper struct
    • [x] Add MSRV to CI

    This release will have a lot of breaking changes, and I'd like to avoid making any for a long time afterwards, so if we have any other breaking changes, we need to get them in soon.

    opened by ejmahler 8
  • Const Generics to avoid panics

    Currently the process and related methods panics when the input or output buffers are of incorrect lengths:

    [Panics](https://docs.rs/rustfft/latest/rustfft/trait.Fft.html#panics-2)
    This method panics if:
    
    buffer.len() % self.len() > 0
    buffer.len() < self.len()
    

This can be made into compile-time failures using const generics. Unfortunately this would require a major version bump and would also increase the rustc version dependency. Depending on the complexity of the constraints required, it may even require generic_const_exprs, which is still unstable. If that is the case, I'd suggest waiting to release until it's stabilised - but there's no reason not to start fiddling.

    Just posting this issue to see if there are any major objections before I look further into it.
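A minimal sketch of the idea, with hypothetical names (FixedFft and its process method are not rustfft's API): encoding the length as a const parameter turns a length mismatch into a type error instead of a panic.

```rust
// Sketch of the const-generics idea: encode the FFT length in the type so a
// wrong buffer length is a compile error instead of a runtime panic.
// `FixedFft` is a hypothetical name, not part of rustfft.
struct FixedFft<const N: usize>;

impl<const N: usize> FixedFft<N> {
    // The [f64; N] parameter makes `N` part of the signature: passing an
    // array of any other length simply does not type-check.
    fn process(&self, buffer: &mut [f64; N]) {
        buffer.reverse(); // placeholder for the actual transform
    }
}

fn main() {
    let fft = FixedFft::<4>;
    let mut buf = [1.0, 2.0, 3.0, 4.0];
    fft.process(&mut buf);
    // fft.process(&mut [1.0, 2.0, 3.0]); // <- would fail to compile
    assert_eq!(buf, [4.0, 3.0, 2.0, 1.0]);
}
```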

    opened by WalterSmuts 6
  • Save memory by skipping the shuffle map from Radix4 and Radix3

I was looking into how to make the bit reversal in Radix4 and Radix3 more friendly to SIMD. I was working under the assumption that the bit reversals were too expensive to do in the outer loop of bitreversed_transpose(), but during my experiments, I stumbled across something that made me challenge that assumption.

    I discovered that there was little or no performance difference between

    • Using the shuffle map as-is
    • Unrolling one step of the shuffle map, so that it only stores some of the values, with the rest being reconstructed via simple arithmetic
    • Entirely eliminating the shuffle map, and computing one bit reversal per outer loop, with the rest of the bit reversals being reconstructed
    • Just computing all the bit reversals in the outer loop, with no fancy reconstruction.

    As a result, this PR changes Radix 4 and Radix 3 to the last bullet point, completely eliminating the shuffle map. This makes radix4 and radix3 simpler, and creates a much more obvious path for SIMD-ification of the bit reversal algorithm. Although after my experiments here, I'm not too confident that SIMD bit reversal will make much of a difference.
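The digit reversal being discussed can be sketched in a few lines (an illustrative helper, not the PR's actual bitreversed_transpose code):

```rust
// The digit reversal at the heart of Radix4's reordering: reverse the
// base-4 digits of an index. Computing this on the fly in the outer loop is
// what turned out to be about as fast as a precomputed shuffle map.
fn reverse_base4_digits(mut index: usize, num_digits: u32) -> usize {
    let mut reversed = 0;
    for _ in 0..num_digits {
        reversed = (reversed << 2) | (index & 0b11); // take the low base-4 digit
        index >>= 2;
    }
    reversed
}

fn main() {
    // For a size-16 FFT there are 2 base-4 digits:
    // index 1 = 01 in base 4 -> 10 in base 4 = 4
    assert_eq!(reverse_base4_digits(1, 2), 4);
    // The mapping is an involution: reversing twice gives the index back
    for i in 0..16 {
        assert_eq!(reverse_base4_digits(reverse_base4_digits(i, 2), 2), i);
    }
}
```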

    opened by ejmahler 6
  • Different outputs than FFTW

Thank you for your library!

I am comparing the output of this library with the output from FFTW, to see if I can port some code using the latter, with the following code:

    use rustfft::{num_complex::Complex64, num_traits::Zero, FftPlanner};
    
    fn fft_rustfft(input: Vec<Complex64>) -> Vec<Complex64> {
        let mut planner = FftPlanner::<f64>::new();
        let fft = planner.plan_fft_forward(input.len());
        let mut binding = input;
        let mut buffer = binding.as_mut_slice();
        fft.process(buffer);
        buffer.to_vec()
    }
    
    fn fft_fftw(input: Vec<Complex64>) -> Vec<Complex64> {
        use concrete_fftw::array::AlignedVec;
        use concrete_fftw::plan::*;
        use concrete_fftw::types::*;
        let n = input.len();
        let plan: C2CPlan64 = C2CPlan::aligned(&[n], Sign::Forward, Flag::MEASURE).unwrap();
        let mut binding = input.clone();
        let mut a = binding.as_mut_slice();
        let mut b = AlignedVec::new(n);
        plan.c2c(a, &mut b).unwrap();
        b.as_slice().to_owned()
    }
    
    fn main() {
        const size: usize = 750;
    
        let mut test_array = [Complex64::default(); size];
    
        // Zero vectors
        assert_eq!(
            fft_fftw(test_array.to_vec()),
            fft_rustfft(test_array.to_vec())
        );
    
        for (i, value) in test_array.iter_mut().enumerate() {
            *value = Complex64 {
                re: (-((i as f64 - size as f64 / 2.0) / (size / 10) as f64).powi(2)).exp(),
                im: 0.0,
            };
        }
    
        // Gaussian vectors
        assert_ne!(
            fft_fftw(test_array.to_vec()),
            fft_rustfft(test_array.to_vec())
        );
    }
    

    I am getting different outputs on the second assertion (no panics if you run it).

Is this behavior expected or am I doing something unexpected?

    opened by TheSirC 4
  • Consider relaxing initialisation requirement of scratch buffers

    The Fft trait requires scratch buffers to be initialised by defining their type as &mut [Complex<T>]. This type carries an implicit requirement that the memory be initialised. Since the fft algorithm should ALWAYS write to the scratch buffer before reading from it, it does not really require the memory to be initialised. To capture this in the type, I think we'd need &mut MaybeUninit<[Complex<T>]> or perhaps &mut [MaybeUninit<Complex<T>>].

The benefit would be a slight performance improvement for applications: they would no longer be required to fill the scratch buffer with a needless default value. It would also allow wrapping libraries such as easyfft to implement fft operations on slices and arrays without requiring the Default trait bound on the elements.

    I haven't looked at the implementation of the fft algorithms to see if it's practical to express. It could be that it gets in the way of the logic in the way it's currently expressed. It would also carry another major version change.
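A minimal sketch of the proposed pattern, using plain f64 in place of Complex<T> and a hypothetical function name (not rustfft's API):

```rust
use std::mem::MaybeUninit;

// Sketch of the proposed signature change: take scratch as
// &mut [MaybeUninit<f64>], write every element before reading it, and only
// then treat it as initialized. Hypothetical function, not rustfft's API.
fn process_with_uninit_scratch(buffer: &mut [f64], scratch: &mut [MaybeUninit<f64>]) {
    assert!(scratch.len() >= buffer.len());
    // Write-before-read: the algorithm fills scratch first...
    for (s, &b) in scratch.iter_mut().zip(buffer.iter()) {
        s.write(b * 2.0); // placeholder for real intermediate results
    }
    // ...so reading back what was just written is sound.
    for (b, s) in buffer.iter_mut().zip(scratch.iter()) {
        *b = unsafe { s.assume_init() };
    }
}

fn main() {
    let mut buffer = [1.0, 2.0, 3.0];
    // The caller allocates scratch without paying for initialization
    let mut scratch: Vec<MaybeUninit<f64>> = Vec::with_capacity(3);
    scratch.resize_with(3, MaybeUninit::uninit);
    process_with_uninit_scratch(&mut buffer, &mut scratch);
    assert_eq!(buffer, [2.0, 4.0, 6.0]);
}
```

The unsafe read is only sound because every scratch element is written first; that write-before-read invariant is exactly what each algorithm would have to uphold.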

    opened by WalterSmuts 28
  • Unclear position of coefficients

    Currently, the output order of the coefficients is documented as:

    Elements in the output are ordered by ascending frequency, with the first element corresponding to frequency 0.

    But this is a very unspecific definition, and could lead to various conclusions, like for example (for an input buffer of size n):

    • Starts at zero, and goes up to n (which means no negative frequencies)
• Index zero has the zero frequency, then goes from -n/2 up to n/2 (in which case the ordering difference between even- and odd-sized buffers is unspecified)
    • Numpy-like representation (index zero is frequency zero, then goes from 1 up to n/2, then from -n/2 up to -1)

    A better explanation in the documentation would be very helpful. Failing that, a couple examples would also serve the purpose of showing in greater detail the output order.
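For reference, the numpy-style interpretation from the third bullet can be written down explicitly (a sketch; bin_to_frequency is not part of rustfft):

```rust
// The numpy/FFTW-style reading of DFT output order: bin 0 is DC, the lower
// bins are positive frequencies, and the upper bins wrap around to negative
// frequencies. Frequencies here are in cycles per whole buffer.
fn bin_to_frequency(bin: usize, n: usize) -> i64 {
    assert!(bin < n);
    if bin <= (n - 1) / 2 {
        bin as i64
    } else {
        bin as i64 - n as i64
    }
}

fn main() {
    // Even length 8: frequencies 0, 1, 2, 3, -4, -3, -2, -1
    let freqs: Vec<i64> = (0..8).map(|b| bin_to_frequency(b, 8)).collect();
    assert_eq!(freqs, vec![0, 1, 2, 3, -4, -3, -2, -1]);
    // Odd length 5: frequencies 0, 1, 2, -2, -1
    let freqs: Vec<i64> = (0..5).map(|b| bin_to_frequency(b, 5)).collect();
    assert_eq!(freqs, vec![0, 1, 2, -2, -1]);
}
```

Worked examples like these in the docs would pin down the even/odd behavior precisely.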

    opened by Aandreba 1
  • actions-rs is unmaintained, will stop working soon

The GitHub actions use actions-rs. This is unmaintained and will stop working once Node 12 support is removed.

I have started updating my projects and it's quite easy. I'll do the same here and submit a PR soon (if nobody beats me to it). Opening this issue so I don't forget to actually do it.

    opened by HEnquist 1
  • error: Undefined Behavior: attempting a read access using <981690> at alloc412474[0x0], but that tag does not exist in the borrow stack for this location.

The rustfft::accuracy test test_planned_fft_forward_f32 reports undefined behavior, and it appears to be coming from butterflies.rs. Here is the full stacktrace generated by rust-lang/miri:

    FAIL [   3.042s] rustfft::accuracy test_planned_fft_forward_f32         
    
    --- STDOUT:              rustfft::accuracy test_planned_fft_forward_f32 ---
    
    running 1 test
    test test_planned_fft_forward_f32 ... 
    --- STDERR:              rustfft::accuracy test_planned_fft_forward_f32 ---
    error: Undefined Behavior: attempting a read access using <981690> at alloc412474[0x0], but that tag does not exist in the borrow stack for this location
       --> /root/build/src/array_utils.rs:64:9
        |
    64  |         *self.ptr.add(index)
        |         ^^^^^^^^^^^^^^^^^^^^
        |         |
        |         attempting a read access using <981690> at alloc412474[0x0], but that tag does not exist in the borrow stack for this location
        |         this error occurs as part of an access at alloc412474[0x0..0x8]
        |
        = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
        = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
    help: <981690> was created by a SharedReadOnly retag at offsets [0x0..0x20]
       --> /root/build/src/array_utils.rs:37:18
        |
    37  |             ptr: slice.as_ptr(),
        |                  ^^^^^^^^^^^^^^
    help: <981690> was later invalidated at offsets [0x0..0x20] by a Unique FnEntry retag inside this call
       --> /root/build/src/algorithm/butterflies.rs:221:1
        |
    221 | boilerplate_fft_butterfly!(Butterfly4, 4, |this: &Butterfly4<_>| this.direction);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        = note: BACKTRACE:
        = note: inside `rustfft::array_utils::RawSlice::<rustfft::num_complex::Complex<f32>>::load` at /root/build/src/array_utils.rs:64:9
    note: inside `rustfft::algorithm::butterflies::Butterfly4::<f32>::perform_fft_contiguous` at /root/build/src/algorithm/butterflies.rs:240:26
       --> /root/build/src/algorithm/butterflies.rs:240:26
        |
    240 |         let mut value0 = input.load(0);
        |                          ^^^^^^^^^^^^^
    note: inside `rustfft::algorithm::butterflies::Butterfly4::<f32>::perform_fft_butterfly` at /root/build/src/algorithm/butterflies.rs:17:17
       --> /root/build/src/algorithm/butterflies.rs:221:1
        |
    221 | boilerplate_fft_butterfly!(Butterfly4, 4, |this: &Butterfly4<_>| this.direction);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside closure at /root/build/src/algorithm/butterflies.rs:61:21
       --> /root/build/src/algorithm/butterflies.rs:221:1
        |
    221 | boilerplate_fft_butterfly!(Butterfly4, 4, |this: &Butterfly4<_>| this.direction);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `rustfft::array_utils::iter_chunks::<rustfft::num_complex::Complex<f32>, [closure@<rustfft::algorithm::butterflies::Butterfly4<f32> as rustfft::Fft<f32>>::process_with_scratch::{closure#0}]>` at /root/build/src/array_utils.rs:155:9
       --> /root/build/src/array_utils.rs:155:9
        |
    155 |         chunk_fn(head);
        |         ^^^^^^^^^^^^^^
    note: inside `<rustfft::algorithm::butterflies::Butterfly4<f32> as rustfft::Fft<f32>>::process_with_scratch` at /root/build/src/algorithm/butterflies.rs:60:30
       --> /root/build/src/algorithm/butterflies.rs:221:1
        |
    221 | boilerplate_fft_butterfly!(Butterfly4, 4, |this: &Butterfly4<_>| this.direction);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `rustfft::algorithm::Radix4::<f32>::perform_fft_out_of_place` at /root/build/src/algorithm/radix4.rs:117:9
       --> /root/build/src/algorithm/radix4.rs:117:9
        |
    117 |         self.base_fft.process_with_scratch(spectrum, &mut []);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside closure at /root/build/src/common.rs:126:21
       --> /root/build/src/algorithm/radix4.rs:145:1
        |
    145 | boilerplate_fft_oop!(Radix4, |this: &Radix4<_>| this.len);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `rustfft::array_utils::iter_chunks::<rustfft::num_complex::Complex<f32>, [closure@<rustfft::algorithm::Radix4<f32> as rustfft::Fft<f32>>::process_with_scratch::{closure#0}]>` at /root/build/src/array_utils.rs:155:9
       --> /root/build/src/array_utils.rs:155:9
        |
    155 |         chunk_fn(head);
        |         ^^^^^^^^^^^^^^
    note: inside `<rustfft::algorithm::Radix4<f32> as rustfft::Fft<f32>>::process_with_scratch` at /root/build/src/common.rs:125:30
       --> /root/build/src/algorithm/radix4.rs:145:1
        |
    145 | boilerplate_fft_oop!(Radix4, |this: &Radix4<_>| this.len);
        | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `rustfft::algorithm::BluesteinsAlgorithm::<f32>::new` at /root/build/src/algorithm/bluesteins_algorithm.rs:85:9
       --> /root/build/src/algorithm/bluesteins_algorithm.rs:85:9
        |
    85  |         inner_fft.process_with_scratch(&mut inner_fft_input, &mut inner_fft_scratch);
        |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `ControlCache::<f32>::plan_fft` at tests/accuracy.rs:103:18
       --> tests/accuracy.rs:103:18
        |
    103 |         Arc::new(BluesteinsAlgorithm::new(len, inner_fft))
        |                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    note: inside `test_planned_fft_forward_f32` at tests/accuracy.rs:117:23
       --> tests/accuracy.rs:117:23
        |
    117 |         let control = cache.plan_fft(len);
        |                       ^^^^^^^^^^^^^^^^^^^
    note: inside closure at tests/accuracy.rs:112:1
       --> tests/accuracy.rs:112:1
        |
    111 |   #[test]
        |   ------- in this procedural macro expansion
    112 | / fn test_planned_fft_forward_f32() {
    113 | |     let direction = FftDirection::Forward;
    114 | |     let cache: ControlCache<f32> = ControlCache::new(TEST_MAX, direction);
    115 | |
    ...   |
    123 | |     }
    124 | | }
        | |_^
        = note: this error originates in the macro `boilerplate_fft_butterfly` which comes from the expansion of the attribute macro `test` (in Nightly builds, run with -Z macro-backtrace for more info)
    
    note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
    
    error: aborting due to previous error
    
    opened by michaelgrigoryan25 10
Releases(6.1.0)
  • 6.1.0(Nov 8, 2022)

    [6.1]

Released 7 November 2022

    Added

    • Implemented a code path for Neon-optimized FFTs on AArch64 (Thanks to Henrik Enquist!) (#84 and #78)

    Changed

    • Improved performance of power-of-3 FFTs when not using SIMD-accelerated code paths (#80)
    • Reduced memory usage for some FFT sizes (#81)
  • 6.0.1(May 10, 2021)

    [6.0.1]

    Released 10 May 2021

    Fixed

    • Fixed a compile-time divide by zero error on nightly Rust in stdarch\crates\core_arch\src\macros.rs (#75)
    • Increased the minimum version of strength_reduce to 0.2.3
  • 5.1.1(May 10, 2021)

    [5.1.1]

    Released 10 May 2021

    Fixed

    • Fixed a compile-time divide by zero error on nightly Rust in stdarch\crates\core_arch\src\macros.rs (Backported from v6.0.1)
    • Increased the minimum version of strength_reduce to 0.2.3 (Backported from v6.0.1)
  • 6.0.0(May 10, 2021)

    Released 16 April 2021

    Breaking Changes

    • Increased the version of the num-complex dependency to 0.4.
      • This is a breaking change because we have a public dependency on num-complex.
      • See the num-complex changelog for a list of breaking changes in num-complex 0.4
      • As a high-level summary, most users will not need to do anything to upgrade to RustFFT 6.0: num-complex 0.4 re-exports a newer version of rand, and that's num-complex's only documented breaking change.
  • 5.1.0(May 10, 2021)

    Released 16 April 2021

    Added

    • Implemented a code path for SSE-optimized FFTs (Thanks to Henrik Enquist!) (#60)
• Plan an FFT using the FftPlanner (or the new FftPlannerSse) on a machine that supports SSE4.1 (but not AVX) and you'll see a 2-3x performance improvement over the default scalar code.

    Fixed

    • Fixed underflow when planning an AVX FFT of size zero (#56)
    • Fixed the FFT planner not being Send, due to internal use of Rc<> (#55)
    • Fixed typo in documentation (#54)
    • Slightly improved numerical precision of Rader's Algorithm and Bluestein's Algorithm (#66, #68)
    • Minor optimizations to Rader's Algorithm and Bluestein's Algorithm (#59)
    • Minor optimizations to MixedRadix setup time (#57)
    • Optimized performance of Radix4 (#65)
  • 5.0.1(Jan 9, 2021)

    [5.0.1]

    Released 8 January 2021

    Fixed

    • Fixed the FFT planner not choosing an obviously faster plan in some rare cases (#46)
• Documentation fixes and clarifications (#47, #48, #51)
  • 5.0(Jan 5, 2021)

    Released 4 January 2021

    Breaking Changes

    Added

• Added support for the AVX instruction set. Plan an FFT with the FftPlanner on a machine that supports AVX, and you'll get a 5x-10x speedup in FFT performance.

    Changed

    • Even though the main focus of this release is on AVX, most users should see moderate performance improvements due to a new internal architecture that reduces the amount of internal copies required when computing a FFT.
  • 4.1(Dec 24, 2020)

    Released 24 December 2020

    Added

• Added a blanket impl of FFTnum for any type that implements the required traits (#7)
• Added butterflies for many prime sizes, up to 31, and optimized the size-3, size-5, and size-7 butterflies (#10)
    • Added an implementation of Bluestein's Algorithm (#6)

    Changed

    • Improved the performance of GoodThomasAlgorithm re-indexing (#20)
  • 4.0(Oct 9, 2020)

Owner
Elliott Mahler