zk-SNARK library

Overview

bellperson

This is a fork of the great bellman library.

bellman is a crate for building zk-SNARK circuits. It provides circuit traits and primitive structures, as well as basic gadget implementations such as booleans and number abstractions.

Backend

There is currently one backend available for the implementation of BLS12-381:

  • blstrs - optimized with hand tuned assembly, using blst

GPU

This fork adds GPU-parallel acceleration of the FFT and multiexponentiation algorithms in the groth16 prover codebase, enabled by the compilation features cuda and opencl.

Requirements

  • NVIDIA or AMD GPU Graphics Driver
  • OpenCL

(For AMD devices we recommend ROCm.)

Environment variables

The GPU extension reads several environment variables that can be set from outside this library.

  • BELLMAN_NO_GPU

    Will disable the GPU feature from the library and force usage of the CPU.

    // Example
    env::set_var("BELLMAN_NO_GPU", "1");
  • BELLMAN_VERIFIER

    Chooses the device in which the batched verifier is going to run. Can be cpu, gpu or auto.

    // Example
    env::set_var("BELLMAN_VERIFIER", "gpu");
  • BELLMAN_CUSTOM_GPU

    Will allow for adding a GPU not in the tested list. This requires researching the name of the GPU device and the number of cores in the format ["name:cores"].

    // Example
    env::set_var("BELLMAN_CUSTOM_GPU", "GeForce RTX 2080 Ti:4352, GeForce GTX 1060:1280");
  • BELLMAN_CPU_UTILIZATION

    Can be set in the interval [0,1] to designate the proportion of the multiexponentiation calculation that is moved to the CPU, in parallel with the GPU, to keep all hardware occupied.

    // Example
    env::set_var("BELLMAN_CPU_UTILIZATION", "0.5");
  • RAYON_NUM_THREADS

    Restricts the number of threads used in the library to roughly twice that number (best effort). In the past this was done using BELLMAN_NUM_CPUS, which is now deprecated. The default is the number of logical cores reported on the machine.

    // Example
    env::set_var("RAYON_NUM_THREADS", "6");
  • BELLMAN_GPU_FRAMEWORK

    Bellman can be compiled with both OpenCL and CUDA support. When both are available, BELLMAN_GPU_FRAMEWORK selects a specific one, either cuda or opencl.

    // Example
    env::set_var("BELLMAN_GPU_FRAMEWORK", "opencl");
  • BELLMAN_CUDA_NVCC_ARGS

    By default the CUDA kernel is compiled for several architectures, which may take a long time. BELLMAN_CUDA_NVCC_ARGS can be used to override those arguments. The input and output files are still set automatically.

    // Example for compiling the kernel for only the Turing architecture
    env::set_var("BELLMAN_CUDA_NVCC_ARGS", "--fatbin --gpu-architecture=sm_75 --generate-code=arch=compute_75,code=sm_75");
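
The "name:cores" format used by BELLMAN_CUSTOM_GPU can be illustrated with a small parser. This is a minimal sketch, not bellperson's own parsing code: the function name parse_custom_gpus and its error messages are hypothetical, and it assumes entries are separated by commas as in the example above.

```rust
use std::collections::HashMap;

/// Parses a `BELLMAN_CUSTOM_GPU`-style value ("name:cores, name:cores")
/// into a map from device name to core count.
/// Hypothetical helper for illustration; not bellperson's own parser.
fn parse_custom_gpus(value: &str) -> Result<HashMap<String, usize>, String> {
    let mut devices = HashMap::new();
    for entry in value.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        // Split on the last ':' so a device name may itself contain colons.
        let (name, cores) = entry
            .rsplit_once(':')
            .ok_or_else(|| format!("missing ':' in entry {:?}", entry))?;
        let cores: usize = cores
            .trim()
            .parse()
            .map_err(|_| format!("invalid core count in entry {:?}", entry))?;
        devices.insert(name.trim().to_string(), cores);
    }
    Ok(devices)
}
```

Calling parse_custom_gpus with the example value above yields two entries, one per card. An entry without a colon or with a non-numeric core count is reported as an error rather than silently skipped.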

Supported / Tested Cards

Depending on the size of the proof being passed to the GPU for work, certain cards will not be able to allocate enough memory to either the FFT or Multiexp kernel. Below is a list of devices that work for small sets. In the future we will add the cutoff point at which a given card will not be able to allocate enough memory to utilize the GPU.

Device Name               Cores   Comments
------------------------  ------  ----------------
Quadro RTX 6000           4608
TITAN RTX                 4608
Tesla V100                5120
Tesla P100                3584
Tesla T4                  2560
Quadro M5000              2048
GeForce RTX 3090          10496
GeForce RTX 3080          8704
GeForce RTX 3070          5888
GeForce RTX 2080 Ti       4352
GeForce RTX 2080 SUPER    3072
GeForce RTX 2080          2944
GeForce RTX 2070 SUPER    2560
GeForce GTX 1080 Ti       3584
GeForce GTX 1080          2560
GeForce GTX 2060          1920
GeForce GTX 1660 Ti       1536
GeForce GTX 1060          1280
GeForce GTX 1650 SUPER    1280
GeForce GTX 1650          896
gfx1010                   2560    AMD RX 5700 XT
gfx906                    7400    AMD RADEON VII

Running Tests

RUSTFLAGS="-C target-cpu=native" cargo test --release --all

To run using CUDA and OpenCL, you can use:

RUSTFLAGS="-C target-cpu=native" cargo test --release --all --features cuda,opencl

To run the multiexp_consistency test you can use:

RUST_LOG=info cargo test --features cuda,opencl -- --exact multiexp::gpu_multiexp_consistency --nocapture

Considerations

Bellperson uses rust-gpu-tools as its CUDA/OpenCL backend, so you may see a directory named ~/.rust-gpu-tools in your home folder. It contains the compiled binaries of the OpenCL kernels used in this repository.

License

Licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments
  • Test Failures on GTX3070

    When I run the tests with the command $ RUSTFLAGS="-C target-cpu=native" cargo test --release --all --features gpu

    I get several failed tests, specifically:

    test domain::tests::gpu_fft_consistency ... FAILED and test groth16::proof::test_with_bls12_381::serialization ... FAILED

    Test stdout follows:

        Finished release [optimized] target(s) in 0.10s
         Running target/release/deps/bellperson-6774ccb95b1358ea
    
    running 73 tests
    test gadgets::boolean::test::test_allocated_bit ... ok
    test gadgets::boolean::test::test_boolean_negation ... ok
    test gadgets::boolean::test::test_and_not ... ok
    test gadgets::boolean::test::test_and ... ok
    test gadgets::lookup::test::test_synth ... ok
    test gadgets::boolean::test::test_xor ... ok
    test gadgets::num::test::test_allocated_num ... ok
    test gadgets::num::test::test_num_conditional_reversal ... ok
    [2021-09-21T04:48:55Z WARN  bellperson::multicore] BELLMAN_NUM_CPUS is deprecated, please switch to RAYON_NUM_THREADS
    [2021-09-21T04:48:55Z WARN  bellperson::multicore] BELLMAN_NUM_CPUS is deprecated, please switch to RAYON_NUM_THREADS
    test gadgets::num::test::test_num_squaring ... ok
    test gadgets::num::test::test_num_multiplication ... ok
    test gadgets::boolean::test::test_boolean_and ... ok
    test gadgets::boolean::test::test_nor ... ok
    test gadgets::boolean::test::test_alloc_conditionally ... ok
    test gadgets::num::test::test_num_scale ... ok
    test gadgets::num::test::test_num_nonzero ... ok
    test gadgets::test::test_cs ... ok
    test gadgets::boolean::test::test_boolean_xor ... ok
    test gadgets::boolean::test::test_enforce_equal ... ok
    test gadgets::uint32::test::test_uint32_rotr ... ok
    test gadgets::boolean::test::test_boolean_sha256_ch ... ok
    test gadgets::uint32::test::test_uint32_from_bits_be ... ok
    test gadgets::uint32::test::test_uint32_from_bits ... ok
    test gadgets::boolean::test::test_u64_into_boolean_vec_le ... ok
    test gadgets::lookup::test::test_lookup3_xy_with_conditional_negation ... ok
    test gadgets::boolean::test::test_boolean_sha256_maj ... ok
    test gadgets::lookup::test::test_lookup3_xy ... ok
    test groth16::proof::test_with_bls12_381::test_size ... ok
    test gadgets::uint32::test::test_uint32_shr ... ok
    test multicore::tests::test_read_num_cpus ... ok
    test util_cs::test_cs::tests::test_compute_path ... ok
    test groth16::aggregate::transcript::test::test_transcript ... ok
    test util_cs::test_cs::tests::test_cs ... ok
    test multicore::tests::test_log2_floor ... ok
    test multiexp::tests::test_extend_density_regular ... ok
    test tests::test_add_simplify ... ok
    test multiexp::tests::test_extend_density_input ... ok
    test gadgets::boolean::test::test_field_into_allocated_bits_le ... ok
    test groth16::aggregate::proof::tests::test_proof_check ... ok
    test groth16::prover::tests::test_proving_assignment_extend ... ok
    test gadgets::uint32::test::test_uint32_addmany_constants ... ok
    test domain::parallel_fft_consistency ... ok
    test groth16::aggregate::proof::tests::test_proof_io ... ok
    test gadgets::num::test::test_into_bits_strict ... ok
    [2021-09-21T04:48:55Z INFO  bellperson::groth16::prover] Bellperson 0.16.3 is being used!
    [2021-09-21T04:48:55Z INFO  bellperson::groth16::prover] starting proof timer
    [2021-09-21T04:48:55Z INFO  bellperson::gpu::locks] GPU is available for FFT!
    [2021-09-21T04:48:55Z DEBUG bellperson::gpu::locks] Acquiring GPU lock at "/tmp/bellman.gpu.lock" ...
    [2021-09-21T04:48:55Z DEBUG bellperson::gpu::locks] GPU lock acquired!
    test groth16::aggregate::accumulator::test::test_pairing_randomize ... ok
    test groth16::aggregate::srs::test::test_srs_invalid_length ... ok
    test domain::fft_composition ... ok
    test groth16::aggregate::commit::tests::test_commit_single ... ok
    test groth16::aggregate::commit::tests::test_commit_pair ... ok
    test gadgets::blake2s::test::test_blake2s_constant_constraints ... ok
    test gadgets::blake2s::test::test_blank_hash ... ok
    [2021-09-21T04:48:55Z DEBUG rust_gpu_tools::opencl::utils] loaded devices: [Device { vendor: Nvidia, name: "NVIDIA GeForce RTX 3070", memory: 8367439872, pci_id: PciId(16640), uuid: Some(5dbeddfe-c81d-fc88-bdf7-b90e59dba3f4), device: Device { id: 139792729927488 } }]
    [2021-09-21T04:48:55Z INFO  bellperson::gpu::utils] Device: Device { vendor: Nvidia, name: "NVIDIA GeForce RTX 3070", memory: 8367439872, pci_id: PciId(16640), uuid: Some(5dbeddfe-c81d-fc88-bdf7-b90e59dba3f4), device: Device { id: 139792729927488 } }
    [2021-09-21T04:48:55Z INFO  bellperson::gpu::utils] Device: Device { vendor: Nvidia, name: "NVIDIA GeForce RTX 3070", memory: 8367439872, pci_id: PciId(16640), uuid: Some(5dbeddfe-c81d-fc88-bdf7-b90e59dba3f4), device: Device { id: 139792729927488 } }
    [2021-09-21T04:48:55Z DEBUG bellperson::gpu::locks] Acquiring GPU lock at "/tmp/bellman.gpu.lock" ...
    [2021-09-21T04:48:55Z INFO  bellperson::gpu::utils] Device: Device { vendor: Nvidia, name: "NVIDIA GeForce RTX 3070", memory: 8367439872, pci_id: PciId(16640), uuid: Some(5dbeddfe-c81d-fc88-bdf7-b90e59dba3f4), device: Device { id: 139792729927488 } }
    test gpu::utils::test_list_devices ... ok
    test gadgets::sha256::test::test_blank_hash ... ok
    [2021-09-21T04:48:55Z INFO  bellperson::gpu::locks] GPU is available for Multiexp!
    [2021-09-21T04:48:55Z DEBUG bellperson::gpu::locks] Acquiring GPU lock at "/tmp/bellman.gpu.lock" ...
    test gadgets::uint32::test::test_uint32_sha256_maj ... ok
    test gadgets::uint32::test::test_uint32_sha256_ch ... ok
    test gadgets::uint32::test::test_uint32_addmany ... ok
    test gadgets::uint32::test::test_uint32_xor ... ok
    test gadgets::blake2s::test::test_blake2s_constraints ... ok
    test gadgets::blake2s::test::test_blake2s_precomp_constraints ... ok
    test gadgets::sha256::test::test_full_block ... ok
    [2021-09-21T04:48:55Z INFO  bellperson::gpu::fft] FFT: 1 working device(s) selected.
    [2021-09-21T04:48:55Z INFO  bellperson::gpu::fft] FFT: Device 0: NVIDIA GeForce RTX 3070
    [2021-09-21T04:48:55Z INFO  bellperson::domain] GPU FFT kernel instantiated!
    [2021-09-21T04:48:55Z DEBUG bellperson::gpu::locks] GPU lock released!
    [2021-09-21T04:48:55Z DEBUG bellperson::gpu::locks] GPU lock acquired!
    [2021-09-21T04:48:55Z INFO  bellperson::gpu::locks] GPU is available for Multiexp!
    [2021-09-21T04:48:55Z DEBUG bellperson::gpu::locks] Acquiring GPU lock at "/tmp/bellman.gpu.lock" ...
    [2021-09-21T04:48:56Z INFO  bellperson::gpu::fft] FFT: 1 working device(s) selected.
    [2021-09-21T04:48:56Z INFO  bellperson::gpu::fft] FFT: Device 0: NVIDIA GeForce RTX 3070
    [2021-09-21T04:48:56Z DEBUG bellperson::gpu::locks] GPU lock acquired!
    [2021-09-21T04:48:56Z DEBUG bellperson::gpu::locks] GPU lock released!
    test domain::tests::gpu_fft_consistency ... FAILED
    [2021-09-21T04:48:56Z WARN  bellperson::gpu::utils] Number of CUDA cores for your device (NVIDIA GeForce RTX 3070) is unknown! Best performance is only achieved when the number of CUDA cores is known! You can find the instructions on how to support custom GPUs here: https://lotu.sh/en+hardware-mining
    [2021-09-21T04:48:56Z INFO  bellperson::gpu::multiexp] Multiexp: 1 working device(s) selected. (CPU utilization: 0)
    [2021-09-21T04:48:56Z INFO  bellperson::gpu::multiexp] Multiexp: Device 0: NVIDIA GeForce RTX 3070 (Chunk-size: 4405294)
    [2021-09-21T04:48:56Z INFO  bellperson::multiexp] GPU Multiexp kernel instantiated!
    test groth16::multiscalar::tests::test_multiscalar_par ... ok
    test groth16::multiscalar::tests::test_multiscalar_single ... ok
    test gadgets::blake2s::test::test_blake2s_256_vars ... ok
    [2021-09-21T04:48:57Z DEBUG bellperson::gpu::locks] GPU lock released!
    [2021-09-21T04:48:57Z DEBUG bellperson::gpu::locks] GPU lock acquired!
    [2021-09-21T04:48:57Z WARN  bellperson::gpu::utils] Number of CUDA cores for your device (NVIDIA GeForce RTX 3070) is unknown! Best performance is only achieved when the number of CUDA cores is known! You can find the instructions on how to support custom GPUs here: https://lotu.sh/en+hardware-mining
    test multiexp::gpu_multiexp_consistency ... ok
    [2021-09-21T04:48:57Z INFO  bellperson::gpu::multiexp] Multiexp: 1 working device(s) selected. (CPU utilization: 0)
    [2021-09-21T04:48:57Z INFO  bellperson::gpu::multiexp] Multiexp: Device 0: NVIDIA GeForce RTX 3070 (Chunk-size: 4405294)
    [2021-09-21T04:48:57Z INFO  bellperson::multiexp] GPU Multiexp kernel instantiated!
    [2021-09-21T04:48:57Z DEBUG bellperson::gpu::locks] GPU lock released!
    [2021-09-21T04:48:57Z INFO  bellperson::groth16::prover] prover time: 1.984064297s
    test groth16::proof::test_with_bls12_381::serialization ... FAILED
    test gadgets::blake2s::test::test_blake2s_700_vars ... ok
    test gadgets::multipack::test_multipacking ... ok
    test multiexp::test_with_bls12 ... ok
    test gadgets::blake2s::test::test_blake2s_test_vectors ... ok
    test gadgets::num::test::test_into_bits ... ok
    test domain::polynomial_arith ... ok
    test gadgets::blake2s::test::test_blake2s ... ok
    test gadgets::sha256::test::test_against_vectors ... ok
    
    failures:
    
    ---- domain::tests::gpu_fft_consistency stdout ----
    Testing FFT for 2 elements...
    GPU took 0ms.
    CPU (64 cores) took 0ms.
    Speedup: xNaN
    thread 'domain::tests::gpu_fft_consistency' panicked at 'assertion failed: v1.coeffs == v2.coeffs', src/domain.rs:618:13
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    
    ---- groth16::proof::test_with_bls12_381::serialization stdout ----
    thread 'groth16::proof::test_with_bls12_381::serialization' panicked at 'assertion failed: verify_proof(&pvk, &proof, &[c]).unwrap()', src/groth16/proof.rs:326:13
    
    
    failures:
        domain::tests::gpu_fft_consistency
        groth16::proof::test_with_bls12_381::serialization
    
    test result: FAILED. 71 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out; finished in 13.31s
    
    error: test failed, to rerun pass '-p bellperson --lib'
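
The "Speedup: xNaN" line in the output above is what IEEE 754 floating-point division produces for 0/0. The sketch below assumes the test divides two durations measured in whole milliseconds as floats (an assumption about the test code, not a confirmed reading of it). The function names speedup and checked_speedup are hypothetical. The NaN is a reporting artifact of the 0ms timings and is separate from the failed coefficient assertion.

```rust
/// Speedup of the GPU over the CPU from two durations in whole milliseconds.
/// With both durations at 0ms this evaluates 0.0 / 0.0, which is NaN.
fn speedup(cpu_ms: u128, gpu_ms: u128) -> f32 {
    cpu_ms as f32 / gpu_ms as f32
}

/// Same ratio, but returns `None` when the GPU time rounds down to zero
/// and the ratio would be meaningless.
fn checked_speedup(cpu_ms: u128, gpu_ms: u128) -> Option<f32> {
    if gpu_ms == 0 {
        None
    } else {
        Some(cpu_ms as f32 / gpu_ms as f32)
    }
}
```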
    
    opened by meawoppl 19
  • perf: improve proving performance

    • parallelize loading mapped params
    • load params in parallel while GPU work is being done, to avoid blocking later
    • improve parallelism inside proving code to avoid bottlenecks

    Speed Improvements

    on a benchmark machine using 1 RTX 3090 and 1 RTX 2080TI

    • Generate PoRep: before: 2,283s, after: 881s
    • Generate Window PoST: before: 321s, after: 144s
    opened by dignifiedquire 17
  • GPU preemption failure

    When the wdpost and winningpost calculations occur at the same time, winningpost still fails to preempt the GPU, even though its priority is set to true and that of wdpost is false. The winningpost computation then times out.

    opened by Elhorses 12
  • feat: Parallel switching from GPU to CPU

    This PR extends the file lock to allow for two processes to designate a higher and lower priority. The higher priority task can force the lower priority task into switching from GPU prover to CPU prover between multiexp (the heaviest part of computation) rounds. This PR contains two file structures. A prover lock file and an acquire GPU flag file. This is a fairly large change to the bellman lib so I will give a detailed example.

    For example:

    Process A starts a sector sealing proof that blocks the GPU/prover for ~300-600s.

    Process B needs to create a smaller PoST proof in a short period of time and would like to use the GPU to do so concurrently.

    Before B starts the circuit proof call bellman::groth16::create_proof(c, &params, r, s) it will first use a new bellman::gpu feature to send a signal to A that it would like it to switch to CPU so that B may use the GPU at the same time.

    B checks if another process is creating a proof...

    // Treat an error while checking the lock as "GPU not available".
    let check = gpu::gpu_is_available().unwrap_or(false);
    

    B creates an acquire file flag and loops for a short period of time until A notices this flag and releases its prover lock. Currently this is done between the 8 multiexp rounds that A is processing, but later PRs will move the check into the lower multiexp rounds per chunk size, and then into the kernel itself if A needs to move over to the CPU in less time.

    if !check {
        info!("GPU is NOT available! Attempting to acquire the GPU...");
        // Raise the acquire flag so the lower prio process releases its prover lock.
        let a_lock = gpu::acquire_gpu().unwrap();

        // We need to drop the acquire lock as soon as the lower prio
        // process has freed the main lock so that the higher uses GPU.
        // Note: this loop polls continuously, without sleeping between checks.
        while !gpu::gpu_is_available().unwrap_or(false) {}

        info!("GPU free from lower prio process. Dropping acquire gpu file lock from switching process...");
        gpu::drop_acquire_lock(a_lock);
    }
    

    When A acquires the flag between multiexps it will then use a CPU version of the multiexp from that point on. B is now free to start create_proof that will use the GPU prover.
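
The hand-off from A's point of view can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions: it replaces the PR's lock and flag files with an AtomicBool, and the names Backend, run_rounds and after_round are hypothetical and not part of the bellman API.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Which prover a multiexp round ran on (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Gpu,
    Cpu,
}

/// Simulates the lower-priority process A: before each round it checks
/// whether a higher-priority process has raised the acquire flag, and if so
/// it stays on the CPU from that round onwards.
/// `after_round` is a hook that lets a caller raise the flag at a chosen point.
fn run_rounds(
    acquire_flag: &AtomicBool,
    rounds: usize,
    mut after_round: impl FnMut(usize),
) -> Vec<Backend> {
    let mut backend = Backend::Gpu;
    let mut used = Vec::with_capacity(rounds);
    for round in 0..rounds {
        if acquire_flag.load(Ordering::SeqCst) {
            backend = Backend::Cpu;
        }
        used.push(backend);
        after_round(round);
    }
    used
}
```

If the flag is raised after the third of 8 rounds, the first three rounds are recorded as Gpu and the remaining five as Cpu. This matches the described behaviour that A switches only at a round boundary, never in the middle of a multiexp.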

    opened by nginnever 12
  • v0.9.3 build err

    ~/Workspace/RustProject/bellman$ cargo build
       Compiling bellperson v0.9.3 (/mnt/e/Workspace/RustProject/bellman)
    error[E0309]: the parameter type F may not live long enough
      --> src/multicore.rs:68:25
       |
    57 |     pub fn scope<'a, F, R>(&self, elements: usize, f: F) -> R
       |                      - help: consider adding an explicit lifetime bound F: 'a...
    ...
    68 |         THREAD_POOL.scope(|scope| f(scope, chunk_size))
       |                     ^^^^^
       |
    note: ...so that the type [closure@src/multicore.rs:68:31: 68:59 f:F, chunk_size:&usize] will meet its required lifetime bounds
      --> src/multicore.rs:68:25
       |
    68 |         THREAD_POOL.scope(|scope| f(scope, chunk_size))
       |                     ^^^^^

    error: aborting due to previous error

    For more information about this error, try rustc --explain E0309.
    error: could not compile bellperson.

    To learn more, run the command again with --verbose.

    opened by RustMan88 10
  • GPU FFT failed! Error: Ocl Error on GeForce RTX 3080

    original issue by @nickboot

    Describe the bug: GeForce RTX 3080, GPU FFT failed! Falling back to CPU

    To Reproduce

    Steps to reproduce the behavior:

    1. nvidia-smi -L (output: GPU 0: GeForce RTX 3080)
    2. export BELLMAN_CUSTOM_GPU="GeForce RTX 3080:8704"
    3. lotus-bench sealing --storage-dir=/benchtmp --sector-size=32GiB --num-sectors=2 --parallel=2

    Screenshots

    2020-09-22T03:20:30.079 INFO filecoin_proofs::api::seal > seal_commit_phase2:start
    2020-09-22T03:20:30.079 INFO filecoin_proofs::caches > trying parameters memory cache for: STACKED[34359738368]
    2020-09-22T03:20:30.079 INFO filecoin_proofs::caches > no params in memory cache for STACKED[34359738368]
    2020-09-22T03:20:32.249 INFO filecoin_proofs::api::seal > got groth params (34359738368) while sealing
    2020-09-22T03:20:32.249 INFO filecoin_proofs::api::seal > snark_proof:start
    2020-09-22T03:20:32.259 INFO bellperson::groth16::prover > Bellperson 0.9.2 is being used!
    2020-09-22T03:22:32.470 TRACE storage_proofs_porep::stacked::vanilla::proof > processing config 1/8 with column nodes 217728
    2020-09-22T03:22:46.683 TRACE storage_proofs_porep::stacked::vanilla::proof > base data len 134217728, tree data len 19173961
    2020-09-22T03:22:46.683 INFO storage_proofs_porep::stacked::vanilla::proof > persisting base tree_c 1/8 of length 153391689
    2020-09-22T03:22:46.683 TRACE storage_proofs_porep::stacked::vanilla::proof > flattening tree_c base data of 134217728 nodes using batch size 262144
    2020-09-22T03:22:48.782 TRACE storage_proofs_porep::stacked::vanilla::proof > done flattening tree_c base data
    2020-09-22T03:22:48.782 TRACE storage_proofs_porep::stacked::vanilla::proof > flattening tree_c tree data of 19173961 nodes using batch size 262144 and base offset 134217728
    2020-09-22T03:22:49.113 TRACE storage_proofs_porep::stacked::vanilla::proof > done flattening tree_c tree data
    2020-09-22T03:22:49.113 TRACE storage_proofs_porep::stacked::vanilla::proof > writing tree_c store data
    2020-09-22T03:22:49.652 TRACE storage_proofs_porep::stacked::vanilla::proof > done writing tree_c store data
    2020-09-22T03:26:37.792 INFO bellperson::gpu::locks > GPU is available for FFT!
    2020-09-22T03:26:37.798 DEBUG bellperson::gpu::locks > Acquiring GPU lock...
    2020-09-22T03:26:37.798 DEBUG bellperson::gpu::locks > GPU lock acquired!
    2020-09-22T03:26:41.586 INFO bellperson::gpu::fft > FFT: 1 working device(s) selected.
    2020-09-22T03:26:41.586 INFO bellperson::gpu::fft > FFT: Device 0: GeForce RTX 3080
    2020-09-22T03:26:41.586 INFO bellperson::domain > GPU FFT kernel instantiated!
    2020-09-22T03:27:05.456 WARN bellperson::gpu::locks > GPU FFT failed! Falling back to CPU... Error: Ocl Error:

    ################################ OPENCL ERROR ###############################

    Error executing function: clEnqueueNDRangeKernel("radix_fft")

    Status error code: CL_MEM_OBJECT_ALLOCATION_FAILURE (-4)

    Please visit the following url for more information:

    https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html#errors

    #############################################################################

    2020-09-22T03:27:53.186 WARN bellperson::gpu::locks > GPU FFT failed! Falling back to CPU... Error: Ocl Error:

    ################################ OPENCL ERROR ###############################

    Error executing function: clEnqueueNDRangeKernel("radix_fft")

    Status error code: CL_MEM_OBJECT_ALLOCATION_FAILURE (-4)

    Please visit the following url for more information:

    https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clEnqueueNDRangeKernel.html#errors

    #############################################################################

    Version (run lotus version): lotus-bench version 0.7.1

    opened by jennijuju 10
  • Remove Zero-Knowledge part

    This commit removes the Zero-Knowledge part of the Groth16 protocol as proposed by @arielgabizon, introducing improvements in both proving-time and memory requirements of the protocol.

    Please consult @arielgabizon for security considerations before merging!

    opened by keyvank 9
  • System disk usage high when running groth16 solver

    I'm benchmarking a lot to see what would be the fastest setup to run a miner - but unfortunately I see this part of the benchmark using a disk I don't want it to use - but I'm not sure how to control it.

    This part in the benchmark;

    2020-07-23T14:23:14.735 INFO filecoin_proofs::api::seal > got groth params (34359738368) while sealing
    2020-07-23T14:23:14.735 INFO filecoin_proofs::api::seal > snark_proof:start
    2020-07-23T14:23:14.753 INFO bellperson::groth16::prover > Bellperson 0.9.2 is being used!
    

    RAM slowly climbs to 128GB with 50% of all cores at 100%.

    Once it reaches the 128GB, core usage drops everywhere, and I can see a ton of writes on my OS disk (sdb).

    Then after a few minutes it went over to:

    2020-07-23T14:38:34.473 INFO bellperson::gpu::locks > GPU is available for FFT!
    2020-07-23T14:38:40.023 INFO bellperson::gpu::fft > FFT: 1 working device(s) selected.
    2020-07-23T14:38:40.025 INFO bellperson::gpu::fft > FFT: Device 0: GeForce GTX 1080 Ti
    2020-07-23T14:38:40.025 INFO bellperson::domain > GPU FFT kernel instantiated!
    

    and I see a ton of reads on my OS disk (sdb);

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0,1%    0,0%    0,8%    3,9%    0,0%   95,2%
    
          tps    kB_read/s    kB_wrtn/s    kB_dscd/s    kB_read    kB_wrtn    kB_dscd Device
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop0
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop1
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop2
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop3
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop4
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop5
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop6
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop7
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop8
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k loop9
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k nvme0n1
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k sda
      5375,00        21,0M         0,0k         0,0k      21,0M       0,0k       0,0k sdb
         0,00         0,0k         0,0k         0,0k       0,0k       0,0k       0,0k sdc
    

    The command I used to run the benchmark was like so;

    env RUST_LOG=info FIL_PROOFS_USE_GPU_COLUMN_BUILDER=1 FIL_PROOFS_USE_GPU_TREE_BUILDER=1 FIL_PROOFS_MAXIMIZE_CACHING=1 FIL_PROOFS_PARENT_CACHE=/mnt/nvme_2tb/filecoin-parents/ FIL_PROOFS_PARAMETER_CACHE=/mnt/nvme_2tb/filecoin-proof-parameters TMPDIR=/mnt/nvme_2tb ./lotus-bench sealing --storage-dir /mnt/nvme_2tb/bench --sector-size 32GiB --save-commit2-input ~/commit2_nvme_64_1080.json 2>&1 | tee bench.log
    

    nvme0n1 is mounted on /mnt/nvme_2tb - and different parts of the benchmark using those disks are reflected in iostat.

    So, my question is:

    1. what is happening on the disk?
    2. how can I move it to another disk?
    opened by RobQuistNL 8
  • Port instance aggregation from belllady.

    UPDATE: Since I need a name by which to refer to this, and since @mmaller, @nikkolasg, and @ninitrava agree, we are going to call this instance aggregation SnarkPack+, in token of it being an incremental improvement on SnarkPack proof aggregation.

    I started porting @mmaller's instance-aggregation from https://github.com/mmaller/vdf_snark.

    Because bellperson has since changed, this required some adaptation. Type errors are mostly (but not completely) fixed. I translated somewhat blindly based on existing code in bellperson, and I'm quite sure I didn't get it quite right. Here, I'm talking about this section: https://github.com/filecoin-project/bellperson/compare/master...instance-aggregation#diff-933482a3ffb12497eb511ad200323a9f857d30dc58f59d58741130b1d6c538c5R180-R252

    I'm not especially conversant with either the bellperson SnarkPack code or the belllady instance aggregation code, so I am likely not the best person to resolve this. However, I would like to see this code working and usable, and I've performed the first step in making that possible.

    I'm hopeful that fixing the remaining type errors (all seemingly with the same source), and fixing the optimized multiexponentiation calls will not be too onerous from here.

    @nikkolasg @dignifiedquire @mmaller Do you mind taking a look and advising or pushing code to get this over the finish line?

    ~UPDATE: my first cut excluded verification. I've added the code now for reference but left it commented out, since it introduces many new compilation errors.~

    ~Verification is now uncommented (thanks @dignifiedquire), and I have added a test. Unfortunately, the test is failing, and consuming code also fails (hopefully for the same reason).~

    The test now passes, and instance aggregation is behaving as expected in a downstream project. This should not be used in production until audited, but that should not block merging here. The experimental nature of the feature is documented in the README now.

    @mmaller Would appreciate your review if you care to provide.

    opened by porcuquine 7
  • [Feature request] An option to use single GPU for Multi-Exp

    If a miner starts a winning post while it is calculating SNARK for windowed post, the winning post tends to fail due to the delays caused by the conflict. This issue gets more prominent as the miner's storage power increases.

    By default, a multi-exp task takes up all available GPUs. This makes it impossible to avoid the conflict described above because no matter how many GPUs are available, all of them get used up by a single task.

    I suggest providing an option to force a multi-exp task to use only one GPU, as it seems to be the simplest workaround. For example: MULTI_EXP_SINGLE_GPU=true

    For reference: https://filecoinproject.slack.com/archives/CEGB67XJ8/p1602589807190700
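
The requested behaviour amounts to limiting the device list handed to a multi-exp task. The sketch below is an illustration only: devices_for_task is a hypothetical helper, and MULTI_EXP_SINGLE_GPU is the option name proposed in this request, not an existing bellperson setting.

```rust
/// Returns the devices a single multi-exp task may use.
/// With `single_gpu` set (for example from the proposed MULTI_EXP_SINGLE_GPU
/// option), only the first device is handed out and the rest stay free for
/// other tasks. Hypothetical helper; not part of bellperson.
fn devices_for_task<T>(devices: &[T], single_gpu: bool) -> &[T] {
    if single_gpu {
        // `min(1)` keeps this safe when the device list is empty.
        &devices[..devices.len().min(1)]
    } else {
        devices
    }
}
```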

    opened by hyunmoon 7
  • Benchmark FFT and multiexp

    1. Is there any official benchmark code for FFT on GPU?

    There is a one-line code for benchmarking multiexp on GPU

    RUST_LOG=info cargo test --features gpu -- --exact multiexp::gpu_multiexp_consistency --nocapture

    It shows that GPU is around 120x faster than the cpu version, when tested on a 3090 GPU.

    Where can I find a similar one-line code for benchmarking FFT on GPU?

    2. What is the maximum number of constraints supported?

    As mentioned in the README, there is a certain upper bound on proof size (i.e., the number of constraints).

    Depending on the size of the proof being passed to the gpu for work, certain cards will not be able to allocate enough memory to either the FFT or Multiexp kernel.

    While this number certainly depends on the GPU card, is there any rough number (e.g., $2^{20}$ constraints) for a 10GB GPU?

    3. Multi-GPU for FFT and multiexp

    Could multiple GPUs currently be used to support a larger size when a single GPU runs out of memory? It seems the current implementation uses each GPU independently for many small-size FFTs and multiexps, according to fft.cl and multiexp.cl.

    Thanks!

    opened by BoyuanFeng 6
  • BELLMAN_NO_GPU=0 makes problems

    If BELLMAN_NO_GPU=0 is set, only PC2 uses the GPU, while C2 and WindowPoST use the CPU instead. You must remove the BELLMAN_NO_GPU environment variable to be able to use the GPU for C2/WindowPoST. Tested using an RTX 3090 GPU.
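
The reported behaviour is consistent with a check that only tests whether the variable is set, regardless of its value. That is an assumption about the cause, not a confirmed reading of the bellperson source. The sketch below contrasts such a presence-only check with a value-aware one. Both function names are hypothetical.

```rust
/// Presence-only check: any value, including "0", counts as "disable the GPU".
fn no_gpu_presence_only(value: Option<&str>) -> bool {
    value.is_some()
}

/// Value-aware check: unset, empty, "0" and "false" all leave the GPU enabled.
fn no_gpu_value_aware(value: Option<&str>) -> bool {
    match value.map(str::trim) {
        None | Some("") | Some("0") | Some("false") => false,
        Some(_) => true,
    }
}
```

A caller could feed either function from the environment with std::env::var("BELLMAN_NO_GPU").ok().as_deref(). With the value "0", the presence-only check reports the GPU as disabled, while the value-aware check leaves it enabled.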

    opened by emf-developer 0
  • Review Worker code again

    As part of my work of moving FFT and Multiexp into the ec-gpu project, I was reviewing the Worker code again.

    The current Worker::scoped() code seems to do what the comment says. Though it seems to needlessly use a message passing primitive.

    The code calls THREAD_POOL.scoped(), where THREAD_POOL is a yastl pool, so it is a call to Pool::scoped(), which according to its docs already waits for the execution to finish before it returns. This means the additional message passing isn't needed.

    So either change the call to the pool to Pool::spawn(), which wouldn't wait for the execution to finish, or we remove the message passing. As we want it to block, I suggest we keep relying on Pool::scope().
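
    To illustrate why a blocking scope makes an extra completion channel redundant, here is a minimal sketch. It uses `std::thread::scope` from the standard library as a stand-in for yastl's `Pool::scoped()` (an assumption made to keep the example self-contained), and `parallel_sum` is a hypothetical function, not part of bellperson.

    ```rust
    use std::sync::atomic::{AtomicU64, Ordering};
    use std::thread;

    /// Sums `data` in up to `chunks` parallel pieces.
    /// `thread::scope` returns only after every spawned thread has finished,
    /// so no channel is needed to signal completion.
    fn parallel_sum(data: &[u64], chunks: usize) -> u64 {
        let total = AtomicU64::new(0);
        let chunks = chunks.max(1);
        // Round up so that all elements are covered; never use a chunk size of 0.
        let chunk_size = ((data.len() + chunks - 1) / chunks).max(1);

        thread::scope(|s| {
            for chunk in data.chunks(chunk_size) {
                let total = &total;
                s.spawn(move || {
                    total.fetch_add(chunk.iter().sum::<u64>(), Ordering::Relaxed);
                });
            }
        }); // all threads are joined here

        total.load(Ordering::Relaxed)
    }
    ```

    Because the scope joins its threads before returning, the value read after the scope already includes every partial sum.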

    opened by vmx 0
  • Refactor aggregation failure handling

    This issue is triggered by https://github.com/filecoin-project/bellperson/issues/197.

    @nikkolasg and I spent a lot of time debugging this and finding out whether there is a deeper root issue or not. It took so long because the code isn't really ideal. The problem arises when an aggregation turns out to be invalid: the idea is then to do an "early termination". This is triggered through a call to invalidate(). That call sets the valid variable to false, which then causes the aggregation thread to exit.

    I want to present both of our views on what the underlying issue is:

    • @nikkolasg prefers following the Go principle that the thread that spawns a child should also be responsible for terminating it. This is kind of the case here, but it's so hidden that it is almost invisible.
    • @vmx thinks (having learnt message passing in Erlang) the problem is the valid variable, which is a kind of global state that is manipulated somewhere hidden, with large consequences. I would hope that the code can be changed to terminate the aggregation thread, as well as the threads that are sending, through message passing.

    This means that this issue doesn't really propose a proper solution; it is rather a placeholder, in the hope that someone will find the time in the future to look deeper into this.
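
    As a rough illustration of the message-passing idea (not a proposal for the actual fix), the sketch below terminates both sides through the channel itself instead of a shared valid flag. The function `aggregate` and the use of plain `bool` values as check results are hypothetical simplifications; only `std::sync::mpsc` and `std::thread` are used.

    ```rust
    use std::sync::mpsc;
    use std::thread;

    /// Returns whether all checks passed and how many results were delivered
    /// before the producer stopped.
    fn aggregate(checks: Vec<bool>) -> (bool, usize) {
        // Rendezvous channel: `send` succeeds only if the receiver takes the value.
        let (tx, rx) = mpsc::sync_channel::<bool>(0);

        let producer = thread::spawn(move || {
            let mut sent = 0;
            for ok in checks {
                // Once the receiver is gone, `send` fails and the producer stops.
                if tx.send(ok).is_err() {
                    break;
                }
                sent += 1;
            }
            sent
        });

        let aggregator = thread::spawn(move || {
            for ok in rx {
                if !ok {
                    // Early termination: returning drops the receiver,
                    // which in turn stops the producer.
                    return false;
                }
            }
            true
        });

        let valid = aggregator.join().unwrap();
        let sent = producer.join().unwrap();
        (valid, sent)
    }
    ```

    Here there is no shared state to flip: the aggregation thread stops by returning, and the sending thread notices this through the failed `send`.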

    opened by vmx 0
  • Make coverage reports viewable

    Currently, coverage reports are not uploaded to Codecov. We should look into alternatives. LLVM creates nice-looking static HTML reports; we should be able to either publish them as build artifacts on CircleCI (I'm not sure if you can view them there directly) or put them on IPFS.

    An alternative could be other code coverage services like https://coveralls.io/.

    opened by vmx 0
  •  I have a question about MappedParameters

    git log: 33e5d2ad27c0967fcccea5ffd59ae4970e346518

    src/groth16/prover.rs line 380

        // snip
        // line 380
        let h_s = a_s
            .into_iter()
            .map(|a| {
                let h = multiexp(
                    &worker,
                    params.get_h(a.len())?, // I found that a.len() is not really used here
                    FullDensity,
                    a,
                    &mut multiexp_kern,
                );
                Ok(h)
            })
            .collect::<Result<Vec<_>, SynthesisError>>()?;
    

    src/groth16/mapped_params.rs line 59

    
        fn get_h(&self, _num_h: usize) -> Result<Self::G1Builder, SynthesisError> {
            let builder = self
                .h
                .iter()
                .cloned()
                .map(|h| read_g1::<E>(&self.params, h, self.checked))
                .collect::<Result<_, _>>()?;
    
            Ok((Arc::new(builder), 0))
        }
    
    

    Does this mean that the logic of get_h can be optimized so that a new builder does not have to be created on every call?
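
    One way such an optimization could look is to build the vector once and hand out clones of the `Arc` afterwards. The sketch below is purely illustrative: `CachedParams`, its `decode` function (a stand-in for `read_g1`) and the `builds` counter are hypothetical and do not reflect the actual `MappedParameters` type. It also assumes that keeping the decoded elements in memory is acceptable.

    ```rust
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::{Arc, Mutex};

    /// Hypothetical stand-in for `MappedParameters`; `raw` plays the role of `self.h`.
    struct CachedParams {
        raw: Vec<u32>,
        cache: Mutex<Option<Arc<Vec<u64>>>>,
        builds: AtomicUsize, // counts how often the vector was actually built
    }

    impl CachedParams {
        fn new(raw: Vec<u32>) -> Self {
            Self { raw, cache: Mutex::new(None), builds: AtomicUsize::new(0) }
        }

        /// Stand-in for `read_g1`: a fallible per-element decode.
        fn decode(x: u32) -> Result<u64, String> {
            if x == u32::MAX {
                Err("invalid element".to_string())
            } else {
                Ok(u64::from(x) * 2)
            }
        }

        /// Builds the vector on the first successful call, then reuses it.
        fn get_h(&self) -> Result<Arc<Vec<u64>>, String> {
            let mut cache = self.cache.lock().unwrap();
            if let Some(built) = &*cache {
                return Ok(Arc::clone(built));
            }
            let built: Vec<u64> = self
                .raw
                .iter()
                .copied()
                .map(Self::decode)
                .collect::<Result<_, _>>()?;
            self.builds.fetch_add(1, Ordering::Relaxed);
            let built = Arc::new(built);
            *cache = Some(Arc::clone(&built));
            Ok(built)
        }
    }
    ```

    The trade-off is memory: the decoded elements stay resident for the lifetime of the parameters, which may matter for large parameter files.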

    opened by Xib1uvXi 0
Owner
Filecoin