This is the version I propose to The Computer Language Benchmarks Game.
For regex-dna, first propose your change to https://github.com/rust-lang/regex/blob/master/examples/shootout-regex-dna.rs, then submit a pull request here.
Rewrote chameneos_redux.
Well, actually I dropped the first rewrite 'cause it was too slow.
...And the second rewrite, since that was still too slow and used a ton of unsafe.
I guess this is the third rewrite plus a bit, since it took a lot of mauling to get it into shape.
The problem is that the competition is really good. The C++ uses a thread pool, you see. But to make it fast, they used a really biased selection algorithm... and then just spawned enough threads to hide the bias. The C version doesn't use a thread pool, but it deals with the meeting count in the same atomic instruction as the mall exchange. And both of them do unsynchronized cross-thread writes...
But, I'm glad to say, a lot of elbow grease later and I'm pretty sure I've soundly beaten both.
Advantages
It uses a Java-like thread pool. Except mine is atomic and can only actually hold 7 threads - the other 3 must be active or in the mall lest they just disappear. A form of juggling, perhaps. And when I say atomic I mean it - in the best case it can atomically dequeue a task, add it to an empty mall and switch into a fresh thread in a single exchange. After a meeting, it'll enqueue both tasks and dequeue the next in another single exchange.
I use no unsafe at all, relying instead on gratuitously tuned uncontested atomics. A lot of assembly has been read to make sure of that status.
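To make the parking/pairing idea concrete, here is a minimal sketch (names and simplifications are mine; the actual pool also packs the task queue and meeting count into its atomic state): the mall is one atomic word, and a chameneos either parks its own id there or takes the partner already waiting, in one successful exchange.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const EMPTY: usize = usize::MAX;

// Sketch only: the mall is a single atomic word holding the id of a waiting
// chameneos, or EMPTY. A caller either parks its own id (and gets None back)
// or takes the waiting partner's id, in one successful exchange.
fn try_meet(mall: &AtomicUsize, my_id: usize) -> Option<usize> {
    let mut seen = mall.load(Ordering::Acquire);
    loop {
        let (next, partner) = if seen == EMPTY {
            (my_id, None)       // nobody here: park ourselves
        } else {
            (EMPTY, Some(seen)) // someone is waiting: pair up with them
        };
        match mall.compare_exchange_weak(seen, next, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => return partner,
            Err(actual) => seen = actual, // lost the race, retry with the fresh value
        }
    }
}
```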
My timings put this roughly 2-3x the speed of the next-best competitor. I have no doubt other people's timings will be slightly different, but I'm pretty confident in the lead. I can actually make it a bit faster by dropping the number of executor threads - unlike C++ I don't need one per virtual thread to hide bias, and unlike the Java I don't have contention problems with fewer threads. However, I thought that might get me into trouble, so I didn't. I might mention it when I submit it, though.
Disadvantages
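// Presumably `ChameneosState` isn't `Copy`, so `[Default::default(); 16]` won't compile;
// calling a default-producing closure once per slot is the workaround, hence the repetition.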
let x = || Default::default();
let mut states: [ChameneosState; 16] = [
x(), x(), x(), x(), x(), x(), x(), x(),
x(), x(), x(), x(), x(), x(), x(), x(),
];
PS: help me i can't stop
Still completely green to Rust, I wound up creating a new version rather than fixing the old one because it was easier to understand this way. LLVM seems to vectorise it reasonably well, and it runs in about 2.5x the C time on my system.
regex-dna has been replaced by regex-redux. This PR replaces the regex-dna program and data files with converted regex-redux program and data files from the web site.
(edit: fixed links)
To everyone watching this project, please review this code. @llogiq @BurntSushi @Veedrac ?
In particular, I'm not fond of my hand scheduling: "there are 4 CPUs, so launch the 3 most costly jobs in their own threads and do the rest synchronously".
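For reference, a minimal sketch of that hand scheduling, assuming plain function-pointer jobs and using `std::thread::scope` for brevity (the real code presumably spawns its threads differently):

```rust
use std::thread;

// Sketch only: on a 4-core box, give the three most costly jobs their own
// threads and run everything else synchronously on the current thread.
fn run_hand_scheduled(costly: [fn(); 3], rest: &[fn()]) {
    thread::scope(|s| {
        for job in costly {
            s.spawn(move || job()); // one dedicated thread per costly job
        }
        for &job in rest {
            job(); // everything else stays on this thread
        }
    }); // the scope joins the spawned threads before returning
}
```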
Performance on the output of fasta 25000000:

| Version | Real | User | Sys |
| --------- | ------ | ------ | ----- |
| this repo version (not rules compliant) | 0m5.850s | 0m20.700s | 0m0.096s |
| current rust version on benchmarksgame | 0m8.391s | 0m26.832s | 0m0.144s |
| proposed version | 0m4.855s | 0m12.800s | 0m0.080s |
Thanks for your time.
This is a complete rewrite of `k_nucleotide.rs`, staying largely along the original's lines. The important changes are in the `pack_byte` implementation. I'll submit this (and the new `thread-ring.rs`) to Rust later, and the Benchmarks Game after.
One quick question: did I get the attribution right? Should I have dropped "contributed by TeXitoi"? Or should I just let TeXitoi submit it? ;P
I got rid of all the `f64x2` and `usizex2` structs since the code wouldn't vectorize no matter how hard I tried. Unlike #51, I also completely got rid of chunking, letting rayon do all the work using `par_iter_mut`.

This speeds up the code a little, from 1.6s to 1.4s on my laptop.
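As an illustration of the `par_iter_mut` shape (a sketch of the spectral-norm matrix-vector product, not the exact code in this PR):

```rust
use rayon::prelude::*;

// Spectral-norm matrix entry: A(i, j) = 1 / ((i + j)(i + j + 1)/2 + i + 1).
fn a(i: usize, j: usize) -> f64 {
    1.0 / (((i + j) * (i + j + 1) / 2 + i + 1) as f64)
}

// out[i] = sum_j A(i, j) * v[j]; rayon parallelises over the rows, so no
// manual chunking or explicit thread count is needed.
fn mult_av(v: &[f64], out: &mut [f64]) {
    out.par_iter_mut().enumerate().for_each(|(i, o)| {
        *o = v.iter().enumerate().map(|(j, &x)| a(i, j) * x).sum();
    });
}
```

Rayon splits the row range adaptively via work stealing, so there is no chunk size left to tune.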
It is really sad that this code cannot be published because of `num_cpus` in particular. We could force it to 4 or 1... and add a message stating that the proper way is to use the `num_cpus` crate.
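If we go that way, it would be something like this at the top of each offending program (the constant name is just an example):

```rust
// The benchmarks game setup does not allow the num_cpus crate here, so the
// thread count is hard-coded. The proper way is `num_cpus::get()`.
const NUM_THREADS: usize = 4;
```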
This combines newline handling and the main "reverse complement" step into a single loop. The downside of this is that it can no longer treat the input as u16 in order to transform two bytes with one table lookup. The advantages are faster speed from making a single pass over the data, and no more unsafe code!
This runs about 18% faster than the master branch on my computer.
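Roughly, the single-pass idea looks like this sketch (plain byte table, helper names mine, not the PR's exact code):

```rust
// Complement table: maps each IUPAC nucleotide code (either case) to the
// byte of its complement; unknown bytes map to themselves.
fn build_table() -> [u8; 256] {
    let mut t = [0u8; 256];
    for i in 0..256 {
        t[i] = i as u8;
    }
    for (&from, &to) in b"ACGTUMRWSYKVHDBN".iter().zip(b"TGCAAKYWSRMBDHVN".iter()) {
        t[from as usize] = to;
        t[from.to_ascii_lowercase() as usize] = to;
    }
    t
}

// Walk the sequence from both ends at once, complementing and reversing in
// the same pass and stepping over newlines in place.
fn reverse_complement(seq: &mut [u8], table: &[u8; 256]) {
    let (mut i, mut j) = (0, seq.len());
    while i < j {
        if seq[i] == b'\n' {
            i += 1;
            continue;
        }
        j -= 1;
        if seq[j] == b'\n' {
            continue;
        }
        let (a, b) = (seq[i], seq[j]);
        seq[i] = table[b as usize];
        seq[j] = table[a as usize];
        i += 1;
    }
}
```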
This shaves off a consistent 7-10% on my machine; I'd be eager to see if you can reproduce these results. (Though this doesn't put us ahead in any other benchmark, it does put us at under 2s on Isaac's machine.)
As part of `lolbench` (rustc's run-time performance tracking), it might be better to refactor this crate into one crate per benchmark, each using a normal `Cargo.toml` to fetch dependencies. This would allow each benchmark to have its own benches, and make it clearer how to run them.
Rather than process each sequence on one thread, this processes each sequence on `n` threads, where `n` is the number of CPU cores. This improves CPU utilization and reduces running time on `input25000000.txt` by 2–3% on my 4-core desktop.
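The PR text doesn't show the splitting itself, so here is a sketch of the general pattern under the assumption of a k-mer-counting style workload (all names are placeholders): cut the sequence into `n` chunks with `k - 1` bytes of overlap, count each chunk on its own thread, and merge the partial tables.

```rust
use std::collections::HashMap;
use std::thread;

// Sketch only: split one sequence across n threads. Each chunk overlaps the
// next by k - 1 bytes so that no k-mer spanning a boundary is missed, and
// the per-thread tables are merged at the end.
fn count_kmers_parallel<'a>(seq: &'a [u8], k: usize, n: usize) -> HashMap<&'a [u8], u32> {
    let chunk = (seq.len() + n - 1) / n;
    thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .map(|t| {
                let start = (t * chunk).min(seq.len());
                let end = (start + chunk + k - 1).min(seq.len());
                let part = &seq[start..end];
                s.spawn(move || {
                    let mut counts: HashMap<&[u8], u32> = HashMap::new();
                    for kmer in part.windows(k) {
                        *counts.entry(kmer).or_insert(0) += 1;
                    }
                    counts
                })
            })
            .collect();
        let mut merged: HashMap<&[u8], u32> = HashMap::new();
        for h in handles {
            for (kmer, c) in h.join().unwrap() {
                *merged.entry(kmer).or_insert(0) += c;
            }
        }
        merged
    })
}
```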
This removes a needless thread from regex-redux and also removes the `Arc` in favor of scoped crossbeam threads, shaving off a few milliseconds. It has already been submitted.
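The `Arc` removal works because crossbeam's scoped threads can borrow the haystack directly; a minimal sketch of the pattern (not the submitted code):

```rust
use crossbeam::thread;
use regex::Regex;

// Sketch only: scoped threads may borrow `seq` for the duration of the
// scope, so no Arc<String> (or extra coordinating thread) is needed.
fn count_all(seq: &str, variants: &[Regex]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = variants
            .iter()
            .map(|re| s.spawn(move |_| re.find_iter(seq).count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
    .unwrap()
}
```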
This replaces the autovectorized code with explicit SIMD instructions and inlines the former `div_and_add` function. On my machine, the difference is negligible, but I suspect that Skylake has better branch prediction than the old Core 2, so here goes nothing...
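For context, a sketch of the kind of SSE2 intrinsic code that replaces the autovectorized `div_and_add` (the actual signature and call sites in the PR may differ):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128d, _mm_add_pd, _mm_div_pd};

// Sketch only: two divisions of the same numerator pair by two denominator
// pairs, accumulated into `sum`, written with explicit SSE2 intrinsics.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
unsafe fn div_and_add(numer: __m128d, denom1: __m128d, denom2: __m128d, sum: &mut __m128d) {
    *sum = _mm_add_pd(*sum, _mm_add_pd(_mm_div_pd(numer, denom1), _mm_div_pd(numer, denom2)));
}
```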
Is this the repo that's currently being used for https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/rust.html ? If so, are there planned updates given the recent SIMD stabilization?
Something like the following script:
/bin/time --format="binary_trees\t%e\t%U\t%S" bin/binary_trees 20 > /dev/null && \
/bin/time --format="fannkuch_redux\t%e\t%U\t%S" bin/fannkuch_redux 12 > /dev/null && \
/bin/time --format="fasta\t\t%e\t%U\t%S" bin/fasta 25000000 > /dev/null && \
/bin/time --format="fasta_redux\t%e\t%U\t%S" bin/fasta_redux 25000000 > /dev/null && \
/bin/time --format="k_nucleotide\t%e\t%U\t%S" bin/k_nucleotide 0 < data/k_nucleotide.txt > /dev/null && \
/bin/time --format="mandelbrot\t%e\t%U\t%S" bin/mandelbrot 16000 > /dev/null && \
/bin/time --format="n_body\t\t%e\t%U\t%S" bin/n_body 50000000 > /dev/null && \
/bin/time --format="regex_dna\t%e\t%U\t%S" bin/regex_dna 0 < data/regex_dna.txt > /dev/null && \
/bin/time --format="reverse_comp.\t%e\t%U\t%S" bin/reverse_complement 0 < data/reverse_complement.txt > /dev/null && \
/bin/time --format="spectralnorm\t%e\t%U\t%S" bin/spectralnorm 5500 > /dev/null
But I'm unsure how portable this is. Also someone should check if I got the arguments right.
On r/rust, Alex Crichton suggested we can use the system allocator.
This would presumably rid us of the 6MB upfront allocation of jemalloc, and make Rust look leaner in the benchmarks, so I'd like to benchmark this once I get to my PC.
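On a recent compiler the switch itself is tiny (assuming we simply opt the binaries into the system allocator globally):

```rust
use std::alloc::System;

// Opt the whole binary into the system allocator instead of jemalloc.
#[global_allocator]
static GLOBAL: System = System;
```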