This is the version I propose to The Computer Language Benchmarks Game.
For regex-dna, first propose your change to https://github.com/rust-lang/regex/blob/master/examples/shootout-regex-dna.rs, then submit a pull request here.
Rewrote chameneos_redux.
Well, actually I dropped the first rewrite 'cause it was too slow.
...And the second rewrite, since that was still too slow and used a ton of unsafe.
I guess this is the third rewrite plus a bit, since it took a lot of mauling to get it into shape.
The problem is that the competition is really good. The C++ uses a thread pool, you see. But to make it fast, they used a really biased selection algorithm... and then just spawned enough threads to hide the bias. The C version doesn't use a thread pool, but it deals with the meeting count in the same atomic instruction as the mall exchange. And both of them do unsynchronized cross-thread writes...
But, I'm glad to say, a lot of elbow grease later and I'm pretty sure I've soundly beaten both.
Advantages
It uses a Java-like thread pool. Except mine is atomic and can only actually hold 7 threads - the other 3 must be active or in the mall lest they just disappear. A form of juggling, perhaps. And when I say atomic I mean it - in the best case it can atomically dequeue a task, add it to an empty mall and switch into a fresh thread in a single exchange. After a meeting, it'll enqueue both tasks and dequeue the next in another single exchange.
I use no unsafe at all, relying instead on gratuitously tuned uncontested atomics. A lot of assembly has been read to make sure of that status.
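To make the parking/pairing idea concrete, here is a minimal sketch (names and simplifications are mine; the actual pool also packs the task queue and meeting count into its atomic state): the mall is one atomic word, and a chameneos either parks its own id there or takes the partner already waiting, in one successful exchange.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const EMPTY: usize = usize::MAX;

// Sketch only: the mall is a single atomic word holding the id of a waiting
// chameneos, or EMPTY. A caller either parks its own id (and gets None back)
// or takes the waiting partner's id, in one successful exchange.
fn try_meet(mall: &AtomicUsize, my_id: usize) -> Option<usize> {
    let mut seen = mall.load(Ordering::Acquire);
    loop {
        let (next, partner) = if seen == EMPTY {
            (my_id, None)       // nobody here: park ourselves
        } else {
            (EMPTY, Some(seen)) // someone is waiting: pair up with them
        };
        match mall.compare_exchange_weak(seen, next, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => return partner,
            Err(actual) => seen = actual, // lost the race, retry with the fresh value
        }
    }
}
```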
My timings put this roughly 2-3x the speed of the next-best competitor. I have no doubt other people's timings will be slightly different, but I'm pretty confident in the lead. I can actually make it a bit faster by dropping the number of executor threads - unlike C++ I don't need one per virtual thread to hide bias, and unlike the Java I don't have contention problems with fewer threads. However, I thought that might get me into trouble, so I didn't. I might mention it when I submit it, though.
Disadvantages
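// Presumably `ChameneosState` isn't `Copy`, so `[Default::default(); 16]` won't compile;
// calling a default-producing closure once per slot is the workaround, hence the repetition.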
let x = || Default::default();
let mut states: [ChameneosState; 16] = [
x(), x(), x(), x(), x(), x(), x(), x(),
x(), x(), x(), x(), x(), x(), x(), x(),
];
PS: help me i can't stop
Still completely green to Rust, I wound up creating a new version rather than fixing the old one because it was easier to understand this way. LLVM seems to vectorise it reasonably well, and it runs in about 2.5x the C time on my system.
regex-dna has been replaced by regex-redux. This PR replaces the regex-dna program and data files with converted regex-redux program and data files from the web site.
(edit: fixed links)
To everyone watching this project, please review this code. @llogiq @BurntSushi @Veedrac ?
In particular, I'm not fond of my hand scheduling: "there are 4 CPUs, so launch the 3 most costly jobs in their own threads and do the rest synchronously".
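For reference, a minimal sketch of that hand scheduling, assuming plain function-pointer jobs and using `std::thread::scope` for brevity (the real code presumably spawns its threads differently):

```rust
use std::thread;

// Sketch only: on a 4-core box, give the three most costly jobs their own
// threads and run everything else synchronously on the current thread.
fn run_hand_scheduled(costly: [fn(); 3], rest: &[fn()]) {
    thread::scope(|s| {
        for job in costly {
            s.spawn(move || job()); // one dedicated thread per costly job
        }
        for &job in rest {
            job(); // everything else stays on this thread
        }
    }); // the scope joins the spawned threads before returning
}
```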
Performance on the output of fasta 25000000:

| Version | Real | User | Sys |
| --------- | ------ | ------ | ----- |
| this repo version (not rules compliant) | 0m5.850s | 0m20.700s | 0m0.096s |
| current rust version on benchmarksgame | 0m8.391s | 0m26.832s | 0m0.144s |
| proposed version | 0m4.855s | 0m12.800s | 0m0.080s |
Thanks for your time.
This is a complete rewrite of `k_nucleotide.rs`, staying largely along the original's lines. The important changes are in the `pack_byte` implementation. I'll submit this (and the new `thread-ring.rs`) to Rust later, and the Benchmarks Game after.
One quick question: did I get the attribution right? Should I have dropped "contributed by TeXitoi"? Or should I just let TeXitoi submit it? ;P
I got rid of all the `f64x2` and `usizex2` structs since the code wouldn't vectorize no matter how hard I tried. Unlike #51, I also completely got rid of chunking, letting rayon do all the work using `par_iter_mut`.

This speeds up the code a little, from 1.6s to 1.4s on my laptop.
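As an illustration of the `par_iter_mut` shape (a sketch of the spectral-norm matrix-vector product, not the exact code in this PR):

```rust
use rayon::prelude::*;

// Spectral-norm matrix entry: A(i, j) = 1 / ((i + j)(i + j + 1)/2 + i + 1).
fn a(i: usize, j: usize) -> f64 {
    1.0 / (((i + j) * (i + j + 1) / 2 + i + 1) as f64)
}

// out[i] = sum_j A(i, j) * v[j]; rayon parallelises over the rows, so no
// manual chunking or explicit thread count is needed.
fn mult_av(v: &[f64], out: &mut [f64]) {
    out.par_iter_mut().enumerate().for_each(|(i, o)| {
        *o = v.iter().enumerate().map(|(j, &x)| a(i, j) * x).sum();
    });
}
```

Rayon splits the row range adaptively via work stealing, so there is no chunk size left to tune.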
It is really sad that this code cannot be published because of `num_cpus` in particular. We could force it to 4 or 1... and add a message stating that the proper way is to use the `num_cpus` crate.
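If we go that way, it would be something like this at the top of each offending program (the constant name is just an example):

```rust
// The benchmarks game setup does not allow the num_cpus crate here, so the
// thread count is hard-coded. The proper way is `num_cpus::get()`.
const NUM_THREADS: usize = 4;
```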
This combines newline handling and the main "reverse complement" step into a single loop. The downside of this is that it can no longer treat the input as u16 in order to transform two bytes with one table lookup. The advantages are faster speed from making a single pass over the data, and no more unsafe code!
This runs about 18% faster than the master branch on my computer.
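Roughly, the single-pass idea looks like this sketch (plain byte table, helper names mine, not the PR's exact code):

```rust
// Complement table: maps each IUPAC nucleotide code (either case) to the
// byte of its complement; unknown bytes map to themselves.
fn build_table() -> [u8; 256] {
    let mut t = [0u8; 256];
    for i in 0..256 {
        t[i] = i as u8;
    }
    for (&from, &to) in b"ACGTUMRWSYKVHDBN".iter().zip(b"TGCAAKYWSRMBDHVN".iter()) {
        t[from as usize] = to;
        t[from.to_ascii_lowercase() as usize] = to;
    }
    t
}

// Walk the sequence from both ends at once, complementing and reversing in
// the same pass and stepping over newlines in place.
fn reverse_complement(seq: &mut [u8], table: &[u8; 256]) {
    let (mut i, mut j) = (0, seq.len());
    while i < j {
        if seq[i] == b'\n' {
            i += 1;
            continue;
        }
        j -= 1;
        if seq[j] == b'\n' {
            continue;
        }
        let (a, b) = (seq[i], seq[j]);
        seq[i] = table[b as usize];
        seq[j] = table[a as usize];
        i += 1;
    }
}
```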
This shaves off a consistent 7-10% on my machine; I'd be eager to see if you can reproduce these results. (Though this doesn't put us ahead in any other benchmark, it does put us at under 2s on Isaac's machine.)
As part of `lolbench` (rustc's run-time performance tracking), it might be better to refactor this crate into one crate per benchmark, each using a normal `Cargo.toml` to fetch dependencies. This would allow each benchmark to have its own benches, and make it clearer how to run them.
Rather than process each sequence on one thread, this processes each sequence on `n` threads, where `n` is the number of CPU cores. This improves CPU utilization and reduces running time on `input25000000.txt` by 2–3% on my 4-core desktop.
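The PR text doesn't show the splitting itself, so here is a sketch of the general pattern under the assumption of a k-mer-counting style workload (all names are placeholders): cut the sequence into `n` chunks with `k - 1` bytes of overlap, count each chunk on its own thread, and merge the partial tables.

```rust
use std::collections::HashMap;
use std::thread;

// Sketch only: split one sequence across n threads. Each chunk overlaps the
// next by k - 1 bytes so that no k-mer spanning a boundary is missed, and
// the per-thread tables are merged at the end.
fn count_kmers_parallel<'a>(seq: &'a [u8], k: usize, n: usize) -> HashMap<&'a [u8], u32> {
    let chunk = (seq.len() + n - 1) / n;
    thread::scope(|s| {
        let handles: Vec<_> = (0..n)
            .map(|t| {
                let start = (t * chunk).min(seq.len());
                let end = (start + chunk + k - 1).min(seq.len());
                let part = &seq[start..end];
                s.spawn(move || {
                    let mut counts: HashMap<&[u8], u32> = HashMap::new();
                    for kmer in part.windows(k) {
                        *counts.entry(kmer).or_insert(0) += 1;
                    }
                    counts
                })
            })
            .collect();
        let mut merged: HashMap<&[u8], u32> = HashMap::new();
        for h in handles {
            for (kmer, c) in h.join().unwrap() {
                *merged.entry(kmer).or_insert(0) += c;
            }
        }
        merged
    })
}
```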
This removes a needless thread from regex-redux and also removes the `Arc` in favor of scoped crossbeam threads, shaving off a few milliseconds. It has already been submitted.
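The `Arc` removal works because crossbeam's scoped threads can borrow the haystack directly; a minimal sketch of the pattern (not the submitted code):

```rust
use crossbeam::thread;
use regex::Regex;

// Sketch only: scoped threads may borrow `seq` for the duration of the
// scope, so no Arc<String> (or extra coordinating thread) is needed.
fn count_all(seq: &str, variants: &[Regex]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = variants
            .iter()
            .map(|re| s.spawn(move |_| re.find_iter(seq).count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
    .unwrap()
}
```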
This replaces the autovectorized code with explicit SIMD instructions and inlines the former `div_and_add` function. On my machine, the difference is negligible, but I suspect that Skylake has better branch prediction than the old Core 2, so here goes nothing...
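For context, a sketch of the kind of SSE2 intrinsic code that replaces the autovectorized `div_and_add` (the actual signature and call sites in the PR may differ):

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128d, _mm_add_pd, _mm_div_pd};

// Sketch only: two divisions of the same numerator pair by two denominator
// pairs, accumulated into `sum`, written with explicit SSE2 intrinsics.
#[cfg(target_arch = "x86_64")]
#[inline(always)]
unsafe fn div_and_add(numer: __m128d, denom1: __m128d, denom2: __m128d, sum: &mut __m128d) {
    *sum = _mm_add_pd(*sum, _mm_add_pd(_mm_div_pd(numer, denom1), _mm_div_pd(numer, denom2)));
}
```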
Is this the repo that's currently being used for https://benchmarksgame-team.pages.debian.net/benchmarksgame/faster/rust.html ? If so, are there planned updates given the recent SIMD stabilization?
Something like the following script:
/bin/time --format="binary_trees\t%e\t%U\t%S" bin/binary_trees 20 > /dev/null && \
/bin/time --format="fannkuch_redux\t%e\t%U\t%S" bin/fannkuch_redux 12 > /dev/null && \
/bin/time --format="fasta\t\t%e\t%U\t%S" bin/fasta 25000000 > /dev/null && \
/bin/time --format="fasta_redux\t%e\t%U\t%S" bin/fasta_redux 25000000 > /dev/null && \
/bin/time --format="k_nucleotide\t%e\t%U\t%S" bin/k_nucleotide 0 < data/k_nucleotide.txt > /dev/null && \
/bin/time --format="mandelbrot\t%e\t%U\t%S" bin/mandelbrot 16000 > /dev/null && \
/bin/time --format="n_body\t\t%e\t%U\t%S" bin/n_body 50000000 > /dev/null && \
/bin/time --format="regex_dna\t%e\t%U\t%S" bin/regex_dna 0 < data/regex_dna.txt > /dev/null && \
/bin/time --format="reverse_comp.\t%e\t%U\t%S" bin/reverse_complement 0 < data/reverse_complement.txt > /dev/null && \
/bin/time --format="spectralnorm\t%e\t%U\t%S" bin/spectralnorm 5500 > /dev/null
But I'm unsure how portable this is. Also someone should check if I got the arguments right.
On r/rust, Alex Crichton suggested we can use the system allocator.
This would presumably rid us of the 6MB upfront allocation of jemalloc, and make Rust look leaner in the benchmarks, so I'd like to benchmark this once I get to my PC.
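On a recent compiler the switch itself is tiny (assuming we simply opt the binaries into the system allocator globally):

```rust
use std::alloc::System;

// Opt the whole binary into the system allocator instead of jemalloc.
#[global_allocator]
static GLOBAL: System = System;
```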