Reading Getting Friendly With CPU Caches

Overview

Getting Friendly With CPU Caches

Reading Getting Friendly With CPU Caches, by Miki Tebeka and William Kennedy, inspired me to look at some Rust equivalents.

I've used Criterion for benchmarks, and the final version users the itertools crate.

Techniques

  1. original_slow_go.rs - a line-by-line port of the original---sluggish---Go code.
  2. original_fast_go.rs - a line-by-line port of the improved---fast---Go code. Image has been turned into a Box, a safe (no null pointer issues here) pointer to a heap-allocated Image struct.
  3. idiomatic_rust takes the code from (2), and replaces the for loops with an iterator-based approach. This retains the HashMap, countries are still strings---but using an iterator allows the compiler to elide some bounds checks.
  4. no_map removes the HashMap completely---because hashing is slow. Instead, it returns a vector of tuples (count, country string).
  5. no_map_country is the same as (4), but replaces the country string with a pointer to the static countries list.
  6. no_map_country_idx replaces country altogether with an index into the countries list. This could easily be stored separately and re-attached as needed (when returning the user via the API). It'll make your API faster if your client obtains and keeps a country list, too!

All benchmarks were performed under Windows 11, on a 12th generation Intel Core i7 with 32 gb of RAM.

Results

Test Mean Performance
original_slow_go 419.24 µs
original_fast_go 329.51 µs
idiomatic_rust 330.13 µs
no_map 77.627 µs
no_map_country 77.256 µs
no_map_country_idx 21.911 µs

Explanation

The original article explains the difference between the "slow" and "fast" Go---the User structure shrinks massively by storing a pointer to the image data, allowing for much better cache utilization. Translating the for loop into a Rust iterator makes a negligible difference---they compile into very similar code.

no_map reasoned that the HashMap---in particular hashing values---was taking up a lot of time. Sorting is very fast, and itertools provides a great dedup_with_counts function. Combining the two gives you a HashMap-free solution. The speed increase is huge.

I then reasoned that chasing pointers for strings was problematic. The no_map_country example offered very little improvement: instead of discrete strings, it reduces memory usage by storing the countries once and pointing to that structure. The performance difference was negligible.

Using an index of the country table is massively faster. The User structure is still the same size---a usize and a pointer are the same size. But storing just the index removes an entire "pointer chase"---the program doesn't have to follow the pointer into the countries table to read the value. It just reads the index. This is a huge win.

You might also like...
Tool and framework for securely reading untrusted USB mass storage devices.

usbsas is a free and open source (GPLv3) tool and framework for securely reading untrusted USB mass storage devices. Description Following the concept

📚 flow state reading in the terminal
📚 flow state reading in the terminal

fsrx 📚(f)low (s)tate (r)eading e(x)change – flow state reading in the terminal Inspired by (but not affiliated with) Renato Casutt and his revolution

Zenith - sort of like top or htop but with zoom-able charts, CPU, GPU, network, and disk usage
Zenith - sort of like top or htop but with zoom-able charts, CPU, GPU, network, and disk usage

Zenith - sort of like top or htop but with zoom-able charts, CPU, GPU, network, and disk usage

Blink program on RISC L106 80Mhz 32bit CPU

esp8266-blink Blink program on RISC L106 80Mhz 32bit CPU Flashing Running rust on ESP* is sort of hard... We won't cover the installation process, ins

A CLI tool which can help you automatically kill process of your choice. Useful for freeing up memory and CPU usage!
A CLI tool which can help you automatically kill process of your choice. Useful for freeing up memory and CPU usage!

Quickiller There are always programs such as chrome that keep eating up your resources even when closed! The only way to prevent this is to kill all o

Sample and plot power consumption, average frequency and cpu die temperatures over time.
Sample and plot power consumption, average frequency and cpu die temperatures over time.

sense Sense is a small tool to gather data on cpu temperature, power usage and clock frequency and plot graphs during some load. Dependencies Sense is

A tool to control the fan speed by monitoring the temperature of CPU via IPMI.

ipmi-fan-control A tool to control the fan speed by monitoring the temperature of CPU via IPMI. Why Our Dell R730 server's iDRAC is not works as expec

Raspberry Pi's CPU and GPU temperature exporter for Prometheus consumption.
Raspberry Pi's CPU and GPU temperature exporter for Prometheus consumption.

Pi Temperature Exporter A simple application for collecting Raspberry Pi's CPU and GPU temperatures and exporting them for Prometheus consumption. Ins

A small script in rust to get the cpu usage in %'s with a gradient color for the text
A small script in rust to get the cpu usage in %'s with a gradient color for the text

cpu_usage-polybar A small script in rust to get the cpu usage in %'s with a gradient color for the text To get it to work on your PC you will have to

Owner
Herbert "TheBracket"
Developer of RLTK and Nox Futura open source projects, and consultant on Rust, C++, TypeScript, C#, Java, Mikrotik and anything else you want to hand my way!
Herbert
A CLI tool for getting screenshots of URLs using headless chrome

rustywitness ?? ?? ?? Web screenshot utility A CLI tool for getting screenshots of URLs using headless chrome Built with ❤︎ by swanandx and contributo

Swanand Mulay 16 Jan 3, 2023
MollySocket allows getting signal notifications via UnifiedPush.

MollySocket This software is still in alpha. MollySocket allows getting signal notifications via UnifiedPush. It works like a linked device, which doe

null 6 Dec 31, 2022
Cross-platform GameMaker extension for getting system information and resource usage

GM Sysinfo Cross-platform GameMaker extension for getting system information and resource usage Table of Contents Table of Contents Examples Display m

SpikeHD 3 Dec 1, 2023
Simple, cross-platform GameMaker lib for getting file metadata

File Metadata Tiny baby library for getting file metadata. Originally written to work for a GameMaker game a friend is creating. Table of Contents Ins

SpikeHD 3 Nov 28, 2023
A simple and efficient terminal UI implementation with ratatui.rs for getting quick insights from csv files right on the terminal

CSV-GREP csv-grep is an intuitive TUI application writting with ratatui.rs for reading, viewing and quickly analysing csv files right on the terminal.

Anthony Ezeabasili 16 Mar 10, 2024
Rust crate `needleman_wunsch` of the `fasebare` package: reading FASTA sequences, Needleman-Wunsch alignment

fasebare Rust crate needleman_wunsch of the fasebare package: reading FASTA sequences, Needleman-Wunsch alignment. Synopsis The crate needleman_wunsch

Laurent Bloch 2 Nov 19, 2021
An open source, programmed in rust, privacy focused tool for reading programming resources (like stackoverflow) fast, efficient and asynchronous from the terminal.

Falion An open source, programmed in rust, privacy focused tool for reading programming resources (like StackOverFlow) fast, efficient and asynchronou

Obscurely 17 Dec 20, 2022
📚 flow state reading in the terminal

fsrx ?? (f)low (s)tate (r)eading e(x)change – flow state reading in the terminal Inspired by (but not affiliated with) Renato Casutt and his revolutio

colby thomas 276 Dec 14, 2022
Desktop app for reading and downloading manga. With clean distraction-free design and no clutter

Tonbun Tonbun is a desktop app for reading and downloading manga. With clean distraction-free design and no clutter. Build with Rust, Tauri, Vue.js, a

null 23 Nov 30, 2022
A blazingly fast rust-based bionic reader for blazingly fast reading within a terminal console 🦀

This Rust-based CLI tool reads text and returns it back in bionic reading format for blazingly fast loading and even faster reading! Bionic reading is

Ismet Handzic 5 Aug 5, 2023