Sampling profiler and tracer for Ruby (CRuby) which runs in BPF

rbperf

rbperf is a low-overhead sampling profiler and tracer for Ruby (CRuby) that runs in BPF.

Build

To build rbperf you need a Linux machine with:

  • The Rust toolchain
  • clang to compile the BPF code
  • elfutils and zlib installed
  • make and pkg-config to build libbpf

Once the dependencies are installed:

# As we are statically linking elfutils and zlib, we have to tell rustc
# where they are located. On my Ubuntu system they are under /usr/lib/x86_64-linux-gnu
$ export RUSTFLAGS='-L /usr/lib/x86_64-linux-gnu'
$ cargo build [--release]

Stay tuned for pre-compiled binaries!

Usage

CPU sampling

$ sudo rbperf record --pid `pidof ruby` cpu

System call tracing

The available system calls to trace can be found with sudo ls /sys/kernel/debug/tracing/events/syscalls/

$ sudo rbperf record --pid `pidof ruby` syscall enter_writev
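The entries in that tracing directory are named like sys_enter_writev, while the command above takes enter_writev. A minimal Rust sketch of that mapping follows, assuming the argument is simply the directory name without its sys_ prefix. The function names are hypothetical and this is not rbperf's own code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Converts a tracepoint directory name such as "sys_enter_writev" into the
/// form used on the command line above ("enter_writev").
/// Assumption: the CLI argument is the directory name minus the "sys_" prefix.
pub fn syscall_argument(tracepoint: &str) -> Option<&str> {
    let name = tracepoint.strip_prefix("sys_")?;
    if name.starts_with("enter_") || name.starts_with("exit_") {
        Some(name)
    } else {
        None
    }
}

/// Lists candidate arguments from a directory such as
/// /sys/kernel/debug/tracing/events/syscalls/ (usually readable by root only).
pub fn list_syscall_arguments(events_dir: &Path) -> io::Result<Vec<String>> {
    let mut names = Vec::new();
    for entry in fs::read_dir(events_dir)? {
        let file_name = entry?.file_name();
        // Skip control files such as "enable" and "filter".
        if let Some(arg) = file_name.to_str().and_then(syscall_argument) {
            names.push(arg.to_string());
        }
    }
    names.sort();
    Ok(names)
}
```

Calling list_syscall_arguments(Path::new("/sys/kernel/debug/tracing/events/syscalls/")) as root would return the sorted names that could be passed after the syscall subcommand.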

Some debug information will be printed, and a flamegraph called rbperf_flame_$date will be written to disk 🎉

Stability

rbperf is in active development and the CLI and APIs might change at any time.

Bugs

If you encounter any bugs, feel free to open an issue on rbperf's repo

Acknowledgements

rbperf wouldn't be possible without all the open source projects that we benefit from, such as Rust and all the superb crates we use in this project, Ruby and its GDB file, the BPF ecosystem, and many others!

License

Licensed under the MIT license

Comments
  • setup_perf_event fails with ENODEV (`perf` works)

    Hi there,

    When I run the suggested steps in tests/programs/Dockerfile.server, rbperf errors out with

    pid: 38686
    libruby: /usr/local/lib/libruby.so.3.0.0 @ 0x7f54251d7000
    ruby main thread address: 0x7f5425598138
    process base address: 0x55b26e96f000
    ruby version: "3.0.0"
    
    Error: setup_perf_event failed with errno No such device
    

    According to the perf_event_open(2) docs, this indicates that my CPU is missing a feature, but standard Linux perf can sample from that ruby process without problem.

    My cpuinfo is attached in case it's helpful (AMD 5800X3D): cpuinfo.txt

    I'm running kernel Linux CRAGNOR 6.0.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 16 Nov 2022 17:01:17 +0000 x86_64 GNU/Linux

    I'm not sure if it's a configuration issue (in which case I'll submit a PR for README.md) or some limitation of the way that rbperf is configuring the perf events.

    opened by shaver 5
  • Handle ctrl+c

    rather than exiting without writing the profile

    [2022-08-07T20:16:43Z DEBUG rbperf::rbperf] program type Tracepoint
    ^C[2022-08-07T20:16:49Z DEBUG rbperf::rbperf] Polling perfbuf failed with System(4)
    Got 120 samples and 0 errors
    Flamegraph written to: rbperf_flame_08072022_20h16m49s.svg
    

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 1
  • Add line number tracking

    This is disabled by default as it's not fully implemented and the numbers might not be totally accurate unfortunately. Can be enabled with --enable-linenos

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 0
  • Fixes for statically compiled Ruby

    Tested with Ruby built with:

    RUBY_CONFIGURE_OPTS="--disable-shared" rbenv install 3.0.4
    
    $ cargo run record  -p 2529227 syscall enter_writev
    pid: 2534079
    statically linked
    ruby main thread address: 0x5650b11eecb0
    process base address: 0x5650b0df2000
    ruby version: "3.0.4"
    

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 0
  • Simplify the Ruby stack walker

    The overly complicated stack walking in two steps was implemented to try to pack as much work as possible per loop iteration to prevent the BPF verifier from rejecting our program due to the high complexity while analysing all the potential code paths, but I don't think this is needed anymore, so removing it to make the code vastly simpler!

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 0
  • Generate Ruby struct information programmatically

    Using rbspy's bindgen-generated structures for the ABI we rely on. There are some fields that we need that haven't been generated, so these are still hardcoded for the time being.

    The xtask binary to generate the Ruby configuration is quite repetitive and could be improved by using macros at some point

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 0
  • Upgrade all deps

    cargo upgrade --incompatible
    

    And updated the code. One of the most exciting changes is that this updates us to libbpf v1.0!

    Test plan

    $ cargo t
    running 26 tests
    test binary::tests::test_malformed_ruby_version_should_panic - should panic ... ok
    test bindgen_test_layout_RubyFrame ... ok
    test bindgen_test_layout_RubyVersionOffsets ... ok
    test bindgen_test_layout_RubyStack ... ok
    test bindgen_test_layout_RubyStackAddresses ... ok
    test bindgen_test_layout_RubyStackAddress ... ok
    test bindgen_test_layout_ProcessData ... ok
    test bindgen_test_layout_SampleState ... ok
    test binary::tests::test_find_main_works ... ok
    test binary::tests::test_ruby_current_thread_does_not_exist ... ok
    test rbperf::tests::test_verbose_bpf_logging_disabled ... ok
    test ruby_readers::tests::test_parse_char_buffer_works ... ok
    test ruby_readers::tests::test_parse_empty_char_buffer ... ok
    test ruby_readers::tests::test_parse_malformed_char_buffer_errors ... ok
    test ruby_readers::tests::test_parse_stack ... ok
    test rbperf::tests::rbperf_test_3_0_0 ... ok
    test rbperf::tests::rbperf_test_2_7_1 ... ok
    test rbperf::tests::rbperf_test_2_7_4 ... ok
    test rbperf::tests::test_ringbuf ... ok
    test rbperf::tests::rbperf_test_2_6_3 ... ok
    test rbperf::tests::rbperf_test_3_0_4 ... ok
    test rbperf::tests::rbperf_test_2_7_6 ... ok
    test rbperf::tests::rbperf_test_3_1_2 ... ok
    test rbperf::tests::rbperf_test_2_6_0 ... ok
    test rbperf::tests::test_cpu_profiling ... ok
    test rbperf::tests::test_big_stack ... ok
    

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 0
  • Add BPF features detection and info subcommand

    Right now this is most useful as a troubleshooting tool. I am planning to use this information to choose between implementations, such as whether ring buffers can be used, so --ringbuf doesn't have to be specified on systems that support this more efficient API.
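The planned selection between implementations could look roughly like the Rust sketch below. The BpfFeatures struct only mirrors the fields printed by the info subcommand. All type and function names are hypothetical, not rbperf's actual API.

```rust
/// Hypothetical mirror of the "BPF features" section printed by `rbperf info`.
#[derive(Debug, Clone, Copy, Default)]
pub struct BpfFeatures {
    pub is_jited: bool,
    pub has_stats: bool,
    pub has_tail_call: bool,
    pub has_ringbuf: bool,
    pub has_bpf_loop: bool,
}

/// The two ways of sending events to userspace mentioned in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventChannel {
    RingBuffer,
    PerfBuffer,
}

/// Prefers the ring buffer whenever the kernel supports it, and falls back to
/// perf buffers otherwise. An explicit request that cannot be honoured is an
/// error rather than a silent fallback (a design assumption of this sketch).
pub fn choose_channel(
    features: &BpfFeatures,
    ringbuf_requested: bool,
) -> Result<EventChannel, String> {
    match (features.has_ringbuf, ringbuf_requested) {
        (true, _) => Ok(EventChannel::RingBuffer),
        (false, true) => {
            Err("--ringbuf was requested but this kernel lacks ring buffer support".to_string())
        }
        (false, false) => Ok(EventChannel::PerfBuffer),
    }
}
```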

    [javierhonduco@fedora rbperf]$ cargo run info
       Compiling rbperf v0.1.0-beta (/home/javierhonduco/code/rbperf)
        Finished dev [unoptimized + debuginfo] target(s) in 7.90s
         Running `sudo -E target/debug/rbperf info`
    System info
    -----------
    Kernel release: 5.18.13-200.fc36.x86_64
    Debugfs mounted: true
    
    BPF features
    ------------
    is jited: true
    has stats: true
    has tail_call: true
    has ringbuf: true
    has bpf_loop: true
    

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 0
  • Detect process id reuse

    Process IDs (PIDs) are reused on Linux, so it's possible that we start profiling a Ruby process with some PID and later on, while still profiling, this process exits and a new one is spawned with this same PID.

    The profiling information will be wrong. rbperf has some guardrails to help here, such as ensuring that method and path names are valid unicode, otherwise we consider the stack to be invalid, but it would be best to actively ensure we don't have a race condition here.
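The unicode guardrail described above can be illustrated with a short Rust sketch. It assumes names arrive as fixed-size, NUL-terminated byte buffers copied from the profiled process. The function names are hypothetical and only illustrate the idea.

```rust
/// Interprets a fixed-size buffer as a NUL-terminated string and rejects it
/// if the bytes before the terminator are not valid UTF-8.
pub fn parse_char_buffer(buf: &[u8]) -> Result<&str, std::str::Utf8Error> {
    // Everything after the first NUL byte is padding or stale data.
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    std::str::from_utf8(&buf[..end])
}

/// A stack is considered valid only if every (method name, path) pair decodes.
pub fn stack_is_valid(frames: &[(&[u8], &[u8])]) -> bool {
    frames
        .iter()
        .all(|(method, path)| parse_char_buffer(method).is_ok() && parse_char_buffer(path).is_ok())
}
```

A check like this catches many garbled stacks, but memory from an unrelated process can still happen to be valid UTF-8, which is why an explicit identity check is preferable.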

    By using the pid + start_time of the process, we should be able to have a truly unique identifier of every process
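A minimal Rust sketch of the pid + start_time idea follows, assuming the start time is taken from field 22 (starttime) of /proc/<pid>/stat as documented in proc(5). The type and function names are hypothetical. The actual rbperf change may obtain the start time differently, for example inside the BPF program.

```rust
use std::fs;

/// A process identity that survives PID reuse: the PID plus the start time
/// reported by the kernel (in clock ticks since boot).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProcessIdentity {
    pub pid: u32,
    pub start_time: u64,
}

/// Extracts `starttime` from the contents of /proc/<pid>/stat.
/// The second field (comm) may contain spaces and parentheses, so we cut
/// after the last ')' instead of splitting the whole line on whitespace.
pub fn parse_start_time(stat: &str) -> Option<u64> {
    let after_comm = &stat[stat.rfind(')')? + 1..];
    // `after_comm` begins at field 3 (state), so field 22 is at index 19.
    after_comm.split_whitespace().nth(19)?.parse().ok()
}

/// Reads the identity of a live process, or None if it is gone or unreadable.
pub fn identity_of(pid: u32) -> Option<ProcessIdentity> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    Some(ProcessIdentity { pid, start_time: parse_start_time(&stat)? })
}

/// True if the PID still refers to the process we originally attached to.
pub fn is_same_process(original: &ProcessIdentity) -> bool {
    identity_of(original.pid).as_ref() == Some(original)
}
```

A profiler would record identity_of(pid) when it attaches and call is_same_process before trusting a sample. A mismatch means the original process exited and its PID now belongs to something else.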

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 0
  • Improve integration tests

    • by abstracting the test process functionality
    • adding on-CPU profiling tests
    • testing the ringbuffer
    • adding a test that doesn't run the BPF code in verbose mode

    Signed-off-by: Francisco Javier Honduvilla Coto

    opened by javierhonduco 0
  • Add more thorough tests

    To ensure that all the below works fine:

    • https://github.com/javierhonduco/rbperf/pull/27
      • Ring buffer
      • Perf buffers
      • CPU profiling
    • Big stacks (https://github.com/javierhonduco/rbperf/commit/b474d3fa4a8a9b3bf8558134c43df928b5708454)
    opened by javierhonduco 0
  • Add integration test for garbled stacktraces

    https://github.com/javierhonduco/rbperf/blob/28e2f524b2f97703ca8e975fd23128361f34217f/src/rbperf.rs#L436

    p ((struct RString)(*((rb_control_frame_t*) (ruby_current_vm_ptr->ractor->main_thread->ec->vm_stack + ruby_current_vm_ptr->ractor->main_thread->ec->vm_stack_size) - 2).iseq.body.location.label)).as.heap.ptr
    
    set *(0x7fd016296c42+2) = 0xf
    set {char [10]} 0x7fd016296c42 = 0xf
    
    opened by javierhonduco 0
  • Add a mode to run a ruby process directly

    Summary

    This is a feature that several people have brought up. The idea would be to spawn a Ruby process directly and start profiling it. So instead of having to do this

    # in one terminal or in the background
    $ bundler run my_ruby_app
    $ rbperf record --pid `pidof my_ruby_app` cpu
    

    We could do

    $ rbperf record --exec 'bundler run my_ruby_app' cpu
    

    Possible implementation

    This feature needs to control the execution of another process so the ptrace(2) system call is a good fit.

    One thing this implementation has to handle is the race between the moment the process is executed and the moment all its libraries have been mapped into memory. With a dynamically linked CRuby there is a window in which libruby might not be mapped yet, and during that window rbperf would fail to find the global data it needs to read.

    https://github.com/javierhonduco/rbperf/blob/48ce25b2db9e88553dfdf7d3136933ee580c2294/src/process.rs#L73-L76
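One way to cope with that window, sketched in Rust under stated assumptions, is to poll /proc/<pid>/maps until a libruby mapping shows up before looking for the global data. This is only an illustration: the function names, the retry policy, and the choice of polling rather than ptrace(2) events are all assumptions, and a statically linked Ruby would never show a libruby mapping.

```rust
use std::fs;
use std::thread;
use std::time::Duration;

/// Returns the start address and path of the first mapping whose file name
/// starts with "libruby", given the contents of /proc/<pid>/maps.
pub fn find_libruby(maps: &str) -> Option<(u64, &str)> {
    maps.lines().find_map(|line| {
        // Line format: address-range perms offset dev inode [pathname]
        let mut fields = line.split_whitespace();
        let range = fields.next()?;
        let path = fields.nth(4)?; // skips perms, offset, dev and inode
        let file_name = path.rsplit('/').next()?;
        if !file_name.starts_with("libruby") {
            return None;
        }
        let start = u64::from_str_radix(range.split('-').next()?, 16).ok()?;
        Some((start, path))
    })
}

/// Polls the maps file a bounded number of times, sleeping between attempts.
pub fn wait_for_libruby(pid: u32, attempts: u32, delay: Duration) -> Option<(u64, String)> {
    for _ in 0..attempts {
        if let Ok(maps) = fs::read_to_string(format!("/proc/{pid}/maps")) {
            if let Some((addr, path)) = find_libruby(&maps) {
                return Some((addr, path.to_string()));
            }
        }
        thread::sleep(delay);
    }
    None
}
```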

    help wanted good first issue 
    opened by javierhonduco 0
  • perf: Measure overhead of running rbperf

    rbperf has two components that ought to be analysed: the BPF stack walker, and the userspace facilities that process the events sent by the BPF program in order to build the profiles.

    The overhead of the userspace part can be measured with perf or with higher-level tools such as top or htop. Understanding the performance of the BPF program would be very interesting, but unfortunately, the readily available metrics aren't representative.

    For example, bpftool can show the average run time of a BPF program as well as how many times it has run. The Ruby stack walker has a very fast path when the running process isn't being profiled and a slower path when it has to walk the stack, so an average alone blends the two and doesn't give us the complete picture.

    [javierhonduco@fedora rbperf]$ sudo sysctl -w kernel.bpf_stats_enabled=1
    kernel.bpf_stats_enabled = 1
    [javierhonduco@fedora rbperf]$ sudo bpftool prog  show id 763
    763: perf_event  name on_event  tag 97fe3cd3a6716fe2  gpl run_time_ns 61692489 run_cnt 54968
            loaded_at 2022-10-30T19:05:51+0000  uid 0
            xlated 1520B  jited 1009B  memlock 4096B  map_ids 1021,1025,1018,1017,1019
            btf_id 844
            pids rbperf(532862)
    

    Having the distribution of the run time of the BPF program would be ideal. I am planning to work on a tool to get this data to get a more accurate understanding of the actual performance impact of running rbperf
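To make the limitation concrete, the Rust sketch below computes the average that the bpftool counters above imply (run_time_ns / run_cnt, about 1122 ns per run) and contrasts an average with percentiles. The per-run samples used in the usage note are made up for illustration, since the kernel counters do not expose them. The function names are hypothetical.

```rust
/// Average run time in nanoseconds from the two counters bpftool reports.
pub fn average_ns(run_time_ns: u64, run_cnt: u64) -> Option<f64> {
    if run_cnt == 0 {
        None
    } else {
        Some(run_time_ns as f64 / run_cnt as f64)
    }
}

/// Nearest-rank percentile (p in 0..=100) over individual run times.
pub fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Rank is 1-based; p = 0 maps to the smallest sample.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.saturating_sub(1)])
}
```

With a hypothetical mix of 90 fast-path runs of 200 ns and 10 stack-walking runs of 9000 ns, the average is 1080 ns, while the median is 200 ns and the 99th percentile is 9000 ns. The average matches neither path, which is the bias described above.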

    opened by javierhonduco 0
  • Roadmap of potential features and fixes

    Some of the things I have planned:

    • UX
      • [x] Better error handling (e.g. when providing a wrong syscall tracepoint name, we don't handle that nicely and the UX is bad) (https://github.com/javierhonduco/rbperf/commit/2abb8c0c49ad6e6ad1f6fa6f600fea95da5c3c62)
      • [x] Allow running tests and whatnot with cargo (https://github.com/javierhonduco/rbperf/commit/7c14cd0328f339ac6b9bf337f436d2c11970e2ca)
      • [x] Add info subcommand to show environmental details that might affect rbperf to aid debugging (https://github.com/javierhonduco/rbperf/commit/e81748a3861723e8891c28836467e147afbe2e8f)
    • Quality
      • [x] Add tests for C data structure parsing and "serialisation". Run valgrind / asan on them, too (https://github.com/javierhonduco/rbperf/commit/2e3d1c017be1082395e2819955aa0ecae65818da)
      • [x] Compile BPF code with warnings on (https://github.com/javierhonduco/rbperf/commit/cb0e7b4be6e9c3ad433ddd1bfb78eee8bbf21a1b)
      • [x] Gather error statistics (right now they are mixed / not well categorised, e.g. a map can fail to write in kernel or to read from in userspace) (https://github.com/javierhonduco/rbperf/commit/4de5dee16e3ad68304057bb52e7c6cc5e09b14c8)
      • [x] Upstream elfutils and zlib static linking patches (https://github.com/libbpf/libbpf-sys/commit/371a85d2b453b6998bc91a09bb5f43ff45e521eb, https://github.com/libbpf/libbpf-rs/commit/5bed52a25fba063c871499c54a5ffab63d32dd08)
      • [x] Add tests for big stacks that require BPF tail-calls (https://github.com/javierhonduco/rbperf/commit/b474d3fa4a8a9b3bf8558134c43df928b5708454)
      • [x] Address PID reuse race condition: ( https://github.com/javierhonduco/rbperf/commit/272c8f3087b9f85c092ffccfbd131088e6b7d21d)
      • [ ] Verify that frames that we think are native are indeed native frames (https://github.com/javierhonduco/rbperf/blob/422c5ca6ffd0d7693/src/bpf/rbperf.bpf.c#L235)
    • BPF
      • [x] Logging is enabled by default in the BPF program. This has high overhead and it is not needed most of the time (https://github.com/javierhonduco/rbperf/commit/422c5ca6ffd0d7693106b415d1daa742f84dcbc7)
      • [ ] Evaluate using ring buffers
        • [x] Add it as opt-in (https://github.com/javierhonduco/rbperf/commit/8a1e048b0684f0b8d89b87e9b9d3e7b7e6f8425f)
        • [ ] Run some tests to see how its behaviour compares to perf buffers
    • Docs
      • Add a document on architecture, as well as in-depth comments in the BPF code
      • How to debug issues
      • How to add support for Ruby versions

    • New features
      • Binary disk format
      • More output formats (folded stacks, chrome tracing, raw?)
      • Ensure it works in arm64
      • C function tracing, both from cruby or the libraries it dynamically links to (uprobes)

    • Experimental ideas
      • Allocation tracing (w/ mem leak detection)
      • Request-specific data

    • Other
      • [x] Ensure we work with YJIT (asked in https://github.com/Shopify/yjit/. It works so far, but this might change)
      • [ ] Add git revision to the future info subcommand and in the BPF's metadata section

    • Release
      • Publish x86_64 binaries

    Simplify execution context fetching:

    (gdb) p/x ruby_current_vm_ptr->ractor->main_thread->ec
    $1 = 0x20729b0
    (gdb) p/x ruby_current_ec
    $2 = 0x20729b0
    
    opened by javierhonduco 0
Releases(v0.2.1)
  • v0.2.1(Oct 30, 2022)

    Changes

    • Now rbperf is also shipped as a statically linked binary. Until now, libc and other libraries were required in the system, which was a problem in some distros that shipped older, incompatible versions. There's an added CI job to ensure that the static build works
    • Enabled LTO for release builds, reducing binary size enough to produce binaries with the same size even though now they are statically linked
    • Fixes to allow profiling Ruby processes that are statically linked, rather than dynamically linking libruby
    rbperf(7.78 MB)
    rbperf-dynamic(6.28 MB)
  • v0.2.0(Oct 16, 2022)

    Changes

    • Added an xtask task to generate the Ruby configuration files, which contain the details of its ABI that we need to walk the stack, rather than having all values manually generated. There's still some work to do to avoid duplication and ensure that every value is programmatically generated https://github.com/javierhonduco/rbperf/commit/ec4724d1c14473448c13dceec9b765b71b43cc4e;
    • Simplified the Ruby stack walker, which used to work in two phases. This reduces the number of CPU instructions needed to walk the stack and makes the code more readable https://github.com/javierhonduco/rbperf/commit/6f4f78c8a9060f16d5ec4bccc8cec7c0018bc1c5;
    rbperf(8.95 MB)
  • v0.1.0(Oct 2, 2022)

    This is the first rbperf release! 🎉

    New features

    • Written in Rust, which brings a lot of performance improvements and excellent dev UX features. Stay tuned for a write-up on why Rust is an excellent fit for rbperf https://github.com/javierhonduco/rbperf/commit/50231c3c71f2847cfdf51cf31339d9e4135ed0ca;
    • Using libbpf, via libbpf-rs, which brings us a lot of goodies such as not having to ship/use LLVM and recompile the BPF code every single time, BTF, CO-RE, among many other features;
    • A bunch of correctness issues were squashed. On some occasions, the Ruby stack walker did not stop after the last frame and bogus frames were introduced;
    • Added support for Ruby 3.0.0, 3.0.4, and 3.1.2;
    • Added rbperf info, which shows useful information about the system and the BPF features it supports https://github.com/javierhonduco/rbperf/commit/e81748a3861723e8891c28836467e147afbe2e8f;
    • Added detection of PID reuse, to ensure that the right process is the one being profiled https://github.com/javierhonduco/rbperf/commit/272c8f3087b9f85c092ffccfbd131088e6b7d21d;
    • rbperf record --pid <pid> syscall --list lists the available system calls we can trace https://github.com/javierhonduco/rbperf/commit/eda4f218ed33a40be6de3416e7a066a22c153e64;
    • With --ringbuf the new ring buffer interface can be used instead of perf events. This new API can send data to userspace with lower overhead https://github.com/javierhonduco/rbperf/commit/8a1e048b0684f0b8d89b87e9b9d3e7b7e6f8425f;
    • Added --verbose-bpf-logging to the record subcommand to enable BPF logging that can be tailed at /sys/kernel/debug/tracing/trace_pipe. This is very useful while troubleshooting BPF issues, and having it as a flag helps reduce rbperf's overhead as the loader removes branches that can be proved unreachable https://github.com/javierhonduco/rbperf/commit/422c5ca6ffd0d7693106b415d1daa742f84dcbc7;
    • Libelf and zlib are now statically linked https://github.com/libbpf/libbpf-sys/commit/371a85d2b453b6998bc91a09bb5f43ff45e521eb, https://github.com/libbpf/libbpf-rs/commit/5bed52a25fba063c871499c54a5ffab63d32dd08;
    • And many many others!

    Feel free to send any bugs, feedback, ideas, or comments, either by opening an issue or directly to me!

    Removed features (so far)

    • Uprobe/USDTs have not been implemented yet. This is super useful, especially for allocation profiling and will come later on;
    • There's no binary format. While it would be very useful to have an intermediate format that can be converted to a variety of outputs, such as flamegraphs and so on, I wanted to keep the focus on correctness in improving the current code and APIs. Once these things are fleshed out the binary format will be reconsidered

    Acknowledgements

    Thanks so much to all of you that have tried rbperf. Your feedback has been invaluable!

    rbperf(8.99 MB)
Owner
Javier Honduvilla Coto