memory-profiler — A memory profiler for Linux

Koute

Last update: Jan 9, 2023

Overview

A memory profiler for Linux

Features

Can be used to analyze memory leaks, see where exactly the memory is being consumed, identify temporary allocations and investigate excessive memory fragmentation
Gathers every allocation and deallocation, along with full stack traces
Uses a custom, tailor-made stack unwinding implementation which makes it a lot cheaper than other similar tools, potentially up to orders of magnitude faster in some cases
Can export the data it gathered into various different formats; it can export the data as JSON (so you can analyze it yourself if you want), as Heaptrack (so you can use the excellent Heaptrack GUI for analysis) and as a flamegraph
Has its own Web-based GUI which can be used for analysis
Can dynamically stream the profiling data to another machine instead of saving it locally, which is useful for profiling on memory-constrained systems
Supports AMD64, ARM, AArch64 and MIPS64 architectures (where MIPS64 requires a tiny out-of-tree kernel patch for perf_event_open)

Building

Install GCC, Rust and the Yarn package manager (for building the GUI)

Build it:

 $ cargo build --release -p memory-profiler
 $ cargo build --release -p memory-profiler-cli

Grab the binaries from target/release/libmemory_profiler.so and target/release/memory-profiler-cli

Usage

Basic usage

$ LD_PRELOAD=./libmemory_profiler.so ./your_application
$ ./memory-profiler-cli server memory-profiling_*.dat

Then open your Web browser and point it at http://localhost:8080 to access the GUI.
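
If the profiled program is launched from a script, the same setup can be expressed programmatically. The following is a minimal Python sketch, assuming the library sits at `./libmemory_profiler.so`. The `profiler_env` and `run_profiled` helpers are hypothetical names introduced for illustration; they only set `LD_PRELOAD` and the `MEMORY_PROFILER_*` variables documented in the environment variables section.

```python
import os
import subprocess

def profiler_env(lib_path, base_env=None, **settings):
    """Return a copy of the environment with the profiler preloaded.

    Keyword arguments are mapped to MEMORY_PROFILER_* variables,
    e.g. output="x.dat" becomes MEMORY_PROFILER_OUTPUT=x.dat.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = lib_path
    for name, value in settings.items():
        env["MEMORY_PROFILER_" + name.upper()] = str(value)
    return env

def run_profiled(command, lib_path="./libmemory_profiler.so", **settings):
    """Run `command` (a list of arguments) with the profiler preloaded."""
    return subprocess.run(command, env=profiler_env(lib_path, **settings))
```

For example, `run_profiled(["./your_application"], log="info")` would start the application with `MEMORY_PROFILER_LOG=info` set.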

If you'd rather not use the GUI you can also make use of the REST API exposed by the server. For example:

Generate a flamegraph of leaked allocations:

$ curl "http://localhost:8080/data/last/export/flamegraph?lifetime=only_leaked" > flame.svg

Export the leaked allocations as an ASCII tree:

$ curl "http://localhost:8080/data/last/allocation_ascii_tree?lifetime=only_leaked"

Export the biggest three allocations made by the application to JSON: (You should pipe the output to json_reformat for human readable output.)
```
$ curl "http://localhost:8080/data/last/allocations?sort_by=size&order=dsc&count=3"
```
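
If `json_reformat` is not installed, a few lines of Python can do the same job. This sketch assumes only that the endpoint returns valid JSON; the helper names are hypothetical and nothing is assumed about the fields in the response.

```python
import json
from urllib.request import urlopen

def pretty_json(text):
    """Re-indent a JSON document so that it is easier to read."""
    return json.dumps(json.loads(text), indent=2)

def fetch_pretty(url):
    """Download `url` and return its body as indented JSON text."""
    with urlopen(url) as response:
        return pretty_json(response.read().decode("utf-8"))
```

With the server running, `print(fetch_pretty("http://localhost:8080/data/last/allocations?sort_by=size&order=dsc&count=3"))` would print the same data as the `curl` command above, indented.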

Export the biggest three call sites with at least 10 allocations where at least 50% are leaked:

$ curl "http://localhost:8080/data/last/allocation_groups?group_allocations_min=10&group_leaked_allocations_min=50%&sort_by=all.size&count=3"

REST API exposed by `memory-profiler-cli server`

Available endpoints:

A list of loaded data files:
```
/list
```

JSON containing a list of matched allocations:

/data/<id>/allocations?<allocation_filter>&sort_by=<sort_by>&order=<order>&count=<count>&skip=<skip>

JSON in which each entry corresponds to a group of matched allocations from a single, unique backtrace:

/data/<id>/allocation_groups?<allocation_filter>&sort_by=<group_sort_by>&order=<order>&count=<count>&skip=<skip>

An ASCII tree with matched allocations:

/data/<id>/allocation_ascii_tree?<allocation_filter>

Exports matched allocations as a flamegraph:

/data/<id>/export/flamegraph?<allocation_filter>

Exports matched allocations into a format accepted by flamegraph.pl:
```
/data/<id>/export/flamegraph.pl?<allocation_filter>
```
Exports matched allocations into a format accepted by Heaptrack GUI:
```
/data/<id>/export/heaptrack?<allocation_filter>
```
JSON containing a list of mmap calls:
```
/data/<id>/mmaps
```
JSON containing a list of mallopt calls:
```
/data/<id>/mallopts
```

The <id> can either be an actual ID of a loaded data file which you can get by querying the /list endpoint, or can be equal to last which will use the last loaded data file.

The <allocation_filter> can be composed of any of the following parameters:

from, to - a timestamp in seconds or a percentage (of total runtime) specifying the chronological range of matched allocations
lifetime - an enum specifying the lifetime of matched allocations:
- all - matches every allocation (default)
- only_leaked - matches only leaked allocations
- only_not_deallocated_in_current_range - matches allocations which were not deallocated in the interval specified by from/to
- only_deallocated_in_current_range - matches allocations which were deallocated in the interval specified by from/to
- only_temporary - matches only temporary allocations
- only_whole_group_leaked - matches only allocations whose whole group (that is - every allocation from a given call site) was leaked
address_min, address_max - an integer with a minimum/maximum address of matched allocations
size_min, size_max - an integer with a minimum/maximum size of matched allocations in bytes
lifetime_min, lifetime_max - an integer with a minimum/maximum lifetime of matched allocations in seconds
backtrace_depth_min, backtrace_depth_max - an integer with a minimum/maximum backtrace depth of matched allocations
function_regex - a regexp which needs to match with one of the functions in the backtrace of the matched allocation
source_regex - a regexp which needs to match with one of the source files in the backtrace of the matched allocation
negative_function_regex - a regexp which must NOT match any of the functions in the backtrace of the matched allocation
negative_source_regex - a regexp which must NOT match any of the source files in the backtrace of the matched allocation
group_interval_min, group_interval_max - a minimum/maximum interval in seconds or a percentage (of total runtime) between the first and the last allocation from the same call site
group_allocations_min, group_allocations_max - an integer with a minimum/maximum number of allocations from the same call site
group_leaked_allocations_min, group_leaked_allocations_max - an integer or a percentage of all allocations which were leaked from the same call site
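
The difference between the positive and the negative regex filters is easy to misread, so here is a toy Python model of how they combine. It only illustrates the semantics described in the list above and is not the server's code: the backtrace is a hypothetical list of function names, and Python's `re.search` stands in for whatever regex engine and anchoring rules the server actually uses.

```python
import re

def backtrace_matches(functions, function_regex=None, negative_function_regex=None):
    """Return True if a backtrace passes both regex filters.

    function_regex must match at least one frame;
    negative_function_regex must match none of them.
    """
    if function_regex is not None:
        if not any(re.search(function_regex, f) for f in functions):
            return False
    if negative_function_regex is not None:
        if any(re.search(negative_function_regex, f) for f in functions):
            return False
    return True

# Hypothetical backtrace, innermost frame first.
frames = ["malloc", "parse_config", "main"]
print(backtrace_matches(frames, function_regex="^parse_"))           # True
print(backtrace_matches(frames, negative_function_regex="^parse_"))  # False
```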

The <sort_by> for allocations can be one of:

timestamp
address
size

The <group_sort_by> for allocation groups can be one of:

only_matched.min_timestamp
only_matched.max_timestamp
only_matched.interval
only_matched.allocated_count
only_matched.leaked_count
only_matched.size
all.min_timestamp
all.max_timestamp
all.interval
all.allocated_count
all.leaked_count
all.size

The only_matched.* variants will sort by aggregate values derived only from allocations which were matched by the allocation_filter, while the all.* variants will sort by values derived from every allocation in a given group.
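
As a rough illustration of that distinction, the sketch below sums the sizes in a single made-up allocation group both ways. The data and the `group_size` helper are hypothetical; the real server derives these aggregates internally.

```python
# Hypothetical group from one call site: (size in bytes, matched by the filter?)
group = [(64, True), (128, False), (256, True)]

def group_size(allocations, only_matched):
    """Sum allocation sizes, optionally restricted to matched allocations."""
    return sum(size for size, matched in allocations if matched or not only_matched)

print(group_size(group, only_matched=True))   # 320, the value only_matched.size sorts by
print(group_size(group, only_matched=False))  # 448, the value all.size sorts by
```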

The <order> specifies the ordering of the results and can be either asc or dsc.
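
Putting the pieces together, a query URL can be assembled from these parameters in code. The following Python sketch uses a hypothetical `data_url` helper and assumes the server decodes standard percent-encoding, so a value such as `50%` is sent as `50%25`.

```python
from urllib.parse import urlencode

def data_url(base, endpoint, data_id="last", **params):
    """Build a URL for one of the /data/<id>/... endpoints.

    `params` holds the filter, sort_by, order, count and skip parameters.
    """
    order = params.get("order")
    if order is not None and order not in ("asc", "dsc"):
        raise ValueError("order must be 'asc' or 'dsc'")
    url = f"{base}/data/{data_id}/{endpoint}"
    query = urlencode(params)  # escapes special characters such as '%'
    return f"{url}?{query}" if query else url

print(data_url(
    "http://localhost:8080", "allocation_groups",
    group_allocations_min=10,
    group_leaked_allocations_min="50%",
    sort_by="all.size",
    count=3,
))
```

This prints `http://localhost:8080/data/last/allocation_groups?group_allocations_min=10&group_leaked_allocations_min=50%25&sort_by=all.size&count=3`. Because `from` is a reserved word in Python, that filter has to be passed as `**{"from": "10%"}`.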

Environment variables used by `libmemory_profiler.so`

`MEMORY_PROFILER_OUTPUT`

Default: memory-profiling_%e_%t_%p.dat

A path to the file to which the data will be written.

This environment variable supports placeholders which will be replaced at runtime with the following:

%p -> PID of the process
%t -> number of seconds since UNIX epoch
%e -> name of the executable
%n -> auto-incrementing counter (0, 1, .., 9, 10, etc.)
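
To predict which file name a given pattern will produce, the substitution can be modelled in a few lines of Python. This is a sketch of the placeholder rules listed above, not the profiler's own implementation; the `expand_output_pattern` helper and the sample values are hypothetical.

```python
import re

def expand_output_pattern(pattern, pid, timestamp, executable, counter=0):
    """Expand the %p, %t, %e and %n placeholders in an output path."""
    values = {"p": pid, "t": timestamp, "e": executable, "n": counter}
    # A single pass avoids re-expanding text produced by an earlier substitution.
    return re.sub(r"%([pten])", lambda m: str(values[m.group(1)]), pattern)

print(expand_output_pattern(
    "memory-profiling_%e_%t_%p.dat",
    pid=1234, timestamp=1700000000, executable="myapp",
))
# memory-profiling_myapp_1700000000_1234.dat
```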

`MEMORY_PROFILER_LOG`

Default: unset

The log level to use; possible values:

trace
debug
info
warn
error

Unset by default, which disables logging altogether.

`MEMORY_PROFILER_LOGFILE`

Default: unset

Path to the file to which the logs will be written; if unset the logs will be emitted to stderr (if they're enabled with MEMORY_PROFILER_LOG).

This supports placeholders similar to MEMORY_PROFILER_OUTPUT (except %n).

`MEMORY_PROFILER_DISABLE_BY_DEFAULT`

Default: 0

When set to 1 the tracing will be disabled by default at startup.

`MEMORY_PROFILER_REGISTER_SIGUSR1`

Default: 1

When set to 1 the profiler will register a SIGUSR1 signal handler which can be used to toggle (enable or disable) profiling.

If disabled and reenabled a new data file will be created according to the pattern set in MEMORY_PROFILER_OUTPUT.
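
Toggling from a script amounts to sending the signal to the profiled process, the equivalent of `kill -USR1 <pid>` in a shell. A minimal Python sketch follows; the `toggle_profiling` helper is hypothetical and assumes the handler has not been turned off with MEMORY_PROFILER_REGISTER_SIGUSR1=0.

```python
import os
import signal

def toggle_profiling(pid, send=os.kill):
    """Send SIGUSR1 to `pid`, which toggles profiling on or off.

    `send` is injectable so the function can be exercised without
    signalling a real process.
    """
    send(pid, signal.SIGUSR1)
```

Calling `toggle_profiling(pid)` with the PID of the profiled process pauses profiling, and a second call resumes it.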

`MEMORY_PROFILER_REGISTER_SIGUSR2`

Default: 1

When set to 1 the profiler will register a SIGUSR2 signal handler which can be used to toggle (enable or disable) profiling.

`MEMORY_PROFILER_ENABLE_SERVER`

Default: 0

When set to 1 the profiled process will start an embedded server which can be used to stream the profiling data through TCP using memory-profiler-cli gather and memory-profiler-gather.

This server will only be started when the profiling is enabled.

`MEMORY_PROFILER_BASE_SERVER_PORT`

Default: 8100

TCP port on which the embedded server will listen.

If the profiler can't bind a socket to this port it will try to find the next free port to bind to. It will successively probe the ports in a linear fashion, e.g. 8100, 8101, 8102, etc., up to 100 times before giving up.

Requires MEMORY_PROFILER_ENABLE_SERVER to be set to 1.
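
The linear probing described above can be sketched in Python as follows, for example to work out which port a second profiled process on the same machine is likely to end up on. This is an illustration under stated assumptions, not the profiler's code: the helper names are hypothetical, and "up to 100 times" is read here as 100 candidate ports starting at the base port.

```python
import socket

def can_bind(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to `port`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True

def probe_ports(base=8100, attempts=100, is_free=can_bind):
    """Return the first free port in base, base+1, ..., or None if all are taken."""
    for port in range(base, base + attempts):
        if is_free(port):
            return port
    return None
```

A real server would keep the successfully bound socket instead of checking and releasing it, since another process could take the port in between.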

`MEMORY_PROFILER_ENABLE_BROADCAST`

Default: 0

When set to 1 the profiled process will send UDP broadcasts announcing that it's being profiled. This is used by memory-profiler-cli gather and memory-profiler-gather to automatically discover memory-profiler instances to which to connect.

Requires MEMORY_PROFILER_ENABLE_SERVER to be set to 1.

`MEMORY_PROFILER_PRECISE_TIMESTAMPS`

Default: 0

Decides whether timestamps will be gathered for every event, or only for chunks of events. When enabled the timestamps will be more precise at a cost of extra CPU usage.

`MEMORY_PROFILER_WRITE_BINARIES_TO_OUTPUT`

Default: 1

Controls whether the profiler will embed the profiled application (and all of the libraries used by the application) inside of the profiling data it writes to disk.

This makes it possible to later decode the profiling data without having to manually hunt down the original binaries.

`MEMORY_PROFILER_ZERO_MEMORY`

Default: 0

Decides whether malloc will behave like calloc and fill the memory it returns with zeros.

`MEMORY_PROFILER_USE_SHADOW_STACK`

Default: 1

Whether to use a more intrusive, faster unwinding algorithm; enabled by default.

Setting it to 0 will on average significantly slow down unwinding. This option is provided only for debugging purposes.

Enabling full debug logs

By default the profiler is compiled with most of its debug logs disabled for performance reasons. To reenable them be sure to recompile it with the debug-logs feature, e.g. like this:

$ cd preload
$ cargo build --release --features debug-logs

License

Licensed under either of

Apache License, Version 2.0, (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Comments

'memalign' is unimplemented! panic

The following problem occurred today, it didn't exist before:

thread '' panicked at 'not implemented: 'memalign' is unimplemented!', preload/src/api.rs:831:5

opened by harlanc 10

SIGSEGV when loading app with libmemory_profiler.so

When attempting to load my application with libmemory_profiler.so in LD_PRELOAD, it immediately segfaults.

Built with:

rustc 1.36.0-nightly (6afcb5628 2019-05-19)

Captured from GDB:

gdb --args env LD_PRELOAD=/usr/local/lib/libmemory_profiler.so ./build/myapp

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7881e75 in memory_profiler::initialize () at preload/src/lib.rs:1239
1239	fn initialize() {
(gdb) bt
#0  0x00007ffff7881e75 in memory_profiler::initialize () at preload/src/lib.rs:1239
#1  0x00007ffff7883cc9 in memory_profiler::allocate (size=32, is_calloc=<optimized out>) at preload/src/lib.rs:1417
#2  malloc (size=32) at preload/src/lib.rs:1459
#3  0x00007ffff7883df6 in memory_profiler::allocate (size=32, is_calloc=<optimized out>) at preload/src/lib.rs:1429
#4  malloc (size=32) at preload/src/lib.rs:1459
...

opened by NuSkooler 10

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value'

We have troubles when analysing gathered data. Gathered data is about 9GB. We tried to squeeze the data but it failed. Then we tried to load big data and it failed as well

./bytehound strip --output X Y.dat
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /tmp/koute/memory-profiler/cli-core/src/squeeze.rs:217:100
none: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted

when loading big data:

./bytehound server Y.dat
[2022-12-01T15:11:46Z INFO  server_core] Trying to load "Y.dat"...
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', cli-core/src/loader.rs:864:68
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Aborted

Did you encounter such problem? Do you know how we should proceed with that?

opened by krzysiek6d 7

A kingdom for a flamegraph of live allocations!

I am profiling an application which uses obscene amounts of memory, but is not, strictly speaking, leaking memory: if I kill the app, everything is cleanly deallocated.

Still, I'd love to know what takes all the memory. I think what I need is to take a look at the live allocations at some specific point in time, and get a flamegraph. This'll give me essentially a profile for a heap snapshot at a point in time.

Can I already get this with bytehound? I've seen flamegraph for leaked memory, and the graph of the total live allocations, but neither is quite what I am looking for.

opened by matklad 6

thread '<unnamed>' panicked at 'cannot access a TLS value during or after it is destroyed: AccessError'

On Debian 10 with nightly Rust

thread '<unnamed>' panicked at 'cannot access a TLS value during or after it is destroyed: AccessError', src/libcore/result.rs:999:5
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:71
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:197
   2: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:211
             at src/libstd/panicking.rs:474
   3: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:381
   4: rust_begin_unwind
             at src/libstd/panicking.rs:308
   5: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
   6: core::result::unwrap_failed
             at /rustc/37ff5d388f8c004ca248adb635f1cc84d347eda0/src/libcore/macros.rs:18
   7: memory_profiler::unwind::grab
             at /root/.cargo/git/checkouts/not-perf-e01bfa01482c86ed/9739e8b/nwind/src/local_unwinding.rs:0
             at preload/src/unwind.rs:166
   8: calloc
             at preload/src/lib.rs:1423
             at preload/src/lib.rs:1470
   9: g_malloc0
  10: g_slice_free_chain_with_offset
  11: g_queue_free
  12: __nptl_deallocate_tsd.part.8
  13: start_thread
  14: clone
Aborted (core dumped)

opened by rmanus 6

Failed to profile rust application

Failed to profile rust application due to error

thread '<unnamed>' panicked at 'not implemented: 'aligned_alloc' is unimplemented!', preload/src/api.rs:907:5
stack backtrace:
   0:     0x7f9661783cea - std::backtrace_rs::backtrace::libunwind::trace::h972caad916e73545
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   1:     0x7f9661783cea - std::backtrace_rs::backtrace::trace_unsynchronized::he59049878fe5a05d
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f9661783cea - std::sys_common::backtrace::_print_fmt::he4a91f9bcfad9b40
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/sys_common/backtrace.rs:66:5
   3:     0x7f9661783cea - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h51433dc001920472
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/sys_common/backtrace.rs:45:22
   4:     0x7f966172d10c - core::fmt::write::hc9dbd37d69b2c204
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/core/src/fmt/mod.rs:1198:17
   5:     0x7f9661762b14 - std::io::Write::write_fmt::h6b2550ce8adb9e04
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/io/mod.rs:1672:15
   6:     0x7f9661784af5 - std::sys_common::backtrace::_print::h006829bd22a5a4ee
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/sys_common/backtrace.rs:48:5
   7:     0x7f9661784af5 - std::sys_common::backtrace::print::h0f4d319136ab4456
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/sys_common/backtrace.rs:35:9
   8:     0x7f9661784af5 - std::panicking::default_hook::{{closure}}::h5b3cdff51fbe7401
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/panicking.rs:295:22
   9:     0x7f9661785075 - std::panicking::default_hook::hdc1d8baf28b4ffd7
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/panicking.rs:314:9
  10:     0x7f9661785075 - std::panicking::rust_panic_with_hook::h80e138cc00203db9
                               at /rustc/d68e7ebc38cb42b8b237392b28045edeec761503/library/std/src/panicking.rs:698:17
  11:     0x7f96617f9221 - nwind_ret_trampoline_start
                               at /home/user/.cargo/git/checkouts/not-perf-af1a46759dd83df9/18bd8d3/nwind/src/arch/amd64_trampoline.s:17
  12:                0x0 - <unknown>

used command

LD_PRELOAD=./libbytehound.so <rust_application> <args>

Tried with a local build from the latest master, and with the pre-built versions. Target OS: Ubuntu 20.04

Please let me know if any option is missing to run the tool properly, or if Rust applications are not supported for now. Thanks in advance.

opened by aregng 5

Build memory-profiler-cli failed

My environment is Ubuntu 18.04

Yarn has been installed already

Rust has been changed to nightly version

I ran "cargo build --release -p memory-profiler" successfully and libmemory_profiler.so has been generated already

When I ran "cargo build --release -p memory-profiler-cli", it failed with the error log below:

davidwang@system-MS-7918:~/Tools/memory-profiler$ cargo build --release -p memory-profiler-cli
   Compiling server-core v0.1.0 (/home/davidwang/Tools/memory-profiler/server-core)
00h00m00s 0/0: :
error: failed to run custom build command for server-core v0.1.0 (/home/davidwang/Tools/memory-profiler/server-core)

Caused by: process didn't exit successfully: /home/davidwang/Tools/memory-profiler/target/release/build/server-core-5dfb1577abf53aa3/build-script-build (exit code: 101)
--- stderr
ERROR: [Errno 2] No such file or directory: 'install'
thread 'main' panicked at 'Failed to install the dependencies for the WebUI; child process exited with error code Some(1)! You might want to try to run 'rm -Rf ~/.cache/yarn' and try again.', server-core/build.rs:61:21
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:197
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:211
   4: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:474
   5: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:381
   6: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:336
   7: semalock::Semalock::with
             at server-core/build.rs:61
             at /home/davidwang/.cargo/registry/src/github.com-1ecc6299db9ec823/semalock-0.2.0/src/lib.rs:99
             at /rustc/8869ee03d7f258e1b76a11c6fbb01b5708a9f504/src/libcore/result.rs:639
             at /home/davidwang/.cargo/registry/src/github.com-1ecc6299db9ec823/semalock-0.2.0/src/lib.rs:96
   8: build_script_build::main
             at server-core/build.rs:46
   9: std::rt::lang_start::{{closure}}
             at /rustc/8869ee03d7f258e1b76a11c6fbb01b5708a9f504/src/libstd/rt.rs:64
  10: std::panicking::try::do_call
             at src/libstd/rt.rs:49
             at src/libstd/panicking.rs:293
  11: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:85
  12: std::rt::lang_start_internal
             at src/libstd/panicking.rs:272
             at src/libstd/panic.rs:388
             at src/libstd/rt.rs:48
  13: main
  14: __libc_start_main
  15: _start

It seems that it failed to find an 'install' directory, but where should that directory be?

opened by aguludunu 5

How to compile the 32-bit version of bytehound

I got an error when using bytehound: ERROR: ld.so: object './libbytehound.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS64): ignored.

I guess it's because my app is 32bit version but bytehound is 64bit. So how should I compile bytehound for use by a 32-bit program?

System info: Linux dell-7060 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

opened by Gary-Hobson 4

Idea: Generate memory flamegraph with inferno

Hello, maybe most of the things needed for this are already in place and it's easy to do. Currently manual usage of the heaptrack GUI is needed to see the memory flamegraph. Inferno could be used to directly generate an interactive SVG by exporting a perf-compatible text format (the text format can even be without collapsed stacks, as there is the inferno-collapse-perf command for that).

opened by pothos 4

thread_local_const_init is being stabilized

Hi,

Currently (on 1.59-nightly), bytehound doesn't compile because of the stabilization of thread_local_const_init feature:

error[E0635]: unknown feature `thread_local_const_init`
 --> preload/src/lib.rs:1:12
  |
1 | #![feature(thread_local_const_init)]
  |            ^^^^^^^^^^^^^^^^^^^^^^^

For more information about this error, try `rustc --explain E0635`.
error: could not compile `bytehound-preload` due to previous error

The feature line should be removed from the code from now on I think.

Cheers, Gerry

opened by gagbo 3

Plotting memory usage of allocations from given backtrace

Adds plots to allocations so that it's possible to tell whether a given backtrace is the reason for a potential memory leak visible on the overview page. This greatly improves the feedback on whether a backtrace is a leak, as one sees the trend of allocations from that backtrace: are they increasing, or was it just one spike at the end of the trace?

The plot will be visible when "Group by backtraces" is selected.

PR code is a little wonky though... but it seems to work well.

opened by stoperro 3

Bug with not-perf(local_unwinding)

thread '<unnamed>' panicked at 'index out of bounds: the len is 256 but the index is 18446744073709551615', /home/user/.cargo/git/checkouts/not-perf-af1a46759dd83df9/911723c/nwind/src/local_unwinding.rs:455:26
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Aborted

I got this error (unfortunately I can't attach the code); there seems to be some problem with the index calculations.

I can only say that we are memory-mapping a huge file (150 GB). As long as I do not access this memory everything is OK; I believe the problems begin as soon as it is accessed (but that's just a hypothesis).

Linux version 5.15.0-1026-aws gcc (Ubuntu ~20.04.1) 9.4.0, x86_64


stack backtrace:
   0: rust_begin_unwind
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/core/src/panicking.rs:64:14
   2: core::panicking::panic_bounds_check
             at /rustc/d0dc9efff14ac0a1eeceffd1e605e37eeb8362a0/library/core/src/panicking.rs:147:5
   3: nwind::local_unwinding::ShadowStack::push
             at /home/user/.cargo/git/checkouts/not-perf-af1a46759dd83df9/911723c/nwind/src/local_unwinding.rs:455:26
   4: nwind::local_unwinding::LocalAddressSpace::unwind_through_fresh_frames
             at /home/user/.cargo/git/checkouts/not-perf-af1a46759dd83df9/911723c/nwind/src/local_unwinding.rs:905:27
   5: bytehound::unwind::grab_with_unwind_state
             at /home/user/bytehound/preload/src/unwind.rs:393:31
   6: nwind_ret_trampoline_start
             at /home/user/cargo/git/checkouts/not-perf-af1a46759dd83df9/911723c/nwind/src/arch/amd64_trampoline.s:17

I see amd64_trampoline here, is it OK (I use x86)?

opened by nuttert 2

Max output file size

I did not find the maximum size for the output file in the environment variables, I would like to split it into files, for example, 5GB each and send them to remote storage. Maybe this is possible and I missed something?

opened by nuttert 0

query about meaning of terminologies like 'global' and 'matched'.

Hi! In the backtraces I see graphs with plots in green and red, where the green area is said to be 'global' and the red area is said to be 'matched'.

What exactly do 'global' and 'matched' mean?

Also, what should I interpret from the following graph for a backtrace:

Some details about the scenario: the first part of the above graph, where memory consumption increases continuously, is the period during which we were locking and unlocking a component in our system for some 200 iterations. The later part, which is flat (and in red), is the period after the iterations had completed while profiling continued for some time.

opened by shapradh 0

fix: panic for unknown backtraces

I had a debugging use case where the executable would either never finish or get OOM-killed, so I guess the output could be somewhat incomplete or messed up. The output was very large (34GB) so I stripped it (bytehound strip --threshold 10 ...). Then bytehound server would crash because it didn't find some backtraces. This PR fixes that and instead just prints a warning.

opened by crepererum 0

LD_PRELOAD undefined symbol: memfd_create

Hi, great project! I successfully compiled the project, but when executing this statement:

LD_PRELOAD=./libbytehound.so ./bytehound serverv

I got: ./bytehound: symbol lookup error: ./libbytehound.so: undefined symbol: memfd_create

Every program fails like this. Did I do something wrong?

Information about the environment: in a Docker container (uname -r = 5.10.104-linuxkit), with gcc and g++ already installed.

opened by kp-tux 4

[Suggestion] Make the case study scripts an interactive page

Dearest Maintainer,

Bytehound has been a joy to use. Thank you for this. I found the "Case study: Memory leak analysis" after searching for how to use bytehound. I have found the example scripting to be amazingly helpful. My ask would be to make that flow into a page.

"by backtrace" which gives the chart and then you can select the groups one at a time.

A page for "leaked until the end" as well. Maybe that is a filter on "by backtrace".

The last question I have is: what does "memory lost to fragmentation" really mean? What does one do with this information?

Either way, thanks for reading this. I looked at the ui folder and saw it was React. I am very lost in that world but one day might take the time.

Thanks for the amazing software. It has been very helpful.

Becker

opened by sbeckeriv 0

Releases(0.11.0)

0.11.0(Nov 23, 2022)
Major changes:

Added support for _rjem_aligned_alloc (jemalloc variant of aligned_alloc)

The initialization is now done eagerly in the static constructor, in case the profiled executable overrides all of the LD_PRELOAD-hooked functions by itself so none actually get called

bytehound-x86_64-unknown-linux-gnu.tgz(41.76 MB)
0.10.0(Nov 17, 2022)
Major changes:

Performance improvements; CPU overhead of allocation-heavy heavily multithreaded programs was cut down by up to ~80%

You can now control whether child processes are profiled with the MEMORY_PROFILER_TRACK_CHILD_PROCESSES environment variable (disabled by default)

The fragmentation timeline was removed from the UI

mmap/munmap calls are now gathered by default (you can disable this with MEMORY_PROFILER_GATHER_MAPS)

Total actual memory usage is now gathered by periodically polling /proc/self/smaps

Maps can now be browsed in the UI and analyzed through the scripting API

Maps are now named according to their source using PR_SET_VMA_ANON_NAME (Linux 5.17 or newer; on older kernels this is emulated in user space)

Glibc-internal __mmap and __munmap are now hooked into

Bytehound-internal allocations now exclusively use mimalloc as their allocator

New scripting APIs:

AllocationList::only_alive_at

AllocationList::only_from_maps

Graph::start_at

Graph::end_at

Graph::show_address_space

Graph::show_rss

MapList

Map

Removed scripting APIs:

AllocationList::only_not_deallocated_after_at_least

AllocationList::only_not_deallocated_until_at_most

Graph::truncate_until

Graph::extend_until

Removed lifetime filters in the UI: only_not_deallocated_in_current_range, only_deallocated_in_current_range

Fixed a rare crash when profiling programs using jemalloc

Added support for aligned_alloc

Added support for memalign

Relative scale in the generated graphs is now always relative to the start of profiling

Gathered backtraces will now include an extra Bytehound-specific frame on the bottom to indicate which function was called

Minor improvements to the UI
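
For readers unfamiliar with the /proc/self/smaps polling mentioned in the list above, the idea can be sketched in Python. This only illustrates the general technique of summing the per-mapping `Rss:` fields; it is not Bytehound's implementation, and the helper names are hypothetical.

```python
import re

def total_rss_kb(smaps_text):
    """Sum the Rss fields of every mapping in /proc/<pid>/smaps output."""
    return sum(
        int(kb)
        for kb in re.findall(r"^Rss:\s+(\d+) kB$", smaps_text, re.MULTILINE)
    )

def current_rss_kb():
    """Read the resident set size of the current process in kB (Linux only)."""
    with open("/proc/self/smaps") as smaps:
        return total_rss_kb(smaps.read())
```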

bytehound-x86_64-unknown-linux-gnu.tgz(41.70 MB)
0.9.0(Jul 25, 2022)
Major changes:

Deallocation backtraces are now gathered by default; you can use the MEMORY_PROFILER_GRAB_BACKTRACES_ON_FREE environment variable to turn this off

Deallocation backtraces are now shown in the GUI for each allocation

Allocations can now be filtered according to where exactly they were deallocated

Allocations can now be filtered according to whether the last allocation in their realloc chain was leaked or not

Profiling of executables larger than 4GB is now supported

Profiling of executables using unprefixed jemalloc is now supported

New scripting APIs:

AllocationList::only_matching_deallocation_backtraces

AllocationList::only_not_matching_deallocation_backtraces

AllocationList::only_position_in_chain_at_least

AllocationList::only_position_in_chain_at_most

AllocationList::only_chain_leaked

The server subcommand of the CLI should now use less memory when loading large data files

The behavior of malloc_usable_size when called with a NULL argument now matches glibc

At minimum Rust 1.62 is now required to build the crates; older versions might still work, but will not be supported

The way the profiler is initialized was reworked; this should increase compatibility and might fix some of the crashes seen when trying to profile certain programs

bytehound-x86_64-unknown-linux-gnu.tgz(40.25 MB)
0.8.0(Nov 16, 2021)
Major changes:

Significantly lower CPU usage when temporary allocation culling is turned on

Each thread now has its own first-level backtrace cache; this might result in higher memory usage when profiling

The MEMORY_PROFILER_BACKTRACE_CACHE_SIZE environment variable knob was replaced with MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_1 and MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_2 to control the size of the per-thread caches and the global cache respectively

The MEMORY_PROFILER_PRECISE_TIMESTAMPS environment variable knob was removed (always gathering precise timestamps is fast enough on amd64)

The default value of MEMORY_PROFILER_TEMPORARY_ALLOCATION_PENDING_THRESHOLD is now unset, which means that the allocations will be buffered indefinitely until they're either culled or until they'll live long enough to not be eligible for culling (might increase memory usage in certain cases)

Backtraces are now not emitted for allocations which were completely culled

You can now see whether a given allocation was made through jemalloc, and filter according to that

You can now see when a given allocation group reached its maximum memory usage, and filter according to that

New scripting APIs:

Graph::show_memory_usage

Graph::show_live_allocations

Graph::show_new_allocations

Graph::show_deallocations

AllocationList::only_group_max_total_usage_first_seen_at_least

AllocationList::only_jemalloc

New subcommand: extract (will unpack all of the files embedded into a given data file)

The strip subcommand no longer buffers allocations indefinitely when using the --threshold option, which results in significantly lower memory usage when stripping huge data files from long profiling runs

malloc_usable_size now works properly when compiled with the jemalloc feature

reallocarray doesn't segfault anymore

The compilation should now work on distributions with an ancient version of Yarn

bytehound-x86_64-unknown-linux-gnu.tgz(36.15 MB)
0.7.0(Aug 18, 2021)
Major changes:

The project was rebranded from memory-profiler to bytehound

Profiling of applications using jemalloc is now fully supported (AMD64-only, jemallocator crate only)

Added built-in scripting capabilities which can be used for automated analysis and report generation; those can be accessed through the script subcommand

Added a scripting console to the GUI

Added the ability to define programmatic filters in the GUI

Allocation graphs are now shown in the GUI when browsing through the allocations grouped by backtraces

Improved support for tracking and analyzing reallocations

Improved parallelization of the analyzer's internals, which should result in snappier behavior on modern multicore machines

The cutoff point for determining allocations' lifetime is now the end of profiling for those allocations which were never deallocated

The squeeze subcommand was renamed to strip

You can now use the strip subcommand to strip away only a subset of temporary allocations

Information about allocations culled at runtime is now emitted on a per-backtrace basis during profiling

Fixed an issue where the shadow stack based unwinding was incompatible with Rust's ABI in certain rare cases

mmap calls are now always gathered in order (if you have enabled their gathering)

Improved runtime backtrace deduplication which should result in smaller datafiles

Many other miscellaneous bugfixes

bytehound-x86_64-unknown-linux-gnu.tgz(34.30 MB)
0.6.1(Jun 10, 2021)

This is a bugfix release that fixes a possible deadlock when FDEs are dynamically registered at runtime.
memory-profiler-x86_64-unknown-linux-gnu.tgz(24.21 MB)
0.6.0(Jun 9, 2021)
Major changes:

Added a runtime backtrace cache; backtraces are now deduplicated when profiling, which results in less data being generated.

Added automatic culling of temporary allocations when running with MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS set to 1.

Added support for reallocarray.

Added support for unwinding through JITed code, provided the JIT compiler registers its unwinding tables through __register_frame.

Added support for unwinding through frames which require arbitrary DWARF expressions to be evaluated when resolving register values.

Added support for DWARF expressions that fetch memory.

Allocations are not tracked by their addresses anymore; they're now tracked by unique IDs, which fixes a race condition when multiple threads are simultaneously allocating and deallocating memory in quick succession.

mmap calls are no longer gathered by default.

Rewrote TLS state management; some deallocations from TLS destructors which were previously missed by the profiler are now gathered.

When profiling is disabled at runtime the profiler doesn't completely shut down anymore, and will keep on gathering data for those allocations which were made before it was disabled; when reenabled it won't create a new file anymore and instead it will keep on writing to the same file as it did before it was disabled.

The profiler now requires Rust nightly to compile.

memory-profiler-x86_64-unknown-linux-gnu.tgz(24.23 MB)
0.5.0(Oct 7, 2019)
Major changes:

Shadow stack based unwinding is now supported on stable Rust and turned on by default.

Systems where perf_event_open is unavailable (e.g. unpatched MIPS64 systems, docker containers, etc.) are now supported.

The mechanism for exception handling when using shadow stack based unwinding was completely rewritten using proper landing pads.

Programs which call longjmp/setjmp are now partially supported when using shadow stack based unwinding.

Shared objects dynamically loaded through dlopen are now properly handled.

Rust symbol demangling is now supported.

Fixed an issue where calling backtrace on certain architectures while using shadow stack based unwinding would crash the program.

The profiler can now be compiled with the jemalloc feature to use jemalloc instead of the system allocator.

The profiler can now be started and stopped programmatically through memory_profiler_start and memory_profiler_stop functions exported by libmemory_profiler.so. Those are equivalent to controlling the profiler through signals.

memory-profiler-x86_64-unknown-linux-gnu.tgz(23.39 MB)
0.4.0(Jul 14, 2019)
Major changes:

The profiler can now be compiled on Rust stable, with the caveat that the shadow stack based unwinding will then be disabled.

The profiler is now fully lazily initialized; if disabled with MEMORY_PROFILER_DISABLE_BY_DEFAULT the profiler will not initialize itself nor create an output file.

The signal handler registration can now be disabled with MEMORY_PROFILER_REGISTER_SIGUSR1 and MEMORY_PROFILER_REGISTER_SIGUSR2.

When profiling is disabled at runtime it will more thoroughly deinitialize itself, and when reenabled it will create a new output file instead of continuing to write data to the old one.

The embedded server is now disabled by default and can be reenabled with the MEMORY_PROFILER_ENABLE_SERVER environment variable.

The base port of the embedded server can now be set with the MEMORY_PROFILER_BASE_SERVER_PORT environment variable.

The MEMORY_PROFILER_OUTPUT environment variable now supports an %n placeholder.

The GUI now has a graph which shows allocations and deallocations per second.

memory-profiler-x86_64-unknown-linux-gnu.tgz(23.82 MB)
0.3.0(Jun 6, 2019)
Major changes:

More performance improvements. In the average case the cost of a single allocation was cut down to approximately 75% of its previous value. Every thread now has its own unwind context, so stack traces can now be gathered in parallel.

The profiler should no longer crash on systems with a recent version of libstdc++ when a C++ exception is thrown.

memory-profiler-x86_64-unknown-linux-gnu.tgz(23.71 MB)
0.2.0(May 28, 2019)
Major changes:

Massive performance improvements. In the average case on AMD64 the cost of a single allocation was cut down to 20% of its previous value; on ARM it was cut down to less than 50%.

The profiler no longer crashes when a memory operation is triggered from a destructor of an object residing in TLS.

The gathered timestamps are no longer as precise as they were; they should be at most off by ~250ms if your application isn't making a lot of allocations. You can restore the previous behavior if you need it by setting MEMORY_PROFILER_PRECISE_TIMESTAMPS to 1 at the cost of extra CPU time.

memory-profiler-x86_64-unknown-linux-gnu.tgz(24.29 MB)
0.1.0(May 18, 2019)

Initial public release
memory-profiler-x86_64-unknown-linux-gnu.tgz(23.93 MB)