Running Profiling Tools

This page contains tips and tricks for capturing and understanding profiling reports.

Install valgrind, kcachegrind, and graphviz. For example on Ubuntu:

sudo apt-get install valgrind kcachegrind graphviz

Compiling your example program

When profiling, you should use a representative example program.

Compile your example program with debug symbols (line numbers) enabled, but still in optimized mode (-O2). That lets you drill down into line-by-line performance costs while maintaining representative performance. Such builds are usually called “release with debug info”. They differ from “debug builds”, which typically have compiler optimizations disabled (-O0).

In Bazel, build like this:

bazel build -c opt --copt=-g //foo/bar:example

In CMake, configure like this:

cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo

Capturing a trace

Run valgrind with callgrind to get profiling data. For example:

valgrind --tool=callgrind bazel-bin/foo/bar/example

Each run creates an output file named callgrind.out.${PID}. Typically you’d capture just one trace and inspect it on its own, but in advanced uses you can combine multiple traces.
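
Because the PID suffix changes on every run, it can be handy to pick up the newest trace automatically. The following is a minimal sketch, not part of Drake; the function name latest_callgrind_out is hypothetical, and it assumes trace file names contain no newlines:

```shell
# Hypothetical helper: print the most recently modified callgrind.out.* file
# in the given directory (default: the current directory).
# Prints nothing if no trace file is found.
latest_callgrind_out() {
  dir="${1:-.}"
  # ls -t sorts newest first; head -n 1 keeps only the newest match.
  ls -t "$dir"/callgrind.out.* 2>/dev/null | head -n 1
}
```

For example, kcachegrind "$(latest_callgrind_out)" opens the newest trace in the current directory. Callgrind's --callgrind-out-file=<file> option is an alternative if you prefer to choose the file name up front.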

Oftentimes the call patterns in Drake will be difficult to view under the default instrumentation settings, so we recommend running with extra flags. These flags will slow down data collection, but are usually worth it:

valgrind --tool=callgrind \
  --separate-callers=10 \
  bazel-bin/foo/bar/example

If you’re trying to micro-optimize a function, use --dump-instr=yes to see per-instruction costs in the object code disassembly.
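
The flags mentioned above can be combined in a single invocation. The wrapper below is only an illustrative sketch; the function name callgrind_detailed is hypothetical, and the caller depth of 10 simply mirrors the value used in this page:

```shell
# Hypothetical wrapper combining the callgrind flags discussed above.
#   $@: the program to profile, followed by any arguments it takes
callgrind_detailed() {
  valgrind --tool=callgrind \
    --separate-callers=10 \
    --dump-instr=yes \
    "$@"
}
```

It would be used as callgrind_detailed bazel-bin/foo/bar/example. Expect a slower run than with the default settings, since both flags add instrumentation detail.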

Viewing the trace

Run kcachegrind to analyze profiling data. For example:

kcachegrind callgrind.out.19482

Profiling Google Benchmark executables

Google Benchmark throttles the number of iterations it runs on each benchmark, which will skew the performance metrics you see in the profiler. You may want to manually specify the number of iterations to run on each benchmark.
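
One way to pin the iteration count from the command line is sketched below. It assumes a recent Google Benchmark release (1.8.0 or later, to the best of our knowledge) in which --benchmark_min_time accepts an iteration count with an x suffix; older releases interpret the value as a time in seconds. The function name profile_benchmark and its arguments are hypothetical:

```shell
# Hypothetical wrapper: profile selected benchmarks with a fixed iteration count.
#   $1: path to the benchmark binary
#   $2: regex selecting which benchmark(s) to run
#   $3: number of iterations to run for each selected benchmark
# Assumes a Google Benchmark version that accepts "<N>x" for --benchmark_min_time.
profile_benchmark() {
  valgrind --tool=callgrind \
    "$1" \
    --benchmark_filter="$2" \
    --benchmark_min_time="${3}x"
}
```

If the x suffix is not available in your version, the iteration count can instead be fixed in the benchmark source by calling Iterations(n) when registering the benchmark.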

Debugging and profiling on macOS

On macOS, DWARF debug symbols are emitted to a .dSYM file. The Bazel cc_binary and cc_test rules do not natively generate or expose this file, so we have implemented a workaround in Drake, --config=apple_debug. This config turns off sandboxing, which allows a genrule to access the .o files and process them into a .dSYM. Use as follows:

bazel build --config=apple_debug path/to/my:binary_or_test_dsym
lldb ./bazel-bin/path/to/my/binary_or_test

To profile on macOS, build with debug symbols as described above and then run:

xcrun xctrace record -t "Time Profiler" --launch ./bazel-bin/path/to/my/binary_or_test

This will generate a .trace file that can be opened in the Instruments app:

open -a Instruments myfile.trace
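
To control the name of the generated .trace file rather than relying on the default, xctrace's --output option can be passed when recording. The wrapper below is a sketch under that assumption; the function name record_time_profile is hypothetical:

```shell
# Hypothetical wrapper: record a Time Profiler trace to a chosen path.
#   $1: path of the .trace file to create
#   $2: binary to launch under the profiler
record_time_profile() {
  xcrun xctrace record -t "Time Profiler" \
    --output "$1" \
    --launch "$2"
}
```

For example, record_time_profile myfile.trace ./bazel-bin/path/to/my/binary_or_test produces the myfile.trace opened by the command above.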