Awesome profiling
March 19, 2026
General utilities
- https://ebpf.io/: Revolutionary sandboxed kernel profiling technology that makes it easier to build various profiling utilities. Tons of options here in Python https://github.com/iovisor/bcc
- Dtrace: Available on Solaris (includes Mac but not Ubuntu) with notable highlights prstat and mpstat. prstat is not available on Ubuntu but can be replicated with htop and ps
- collectl: Full system level profiling including CPU, disk, memory and network
- perf: CPU level performance counters
- gprof: sampling and instrumentation aware profiling
- google perf tools
- Heaptrack: a heap memory profiler for linux
- jemalloc: another heap memory profiler
- ETW: Event tracing for windows
- Mac OS instruments: Profiling tool for Mac OS built on top of Dtrace
- Renderdoc: Multi platform graphics debugger for OpenGL and Vulkan
- Windows Perf Analyzer: If htop could plot lines. Windows only but recently added support for Android
- htop: Visualize utilization as bar charts or line charts, issue commands to processes
- Magic Trace: High resolution programmable traces
- pprof: pprof is a tool for visualization and analysis of profiling data
- Samply: a command line CPU profiler which uses the Firefox profiler as its UI. Works on macOS, Linux, and Windows.
Continuous Profiling
- parca: Continuous profiling for analysis of CPU and memory usage, down to the line number and throughout time, with the aim of saving infrastructure cost, improving performance, and increasing reliability
- parca-agent: eBPF-based always-on profiler auto-discovering targets in Kubernetes and systemd, zero code changes or restarts needed! Supports multiple languages: C/C++, Rust, Go, Python, Ruby, Java, etc.
Python specific
- viztracer
- psutil: Like htop but from within your python code
- pyinstrument: Python call stack visualizer
- pycallgraph: Visualize call stack as a graph (Maintenance mode)
- py-spy: Sampling profiler for Python
- line profiler: Line by line profiling
- palanteer: Fanciest UI, looks like something out of the matrix
- yappi: multi threaded profiling
- Pycharm profiler: Built in profiler in Pycharm
- TAU
- gprof2dot: Graphical call stack visualizer (Maintenance mode)
- snakeviz: Visualize python cprofile data
- scalene: CPU and GPU based profiling with a web GUI
- pprofile: Very low overhead line profile
- austin-python: Line-level very low overhead time & memory profiler with web & terminal UI
- py-perf: A low-overhead, sampling CPU profiler for Python implemented using eBPF
- oracletrace: Lightweight Python execution tracer that detects performance regressions by comparing execution traces across runs
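Several of the tools above (snakeviz, gprof2dot) consume cProfile data rather than collecting it themselves. The following is a minimal sketch of producing such a file with only the standard library. The names `slow_sum`, `profile_to_file`, and `top_functions` are hypothetical and exist only for illustration.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive workload so the profile has something to show."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_to_file(path, n=200_000):
    """Profile slow_sum with cProfile and write the stats to `path`."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = slow_sum(n)
    profiler.disable()
    profiler.dump_stats(path)  # binary pstats format
    return result


def top_functions(path, limit=5):
    """Return a text report of the most expensive calls by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

After calling `profile_to_file("slow_sum.prof")`, the resulting file can be opened with `snakeviz slow_sum.prof`, or inspected in the terminal with `print(top_functions("slow_sum.prof"))`.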
Ruby specific
- rbspy: Sampling CPU profiler for Ruby
- rbperf: Low-overhead sampling profiler and tracer for Ruby implemented in BPF
- vernier: Next generation CRuby profiler
Java specific
- JProfiler: Java profiler for cpu, multithreading, graphical call stack visualizer
- Java visual VM: Bundled with JDK
Mobile specific
C# specific
- Unity profiler: profiling tools specific for game development
C++ specific
- Tracy: Cross-platform (Windows, Linux, macOS), very comprehensive and helpful for game development
- Callgrind: Valgrind extension
Web specific
- Chrome profiler: Support for throttling, emulating weak hardware
PyTorch specific
- PyTorch profiler: Visual profiles of computations and data loading for PyTorch models, requires changes to code
- PyTorch memory profiler: Can help debug OOMs and memory spikes
CPU specific
- ARM profiling: ARM specific profiling tools, heavyweight UI
- Intel Vtune
- Intel GPA: Intel Graphics performance analyzer
GPU specific
- pynvml: Like nvidia-smi for your code with deeper level instrumentation
- NVIDIA visual profiler
- NVIDIA tools
- GPU View: Windows specific GPU profiling
- Ingero: eBPF-based GPU causal observability agent. Traces CUDA Runtime/Driver APIs via uprobes and host kernel events via tracepoints. Builds causal chains explaining GPU latency with full Python-to-CUDA stack traces.
- ROC profiler: AMD ROCm profiler
- Omniperf: AMD profiler for MI100 and MI200 accelerators
- NVIDIA NCU: Infinitely more useful than NVIDIA's nsys, does a godbolt style view on PTX and gives actionable performance hints
Books
Blogs
- Flame Graphs: flame graphs vs flame charts, off cpu profiling, icicle charts and more
- How to read icicle and flame graphs: Flame graphs and icicle graphs are a great way to visualize performance profiles. In this post, we will learn how to read and interpret them.
- Sampling vs Tracing: sampling based profilers are easier to use since they don't require any code change while instrumentation based profilers require code changes but are generally more informative
- C++ performance tools: reddit post with tons of links
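The trade-off in the Sampling vs Tracing entry can be sketched with the standard library alone. This is a toy illustration, not how any of the listed profilers is implemented. `instrument`, `sample`, and `busy` are hypothetical names, and `sys._current_frames()` is a CPython-specific helper, so the sampling half assumes CPython.

```python
import collections
import functools
import sys
import threading
import time


def instrument(timings):
    """Instrumentation: wrap a function so every call records its duration.
    This requires changing (decorating) the code being measured."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                timings.setdefault(func.__name__, []).append(elapsed)
        return wrapper
    return decorator


def sample(target, interval=0.001):
    """Sampling: run `target` unmodified and periodically record which
    function the calling thread is executing."""
    counts = collections.Counter()
    thread_id = threading.get_ident()
    done = threading.Event()

    def sampler():
        while not done.is_set():
            # CPython-specific: maps thread ids to their current frame.
            frame = sys._current_frames().get(thread_id)
            if frame is not None:
                counts[frame.f_code.co_name] += 1
            time.sleep(interval)

    watcher = threading.Thread(target=sampler, daemon=True)
    watcher.start()
    try:
        target()
    finally:
        done.set()
        watcher.join()
    return counts


def busy(duration=0.2):
    """Spin for roughly `duration` seconds."""
    end = time.perf_counter() + duration
    while time.perf_counter() < end:
        pass
```

`instrument(timings)(busy)` yields exact per-call durations but only for functions that were wrapped, while `sample(busy)` needs no changes to `busy` and returns approximate counts of where time was spent.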
Talks
Understanding code structure
- pyreverse: Get python classes and then visualize with graphviz
- pdb: Use step in functionality or line by line to understand how your code works
- IntelliJ UML Class diagrams
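As a rough sketch of the line-by-line workflow mentioned in the pdb entry, the snippet below drives pdb with a scripted list of commands instead of keyboard input, so the session can be replayed. `greet` and `run_scripted` are hypothetical names for illustration. In normal use you would type the same commands (`next`, `p`, `continue`) at the interactive prompt.

```python
import io
import pdb


def greet(name="world"):
    """Tiny function to step through."""
    shout = name.upper()
    message = "hello " + shout
    return message


def run_scripted(func, commands):
    """Run `func` under pdb, feeding it `commands` instead of keyboard input."""
    stdin = io.StringIO("\n".join(commands) + "\n")
    stdout = io.StringIO()
    # readrc=False keeps a personal ~/.pdbrc from changing the session.
    debugger = pdb.Pdb(stdin=stdin, stdout=stdout, readrc=False)
    result = debugger.runcall(func)
    return result, stdout.getvalue()


if __name__ == "__main__":
    # Advance one line at a time and print locals as they are assigned.
    commands = ["next", "p shout", "next", "p message", "continue"]
    result, transcript = run_scripted(greet, commands)
    print(transcript)
```

The transcript shows each stop and the values printed along the way. Replacing `next` with `step` descends into called functions instead of stepping over them.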