Using Intel VTune on Cloudlab machines

Intel VTune Profiler is a tool for analyzing the performance of an application or a whole system. It offers many analysis types, including hotspots, architectural bottlenecks, and memory bandwidth measurement. For a general primer on VTune, please refer to the VTune documentation.

We have used VTune extensively to optimize our high-performance hashtable.

VTune offers both a command-line interface and a GUI. Most of our development is done on Cloudlab infrastructure, so we typically use a remote VTune setup: a collector gathers the data on the remote Cloudlab node, and the results are displayed locally in the GUI.

One frequent problem we have seen with some versions is the inability to start Microarchitecture Exploration because the sep driver is not loaded or installed properly on the remote node. This problem seems to have disappeared with the latest version.

Installation

The typical setup involves installing VTune Profiler on the local machine and connecting over SSH to the remote host where the application to be profiled runs. I have tested this setup on a MacBook Pro (M2) running macOS 13.4.1 (c). The same steps should work on a Linux machine as well.

Running

Using the VTune API

The above setup is sufficient for normal profiling. However, if you want to use the VTune API to insert profiling markers in your application, or to control when profiling starts and stops from within the application itself, you also need a full installation on the remote node, since it contains the necessary header files and libraries. Follow the same installation steps to install VTune on the remote Linux machine. More details on how to use the API are available in the official documentation.
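
As a rough illustration of what application-side control can look like, here is a minimal sketch built on the ITT API calls shipped in ittnotify.h (`__itt_resume`, `__itt_pause`, `__itt_task_begin`, `__itt_task_end`). The `ProfiledRegion` helper, the `USE_ITTNOTIFY` macro, the domain name, and the `sum_of_squares` workload are all hypothetical names invented for this example; they are not part of VTune or of our hashtable code. When `USE_ITTNOTIFY` is not defined, the markers compile to no-ops, so the same source builds on machines without a VTune installation.

```cpp
// Sketch: limit VTune collection to a region of interest using the ITT API.
// Assumption: USE_ITTNOTIFY is a macro we define ourselves at build time;
// without it, the markers below are no-ops and ittnotify.h is not required.
#include <cstdint>

#ifdef USE_ITTNOTIFY
#include <ittnotify.h>

// Hypothetical domain name; tasks in the VTune results are grouped under it.
inline __itt_domain* example_domain() {
  static __itt_domain* domain = __itt_domain_create("example.hashtable");
  return domain;
}
#endif

// Hypothetical RAII helper: resumes collection and opens a named task on
// construction, then closes the task and pauses collection on destruction.
class ProfiledRegion {
 public:
  explicit ProfiledRegion(const char* name) {
#ifdef USE_ITTNOTIFY
    __itt_resume();
    __itt_task_begin(example_domain(), __itt_null, __itt_null,
                     __itt_string_handle_create(name));
#else
    (void)name;  // Markers disabled: nothing to do.
#endif
  }

  ~ProfiledRegion() {
#ifdef USE_ITTNOTIFY
    __itt_task_end(example_domain());
    __itt_pause();
#endif
  }

  ProfiledRegion(const ProfiledRegion&) = delete;
  ProfiledRegion& operator=(const ProfiledRegion&) = delete;
};

// Placeholder workload standing in for the code you actually want to profile.
std::uint64_t sum_of_squares(std::uint64_t n) {
  ProfiledRegion region("sum_of_squares");
  std::uint64_t total = 0;
  for (std::uint64_t i = 1; i <= n; ++i) {
    total += i * i;
  }
  return total;
}
```

To enable the markers, compile with `-DUSE_ITTNOTIFY`, add the include directory of the VTune installation on the remote node, and link against its ittnotify library; the exact paths depend on where VTune was installed. For the resume/pause calls to narrow the collected data, the collection would normally be started in a paused state, for example with the `-start-paused` option of the `vtune` command line.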