The LIKWID tool suite has no problem with this, as it restricts itself to simple end-to-end measurements of hardware performance counter data. The relation between the measurement and your code is established by pinning the execution of the code to dedicated cores (which the tool can also do for you). As you might already suspect that the Nehalem Uncore was not a good idea, it will not surprise you that Intel introduced the EX type processors. This new design mainly brought a completely new Uncore, which is now a system on a chip (Uncore HPM manual: Intel document reference number 323535). In its first implementation this was very complex to program, with tons of MSR registers that needed to be set up and a lot of dependencies and restrictions. The new mainstream server/desktop SandyBridge microarchitecture also uses this system-on-a-chip type of Uncore design. Still, the implementation of the hardware performance monitoring was changed.
First I have to warn you: Intel is not very strict about consistency in naming. E.g., the naming of the MSR registers in the SDM manuals can differ from the naming used for the same MSR registers in documents written in other parts of the company (e.g. the Uncore manuals). This is unfortunately also true for the naming of the entities in the Uncore. The Uncore no longer has one HPM unit but a whole bunch of them. On NehalemEX and WestmereEX the different parts of the Uncore were called boxes: there were mboxes (main memory controllers), cboxes (last level cache segments) and a bunch of others. While the same types of boxes still exist in SNB, they are named differently now, e.g. the mboxes are now called iMC and the cboxes are called CBos. Still, in LIKWID I stick with the old naming, since I want to build on the code I already implemented for the EX type processors.
The mapping is as follows:
- Caching agent: SNB CBo, EX CBOX
- Home agent: SNB HA, EX BBOX
- Memory controller: SNB iMC, EX MBOX
- Power control: SNB PCU, EX WBOX
- QPI: SNB QPI, EX SBOX/RBOX
Previously, hardware performance monitoring was controlled by reading and writing MSR registers (model specific registers). This was still true on the EX type processors. Starting with SNB, the Uncore is partly programmed through PCI bus address space. Some parts, e.g. the CBo boxes, are still programmed using MSR registers, but most of the units are now programmed with PCI config space registers.
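To make the difference concrete, here is a minimal sketch (my own illustration, not LIKWID's actual code) of the classic MSR access path through the standard Linux msr kernel module. The register 0x38F (IA32_PERF_GLOBAL_CTRL) is only picked as a familiar example, and reading it needs root or suitable permissions on the device file.

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const off_t msr = 0x38F;   /* IA32_PERF_GLOBAL_CTRL, just an example */
    uint64_t data;

    /* one msr device file per core, here core 0 */
    int fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* the MSR address is simply used as the file offset */
    if (pread(fd, &data, sizeof(data), msr) != sizeof(data)) {
        perror("pread"); close(fd); return 1;
    }
    printf("MSR 0x%x = 0x%llx\n", (unsigned)msr, (unsigned long long)data);
    close(fd);
    return 0;
}

The PCI config space path used for most of the SNB Uncore works in a very similar file-based way, as shown at the end of this post.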
I am no specialist on PCI buses, but for the practical part what matters is that the operating system maps the PCI configuration space. For PCI this is 256 bytes per device, usually using 32-bit addressing. The device space is organized as BUS / DEVICE / FUNCTION. The BUS is the socket in the HPM sense, or the other way round: there is one new BUS per socket in the system. The DEVICE is the HPM unit type (e.g. the main memory box) and the FUNCTION is then the concrete HPM unit.
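Just to illustrate the addressing scheme (my own toy example, not LIKWID code): such a triple is usually written in hex as bus:device.function, and this particular triple is the device discussed further below.

#include <stdio.h>

int main(void)
{
    /* an example triple: one HPM unit sitting on the bus of one socket */
    unsigned bus = 0x7f, device = 0x10, function = 0x1;

    /* lspci-style notation, all numbers in hex */
    printf("%02x:%02x.%x\n", bus, device, function);   /* prints 7f:10.1 */
    return 0;
}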
On a two socket SandyBridge-EP system there are the following devices (this is taken from the LIKWID source):
typedef enum {
    PCI_R3QPI_DEVICE_LINK_0 = 0,   /* ring to QPI, link 0 */
    PCI_R3QPI_DEVICE_LINK_1,       /* ring to QPI, link 1 */
    PCI_R2PCIE_DEVICE,             /* ring to PCIe */
    PCI_IMC_DEVICE_CH_0,           /* memory controller, channel 0 */
    PCI_IMC_DEVICE_CH_1,           /* memory controller, channel 1 */
    PCI_IMC_DEVICE_CH_2,           /* memory controller, channel 2 */
    PCI_IMC_DEVICE_CH_3,           /* memory controller, channel 3 */
    PCI_HA_DEVICE,                 /* home agent */
    PCI_QPI_DEVICE_PORT_0,         /* QPI, port 0 */
    PCI_QPI_DEVICE_PORT_1,         /* QPI, port 1 */
    PCI_QPI_MASK_DEVICE_PORT_0,
    PCI_QPI_MASK_DEVICE_PORT_1,
    PCI_QPI_MISC_DEVICE_PORT_0,
    PCI_QPI_MISC_DEVICE_PORT_1,
    MAX_NUM_DEVICES
} PciDeviceIndex;

/* PCI DEVICE.FUNCTION (in hex) for every device above, in the same order */
static char* pci_DevicePath[MAX_NUM_DEVICES] = {
    "13.5", "13.6", "13.1", "10.0", "10.1", "10.4",
    "10.5", "0e.1", "08.2", "09.2", "08.6", "09.6",
    "08.0", "09.0" };
So, e.g., memory channel 1 (PCI_IMC_DEVICE_CH_1) on socket 0 is: BUS 0x7f, DEVICE 0x10, FUNCTION 0x1.
The Linux OS maps this memory at different locations in the /sys and /proc filesystems. In LIKWID I use the /proc filesystem. The above device is accessible via the path /proc/bus/pci/7f/10.1 . Unfortunately, if you make a hexdump of such a file as a normal user, you only get the header part (the first 30-40 bytes). The rest is only visible to root. For LIKWID this means you have to use the tool as root if you want direct access, or you have to set up the daemon mode proxy to access these files.
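To close the loop, here is a small sketch (again my own illustration, not LIKWID's actual implementation) of how such a device file can be accessed. It only reads the vendor and device ID from the standard header at offset 0, which works as a normal user; the performance monitoring registers sit at higher offsets, which is exactly the part that requires root.

#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    /* memory controller channel 1 on socket 0 from the example above */
    const char* path = "/proc/bus/pci/7f/10.1";
    uint32_t ids;

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* offset 0 of config space: vendor ID (low 16 bits), device ID (high 16 bits) */
    if (pread(fd, &ids, sizeof(ids), 0) != sizeof(ids)) {
        perror("pread"); close(fd); return 1;
    }
    printf("vendor 0x%04x device 0x%04x\n",
           (unsigned)(ids & 0xffff), (unsigned)(ids >> 16));

    /* a counter register would be read the same way, with the offset taken
       from the Uncore HPM manual (needs root):
       pread(fd, &value, sizeof(value), counter_offset);                    */
    close(fd);
    return 0;
}

In my next post I will explain how the SNB Uncore is implemented in likwid-perfctr and what performance groups are available.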