Wednesday, June 4, 2014

LIKWID, the capabilities system and setuid root

The set of LIKWID tools provides important measuring and managing functionality that accesses hardware registers and privileged sections of the operating system.
Like other tools that open, manipulate and close features of the operating system, they need special permission to perform these operations.
The common way to grant it has been setuid root applications, which are allowed to change their uid to root at runtime. Examples of such applications are ping, (un)mount and su.

But there is also another way to switch privileges between the two kinds of users, root and non-root. The Linux kernel has supported capabilities for processes since version 2.2. When a process is forked, it inherits the capabilities of the parent process. The parent process can then restrict the scope of a child process by removing capabilities from the child's set. Version 2.6.24 brought full support for the capabilities feature: executable files can be associated with sets of capability flags for fine-grained management of permissions. But managing these sets is rather complicated, and the names of the three sets per executable are misleading. The sets are called permitted, effective and inheritable.

  • The effective set contains the capabilities that are currently active for the executable and that are checked when system calls are triggered.
  • The permitted set is a superset of the effective set and specifies the flags that can be enabled at most for the executable.
  • The inheritable set is used when a new program is executed (execve) to specify which flags are carried over into the permitted set of the new program.
For a more detailed and possibly more correct explanation of the capabilities see the manpage (man 7 capabilities or online) or this website.
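
To look at the sets in practice: the kernel exposes the capability sets of every process as hexadecimal bitmasks in /proc/<pid>/status (fields CapInh, CapPrm and CapEff). The following is a minimal sketch that tests a single bit in such a mask. The helper name has_cap_bit is made up for illustration and is not part of LIKWID; bit 17 is the number of CAP_SYS_RAWIO in linux/capability.h.

```shell
# has_cap_bit MASK BIT
# Succeeds if capability number BIT is set in the hexadecimal MASK
# (as printed in /proc/<pid>/status). Hypothetical helper, not part of LIKWID.
has_cap_bit() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Print the three sets as seen by a process started from this shell
# (grep and awk inherit the shell's sets), if procfs is available.
if [ -r /proc/self/status ]; then
    grep -E '^Cap(Inh|Prm|Eff):' /proc/self/status
    cap_eff=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
    if has_cap_bit "$cap_eff" 17; then
        echo "CAP_SYS_RAWIO is in the effective set"
    else
        echo "CAP_SYS_RAWIO is not in the effective set"
    fi
fi
```

For an ordinary user shell all three masks are usually zero, while a root shell shows most bits set.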

So how does this affect the LIKWID tools?
For measuring the performance counters with likwid-perfctr we need permission to read and write the MSR registers. Since the related assembler instructions RDMSR and WRMSR are privileged operations, they can only be executed in kernel space. User space applications can access the registers through a kernel module that exports the functionality through device files.
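
A quick way to see whether the MSR kernel module is loaded, and what the file permissions allow the current user to do with its device files, is to walk over /dev/cpu/*/msr. This is only a sketch: the function name msr_access_report and its optional base-directory argument are assumptions made for illustration.

```shell
# msr_access_report [BASEDIR]
# Print one line per MSR device file with the rights the calling user has
# according to the file permissions ("r", "w", "rw" or "-").
# BASEDIR defaults to /dev/cpu. Hypothetical helper, not part of LIKWID.
msr_access_report() {
    base=${1:-/dev/cpu}
    found=0
    for f in "$base"/*/msr; do
        [ -e "$f" ] || continue
        found=1
        rights=""
        [ -r "$f" ] && rights="${rights}r"
        [ -w "$f" ] && rights="${rights}w"
        echo "$f ${rights:--}"
    done
    if [ "$found" -eq 0 ]; then
        echo "no MSR device files below $base (try: sudo modprobe msr)" >&2
        return 1
    fi
}

msr_access_report || true
```

The check only looks at file permission bits; whether opening the device actually succeeds can additionally depend on the capabilities of the process.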

Access Daemon

LIKWID therefore started to offer a daemon application that forwards the read and write requests to the MSR registers with higher privileges. To gain these privileges, the daemon must be owned by root and must be permitted to set its uid to root. The common way was:

$ sudo chown root:root likwid-accessD
$ sudo chmod u+s likwid-accessD


This still works but, from a security point of view, it gives LIKWID's access daemon more privileges than it needs. The advantage is that the permissions of the MSR device files do not need to be modified, because the daemon accesses the files as user root.
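
Whether the two commands took effect can be verified with standard shell tests. The sketch below assumes a helper named describe_setuid, which is introduced here only for illustration.

```shell
# describe_setuid FILE
# Print the numeric owner of FILE and whether its setuid bit is set.
# Hypothetical helper, not part of LIKWID.
describe_setuid() {
    # The third column of "ls -ln" is the numeric owner id.
    owner=$(ls -ln "$1" | awk '{ print $3 }')
    if [ -u "$1" ]; then bit=yes; else bit=no; fi
    echo "owner_uid=$owner setuid=$bit"
}

if [ -e ./likwid-accessD ]; then
    describe_setuid ./likwid-accessD
fi
```

For a correctly installed setuid root daemon the line reads owner_uid=0 setuid=yes.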

The alternative is to give the daemon the capability to read and write the MSR device files. The commands to do this:

$ sudo chmod 666 /dev/cpu/*/msr
$ sudo setcap cap_sys_rawio+ep likwid-accessD
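
Whether the capability was actually stored on the file can be checked with getcap from the libcap tools. Because the textual format of its output differs between libcap versions, the sketch below only looks for the capability name followed by the e and p flags. The helper name has_rawio_ep is an assumption made for illustration.

```shell
# has_rawio_ep GETCAP_OUTPUT
# Succeeds if the given getcap output lists cap_sys_rawio with the
# effective (e) and permitted (p) flags. Hypothetical helper.
has_rawio_ep() {
    printf '%s\n' "$1" | grep -Eq 'cap_sys_rawio[+=]ei?p'
}

if command -v getcap >/dev/null 2>&1 && [ -e ./likwid-accessD ]; then
    if has_rawio_ep "$(getcap ./likwid-accessD)"; then
        echo "cap_sys_rawio is set on likwid-accessD"
    else
        echo "cap_sys_rawio is missing on likwid-accessD"
    fi
fi
```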


Changing the permissions is mandatory because the daemon is still executed with the uid of the user, only with more capabilities. The change only lasts until the next reboot because the file permissions are not persistent. You can use udev to set them every time the MSR kernel module is loaded:

<must be done by root>
$ cat > /etc/udev/rules.d/99-persistent-msr.rules <<EOF
> KERNEL=="msr*", MODE="0666"
> EOF


Alternatively, one can avoid granting read and write permission to _others_ on the device files: create a likwid group for all LIKWID users and assign it to the MSR device files. To do so, add GROUP="<likwid-group>" to the rule and adjust the MODE string accordingly.
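
A sketch of such a group-based rule follows. The group name likwid and the mode 0660 (read and write for owner and group, nothing for others) are assumptions chosen for illustration, as is the helper name msr_udev_rule.

```shell
# msr_udev_rule GROUP
# Print a udev rule that assigns the MSR device files to GROUP and
# removes access for all other users. Hypothetical helper.
msr_udev_rule() {
    printf 'KERNEL=="msr*", GROUP="%s", MODE="0660"\n' "$1"
}

msr_udev_rule likwid
# prints: KERNEL=="msr*", GROUP="likwid", MODE="0660"
```

As root, the output can be redirected into /etc/udev/rules.d/99-persistent-msr.rules. The group itself has to exist and the LIKWID users have to be members of it, for example via groupadd and usermod -a -G.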

All commands were tested on an Ubuntu 12.04.4 LTS system with Haswell CPUs.
Since the common Haswell CPU does not have Uncore support, I also tested the same procedures on a SandyBridge EP system with SuSE Linux Enterprise Server 11, patchlevel 3. The setuid root method works there, but setting capabilities for the daemon is not enough on the SuSE system: it denies access to the MSR device files even if the file permissions are valid. The setuid root method has another advantage and may even be mandatory for accessing the Uncore counters. Those counters are mapped into the PCI address space and are accessed through PCI device files. The current documentation of capabilities does not mention a flag that permits these accesses; the flag used here, cap_sys_rawio, only enables access to the MSR device files.

Consequently, since the method works for both operating systems and Uncore counters, we recommend using the setuid root method for LIKWID's access daemon.

One might ask why these procedures are not applied to the likwid-perfctr executable directly. When LIKWID is built with a static library, both the setuid root method and the capabilities method are usable. But when LIKWID is linked against the shared library, another problem arises. When an application has the setuid root bit set, some environment variables are ignored for security reasons. One of these variables is LD_LIBRARY_PATH, and consequently the application cannot find its library anymore.
The capabilities do not ignore those variables. Therefore an application that is linked against the shared library should either use the capabilities method instead of setuid root, be built with static libraries, or be built with a static search path for the library. Static search paths can be included in LIKWID like this:

$ vi make/include_<Compiler>.mk
<add rpath to SHARED_CFLAGS and SHARED_LFLAGS>
SHARED_CFLAGS = -fpic -Wl,-rpath=$(PREFIX)/lib
SHARED_LFLAGS = -shared -Wl,-rpath=$(PREFIX)/lib

Sometimes capabilities cannot be set because the underlying file system does not support extended attributes. You can test this with:

$ zcat /proc/config.gz | grep FS_XATTR
or
$ cat /boot/config-<kernel-version> | grep FS_XATTR

For EXT3 you also need to set the EXT3_FS_SECURITY kernel option to enable the storing of capabilities.
For kernels that are only slightly newer than 2.6.24, the first kernel supporting file capabilities, TMPFS does not support extended attributes.
CentOS 6.5 is one example of an operating system with such a kernel.
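
The checks above can be combined into one small helper that handles both the compressed and the plain kernel configuration. This is a sketch; the function name cap_storage_options is made up for illustration.

```shell
# cap_storage_options [CONFIG]
# Print the enabled kernel options related to extended attributes and
# file system security labels. CONFIG may be plain text or gzip-compressed
# and defaults to /proc/config.gz. Hypothetical helper.
cap_storage_options() {
    cfg=${1:-/proc/config.gz}
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
    case $cfg in
        *.gz) gzip -dc "$cfg" ;;
        *)    cat "$cfg" ;;
    esac | grep -E '^CONFIG_[A-Z0-9_]*(XATTR|FS_SECURITY)=y'
}

cap_storage_options "/boot/config-$(uname -r)" 2>/dev/null \
    || cap_storage_options 2>/dev/null \
    || echo "no kernel config found or no matching option enabled"
```

If the option for the file system that holds the LIKWID binaries is missing from the output, setcap is likely to fail on that file system.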

Frequency Daemon

Starting with the version 3.1.2, LIKWID also contains an executable to manipulate the frequency of CPU cores. The following commands enable the setting of the scaling governor and the frequency for each CPU core for the Ubuntu 12.04.4 LTS system:

$ sudo setcap cap_sys_rawio+ep likwid-setFreq
$ cat > /etc/udev/rules.d/99-persistent-setFreq.rules <<EOF
> KERNEL=="cpu*", RUN+="/bin/chmod 0666 /sys/devices/system/cpu/cpu%n/cpufreq/scaling_governor"
> KERNEL=="cpu*", RUN+="/bin/chmod 0666 /sys/devices/system/cpu/cpu%n/cpufreq/scaling_setspeed"
> EOF


For the SuSE Linux Enterprise Server system, setting the capabilities is sufficient as well; therefore the capabilities method is preferable for the likwid-setFreq daemon.
For completeness, the setuid root method for the frequency daemon: 

$ sudo chown root:root likwid-setFreq
$ sudo chmod u+s likwid-setFreq
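
With the capabilities method, the udev rules have to make the cpufreq files writable for the user. The following sketch lists the current governor of every core and whether the two files are writable according to their permissions. The function name cpufreq_access_report and its optional base-directory argument are assumptions made for illustration.

```shell
# cpufreq_access_report [BASEDIR]
# For every core print the current scaling governor and whether the
# calling user may write scaling_governor and scaling_setspeed.
# BASEDIR defaults to /sys/devices/system/cpu. Hypothetical helper.
cpufreq_access_report() {
    base=${1:-/sys/devices/system/cpu}
    for d in "$base"/cpu[0-9]*/cpufreq; do
        [ -r "$d/scaling_governor" ] || continue
        gov=$(cat "$d/scaling_governor")
        if [ -w "$d/scaling_governor" ] && [ -w "$d/scaling_setspeed" ]; then
            state=writable
        else
            state=read-only
        fi
        echo "${d%/cpufreq} governor=$gov $state"
    done
}

cpufreq_access_report
```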