Tuesday, March 20, 2018

LIKWID and recent Intel Compilers (>=17.0up05)


Starting with version 17.0 update 05 of Intel's C/C++ Compiler, there is a problem when LIKWID's MarkerAPI is used with the AccessDaemon access method. The problem has already been reported to Intel and they are (hopefully) working on a fix.

Short explanation

LIKWID uses UNIX sockets to communicate with the AccessDaemon. The communication over these sockets is not thread-safe. Without the MarkerAPI this is no problem, because only one CPU reads the performance registers and thus communicates with the AccessDaemon. With the MarkerAPI, each application thread reads the counters of its own CPU, so one AccessDaemon per thread is required. Since it is not known in advance how many threads will read the counters, the AccessDaemons are only started when needed (in the first call of LIKWID_MARKER_START or LIKWID_MARKER_REGISTER). This is done using fork() and exec().
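To make the lazy start-up pattern concrete, here is a minimal sketch of "start one helper process per thread on first use" with fork() and exec(). It is not LIKWID's actual code: the names start_helper, ensure_helper and helper_pid are hypothetical, and the socket setup between thread and daemon is left out.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start one process with fork() + exec().
 * Returns the child's PID in the parent, or -1 if fork() failed. */
static pid_t start_helper(const char *path)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace the process image with the helper binary. */
        execl(path, path, (char *)NULL);
        /* Only reached if exec failed. */
        perror("execl");
        _exit(127);
    }
    return pid;
}

/* One helper per thread, remembered in thread-local storage. */
static _Thread_local pid_t helper_pid = 0;

/* Start the calling thread's helper on first use, reuse it afterwards. */
static pid_t ensure_helper(const char *path)
{
    if (helper_pid <= 0) {
        helper_pid = start_helper(path);
    }
    return helper_pid;
}
```

Each thread would call ensure_helper() from its first marker call, so the number of helper processes follows the number of threads that actually measure something. The fork() inside start_helper() is the call that hangs with the affected Intel OpenMP runtimes.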

Moreover, in some update of version 17.0, Intel removed the so-called shepherd thread(s). A shepherd thread's duty is to handle the work that happens under the hood of OpenMP. The problem is that shepherd threads shouldn't be pinned to a CPU, so the pinning library has to skip them, but it has no knowledge of which thread is a worker and which is a shepherd thread. To detect this, the start routine given to pthread_create is analyzed. This requires running ldd (with popen()) on the executable to get the OpenMP runtime vendor (GCC: libgomp, LLVM/Intel: libomp/libiomp), followed by the retrieval of the name of the start routine (with dladdr() and nm).
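The vendor detection step can be sketched roughly as follows. This is only an illustration under assumptions, not the code of LIKWID's pinning library: the type omp_vendor_t and the functions classify_ldd_line and detect_omp_vendor are made-up names, and the path is assumed to contain no single quotes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the OpenMP runtime an executable links to. */
typedef enum { OMP_UNKNOWN, OMP_GNU, OMP_INTEL_LLVM } omp_vendor_t;

/* Classify a single line of ldd output by the library name it mentions. */
static omp_vendor_t classify_ldd_line(const char *line)
{
    if (strstr(line, "libgomp")) {
        return OMP_GNU;
    }
    if (strstr(line, "libiomp") || strstr(line, "libomp")) {
        return OMP_INTEL_LLVM;
    }
    return OMP_UNKNOWN;
}

/* Run ldd on the executable through popen() and scan its output.
 * Assumes exe_path contains no single quotes. */
static omp_vendor_t detect_omp_vendor(const char *exe_path)
{
    char cmd[4096];
    char line[4096];
    omp_vendor_t vendor = OMP_UNKNOWN;

    int n = snprintf(cmd, sizeof(cmd), "ldd '%s' 2>/dev/null", exe_path);
    if (n < 0 || (size_t)n >= sizeof(cmd)) {
        return OMP_UNKNOWN;
    }
    FILE *fp = popen(cmd, "r");
    if (fp == NULL) {
        return OMP_UNKNOWN;
    }
    /* Read all lines so ldd can finish; remember the first match. */
    while (fgets(line, (int)sizeof(line), fp) != NULL) {
        if (vendor == OMP_UNKNOWN) {
            vendor = classify_ldd_line(line);
        }
    }
    pclose(fp);
    return vendor;
}
```

The popen() call in detect_omp_vendor() forks internally, which is why this detection is affected by the same runtime change as the AccessDaemon start-up.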

What is the problem?

Version 17.0 update 05 added some functionality to the Intel OpenMP runtime that causes fork() and popen() calls to hang forever. This is also true for version 18.0.

Workarounds?

For the shepherd thread detection I added an ugly workaround using C's system(): the output of nm is piped to a file and that file is read afterwards. This is not nice, but since it usually does not happen in performance-critical parts of the application, it works.
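The idea of that workaround, roughly: instead of reading the command's output through a popen() pipe, let the shell redirect it into a temporary file and read the file afterwards. The sketch below is an illustration only. The function name find_line_via_system and the temporary file pattern are hypothetical, and the command is passed in by the caller (for the use case above it would be an nm invocation on the executable).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Run `command` with system(), redirect its stdout to a temporary file,
 * and copy the first output line containing `needle` into `out`.
 * Returns 0 if a line was found, -1 otherwise. */
static int find_line_via_system(const char *command, const char *needle,
                                char *out, size_t outlen)
{
    char tmpl[] = "/tmp/cmd_output_XXXXXX";  /* hypothetical file pattern */
    char cmd[4096];
    char line[4096];
    int rc = -1;

    if (outlen == 0) {
        return -1;
    }
    int fd = mkstemp(tmpl);
    if (fd < 0) {
        return -1;
    }
    close(fd);

    int n = snprintf(cmd, sizeof(cmd), "%s > %s 2>/dev/null", command, tmpl);
    if (n < 0 || (size_t)n >= sizeof(cmd) || system(cmd) != 0) {
        unlink(tmpl);
        return -1;
    }

    FILE *fp = fopen(tmpl, "r");
    if (fp != NULL) {
        while (fgets(line, (int)sizeof(line), fp) != NULL) {
            if (strstr(line, needle) != NULL) {
                line[strcspn(line, "\n")] = '\0';  /* strip the newline */
                snprintf(out, outlen, "%s", line);
                rc = 0;
                break;
            }
        }
        fclose(fp);
    }
    unlink(tmpl);
    return rc;
}
```

With nm, the needle would be the address of the start routine, and the matching line would carry the symbol name.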
I have not found a workaround for the fork() problem yet. I don't know whether switching to clone() would help. Moreover, I want to make the AccessDaemon persistent in the future and hand over sockets to threads after a handshake procedure, but I have to test whether all required socket functions work.

So what to do?

If you want to use the MarkerAPI in your application, compile it with GCC, Clang or an Intel C/C++ Compiler older than version 17.0 update 05.

I'll update this post as soon as Intel provides a fix or I have implemented a working workaround.

Solution!

My colleague found the solution for both problems. You need to set the environment variable KMP_INIT_AT_FORK to FALSE and it works. He found it in the Intel forums [1]. I will add it to likwid-perfctr so that it sets the variable automatically (starting with version 4.3.2).
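If you start your application without likwid-perfctr, you can also set the variable yourself. A minimal sketch, assuming the OpenMP runtime reads the variable when it initializes, so the call has to happen before the first OpenMP construct or MarkerAPI call (the function name disable_kmp_init_at_fork is made up for this example):

```c
#include <stdlib.h>

/* Set KMP_INIT_AT_FORK=FALSE for the current process.
 * Returns 0 on success, -1 on error (as setenv() does). */
static int disable_kmp_init_at_fork(void)
{
    /* Third argument 1: overwrite a value that is already set. */
    return setenv("KMP_INIT_AT_FORK", "FALSE", 1);
}
```

Exporting the variable in the shell before launching the program has the same effect and avoids any dependence on when the runtime reads it.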

[1]: https://software.intel.com/en-us/forums/intel-c-compiler/topic/758961