Intel MPI Segmentation faults

Audience: Faculty, Postdocs, Researchers, Staff and Students

This KB Article References: High Performance Computing
This Information is Intended for: Faculty, Postdocs, Researchers, Staff, Students
Last Updated: July 11, 2019

If you are using Intel MPI version 19.0.3 or 19.0.4, you may see an error like the following when you run anything with mpirun:

/gpfs/software/intel/parallel-studio-xe/2019_4/compilers_and_libraries/linux/mpi/intel64/bin/mpirun: line 103: 115153 Segmentation fault      (core dumped) mpiexec.hydra "$@" 0<&0

This error occurs when you use these versions of Intel MPI on the login nodes or the large memory node. It should not occur when you run your MPI program on a compute node. If you need to use Intel MPI on the large memory node, use version 19.0.0 or lower. Alternatively, try running your program with MVAPICH2 (module load mvapich2).
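The version guidance above can be sketched as a small check. This is only an illustration: the function names are hypothetical, the version strings are assumed to be written the way this article writes them (for example "19.0.4"), and versions the article does not mention (such as 19.0.1 or 19.0.2) are reported as unknown rather than guessed.

```python
# Minimal sketch of the version guidance in this article.
# Assumption: versions are written as "major.minor.update", e.g. "19.0.4".

AFFECTED = {(19, 0, 3), (19, 0, 4)}   # reported to crash on restricted nodes
LAST_KNOWN_GOOD = (19, 0, 0)          # this version or lower is the suggested fallback


def parse_version(text):
    """Turn a string such as '19.0.4' into a tuple such as (19, 0, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(text):
    """Return 'affected', 'ok', or 'unknown' for an Intel MPI version string."""
    version = parse_version(text)
    if version in AFFECTED:
        return "affected"
    if version <= LAST_KNOWN_GOOD:
        return "ok"
    # The article says nothing about other versions, so do not guess.
    return "unknown"
```

For example, classify("19.0.4") falls in the affected set, while classify("19.0.0") matches the fallback recommended for the large memory node.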

But why?

To prevent important system processes on the login nodes and the large memory node from crashing, we use cgroups to restrict the set of CPUs that Seawulf users can access, leaving some reserved for system processes. We allow full use of all CPUs on the compute nodes, since a crash on one of these nodes affects only one job from one user. In the two newest updates of Intel MPI (19.0.3 and 19.0.4), Intel changed the way the library assigns threads to specific CPUs, and the new code fails to ignore the CPUs restricted by cgroups. When MPI tries to schedule threads on the restricted CPUs, an error occurs and the program crashes with a segmentation fault. We hope this will be fixed in a future update; for now, use the workarounds described above.
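To see whether the node you are on restricts your CPUs in this way, you can compare the CPUs your process is allowed to use with the CPUs the operating system reports. The following is a minimal sketch, not a diagnostic tool we provide: the function name is hypothetical, it assumes Linux and that the cgroup restriction shows up in the process affinity mask, and it assumes CPUs are numbered contiguously from 0.

```python
import os


def restricted_cpus():
    """Return the sorted CPU indices this process is NOT allowed to run on."""
    # CPUs the current process may be scheduled on (Linux only).
    allowed = os.sched_getaffinity(0)
    # Logical CPUs reported by the operating system.
    total = os.cpu_count() or 0
    # Assumption: CPU indices run from 0 to total - 1 with no gaps.
    return sorted(set(range(total)) - allowed)


if __name__ == "__main__":
    hidden = restricted_cpus()
    if hidden:
        print("Some CPUs are reserved and not available to this process:", hidden)
    else:
        print("This process may use every CPU the system reports.")
```

A non-empty result on a login node or the large memory node, and an empty one inside a job on a compute node, would be consistent with the explanation above.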


Getting Help

The Division of Information Technology provides support on all of our services. If you require assistance please submit a support ticket through the IT Service Management system.
