MPICH Release 3.2.1
The mxm netmod provides support for Mellanox InfiniBand adapters. It
can be built with the following configure option:
--with-device=ch3:nemesis:mxm
If your MXM library is installed in a non-standard location, you might
need to help configure find it using the following configure option
(assuming the libraries are present in /path/to/mxm/lib and the
include headers are present in /path/to/mxm/include):
--with-mxm=/path/to/mxm
(or)
--with-mxm-lib=/path/to/mxm/lib
--with-mxm-include=/path/to/mxm/include
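Putting these pieces together, a configure invocation for the mxm
netmod with a non-standard MXM location might look like the following
sketch (the /path/to/mxm prefix is a placeholder for your actual
install location):

```shell
# Sketch only: build MPICH with the mxm netmod, pointing configure at a
# hypothetical MXM install prefix containing lib/ and include/.
./configure --with-device=ch3:nemesis:mxm \
            --with-mxm=/path/to/mxm
make && make install
```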
By default, the mxm library throws warnings when the system does not
enable certain features that might hurt performance. These warnings
are important, since the underlying issues can degrade performance on
your system, but you might need root privileges to fix some of them.
If you would like to disable such warnings, you can set the MXM log
level to "error" instead of the default "warn" by using:
MXM_LOG_LEVEL=error
export MXM_LOG_LEVEL
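The setting can also be scoped to a single run by prefixing the launch
command; a sketch, with the launch line commented out since it assumes
a built application (./app is a placeholder):

```shell
# Session-wide: export once; later mpiexec runs inherit the setting.
MXM_LOG_LEVEL=error
export MXM_LOG_LEVEL
echo "$MXM_LOG_LEVEL"    # prints: error

# Per-run alternative (illustrative; ./app is a placeholder):
# MXM_LOG_LEVEL=error mpiexec -n 4 ./app
```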
portals4 network module
The portals4 netmod provides support for the Portals 4 network
programming interface. To enable, configure with the following option:
--with-device=ch3:nemesis:portals4
If the Portals 4 include files and libraries are not in the normal
search paths, you can specify them with the following options:
--with-portals4-include= and --with-portals4-lib=
... or if lib/ and include/ are in the same directory, you can use
the following option:
--with-portals4=
If the Portals libraries are shared libraries, they need to be in the
shared library search path. This can be done by adding the path to
/etc/ld.so.conf, or by setting the LD_LIBRARY_PATH variable in your
environment. It's also possible to set the shared library search path
in the binary. If you're using gcc, you can do this by adding
LD_LIBRARY_PATH=/path/to/lib
(and)
LDFLAGS="-Wl,-rpath -Wl,/path/to/lib"
... as arguments to configure.
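Combining the options above, a full configure line that both locates
Portals 4 and embeds its library path in the binaries might look like
this sketch (the /path/to/portals4 prefix is a placeholder):

```shell
# Sketch: configure the portals4 netmod against a hypothetical install
# prefix, baking the shared-library path into the binaries via -rpath.
./configure --with-device=ch3:nemesis:portals4 \
            --with-portals4=/path/to/portals4 \
            LD_LIBRARY_PATH=/path/to/portals4/lib \
            LDFLAGS="-Wl,-rpath -Wl,/path/to/portals4/lib"
```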
Currently, use of MPI_ANY_SOURCE and MPI dynamic processes are unsupported
with the portals4 netmod.
ofi network module
The ofi netmod provides support for the OFI network programming interface.
To enable, configure with the following option:
--with-device=ch3:nemesis:ofi
If the OFI include files and libraries are not in the normal search paths,
you can specify them with the following options:
--with-ofi-include= and --with-ofi-lib=
... or if lib/ and include/ are in the same directory, you can use
the following option:
--with-ofi=
If the OFI libraries are shared libraries, they need to be in the
shared library search path. This can be done by adding the path to
/etc/ld.so.conf, or by setting the LD_LIBRARY_PATH variable in your
environment. It's also possible to set the shared library search path
in the binary. If you're using gcc, you can do this by adding
LD_LIBRARY_PATH=/path/to/lib
(and)
LDFLAGS="-Wl,-rpath -Wl,/path/to/lib"
... as arguments to configure.
sock channel
sock is the traditional TCP sockets based communication channel. It
uses TCP/IP sockets for all communication including intra-node
communication. So, though the performance of this channel is worse
than that of nemesis, it should work on almost every platform. This
channel can be configured using the following option:
--with-device=ch3:sock
pamid device
This is the device used on the IBM Blue Gene/Q system. The following
configure options can be used:
./configure --host=powerpc64-bgq-linux \
    --with-device=pamid:BGQ \
    --with-file-system=bg+bglockless
The Blue Gene/Q cross compilers must either be in your $PATH, or
explicitly specified using environment variables, before running
configure. For example:
PATH=$PATH:/bgsys/drivers/ppcfloor/gnu-linux/bin
or
CC=/bgsys/drivers/ppcfloor/gnu-linux/bin/powerpc64-bgq-linux-gcc
CXX=…
…
There are several other configure options that are specific to building
on a Blue Gene/Q system. See the wiki page for more information:
https://wiki.mpich.org/mpich/index.php/BGQ
5. Alternate Process Managers
hydra
Hydra is the default process management framework that uses existing
daemons on nodes (e.g., ssh, pbs, slurm, sge) to start MPI
processes. More information on Hydra can be found at
http://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager
gforker
gforker is a process manager that creates processes on a single
machine, by having mpiexec directly fork and exec them. Because it
works only on single-node systems, gforker is mostly intended as a
research platform and for debugging purposes.
slurm
SLURM is an external process manager not distributed with
MPICH. MPICH’s default process manager, hydra, has native support
for slurm and you can directly use it in slurm environments (it will
automatically detect slurm and use slurm capabilities). However, if
you want to use the slurm provided "srun" process manager, you can use
the "--with-pmi=slurm --with-pm=no" options with configure. Note that
the “srun” process manager that comes with slurm uses an older PMI
standard which does not have some of the performance enhancements that
hydra provides in slurm environments.
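Putting that together, building MPICH for use with srun might look
like the following sketch (the launch line is illustrative and assumes
a slurm allocation; ./app is a placeholder):

```shell
# Build MPICH against slurm's PMI library, with no internal process
# manager; jobs are then launched through slurm's srun, not mpiexec.
./configure --with-pmi=slurm --with-pm=no
make && make install
# Later, inside a slurm allocation:
# srun -n 4 ./app
```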
6. Alternate Configure Options
MPICH has a number of other features. If you are exploring MPICH as
part of a development project, you might want to tweak the MPICH
build with the following configure options. A complete list of
configuration options can be found using:
./configure --help
7. Testing the MPICH installation
To test MPICH, we package the MPICH test suite in the MPICH
distribution. You can run the test suite using:
make testing
The results summary will be placed in test/summary.xml
8. Fault Tolerance
MPICH has some tolerance to process failures, and supports
checkpointing and restart.
Tolerance to Process Failures
The features described in this section should be considered
experimental, which means that they have not been fully tested and
their behavior may change in future releases. The notes below give
some guidelines on what to expect from this feature:
ERROR RETURNS: Communication failures in MPICH are not fatal
errors. This means that if the user sets the error handler to
MPI_ERRORS_RETURN, MPICH will return an appropriate error code in
the event of a communication failure. When a process detects a
failure when communicating with another process, it will consider
the other process as having failed and will no longer attempt to
communicate with that process. The user can, however, continue
making communication calls to other processes. Any outstanding
send or receive operations to a failed process, or wildcard
receives (i.e., with MPI_ANY_SOURCE) posted to communicators with a
failed process, will be immediately completed with an appropriate
error code.

COLLECTIVES: For collective operations performed on communicators
with a failed process, the collective would return an error on
some, but not necessarily all processes. A collective call
returning MPI_SUCCESS on a given process means that the part of the
collective performed by that process has been successful.

PROCESS MANAGER: If used with the hydra process manager, hydra will
detect failed processes and notify the MPICH library. Users can
query the list of failed processes using MPIX_Comm_group_failed().
This function returns a group consisting of the failed processes
in the communicator. The function MPIX_Comm_remote_group_failed()
is provided for querying failed processes in the remote processes
of an intercommunicator.

Note that hydra by default will abort the entire application when
any process terminates before calling MPI_Finalize. In order to
allow an application to continue running despite failed processes,
you will need to pass the -disable-auto-cleanup option to mpiexec.

FAILURE NOTIFICATION: THIS IS AN UNSUPPORTED FEATURE AND WILL
ALMOST CERTAINLY CHANGE IN THE FUTURE!

In the current release, hydra notifies the MPICH library of failed
processes by sending a SIGUSR1 signal. The application can catch
this signal to be notified of failed processes. If the application
replaces the library’s signal handler with its own, the application
must be sure to call the library's handler from its own
handler. Note that you cannot call any MPI function from inside a
signal handler.
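A launch line that exercises the behavior described above, keeping the
job alive after individual process failures, might look like this
sketch (the host file and application name are placeholders):

```shell
# Sketch: run with hydra's automatic cleanup disabled so surviving
# processes may continue after a peer fails; the application itself
# must set MPI_ERRORS_RETURN to receive error codes instead of aborting.
mpiexec -disable-auto-cleanup -f hosts -n 4 ./app
```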
Checkpoint and Restart
MPICH supports checkpointing and restart fault-tolerance using BLCR.
CONFIGURATION
First, you need to have BLCR version 0.8.2 or later installed on your
machine. If it’s installed in the default system location, you don’t
need to do anything.
If BLCR is not installed in the default system location, you’ll need
to tell MPICH’s configure where to find it. You might also need to
set the LD_LIBRARY_PATH environment variable so that BLCR’s shared
libraries can be found. In this case add the following options to
your configure command:
--with-blcr=<BLCR_INSTALL_DIR>
LD_LIBRARY_PATH=<BLCR_INSTALL_DIR>/lib
where <BLCR_INSTALL_DIR> is the path where BLCR has been
installed (whatever was specified in --prefix when BLCR was
configured).
After it is configured, compile as usual (e.g., make; make install).
Note, checkpointing is only supported with the Hydra process manager.
VERIFYING CHECKPOINTING SUPPORT
Make sure MPICH is correctly configured with BLCR. You can do this
using:
mpiexec -info
This should display ‘BLCR’ under ‘Checkpointing libraries available’.
CHECKPOINTING THE APPLICATION
There are two ways to cause the application to checkpoint. You can ask
mpiexec to periodically checkpoint the application using the mpiexec
option -ckpoint-interval (seconds):
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint \
-ckpoint-interval 3600 -f hosts -n 4 ./app
Alternatively, you can also manually force checkpointing by sending a
SIGUSR1 signal to mpiexec.
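The manual trigger relies on ordinary POSIX signal delivery. The kill
line against a real mpiexec is shown as a comment, since it assumes a
running job; the mechanism itself can be seen with a stand-in process:

```shell
# Against a real job, something like:
#   kill -USR1 <pid-of-mpiexec>
# The underlying mechanism, demonstrated with a stand-in process that
# traps SIGUSR1 and then signals itself:
sh -c 'trap "echo checkpoint requested" USR1; kill -USR1 $$'
# prints: checkpoint requested
```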
The checkpoint/restart parameters can also be controlled with the
environment variables HYDRA_CKPOINTLIB, HYDRA_CKPOINT_PREFIX and
HYDRA_CKPOINT_INTERVAL.
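Expressed through the environment variables instead of command-line
flags, the periodic-checkpoint setup from above might look like this
sketch (paths and the commented-out launch line are placeholders):

```shell
# Equivalent to passing -ckpointlib/-ckpoint-prefix/-ckpoint-interval
# on the mpiexec command line.
export HYDRA_CKPOINTLIB=blcr
export HYDRA_CKPOINT_PREFIX=/tmp/app.ckpoint
export HYDRA_CKPOINT_INTERVAL=3600
# mpiexec -f hosts -n 4 ./app
```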
To restart a process:
mpiexec -ckpointlib blcr -ckpoint-prefix /tmp/app.ckpoint -f hosts \
    -n 4 -ckpoint-num <N>
where <N> is the number of the checkpoint you wish to restart from.
These instructions can also be found on the MPICH wiki:
http://wiki.mpich.org/mpich/index.php/Checkpointing
9. Developer Builds
For MPICH developers who want to directly work on the primary version
control system, there are a few additional steps involved (people
using the release tarballs do not have to follow these steps). Details
about these steps can be found here:
http://wiki.mpich.org/mpich/index.php/Getting_And_Building_MPICH
10. Multiple Fortran compiler support
If the C compiler that is used to build MPICH libraries supports both
multiple weak symbols and multiple aliases of common symbols, the
Fortran binding can support multiple Fortran compilers. The
multiple weak symbols support allows MPICH to provide the different
name mangling schemes (of subroutine names) required by different
Fortran compilers. The multiple aliases of common symbols support
enables MPICH to equate the different common block symbols used for
the MPI Fortran constants, e.g. MPI_IN_PLACE and MPI_STATUS_IGNORE, so
they are understood by different Fortran compilers.
Since the support for multiple aliases of common symbols is
new/experimental, users can disable the feature with the configure
option --disable-multi-aliases if it causes any undesirable effects,
e.g. linker warnings about different sizes of the common symbols
MPIFCMB* (these warnings should be harmless).
We have only tested this support on a limited set of
platforms/compilers. On Linux, if the C compiler that builds MPICH is
either gcc or icc, the above support will be enabled by configure. At
the time of this writing, pgcc does not appear to support multiple
aliases of common symbols, so configure will detect the deficiency and
disable the feature automatically. The tested Fortran compilers
include the GNU Fortran compiler (gfortran), the Intel Fortran
compiler (ifort), the Portland Group Fortran compiler (pgfortran), the
Absoft Fortran compiler (af90), and the IBM XL Fortran compiler (xlf).
This means that if MPICH is built with gcc/gfortran, the resulting
MPICH library can be used to link a Fortran program compiled/linked by
another Fortran compiler, say pgf90, through mpifort -fc=pgf90. As
long as the Fortran program links without errors under one of these
compilers, it should run fine.
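As a concrete sketch of the mixed-compiler scenario described above
(file names are illustrative; assumes MPICH was built with
gcc/gfortran and pgf90 is installed):

```shell
# Compile and link a Fortran program with PGI Fortran against a
# gcc/gfortran-built MPICH, overriding the wrapper's Fortran compiler.
mpifort -fc=pgf90 -o app app.f90
```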
11. ABI Compatibility
The MPICH ABI compatibility initiative was announced at SC 2014
(http://www.mpich.org/abi). As a part of this initiative, Argonne,
Intel, IBM and Cray have committed to maintaining ABI compatibility
with each other.
As a first step in this initiative, starting with version 3.1, MPICH
is binary (ABI) compatible with Intel MPI 5.0. This means you can
build your program with one MPI implementation and run with the other.
Specifically, binary-only applications that were built and distributed
with one of these MPI implementations can now be executed with the
other MPI implementation.
Some setup is required to achieve this. Suppose you have MPICH
installed in /path/to/mpich and Intel MPI installed in /path/to/impi.
You can run your application with mpich using:
% export LD_LIBRARY_PATH=/path/to/mpich/lib:$LD_LIBRARY_PATH
% mpiexec -np 100 ./foo
or with Intel MPI using:
% export LD_LIBRARY_PATH=/path/to/impi/lib:$LD_LIBRARY_PATH
% mpiexec -np 100 ./foo
This works irrespective of which MPI implementation your application
was compiled with, as long as you use one of the MPI implementations
in the ABI compatibility initiative.