MPI Distributed Programming, Part 3: OpenMPI Multi-Node Run Error (MPI TCP connection error)
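1. OpenMPI multi-node run error
Problem description: node 1 (host3) uses mpirun to launch an MPI program on node 2 (host4), and the launch fails with the following error.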
$ mpirun -np 1 --host host4 ./main
[[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 367
[[INVALID],INVALID]-[[59225,0],0] mca_oob_tcp_peer_try_connect: connect to 255.255.255.255:51754 failed: Network is unreachable (101)
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:
* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default
* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.
* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.
* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.
* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
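Solution
First confirm that node 1 and node 2 can each run an OpenMPI program on their own (single-node). Then check whether the two nodes have the same OpenMPI version installed. If the versions differ, reinstall OpenMPI so that they match.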
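The version check described above can be sketched as a small shell helper. This is a minimal sketch, not a definitive procedure: the function names `ompi_version`, `versions_match`, and `check_host` are hypothetical, and it assumes passwordless ssh from host3 to host4 and that `mpirun --version` prints a line of the form `mpirun (Open MPI) X.Y.Z`.

```shell
#!/bin/sh
# Hypothetical helpers for comparing OpenMPI versions between two nodes.

# Read `mpirun --version` output on stdin and print the version number.
# Assumes the first matching line looks like "mpirun (Open MPI) 4.1.2".
ompi_version() {
    awk '/Open MPI/ { print $NF; exit }'
}

# Succeed only if both version strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Compare the local version with the one on a remote host (e.g. host4).
# Assumes ssh to the remote host works without a password prompt.
check_host() {
    remote_host="$1"
    local_ver=$(mpirun --version | ompi_version)
    remote_ver=$(ssh "$remote_host" mpirun --version | ompi_version)
    if versions_match "$local_ver" "$remote_ver"; then
        echo "OK: both nodes run Open MPI $local_ver"
    else
        echo "MISMATCH: local='$local_ver' remote='$remote_ver'"
        return 1
    fi
}

# Example (run on host3):
#   check_host host4
```

Because `ssh host4 mpirun --version` runs in a non-interactive shell, an empty remote version may also indicate that `mpirun` is not on the PATH in that environment, which corresponds to the first cause listed in the error message.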
References
[1. OpenMPI error] https://www.slothparadise.com/fix-orte-error-unknown-option-hnp-topo-sig/