GNU bug report logs - #46229: rdma-core 33.x breaks InfiniBand support in Open MPI
[Message part 1 (text/plain, inline)]
Your bug report
#46229: rdma-core 33.x breaks InfiniBand support in Open MPI
which was filed against the guix package, has been closed.
The explanation is attached below, along with your original report.
If you require more details, please reply to 46229 <at> debbugs.gnu.org.
--
46229: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=46229
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
Good news! This is fixed by:
https://git.savannah.gnu.org/cgit/guix.git/commit/?id=37e997bc7867901dc5eaf9060358dfddacae8dd6
Ludo’.
[Message part 3 (message/rfc822, inline)]
Hello,
We noticed that the recent rdma-core upgrade to 33.1¹ leads to segfaults
in InfiniBand-related routines:
--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=23a5dcce1d893b8f5c5301ae3c1af863776ed3cf -- environment --pure --ad-hoc openmpi openssh intel-mpi-benchmarks --with-debug-info=rdma-core -- mpiexec -np 2 IMB-MPI1 PingPong
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node devel02 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
$ file core.20879
core.20879: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'IMB-MPI1 PingPong', real uid: 10218, effective uid: 10218, real gid: 11018, effective gid: 11018, execfn: '/gnu/store/ls8pkyi05iabk952x7gy545lc7zyr4cv-profile/bin/IMB-MPI1', platform: 'x86_64'
$ gdb /gnu/store/ls8pkyi05iabk952x7gy545lc7zyr4cv-profile/bin/IMB-MPI1 core.20879
(gdb) bt
#0 0x00007f93b2789e88 in ibv_cmd_create_cq ()
from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
#1 0x00007f93b28c57bb in hfi1_create_cq ()
from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs/libhfi1verbs-rdmav33.so
#2 0x00007f93b2796331 in ibv_create_cq@@IBVERBS_1.1 ()
from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
#3 0x00007f93b27c0a55 in opal_common_verbs_qp_test ()
from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmca_common_verbs.so.40
#4 0x00007f93b27f4e83 in btl_openib_component_init ()
from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/openmpi/mca_btl_openib.so
#5 0x00007f93b4516aaf in mca_btl_base_select ()
from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libopen-pal.so.40
#6 0x00007f93b29552c2 in mca_bml_r2_component_init ()
from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/openmpi/mca_bml_r2.so
#7 0x00007f93b4b81b54 in mca_bml_base_init ()
from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
#8 0x00007f93b4bc4ef8 in ompi_mpi_init ()
from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
#9 0x00007f93b4b5ee55 in PMPI_Init_thread ()
from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmpi.so.40
#10 0x0000000000405b55 in main ()
--8<---------------cut here---------------end--------------->8---
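To see at a glance which package each frame of the backtrace above belongs to, a small script can pair every `#N` frame line with the `/gnu/store` path that gdb prints beneath it. This is only a sketch: the function name `frames_by_package` and the regular expressions are hypothetical helpers, and it assumes the gdb output is laid out exactly as in this report (a frame line followed by an optional `from /gnu/store/...` line). The sample input is an excerpt of the frames quoted above.

```python
import re

# Hypothetical helper patterns, written for the gdb output shown in this report.
# A frame line looks like: "#0 0x00007f93b2789e88 in ibv_cmd_create_cq ()".
FRAME_RE = re.compile(r"^#(?P<num>\d+)\s+(?:0x[0-9a-f]+ in )?(?P<func>[^\s(]+)")
# The following line looks like: "from /gnu/store/<hash>-<name>-<version>/lib/...".
STORE_RE = re.compile(r"from /gnu/store/[0-9a-z]+-(?P<pkg>[^/]+)/")


def frames_by_package(backtrace):
    """Return a list of (frame number, function, package-or-None) tuples."""
    frames = []
    for raw in backtrace.splitlines():
        line = raw.strip()
        frame = FRAME_RE.match(line)
        if frame:
            # Start a new frame; its package is unknown until a "from" line follows.
            frames.append([int(frame.group("num")), frame.group("func"), None])
            continue
        store = STORE_RE.search(line)
        if store and frames:
            # Attach the store item name to the most recent frame.
            frames[-1][2] = store.group("pkg")
    return [tuple(f) for f in frames]


# Excerpt of the backtrace from this report.
SAMPLE = """\
#0 0x00007f93b2789e88 in ibv_cmd_create_cq ()
from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs.so.1
#1 0x00007f93b28c57bb in hfi1_create_cq ()
from /gnu/store/n52snxjsq25m1wgmm6h1v60myld8dyjr-rdma-core-33.1/lib/libibverbs/libhfi1verbs-rdmav33.so
#3 0x00007f93b27c0a55 in opal_common_verbs_qp_test ()
from /gnu/store/sk7ngrmr529050qx4nn545lfcxxqkh6h-openmpi-4.0.5/lib/libmca_common_verbs.so.40
#10 0x0000000000405b55 in main ()
"""

if __name__ == "__main__":
    for num, func, pkg in frames_by_package(SAMPLE):
        print(f"#{num:<3} {func:<32} {pkg or '(main program)'}")
```

Run on the full backtrace, this shows that the innermost frames (#0 to #2) all come from rdma-core-33.1, while the callers from frame #3 upward belong to openmpi-4.0.5.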
Conversely, a pre-upgrade commit works fine:
--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=c2538db5617032788ac2f140496d00d8107579c8 -- environment --pure --ad-hoc openmpi openssh intel-mpi-benchmarks -- mpiexec -np 2 IMB-MPI1 PingPong
--8<---------------cut here---------------end--------------->8---
Does that ring a bell?
Thanks,
Ludo’.
¹ https://git.savannah.gnu.org/cgit/guix.git/commit/?id=c2739c0801ebc5461564e862ce8f08405e2782dc
This bug report was last modified 4 years and 165 days ago.