Package: guix
Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Date: Mon, 1 Nov 2021 03:08:02 UTC
Severity: important
From: Ludovic Courtès <ludo <at> gnu.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: Ricardo Wurmus <rekado <at> elephly.net>, 51536 <at> debbugs.gnu.org
Subject: bug#51536: openblas builds not reproducible on different x86_64 machines
Date: Thu, 03 Feb 2022 00:13:33 +0100
[Message part 1 (text/plain, inline)]
Hi!

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> wrote:

> Our OpenBLAS package uses DYNAMIC_ARCH=1 to provide optimizations for
> all supported targets, at least of x86 and x86_64.  In theory that seems
> OK, but in practice the builds differ depending on the host CPU.

What follows is the log of an investigation that didn’t find the root cause, but perhaps it’ll give us ideas…

Right now the build results of ci.guix and bordeaux.guix differ:

--8<---------------cut here---------------start------------->8---
$ guix describe
Generation 202	Jan 30 2022 23:57:03	(current)
  guix 43dd34c
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 43dd34c7777a212c99a97da7a2c237158faa9a1b
ludo <at> ribbon ~/src/guix$ guix challenge openblas
/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18 contents differ:
  no local build for '/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18'
  https://ci.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18: 0m1jlc26yrwxn8gxwpj8452kw4g84ywclh0hnab93873ifz87s5c
  https://bordeaux.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18: 1d0m9v3kpsqzplpl1law2lfhm6rrbhkkqsvh19dlg9wx45vbbvjb
  differing file:
    /lib/libopenblasp-r0.3.18.so

1 store items were analyzed:
  - 0 (0.0%) were identical
  - 1 (100.0%) differed
  - 0 (0.0%) were inconclusive
--8<---------------cut here---------------end--------------->8---

To get an idea, I thought we could compare the two build logs:

  https://ci.guix.gnu.org/log/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18
  https://bordeaux.guix.gnu.org/build/3fab433c-e7d3-498d-86f8-4bcd5da9c4db

(Protip: I found the second one via <http://data.guix.gnu.org/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18>.)

The “ar -ru ../libopenblasp-r0.3.18.a …” commands appear to be the same in both logs, which rules out the simple case of unsorted .o files.
The .so on ci.guix is slightly bigger (by 102,400 bytes):

--8<---------------cut here---------------start------------->8---
$ wget -qO - https://ci.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18| lzip -d | guix archive -x /tmp/o1
$ wget -qO - https://bordeaux.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18| lzip -d | guix archive -x /tmp/o2
$ ls -l /tmp/{o1,o2}/lib/libopenblasp-r0.3.18.so
-r-xr-xr-x 1 ludo users 40538768 Jan 1 1970 /tmp/o1/lib/libopenblasp-r0.3.18.so
-r-xr-xr-x 1 ludo users 40436368 Jan 1 1970 /tmp/o2/lib/libopenblasp-r0.3.18.so
--8<---------------cut here---------------end--------------->8---

Both export the same dynamic symbols, though, and in the same order:

--8<---------------cut here---------------start------------->8---
$ diff -u <(objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so |cut -c 60- ) <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so |cut -c60- )
$ echo $?
0
--8<---------------cut here---------------end--------------->8---

… which suggests they include code optimized for the same micro-architectures, since symbol names include the name of the micro-architecture:

--8<---------------cut here---------------start------------->8---
$ objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so |cut -c 60-|tail -10
csymm3m_RU
cgemv_c_BARCELONA
csymv_U_HASWELL
dtrmm_iltncopy_CORE2
LAPACKE_dsytrs2
openblas_num_threads_env
csycon_rook_
csytri_rook_
--8<---------------cut here---------------end--------------->8---

Some of the offsets differ, though:
[Message part 2 (text/x-patch, inline)]
$ diff -u <(objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so ) <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so )
--- /dev/fd/63	2022-02-03 00:10:17.308357982 +0100
+++ /dev/fd/62	2022-02-03 00:10:17.276357923 +0100
@@ -1,5 +1,5 @@
 
-/tmp/o1/lib/libopenblasp-r0.3.18.so: file format elf64-x86-64
+/tmp/o2/lib/libopenblasp-r0.3.18.so: file format elf64-x86-64
 
 DYNAMIC SYMBOL TABLE:
 0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.2 pthread_cond_signal
@@ -91,57 +91,57 @@
 00000000013edb70 g DF .text 00000000000001be Base zgemm3m_incopyb_BULLDOZER
 0000000000e6d200 g DF .text 0000000000002b06 Base strsm_kernel_RT_BOBCAT
 0000000000512c00 g DF .text 0000000000000a0a Base zsymv_U_PRESCOTT
-00000000023c7530 g DF .text 0000000000000201 Base LAPACKE_dpttrs_work
+00000000023ae930 g DF .text 0000000000000201 Base LAPACKE_dpttrs_work
 0000000000692000 g DF .text 0000000000000b89 Base srot_k_PENRYN
 000000000179caa0 g DF .text 0000000000000200 Base dgemm_beta_HASWELL
 0000000000a44690 g DF .text 00000000000004b4 Base dtrsm_iutucopy_OPTERON
-000000000231cfc0 g DF .text 000000000000021d Base LAPACKE_sstein_work
-0000000002327800 g DF .text 000000000000014b Base LAPACKE_ssytrd
-0000000001ad9100 g DF .text 00000000000002aa Base chemm_outcopy_SKYLAKEX
+00000000023043c0 g DF .text 000000000000021d Base LAPACKE_sstein_work
+000000000230ec00 g DF .text 000000000000014b Base LAPACKE_ssytrd
+0000000001acc900 g DF .text 00000000000002aa Base chemm_outcopy_SKYLAKEX
 00000000017d6c10 g DF .text 0000000000000c38 Base cgemv_n_HASWELL
-0000000002327b70 g DF .text 0000000000000143 Base LAPACKE_ssytrf
+000000000230ef70 g DF .text 0000000000000143 Base LAPACKE_ssytrf
 000000000018f010 g DF .text 000000000000025c Base cblas_stbmv
 0000000000195a20 g DF .text 000000000000003b Base cblas_idamin
-0000000002328d40 g DF .text 0000000000000101 Base LAPACKE_ssytri
+0000000002310140 g DF .text 0000000000000101 Base LAPACKE_ssytri
 000000000077be00 g DF .text 0000000000000e65 Base ztrsm_kernel_RN_PENRYN
 0000000001583f20 g DF .text 0000000000001c22 Base dtrmm_iltucopy_STEAMROLLER
-00000000021bf830 g DF .text 0000000000000527 Base ztbcon_
-0000000001a70630 g DF .text 00000000000001c7 Base dsymm_oltcopy_SKYLAKEX
-000000000245a910 g DF .text 000000000000001b Base LAPACKE_zpp_nancheck
+00000000021a6c30 g DF .text 0000000000000527 Base ztbcon_
+0000000001a640c0 g DF .text 000000000000066d Base dsymm_oltcopy_SKYLAKEX
+0000000002441d10 g DF .text 000000000000001b Base LAPACKE_zpp_nancheck
 000000000108ee20 g DF .text 000000000000014d Base zgemm3m_oncopyb_ATOM
-0000000002409df0 g DF .text 000000000000035c Base LAPACKE_zgtsvx_work
-0000000001e7d120 g DF .text 0000000000001743 Base dlatrs_
-0000000001e948a0 g DF .text 00000000000001d1 Base drscl_
+00000000023f11f0 g DF .text 000000000000035c Base LAPACKE_zgtsvx_work
+0000000001e64520 g DF .text 0000000000001743 Base dlatrs_
+0000000001e7bca0 g DF .text 00000000000001d1 Base drscl_
 00000000019ac700 g DF .text 00000000000004bd Base zhemm3m_iucopyb_ZEN
 00000000003c0f30 g DF .text 000000000000001e Base support_avx512_bf16
-0000000002329ac0 g DF .text 0000000000000107 Base LAPACKE_ssytrs
+0000000002310ec0 g DF .text 0000000000000107 Base LAPACKE_ssytrs
 0000000000f94890 g DF .text 00000000000002d3 Base ztrmm_oltncopy_BOBCAT
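A plain `diff -u` of the full `objdump -T` output mixes symbols that merely moved with symbols whose code actually changed. In the excerpt above, for instance, the LAPACK(E) symbols keep their size but sit 0x18c00 bytes lower in the bordeaux build, whereas `dsymm_oltcopy_SKYLAKEX` also changes size (0x1c7 vs. 0x66d). Below is a minimal Python sketch for separating the two cases; it is not part of any Guix tooling, the script name `symdiff.py` is hypothetical, and it assumes the usual `objdump -T` layout (address first, size right after the section name, symbol name last) and unique symbol names.

```python
import sys
from collections import Counter


def parse_objdump_t(text):
    """Map symbol name -> (address, size) for symbols defined in .text.

    Assumes the usual `objdump -T` layout:
    ADDR FLAGS... .text SIZE VERSION NAME
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if ".text" not in fields:
            continue  # headers, *UND* imports, other sections
        i = fields.index(".text")
        try:
            addr = int(fields[0], 16)
            size = int(fields[i + 1], 16)
        except (ValueError, IndexError):
            continue  # not a symbol line this sketch understands
        table[fields[-1]] = (addr, size)
    return table


def classify(old, new):
    """Split differences into moved-only symbols and resized symbols.

    moved:   name -> address delta (new - old), size unchanged
    resized: name -> (old_size, new_size)
    """
    moved, resized = {}, {}
    for name in sorted(old.keys() & new.keys()):
        (a0, s0), (a1, s1) = old[name], new[name]
        if s0 != s1:
            resized[name] = (s0, s1)
        elif a0 != a1:
            moved[name] = a1 - a0
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    return moved, resized, only_old, only_new


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        old = parse_objdump_t(f1.read())
        new = parse_objdump_t(f2.read())
    moved, resized, only_old, only_new = classify(old, new)
    # A few distinct deltas shared by many symbols hint that whole regions
    # shifted because something earlier in .text grew or shrank.
    for delta, count in Counter(moved.values()).most_common():
        print(f"shift {delta:+#x}: {count} symbols")
    for name, (s0, s1) in resized.items():
        print(f"size {s0:#x} -> {s1:#x}: {name}")
    print(f"only in first: {len(only_old)}, only in second: {len(only_new)}")
```

Usage would be along the lines of `objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so > o1.txt`, the same for `o2`, then `python3 symdiff.py o1.txt o2.txt`. Symbols listed under "size" are the ones whose machine code differs, which would narrow down which kernels to disassemble and compare.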
[Message part 3 (text/plain, inline)]
On #guix-hpc, Ricardo mentioned that he had encountered this reproducibility issue earlier.

Ludo’.