GNU bug report logs - #43802
Knot: Linker runs very slowly and crashes during build
Reported by: Simon South <simon <at> simonsouth.net>
Date: Sun, 4 Oct 2020 21:01:02 UTC
Severity: normal
Tags: notabug
Done: Simon South <simon <at> simonsouth.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 43802 in the body.
You can then email your comments to 43802 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-guix <at> gnu.org: bug#43802; Package guix. (Sun, 04 Oct 2020 21:01:02 GMT)
Acknowledgement sent to Simon South <simon <at> simonsouth.net>: New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sun, 04 Oct 2020 21:01:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Building Knot 3.0.0 using "guix build knot" consistently appears to hang
for me when it gets to this point during the linking stage:
CCLD knsec3hash
ar: `u' modifier ignored since `D' is the default (see `U')
CCLD kdig
CCLD khost
While it sits here the compiler is tying up 100% of a single CPU
core. On my ROCK64 with 4 GB of RAM, it eventually crashes with an
internal error:
gcc: internal compiler error: Killed (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
make[3]: *** [Makefile:5381: libzscanner/la-scanner.lo] Error 1
make[3]: Leaving directory '/tmp/guix-build-knot-3.0.0.drv-0/knot-3.0.0/src'
dmesg shows the compiler was killed for running out of memory:
cc1 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
CPU: 2 PID: 22340 Comm: cc1 Not tainted 5.8.11-gnu #1
(...)
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=cc1,pid=22340,uid=999
Out of memory: Killed process 22340 (cc1) total-vm:2573780kB, anon-rss:2540708kB, file-rss:0kB, shmem-rss:0kB, UID:999 pgtables:5044kB oom_score_adj:0
oom_reaper: reaped process 22340 (cc1), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
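To put the kernel's figures in more familiar units, here is a minimal sketch that parses the "Out of memory" line quoted above and converts the anonymous RSS to GiB. The function name `parse_oom_line` and the regular expressions are illustrative assumptions, not part of any existing tool, and they are only checked against this one log line.

```python
import re

# The "Out of memory" line exactly as it appears in the dmesg output above.
OOM_LINE = (
    "Out of memory: Killed process 22340 (cc1) total-vm:2573780kB, "
    "anon-rss:2540708kB, file-rss:0kB, shmem-rss:0kB, UID:999 "
    "pgtables:5044kB oom_score_adj:0"
)


def parse_oom_line(line):
    """Extract the PID, command name and kB-valued fields from an OOM line.

    Hypothetical helper: the patterns are written for the line above only.
    """
    match = re.search(r"Killed process (\d+) \(([^)]+)\)", line)
    if match is None:
        raise ValueError("not an OOM-killer 'Killed process' line")
    # Collect every "name:<number>kB" pair, e.g. anon-rss:2540708kB.
    sizes_kib = {name: int(value)
                 for name, value in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": int(match.group(1)),
            "comm": match.group(2),
            "sizes_kib": sizes_kib}


info = parse_oom_line(OOM_LINE)
anon_gib = info["sizes_kib"]["anon-rss"] / (1024 * 1024)
print(f"{info['comm']} (pid {info['pid']}) held {anon_gib:.2f} GiB of anonymous memory")
```

On a board with 4 GB of RAM, a single cc1 process of this size leaves little room for the rest of the system, which is consistent with the kill reported above.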
On my x86_64 machine the build eventually completes (that machine has
much more memory), but there is the same, weirdly long delay during
linking while the compiler runs.
I see no such delay however when I build the code "manually", using
"guix environment --pure knot" or even "guix environment --no-grafts
--container knot" as the manual suggests. The build then completes
quickly and successfully on either machine; the problem appears to
happen only when guix-daemon is involved.
Is there a known issue that can cause the linker to consume orders of
magnitude more resources when run by the Guix build process?
Apart from rebuilding gcc with debugging symbols (which seems to make
Guix want to rebuild every other package in the system as well) and
trying to understand what the compiler is doing, how might I go about
diagnosing this?
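One low-effort way to compare the two environments, short of rebuilding gcc, is to measure the peak memory of a single build command in each. The sketch below is an assumption-laden illustration: `peak_child_rss` is a hypothetical helper, and the command it runs here is a stand-in Python process rather than the real compiler invocation. It relies on `resource.getrusage`, whose `ru_maxrss` field is reported in kilobytes on Linux.

```python
import resource
import subprocess
import sys


def peak_child_rss(argv):
    """Run argv to completion and return the largest peak RSS seen so far
    among this process's waited-for children (kilobytes on Linux)."""
    subprocess.run(argv, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


# Stand-in workload: a child process that allocates and fills 64 MiB.
# In practice this would be replaced by the command under investigation.
demo_command = [sys.executable, "-c", "buf = b'x' * (64 * 1024 * 1024)"]
peak = peak_child_rss(demo_command)
print(f"peak RSS of child: {peak} (kilobytes on Linux)")
```

Running the same measurement inside and outside the daemon's build environment would show whether the compiler is really being given different work to do.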
--
Simon South
simon <at> simonsouth.net
Information forwarded to bug-guix <at> gnu.org: bug#43802; Package guix. (Sun, 04 Oct 2020 23:06:02 GMT)
Message #8 received at 43802 <at> debbugs.gnu.org:
So naturally, as soon as I submit the bug report something occurs to me
that gets me unstuck.
The delay and crash are occurring while libtool is using gcc to compile
src/libzscanner/scanner.c, which appears to be generated at build time
from the file scanner.c.t0 in the same directory.
When I build Knot on my own, scanner.c has a size of 272 KB. When guix
builds it, scanner.c somehow balloons out to 1.9 MB! So naturally gcc is
going to need some time and space to make its way through all that code.
In fact the build process actually points out
NOTE: Compilation of scanner.c can take several minutes!
So perhaps all this is completely expected. Still... 1.9 MB. Of C
code. It's tempting to think something is going wrong here. (And anyway,
why the huge discrepancy in file size?)
I'm investigating.
--
Simon South
simon <at> simonsouth.net
Information forwarded to bug-guix <at> gnu.org: bug#43802; Package guix. (Mon, 05 Oct 2020 00:14:02 GMT)
Message #11 received at 43802 <at> debbugs.gnu.org:
Turns out this is not a bug. Knot ships with two parser implementations:
A smaller, slower one (272 KB) and a larger, faster one (1.9 MB). The
larger one is a bit too big to build reliably on systems with 4 GB or
less of available memory.
To test Knot on these machines, you can run "configure" with
"--disable-fastparser" as an argument (or edit gnu/packages/dns.scm to
do so) to force it to use the smaller parser. This also allows the build
to complete more quickly on systems that can use either.
So how was I getting the smaller implementation in my own builds without
realizing it? The configure script has some magical behaviour: It will
automatically select the faster-building implementation if it finds a
".git" folder in the current directory. This is presumably meant to help
developers, but the confusion it caused me demonstrates why I think this
sort of magical programming is bad practice.
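The selection rule described above can be modelled as follows. This is a sketch of the behaviour as reported in this message, not a transcription of Knot's actual configure script; the function `choose_parser` and its return values are hypothetical.

```python
import tempfile
from pathlib import Path


def choose_parser(source_dir, disable_fastparser=False):
    """Model the reported rule for picking a parser implementation.

    Returns "slow" for the smaller, faster-building parser (272 KB) and
    "fast" for the larger one (1.9 MB). Hypothetical illustration only.
    """
    if disable_fastparser:
        # Explicit request, as with "--disable-fastparser".
        return "slow"
    if (Path(source_dir) / ".git").is_dir():
        # A Git checkout is treated as a developer build.
        return "slow"
    return "fast"


with tempfile.TemporaryDirectory() as tarball_tree, \
        tempfile.TemporaryDirectory() as git_checkout:
    (Path(git_checkout) / ".git").mkdir()
    tarball_choice = choose_parser(tarball_tree)
    checkout_choice = choose_parser(git_checkout)
    forced_choice = choose_parser(tarball_tree, disable_fastparser=True)

print(tarball_choice, checkout_choice, forced_choice)  # fast slow slow
```

Under this model a build from an unpacked tarball with no ".git" folder gets the large parser, while a build in a Git checkout silently gets the small one, which matches the discrepancy seen between the two kinds of build.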
At any rate, this bug report can be closed.
--
Simon South
simon <at> simonsouth.net
Added tag(s) notabug. Request was from Simon South <simon <at> simonsouth.net> to control <at> debbugs.gnu.org. (Mon, 05 Oct 2020 00:21:01 GMT)
bug closed, send any further explanations to 43802 <at> debbugs.gnu.org and Simon South <simon <at> simonsouth.net>. Request was from Simon South <simon <at> simonsouth.net> to control <at> debbugs.gnu.org. (Mon, 05 Oct 2020 00:21:02 GMT)
Information forwarded to bug-guix <at> gnu.org: bug#43802; Package guix. (Mon, 05 Oct 2020 14:17:01 GMT)
Message #18 received at 43802 <at> debbugs.gnu.org:
Hi,
Simon South <simon <at> simonsouth.net> writes:
> Building Knot 3.0.0 using "guix build knot" consistently appears to hang
> for me when it gets to this point during the linking stage:
>
> CCLD knsec3hash
> ar: `u' modifier ignored since `D' is the default (see `U')
> CCLD kdig
> CCLD khost
>
> While it sits here the compiler is tying up 100% of a single CPU
> core. On my ROCK64 with 4 GB of RAM, it eventually crashes with an
> internal error:
>
> gcc: internal compiler error: Killed (program cc1)
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <https://gcc.gnu.org/bugs/> for instructions.
> make[3]: *** [Makefile:5381: libzscanner/la-scanner.lo] Error 1
> make[3]: Leaving directory '/tmp/guix-build-knot-3.0.0.drv-0/knot-3.0.0/src'
>
> dmesg shows the compiler was killed for running out of memory:
>
> cc1 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
> CPU: 2 PID: 22340 Comm: cc1 Not tainted 5.8.11-gnu #1
> (...)
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=cc1,pid=22340,uid=999
> Out of memory: Killed process 22340 (cc1) total-vm:2573780kB, anon-rss:2540708kB, file-rss:0kB, shmem-rss:0kB, UID:999 pgtables:5044kB oom_score_adj:0
> oom_reaper: reaped process 22340 (cc1), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> On my x86_64 machine the build eventually completes (that machine has
> much more memory), but there is the same, weirdly long delay during
> linking while the compiler runs.
Is this an LTO build (with ‘-flto’ in the compile and link flags)? That
could explain the memory requirements.
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#43802; Package guix. (Mon, 05 Oct 2020 15:27:02 GMT)
Message #21 received at 43802 <at> debbugs.gnu.org:
Simon,
Would it make sense to provide a faster-building slower-starting
Knot variant alongside the main package?
Ludovic Courtès writes:
> Is this an LTO build (with ‘-flto’ in the compile and link flags)? That
> could explain the memory requirements.
No, but good guess.
Simon South writes:
> Turns out this is not a bug.
The fast parser is written in Ragel[0], which compiles down to
almost 2 MiB of ‘C’, which is then thrown at GCC to sort out. I
know to put the kettle on before hacking on Knot locally.
What I didn't know was that these generated C files were included
in the release tarball. We have the Ragel, we can rebuild them,
and we now do so in commit
2b73e50c31a61b5dcef35a1e4b9484d9dbcb0fbc. Thanks for bringing it
to my attention.
Kind regards,
T G-R
[0]: http://www.colm.net/open-source/ragel/
Information forwarded to bug-guix <at> gnu.org: bug#43802; Package guix. (Mon, 05 Oct 2020 15:49:02 GMT)
Message #24 received at 43802 <at> debbugs.gnu.org:
Tobias Geerinckx-Rice <me <at> tobias.gr> writes:
> Would it make sense to provide a faster-building slower-starting Knot
> variant alongside the main package?
I'm inclined to say "no", especially if we assume a substitute will
(nearly always) be available.
Unless someone is hacking on the scanner directly it ought to be safe to
add "--disable-fastparser" to dns.scm temporarily during testing, then
remove it before submitting a patch. If it isn't then probably _that_ is
the bug to be fixed.
> What I didn't know was that these generated C files were included in
> the release tarball. We have the Ragel, we can rebuild them, and we
> now do so in commit 2b73e50c31a61b5dcef35a1e4b9484d9dbcb0fbc.
Neat!
--
Simon South
simon <at> simonsouth.net
Information forwarded to bug-guix <at> gnu.org: bug#43802; Package guix. (Wed, 07 Oct 2020 22:07:01 GMT)
Message #27 received at 43802 <at> debbugs.gnu.org:
Simon South <simon <at> simonsouth.net> writes:
>> What I didn't know was that these generated C files were included in
>> the release tarball. We have the Ragel, we can rebuild them, and we
>> now do so in commit 2b73e50c31a61b5dcef35a1e4b9484d9dbcb0fbc.
>
> Neat!
+1, yay for bootstrapping!
Ludo’.
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 05 Nov 2020 12:24:03 GMT)
This bug report was last modified 4 years and 285 days ago.