GNU bug report logs - #77613
grep-3.11.69-a4628 on GNU/Hurd

Package: grep

Reported by: Bruno Haible <bruno <at> clisp.org>

Date: Mon, 7 Apr 2025 17:34:02 UTC

Severity: normal

To reply to this bug, email your comments to 77613 AT debbugs.gnu.org.

Report forwarded to bug-grep <at> gnu.org:
bug#77613; Package grep. (Mon, 07 Apr 2025 17:34:02 GMT)

Acknowledgement sent to Bruno Haible <bruno <at> clisp.org>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 07 Apr 2025 17:34:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Bruno Haible <bruno <at> clisp.org>
To: bug-grep <at> gnu.org
Subject: Re: grep-3.11.69-a4628 on GNU/Hurd
Date: Mon, 07 Apr 2025 19:33:00 +0200
On
  - GNU/Hurd x86_64 from 2024,
  - GNU/Hurd i386 from 2023,
I see a test hang: hash-collision-perf.

On GNU/Hurd x86_64:

When I interrupted the build, the file 'in' had 5120000 lines. The log
file of this test is attached. As you can see, the value of small_ms
stays 0 even for larger files.

By running
  $ date; LC_ALL=C ../../src/grep --file=in empty; date
I can see that the execution times grow like this:
  640000  0.3 sec
 1280000  0.9 sec
 2560000  1.5 sec
 5120000  > 60 sec
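
The date-based timing above can be wrapped in a small helper. This is
only a sketch: the name elapsed_ms is hypothetical and not part of
grep's test suite, and it assumes GNU date (for %N) and a shell with
64-bit arithmetic. It reports wall-clock time, not whatever the test's
perl primitive reports.

```shell
# Hypothetical helper (not from grep's test suite): print the wall-clock
# time, in milliseconds, that a command takes.
# Assumes GNU date (%N = nanoseconds) and 64-bit shell arithmetic.
elapsed_ms() {
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  echo $(( ($end - $start) / 1000000 ))
}

# Usage, following the commands in this report:
#   seq 640000 > in; : > empty
#   elapsed_ms env LC_ALL=C ../../src/grep --file=in empty
```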

On GNU/Hurd i386, it's similar. Here the grep execution hangs when the
file 'in' has 40960000 lines. Attached is the last stack trace I was
able to obtain before it hung.

Regardless of how much RAM I give to the machine, there will always
be a point where "grep --file=in empty" will take more RAM than
available, and (since Hurd does not have an OOM killer) the machine
then hangs.

IMO, the correct behaviour would be that 'grep' exits via xalloc_die(),
not that it hangs.

By contrast, on GNU/Linux (on a machine that has the same amount of RAM
as the GNU/Hurd machine):

  $ : > empty
  $ seq 640000 > in; LC_ALL=C time ./src/grep --file=in empty
  real 0.44s
  $ seq 1280000 > in; LC_ALL=C time ./src/grep --file=in empty
  real 0.99s
  $ seq 2560000 > in; LC_ALL=C time ./src/grep --file=in empty
  real 2.22s
  $ seq 5120000 > in; LC_ALL=C time ./src/grep --file=in empty
  real 4.84s
  $ seq 10240000 > in; LC_ALL=C time ./src/grep --file=in empty
  real 24.19s
  $ seq 20480000 > in; LC_ALL=C time ./src/grep --file=in empty
  Killed
  real 24.40s

Here it was the OOM killer that saved the machine from hanging.

So, IMO, there are two bugs:

  1) When the allocation of the kwset takes more memory than available,
     'grep' should exit via xalloc_die(), instead of waiting to be killed
     by the OOM killer.

  2) The 'hash-collision-perf' unit test measures the execution time of
     a child process with a perl primitive that is not properly ported
     to GNU/Hurd.
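
Point 1 can be probed without risking the whole machine by capping the
address space of a single grep run, so that an oversized allocation
fails inside the process instead of exhausting system memory. A sketch,
assuming the shell's 'ulimit -v' is supported and enforced on the
platform (not verified on GNU/Hurd here); run_with_vm_limit is a
hypothetical name, and the 1 GiB cap is arbitrary.

```shell
# Hypothetical helper: run a command with its virtual memory capped at
# LIMIT_KB kibibytes. The subshell keeps the limit from affecting the caller.
# Assumes the shell implements `ulimit -v` and the kernel enforces it.
run_with_vm_limit() {
  limit_kb=$1
  shift
  (
    ulimit -v "$limit_kb" || exit 125   # limit could not be set
    "$@"
  )
}

# Usage, with an arbitrary 1 GiB cap:
#   seq 20480000 > in; : > empty
#   run_with_vm_limit 1048576 env LC_ALL=C ./src/grep --file=in empty
#   echo "exit status: $?"
```

With such a cap in place, one would expect a failed allocation to show
up as an error exit from grep, not as a kill by the OOM killer or a
machine hang.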

Bruno
[hash-collision-perf.log (text/x-log, attachment)]
[last-stacktrace.png (image/png, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#77613; Package grep. (Tue, 08 Apr 2025 05:49:02 GMT)

Message #8 received at 77613 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Bruno Haible <bruno <at> clisp.org>
Cc: 77613 <at> debbugs.gnu.org
Subject: Re: bug#77613: grep-3.11.69-a4628 on GNU/Hurd
Date: Mon, 7 Apr 2025 22:47:45 -0700

Thanks for reporting that!
Adding a timeout should resolve this. Expect to push tomorrow:
[gr-Hurd-hang.diff (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#77613; Package grep. (Tue, 08 Apr 2025 07:55:01 GMT)

Message #11 received at 77613 <at> debbugs.gnu.org (full text, mbox):

From: Bruno Haible <bruno <at> clisp.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 77613 <at> debbugs.gnu.org
Subject: Re: bug#77613: grep-3.11.69-a4628 on GNU/Hurd
Date: Tue, 08 Apr 2025 09:54:33 +0200
Hi Jim,

> > So, IMO, there are two bugs:
> >
> >   1) When the allocation of the kwset takes more memory than available,
> >      'grep' should exit via xalloc_die(), instead of waiting to be killed
> >      by the OOM killer.
> >
> >   2) In the 'hash-collision-perf' unit test: The use of a perl primitive
> >      for measuring the execution time of a child process, that is not
> >      properly ported to GNU/Hurd.
> 
> Thanks for reporting that!
> Adding a timeout should resolve this. Expect to push tomorrow:

No, it does not resolve the problem.

On both of my Hurd machines, with the patch, the 'hash-collision-perf'
unit test is still running after 20 minutes.
On the 32-bit Hurd machine, a 'grep --file=in empty' command crashed with
signal 6 (SIGABRT); see attached screenshot.
Both machines are unresponsive and need to be rebooted.
[hurd-hang.png (image/png, inline)]

Information forwarded to bug-grep <at> gnu.org:
bug#77613; Package grep. (Tue, 08 Apr 2025 14:28:02 GMT)

Message #14 received at 77613 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Bruno Haible <bruno <at> clisp.org>
Cc: 77613 <at> debbugs.gnu.org
Subject: Re: bug#77613: grep-3.11.69-a4628 on GNU/Hurd
Date: Tue, 8 Apr 2025 07:26:40 -0700
On Tue, Apr 8, 2025 at 12:54 AM Bruno Haible <bruno <at> clisp.org> wrote:
> Hi Jim,
>
> > > So, IMO, there are two bugs:
> > >
> > >   1) When the allocation of the kwset takes more memory than available,
> > >      'grep' should exit via xalloc_die(), instead of waiting to be killed
> > >      by the OOM killer.
> > >
> > >   2) In the 'hash-collision-perf' unit test: The use of a perl primitive
> > >      for measuring the execution time of a child process, that is not
> > >      properly ported to GNU/Hurd.
> >
> > Thanks for reporting that!
> > Adding a timeout should resolve this. Expect to push tomorrow:
>
> No, it does not resolve the problem.
>
> In both of my Hurd machines, with the patch, the 'hash-collision-perf'
> unit test is still running after 20 minutes.
> In the Hurd (32-bit) machine, a 'grep --file=in empty' command crashed from
> signal 6 (SIGABRT); see attached screenshot.
> Both machines are unresponsive and need to be rebooted.

Oh! Sorry. I made only the final invocation use the timeout. Must use
it in the loop, too.
Here's a better patch:
[gr-Hurd-hang.diff (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#77613; Package grep. (Tue, 08 Apr 2025 15:03:02 GMT)

Message #17 received at 77613 <at> debbugs.gnu.org (full text, mbox):

From: Bruno Haible <bruno <at> clisp.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 77613 <at> debbugs.gnu.org
Subject: Re: bug#77613: grep-3.11.69-a4628 on GNU/Hurd
Date: Tue, 08 Apr 2025 17:02:11 +0200
Jim Meyering wrote:
> Oh! Sorry. I made only the final invocation use the timeout. Must use
> it in the loop, too.
> Here's a better patch:

The 'hash-collision-perf' test still hangs both of my GNU/Hurd machines.
One of the machines now says:
        vm_page warning: unable to recycle any page

The reason is that you terminate the loop if small_ms >= 200, but
small_ms is 0 on every iteration, so the loop never terminates.
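
The failure mode can be sketched as follows. This is not the actual
test script: find_input_size, the two timer functions, and the size cap
are hypothetical, and exit status 77 is used only because it is the
conventional Automake "skipped" status. The loop doubles the input size
until the measured time reaches 200 ms; the cap is what stops it when
the timer keeps reporting 0.

```shell
# Hypothetical sketch of a doubling loop guarded against a timer that
# always reports 0. Not the code from grep's hash-collision-perf test.
#   $1 = command that prints the time in ms for an input of N lines
#   $2 = starting N, $3 = largest N to try
# Prints the chosen N and returns 0, or returns 77 if no size reaches 200 ms.
find_input_size() {
  measure=$1
  n=$2
  max_n=$3
  while [ "$n" -le "$max_n" ]; do
    small_ms=$("$measure" "$n")
    if [ "$small_ms" -ge 200 ]; then
      echo "$n"
      return 0
    fi
    n=$((n * 2))
  done
  return 77
}

# A timer that is stuck at 0, as observed on GNU/Hurd in this report.
stuck_timer() { echo 0; }
# A made-up timer that grows with the input size (1 ms per 1000 lines).
linear_timer() { echo $(( $1 / 1000 )); }

find_input_size linear_timer 10000 5120000   # prints 320000
find_input_size stuck_timer 10000 5120000 || echo "skip: timer unusable"
```

Without the upper bound, the stuck timer would keep the loop doubling
the input until the machine runs out of memory.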

Bruno

Information forwarded to bug-grep <at> gnu.org:
bug#77613; Package grep. (Tue, 08 Apr 2025 16:20:01 GMT)

Message #20 received at 77613 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Bruno Haible <bruno <at> clisp.org>
Cc: 77613 <at> debbugs.gnu.org
Subject: Re: bug#77613: grep-3.11.69-a4628 on GNU/Hurd
Date: Tue, 8 Apr 2025 09:19:10 -0700
On Tue, Apr 8, 2025 at 8:02 AM Bruno Haible <bruno <at> clisp.org> wrote:
> Jim Meyering wrote:
> > Oh! Sorry. I made only the final invocation use the timeout. Must use
> > it in the loop, too.
> > Here's a better patch:
>
> The 'hash-collision-perf' test still hangs both of my GNU/Hurd machines.
> One of the machines now says:
>         vm_page warning: unable to recycle any page
>
> The reason is that you terminate the loop if small_ms >= 200, but
> small_ms is always 0, each time.

Thanks again. This one should do it: skipping the test in that case.
[gr-Hurd-hang-skip.diff (application/octet-stream, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#77613; Package grep. (Tue, 08 Apr 2025 16:39:01 GMT)

Message #23 received at 77613 <at> debbugs.gnu.org (full text, mbox):

From: Bruno Haible <bruno <at> clisp.org>
To: Jim Meyering <jim <at> meyering.net>
Cc: 77613 <at> debbugs.gnu.org
Subject: Re: bug#77613: grep-3.11.69-a4628 on GNU/Hurd
Date: Tue, 08 Apr 2025 18:37:57 +0200
Jim Meyering wrote:
> Thanks again. This one should do it: skipping the test in that case.

Yes, this one does it. Now "make check" proceeds through all tests.
No failure.

Bruno

This bug report was last modified 69 days ago.
