GNU bug report logs - #22624
[bug-coreutils] coreutils-8.25: big success, but problem on GNU/Hurd

Package: coreutils;

Reported by: "Nelson H. F. Beebe" <beebe <at> math.utah.edu>

Date: Wed, 10 Feb 2016 21:59:02 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


Report forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Wed, 10 Feb 2016 21:59:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Nelson H. F. Beebe" <beebe <at> math.utah.edu>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 10 Feb 2016 21:59:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Nelson H. F. Beebe" <beebe <at> math.utah.edu>
To: bug-coreutils <at> gnu.org
Cc: beebe <at> math.utah.edu
Subject: [bug-coreutils] coreutils-8.25: big success, but problem on GNU/Hurd
Date: Wed, 10 Feb 2016 14:57:53 -0700
I'm pleased to report successful builds, validations, and
installations of coreutils-8.25 on at least 72 of the 77 machines in
our lab running various flavors of Unix.

The one problematic system is GNU/Hurd, aka Debian GNU/Hurd
stretch/sid.  We ran Hurd on VMware/ESX for a couple of years, but it
was never stable, and crashed or hung every few hours.  Every such
failure requires a manual fsck on reboot, preventing automated
recovery.

Last summer, I moved Hurd to virt-manager + QEMU on my desktop, where
it has proved substantially more stable, sometimes staying up for many
days.

Debian GNU/Hurd has about 47,580 packages available in the Debian
apt-get system, so others have clearly done a lot of work on it.
There are major, and reasonably-current, packages like these available
via apt-get:

	/usr/bin/clang-3.6 --version
	Debian clang version 3.6.2-1 (tags/RELEASE_362/final) (based on LLVM 3.6.2)
	Target: i386-pc--gnu
	Thread model: posix

	/usr/bin/gcc --version
	gcc (Debian 5.2.1-26) 5.2.1 20151125

	/bin/ls --version
	ls (GNU coreutils) 8.23

With builds of coreutils-8.25 at my site, the "make check" run ALWAYS
hangs Hurd, requiring a reboot and an fsck.

I've just made further experiments that confirm that the hang always
happens in the same place, about 60 seconds after starting this
command:

	$ make check
	... lots of PASS reports, except FAIL in tests/misc/kill.sh and tests/split/filter.sh ...
	PASS: tests/split/b-chunk.sh
	PASS: tests/split/fail.sh
	PASS: tests/split/lines.sh
	line-bytes.sh: skipped test: this shell lacks ulimit support
	SKIP: tests/split/line-bytes.sh
	Timeout, server 192.168.122.66 not responding.

The default memory size is 1GB, but today I got the same results when
the VM was restarted with 2GB and with 8GB.

I have also run the "make check" in a console window, eliminating
possible network timeouts from the dataflow, with "top" running in a
separate xterm + ssh window, and got this output at the point of the
hang:

	# in console window
	SKIP: tests/split/line-bytes.sh
	no more room for vm_map_find_entry in 8022b080
	no more room for kmem_realloc in 8022b080
	/hurd/mach-defpage: panic: (default pager):

	# in simultaneous xterm window
	% top 
	top - 14:10:49 up 10 min,  8 users,  load average: 0.55, 1.33, 1.46
	Tasks:  74 total,   2 running,  69 sleeping,   0 stopped,   0 zombie
	%Cpu(s):  54.3 us,   0.0 sy,   0.0 ni,  45.7 id,   0.0 wa,   0.0 hi,   0.0 si
	KiB Mem:   1900540 total,  1550052 used,   350488 free,        0 buffers
	KiB Swap:        0 total,        0 used,        0 free.     1792 cached Mem

	  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND     
	 1015 beebe     20   0  151616    144      0 S  0.0  0.0   0:00.02 -bash       
	 1081 beebe     30  10  150060    148      0 S  0.0  0.0   0:00.00 time        

The coreutils developers should probably not view this as a coreutils
bug, because Hurd has many oddities, and the pager-panic report
definitely suggests a kernel issue.

However, because coreutils has long been built and distributed on
Hurd, I thought it would be worthwhile to at least report my
experience, in the hope that other list members with GNU/Hurd systems
might be able to report their own result with the latest coreutils.

I unfortunately do not have any spare physical hardware on which to
run GNU/Hurd; my only access to it is on virtual machines.

My desktop is currently running 18 different VMs, on top of its CentOS
7 base operating system.  Apart from GNU/Hurd, all of the others have
been perfectly stable for 4 to 6 months of operation, so I think that
it is unlikely that the above failure is due to the virtual machine
environment.  There are two significant differences, however: the
others have virtual SATA disks and are 64-bit systems, whereas Hurd
supports only (virtual) EIDE disks, and is a 32-bit system.  Our
suspicions of the instability of Hurd on VMware/ESX have to do with
the EIDE virtual disk system, which may have been less well tested
than SATA.



-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe <at> math.utah.edu  -
- 155 S 1400 E RM 233                       beebe <at> acm.org  beebe <at> computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------




Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Thu, 11 Feb 2016 02:40:01 GMT) Full text and rfc822 format available.

Message #8 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Nelson H. F. Beebe" <beebe <at> math.utah.edu>, 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Wed, 10 Feb 2016 18:39:50 -0800
On 02/10/2016 01:57 PM, Nelson H. F. Beebe wrote:
> 	SKIP: tests/split/line-bytes.sh
> 	Timeout, server 192.168.122.66 not responding.
I presume the test that crashes your system is tests/split/l-chunk.sh, 
which invokes commands like 'split -n l/10 /dev/null' and 'split -n 1/2 
/dev/zero'.

This sounds like <http://bugs.gnu.org/11424>, which was reported for 
GNU/Hurd. Most likely GNU/Hurd is reporting an st_size of OFF_T_MAX for 
/dev/zero, and this is messing up 'split'. I will look into fixing this; 
I expect that the bottom line is that split should not trust st_size for 
special files like /dev/zero.
[hurdtest.c (text/plain, attachment)]
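The point that 'split' should not trust st_size for special files can be illustrated with a minimal sketch. It is not the code from the eventual patch; the helper name size_is_trustworthy is hypothetical, and the rule it encodes (trust st_size only for regular files and symbolic links) is a simplifying assumption based on where POSIX defines st_size.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: report whether st_size carries a meaningful
   byte count for this file type.  For character devices such as
   /dev/zero the value is unspecified, so a caller should fall back
   to reading or seeking instead of trusting it.  */
static bool
size_is_trustworthy (const struct stat *st)
{
  assert (st != NULL);
  return S_ISREG (st->st_mode) || S_ISLNK (st->st_mode);
}
```

A caller would run fstat on the input and consult this check before using st_size to compute chunk boundaries.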

Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Thu, 11 Feb 2016 03:09:02 GMT) Full text and rfc822 format available.

Message #11 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: "Nelson H. F. Beebe" <beebe <at> math.utah.edu>, 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Wed, 10 Feb 2016 19:08:43 -0800
On 10/02/16 13:57, Nelson H. F. Beebe wrote:
> I'm pleased to report successful builds, validations, and
> installations of coreutils-8.25 on at least 72 of the 77 machines in
> our lab running various flavors of Unix.

Looks like we're improving well in portability :)
Many thanks for giving access to, support for,
and now verification on these machines.
It's been invaluable.

> The one problematic system is GNU/Hurd, aka Debian GNU/Hurd

Paul looks to be on the right track here.

cheers,
Pádraig.





Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Thu, 11 Feb 2016 08:32:01 GMT) Full text and rfc822 format available.

Message #14 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Paul Eggert <eggert <at> cs.ucla.edu>, "Nelson H. F. Beebe"
 <beebe <at> math.utah.edu>, 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Thu, 11 Feb 2016 09:30:31 +0100
--- hurdtest.c-ORIG     2016-02-11 09:27:57.422023914 +0100
+++ hurdtest.c  2016-02-11 09:28:29.781433313 +0100
@@ -10,7 +10,7 @@
   struct stat st;
   off_t cur_offset;
   off_t end_offset;
-  int fd = open ("/dev/zero", O_RDONLY);
+  int fd = open (file, O_RDONLY);
   printf ("file=%s\n", file);
   if (fd < 0)
     return perror ("open"), 1;
@@ -58,7 +58,7 @@
 main (int argc, char **argv)
 {
   static char dev_zero[] = "/dev/zero";
-  static char dev_null[] = "/dev/zero";
+  static char dev_null[] = "/dev/null";
   static char *dev_zero_argv[] = { dev_zero, dev_null, 0 };
   char **av = argc == 1 ? dev_zero_argv : argv + 1;
   while (*av)




Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Thu, 11 Feb 2016 16:12:01 GMT) Full text and rfc822 format available.

Message #17 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: "Nelson H. F. Beebe" <beebe <at> math.utah.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: sysstaff <at> math.utah.edu, "Nelson H. F. Beebe" <beebe <at> math.utah.edu>,
 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Thu, 11 Feb 2016 09:10:51 -0700
Thanks, Paul, for hurdtest.c and the subsequent tiny patch to it.
Here is the test on my GNU/Hurd system on virt-manager + QEMU-KVM
on top of CentOS 7:

$ cc hurdtest.c && time ./a.out
file=/dev/zero
CHR
st_size=9223372036854775807
st_blksize=8192
st_blocks=8
cur_offset=0
end_offset=9223372036854775807
pagesize=4096

file=/dev/null
CHR
st_size=0
st_blksize=1048576
st_blocks=0
cur_offset=0
end_offset=0
pagesize=4096


real    0m0.010s
user    0m0.000s
sys     0m0.000s

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe <at> math.utah.edu  -
- 155 S 1400 E RM 233                       beebe <at> acm.org  beebe <at> computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
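For readers without the attachment, a probe that prints the same fields as the output above could look roughly like the following. This is a sketch and not the attached hurdtest.c: the function names probe_file and file_kind are hypothetical, and the field order simply mirrors the transcript.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Short label for the file type, in the style of the transcript.  */
static const char *
file_kind (mode_t mode)
{
  return S_ISCHR (mode) ? "CHR" : S_ISREG (mode) ? "REG" : "OTHER";
}

/* Print stat and lseek information for FILE to OUT.
   Return 0 on success, -1 if the file cannot be opened or examined.  */
static int
probe_file (const char *file, FILE *out)
{
  assert (file != NULL && out != NULL);
  fprintf (out, "file=%s\n", file);
  int fd = open (file, O_RDONLY);
  if (fd < 0)
    return perror ("open"), -1;
  struct stat st;
  if (fstat (fd, &st) != 0)
    {
      perror ("fstat");
      close (fd);
      return -1;
    }
  /* Current offset first, then the offset the system reports for
     end-of-file; the second value is what misleads split.  */
  off_t cur_offset = lseek (fd, 0, SEEK_CUR);
  off_t end_offset = lseek (fd, 0, SEEK_END);
  fprintf (out, "%s\n", file_kind (st.st_mode));
  fprintf (out, "st_size=%jd\n", (intmax_t) st.st_size);
  fprintf (out, "st_blksize=%jd\n", (intmax_t) st.st_blksize);
  fprintf (out, "st_blocks=%jd\n", (intmax_t) st.st_blocks);
  fprintf (out, "cur_offset=%jd\n", (intmax_t) cur_offset);
  fprintf (out, "end_offset=%jd\n", (intmax_t) end_offset);
  fprintf (out, "pagesize=%ld\n\n", sysconf (_SC_PAGESIZE));
  close (fd);
  return 0;
}
```

Calling probe_file ("/dev/zero", stdout) and probe_file ("/dev/null", stdout) prints one block per file in the format shown above; the values depend on the system.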




Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Thu, 11 Feb 2016 17:44:01 GMT) Full text and rfc822 format available.

Message #20 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Nelson H. F. Beebe" <beebe <at> math.utah.edu>
Cc: sysstaff <at> math.utah.edu, 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Thu, 11 Feb 2016 09:43:11 -0800
On 02/11/2016 08:10 AM, Nelson H. F. Beebe wrote:
> end_offset=9223372036854775807
>

Thanks, that confirms my suspicions about GNU/Hurd. I'm attaching a 
proposed patch; please give it a try if you have a chance. Turned out to 
be trickier than I thought, but oh well.
[0001-split-fix-problems-with-dev-zero.patch (application/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Fri, 12 Feb 2016 04:14:02 GMT) Full text and rfc822 format available.

Message #23 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, "Nelson H. F. Beebe"
 <beebe <at> math.utah.edu>
Cc: sysstaff <at> math.utah.edu, 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Thu, 11 Feb 2016 20:13:27 -0800
On 11/02/16 09:43, Paul Eggert wrote:
> On 02/11/2016 08:10 AM, Nelson H. F. Beebe wrote:
>> end_offset=9223372036854775807
>>
> 
> Thanks, that confirms my suspicions about GNU/Hurd. I'm attaching a 
> proposed patch; please give it a try if you have a chance. Turned out to 
> be trickier than I thought, but oh well.

Thanks for working on this.
The changes look good, except for this:

  $ seq 1000 | split -n4
  $ seq 100000 | split -n4
  split: -: cannot determine file size: Illegal seek

I.E. it would be better to indicate immediately
if there is an issue determining the file size,
because it's a gotcha that may hit users as data increases,
and -n is complex enough anyway, that it's better to
do as much checking up front as possible.
I'd still disallow this case even for -n1 in case the
number was parameterized to number of CPUs or whatever.

A small point on the tests is that we use `returns_ 1 ... || fail=1`
rather than `... && fail=1` so that we catch seg faults etc. in tests.

thanks!
Pádraig
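The difference between the two test idioms comes down to how a crash shows up in the wait status. The sketch below illustrates this in C rather than shell; all function names are hypothetical. The "loose" check mirrors `cmd && fail=1`, which only objects when the command succeeds, while the "strict" check mirrors `returns_ 1 cmd || fail=1`, which demands a normal exit with exactly the expected status.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Two stand-ins for a command under test.  */
static void exit_with_one (void) { _exit (1); }
static void die_by_signal (void) { raise (SIGKILL); }

/* Run FN in a child process and return the raw wait status.  */
static int
run_in_child (void (*fn) (void))
{
  pid_t pid = fork ();
  assert (pid >= 0);
  if (pid == 0)
    {
      fn ();
      _exit (0);
    }
  int status = 0;
  pid_t waited = waitpid (pid, &status, 0);
  assert (waited == pid);
  return status;
}

/* Like `cmd && fail=1`: anything other than success passes,
   including death by a signal.  */
static bool
loose_check_passes (int status)
{
  return !(WIFEXITED (status) && WEXITSTATUS (status) == 0);
}

/* Like `returns_ N cmd || fail=1`: only a normal exit with
   status EXPECTED passes.  */
static bool
strict_check_passes (int status, int expected)
{
  return WIFEXITED (status) && WEXITSTATUS (status) == expected;
}
```

With these helpers, a child killed by a signal passes the loose check but fails the strict one, which is the case the `returns_` idiom is meant to catch.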




Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Fri, 12 Feb 2016 18:19:01 GMT) Full text and rfc822 format available.

Message #26 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 "Nelson H. F. Beebe" <beebe <at> math.utah.edu>
Cc: sysstaff <at> math.utah.edu, 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Fri, 12 Feb 2016 10:18:14 -0800
On 02/11/2016 08:13 PM, Pádraig Brady wrote:
> The changes look good, except for this:
>
>    $ seq 1000 | split -n4
>    $ seq 100000 | split -n4
>    split: -: cannot determine file size: Illegal seek
>
> I.E. it would be better to indicate immediately
> if there is an issue determining the file size,
> because it's a gotcha that may hit users as data increases,
> and -n is complex enough anyway, that it's better to
> do as much checking up front as possible.
> I'd still disallow this case even for -n1 in case the
> number was parameterized to number of CPUs or whatever.

Hmm, well, I already spent too much time on this so I think I'll check 
in what I have (since it fixes the GNU/Hurd problem) and let it 
percolate a bit first.

I have some qualms about the approach suggested above, as it would cause 
'split' to give up on files that it currently handles (e.g., typical 
files in /proc), on the theory that we don't want to spoil users into 
thinking that 'split' can handle larger files. It'd be better to fix 
'split' to handle the larger files. It could do this for a troublesome 
case (e.g., a large /proc file) by copying the file's data into the 
first output file F1, then doing a split-in-place from F1 to the 
remaining output files F2 ... Fn (this would be done by copying to F2 
... Fn and then truncating F1). If the input file is /dev/zero, though, 
'split' should just give up right away as it does now, as there's no 
point in copying forever. Anyway, I view this as relatively low 
priority, as the troublesome cases should be quite rare in practice.

> A small point on the tests is that we use `returns_ 1 ... || fail=1`
> rather than `... && fail=1` so that we catch seg faults etc. in tests.
Thanks, I fixed that before installing the patch.
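The split-in-place idea described above can be sketched as follows. This is only an illustration of the proposal, not code from coreutils: the names split_in_place and copy_range_to_file are hypothetical, F1 is assumed to already hold all of the input, and the chunk sizing (equal chunks, with the remainder going to the last file) is an assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy LEN bytes starting at offset OFF of SRC_FD into a new file NAME.
   Return 0 on success, -1 on error (a short write counts as an error).  */
static int
copy_range_to_file (int src_fd, off_t off, off_t len, const char *name)
{
  int dst = open (name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  if (dst < 0)
    return -1;
  char buf[65536];
  int rc = 0;
  while (len > 0)
    {
      size_t want = len < (off_t) sizeof buf ? (size_t) len : sizeof buf;
      ssize_t got = pread (src_fd, buf, want, off);
      if (got <= 0 || write (dst, buf, (size_t) got) != got)
        {
          rc = -1;
          break;
        }
      off += got;
      len -= got;
    }
  if (close (dst) != 0)
    rc = -1;
  return rc;
}

/* F1 already holds the whole input.  Copy chunks 2..N out of it into
   REST[0..N-2], then truncate F1 so that it keeps only chunk 1.  */
static int
split_in_place (const char *f1, const char *const *rest, int n)
{
  assert (n >= 1);
  int fd = open (f1, O_RDWR);
  if (fd < 0)
    return -1;
  struct stat st;
  if (fstat (fd, &st) != 0)
    {
      close (fd);
      return -1;
    }
  off_t chunk = st.st_size / n;
  int rc = 0;
  for (int k = 1; k < n && rc == 0; k++)
    {
      off_t start = chunk * k;
      off_t len = k == n - 1 ? st.st_size - start : chunk;
      rc = copy_range_to_file (fd, start, len, rest[k - 1]);
    }
  if (rc == 0 && n > 1 && ftruncate (fd, chunk) != 0)
    rc = -1;
  if (close (fd) != 0)
    rc = -1;
  return rc;
}
```

The truncation happens last, so a failure while writing F2 ... Fn leaves F1 intact with all of the data.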




Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Fri, 12 Feb 2016 19:05:01 GMT) Full text and rfc822 format available.

Message #29 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Nelson H. F. Beebe" <beebe <at> math.utah.edu>, P <at> draigBrady.com,
 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Fri, 12 Feb 2016 11:04:24 -0800
On 02/12/2016 06:25 AM, Nelson H. F. Beebe wrote:
> The effectively-zero load, and no CPU consumption, suggests that the
> "make check" run is in a wait state.

Thanks for checking. I installed the attached patch to try to fix the 
test script so that it doesn't hang forever in this situation. The old 
test had a race condition anyway.

I don't know if the failure reflects a bug in coreutils, or in bash, or 
in GNU/Hurd elsewhere. If the revised test passes I guess we don't need 
to worry about it.
[0001-tests-don-t-wait-forever-on-GNU-Hurd.patch (application/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Sat, 13 Feb 2016 05:08:02 GMT) Full text and rfc822 format available.

Message #32 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, "Nelson H. F. Beebe"
 <beebe <at> math.utah.edu>
Cc: sysstaff <at> math.utah.edu, 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Fri, 12 Feb 2016 21:07:06 -0800
On 12/02/16 10:18, Paul Eggert wrote:
> On 02/11/2016 08:13 PM, Pádraig Brady wrote:
>> The changes look good, except for this:
>>
>>    $ seq 1000 | split -n4
>>    $ seq 100000 | split -n4
>>    split: -: cannot determine file size: Illegal seek
>>
>> I.E. it would be better to indicate immediately
>> if there is an issue determining the file size,
>> because it's a gotcha that may hit users as data increases,
>> and -n is complex enough anyway, that it's better to
>> do as much checking up front as possible.
>> I'd still disallow this case even for -n1 in case the
>> number was parameterized to number of CPUs or whatever.
> 
> Hmm, well, I already spent too much time on this so I think I'll check 
> in what I have (since it fixes the GNU/Hurd problem) and let it 
> percolate a bit first.
> 
> I have some qualms about the approach suggested above, as it would cause 
> 'split' to give up on files that it currently handles (e.g., typical 
> files in /proc), on the theory that we don't want to spoil users into 
> thinking that 'split' can handle larger files.

I've attached a patch that keeps support for /proc (seekable) files,
while immediately failing for pipes.  Also it fixes a regression
for the -n r/... case, where it again exits immediately
when all --filters have exited.

> It'd be better to fix 
> 'split' to handle the larger files. It could do this for a troublesome 
> case (e.g., a large /proc file) by copying the file's data into the 
> first output file F1, then doing a split-in-place from F1 to the 
> remaining output files F2 ... Fn (this would be done by copying to F2 
> ... Fn and then truncating F1).

Clever. Theoretically that could support pipes as input too!
That also got me thinking that split(1) could be made very efficient
with an existing regular file, where reflink(range) is supported,
by reflinking the new files to the existing parts of the data.

> If the input file is /dev/zero, though, 
> 'split' should just give up right away as it does now, as there's no 
> point in copying forever.

+1

thanks,
Pádraig.
[split-n-fixes.patch (text/x-patch, attachment)]
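One way to fail immediately for pipes while still accepting seekable inputs is to probe the descriptor with lseek before reading any data. The sketch below shows that check in isolation; it is not taken from the attached patch, and the helper name input_is_seekable is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical up-front check: true if FD supports seeking.
   On a pipe, FIFO, or socket, lseek fails with ESPIPE, so the caller
   can report the problem before any input has been consumed.  */
static bool
input_is_seekable (int fd)
{
  assert (fd >= 0);
  return lseek (fd, 0, SEEK_CUR) >= 0;
}
```

Seekability alone does not catch inputs like /dev/zero, which some systems treat as seekable, so a check like this would complement the st_size handling rather than replace it.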

Information forwarded to bug-coreutils <at> gnu.org:
bug#22624; Package coreutils. (Thu, 25 Oct 2018 15:51:02 GMT) Full text and rfc822 format available.

Message #35 received at 22624 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: 22624 <at> debbugs.gnu.org
Subject: Re: bug#22624: [bug-coreutils] coreutils-8.25: big success, but
 problem on GNU/Hurd
Date: Thu, 25 Oct 2018 09:50:19 -0600
tags 22624 fixed
close 22624
stop

(triaging old bugs)

With fixes committed in:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=632eda520f7cf49d9d1662835c7c37e17033e128
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=62e7af0326786a7dec91d982238948eddab9d6af

And no further comments in over a year,
I'm marking this as "fixed".

-assaf




Added tag(s) fixed. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 25 Oct 2018 15:51:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 22624 <at> debbugs.gnu.org and "Nelson H. F. Beebe" <beebe <at> math.utah.edu> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 25 Oct 2018 15:51:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 23 Nov 2018 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 6 years and 213 days ago.

