GNU bug report logs - #22624
[bug-coreutils] coreutils-8.25: big success, but problem on GNU/Hurd
Report forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Wed, 10 Feb 2016 21:59:02 GMT)
Acknowledgement sent to "Nelson H. F. Beebe" <beebe <at> math.utah.edu>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 10 Feb 2016 21:59:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I'm pleased to report successful builds, validations, and
installations of coreutils-8.25 on at least 72 of the 77 machines in
our lab running various flavors of Unix.
The one problematic system is GNU/Hurd, aka Debian GNU/Hurd
stretch/sid. We ran Hurd on VMware/ESX for a couple of years, but it
was never stable, and crashed or hung every few hours. Every such
failure requires a manual fsck on reboot, preventing automated
recovery.
Last summer, I moved Hurd to virt-manager + QEMU on my desktop, where
it has proved substantially more stable, sometimes staying up for many
days.
Debian GNU/Hurd has about 47,580 packages available in the Debian
apt-get system, so others have clearly done a lot of work on it.
There are major, and reasonably-current, packages like these available
via apt-get:
/usr/bin/clang-3.6 --version
Debian clang version 3.6.2-1 (tags/RELEASE_362/final) (based on LLVM 3.6.2)
Target: i386-pc--gnu
Thread model: posix
/usr/bin/gcc --version
gcc (Debian 5.2.1-26) 5.2.1 20151125
/bin/ls --version
ls (GNU coreutils) 8.23
With builds of coreutils-8.25 at my site, the "make check" run ALWAYS
hangs Hurd, requiring a reboot and an fsck.
I've just made further experiments that confirm that the hang always
happens in the same place, about 60 seconds after starting this
command:
$ make check
... lots of PASS reports, except FAIL in tests/misc/kill.sh and tests/split/filter.sh ...
PASS: tests/split/b-chunk.sh
PASS: tests/split/fail.sh
PASS: tests/split/lines.sh
line-bytes.sh: skipped test: this shell lacks ulimit support
SKIP: tests/split/line-bytes.sh
Timeout, server 192.168.122.66 not responding.
The default memory size is 1GB, but today I got the same results when
the VM was restarted with 2GB and with 8GB.
I have also run the "make check" in a console window, eliminating
possible network timeouts from the dataflow, with "top" running in a
separate xterm + ssh window, and got this output at the point of the
hang:
# in console window
SKIP: tests/split/line-bytes.sh
no more room for vm_map_find_entry in 8022b080
no more room for kmem_realloc in 8022b080
/hurd/mach-defpage: panic: (default pager):
# in simultaneous xterm window
% top
top - 14:10:49 up 10 min, 8 users, load average: 0.55, 1.33, 1.46
Tasks: 74 total, 2 running, 69 sleeping, 0 stopped, 0 zombie
%Cpu(s): 54.3 us, 0.0 sy, 0.0 ni, 45.7 id, 0.0 wa, 0.0 hi, 0.0 si
KiB Mem: 1900540 total, 1550052 used, 350488 free, 0 buffers
KiB Swap: 0 total, 0 used, 0 free. 1792 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1015 beebe 20 0 151616 144 0 S 0.0 0.0 0:00.02 -bash
1081 beebe 30 10 150060 148 0 S 0.0 0.0 0:00.00 time
The coreutils developers should probably not view this as a coreutils
bug, because Hurd has many oddities, and the pager-panic report
definitely suggests a kernel issue.
However, because coreutils has long been built and distributed on
Hurd, I thought it would be worthwhile to at least report my
experience, in the hope that other list members with GNU/Hurd systems
might be able to report their own results with the latest coreutils.
I unfortunately do not have any spare physical hardware on which to
run GNU/Hurd; my only access to it is on virtual machines.
My desktop is currently running 18 different VMs, on top of its CentOS
7 base operating system. Apart from GNU/Hurd, all of the others have
been perfectly stable for 4 to 6 months of operation, so I think that
it is unlikely that the above failure is due to the virtual machine
environment. There are two significant differences, however: the
others have virtual SATA disks and are 64-bit systems, whereas Hurd
supports only (virtual) EIDE disks, and is a 32-bit system. Our
suspicions of the instability of Hurd on VMware/ESX have to do with
the EIDE virtual disk system, which may have been less well tested
than SATA.
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe <at> math.utah.edu -
- 155 S 1400 E RM 233 beebe <at> acm.org beebe <at> computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Thu, 11 Feb 2016 02:40:01 GMT)
Message #8 received at 22624 <at> debbugs.gnu.org (full text, mbox):
On 02/10/2016 01:57 PM, Nelson H. F. Beebe wrote:
> SKIP: tests/split/line-bytes.sh
> Timeout, server 192.168.122.66 not responding.
I presume the test that crashes your system is tests/split/l-chunk.sh,
which invokes commands like 'split -n l/10 /dev/null' and 'split -n 1/2
/dev/zero'.
This sounds like <http://bugs.gnu.org/11424>, which was reported for
GNU/Hurd. Most likely GNU/Hurd is reporting an st_size of OFF_T_MAX for
/dev/zero, and this is messing up 'split'. I will look into fixing this;
I expect that the bottom line is that split should not trust st_size for
special files like /dev/zero.
[hurdtest.c (text/plain, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Thu, 11 Feb 2016 03:09:02 GMT)
Message #11 received at 22624 <at> debbugs.gnu.org (full text, mbox):
On 10/02/16 13:57, Nelson H. F. Beebe wrote:
> I'm pleased to report successful builds, validations, and
> installations of coreutils-8.25 on at least 72 of the 77 machines in
> our lab running various flavors of Unix.
Looks like we're improving well in portability :)
Many thanks for giving access to, support for,
and now verification on these machines.
It's been invaluable.
> The one problematic system is GNU/Hurd, aka Debian GNU/Hurd
Paul looks to be on the right track here.
cheers,
Pádraig.
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Thu, 11 Feb 2016 08:32:01 GMT)
Message #14 received at 22624 <at> debbugs.gnu.org (full text, mbox):
--- hurdtest.c-ORIG 2016-02-11 09:27:57.422023914 +0100
+++ hurdtest.c 2016-02-11 09:28:29.781433313 +0100
@@ -10,7 +10,7 @@
struct stat st;
off_t cur_offset;
off_t end_offset;
- int fd = open ("/dev/zero", O_RDONLY);
+ int fd = open (file, O_RDONLY);
printf ("file=%s\n", file);
if (fd < 0)
return perror ("open"), 1;
@@ -58,7 +58,7 @@
main (int argc, char **argv)
{
static char dev_zero[] = "/dev/zero";
- static char dev_null[] = "/dev/zero";
+ static char dev_null[] = "/dev/null";
static char *dev_zero_argv[] = { dev_zero, dev_null, 0 };
char **av = argc == 1 ? dev_zero_argv : argv + 1;
while (*av)
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Thu, 11 Feb 2016 16:12:01 GMT)
Message #17 received at 22624 <at> debbugs.gnu.org (full text, mbox):
Thanks, Paul, for hurdtest.c and the subsequent tiny patch to it.
Here is the test on my GNU/Hurd system on virt-manager + QEMU-KVM
on top of CentOS 7:
$ cc hurdtest.c && time ./a.out
file=/dev/zero
CHR
st_size=9223372036854775807
st_blksize=8192
st_blocks=8
cur_offset=0
end_offset=9223372036854775807
pagesize=4096
file=/dev/null
CHR
st_size=0
st_blksize=1048576
st_blocks=0
cur_offset=0
end_offset=0
pagesize=4096
real 0m0.010s
user 0m0.000s
sys 0m0.000s
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Thu, 11 Feb 2016 17:44:01 GMT)
Message #20 received at 22624 <at> debbugs.gnu.org (full text, mbox):
On 02/11/2016 08:10 AM, Nelson H. F. Beebe wrote:
> end_offset=9223372036854775807
>
Thanks, that confirms my suspicions about GNU/Hurd. I'm attaching a
proposed patch; please give it a try if you have a chance. Turned out to
be trickier than I thought, but oh well.
[0001-split-fix-problems-with-dev-zero.patch (application/x-patch, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Fri, 12 Feb 2016 04:14:02 GMT)
Message #23 received at 22624 <at> debbugs.gnu.org (full text, mbox):
On 11/02/16 09:43, Paul Eggert wrote:
> On 02/11/2016 08:10 AM, Nelson H. F. Beebe wrote:
>> end_offset=9223372036854775807
>>
>
> Thanks, that confirms my suspicions about GNU/Hurd. I'm attaching a
> proposed patch; please give it a try if you have a chance. Turned out to
> be trickier than I thought, but oh well.
Thanks for working on this.
The changes look good, except for this:
$ seq 1000 | split -n4
$ seq 100000 | split -n4
split: -: cannot determine file size: Illegal seek
I.E. it would be better to indicate immediately
if there is an issue determining the file size,
because it's a gotcha that may hit users as data increases,
and -n is complex enough anyway, that it's better to
do as much checking up front as possible.
I'd still disallow this case even for -n1 in case the
number was parameterized to number of CPUs or whatever.
A small point on the tests is that we use `returns_ 1 ... || fail=1`
rather than `... && fail=1` so that we catch seg faults etc. in tests.
thanks!
Pádraig
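The reason for preferring `returns_ 1 ... || fail=1` can be sketched in C terms: a command that dies from a signal is also "unsuccessful", so a check that only asks whether the command failed cannot tell an expected exit status of 1 from a crash. The following is an illustration only; run_and_classify, fails_cleanly and crashes are hypothetical names, and this is not how the coreutils test framework is implemented.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum outcome
{
  EXITED_AS_EXPECTED,   /* exited normally with the expected status */
  EXITED_OTHER,         /* exited normally with some other status */
  KILLED_BY_SIGNAL,     /* terminated by a signal, e.g. a segfault */
  COULD_NOT_RUN         /* fork or waitpid failed */
};

/* Run CHILD_BODY in a child process and classify how it ended.  */
static enum outcome
run_and_classify (void (*child_body) (void), int expected_status)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return COULD_NOT_RUN;
  if (pid == 0)
    {
      child_body ();
      _exit (0);        /* reached only if the body returns */
    }
  if (waitpid (pid, &status, 0) < 0)
    return COULD_NOT_RUN;
  if (WIFSIGNALED (status))
    return KILLED_BY_SIGNAL;
  if (WIFEXITED (status) && WEXITSTATUS (status) == expected_status)
    return EXITED_AS_EXPECTED;
  return EXITED_OTHER;
}

/* A command that reports an ordinary error.  */
static void
fails_cleanly (void)
{
  _exit (1);
}

/* A stand-in for a command that crashes.  */
static void
crashes (void)
{
  signal (SIGSEGV, SIG_DFL);
  raise (SIGSEGV);
}
```

Both fails_cleanly and crashes count as "not successful", but only the first one matches an expected status of 1; requiring the exact status is what lets a test notice the crash.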
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Fri, 12 Feb 2016 18:19:01 GMT)
Message #26 received at 22624 <at> debbugs.gnu.org (full text, mbox):
On 02/11/2016 08:13 PM, Pádraig Brady wrote:
> The changes look good, except for this:
>
> $ seq 1000 | split -n4
> $ seq 100000 | split -n4
> split: -: cannot determine file size: Illegal seek
>
> I.E. it would be better to indicate immediately
> if there is an issue determining the file size,
> because it's a gotcha that may hit users as data increases,
> and -n is complex enough anyway, that it's better to
> do as much checking up front as possible.
> I'd still disallow this case even for -n1 in case the
> number was parameterized to number of CPUs or whatever.
Hmm, well, I already spent too much time on this so I think I'll check
in what I have (since it fixes the GNU/Hurd problem) and let it
percolate a bit first.
I have some qualms about the approach suggested above, as it would cause
'split' to give up on files that it currently handles (e.g., typical
files in /proc), on the theory that we don't want to spoil users into
thinking that 'split' can handle larger files. It'd be better to fix
'split' to handle the larger files. It could do this for a troublesome
case (e.g., a large /proc file) by copying the file's data into the
first output file F1, then doing a split-in-place from F1 to the
remaining output files F2 ... Fn (this would be done by copying to F2
... Fn and then truncating F1). If the input file is /dev/zero, though,
'split' should just give up right away as it does now, as there's no
point in copying forever. Anyway, I view this as relatively low
priority, as the troublesome cases should be quite rare in practice.
> A small point on the tests is that we use `returns_ 1 ... || fail=1`
> rather than `... && fail=1` so that we catch seg faults etc. in tests.
Thanks, I fixed that before installing the patch.
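The split-in-place idea described in this message (copy everything into F1, copy the later chunks out to F2 ... Fn, then truncate F1) might look roughly like the following C sketch. It is not code from coreutils: split_in_place and copy_range_to are hypothetical names, chunk boundaries are plain byte offsets with the last chunk taking the remainder (an assumption), and error handling is minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy LEN bytes starting at offset OFF of SRC_FD into a new file
   called NAME.  Return 0 on success, -1 on failure.  */
static int
copy_range_to (int src_fd, off_t off, off_t len, const char *name)
{
  char buf[65536];
  int dst = open (name, O_WRONLY | O_CREAT | O_TRUNC, 0600);

  if (dst < 0)
    return -1;
  while (len > 0)
    {
      size_t want = len < (off_t) sizeof buf ? (size_t) len : sizeof buf;
      ssize_t n = pread (src_fd, buf, want, off);
      ssize_t done = 0;

      if (n <= 0)               /* read error or unexpected EOF */
        {
          close (dst);
          return -1;
        }
      while (done < n)          /* handle short writes */
        {
          ssize_t w = write (dst, buf + done, (size_t) (n - done));
          if (w < 0)
            {
              close (dst);
              return -1;
            }
          done += w;
        }
      off += n;
      len -= n;
    }
  return close (dst);
}

/* F1_FD already holds all of the input.  Divide it into N byte
   chunks: chunks 2..N are copied to REST[0..N-2], then F1 is
   truncated so that it holds only chunk 1.  */
static int
split_in_place (int f1_fd, int n, const char *const rest[])
{
  struct stat st;
  off_t chunk;
  int i;

  if (n < 1 || fstat (f1_fd, &st) != 0)
    return -1;
  chunk = st.st_size / n;
  for (i = 1; i < n; i++)
    {
      off_t start = i * chunk;
      off_t end = i == n - 1 ? st.st_size : start + chunk;
      if (copy_range_to (f1_fd, start, end - start, rest[i - 1]) != 0)
        return -1;
    }
  return ftruncate (f1_fd, chunk);
}
```

The truncation comes last so that F1 still holds the complete input if copying to one of the later files fails partway through.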
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Fri, 12 Feb 2016 19:05:01 GMT)
Message #29 received at 22624 <at> debbugs.gnu.org (full text, mbox):
On 02/12/2016 06:25 AM, Nelson H. F. Beebe wrote:
> The effectively-zero load, and no CPU consumption, suggests that the
> "make check" run is in a wait state.
Thanks for checking. I installed the attached patch to try to fix the
test script so that it doesn't hang forever in this situation. The old
test had a race condition anyway.
I don't know if the failure reflects a bug in coreutils, or in bash, or
in GNU/Hurd elsewhere. If the revised test passes I guess we don't need
to worry about it.
[0001-tests-don-t-wait-forever-on-GNU-Hurd.patch (application/x-patch, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Sat, 13 Feb 2016 05:08:02 GMT)
Message #32 received at 22624 <at> debbugs.gnu.org (full text, mbox):
On 12/02/16 10:18, Paul Eggert wrote:
> On 02/11/2016 08:13 PM, Pádraig Brady wrote:
>> The changes look good, except for this:
>>
>> $ seq 1000 | split -n4
>> $ seq 100000 | split -n4
>> split: -: cannot determine file size: Illegal seek
>>
>> I.E. it would be better to indicate immediately
>> if there is an issue determining the file size,
>> because it's a gotcha that may hit users as data increases,
>> and -n is complex enough anyway, that it's better to
>> do as much checking up front as possible.
>> I'd still disallow this case even for -n1 in case the
>> number was parameterized to number of CPUs or whatever.
>
> Hmm, well, I already spent too much time on this so I think I'll check
> in what I have (since it fixes the GNU/Hurd problem) and let it
> percolate a bit first.
>
> I have some qualms about the approach suggested above, as it would cause
> 'split' to give up on files that it currently handles (e.g., typical
> files in /proc), on the theory that we don't want to spoil users into
> thinking that 'split' can handle larger files.
I've attached a patch that keeps support for /proc (seekable) files,
while immediately failing for pipes. Also it fixes a regression
for the -n r/... case, where it again exits immediately
when all --filters have exited.
> It'd be better to fix
> 'split' to handle the larger files. It could do this for a troublesome
> case (e.g., a large /proc file) by copying the file's data into the
> first output file F1, then doing a split-in-place from F1 to the
> remaining output files F2 ... Fn (this would be done by copying to F2
> ... Fn and then truncating F1).
Clever. Theoretically that could support pipes as input too!
That also got me thinking that split(1) could be made very efficient
with an existing regular file, where reflink(range) is supported,
by reflinking the new files to the existing parts of the data.
> If the input file is /dev/zero, though,
> 'split' should just give up right away as it does now, as there's no
> point in copying forever.
+1
thanks,
Pádraig.
[split-n-fixes.patch (text/x-patch, attachment)]
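One possible way to fail immediately for pipes while still accepting seekable inputs is to probe the descriptor with lseek before reading anything. This is only a sketch under that assumption, not the logic in the attached patch; is_seekable is a hypothetical helper. On glibc, ESPIPE is the errno behind the "Illegal seek" message quoted earlier in this thread.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if FD supports seeking, 0 if it is a pipe, FIFO or
   socket (lseek fails with ESPIPE), and -1 on any other error.
   Hypothetical helper for illustration only.  */
static int
is_seekable (int fd)
{
  if (lseek (fd, 0, SEEK_CUR) >= 0)
    return 1;
  return errno == ESPIPE ? 0 : -1;
}
```

A seekable result is not sufficient on its own: in the hurdtest.c output above, /dev/zero reported cur_offset=0 and so accepted lseek, which is why the separate size check is still needed for that case.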
Information forwarded to bug-coreutils <at> gnu.org: bug#22624; Package coreutils. (Thu, 25 Oct 2018 15:51:02 GMT)
Message #35 received at 22624 <at> debbugs.gnu.org (full text, mbox):
tags 22624 fixed
close 22624
stop
(triaging old bugs)
With fixes committed in:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=632eda520f7cf49d9d1662835c7c37e17033e128
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=62e7af0326786a7dec91d982238948eddab9d6af
And no further comments in over a year,
I'm marking this as "fixed".
-assaf
Added tag(s) fixed. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 25 Oct 2018 15:51:02 GMT)
bug closed, send any further explanations to 22624 <at> debbugs.gnu.org and "Nelson H. F. Beebe" <beebe <at> math.utah.edu>. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Thu, 25 Oct 2018 15:51:02 GMT)
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 23 Nov 2018 12:24:04 GMT)
This bug report was last modified 6 years and 213 days ago.