GNU bug report logs -
#7489
[coreutils] over aggressive threads in sort
Reported by: DJ Lucas <dj <at> linuxfromscratch.org>
Date: Fri, 26 Nov 2010 19:40:02 UTC
Severity: normal
Tags: fixed
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
Paul Eggert wrote:
> On 11/29/2010 02:46 PM, Paul Eggert wrote:
>> My current guess, by the way,
>> is that it's not a bug that can be triggered: it's merely
>> useless code that is harmless and can safely be removed.
>
> I removed it as part of the following series of cleanup
> patches. These are intended merely to refactor the code
> and simplify it a bit, to make it easier to fix the CPU
> spinlock bug. Please feel free to undo anything that
> looks at all questionable.
Hi Paul,
Thanks for all the clean-up.
I don't know whether the following is a result of your changes,
since the segfault has been hard to reproduce.
It is from the sort-compress test, and has happened so far
only twice during "make -j9 check" on a quad-core F14 system:
Core was generated by `sort --compress-program=./dzip -S 1k in'.
Program terminated with signal 11, Segmentation fault.
#0 queue_check_insert (queue=0x7fffdbdc5620, node=0x5) at sort.c:3322
3322 if (! node->queued)
(gdb) p node
$1 = (struct merge_node *) 0x5
(gdb) bt
#0 queue_check_insert (queue=0x7fffdbdc5620, node=0x5) at sort.c:3322
#1 0x00000000004055a9 in queue_check_insert_parent (
lines=<value optimized out>, dest=<value optimized out>,
nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
temp_output=0x1c2f72c "./sortpns55x") at sort.c:3340
#2 merge_loop (lines=<value optimized out>, dest=<value optimized out>,
nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
temp_output=0x1c2f72c "./sortpns55x") at sort.c:3374
#3 sortlines (lines=<value optimized out>, dest=<value optimized out>,
nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
temp_output=0x1c2f72c "./sortpns55x") at sort.c:3515
#4 0x00000000004059cb in sortlines_thread (data=<value optimized out>)
at sort.c:3428
#5 0x0000003f49806d5b in start_thread () from /lib64/libpthread-2.12.90.so
#6 0x0000003f48ce4aad in clone () from /lib64/libc-2.12.90.so
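Since the segfault above shows up only occasionally, one way to chase it is to rerun the failing command until it fails. The following is only a sketch: `repeat_until_fail` is a hypothetical helper, not part of coreutils or its test suite, and the run count is arbitrary.

```shell
# Hypothetical helper (not from coreutils): rerun a command up to MAX times
# and stop at the first nonzero exit status.
repeat_until_fail() {
  max=$1; shift
  i=1
  while [ "$i" -le "$max" ]; do
    status=0
    # Discard the command's output; only its exit status matters here.
    "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -ne 0 ]; then
      echo "run $i failed with status $status"
      return 1
    fi
    i=$((i + 1))
  done
  echo "no failure in $max runs"
  return 0
}
```

For example, `repeat_until_fail 500 ./sort --compress-program=./dzip -S 1k in` would report the first failing run. A shell reports a process killed by signal 11 as status 139 (128 + 11), so that value would indicate the segfault rather than an ordinary error exit.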
However, there is another failure, also based on sort-compress,
that makes me suspicious:
seq -w 200000 > exp && tac exp > in
PATH=.:$PATH ./sort --compress-program=dzip -S 1k in > out
That gets stuck in waitpid (from sort.c's reap), waiting for a
dzip invocation that apparently will never terminate. This is also
on that same 4-core system, and is relatively easy to reproduce,
so it should be easy to identify the offending change, but I'm
out of time for now.
The hang is also reproducible with just 2000 input lines,
but then it doesn't arise as consistently.
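Because the hang never returns on its own, it can help to run the reproduction under a time limit when testing candidate changes. This is a minimal sketch: `hang_check` is a hypothetical helper, and it assumes the coreutils `timeout` command is available, which exits with status 124 when the limit expires.

```shell
# Hypothetical helper (not from coreutils): run a command under a time limit
# and report whether it had to be killed for exceeding that limit.
hang_check() {
  limit=$1; shift
  status=0
  # timeout(1) exits with 124 when the command is still running at the limit.
  timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 124 ]; then
    echo "hung"
  else
    echo "exit $status"
  fi
}
```

With `.` added to `PATH` as in the command above, `hang_check 30 ./sort --compress-program=dzip -S 1k in` would print `hung` if sort is still blocked after 30 seconds. The 30-second limit is an arbitrary assumption and should comfortably exceed the time a successful run takes.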
I'll note in passing that the spinlock CPU utilization problem
is particularly noticeable when using --compress-program= because
there is a lot more waiting.
This bug report was last modified 6 years and 202 days ago.