#7489 - [coreutils] over aggressive threads in sort

GNU bug report logs - #7489
[coreutils] over aggressive threads in sort

Reported by: DJ Lucas <dj <at> linuxfromscratch.org>

Date: Fri, 26 Nov 2010 19:40:02 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.


From: Jim Meyering <jim <at> meyering.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Chen Guo <chen.guo.0625 <at> gmail.com>, Pádraig Brady <P <at> draigBrady.com>, DJ Lucas <dj <at> linuxfromscratch.org>, 7489 <at> debbugs.gnu.org, coreutils <at> gnu.org
Subject: bug#7489: [coreutils] over aggressive threads in sort
Date: Sun, 05 Dec 2010 12:21:01 +0100

Paul Eggert wrote:
> On 11/29/2010 02:46 PM, Paul Eggert wrote:
>> My current guess, by the way,
>> is that it's not a bug that can be triggered: it's merely
>> useless code that is harmless and can safely be removed.
>
> I removed it as part of the following series of cleanup
> patches.  These are intended merely to refactor the code
> and simplify it a bit, to make it easier to fix the CPU
> spinlock bug.  Please feel free to undo anything that
> looks at all questionable.

Hi Paul,

Thanks for all the clean-up.

I have no idea if the following is as a result of your changes,
since the segfault failure has been hard to reproduce.
It is from the sort-compress test, and has happened so far only twice
during "make -j9 check" on a quad-core F14 system:

    Core was generated by `sort --compress-program=./dzip -S 1k in'.
    Program terminated with signal 11, Segmentation fault.
    #0  queue_check_insert (queue=0x7fffdbdc5620, node=0x5) at sort.c:3322
    3322      if (! node->queued)
    (gdb) p node
    $1 = (struct merge_node *) 0x5
    (gdb) bt
    #0  queue_check_insert (queue=0x7fffdbdc5620, node=0x5) at sort.c:3322
    #1  0x00000000004055a9 in queue_check_insert_parent (
        lines=<value optimized out>, dest=<value optimized out>,
        nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
        lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
        temp_output=0x1c2f72c "./sortpns55x") at sort.c:3340
    #2  merge_loop (lines=<value optimized out>, dest=<value optimized out>,
        nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
        lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
        temp_output=0x1c2f72c "./sortpns55x") at sort.c:3374
    #3  sortlines (lines=<value optimized out>, dest=<value optimized out>,
        nthreads=140173261458952, total_lines=10, parent=<value optimized out>,
        lo_child=<value optimized out>, queue=0x7fffdbdc5620, tfp=0x1c2fb90,
        temp_output=0x1c2f72c "./sortpns55x") at sort.c:3515
    #4  0x00000000004059cb in sortlines_thread (data=<value optimized out>)
        at sort.c:3428
    #5  0x0000003f49806d5b in start_thread () from /lib64/libpthread-2.12.90.so
    #6  0x0000003f48ce4aad in clone () from /lib64/libc-2.12.90.so

However, there is another failure that makes me suspicious:
(also based on sort-compress):

    seq -w 200000 > exp && tac exp > in
    PATH=.:$PATH ./sort --compress-program=dzip -S 1k in > out

That gets stuck in waitpid (from sort.c's reap), waiting for a dzip
invocation that appears will never terminate.

This is also on that same 4-core system, and is relatively easy to
reproduce, so it should be easy to identify the offending change,
but I'm out of time, now.

The hang is also reproducible with just 2000 input lines,
but then it doesn't arise as consistently.

I'll note in passing that the spinlock CPU utilization problem
is particularly noticeable when using --compress-program=
because there is a lot more waiting.
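Since the hang shows up only intermittently at smaller input sizes, one way to measure how often it occurs is to run the command repeatedly under a time limit and count the runs that get killed. The following is only a sketch, not part of the original report: `count_hangs` is a hypothetical helper name, and it assumes GNU `timeout` is available (it exits with status 124 when the time limit expires).

```shell
#!/bin/sh
# count_hangs RUNS LIMIT COMMAND [ARGS...]
# Hypothetical helper (not from the bug report): run COMMAND RUNS times,
# each under timeout(1) with LIMIT seconds, and print how many runs timed out.
count_hangs() {
  runs=$1
  limit=$2
  shift 2
  hangs=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    status=0
    # GNU timeout exits with 124 when it had to kill the command.
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 124 ]; then
      hangs=$((hangs + 1))
    fi
    i=$((i + 1))
  done
  echo "$hangs"
}

# Example use with the reproducer from the message (run from the build
# directory, with dzip reachable through PATH as in the original command):
#   seq -w 2000 > exp && tac exp > in
#   PATH=.:$PATH count_hangs 20 30 ./sort --compress-program=dzip -S 1k in
```

The 30-second limit and 20 repetitions in the example are arbitrary; a run that is merely slow would also be counted as a hang if the limit is set too low.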

This bug report was last modified 6 years and 249 days ago.

