Package: coreutils;
Reported by: Mina Naguib <mina.naguib <at> bloomdigital.com>
Date: Wed, 8 Jun 2011 20:41:02 UTC
Severity: normal
Tags: moreinfo
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Mina Naguib <mina.naguib <at> bloomdigital.com> To: 8824 <at> debbugs.gnu.org Subject: bug#8824: sort occasionally hangs - appears to be in a merge-sort loop Date: Wed, 8 Jun 2011 16:16:17 -0400
Hi I've observed a few instances when `sort` simply "hangs" and never returns the sorted data. Currently using coreutils 8.12 on gentoo linux 2.6.25, `locale` outputs: LANG= LC_CTYPE="POSIX" LC_NUMERIC="POSIX" LC_TIME="POSIX" LC_COLLATE="POSIX" LC_MONETARY="POSIX" LC_MESSAGES="POSIX" LC_PAPER="POSIX" LC_NAME="POSIX" LC_ADDRESS="POSIX" LC_TELEPHONE="POSIX" LC_MEASUREMENT="POSIX" LC_IDENTIFICATION="POSIX" LC_ALL= An example ongoing right now on one of my servers. Sort invoked as: /usr/bin/sort -t '|' -k1,1 -k6,6n It was fed via STDIN pipe-delimited line-based records, and STDIN was closed. The parent process is waiting on sort to start receiving the output via STDOUT which never happens. Input was roughly 350 million records, each record looking something like this: 000WQg16MdC2Puk|direct_count|2049|63581|15090|1306695571 I dug into the sort process with strace and I believe it's doing one of the final merge sorts: read(12, "986|61518|14528|1301616354\nnHM5Og"..., 4096) = 4096 read(12, "|1967|61925|14584|1301634780\nnHMk"..., 4096) = 4096 read(12, "533\nnHNQDohSrKT3Ffn|direct_count|"..., 4096) = 4096 write(4, "4488|1302016964\nnGq7jG4hfFS9flL|d"..., 4096) = 4096 write(4, "nGq9POv11Jh7OiQ|direct_count|1965"..., 4096) = 4096 write(4, "05|1299763468\nnGqCgAT3jwkrzhc|dir"..., 4096) = 4096 write(4, "460\nnGqFItcr2F1Pz8n|tag|1995|6275"..., 4096) = 4096 write(4, "JuhUfMkOPC7J|direct_count|1914|57"..., 4096) = 4096 write(4, "|1914|57180|13801|1302640204\nnGqO"..., 4096) = 4096 After watching it do this diddy for 5.5 hours I dug in further. Here's its open temp files: $ date; lsof -n -p 31370 | grep tmp | nl Wed Jun 8 15:37:41 EDT 2011 1 sort 31370 dmt 4u REG 8,1 3123896320 12705303 /tmp/sort0YtEtm 2 sort 31370 dmt 5r REG 8,1 1199907358 12705551 /tmp/sortWNOpxy 3 sort 31370 dmt 7r REG 8,1 1181994889 12705627 /tmp/sortIpm44r 4 sort 31370 dmt 8r REG 8,1 1199857626 12705636 /tmp/sort2WOPPo 5 sort 31370 dmt 9r REG 8,1 1196868088 12705640 /tmp/sortCUkW6a 6 sort 31370 dmt 10r REG 8,1 1199488185 12705644 /tmp/sortSDopQX 7 sort 31370 dmt 11r REG 8,1 1188267263 12705646 /tmp/sortUdXqke 8 sort 31370 dmt 12r REG 8,1 1176087055 12705665 /tmp/sortMiavw5 9 sort 31370 dmt 13r REG 8,1 1184572313 12705666 /tmp/sortUf5q03 10 sort 31370 dmt 14r REG 8,1 1187433621 12705672 /tmp/sorti3mKO9 11 sort 31370 dmt 15r REG 8,1 1184537051 12705680 /tmp/sort0iYrQr 12 sort 31370 dmt 16r REG 8,1 1186874578 12705696 /tmp/sortUwVkpQ 13 sort 31370 dmt 17r REG 8,1 1173767980 12705703 /tmp/sortov5uyz 14 sort 31370 dmt 18r REG 8,1 1172616590 12705711 /tmp/sort2ibXqt 15 sort 31370 dmt 19r REG 8,1 1184319006 12705714 /tmp/sortA1DmGE 16 sort 31370 dmt 20r REG 8,1 1188372691 12705728 /tmp/sortibGbvd 17 sort 31370 dmt 21r REG 8,1 1185407259 12705713 /tmp/sort4Ou1u6 again: $ date; lsof -n -p 31370 | grep tmp | nl Wed Jun 8 15:37:58 EDT 2011 1 sort 31370 dmt 4u REG 8,1 3432607744 12705303 /tmp/sort0YtEtm 2 sort 31370 dmt 5r REG 8,1 1199907358 12705551 /tmp/sortWNOpxy 3 sort 31370 dmt 7r REG 8,1 1181994889 12705627 /tmp/sortIpm44r 4 sort 31370 dmt 8r REG 8,1 1199857626 12705636 /tmp/sort2WOPPo 5 sort 31370 dmt 9r REG 8,1 1196868088 12705640 /tmp/sortCUkW6a 6 sort 31370 dmt 10r REG 8,1 1199488185 12705644 /tmp/sortSDopQX 7 sort 31370 dmt 11r REG 8,1 1188267263 12705646 /tmp/sortUdXqke 8 sort 31370 dmt 12r REG 8,1 1176087055 12705665 /tmp/sortMiavw5 9 sort 31370 dmt 13r REG 8,1 1184572313 12705666 /tmp/sortUf5q03 10 sort 31370 dmt 14r REG 8,1 1187433621 12705672 /tmp/sorti3mKO9 11 sort 31370 dmt 15r REG 8,1 1184537051 12705680 /tmp/sort0iYrQr 12 sort 31370 dmt 16r REG 8,1 1186874578 12705696 /tmp/sortUwVkpQ 13 sort 31370 dmt 17r REG 8,1 1173767980 12705703 /tmp/sortov5uyz 14 sort 31370 dmt 18r REG 8,1 1172616590 12705711 /tmp/sort2ibXqt 15 sort 31370 dmt 19r REG 8,1 1184319006 12705714 /tmp/sortA1DmGE 16 sort 31370 dmt 20r REG 8,1 1188372691 12705728 /tmp/sortibGbvd 17 sort 31370 dmt 21r REG 8,1 1185407259 12705713 /tmp/sort4Ou1u6 again several times, you see as the main writable file approaches the total size sum of the other files combined: Wed Jun 8 15:55:20 EDT 2011 1 sort 31370 dmt 4u REG 8,1 18965975040 12705303 /tmp/sort0YtEtm 2 sort 31370 dmt 5r REG 8,1 1199907358 12705551 /tmp/sortWNOpxy 3 sort 31370 dmt 7r REG 8,1 1181994889 12705627 /tmp/sortIpm44r 4 sort 31370 dmt 8r REG 8,1 1199857626 12705636 /tmp/sort2WOPPo 5 sort 31370 dmt 9r REG 8,1 1196868088 12705640 /tmp/sortCUkW6a 6 sort 31370 dmt 10r REG 8,1 1199488185 12705644 /tmp/sortSDopQX 7 sort 31370 dmt 11r REG 8,1 1188267263 12705646 /tmp/sortUdXqke 8 sort 31370 dmt 12r REG 8,1 1176087055 12705665 /tmp/sortMiavw5 9 sort 31370 dmt 13r REG 8,1 1184572313 12705666 /tmp/sortUf5q03 10 sort 31370 dmt 14r REG 8,1 1187433621 12705672 /tmp/sorti3mKO9 11 sort 31370 dmt 15r REG 8,1 1184537051 12705680 /tmp/sort0iYrQr 12 sort 31370 dmt 16r REG 8,1 1186874578 12705696 /tmp/sortUwVkpQ 13 sort 31370 dmt 17r REG 8,1 1173767980 12705703 /tmp/sortov5uyz 14 sort 31370 dmt 18r REG 8,1 1172616590 12705711 /tmp/sort2ibXqt 15 sort 31370 dmt 19r REG 8,1 1184319006 12705714 /tmp/sortA1DmGE 16 sort 31370 dmt 20r REG 8,1 1188372691 12705728 /tmp/sortibGbvd 17 sort 31370 dmt 21r REG 8,1 1185407259 12705713 /tmp/sort4Ou1u6 then what appears to be finalization and cleanup: Wed Jun 8 15:55:28 EDT 2011 1 sort 31370 dmt 4u REG 8,1 18990366720 12705303 /tmp/sort0YtEtm 2 sort 31370 dmt 5r REG 8,1 1199907358 12705551 /tmp/sortWNOpxy 3 sort 31370 dmt 7r REG 8,1 1181994889 12705627 /tmp/sortIpm44r 4 sort 31370 dmt 8r REG 8,1 1199857626 12705636 /tmp/sort2WOPPo 5 sort 31370 dmt 9r REG 8,1 1196868088 12705640 /tmp/sortCUkW6a 6 sort 31370 dmt 12r REG 8,1 1176087055 12705665 /tmp/sortMiavw5 7 sort 31370 dmt 13r REG 8,1 1184572313 12705666 /tmp/sortUf5q03 8 sort 31370 dmt 14r REG 8,1 1187433621 12705672 /tmp/sorti3mKO9 9 sort 31370 dmt 15r REG 8,1 1184537051 12705680 /tmp/sort0iYrQr 10 sort 31370 dmt 16r REG 8,1 1186874578 12705696 /tmp/sortUwVkpQ 11 sort 31370 dmt 17r REG 8,1 1173767980 12705703 /tmp/sortov5uyz 12 sort 31370 dmt 18r REG 8,1 1172616590 12705711 /tmp/sort2ibXqt again, almost done: Wed Jun 8 15:55:34 EDT 2011 1 sort 31370 dmt 4u REG 8,1 18990370816 12705303 /tmp/sort0YtEtm 2 sort 31370 dmt 16r REG 8,1 1186874578 12705696 /tmp/sortUwVkpQ again, 1 main file left: Wed Jun 8 15:55:35 EDT 2011 1 sort 31370 dmt 4u REG 8,1 18990370816 12705303 /tmp/sort0YtEtm But alas, 16 more readable files pop up and 1 writable starts adding up again: Wed Jun 8 15:55:36 EDT 2011 1 sort 31370 dmt 4u REG 8,1 0 12705551 /tmp/sortevbKK1 2 sort 31370 dmt 5r REG 8,1 1178392494 12705739 /tmp/sortIIvZhW 3 sort 31370 dmt 7r REG 8,1 1180102559 12706259 /tmp/sortwZr2sU 4 sort 31370 dmt 8r REG 8,1 1181439270 12706530 /tmp/sort8B2hbx 5 sort 31370 dmt 9r REG 8,1 1183275901 12706731 /tmp/sorteGJa35 6 sort 31370 dmt 10r REG 8,1 1178538478 12706793 /tmp/sortoGXmOo 7 sort 31370 dmt 11r REG 8,1 1191511729 12706795 /tmp/sortG0RNr9 8 sort 31370 dmt 12r REG 8,1 1199965207 12706799 /tmp/sort6Q8tAy 9 sort 31370 dmt 13r REG 8,1 1192944995 12706817 /tmp/sortCGXzT2 10 sort 31370 dmt 14r REG 8,1 1191485559 12706823 /tmp/sortMwsygv 11 sort 31370 dmt 15r REG 8,1 1200144993 12706830 /tmp/sortyA2RWw 12 sort 31370 dmt 16r REG 8,1 1198251727 12706866 /tmp/sortyZlqHC 13 sort 31370 dmt 17r REG 8,1 1184876212 12706871 /tmp/sortCCjKOH 14 sort 31370 dmt 18r REG 8,1 1174916186 12706880 /tmp/sortKrUJSJ 15 sort 31370 dmt 19r REG 8,1 1184800950 12706824 /tmp/sort2Vjy7s 16 sort 31370 dmt 20r REG 8,1 1187817055 12706891 /tmp/sort89xsBm 17 sort 31370 dmt 21r REG 8,1 1185686697 12706944 /tmp/sortwjlE5L Wed Jun 8 15:55:40 EDT 2011 1 sort 31370 dmt 4u REG 8,1 62631936 12705551 /tmp/sortevbKK1 2 sort 31370 dmt 5r REG 8,1 1178392494 12705739 /tmp/sortIIvZhW 3 sort 31370 dmt 7r REG 8,1 1180102559 12706259 /tmp/sortwZr2sU 4 sort 31370 dmt 8r REG 8,1 1181439270 12706530 /tmp/sort8B2hbx 5 sort 31370 dmt 9r REG 8,1 1183275901 12706731 /tmp/sorteGJa35 6 sort 31370 dmt 10r REG 8,1 1178538478 12706793 /tmp/sortoGXmOo 7 sort 31370 dmt 11r REG 8,1 1191511729 12706795 /tmp/sortG0RNr9 8 sort 31370 dmt 12r REG 8,1 1199965207 12706799 /tmp/sort6Q8tAy 9 sort 31370 dmt 13r REG 8,1 1192944995 12706817 /tmp/sortCGXzT2 10 sort 31370 dmt 14r REG 8,1 1191485559 12706823 /tmp/sortMwsygv 11 sort 31370 dmt 15r REG 8,1 1200144993 12706830 /tmp/sortyA2RWw 12 sort 31370 dmt 16r REG 8,1 1198251727 12706866 /tmp/sortyZlqHC 13 sort 31370 dmt 17r REG 8,1 1184876212 12706871 /tmp/sortCCjKOH 14 sort 31370 dmt 18r REG 8,1 1174916186 12706880 /tmp/sortKrUJSJ 15 sort 31370 dmt 19r REG 8,1 1184800950 12706824 /tmp/sort2Vjy7s 16 sort 31370 dmt 20r REG 8,1 1187817055 12706891 /tmp/sort89xsBm 17 sort 31370 dmt 21r REG 8,1 1185686697 12706944 /tmp/sortwjlE5L I may be mistaken, but I don't think this is normal behaviour - once the 16-way merge sort is done, I see no reason to observe this whole process happening over and over. Ironically, I had upgraded from coreutils 8.5 to 8.12 because other tools on the same machine invokes sort with "--compress-program=gzip" and had observed similar-but-different hanging on that tool's invocation. I dug in and found this commit ( http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=1b31ce6982a9151d9dfe2ea3595ad7595cb9ca86 ) which encouraged me to try the upgrade to fix the hang-while-compressing issue, but I'm now seeing this hanging even in the non-compressed invocation. In my case, I think I'll downgrade coreutils back down to 8.5 as well as disable all tools from invoking it with "--compress-program" to get predictable non-hanging behaviour across the board. If I can be of further help please let me know. Thank you.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.