GNU bug report logs - #70231
Performance issue on sort with zero-sized pseudo files


Package: coreutils

Reported by: Takashi Kusumi <tkusumi <at> zlab.co.jp>

Date: Sat, 6 Apr 2024 06:39:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Takashi Kusumi <tkusumi <at> zlab.co.jp>
Subject: bug#70231: closed (Re: bug#70231: Performance issue on sort with
 zero-sized pseudo files)
Date: Sun, 07 Apr 2024 12:56:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#70231: Performance issue on sort with zero-sized pseudo files

which was filed against the coreutils package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 70231 <at> debbugs.gnu.org.

-- 
70231: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=70231
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Takashi Kusumi <tkusumi <at> zlab.co.jp>,
 70231-done <at> debbugs.gnu.org
Subject: Re: bug#70231: Performance issue on sort with zero-sized pseudo files
Date: Sun, 7 Apr 2024 13:55:40 +0100
On 06/04/2024 23:22, Paul Eggert wrote:
> On 2024-04-06 03:09, Pádraig Brady wrote:
>> I'll apply this.
> 
> Heh, I beat you to it by looking for similar errors elsewhere and
> applying the attached patches to fix the issues I found. None of them
> look like serious bugs.

Cool. I thought the sort(1) change worthy of a NEWS entry so pushed one.
Marking this as done.

>> BTW we should improve sort buffer handling in general
> 
> Oh yes.
> 
> PS. My current little task is to get i18n to work better with 'sort'.
> Among other things I want Unicode-style full case folding.

Excellent, that will help keep the related uniq(1) and join(1)
commands more aligned in their ordering.

cheers,
Pádraig

[Message part 3 (message/rfc822, inline)]
From: Takashi Kusumi <tkusumi <at> zlab.co.jp>
To: bug-coreutils <at> gnu.org
Subject: Performance issue on sort with zero-sized pseudo files
Date: Sat, 6 Apr 2024 11:52:17 +0900
[Message part 4 (text/plain, inline)]
Hi,

I have found a performance issue with the sort command when used on
pseudo files with zero size. For instance, sorting `/proc/kallsyms`
directly, as demonstrated below, takes significantly longer than piping
it through `cat`, and it generates numerous temporary files. I confirmed
this issue on v8.32 as well as on commit 8f3989d in the master branch.

  $ time cat /proc/kallsyms | sort > /dev/null
  real    0m0.954s
  user    0m0.873s
  sys     0m0.096s

  $ time sort /proc/kallsyms > /dev/null
  real    0m8.555s
  user    0m3.367s
  sys     0m5.064s

  $ strace -e trace=openat sort /proc/kallsyms 2>&1 > /dev/null \
    | grep /tmp/sort | head -100
  ...
  openat(AT_FDCWD, "/tmp/sortM6Y6Y1", ...
  openat(AT_FDCWD, "/tmp/sortPrHKMG", ...

  $ strace -e trace=openat -c sort /proc/kallsyms > /dev/null
  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
  100.00    6.419777          19    333258         8 openat
  ------ ----------- ----------- --------- --------- ----------------
  100.00    6.419777          19    333258         8 total

It appears that the buffer size allocated for pseudo files with zero
size is insufficient, likely because it is based on their file size,
which is zero. As seen in the attached patch, I think using
`INPUT_FILE_SIZE_GUESS` to calculate the buffer size when the file size
is zero would resolve this issue.
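
The fallback proposed above can be illustrated with a minimal sketch. This is not the attached patch or the code in sort.c; it is a small Python model of the idea, assuming that a size estimate is taken from `st_size` only when a regular file reports a positive size. The helper names `size_hint` and `demo` are hypothetical, and the numeric value given to `INPUT_FILE_SIZE_GUESS` here is a placeholder that may differ from the real constant.

```python
import os
import stat
import tempfile

# Placeholder stand-in for sort's INPUT_FILE_SIZE_GUESS; the real value
# is defined in coreutils' sort.c and may differ from this one.
INPUT_FILE_SIZE_GUESS = 128 * 1024


def size_hint(path, guess=INPUT_FILE_SIZE_GUESS):
    """Return a size estimate to use when sizing the sort buffer.

    st_size is trusted only for regular files that report a positive
    size; anything else (including zero-sized pseudo files) gets the
    guess instead.
    """
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode) and st.st_size > 0:
        return st.st_size
    return guess


def demo():
    """Return (hint for an empty file, hint for a 10-byte file)."""
    with tempfile.TemporaryDirectory() as tmp:
        empty = os.path.join(tmp, "empty")
        small = os.path.join(tmp, "small")
        open(empty, "wb").close()          # regular file, st_size == 0
        with open(small, "wb") as f:
            f.write(b"0123456789")         # regular file, st_size == 10
        return size_hint(empty), size_hint(small)
```

On a Linux system, `size_hint("/proc/kallsyms")` would be expected to return the guess rather than zero, since that file is reported as a regular file with a size of zero.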

Best regards,
Takashi Kusumi
[0001-sort-fix-performance-issue-on-zero-sized-pseudo-file.patch (text/plain, attachment)]

This bug report was last modified 1 year and 39 days ago.
