GNU bug report logs - #70231
Performance issue on sort with zero-sized pseudo files


Package: coreutils;

Reported by: Takashi Kusumi <tkusumi <at> zlab.co.jp>

Date: Sat, 6 Apr 2024 06:39:02 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.



Message #8 received at 70231 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Takashi Kusumi <tkusumi <at> zlab.co.jp>, 70231 <at> debbugs.gnu.org
Subject: Re: bug#70231: Performance issue on sort with zero-sized pseudo files
Date: Sat, 6 Apr 2024 11:09:02 +0100
On 06/04/2024 03:52, Takashi Kusumi wrote:
> Hi,
> 
> I have found a performance issue with the sort command when used on
> pseudo files with zero size. For instance, sorting `/proc/kallsyms`, as
> demonstrated below, takes significantly longer than executing with
> `cat`, generating numerous temporary files. I confirmed this issue on
> v8.32 as well as on commit 8f3989d in the master branch.
> 
>     $ time cat /proc/kallsyms | sort > /dev/null
>     real    0m0.954s
>     user    0m0.873s
>     sys     0m0.096s
> 
>     $ time sort /proc/kallsyms > /dev/null
>     real    0m8.555s
>     user    0m3.367s
>     sys     0m5.064s
> 
>     $ strace -e trace=openat sort /proc/kallsyms 2>&1 > /dev/null \
>       | grep /tmp/sort | head -100
>     ...
>     openat(AT_FDCWD, "/tmp/sortM6Y6Y1", ...
>     openat(AT_FDCWD, "/tmp/sortPrHKMG", ...
> 
>     $ strace -e trace=openat -c sort /proc/kallsyms > /dev/null
>     % time     seconds  usecs/call     calls    errors syscall
>     ------ ----------- ----------- --------- --------- ----------------
>     100.00    6.419777          19    333258         8 openat
>     ------ ----------- ----------- --------- --------- ----------------
>     100.00    6.419777          19    333258         8 total
> 
> It appears that the buffer size allocated for pseudo files with zero
> size is insufficient, likely because it is based on their file size,
> which is zero. As seen in the attached patch, I think using
> `INPUT_FILE_SIZE_GUESS` to calculate the buffer size when the file size
> is zero would resolve this issue.

I'll apply this.

BTW we should improve sort buffer handling in general. From my TODO...

0. Have sort --debug output memory buffer sizes and space avail at $TMPDIR(s)
1. auto increase buffer when reading from pipe or zero sized files.
This will be more efficient and more importantly enable parallel operation.
See http://superuser.com/questions/938558/sort-parallel-isnt-parallelizing/
At least use more appropriate default buffer sizes in this case.
I.e. bigger mins and probably smaller maxs as half avail mem is too aggressive.
2. check() should not need full buffer size?
only merge buffer size or something small at least.
3. Look at minimizing the amount of mem used by default.
Hmm, sort auto adjusts down to avail mem in initbuf() (Test with ulimit -v)
4. Careful with too small buffers as that may initiate
an extra merge step (see section above).

If anyone wants to look at the above, give me a heads up,
or I'll get to it sometime in the next release cycle.

thanks!
Pádraig.





This bug report was last modified 1 year and 39 days ago.


