GNU bug report logs - #49340
small sort takes hours for UTF-8 locale

Package: coreutils

Reported by: Jon Klaas <blagothakus <at> gmail.com>

Date: Fri, 2 Jul 2021 20:52:01 UTC

Severity: normal

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 49340 <at> debbugs.gnu.org
Subject: bug#49340: small sort takes hours for UTF-8 locale
Date: Fri, 2 Jul 2021 17:25:51 -0700

On 7/2/21 4:19 PM, Pádraig Brady wrote:
> we might be able to improve things.
> For example, using strxfrm() + strcmp() to minimize processing.

I tried that long ago, and it was waaayyy slower than strcoll in the 
typical case. glibc strxfrm is not at all optimized.

Which is fine, since strxfrm is a dumb API: its only point is 
performance but its portable API is inherently low-performance for 
typical uses. I've never seen it useful.
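
For readers unfamiliar with the two approaches being compared, here is a
minimal sketch of the strxfrm() + strcmp() idea. It is not coreutils code and
not the benchmark described above; the helper names make_sort_key and
compare_via_keys are hypothetical. It also shows one plausible reason the
portable API is awkward: the caller cannot know the key size in advance, so
each string is typically transformed twice (once to measure, once to fill)
and needs its own allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a collation key for s under the current
   LC_COLLATE locale.  Returns a malloc'd string, or NULL on failure. */
char *make_sort_key(const char *s)
{
    /* First call with size 0 only reports the key length (without the
       terminating NUL); the destination may be NULL in that case. */
    size_t need = strxfrm(NULL, s, 0);

    char *key = malloc(need + 1);
    if (key == NULL)
        return NULL;

    /* Second call repeats the transformation, this time storing it. */
    strxfrm(key, s, need + 1);
    return key;
}

/* Hypothetical helper: compare a and b by transformed keys, which should
   order them the same way strcoll(a, b) does.  Falls back to strcoll
   if a key could not be allocated. */
int compare_via_keys(const char *a, const char *b)
{
    char *ka = make_sort_key(a);
    char *kb = make_sort_key(b);
    int result = (ka != NULL && kb != NULL) ? strcmp(ka, kb)
                                            : strcoll(a, b);
    free(ka);
    free(kb);
    return result;
}
```

A sort that wanted to benefit from this would compute each key once up front
and then compare keys with strcmp(); the sketch recomputes them per
comparison only to keep the example short. Call setlocale(LC_COLLATE, "")
first if the environment's locale should apply.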

In short, this is a glibc strcoll bug and should be fixed there.



