GNU bug report logs - #10880
multibyte: tr: TR operates on bytes, not characters


Package: coreutils;

Reported by: "Marton Kadar" <marton.kadar <at> mail.com>

Date: Fri, 24 Feb 2012 17:31:02 UTC

Severity: wishlist

Merged with 9365, 9569, 12192, 13362




From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Marton Kadar <marton.kadar <at> mail.com>
Cc: 10880 <at> debbugs.gnu.org
Subject: bug#10880: instead of characters, tr works on bytes
Date: Sat, 25 Feb 2012 15:20:44 -0800
On 02/25/2012 02:07 PM, Marton Kadar wrote:

> the execution path (single-byte specific or generalized
> multibyte capable) can be determined at program startup, so in the
> worst case there can be a tr and a tr-slow-but-multibyte version,
> the former calling the latter when so directed by the locale settings.

Something like that should work, yes.  Unfortunately so far nobody has
volunteered to do it.  The task would not be trivial.  We don't want
to maintain two copies of the code, one for single-byte and one for
multibyte, as that'd be a maintenance problem.  Instead, we'd like to
have just one copy of the code, which is easy to read and which
compiles into either unibyte or multibyte versions.
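The startup dispatch Marton Kadar describes can be sketched in C, the language coreutils is written in. This is a minimal illustration under stated assumptions, not the coreutils implementation: `tr_one_char`, `tr_bytes` and `tr_multibyte` are hypothetical names, the sketch handles only a single FROM/TO character pair rather than tr's full SET1/SET2 syntax, and it simply fails on invalid or truncated multibyte input, where a real tr would have to decide how to pass such bytes through. It uses only standard C facilities (`MB_CUR_MAX`, `mbrtowc`, `wcrtomb`, `wctob`).

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* setlocale, called once by the program at startup */
#include <stdio.h>    /* EOF */
#include <stdlib.h>   /* MB_CUR_MAX */
#include <string.h>
#include <wchar.h>    /* mbrtowc, wcrtomb, wctob, mbstate_t */

/* Single-byte path (hypothetical helper): every byte is one character,
   so bytes can be compared directly. */
static int tr_bytes(const char *src, char *dst, size_t dstsize,
                    wchar_t from, wchar_t to)
{
    int f = wctob(from), t = wctob(to);   /* EOF if not a single byte */
    size_t len = strlen(src);
    if (f == EOF || t == EOF || len >= dstsize)
        return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char) src[i];
        dst[i] = (char) (c == (unsigned char) f ? t : c);
    }
    dst[len] = '\0';
    return 0;
}

/* Multibyte path (hypothetical helper): decode each character with
   mbrtowc, substitute, and re-encode with wcrtomb. */
static int tr_multibyte(const char *src, char *dst, size_t dstsize,
                        wchar_t from, wchar_t to)
{
    mbstate_t in, out;
    memset(&in, 0, sizeof in);
    memset(&out, 0, sizeof out);
    size_t len = strlen(src), used = 0;
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, src, len, &in);
        if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
            return -1;            /* invalid or truncated sequence: give up */
        if (wc == from)
            wc = to;
        char buf[MB_LEN_MAX];
        size_t m = wcrtomb(buf, wc, &out);
        if (m == (size_t) -1 || used + m >= dstsize)
            return -1;            /* unencodable or output buffer too small */
        memcpy(dst + used, buf, m);
        used += m;
        src += n;
        len -= n;
    }
    dst[used] = '\0';
    return 0;
}

/* Hypothetical entry point: copy SRC to DST, replacing every character
   FROM with TO.  Returns 0 on success, -1 on error.  The path depends on
   the current LC_CTYPE, which the program sets once at startup. */
int tr_one_char(const char *src, char *dst, size_t dstsize,
                wchar_t from, wchar_t to)
{
    assert(src != NULL && dst != NULL && dstsize > 0);
    if (MB_CUR_MAX == 1)
        return tr_bytes(src, dst, dstsize, from, to);
    return tr_multibyte(src, dst, dstsize, from, to);
}
```

A program would call `setlocale(LC_ALL, "")` once in `main` and then call `tr_one_char`. In the C locale it takes the byte loop; in a UTF-8 locale it replaces `é` as one whole character instead of touching one of its two bytes. Note that the sketch keeps two separate loops for readability, which is exactly the duplication the maintainers want to avoid; a real patch would need to share one source between both paths.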

> avoiding a solely performance-related penalty in text-handling
> command-line utilities can never be a justifiable reason for
> incorrect functionality.

As far as I know there is no requirement in POSIX that applications
must support multibyte locales, and there's no documentation claiming
that the utilities in question support multibyte locales, so this is
not a bug; it's a feature request.

My opinion about this may be colored by an experience I had yesterday
with the latest version of GNU sed.  In a single-byte locale it worked
fine; in a multibyte locale it was so slow that I gave up.  We don't want this to
happen with the core utilities.





This bug report was last modified 6 years and 304 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.