GNU bug report logs - #18991
[PATCH] tests: fix encoding with `tr' to support multibyte in test


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 8 Nov 2014 08:09:02 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Jim Meyering <jim <at> meyering.net>
Cc: tracker <at> debbugs.gnu.org
Subject: bug#18991: closed ([PATCH] tests: fix encoding with `tr' to
 support multibyte in test)
Date: Sun, 09 Nov 2014 03:02:01 +0000
[Message part 1 (text/plain, inline)]
Your message dated Sat, 8 Nov 2014 19:00:55 -0800
with message-id <CA+8g5KHhuebkh4mDxmb-X2-wCMrbMPCfKPEY08pbk64DM45bCQ <at> mail.gmail.com>
and subject line Re: bug#18991: [PATCH] tests: fix encoding with `tr' to support multibyte in test
has caused the debbugs.gnu.org bug report #18991,
regarding [PATCH] tests: fix encoding with `tr' to support multibyte in test
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs <at> gnu.org.)


-- 
18991: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18991
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: bug-grep <at> gnu.org
Subject: [PATCH] tests: fix encoding with `tr' to support multibyte in test
Date: Sat, 08 Nov 2014 17:07:40 +0900
[Message part 3 (text/plain, inline)]
It seems that `tr' in GNU coreutils does not recognize multibyte
characters, but other implementations, e.g. on HP-UX and Solaris, do.

As a result, on those systems [ echo AB | LC_ALL=ja_JP.eucJP tr AB '\244\263' ]
is effectively treated as [ echo AB | LC_ALL=ja_JP.eucJP tr A '\244\263' ],
because '\244\263' is recognized as a single multibyte character rather
than as two bytes.  The tests do not expect that.
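For illustration, the byte-oriented mapping the tests rely on can be observed by running the same command in the C locale, where each octal escape in the second set is one byte (A becomes \244 and B becomes \263, the two bytes that ja_JP.eucJP reads as one character). This is a minimal sketch assuming a POSIX shell with `od` available; it is not part of the attached patch.

```shell
#!/bin/sh
# In the C locale tr works byte by byte, so SET2 '\244\263' holds two
# one-byte characters: A -> octal 244, B -> octal 263.
# od prints the resulting bytes in octal (the trailing 012 is echo's newline).
echo AB | LC_ALL=C tr AB '\244\263' | od -An -to1
```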
[0001-grep-fix-encoding-with-tr-to-support-multibyte-in-te.patch (text/plain, attachment)]
[Message part 5 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18991-done <at> debbugs.gnu.org
Subject: Re: bug#18991: [PATCH] tests: fix encoding with `tr' to support
 multibyte in test
Date: Sat, 8 Nov 2014 19:00:55 -0800
[Message part 6 (text/plain, inline)]
On Sat, Nov 8, 2014 at 12:07 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> It seems that `tr' in GNU coreutils does not recognize multibyte
> characters, but other implementations, e.g. on HP-UX and Solaris, do.
>
> As a result, on those systems [ echo AB | LC_ALL=ja_JP.eucJP tr AB '\244\263' ]
> is effectively treated as [ echo AB | LC_ALL=ja_JP.eucJP tr A '\244\263' ],
> because '\244\263' is recognized as a single multibyte character rather
> than as two bytes.  The tests do not expect that.

Thank you for the report and patch.
However, it is not maintainable to modify every use of "tr" in
the tests.  Instead, I've addressed this by making all of the
tests use tr through a wrapper that always sets LC_ALL=C:
[0001-tests-avoid-a-multibyte-tr-portability-problem.patch (application/octet-stream, attachment)]
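A minimal sketch of the kind of wrapper described above, assuming a POSIX shell with `env` and `od`. The function below is hypothetical and is not the contents of the attached patch; it only shows how shadowing `tr` with a function that forces LC_ALL=C keeps every existing call site unchanged while making the mapping byte-wise.

```shell
#!/bin/sh
# Hypothetical wrapper: a shell function named tr that always runs the
# real tr (found on PATH via env, so the function does not recurse)
# with LC_ALL=C, overriding whatever locale the caller set.
tr() { env LC_ALL=C tr "$@"; }

# Even when a test requests a multibyte locale, the octal escapes are
# still treated as single bytes: expect octal 244 263 012.
echo AB | LC_ALL=ja_JP.eucJP tr AB '\244\263' | od -An -to1
```

Because the override lives in one place, tests keep calling plain `tr` and no individual use has to be modified.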

This bug report was last modified 10 years and 248 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.