GNU bug report logs - #18991
[PATCH] tests: fix encoding with `tr' to support multibyte in test


Package: grep;

Reported by: Norihiro Tanaka <noritnk <at> kcn.ne.jp>

Date: Sat, 8 Nov 2014 08:09:02 UTC

Severity: normal

Tags: patch

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Subject: bug#18991: closed (Re: bug#18991: [PATCH] tests: fix encoding
 with `tr' to support multibyte in test)
Date: Sun, 09 Nov 2014 03:02:02 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#18991: [PATCH] tests: fix encoding with `tr' to support multibyte in test

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 18991 <at> debbugs.gnu.org.

-- 
18991: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18991
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: 18991-done <at> debbugs.gnu.org
Subject: Re: bug#18991: [PATCH] tests: fix encoding with `tr' to support
 multibyte in test
Date: Sat, 8 Nov 2014 19:00:55 -0800
[Message part 3 (text/plain, inline)]
On Sat, Nov 8, 2014 at 12:07 AM, Norihiro Tanaka <noritnk <at> kcn.ne.jp> wrote:
> It seems that `tr' in GNU coreutils does not recognize multibyte
> characters, but other implementations, e.g. HP-UX and Solaris, do.
>
> As a result, on those systems [ echo AB | LC_ALL=ja_JP.eucJP tr AB '\244\263' ]
> is interpreted as [ echo AB | LC_ALL=ja_JP.eucJP tr A '\244\263' ], because
> '\244\263' is recognized as a single multibyte character.  The test does not
> expect that.

Thank you for the report and patch.
However, it is not maintainable to modify every use of "tr" in
the tests.  Instead, I've addressed this by making all of the
tests use tr through a wrapper that always sets LC_ALL=C:
[0001-tests-avoid-a-multibyte-tr-portability-problem.patch (application/octet-stream, attachment)]
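The attached patch is not reproduced here. A minimal sketch of such a wrapper, assuming a POSIX shell and a helper file that every test sources, could shadow `tr` with a shell function. The function below is hypothetical and not necessarily what the patch contains. It runs the real `tr` through `env`, so the C locale always applies and each octal escape stays a single byte:

```shell
# Hypothetical wrapper: shadow tr with a shell function so that every
# call in the test suite runs in the C locale.  `env` looks up the real
# tr binary in PATH, so the function does not call itself recursively.
tr() { env LC_ALL=C tr "$@"; }

# Even when the caller asks for an EUC-JP locale, the wrapper forces
# byte-wise behavior: A becomes byte 0244 and B becomes byte 0263.
printf 'AB' | LC_ALL=ja_JP.eucJP tr AB '\244\263' | od -An -to1
```

Putting the override in one place means individual tests keep calling plain `tr` and need no per-call changes.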
[Message part 5 (message/rfc822, inline)]
From: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
To: bug-grep <at> gnu.org
Subject: [PATCH] tests: fix encoding with `tr' to support multibyte in test
Date: Sat, 08 Nov 2014 17:07:40 +0900
[Message part 6 (text/plain, inline)]
It seems that `tr' in GNU coreutils does not recognize multibyte
characters, but other implementations, e.g. HP-UX and Solaris, do.

As a result, on those systems [ echo AB | LC_ALL=ja_JP.eucJP tr AB '\244\263' ]
is interpreted as [ echo AB | LC_ALL=ja_JP.eucJP tr A '\244\263' ], because
'\244\263' is recognized as a single multibyte character.  The test does not
expect that.
[0001-grep-fix-encoding-with-tr-to-support-multibyte-in-te.patch (text/plain, attachment)]
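The test depends on `tr` treating '\244\263' as two separate bytes, one for each character of AB. The following sketch checks that expectation in the C locale, where every byte is a character. It assumes a POSIX `tr` and `od`. The variable name `bytes` is illustrative only:

```shell
# In the C locale tr works byte by byte: the octal escape \244 is one
# character and \263 is another, so A -> 0244 and B -> 0263.
bytes=$(printf 'AB' | LC_ALL=C tr AB '\244\263' | od -An -to1 | LC_ALL=C tr -d ' \n')

if [ "$bytes" = "244263" ]; then
  echo "tr mapped AB to two bytes, as the test expects"
else
  # A multibyte-aware tr in an EUC-JP locale could instead read the pair
  # as one character, which is the behavior described in the report.
  echo "unexpected output: $bytes" >&2
fi
```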



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.