GNU bug report logs - #18817
\w is not synonym for [[:alnum:]] in UTF-8 locales


Package: grep;

Reported by: Jaroslav Skarvada <jskarvad <at> redhat.com>

Date: Fri, 24 Oct 2014 14:21:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.




From: help-debbugs <at> gnu.org (GNU bug Tracking System)
To: Jaroslav Skarvada <jskarvad <at> redhat.com>
Subject: bug#18817: closed (Re: bug#18817: \w is not synonym for
 [[:alnum:]] in UTF-8 locales)
Date: Wed, 29 Oct 2014 03:56:03 +0000
[Message part 1 (text/plain, inline)]
Your bug report

#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 18817 <at> debbugs.gnu.org.

-- 
18817: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18817
GNU Bug Tracking System
Contact help-debbugs <at> gnu.org with problems
[Message part 2 (message/rfc822, inline)]
From: Jim Meyering <jim <at> meyering.net>
To: Norihiro Tanaka <noritnk <at> kcn.ne.jp>
Cc: Eric Blake <eblake <at> redhat.com>, 18817-done <at> debbugs.gnu.org
Subject: Re: bug#18817: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Tue, 28 Oct 2014 20:55:07 -0700
FYI, I noticed only after pushing that "make check" was
failing a test because that new script was not executable,
so I've just pushed a follow-up patch to fix that.

On Tue, Oct 28, 2014 at 6:07 PM, Jim Meyering <jim <at> meyering.net> wrote:
> I've adjusted the commit subject and ChangeLog content, and will push
> this today, then I'll make a pre-release snapshot.

[Message part 3 (message/rfc822, inline)]
From: Jaroslav Skarvada <jskarvad <at> redhat.com>
To: bug-grep <at> gnu.org
Subject: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Fri, 24 Oct 2014 10:19:49 -0400 (EDT)
Hi,

in the man page there is the following sentence:

"The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]]"
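The man page's claim is a per-character invariant: \w should match exactly the characters that [_[:alnum:]] matches, and \W exactly the rest. As a rough illustration only, the sketch below checks that invariant with Python's re module. For str patterns, its \w is documented as Unicode alphanumerics (as defined by str.isalnum()) plus the underscore. Python's engine and str.isalnum() are assumed stand-ins for grep's matcher and the locale's [:alnum:] class, and check_synonyms is a hypothetical helper. So this shows the behavior the man page leads one to expect, not what grep itself does.

```python
import re

# Python's regex engine, used only as a stand-in for grep: for str patterns,
# \w is documented to match str.isalnum() characters plus "_".
WORD = re.compile(r"\w")
NON_WORD = re.compile(r"\W")


def word_char(c: str) -> bool:
    """Rough analogue of the bracket expression [_[:alnum:]] for one character."""
    return c == "_" or c.isalnum()


def check_synonyms(chars):
    """Return the characters where \\w or \\W disagrees with
    [_[:alnum:]] or [^_[:alnum:]] respectively."""
    mismatches = []
    for c in chars:
        w = WORD.fullmatch(c) is not None
        nw = NON_WORD.fullmatch(c) is not None
        if w != word_char(c) or nw != (not word_char(c)):
            mismatches.append(c)
    return mismatches


# ASCII plus the Latin-1 Supplement, which contains "\u00e1" (a with acute).
sample = [chr(cp) for cp in range(0x00, 0x100)]
print(check_synonyms(sample))  # []
```

An empty list means the invariant holds for every sampled character in this reference engine. The report shows grep 2.20 breaking it for the same character.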

Setting aside that the man pages in some other languages (e.g. Czech) state
that \w is a synonym for [[:alnum:]] and \W is a synonym for [^[:alnum:]],
neither bracket expression seems to behave like \w (or, negated, like \W)
in UTF-8 locales:

$ export LANG=en_US.UTF-8

$ echo 'á' | grep '[[:alnum:]]'
á
$ echo 'á' | grep '[_[:alnum:]]'
á
$ echo 'á' | grep '\w'

$ echo 'á' | grep '[^[:alnum:]]'
$ echo 'á' | grep '[^_[:alnum:]]'
$ echo 'á' | grep '\W'
á

$ grep --version
grep (GNU grep) 2.20
...
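The sketch below encodes the six outcomes from the session above as data and compares them with what the man page's synonyms predict for the single character "á". This makes the mismatch explicit. It assumes str.isalnum() as a stand-in for the locale's [:alnum:] class. OBSERVED, expected and disagreements are hypothetical names used only for this illustration and are not part of grep.

```python
# Outcomes reported above for: echo 'á' | grep PATTERN
# (grep 2.20, LANG=en_US.UTF-8). True means grep printed the line.
OBSERVED = {
    "[[:alnum:]]": True,
    "[_[:alnum:]]": True,
    r"\w": False,
    "[^[:alnum:]]": False,
    "[^_[:alnum:]]": False,
    r"\W": True,
}


def expected(pattern: str, c: str) -> bool:
    """What each pattern should do for a single character c if the man page's
    synonyms held. str.isalnum() is an assumed stand-in for [:alnum:]."""
    alnum = c.isalnum()
    word = alnum or c == "_"
    table = {
        "[[:alnum:]]": alnum,
        "[_[:alnum:]]": word,
        r"\w": word,  # documented synonym for [_[:alnum:]]
        "[^[:alnum:]]": not alnum,
        "[^_[:alnum:]]": not word,
        r"\W": not word,  # documented synonym for [^_[:alnum:]]
    }
    return table[pattern]


def disagreements(observed: dict, c: str) -> list:
    """Patterns whose observed result differs from the documented expectation."""
    return [p for p, got in observed.items() if got != expected(p, c)]


print(disagreements(OBSERVED, "\u00e1"))  # ['\\w', '\\W']
```

Only \w and \W disagree. The bracket expressions behave as documented, which points to the \w/\W handling rather than to [:alnum:].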






GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.