GNU bug report logs - #18817
\w is not synonym for [[:alnum:]] in UTF-8 locales


Package: grep

Reported by: Jaroslav Skarvada <jskarvad <at> redhat.com>

Date: Fri, 24 Oct 2014 14:21:02 UTC

Severity: normal

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.



Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jaroslav Skarvada <jskarvad <at> redhat.com>
To: bug-grep <at> gnu.org
Subject: \w is not synonym for [[:alnum:]] in UTF-8 locales
Date: Fri, 24 Oct 2014 10:19:49 -0400 (EDT)

Hi,

The man page contains the following sentence:

"The symbol \w is a synonym for [_[:alnum:]] and \W is a synonym for [^_[:alnum:]]"

Leaving aside that the man pages for some other languages (e.g. Czech) state that
\w is a synonym for [[:alnum:]] and \W is a synonym for [^[:alnum:]], neither
bracket expression seems to behave like \w or \W in UTF-8 locales:

$ export LANG=en_US.UTF-8

$ echo 'á' | grep '[[:alnum:]]'
á
$ echo 'á' | grep '[_[:alnum:]]'
á
$ echo 'á' | grep '\w'

$ echo 'á' | grep '[^[:alnum:]]'
$ echo 'á' | grep '[^_[:alnum:]]'
$ echo 'á' | grep '\W'
á

$ grep --version
grep (GNU grep) 2.20
...
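
The comparison above can be repeated for any locale and input with a small
helper. This is only a sketch: the function name w_agrees and its output
format are made up for illustration and are not part of grep. It assumes the
named locale is installed and that the grep in use accepts \w. The result for
non-ASCII input such as 'á' depends on the grep version, so no output is
claimed for that case here.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): report whether \w and
# [_[:alnum:]] agree on one input string under a given locale.
# Usage: w_agrees LOCALE STRING
w_agrees() {
    wa_locale=$1
    wa_input=$2

    # Test the input against \w in the requested locale.
    if printf '%s\n' "$wa_input" | LC_ALL=$wa_locale grep -q '\w'; then
        wa_w=match
    else
        wa_w=nomatch
    fi

    # Test the same input against the documented synonym.
    if printf '%s\n' "$wa_input" | LC_ALL=$wa_locale grep -q '[_[:alnum:]]'; then
        wa_class=match
    else
        wa_class=nomatch
    fi

    printf '\\w=%s [_[:alnum:]]=%s\n' "$wa_w" "$wa_class"

    # Exit status 0 only when both patterns gave the same answer.
    [ "$wa_w" = "$wa_class" ]
}
```

Running `w_agrees en_US.UTF-8 'á'` prints both results on one line and returns
a non-zero status when the two patterns disagree, as they do in the transcript
above.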




This bug report was last modified 10 years and 212 days ago.


