GNU bug report logs - #41004
Documentation:enhancement - search for hexvalue

Package: grep

Reported by: Radisson97 <at> web.de

Date: Fri, 1 May 2020 17:07:01 UTC

Severity: wishlist

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #11 received at 41004 <at> debbugs.gnu.org:

From: Stephane Chazelas <stephane <at> chazelas.org>
To: Radisson97 <at> web.de
Cc: 41004 <at> debbugs.gnu.org
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
Date: Sun, 10 May 2020 17:46:44 +0100

2020-05-01 19:05:28 +0200, Radisson97 <at> web.de:
[...]
> problem: grep for a character where only the hexcode is known.
> 
> solution:        use $'\xNN'
>                      then shell expands this to the required code
> 
> example:       printf "A\nB\nC\n" | grep $'\x41'
[...]

The $'\x41' ksh93 quoting operator expands to *byte* values.

To get a character based on the Unicode codepoint value, you'd
need the $'\u41' zsh operator (or $'\U10000' for code points
above 0xffff).
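
The byte/character distinction can be checked directly in the
shell. The following is only a minimal sketch, assuming a shell
that supports both forms of $'...' (zsh, or bash 4.2 and later as
far as I know); the variable names are made up for illustration.

```shell
# $'\x41' yields the single byte 0x41, which is "A" in any
# ASCII-compatible charset.
byte_a=$'\x41'
printf '%s\n' "$byte_a"

# $'\xe9' is one byte as well, whatever the locale is.
xe9_len=$(printf '%s' $'\xe9' | wc -c | tr -d ' ')
printf 'length of \\xe9 expansion in bytes: %s\n' "$xe9_len"

# $'\u00e9' asks for the character U+00E9 instead; which bytes come
# out depends on the shell and on the locale's charset, so dump them
# rather than assume.
printf '%s' $'\u00e9' | od -An -tx1
```

In a UTF-8 locale the last command would be expected to show the
two bytes of the UTF-8 encoding of U+00E9 rather than a single
0xe9 byte.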

But in any case, that expansion is done by the shell and has
nothing to do with grep, and the syntax of those shell operators
varies between shells.

In the fish shell you'd use:

grep \u41

or

grep \x41

instead.

Also, since it's done by the shell, things like:

grep $'\u2e'

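where U+002E is "FULL STOP", would match not only "."
characters but any character: all grep sees is a "." character,
which is a regular expression operator. That is different from
grep -P '\x2e', where the escape is interpreted by the regexp
engine rather than the shell and matches "." (U+002E) only.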
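
A small sketch of that pitfall, assuming a shell with $'...'
support. It uses grep's -F (fixed strings) option as one way to
take the regexp meaning away from the expanded "."; that option is
my addition here and not something proposed in the report. The
variable names are arbitrary.

```shell
# The shell expands $'\x2e' to "." before grep runs, so grep gets a
# regexp that matches any character: all three lines are counted.
regex_count=$(printf 'a\n.\nb\n' | grep -c $'\x2e')

# With -F the pattern is a fixed string, so only the line that
# really contains "." is counted.
fixed_count=$(printf 'a\n.\nb\n' | grep -cF $'\x2e')

printf 'as regexp: %s, as fixed string: %s\n' "$regex_count" "$fixed_count"
```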

Note that:

grep -P '\xE9'

matches on the byte 0xE9 in single-byte locales (regardless of
what character that byte represents in the locale's charset) and
on character U+00E9 in UTF-8 locales (so on the 0xc3 0xa9 sequence
of bytes, not on byte 0xe9).
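
To see the locale dependence, something like the following could
be tried. It is a sketch under assumptions: grep must have been
built with -P support, the C locale stands in for "a single-byte
locale", and en_US.UTF-8 is only a guess at a UTF-8 locale name
that may not be installed on a given system (in which case grep
would behave as in the C locale).

```shell
latin1_e9=$(printf '\351')      # the single byte 0xE9
utf8_e9=$(printf '\303\251')    # bytes 0xC3 0xA9, U+00E9 in UTF-8

# Single-byte (C) locale: \xE9 means the byte 0xE9, so only the
# first input is counted. "|| true" keeps a zero count from being
# treated as a failure.
c_on_byte=$(printf '%s\n' "$latin1_e9" | LC_ALL=C grep -cP '\xE9' || true)
c_on_utf8=$(printf '%s\n' "$utf8_e9" | LC_ALL=C grep -cP '\xE9' || true)
printf 'C locale: 0xE9 -> %s, 0xC3 0xA9 -> %s\n' "$c_on_byte" "$c_on_utf8"

# UTF-8 locale (name assumed): here \xE9 means character U+00E9, so
# the 0xC3 0xA9 input is the one that should match.
printf '%s\n' "$utf8_e9" | LC_ALL=en_US.UTF-8 grep -cP '\xE9' || true
```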

-- 
Stephane




This bug report was last modified 4 years and 245 days ago.

