GNU bug report logs - #41004
Documentation:enhancement - search for hexvalue
Reported by: Radisson97 <at> web.de
Date: Fri, 1 May 2020 17:07:01 UTC
Severity: wishlist
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
On Sun, May 10, 2020 at 10:00 AM Stephane Chazelas
<stephane <at> chazelas.org> wrote:
>
> 2020-05-01 19:05:28 +0200, Radisson97 <at> web.de:
> [...]
> > problem: grep for a character where only the hexcode is known.
> >
> > solution: use $'\xNN'
> > then shell expands this to the required code
> >
> > example: printf "A\nB\nC\n" | grep $'\x41'
> [...]
>
> The $'\x41' ksh93 quoting operator expands to *byte* values.
>
> To get a character based on the Unicode codepoint value, you'd
> need the $'\u41' zsh operator (or $'\U10000' for code points
> above 0xffff).
>
> But in any case, that is done by the shell, that has nothing to
> do with grep and the syntax of those shell operators varies
> between shells.
>
> In the fish shell you'd use:
>
> grep \u41
>
> or
>
> grep \x41
>
> instead.
>
> Also, since it's done by the shell, things like:
>
> grep $'\u2e'
>
> where U+002E is "FULL STOP", would not only match on "."
> characters but on any character. All grep sees is a "."
> character. That would be different from grep -P '\x2e' which
> matches "." (U+002E) only.
>
> Note that:
>
> grep -P '\xE9'
>
> matches on the byte 0xE9 in singlebyte locales (regardless of
> what character that byte represents in the locale's charset) and
> on character U+00E9 in UTF-8 locales (so the 0xc3 0xa9 sequence
> of bytes, not byte 0xe9).
Thank you for the thorough reply, Stephane!
Bearing that in mind, Radisson, please consider submitting a revised patch.
I suggest recommending something like this:
$ printf '%s\n' A B C | LC_ALL=C grep -P '\x41'
A
so that the example is independent of both the current locale and the shell.
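To illustrate Stephane's point that grep only ever sees the expanded character, here is a minimal sketch. It assumes a POSIX printf and builds the pattern from the octal value 056 (0x2E) rather than relying on any shell's $'...' syntax. The variable names and the three sample lines are made up for illustration, and -F is used here in place of -P as a simple way to get a literal match.

```shell
# Build the pattern from its byte value: octal 056 is 0x2E, i.e. ".".
pattern=$(printf '\056')

# Used as a regular expression, "." matches any character,
# so every non-empty input line is counted.
regex_count=$(printf '%s\n' 'a.b' 'abc' 'xyz' | LC_ALL=C grep -c "$pattern")

# With -F the pattern is a fixed string, so only the line
# that really contains "." is counted.
fixed_count=$(printf '%s\n' 'a.b' 'abc' 'xyz' | LC_ALL=C grep -c -F "$pattern")

echo "regex: $regex_count, fixed: $fixed_count"
```

By the time grep runs, how the "." was produced no longer matters. Whether it is treated as a metacharacter depends only on the matching mode grep is asked to use.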
This bug report was last modified 4 years and 245 days ago.