GNU bug report logs - #78276
grep on file with 0xF3 byte in utf-8 locale

Package: grep

Reported by: Arkadiusz Miśkiewicz <arekm <at> maven.pl>

Date: Tue, 6 May 2025 07:39:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Arkadiusz Miśkiewicz <arekm <at> maven.pl>
To: bug-grep <at> gnu.org
Subject: grep on file with 0xF3 byte in utf-8 locale
Date: Tue, 6 May 2025 09:37:36 +0200
Hi.

I was trying to grep logs for some mail log entries, and a spammer had used
a 0xF3 byte to try to hide or obfuscate things. For grep it looks like this:

$ printf 'a\xF3bcdefgh' > x2

$ LC_ALL=C.UTF-8 grep 'a.*h' x2
$

$ LC_ALL=C grep 'a.*h' x2
abcdefgh

$ LC_ALL=C.UTF-8 grep -a 'a.*h' x2
$

$ LC_ALL=C grep -a 'a.*h' x2
abcdefgh


Is this expected behavior: no binary-file warning and no match in a
UTF-8 locale, even with -a? AFAIK that is not a valid UTF-8 sequence.
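
For reference, a minimal sketch (in Python rather than shell, so that it does
not depend on the locale or grep build) of why the test input is not valid
UTF-8. It only checks the bytes themselves and says nothing about how grep
handles them. The helper name is_valid_utf8 is hypothetical, introduced here
for illustration.

```python
# Same bytes as written by: printf 'a\xF3bcdefgh' > x2
data = b"a\xF3bcdefgh"


def is_valid_utf8(buf: bytes) -> bool:
    """Return True if buf decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        buf.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


lead = data[1]        # 0xF3
following = data[2]   # 0x62, ASCII 'b'

# 0xF3 has the bit pattern 11110xxx, so it announces a 4-byte sequence.
is_four_byte_lead = (lead & 0xF8) == 0xF0

# A continuation byte must have the bit pattern 10xxxxxx; 'b' does not.
is_continuation = (following & 0xC0) == 0x80

print("valid UTF-8:", is_valid_utf8(data))
print("0xF3 is a 4-byte lead:", is_four_byte_lead)
print("next byte is a continuation byte:", is_continuation)
```

Since 0xF3 starts a 4-byte sequence but is followed by an ASCII byte, the
sequence is truncated, which matches the observation that it is not correct
UTF-8.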


$ grep --version x2
grep (GNU grep) 3.12
Copyright (C) 2025 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see
<https://git.savannah.gnu.org/cgit/grep.git/tree/AUTHORS>.

grep -P uses PCRE2 10.45 2025-02-05
-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )





This bug report was last modified 17 days ago.

