GNU bug report logs - #42762
GREP does not support Unicode
To reply to this bug, email your comments to 42762 AT debbugs.gnu.org.
Report forwarded to bug-grep <at> gnu.org: bug#42762; Package grep. (Sat, 08 Aug 2020 14:51:02 GMT)
Acknowledgement sent to <carlo <at> kwpg.info>: New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 08 Aug 2020 14:51:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
It's 2020.
GREP really should support Unicode (UTF-16 and UTF-8, with and without a signature).
Format recognition wouldn't have to be automatic; command-line switches would be sufficient.
I am using the version that ships with Git for Windows v2.25.0.
Kind regards
Information forwarded to bug-grep <at> gnu.org: bug#42762; Package grep. (Sat, 08 Aug 2020 15:17:01 GMT)
Message #8 received at 42762 <at> debbugs.gnu.org (full text, mbox):
Hi Carlo!
On Sat, 8 Aug 2020 15:13:40 +0200
<carlo <at> kwpg.info> wrote:
> It's 2020.
>
> GREP really should support Unicode. (UTF-16, UTF-8, with and without
> signature) Format recognition wouldn't have to be automatic; command line
> switches would be sufficient.
>
From what I recall, GNU grep has supported UTF-8 and Unicode for years. E.g.:
«
shlomif[homepage]:$trunk$ rg -n 'שלום' src/humour/fortunes/shlomif.xml
8120:the Hebrew Wikipedia. סעי לשלום - המפתחות בפנים!
8690:<saying who="rindolf">archmint: שלום!</saying>
shlomif[homepage]:$trunk$ grep -n 'שלום' src/humour/fortunes/shlomif.xml
8120:the Hebrew Wikipedia. סעי לשלום - המפתחות בפנים!
8690:<saying who="rindolf">archmint: שלום!</saying>
shlomif[homepage]:$trunk$ grep --version
grep (GNU grep) 3.4
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Mike Haertel and others; see
<https://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
shlomif[homepage]:$trunk$ uname -a
Linux telaviv1.shlomifish.org 5.7.12-desktop-1.mga8 #1 SMP Sat Aug 1 21:39:47 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
shlomif[homepage]:$trunk$
»
Note that "שלום" is the Hebrew spelling of https://en.wikipedia.org/wiki/Shalom .
Regards,
Shlomi Fish
> I am using version Git for Windows v2.25.0
>
> Kind regards
--
Shlomi Fish https://www.shlomifish.org/
https://www.shlomifish.org/humour/bits/New-versions-of-the-GPL/
Wikipedia has a page about everything including the
https://en.wikipedia.org/wiki/Kitchen_sink .
— https://www.shlomifish.org/humour.html
Please reply to list if it's a mailing list post - https://shlom.in/reply .
Information forwarded to bug-grep <at> gnu.org: bug#42762; Package grep. (Sat, 08 Aug 2020 16:59:02 GMT)
Message #11 received at 42762 <at> debbugs.gnu.org (full text, mbox):
Hi Shlomi,
Mine does not. Maybe this version is too old; I got it with a Git installation.
So I will look for an updated one.
Kind regards
Information forwarded to bug-grep <at> gnu.org: bug#42762; Package grep. (Sat, 08 Aug 2020 18:55:02 GMT)
Message #14 received at 42762 <at> debbugs.gnu.org (full text, mbox):
Hi Carlo,
On Sat, 8 Aug 2020 18:17:46 +0200
<carlo <at> kwpg.info> wrote:
> Hi Shlomi,
>
> mine does not. Maybe this version is too old. I got it with a GIT
> installation.
>
> So I will look for an updated one.
>
I doubt it is that old, but you may have to rebuild it with extra build-time
options or tweak the environment variables (see
https://en.wikipedia.org/wiki/Environment_variable).
Also see https://beyondgrep.com/ .
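On the environment-variable suggestion: GNU grep takes its character encoding from the current locale, and on POSIX systems the locale for character handling comes from LC_ALL, then LC_CTYPE, then LANG, with an empty value treated as unset. The sketch below only illustrates that precedence; the helper names `effective_ctype` and `looks_utf8` are made up for this example, and whether the grep bundled with Git for Windows follows the same rules is not established in this thread.

```python
import os
from typing import Mapping, Optional


def effective_ctype(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the locale name that governs character handling, or None.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    An empty value counts as unset.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return None


def looks_utf8(locale_name: Optional[str]) -> bool:
    """Rough check: does a name such as 'en_US.UTF-8' declare a UTF-8 codeset?"""
    if not locale_name:
        return False
    # A locale name has the shape language_TERRITORY.codeset@modifier.
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    name = effective_ctype()
    print(name, looks_utf8(name))
```

If this reports a non-UTF-8 locale, setting LC_ALL to a UTF-8 locale for the grep invocation is the usual first thing to try.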
--
Shlomi Fish https://www.shlomifish.org/
Freecell Solver - https://fc-solve.shlomifish.org/
God signs people into the book of life using a pen that Chuck Norris gave him.
— https://www.shlomifish.org/humour/bits/facts/Chuck-Norris/
Please reply to list if it's a mailing list post - https://shlom.in/reply .
Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Mon, 21 Sep 2020 19:48:01 GMT)
Information forwarded to bug-grep <at> gnu.org: bug#42762; Package grep. (Sun, 31 Jan 2021 16:11:02 GMT)
Message #19 received at 42762 <at> debbugs.gnu.org (full text, mbox):
Supporting Unicode means it should also support UTF-16 and UTF-32 files,
which grep does not.
Look at the insane number of hoops this person had to jump through to work
around this major issue (and even then it doesn't print the matching lines; it
just says "Binary file ... matches"):
https://stackoverflow.com/a/3781221/2016290
Are there any plans for adding UTF-16 and UTF-32 file support to grep?
Thank you,
Paulie
P.S. The footer on https://debbugs.gnu.org/ pages has a date of 2003, which
of course is also very outdated. I hope I'm doing the right thing by
sending this email, since there's no way to comment directly in
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=42762
Information forwarded to bug-grep <at> gnu.org: bug#42762; Package grep. (Tue, 02 Feb 2021 00:14:01 GMT)
Message #22 received at 42762 <at> debbugs.gnu.org (full text, mbox):
On 1/31/21 8:10 AM, Paulie Pena IV wrote:
> Are there any plans for adding UTF-16 and UTF-32 file support to grep?
No.
I doubt whether it'd be worth the hassle, just as I doubt whether it'd
be worth adding support for lots of similar but more-useful features
(such as grepping through compressed files). However, I'm willing to be
proved wrong by someone adding support in such a way that doesn't hurt
performance for the more-common case.
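For readers who need to search UTF-16 or UTF-32 files in the meantime, decoding the file before matching sidesteps the problem. The following is a minimal sketch in Python, not a grep feature: the function name `grep_encoded` and its `encoding` parameter are hypothetical, standing in for the kind of command-line switch requested in the original report.

```python
import re
from typing import Iterator, Tuple


def grep_encoded(path: str, pattern: str,
                 encoding: str = "utf-16") -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line of `path` matching `pattern`.

    The caller names the encoding explicitly; nothing is auto-detected.
    """
    regex = re.compile(pattern)
    # Decoding happens on read, so the regex sees text rather than raw bytes.
    # Python's "utf-16" and "utf-32" codecs consume a leading byte order mark
    # when one is present.
    with open(path, encoding=encoding) as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                yield number, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Usage: python grep_encoded.py PATTERN FILE [ENCODING]
    chosen = sys.argv[3] if len(sys.argv) > 3 else "utf-16"
    for number, line in grep_encoded(sys.argv[2], sys.argv[1], chosen):
        print(f"{number}:{line}")
```

Because the whole file is transcoded as it is read, this is slower than matching bytes directly, which is the performance cost mentioned above.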
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.