GNU bug report logs - #42762
GREP does not support Unicode


Package: grep

Reported by: <carlo <at> kwpg.info>

Date: Sat, 8 Aug 2020 14:51:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 42762 AT debbugs.gnu.org.




Report forwarded to bug-grep <at> gnu.org:
bug#42762; Package grep. (Sat, 08 Aug 2020 14:51:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to <carlo <at> kwpg.info>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Sat, 08 Aug 2020 14:51:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: <carlo <at> kwpg.info>
To: <bug-grep <at> gnu.org>
Subject: GREP does not support Unicode
Date: Sat, 8 Aug 2020 15:13:40 +0200
It's 2020.

GREP really should support Unicode (UTF-16 and UTF-8, with and without a signature).
Format recognition wouldn't have to be automatic; command-line switches would be sufficient.

I am using the version that comes with Git for Windows v2.25.0.

Kind regards






Information forwarded to bug-grep <at> gnu.org:
bug#42762; Package grep. (Sat, 08 Aug 2020 15:17:01 GMT) Full text and rfc822 format available.

Message #8 received at 42762 <at> debbugs.gnu.org (full text, mbox):

From: Shlomi Fish <shlomif <at> shlomifish.org>
To: <carlo <at> kwpg.info>
Cc: 42762 <at> debbugs.gnu.org
Subject: Re: bug#42762: GREP does not support Unicode
Date: Sat, 8 Aug 2020 18:16:11 +0300
Hi Carlo!

On Sat, 8 Aug 2020 15:13:40 +0200
<carlo <at> kwpg.info> wrote:

> It's 2020.
> 
> GREP really should support Unicode. (UTF-16, UTF-8, with and without
> signature) Format recognition wouldn't have to be automatic; command line
> switches would be sufficient.
> 

From what I recall, GNU grep has supported UTF-8 and Unicode for years. E.g.:

«
shlomif[homepage]:$trunk$ rg -n 'שלום' src/humour/fortunes/shlomif.xml
8120:the Hebrew Wikipedia. סעי לשלום - המפתחות בפנים!
8690:<saying who="rindolf">archmint: שלום!</saying>
shlomif[homepage]:$trunk$ grep -n 'שלום' src/humour/fortunes/shlomif.xml
8120:the Hebrew Wikipedia. סעי לשלום - המפתחות בפנים!
8690:<saying who="rindolf">archmint: שלום!</saying>
shlomif[homepage]:$trunk$ grep --version
grep (GNU grep) 3.4
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see
<https://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
shlomif[homepage]:$trunk$ uname -a
Linux telaviv1.shlomifish.org 5.7.12-desktop-1.mga8 #1 SMP Sat Aug 1 21:39:47 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
shlomif[homepage]:$trunk$
»

Note that "שלום" is the Hebrew spelling of https://en.wikipedia.org/wiki/Shalom .

Regards,

	Shlomi Fish




-- 

Shlomi Fish       https://www.shlomifish.org/
https://www.shlomifish.org/humour/bits/New-versions-of-the-GPL/

Wikipedia has a page about everything including the
https://en.wikipedia.org/wiki/Kitchen_sink .

    — https://www.shlomifish.org/humour.html

Please reply to list if it's a mailing list post - https://shlom.in/reply .




Information forwarded to bug-grep <at> gnu.org:
bug#42762; Package grep. (Sat, 08 Aug 2020 16:59:02 GMT) Full text and rfc822 format available.

Message #11 received at 42762 <at> debbugs.gnu.org (full text, mbox):

From: <carlo <at> kwpg.info>
To: "'Shlomi Fish'" <shlomif <at> shlomifish.org>
Cc: 42762 <at> debbugs.gnu.org
Subject: RE: bug#42762: GREP does not support Unicode
Date: Sat, 8 Aug 2020 18:17:46 +0200
Hi Shlomi,

Mine does not. Maybe this version is too old; I got it with a Git installation.

So I will look for an updated one.

Kind regards






Information forwarded to bug-grep <at> gnu.org:
bug#42762; Package grep. (Sat, 08 Aug 2020 18:55:02 GMT) Full text and rfc822 format available.

Message #14 received at 42762 <at> debbugs.gnu.org (full text, mbox):

From: Shlomi Fish <shlomif <at> shlomifish.org>
To: <carlo <at> kwpg.info>
Cc: 42762 <at> debbugs.gnu.org
Subject: Re: bug#42762: GREP does not support Unicode
Date: Sat, 8 Aug 2020 21:53:56 +0300
Hi Carlo,

On Sat, 8 Aug 2020 18:17:46 +0200
<carlo <at> kwpg.info> wrote:

> Hi Shlomi,
> 
> mine does not. Maybe this version is too old. I got it with a GIT
> installation.
> 
> So I will look for an updated one.
>

I doubt it is that old, but you may have to rebuild it with extra build-time
options or tweak the environment variables
(https://en.wikipedia.org/wiki/Environment_variable).

Also see https://beyondgrep.com/ .
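
As an illustration of the environment-variable suggestion above (a sketch, not part of the original message): on POSIX-style systems the character encoding a tool assumes is normally taken from the locale, which is resolved from LC_ALL, then LC_CTYPE, then LANG. The helper names `effective_ctype` and `is_utf8_locale` are hypothetical, and the check is only a heuristic on the locale name; it does not claim to reproduce what the grep bundled with Git for Windows actually does.

```python
def effective_ctype(env):
    """Return the locale name that governs character handling, or "C" if unset.

    POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values are treated as unset.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


def is_utf8_locale(env):
    """Heuristic: does the effective locale name carry a UTF-8 codeset?"""
    locale_name = effective_ctype(env)
    # Locale names look like "en_US.UTF-8" or "de_DE.utf8@euro":
    # take the part after "." and before any "@modifier".
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"
```

Calling `is_utf8_locale(os.environ)` (after `import os`) applies the same check to the current process environment. If it returns False, setting for example LC_ALL to a UTF-8 locale before running grep may be worth trying.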
 



-- 

Shlomi Fish       https://www.shlomifish.org/
Freecell Solver - https://fc-solve.shlomifish.org/

God signs people into the book of life using a pen that Chuck Norris gave him.
    — https://www.shlomifish.org/humour/bits/facts/Chuck-Norris/

Please reply to list if it's a mailing list post - https://shlom.in/reply .




Severity set to 'wishlist' from 'normal'. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Mon, 21 Sep 2020 19:48:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-grep <at> gnu.org:
bug#42762; Package grep. (Sun, 31 Jan 2021 16:11:02 GMT) Full text and rfc822 format available.

Message #19 received at 42762 <at> debbugs.gnu.org (full text, mbox):

From: Paulie Pena IV <paulie4 <at> gmail.com>
To: 42762 <at> debbugs.gnu.org
Subject: Re: bug#42762: GREP does not support Unicode
Date: Sun, 31 Jan 2021 11:10:10 -0500
Supporting Unicode means it should also support UTF-16 and UTF-32 files,
which grep does not.

Look at the insane number of hoops this person had to jump through to work
around this major issue (and even then it doesn't print the matching lines; it
just says "Binary file ... matches"):
https://stackoverflow.com/a/3781221/2016290

Are there any plans for adding UTF-16 and UTF-32 file support to grep?

Thank you,
Paulie

P.S. The footer on https://debbugs.gnu.org/ pages has a date of 2003, which
of course is also very outdated. I hope I'm doing the right thing by
sending this email, since there's no way to comment directly in
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=42762

Information forwarded to bug-grep <at> gnu.org:
bug#42762; Package grep. (Tue, 02 Feb 2021 00:14:01 GMT) Full text and rfc822 format available.

Message #22 received at 42762 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Paulie Pena IV <paulie4 <at> gmail.com>
Cc: 42762 <at> debbugs.gnu.org
Subject: Re: bug#42762: GREP does not support Unicode
Date: Mon, 1 Feb 2021 16:13:04 -0800
On 1/31/21 8:10 AM, Paulie Pena IV wrote:
> Are there any plans for adding UTF-16 and UTF-32 file support to grep?

No.

I doubt whether it'd be worth the hassle, just as I doubt whether it'd 
be worth adding support for lots of similar but more-useful features 
(such as grepping through compressed files). However, I'm willing to be 
proved wrong by someone adding support in such a way that doesn't hurt 
performance for the more-common case.
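
One way to picture the workaround discussed in this thread is to decode UTF-16 or UTF-32 input before matching, rather than teaching the matcher itself about those encodings. The following is a minimal Python sketch under stated assumptions, not grep's code and not a proposed patch: `sniff_encoding` and `grep_bytes` are hypothetical names, detection relies only on a byte order mark (the "signature" from the original report), and input without one must be given an explicit encoding, much like the command-line switch the reporter asked for.

```python
import codecs
import re

# Byte order marks, UTF-32 first: the UTF-32 LE mark begins with the same
# two bytes as the UTF-16 LE mark, so checking order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data, default="utf-8"):
    """Guess a codec name from a leading byte order mark, else return default."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return default


def grep_bytes(pattern, data, encoding=None):
    """Return (line_number, line) pairs for lines of data matching pattern.

    data is the raw file content; encoding overrides BOM detection
    (needed for BOM-less UTF-16/UTF-32 input).
    """
    text = data.decode(encoding or sniff_encoding(data))
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]
```

For a file on disk, `grep_bytes("שלום", open(path, "rb").read())` would apply the same search to its contents. The sketch reads the whole input into memory and decodes all of it up front, which is exactly the kind of cost for the common UTF-8 case that the reply above is concerned about.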




This bug report was last modified 4 years and 134 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.