GNU bug report logs - #18398
Probably found a bug in grep

Package: grep

Reported by: "Bergen, Andreas" <Andreas.Bergen <at> all-for-one.com>

Date: Wed, 3 Sep 2014 19:15:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18398 in the body.
You can then email your comments to 18398 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#18398; Package grep. (Wed, 03 Sep 2014 19:15:02 GMT)

Acknowledgement sent to "Bergen, Andreas" <Andreas.Bergen <at> all-for-one.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 03 Sep 2014 19:15:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Bergen, Andreas" <Andreas.Bergen <at> all-for-one.com>
To: "'bug-grep <at> gnu.org'" <bug-grep <at> gnu.org>
Subject: Probably found a bug in grep
Date: Wed, 3 Sep 2014 19:11:45 +0000
Hi all,

I think I've found a bug in "grep".
Here's how to reproduce it:

s53mgt:/test2 # cat testfile
A
Ä
s53mgt:/test2 # grep -F -eÄ -eA testfile
A
Ä
s53mgt:/test2 # grep -i -eÄ -eA testfile
A
Ä
s53mgt:/test2 # grep -iF -eÄ -eA testfile
A

As you can see, the last command prints only A instead of both A and Ä.

When I do the same with another test file that contains no "Ä" (A-umlaut), it works as expected:
s53mgt:/test2 # cat testfile2
A
B
s53mgt:/test2 # grep -F -eB -eA testfile2
A
B
s53mgt:/test2 # grep -i -eB -eA testfile2
A
B
s53mgt:/test2 # grep -iF -eB -eA testfile2
A
B

s53mgt:/test2 # file testfile testfile2
testfile:  UTF-8 Unicode text
testfile2: ASCII text

Here's some information on my version of "grep".

s53mgt:/test2 # rpm -qif /bin/grep
Name        : grep                         Relocations: (not relocatable)
Version     : 2.5.1a                            Vendor: SUSE LINUX Products GmbH, Nuernberg, Germany
Release     : 20.17                         Build Date: Tue Apr 22 03:47:13 2008
Install Date: Mon Jul  6 16:21:37 2009      Build Host: blacher.suse.de
Group       : Productivity/Text/Utilities   Source RPM: grep-2.5.1a-20.17.src.rpm
Size        : 461697                           License: GPL v2 or later
Signature   : DSA/SHA1, Tue Apr 22 03:49:23 2008, Key ID a84edae89c800aca
Packager    : http://bugs.opensuse.org
URL         : http://www.gnu.org/software/grep/
Summary     : Print lines matching a pattern
Description :
GNU grep, the "fastest grep in the west" (hopefully).

`grep' searches for lines matching a pattern.



Can you confirm this?

What can I do about it?

Regards
  Andreas

---
Andreas Bergen
Solution Architect

All for One Steeb AG
Gottlieb-Manz-Straße 1
70794 Filderstadt
T  +49 711 78807-689
F  +49 711 78807-92689
M +49 151 53824-689
Andreas.Bergen <at> all-for-one.com
www.all-for-one.com


________________________________

All for One Steeb AG, registered office: Filderstadt. District Court of Stuttgart: HRB 19 539,
Management Board: Lars Landwehrkamp (Spokesman), Stefan Land
Chairman of the Supervisory Board: Peter Brogle

This e-mail (including any attachments) may contain business or trade secrets or other confidential and / or legally protected information. If you have received this e-mail in error, you are hereby notified that any review, use, copying, or distribution of it is strictly prohibited. Please inform us immediately and destroy this e-mail. Thank you.

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Wed, 03 Sep 2014 20:16:02 GMT)

Notification sent to "Bergen, Andreas" <Andreas.Bergen <at> all-for-one.com>:
bug acknowledged by developer. (Wed, 03 Sep 2014 20:16:03 GMT)

Message #10 received at 18398-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: "Bergen, Andreas" <Andreas.Bergen <at> all-for-one.com>, 
 18398-done <at> debbugs.gnu.org
Subject: Re: bug#18398: Probably found a bug in grep
Date: Wed, 03 Sep 2014 13:15:19 -0700
Bergen, Andreas wrote:
> Version     : 2.5.1a

Thanks for the report.  As shown below, I can't reproduce the bug with 
grep 2.20 (the current version) in either the en_US.utf8 or the 
de_DE.utf8 locales.  grep 2.5.1a is pretty old (dated 2004) and several 
bugs have been fixed in this area in the last ten years, so I suggest 
upgrading and I'm taking the liberty of marking this as done.

$ cat testfile
A
Ä
$ grep -F -eÄ -eA testfile
A
Ä
$ grep -i -eÄ -eA testfile
A
Ä
$ grep -iF -eÄ -eA testfile
A
Ä
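
The check above can be repeated without typing the umlaut by hand. The following is only a minimal sketch, not part of the original report: the helper names `make_testfile` and `count_matches` are made up for illustration, the file is written with octal escapes for the UTF-8 bytes of "Ä" (C3 84), and `grep -c` is used so that the result is a line count rather than the matched text. It runs under whatever locale the calling shell has, so results on an old grep may differ between locales.

```shell
#!/bin/sh
# Hypothetical helper: write the two-line file from the report,
# "A" and "Ä" (UTF-8 bytes C3 84), to the path given as $1.
make_testfile() {
    printf 'A\n\303\204\n' > "$1"
}

# Hypothetical helper: count the lines of file $2 that match either
# pattern, using the grep option given as $1 (-F, -i or -iF).
count_matches() {
    umlaut=$(printf '\303\204')
    grep "$1" -c -e "$umlaut" -e A "$2"
}

f=$(mktemp)
make_testfile "$f"
for opt in -F -i -iF; do
    echo "grep $opt: $(count_matches "$opt" "$f") of 2 lines matched"
done
rm -f "$f"
```

A grep that shows the reported problem would be expected to count fewer lines for `-iF` than for the other two options.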




Information forwarded to bug-grep <at> gnu.org:
bug#18398; Package grep. (Thu, 04 Sep 2014 08:29:02 GMT)

Message #13 received at 18398-done <at> debbugs.gnu.org:

From: "Bergen, Andreas" <Andreas.Bergen <at> all-for-one.com>
To: "'Paul Eggert'" <eggert <at> cs.ucla.edu>, "18398-done <at> debbugs.gnu.org"
 <18398-done <at> debbugs.gnu.org>
Subject: AW: bug#18398: Probably found a bug in grep
Date: Thu, 4 Sep 2014 08:28:07 +0000
Thanks a lot.
I've tried with the newest version available for Suse Linux (grep 2.7-5.7.1 in SLES 11 SP3) and the bug seems to be fixed there as well.

Best regards
  Andreas

Information forwarded to bug-grep <at> gnu.org:
bug#18398; Package grep. (Thu, 04 Sep 2014 08:30:02 GMT)

Message #16 received at submit <at> debbugs.gnu.org:

From: Johannes Meixner <jsmeix <at> suse.de>
To: "Bergen, Andreas" <Andreas.Bergen <at> all-for-one.com>
Cc: bug-grep <at> gnu.org
Subject: Re: bug#18398: Probably found a bug in grep
Date: Thu, 4 Sep 2014 10:29:00 +0200 (CEST)
Hello,

On Sep 3 19:11 Bergen, Andreas wrote (excerpt):
> I've probably found a bug in "grep".
...
> testfile:  UTF-8 Unicode text
> testfile2: ASCII text
...
> Name        : grep
> Version     : 2.5.1a
> Vendor: SUSE LINUX Products GmbH, Nuernberg, Germany
> Build Date: Tue Apr 22 03:47:13 2008
> Install Date: Mon Jul  6 16:21:37 2009
> Source RPM: grep-2.5.1a-20.17.src.rpm

This grep version is very old.
I found grep version 2.5.1a only in SUSE Linux Enterprise Server 10.
openSUSE distributions with such an old grep are no longer available.

I do not know whether that old grep version was really meant to
support the UTF-8 character encoding (multibyte characters) well,
because a case-insensitive search for "UTF" finds almost nothing in
the grep-2.5.1a sources. There is some multibyte character support
in grep-2.5.1a but I wonder to what extent it actually works.

In contrast, the grep-2.7 sources that we have provided since
SUSE Linux Enterprise Server 11 Service Pack 2 (SLES11-SP2)
contain a lot more about "UTF" (again ignoring case). The RPM
changelog of our grep package for SLES11-SP2 notes in particular:
------------------------------------------------------------------
  Version upgrade to grep-2.7
  and reset to full compliance with upstream
...
  version upgrade to grep-2.6.3, which brings among various
  compile fixes vast improvements for UTF-8 / multibyte handling.
------------------------------------------------------------------

In general:

Locale-dependent issues with the various "traditional" Unix/Linux
tools are very often not real bugs.

For users it is crucial to understand that any kind of
behaviour can depend on the locale (from keyboard input
via program behaviour to what is shown on the screen).

For basic information see
http://en.opensuse.org/SDB:Plain_Text_versus_Locale

When programs process "plain text files", the user who runs
the program must set up the locale environment to match the
encoding of the "plain text file" before running the program.

If you want to process your "plain text files" the way you
always have with the various "traditional" Unix/Linux tools,
you must use the POSIX locale; otherwise you will get weird
results and unexpected side effects.
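
As an illustration of the advice above, the following is a minimal sketch of running a single command under the POSIX locale without changing the locale of the calling shell. The helper name `in_posix_locale` is hypothetical; it relies only on the standard `LC_ALL` environment variable, which overrides all other locale settings for the command it is set for.

```shell
#!/bin/sh
# Hypothetical helper: run one command under the POSIX ("C") locale.
# LC_ALL is set only for that command, not for the calling shell.
in_posix_locale() {
    LC_ALL=C "$@"
}

# Show which character encoding the current locale expects.
echo "current charmap: $(locale charmap)"

# Count the characters of "Ä" (UTF-8 bytes C3 84) as the POSIX
# locale sees them, for comparison with the current locale.
printf '\303\204' | in_posix_locale wc -m
printf '\303\204' | wc -m
```

The same helper can wrap grep, for example `in_posix_locale grep -F -e A testfile`, when byte-oriented matching is what is wanted.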

See also
http://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html


Kind Regards
Johannes Meixner
-- 
SUSE LINUX Products GmbH -- Maxfeldstrasse 5 -- 90409 Nuernberg -- Germany
HRB 16746 (AG Nuernberg) GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer




Information forwarded to bug-grep <at> gnu.org:
bug#18398; Package grep. (Thu, 04 Sep 2014 13:02:02 GMT)

Message #19 received at submit <at> debbugs.gnu.org:

From: Johannes Meixner <jsmeix <at> suse.de>
To: bug-grep <at> gnu.org
Subject: Re: bug#18398: AW: bug#18398: Probably found a bug in grep
Date: Thu, 4 Sep 2014 15:01:06 +0200 (CEST)
Hello,

On Sep 4 08:28 Bergen, Andreas wrote (excerpt):
> ... the newest version available
> for Suse Linux (grep 2.7-5.7.1 in SLES 11 SP3)

FWIW:
The newest available grep versions for openSUSE are
grep-2.14 for openSUSE:13.1 and
grep-2.20 for openSUSE:Factory


Kind Regards
Johannes Meixner




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 03 Oct 2014 11:24:05 GMT)

This bug report was last modified 10 years and 340 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.