GNU bug report logs - #18266
grep -P and invalid exits with error


Package: grep;

Reported by: Santiago <santiago <at> debian.org>

Date: Thu, 14 Aug 2014 15:43:02 UTC

Severity: wishlist

Merged with 18455

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.




From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 18266 <at> debbugs.gnu.org, Santiago <santiago <at> debian.org>, 758105 <at> bugs.debian.org
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Thu, 11 Sep 2014 09:22:49 -0700
Vincent Lefevre wrote:

> There's no reason that '.' matches something that doesn't belong to
> the charset in C locale, but doesn't match in a UTF-8 locale.

In the C locale on GNU/Linux, all byte values are members of the 
charset.  That is why it's OK for '.' to accept that byte in the C 
locale but reject it in a UTF-8 locale.
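The distinction can be modeled with a minimal Python sketch. This is not grep's implementation, only an illustration under stated assumptions. A bytes regex stands in for the C locale, where every byte value is a character. For the UTF-8 case, the line is decoded with Python's `surrogateescape` error handler, which keeps each invalid byte as a lone surrogate in U+DC80..U+DCFF. The hypothetical class `UTF8_DOT` then refuses to match those surrogates, the way '.' rejects an invalid byte in a UTF-8 locale.

```python
import re

# Under "surrogateescape", each undecodable byte becomes a lone surrogate U+DC80..U+DCFF.
INVALID_LO, INVALID_HI = chr(0xDC80), chr(0xDCFF)

# Hypothetical model of '.' in a UTF-8 locale: any character except newline,
# but never a byte that is not part of a valid UTF-8 sequence.
UTF8_DOT = f"[^\n{INVALID_LO}-{INVALID_HI}]"


def c_locale_search(pattern: bytes, line: bytes) -> bool:
    # Bytes regex: every byte value counts as one character, as in the C locale.
    return re.search(pattern, line) is not None


def utf8_locale_search(pattern: str, line: bytes) -> bool:
    # Decode without raising; invalid bytes survive as surrogate escapes
    # that UTF8_DOT refuses to match.
    text = line.decode("utf-8", errors="surrogateescape")
    return re.search(pattern, text) is not None


latin1_line = "café".encode("iso-8859-1")  # b"caf\xe9": 0xE9 alone is invalid UTF-8
utf8_line = "café".encode("utf-8")         # b"caf\xc3\xa9": valid UTF-8

print(c_locale_search(rb"caf.", latin1_line))             # True
print(utf8_locale_search("caf" + UTF8_DOT, latin1_line))  # False
print(utf8_locale_search("caf" + UTF8_DOT, utf8_line))    # True
```

The same ISO-8859-1 byte that '.' accepts in the C-locale model is rejected in the UTF-8 model. Properly encoded text still matches.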

> It's annoying that now in UTF-8, one can no longer match ISO-8859-1 text

This has been true for quite some time in 'grep', at least with the 
standard matchers.  It may not have been true for -P, but that relied on 
undefined behavior that could crash grep, and we can't have that.

It would make sense to add a notation to mean "match any character or 
invalid byte", as an extension.  That'd take some work, though.
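The suggested extension can also be illustrated with a short Python sketch. It is purely hypothetical: the message proposes no syntax, and the names `UTF8_DOT` and `ANY_OR_INVALID` are invented here. Invalid bytes are again preserved through Python's `surrogateescape` error handler. Ordinary '.' excludes them, while the extension class accepts any character or invalid byte, except newline.

```python
import re

# Under "surrogateescape", each undecodable byte becomes a lone surrogate U+DC80..U+DCFF.
INVALID_LO, INVALID_HI = chr(0xDC80), chr(0xDCFF)

# Hypothetical '.' in a UTF-8 locale: valid characters only.
UTF8_DOT = f"[^\n{INVALID_LO}-{INVALID_HI}]"
# Hypothetical extension: any character or invalid byte (still not newline).
ANY_OR_INVALID = "[^\n]"


def search_utf8(pattern: str, line: bytes) -> bool:
    # Decode without raising so invalid bytes remain visible to the pattern.
    text = line.decode("utf-8", errors="surrogateescape")
    return re.search(pattern, text) is not None


latin1_line = "café".encode("iso-8859-1")  # b"caf\xe9"
utf8_line = "café".encode("utf-8")         # b"caf\xc3\xa9"

print(search_utf8("caf" + UTF8_DOT, latin1_line))        # False
print(search_utf8("caf" + ANY_OR_INVALID, latin1_line))  # True
print(search_utf8("caf" + ANY_OR_INVALID, utf8_line))    # True
```

With such a notation, a user in a UTF-8 locale could still match ISO-8859-1 text without '.' itself having to accept invalid bytes.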







GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.