GNU bug report logs - #18266
grep -P and invalid exits with error

Package: grep

Reported by: Santiago <santiago <at> debian.org>

Date: Thu, 14 Aug 2014 15:43:02 UTC

Severity: wishlist

Merged with 18455

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 18266 <at> debbugs.gnu.org, 758105 <at> bugs.debian.org
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Fri, 12 Sep 2014 17:57:39 -0700

Vincent Lefevre wrote:
> I wonder whether anyone is interested in matching individual bytes
> in a file regarded as UTF-8 encoded. This seems weird.

It's not weird at all.  For example, suppose we invent the notation 
[[:error:]] to match encoding errors.  Then the pattern '[[:error:]]' 
would match all encoding errors in a file, which could well be a useful 
thing.

Currently, for example, the tz package <http://www.iana.org/time-zones> 
has a Make rule 'check_character_set' that verifies that the source 
files are all properly encoded.  It executes this shell command:

! grep -nv '^.*$' file names

This relies on GNU grep's behavior that "." does not match an encoding 
error.  But it's a command that is not obvious.  It'd be simpler and 
clearer to write this:

! grep -n '[[:error:]]' file names

if such a feature were available.
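
As a rough illustration of the existing '^.*$' idiom (not of the hypothetical [[:error:]] notation, which does not exist), the sketch below wraps the command in a shell function and tries it on two throwaway files. The function names, the temporary files, the locale name C.UTF-8, and the probe for a working UTF-8 locale are all assumptions of this sketch, and it presumes GNU grep.

```shell
#!/bin/sh
# Sketch only: assumes GNU grep and that a locale named C.UTF-8 exists.
LC_ALL=C.UTF-8
export LC_ALL

# Hypothetical helper: succeeds only if no line in the named files
# contains an encoding error.  "." never matches an encoding error, so
# '^.*$' fails on such lines and -v selects them; "!" inverts the status.
check_character_set() {
    ! grep -nv '^.*$' "$@"
}

# Probe (an assumption of this sketch): "." matches the two-byte
# character U+0100 only when a UTF-8 locale is really in effect.
utf8_locale_works() {
    printf 'A\304\200B\n' | grep -q '^A.B$'
}

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'plain ASCII\nna\303\257ve\n' > "$tmpdir/good.txt"  # valid UTF-8
printf 'plain ASCII\nbad \377 byte\n' > "$tmpdir/bad.txt"  # 0xFF is never valid UTF-8

if utf8_locale_works; then
    check_character_set "$tmpdir/good.txt" && echo "good.txt: clean"
    check_character_set "$tmpdir/bad.txt" || echo "bad.txt: encoding error"
else
    echo "no working UTF-8 locale; check skipped"
fi
```

Depending on the grep version, the offending file may be reported as a binary-file match rather than line by line; the exit status is what the check relies on. Note also that "!" turns a grep failure (such as a missing file) into success, just as in the original command.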




This bug report was last modified 10 years and 249 days ago.
