GNU bug report logs - #18266
grep -P and invalid exits with error

Package: grep;

Reported by: Santiago <santiago <at> debian.org>

Date: Thu, 14 Aug 2014 15:43:02 UTC

Severity: wishlist

Merged with 18455

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.


From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 18266 <at> debbugs.gnu.org, 758105 <at> bugs.debian.org
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Fri, 12 Sep 2014 19:08:38 -0700
Vincent Lefevre wrote:

> But both of these solutions have the drawback of working only in
> UTF-8 locales.

Not at all; '[[:error:]]' would match a single-byte encoding error in 
the current locale.  The tz database is interested in UTF-8, so it sets 
the LC_ALL environment variable to a UTF-8 locale, but that setting 
shouldn't be required in general.
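The locale-independence point can be illustrated with a minimal Python sketch of what counts as an encoding error: a byte that the current charset cannot decode. This is not grep's implementation. It assumes that Python codec names such as "utf-8", "ascii", "cp1252" and "latin-1" stand in for locale charsets. It also assumes that scanning resumes one byte after each bad byte, which approximates treating every offending byte as its own single-byte error. '[[:error:]]' is treated here as the class under discussion, not as existing grep syntax. The helper name encoding_error_offsets is hypothetical. Because the helper restarts decoding mid-stream, it only makes sense for stateless encodings (UTF-8 and single-byte charsets).

```python
def encoding_error_offsets(data: bytes, encoding: str) -> list[int]:
    """Return the byte offsets that `encoding` cannot decode.

    Hypothetical helper: each offending byte is reported on its own,
    roughly what a single-byte class like '[[:error:]]' would match.
    Assumes a stateless encoding (UTF-8 or a single-byte charset).
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)  # the rest of the input decodes cleanly
            break
        except UnicodeDecodeError as exc:
            bad = pos + exc.start  # exc.start is relative to data[pos:]
            offsets.append(bad)
            pos = bad + 1  # step past just one byte and rescan
    return offsets


# The same bytes can be valid in one charset and erroneous in another.
print(encoding_error_offsets(b"caf\xc3\xa9", "utf-8"))  # []
print(encoding_error_offsets(b"caf\xc3\xa9", "ascii"))  # [3, 4]
print(encoding_error_offsets(b"a\x81b", "cp1252"))      # [1]
print(encoding_error_offsets(b"a\x81b", "latin-1"))     # []
```

Latin-1 reports nothing because every byte value is a character in that charset. ASCII rejects both bytes of the UTF-8 'é', and cp1252 rejects 0x81 because that byte is undefined there.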

Also, the tz database needs grep patterns that iconv doesn't support. 
For example, one rule is that commentary (which starts with #) can 
contain UTF-8 characters, but the ordinary data (before the #) is 
limited to a smaller set.  This is captured by the command:

grep -Env '^[ordinarycharset]*(#.*)?$'

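where 'ordinarycharset' stands for the set of ASCII characters allowed in 
ordinary tz data; with -v and -n, grep prints each offending line along 
with its line number.  Here it's useful that '.' does not match encoding 
errors on GNU/Linux: a comment containing an invalid byte fails to 
match, so that line is reported too.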
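The rule enforced by the grep -Env command can be emulated in Python to show why the behavior of '.' matters. This is a minimal sketch under stated assumptions. The actual 'ordinarycharset' is not spelled out, so the hypothetical pattern TZ_LINE substitutes tab plus printable ASCII other than '#'. Undecodable bytes are carried through with Python's surrogateescape error handler, which maps each bad byte to a code point in U+DC80..U+DCFF. Python's own '.' would match those code points, so the comment part excludes that range instead, mimicking a '.' that skips encoding errors. The function name tz_violations is hypothetical.

```python
import re

# Hypothetical stand-in for 'ordinarycharset': tab plus printable ASCII
# except '#'.  The comment part excludes U+DC80..U+DCFF, the code points
# that surrogateescape uses for undecodable bytes, so it behaves like a
# '.' that does not match encoding errors.
TZ_LINE = re.compile(r'[\t -"$-~]*(#[^\udc80-\udcff]*)?')


def tz_violations(data: bytes) -> list[int]:
    """Return 1-based numbers of lines that fail the rule (like grep -nv)."""
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a trailing newline does not start another line
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.decode("utf-8", errors="surrogateescape")
        if TZ_LINE.fullmatch(text) is None:
            bad.append(number)
    return bad


sample = (
    b"Link\tA\tB\n"                          # ordinary data only
    b"# caf\xc3\xa9: UTF-8 in a comment\n"   # allowed
    b"Rule caf\xc3\xa9\n"                    # non-ASCII before '#'
    b"Link\tA\tB # bad byte \xff\n"          # encoding error in a comment
)
print(tz_violations(sample))  # [3, 4]
```

Decoding with surrogateescape gives each bad byte its own code point, so the pattern can single out encoding errors without discarding the rest of the line.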




This bug report was last modified 10 years and 301 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.