GNU bug report logs - #18266
grep -P and invalid exits with error

Package: grep

Reported by: Santiago <santiago <at> debian.org>

Date: Thu, 14 Aug 2014 15:43:02 UTC

Severity: wishlist

Merged with 18455

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

Message #111 received at 18266 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 18266 <at> debbugs.gnu.org, 758105 <at> bugs.debian.org
Subject: Re: bug#18266: handling bytes not part of the charset, and other garbage
Date: Thu, 11 Sep 2014 18:16:29 -0700

Vincent Lefevre wrote:
> the C locale corresponds to ANSI_X3.4-1968,

No, it doesn't, at least not on any current platform I'm aware of.  And
POSIX does not require that.  POSIX even allows the C locale to be
multibyte, e.g., UTF-8.

> I would say that this should be the same for invalid
> byte sequences in a UTF-8 locale.

One *could* design an encoding with that property, but it wouldn't be 
UTF-8; it would be something else.  I don't know of any C library that 
does that to UTF-8.  There are good arguments against doing it, e.g., 
one loses the property that one can concatenate character strings by 
concatenating their byte representations.

Anyway, I'm afraid we may be going off the deep end here.  After all,
grep can't impose its coding system design onto the operating system; 
it's more the other way around.
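
The concatenation argument above can be illustrated with a minimal sketch. It assumes Python's `surrogateescape` error handler as a stand-in for a hypothetical encoding that gives every invalid byte its own character; nothing in the thread says grep or any C library behaves this way, and `decode_lenient` is a name invented for the example.

```python
def decode_lenient(data: bytes) -> str:
    # Hypothetical helper: decode UTF-8, mapping each undecodable byte
    # to its own placeholder character (a lone surrogate).
    return data.decode("utf-8", errors="surrogateescape")

first = b"\xc3"   # lead byte of a two-byte sequence; invalid on its own
second = b"\xa9"  # continuation byte; invalid on its own

# Decode each piece, then concatenate the character strings.
chars_then_join = decode_lenient(first) + decode_lenient(second)

# Concatenate the byte representations, then decode.
join_then_chars = decode_lenient(first + second)

print(len(chars_then_join))                # 2 placeholder characters
print(len(join_then_chars))                # 1 character, U+00E9
print(chars_then_join == join_then_chars)  # False
```

Each byte alone is an invalid sequence, but together they form the valid encoding of U+00E9, so decoding piecewise and decoding the concatenated bytes give different character strings.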

This bug report was last modified 10 years and 249 days ago.
