GNU bug report logs - #15199
UTF-16 surrogate pair handling in grep -i option

Package: grep

Reported by: Paolo Bonzini <bonzini <at> gnu.org>

Date: Tue, 27 Aug 2013 15:54:01 UTC

Severity: normal

Tags: moreinfo

Merged with 15192

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Corinna Vinschen <vinschen <at> redhat.com>
To: Paolo Bonzini <bonzini <at> gnu.org>
Cc: bug-grep <at> gnu.org
Subject: Re: UTF-16 surrogate pair handling in grep -i option
Date: Tue, 27 Aug 2013 18:14:40 +0200
On Aug 27 17:53, Paolo Bonzini wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 20/08/2013 17:11, Corinna Vinschen wrote:
> > That's what I did when I started to write this patch, but then I 
> > decided against it for the following reason:
> > 
> > The implementation of mbrtowc, wcrtomb and towlower using UTF-16 
> > wchar_t works *only* in the Cygwin/Newlib-provided functions in 
> > exactly the way used in this patch.  I'm not aware that any other 
> > platform provides an equivalent implementation, even if wchar_t is 
> > 2 bytes.  Thus, the assumption that the code works in all cases in 
> > which sizeof (wchar_t) == 2 is wrong.  It would not, for instance,
> > work with the Windows implementation of wcrtomb, AFAIK.
> 
> Right, MSVCRT is exactly what I was thinking about.
> 
> > I'm not strongly opposed to changing this, but IMHO, to be on the 
> > safe side, this code should only be activated on a case by case 
> > basis, so only for Cygwin for now.  Same with a potential fix to 
> > the regex compiler, for which I have no idea how to do it, yet :(
> 
> Feel free to bug me on IRC if I can be of any help.

Thanks for the offer!  I'll probably get back to it in November, and
I would be glad if you could help me through the gnulib regex code
then.
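
For readers unfamiliar with the issue discussed in the quoted text: when
wchar_t is 16 bits wide and holds UTF-16, a character outside the Basic
Multilingual Plane occupies two wchar_t units (a surrogate pair), so case
conversion has to reassemble the code point first.  The following is only a
minimal sketch of that arithmetic, not the code from the patch.  The helper
names (utf16_pairs_enabled, combine_surrogates, split_code_point) are
hypothetical, and the __CYGWIN__ guard merely mirrors the suggestion above to
enable such handling on a case-by-case basis rather than whenever
sizeof (wchar_t) == 2.

```c
#include <stdint.h>

/* Hypothetical switch: enable surrogate handling only on platforms known
   to pass UTF-16 pairs through mbrtowc/wcrtomb/towlower.  Following the
   discussion, that is assumed to be Cygwin only. */
static int utf16_pairs_enabled(void)
{
#if defined __CYGWIN__
    return 1;
#else
    return 0;
#endif
}

/* High (leading) surrogates are U+D800..U+DBFF. */
static int is_high_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

/* Low (trailing) surrogates are U+DC00..U+DFFF. */
static int is_low_surrogate(uint16_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Reassemble a code point in U+10000..U+10FFFF from a valid pair. */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u
           + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

/* Split a code point in U+10000..U+10FFFF back into a surrogate pair. */
static void split_code_point(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    cp -= 0x10000u;
    *hi = (uint16_t)(0xD800 + (cp >> 10));
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));
}
```

As an illustration, U+10400 (DESERET CAPITAL LETTER LONG I) is the pair
0xD801 0xDC00 and its lowercase counterpart U+10428 is 0xD801 0xDC28, so a
case-insensitive comparison has to look at both units together.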


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

This bug report was last modified 11 years and 77 days ago.
