GNU bug report logs - #15192
UTF-16 surrogate pair handling in grep -i option


Package: grep

Reported by: Corinna Vinschen <vinschen <at> redhat.com>

Date: Mon, 26 Aug 2013 08:56:02 UTC

Severity: normal

Tags: moreinfo

Merged with 15199

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 15192 in the body.
You can then email your comments to 15192 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-grep <at> gnu.org:
bug#15192; Package grep. (Mon, 26 Aug 2013 08:56:02 GMT)

Acknowledgement sent to Corinna Vinschen <vinschen <at> redhat.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 26 Aug 2013 08:56:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Corinna Vinschen <vinschen <at> redhat.com>
To: bug-grep <at> gnu.org
Subject: Re: UTF-16 surrogate pair handling in grep -i option
Date: Mon, 26 Aug 2013 10:54:40 +0200
On Aug 25 12:49, Jim Meyering wrote:
> On Mon, Aug 19, 2013 at 5:43 AM, Corinna Vinschen <...> wrote:
> > But, here's a question:  If the surrogate-pair test fails without the
> > patch due to the SEGV, and it also fails with the patch, just in a
> > different way, what's the idea of the testcase?  In theory, shouldn't
> > there be two tests, one of them testing only for this very SEGV, and
> > another test testing how grep handles 4 byte UTF-8 values, since that's
> > another problem entirely?
> 
> It's a trade-off.  Split surrogate-pair testing into two very similar
> test scripts?
> Factor the similar parts into cfg.sh and use them from two test scripts?
> It didn't feel like it was justified in this case, since it's a
> cygwin-specific bug.
> 
> If there's a short/reliable shell-level test for "is-cygwin", I suppose we

  case $(uname -s) in
  CYGWIN*)
    ...;;
  *)
    ...;;
  esac
  
> could make the loop that iterates over grep options skip the currently-
> known-to-fail cases on Cygwin systems.
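
A minimal sketch of that skip-on-Cygwin idea, for illustration only. The names `is_cygwin`, `should_skip`, and `known_failing`, and the option list in the loop, are hypothetical placeholders and are not taken from the grep test suite. The optional system-name argument exists only so the check can be exercised on a non-Cygwin host.

```shell
# Hypothetical helper: succeed when the system name looks like Cygwin.
# With no argument it asks uname; an argument overrides that for testing.
is_cygwin() {
  case ${1:-$(uname -s)} in
    CYGWIN*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Placeholder list of grep options assumed to fail on Cygwin.
known_failing='-i'

# Succeed when option $1 should be skipped on the system named by $2
# (or on the current system when $2 is omitted).
should_skip() {
  is_cygwin "$2" || return 1
  case " $known_failing " in
    *" $1 "*) return 0 ;;
  esac
  return 1
}

# Placeholder option list; a real test would run grep where "run" is echoed.
for opt in -E -F -i; do
  if should_skip "$opt"; then
    echo "skip $opt"
  else
    echo "run $opt"
  fi
done
```

On a non-Cygwin system the loop reports every option as run; on Cygwin it would skip only the options listed in `known_failing`.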

No, that's not right, IMHO.  It's a matter of how you define the test.

Only one part of the test is actually testing for the SEGV bug, is all
I'm saying.  If you want to have a PASS in the testsuite if this works,
it should be a standalone test.

The second part of the test tests if grep handles 4 byte UTF-8 sequences
in regexes correctly.  It's a different test.  If you define this one
as a target-agnostic test, it requires another test script.

If you define the whole script as *the* test for UTF-16 surrogates,
I suppose it should stay as is and the testcase should FAIL on Cygwin
as long as not all parts of grep grok UTF-16 surrogates.
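
To make the relationship between the two encodings concrete, here is a small sketch. The code point U+10405 is an arbitrary example above U+FFFF chosen for this illustration, and the helper names `surrogate_pair` and `byte_count` are hypothetical. It shows that one such character is a single 4-byte sequence in UTF-8 but two 16-bit units (a surrogate pair) in UTF-16, which is what a platform with a 16-bit wchar_t has to cope with.

```shell
# Print the UTF-16 surrogate pair (high, low) for a code point >= 0x10000.
# Standard UTF-16 arithmetic: subtract 0x10000, then split into 10+10 bits.
surrogate_pair() {
  v=$(( $1 - 0x10000 ))
  printf '%04X %04X\n' $(( 0xD800 + (v >> 10) )) $(( 0xDC00 + (v & 0x3FF) ))
}

# Count the bytes produced by a string of printf octal escapes.
byte_count() {
  printf "$1" | wc -c | tr -d ' '
}

surrogate_pair 0x10405            # prints: D801 DC05
byte_count '\360\220\220\205'     # UTF-8 bytes of U+10405; prints: 4
```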

It's probably just a different point of view, so, never mind.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

Information forwarded to bug-grep <at> gnu.org:
bug#15192; Package grep. (Tue, 27 Aug 2013 05:00:03 GMT)

Message #8 received at 15192 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Corinna Vinschen <vinschen <at> redhat.com>
Cc: 15192 <at> debbugs.gnu.org
Subject: Re: bug#15192: UTF-16 surrogate pair handling in grep -i option
Date: Mon, 26 Aug 2013 21:58:53 -0700
I guess it is a different point of view.  Maybe I'm just too
forward-thinking? :-)
I.e., if the remaining cygwin-specific bug is fixed soon, there will
be little reason for separate tests.
Are you planning to work on the cygwin/regexp bug?

Information forwarded to bug-grep <at> gnu.org:
bug#15192; Package grep. (Tue, 27 Aug 2013 09:36:02 GMT)

Message #11 received at 15192 <at> debbugs.gnu.org (full text, mbox):

From: Corinna Vinschen <vinschen <at> redhat.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 15192 <at> debbugs.gnu.org
Subject: Re: bug#15192: UTF-16 surrogate pair handling in grep -i option
Date: Tue, 27 Aug 2013 11:35:37 +0200
On Aug 26 21:58, Jim Meyering wrote:
> I guess it is a different point of view.  Maybe I'm just too
> forward-thinking? :-)
> I.e., if the remaining cygwin-specific bug is fixed soon, there will
> be little reason for separate tests.
> Are you planning to work on the cygwin/regexp bug?

Not in the next couple of weeks.  I'll be abroad for a while.  Maybe in
October or November.  I put it on my TODO list.  The biggest problem is
that I know the BSD regex code pretty well, but the gnulib regex code is
very different.  I already had a look two weeks ago, but I have not
found the right place to mount surrogate pairs into it :(


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

Information forwarded to bug-grep <at> gnu.org:
bug#15192; Package grep. (Sat, 31 Aug 2013 20:26:02 GMT)

Message #14 received at 15192 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Corinna Vinschen <vinschen <at> redhat.com>
Cc: 15192 <at> debbugs.gnu.org
Subject: Re: bug#15192: UTF-16 surrogate pair handling in grep -i option
Date: Sat, 31 Aug 2013 13:24:41 -0700
Hi Corinna,
Sorry about the delay.
I'm prepared to push the following.  Are you ok with that?

[k.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#15192; Package grep. (Sun, 01 Sep 2013 08:58:02 GMT)

Message #17 received at 15192 <at> debbugs.gnu.org (full text, mbox):

From: Corinna Vinschen <vinschen <at> redhat.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 15192 <at> debbugs.gnu.org
Subject: Re: bug#15192: UTF-16 surrogate pair handling in grep -i option
Date: Sun, 1 Sep 2013 10:57:40 +0200
Hi Jim,

On Aug 31 13:24, Jim Meyering wrote:
> Hi Corinna,
> Sorry about the delay.
> I'm prepared to push the following.  Are you ok with that?

Looks good, thank you.


Corinna

Information forwarded to bug-grep <at> gnu.org:
bug#15192; Package grep. (Sun, 01 Sep 2013 15:27:02 GMT)

Message #20 received at 15192 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Corinna Vinschen <vinschen <at> redhat.com>
Cc: 15192 <at> debbugs.gnu.org
Subject: Re: bug#15192: UTF-16 surrogate pair handling in grep -i option
Date: Sun, 1 Sep 2013 08:25:43 -0700
pushed

bug closed, send any further explanations to 15192 <at> debbugs.gnu.org and Corinna Vinschen <vinschen <at> redhat.com> Request was from Jim Meyering <jim <at> meyering.net> to control <at> debbugs.gnu.org. (Sun, 01 Sep 2013 15:29:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 30 Sep 2013 11:24:04 GMT)

bug unarchived. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 28 Apr 2014 14:06:01 GMT)

Forcibly Merged 15192 15199. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Mon, 28 Apr 2014 14:06:01 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 27 May 2014 11:24:04 GMT)

This bug report was last modified 11 years and 30 days ago.
