GNU bug report logs - #15192
UTF-16 surrogate pair handling in grep -i option


Package: grep;

Reported by: Corinna Vinschen <vinschen <at> redhat.com>

Date: Mon, 26 Aug 2013 08:56:02 UTC

Severity: normal

Tags: moreinfo

Merged with 15199

Done: Jim Meyering <jim <at> meyering.net>

Bug is archived. No further changes may be made.


From: Corinna Vinschen <vinschen <at> redhat.com>
To: 15192 <at> debbugs.gnu.org
Subject: bug#15192: UTF-16 surrogate pair handling in grep -i option
Date: Mon, 26 Aug 2013 10:54:40 +0200
[Message part 1 (text/plain, inline)]
On Aug 25 12:49, Jim Meyering wrote:
> On Mon, Aug 19, 2013 at 5:43 AM, Corinna Vinschen <...> wrote:
> > But, here's a question:  If the surrogate-pair test fails without the
> > patch due to the SEGV, and it also fails with the patch, just in a
> > different way, what's the idea of the testcase?  In theory, shouldn't
> > there be two tests, one of them testing only for this very SEGV, and
> > another test testing how grep handles 4-byte UTF-8 values, since that's
> > another problem entirely?
> 
> It's a trade-off.  Split surrogate-pair testing into two very similar
> test scripts?
> Factor the similar parts into cfg.sh and use them from two test scripts?
> It didn't feel like it was justified in this case, since it's a
> cygwin-specific bug.
> 
> If there's a short/reliable shell-level test for "is-cygwin", I suppose we

  case $(uname -s) in
  CYGWIN*)
    : ;;  # Cygwin: skip the known-to-fail cases here
  *)
    : ;;  # any other system: run every case
  esac
  
> could make the loop that iterates over grep options skip the currently-
> known-to-fail cases on Cygwin systems.

No, that's not right, IMHO.  It's a matter of how you define the test.

Only one part of the test is actually testing for the SEGV bug, is all
I'm saying.  If you want to have a PASS in the testsuite if this works,
it should be a standalone test.
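To illustrate the split proposed here, a standalone crash-only test might look like the sketch below. It is not the test from the grep tree. It assumes U+10405 as the test character and an en_US.UTF-8 locale; if that locale is missing, grep falls back to the C locale. The helper name grep_survives is hypothetical. The sketch looks only at grep's exit status, so it separates "grep was killed by a signal" from "grep gave a wrong answer".

```shell
#!/bin/sh
# Crash-only check (a sketch, not grep's own test): does grep survive a
# line holding one non-BMP character?  Assumption: U+10405, which is
# F0 90 90 85 in UTF-8 and a UTF-16 surrogate pair (D801 DC05) in
# Cygwin's 16-bit wchar_t.  grep_survives is a hypothetical name.
grep_survives() {
  opt=$1                               # "" or "-i"
  char=$(printf '\360\220\220\205')    # U+10405 as UTF-8 bytes
  tmp=$(mktemp) || return 1            # setup failure counts as failure
  printf '%s\n' "$char" > "$tmp"
  # $opt is deliberately unquoted so that "" expands to no argument.
  LC_ALL=en_US.UTF-8 grep $opt -e "$char" "$tmp" > /dev/null 2>&1
  status=$?
  rm -f "$tmp"
  # grep exits 0 (match), 1 (no match) or 2 (error).  A status above 128
  # means a signal killed it; most shells report SIGSEGV as 139.
  [ "$status" -le 128 ]
}

fail=0
for opt in '' -i; do
  grep_survives "$opt" || { echo "grep $opt crashed"; fail=1; }
done
if [ "$fail" -eq 0 ]; then echo PASS; else echo FAIL; fi
```

Because only the exit status is examined, a wrong match result still counts as PASS here. Checking the result belongs in the separate, target-agnostic test.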

The second part of the test checks whether grep handles 4-byte UTF-8 sequences
in regexes correctly.  It's a different test.  If you define this one
as a target-agnostic test, it requires another test script.
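A target-agnostic test of this second kind might be sketched as follows. It assumes an en_US.UTF-8 locale. If that locale is missing, grep falls back to the C locale and compares raw bytes, which gives the same expected counts. The two code points, U+10405 and U+10406, are an illustrative choice: their UTF-16 forms share the same high surrogate (D801). The helper count_matches is a hypothetical name.

```shell
#!/bin/sh
# Target-agnostic check (a sketch): grep must match a 4-byte UTF-8
# character against itself, with and without -i, and must not match a
# neighbouring character.  The code points and the count_matches helper
# are illustrative assumptions.
count_matches() {
  # $1 = option ("" or "-i"), $2 = pattern, $3 = file.
  # $1 is left unquoted on purpose so that "" expands to no argument.
  LC_ALL=en_US.UTF-8 grep -c $1 -e "$2" "$3"
}

u10405=$(printf '\360\220\220\205')   # U+10405 = F0 90 90 85
u10406=$(printf '\360\220\220\206')   # U+10406 = F0 90 90 86; in UTF-16
                                      # it shares U+10405's high surrogate
tmp=$(mktemp) || exit 1
printf '%s\n' "$u10405" > "$tmp"

fail=0
for opt in '' -i; do
  # The character must match itself ...
  [ "$(count_matches "$opt" "$u10405" "$tmp")" = 1 ] || fail=1
  # ... but not a different character with the same high surrogate.
  [ "$(count_matches "$opt" "$u10406" "$tmp")" = 0 ] || fail=1
done
rm -f "$tmp"
if [ "$fail" -eq 0 ]; then echo PASS; else echo FAIL; fi
```

A crash also leaves the command substitution empty, so it shows up as FAIL here too. For example, an implementation that compared only the high surrogate would report a match for U+10406 and fail the second check.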

If you define the whole script as *the* test for UTF-16 surrogates,
I suppose it should stay as is and the testcase should FAIL on Cygwin
until all parts of grep grok UTF-16 surrogates.

It's probably just a different point of view, so, never mind.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat
[Message part 2 (application/pgp-signature, inline)]

This bug report was last modified 11 years and 30 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.