GNU bug report logs - #51698
surrogate-pair test fails under Cygwin


Package: grep;

Reported by: Duncan Roe <duncan_roe <at> optusnet.com.au>

Date: Tue, 9 Nov 2021 02:56:02 UTC

Severity: minor

Merged with 27555, 49983

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #5 received at submit <at> debbugs.gnu.org:

From: Duncan Roe <duncan_roe <at> optusnet.com.au>
To: bug-grep <at> gnu.org
Cc: Duncan Roe <duncan_roe <at> optusnet.com.au>
Subject: surrogate-pair test fails under Cygwin
Date: Tue, 9 Nov 2021 13:48:02 +1100
The third surrogate-pair test fails under Cygwin:
> # Also test whether a surrogate-pair in the search string works.
It fails with both grep-3.7 and the latest commit.

It is easy to reproduce from the command line:
> printf '%s\n' "$(printf '\360\220\220\205')" >in
> LANG=en_US.utf8
> locale
> src/grep --file=in in

grep reports a match under Linux but not under Cygwin. I tested Cygwin64 on
Windows 7 Home and Windows 10.

Comparing gdb sessions between the platforms, I noticed:
> linux:  sbclen = '\001' <repeats 128 times>, '\377' <repeats 66 times>, '\376' <repeats 60 times>, "\377\377"
> cygwin: sbclen = '\001' <repeats 128 times>, '\377' <repeats 64 times>, '\376' <repeats 53 times>, '\377' <repeats 11 times>
in `dfa` (i.e. `dfa.localeinfo.sbclen`).
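The two dumps can be decoded by hand. This rests on an assumption about gnulib's localeinfo.c that the report itself does not show. The assumption is that each `sbclen` entry is a `signed char` holding one of three values:

- 1 for a byte that is a complete character;
- -1 (`'\377'`) when `mbrtowc` rejects the byte as invalid;
- -2 (`'\376'`) when `mbrtowc` reports an incomplete multibyte sequence.

Under that assumption, the run lengths show which bytes each C library treats as possible lead bytes: 0xC2..0xFD on Linux versus 0xC0..0xF4 on Cygwin. The Python sketch below rebuilds both tables from those inferred ranges and renders them in a simplified version of gdb's `<repeats N times>` notation. The classifier functions are hypothetical stand-ins, not either library's code.

```python
# Sketch: rebuild the two sbclen tables from byte ranges inferred from the
# gdb dumps. Values: 1 = complete single-byte character, -1 = invalid byte,
# -2 = incomplete sequence (possible lead byte). Ranges are inferred only.

def linux_sbclen(b: int) -> int:
    if b < 0x80:
        return 1
    if 0xC2 <= b <= 0xFD:   # inferred from the 66/60/2 run counts
        return -2
    return -1               # 0x80..0xC1, 0xFE, 0xFF

def cygwin_sbclen(b: int) -> int:
    if b < 0x80:
        return 1
    if 0xC0 <= b <= 0xF4:   # inferred from the 64/53/11 run counts
        return -2
    return -1               # 0x80..0xBF, 0xF5..0xFF

def runs(table):
    """Yield (value, count) pairs for consecutive equal entries."""
    start = 0
    for i in range(1, len(table) + 1):
        if i == len(table) or table[i] != table[start]:
            yield table[start], i - start
            start = i

def gdb_repr(table, threshold=10):
    """Approximate gdb's char-array output (default 'print repeats' is 10).
    Simplification: adjacent short runs are not merged into one string."""
    parts = []
    for value, count in runs(table):
        esc = "\\%03o" % (value & 0xFF)   # signed char -> octal, e.g. -2 -> \376
        if count > threshold:
            parts.append(f"'{esc}' <repeats {count} times>")
        else:
            parts.append('"' + esc * count + '"')
    return ", ".join(parts)

linux = [linux_sbclen(b) for b in range(256)]
cygwin = [cygwin_sbclen(b) for b in range(256)]

print("linux: ", gdb_repr(linux))
print("cygwin:", gdb_repr(cygwin))
print("sbclen[250]:", linux[250], cygwin[250])   # -2 on Linux, -1 on Cygwin
```

Running it prints the two `sbclen` lines shown above. Byte 250 (0xFA) falls inside Linux's inferred lead-byte range but outside Cygwin's, matching the \376 versus \377 difference described below.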

The bytes passed to `enlistnew` also differ:
> linux:  enlistnew (cpp=0x, new=0x "\360\220\220\205") at dfa.c:3928
> cygwin: enlistnew (cpp=0x, new=0x "\360\355\260\205") at dfa.c:3928
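The Cygwin bytes are consistent with Cygwin's 16-bit, UTF-16 `wchar_t`. The search character is U+10405 (`\360\220\220\205` in UTF-8). UTF-16 must store it as the surrogate pair D801 DC05, and `\355\260\205` is exactly the three-byte encoding of the lone low surrogate U+DC05. In other words, the Cygwin buffer keeps the original lead byte `\360`, but its remaining bytes have been replaced by the encoded low surrogate. The Python sketch below only checks that byte arithmetic; it does not model how dfa arrives at these bytes.

```python
# Sketch: verify the byte arithmetic behind the two enlistnew() dumps.
cp = 0x10405

# UTF-8 form written to the test file: \360\220\220\205
utf8 = chr(cp).encode("utf-8")

# UTF-16 surrogate pair computed by hand (what a 16-bit wchar_t must hold)
v = cp - 0x10000
high = 0xD800 + (v >> 10)      # 0xD801
low = 0xDC00 + (v & 0x3FF)     # 0xDC05

# Cross-check against Python's own UTF-16 encoder
assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")

# Three-byte form of the lone low surrogate; not valid UTF-8, so the
# 'surrogatepass' error handler is needed to produce it.
low_bytes = chr(low).encode("utf-8", "surrogatepass")

def octal(bs: bytes) -> str:
    """Render bytes as gdb-style octal escapes."""
    return "".join("\\%03o" % b for b in bs)

print("linux  new:", octal(utf8))                   # \360\220\220\205
print("cygwin new:", octal(utf8[:1] + low_bytes))   # \360\355\260\205
print("surrogates: %04X %04X" % (high, low))        # D801 DC05
```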

The locale data differs between the two systems for the same locale. I
investigated further by setting a breakpoint where the code starts to compute
sbclen[250], which is \376 under Linux but \377 under Cygwin. I captured the gdb
sessions with `script` and have attached them in the hope that they help.

If your system rejects the tar.gz attachment, I'll send them as plain text in
separate emails. They compare best in a side-by-side diff that highlights changed
characters. `tkdiff` works well for this: from View, choose "Show inline
comparison (recursive)".

I removed uninteresting differences between the sessions:
 Automatic
 - strip hex numbers (usually addresses) to plain 0x
 - remove escape sequences (colouring &c.)
 - probably other stuff
 Specifics
 - force matching locale names
 - insert blank lines at linux:72 to line up the return statement
 - split linux:100 to make the later arguments easier to see
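The automatic clean-up steps are plain text substitutions, but the report does not say which tool performed them. Below is a hypothetical Python sketch of the first two steps: reducing hex numbers to `0x` and removing ANSI escape sequences. The regular expressions and function names are illustrative choices, not the author's.

```python
import re

# Hypothetical normalizer for `script` transcripts of gdb sessions.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")   # CSI sequences, e.g. colour codes
HEX_NUMBER = re.compile(r"0x[0-9a-fA-F]+")            # addresses and other hex literals

def normalize(line: str) -> str:
    """Strip colour codes, reduce hex numbers to '0x', drop the line ending."""
    line = ANSI_ESCAPE.sub("", line)
    line = HEX_NUMBER.sub("0x", line)
    return line.rstrip("\r\n")

def normalize_file(src: str, dst: str) -> None:
    """Write a normalized copy of transcript src to dst."""
    with open(src, encoding="utf-8", errors="replace") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(normalize(line) + "\n")

# Made-up sample line in the style of the gdb sessions
sample = "\x1b[33menlistnew\x1b[m (cpp=0x7ffffcb40, new=0x6000a1b20) at dfa.c:3928\r\n"
print(normalize(sample))   # enlistnew (cpp=0x, new=0x) at dfa.c:3928
```

Normalizing both transcripts this way before diffing leaves only the differences that matter, such as the `sbclen` values and the `enlistnew` arguments.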

Cheers ... Duncan.
[gdb_sessions.tar.gz (application/x-tar-gz, attachment)]




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.