GNU bug report logs - #73546
sed 4.9 UTF-8 SMP mismatch on Cygwin
To reply to this bug, email your comments to 73546 AT debbugs.gnu.org.
Report forwarded to bug-sed <at> gnu.org:
bug#73546; Package sed. (Sun, 29 Sep 2024 06:51:04 GMT)
Acknowledgement sent to Brian.Inglis <at> SystematicSW.ab.ca:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Sun, 29 Sep 2024 06:51:04 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hi folks,
I was just trying to compare compose key sequences from RFC1345, as provided in
vim "digraphs", X11 in xterm, and also mintty.
While trying to convert X11 Compose Multi_key sequences into
di-/tri-/quad-graphs comparable to vim, I found that I could not match some
UTF-8 SMP (Supplementary Multilingual Plane) codepoints > U+FFFF, specifically
those > U+1F000, using a negated term as in '"[^"]\+"', whereas '".\+"' worked,
as no other '"' appears in any line.
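A minimal sketch of the two patterns side by side, for anyone who wants to try this without the attachments. The sample line is a hypothetical stand-in written in the style of a Compose entry, not copied from the real data file, and the variable names are only illustrative. On a platform without the problem, both commands should extract the same quoted string; the report above is that the negated form fails to match on Cygwin.

```shell
# Hypothetical sample line in the style of an X11 Compose entry
# (an assumption for illustration, not taken from the real file).
line='<Multi_key> <C> <L> : "🄯" U1F12F # COPYLEFT SYMBOL'

# Negated bracket expression: the form reported as failing on Cygwin
# for code points above U+1F000.
negated=$(printf '%s\n' "$line" | sed -n 's/.*\("[^"]\+"\).*/\1/p')

# Any-character form: the form reported as working.
dot=$(printf '%s\n' "$line" | sed -n 's/.*\(".\+"\).*/\1/p')

printf 'negated: %s\n' "$negated"
printf 'dot:     %s\n' "$dot"
```

If the two lines of output differ, or the first is empty, the platform shows the mismatch described here.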
Is this a known issue on platforms such as Cygwin (and perhaps others: SunOS?,
AIX?), where the library uses SMP high/low surrogates internally because
sizeof(wchar_t) == sizeof(char16_t) != sizeof(wint_t) == sizeof(char32_t),
or is it a bug?
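To make the surrogate point concrete, here is the standard UTF-16 arithmetic for the code point used in this report, as a small sketch in shell. It only shows how a code point above U+FFFF splits into two 16-bit units; it does not claim to reproduce what the Cygwin library or sed does internally, and the variable names are illustrative.

```shell
# Code point used in this report: U+1F12F COPYLEFT SYMBOL.
cp=$((0x1F12F))

# UTF-16 represents code points above U+FFFF as a surrogate pair:
# subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.
offset=$((cp - 0x10000))
high=$((0xD800 + (offset >> 10)))    # high (leading) surrogate
low=$((0xDC00 + (offset & 0x3FF)))   # low (trailing) surrogate

pair=$(printf '%04X %04X' "$high" "$low")
printf 'U+%X -> %s\n' "$cp" "$pair"  # prints "U+1F12F -> D83C DD2F"
```

With a 16-bit wchar_t, a single such character occupies two wchar_t units, which is the situation the question above is asking about.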
The attached shell script and log demonstrate the issue, using the commonly
installed libX11/-common Multi_key Compose sequences data file, the "🄯" U+1F12F
COPYLEFT SYMBOL, and mostly normally installed utilities in standard paths;
they also provide some ancillary information about the data and environment.
The Cygwin environment is up to date as of 2024-09-20 including unifont 16 and
last-resort-font 16 with Unicode 16 glyphs.
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
-- Antoine de Saint-Exupéry
[sed-4.9-UTF-8-SMP-mismatch-Cygwin.sh (text/plain, attachment)]
[sed-4.9-UTF-8-SMP-mismatch-Cygwin.log (text/plain, attachment)]
This bug report was last modified 258 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.