GNU bug report logs - #73546
sed 4.9 UTF-8 SMP mismatch on Cygwin


Package: sed

Reported by: Brian.Inglis <at> SystematicSW.ab.ca

Date: Sun, 29 Sep 2024 06:51:03 UTC

Severity: normal

To reply to this bug, email your comments to 73546 AT debbugs.gnu.org.



Report forwarded to bug-sed <at> gnu.org:
bug#73546; Package sed. (Sun, 29 Sep 2024 06:51:04 GMT)

Acknowledgement sent to Brian.Inglis <at> SystematicSW.ab.ca:
New bug report received and forwarded. Copy sent to bug-sed <at> gnu.org. (Sun, 29 Sep 2024 06:51:04 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Brian Inglis <Brian.Inglis <at> SystematicSW.ab.ca>
To: bug-sed <at> GNU.org
Subject: sed 4.9 UTF-8 SMP mismatch on Cygwin
Date: Sun, 29 Sep 2024 00:50:01 -0600
Hi folks,

I was comparing the Compose key sequences from RFC 1345 as provided by vim
"digraphs", by X11 in xterm, and by mintty.

While converting X11 Compose Multi_key sequences into di-/tri-/quad-graphs
comparable to vim's, I found that I could not match some UTF-8 characters in
the SMP (Supplementary Multilingual Plane, code points > U+FFFF), specifically
those > U+1F000, using a negated bracket expression as in '"[^"]\+"'. However,
'".\+"' worked, as no other '"' appears in any line.
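
A minimal sketch of the two patterns described above, for anyone who wants to try them without the attached script. The sample line is hypothetical (it is only modeled on a Compose entry and is not copied from the libX11 data file), and the variable names are illustrative. It assumes GNU sed, since `\+` is a GNU extension, and a UTF-8 locale. On an affected platform the negated form would be expected to print an empty result, while the dot form prints the symbol.

```shell
# Hypothetical sample line modeled on an X11 Compose entry (not from the real file).
line='<Multi_key> <a> <b> : "🄯" U1F12F # COPYLEFT SYMBOL'

# Negated bracket expression: the form reported as failing on Cygwin
# for code points above U+1F000.
negated=$(printf '%s\n' "$line" | sed -n 's/.*"\([^"]\+\)".*/\1/p')

# Dot form: reported as working, because the line holds only one quoted string.
dotted=$(printf '%s\n' "$line" | sed -n 's/.*"\(.\+\)".*/\1/p')

# Show both results side by side; an empty pair of brackets means no match.
printf 'negated=[%s] dotted=[%s]\n' "$negated" "$dotted"
```

Comparing the two bracketed values on a given system shows quickly whether that sed build is affected.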

Is this a known issue on platforms such as Cygwin (and perhaps others: SunOS?
AIX?) where the library internally represents SMP code points as UTF-16
high/low surrogate pairs, with
sizeof(wchar_t) == sizeof(char16_t) != sizeof(wint_t) == sizeof(char32_t),
or is it a bug?
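
To make the surrogate question concrete, here is a small sketch of how a code point above U+FFFF splits into two 16-bit units under standard UTF-16 encoding. The function name `utf16_surrogates` is hypothetical, and this only illustrates the representation a 16-bit wchar_t would have to carry. It is not a confirmed diagnosis of what sed or the C library does internally.

```shell
# Split a supplementary-plane code point into its UTF-16 surrogate pair.
# utf16_surrogates is an illustrative helper, not part of sed or any library.
utf16_surrogates() {
    cp=$1
    v=$(( cp - 0x10000 ))              # 20-bit offset into the supplementary planes
    hi=$(( 0xD800 + (v >> 10) ))       # high surrogate carries the top 10 bits
    lo=$(( 0xDC00 + (v & 0x3FF) ))     # low surrogate carries the bottom 10 bits
    printf '%04X %04X\n' "$hi" "$lo"
}

utf16_surrogates 0x1F12F   # COPYLEFT SYMBOL; prints "D83C DD2F"
```

A single character such as U+1F12F therefore occupies two wchar_t units on a platform where wchar_t is 16 bits wide, which is the condition the question above is asking about.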

The attached shell script and log demonstrate the issue using the commonly
installed libX11/-common Multi_key Compose sequences data file, the "🄯" U+1F12F
COPYLEFT SYMBOL, and mostly standard utilities installed in standard paths.
They also provide some ancillary information about the data and environment.

The Cygwin environment is up to date as of 2024-09-20 including unifont 16 and 
last-resort-font 16 with Unicode 16 glyphs.

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry
[sed-4.9-UTF-8-SMP-mismatch-Cygwin.sh (text/plain, attachment)]
[sed-4.9-UTF-8-SMP-mismatch-Cygwin.log (text/plain, attachment)]

This bug report was last modified 258 days ago.


