GNU bug report logs - #7968
multibyte: 16-bit wchar_t on Windows and Cygwin


Package: coreutils;

Reported by: Bastien ROUCARIES <roucaries.bastien <at> gmail.com>

Date: Wed, 2 Feb 2011 19:04:02 UTC

Severity: wishlist

Merged with 7948, 7963

To reply to this bug, email your comments to 7968 AT debbugs.gnu.org.


Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils <at> gnu.org:
bug#7968; Package coreutils. (Wed, 02 Feb 2011 19:04:02 GMT)

Acknowledgement sent to Bastien ROUCARIES <roucaries.bastien <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 02 Feb 2011 19:04:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Bastien ROUCARIES <roucaries.bastien <at> gmail.com>
To: Bruno Haible <bruno <at> clisp.org>
Cc: bug-coreutils <bug-coreutils <at> gnu.org>, cygwin <cygwin <at> cygwin.com>,
	bug-gnulib <at> gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: RE : Re: 16-bit wchar_t on Windows and Cygwin
Date: Wed, 2 Feb 2011 19:53:43 +0100
Using -fno-short-wchar would avoid having to change the API.

Bastien

On 2 Feb 2011 at 18:42, "Bruno Haible" <bruno <at> clisp.org> wrote:

Hello Corinna,

> And, please note the wording in SUSv4, for instance in
> http://calimero.vinschen.de/susv4/functions/iswalpha.html

Likewise in POSIX:2008, at the URL
http://www.opengroup.org/onlinepubs/9699919799/functions/iswalpha.html

>   The wc argument is a wint_t, the value of which the application shall
>                        ^^^^^^                         ^^^^^^^^^^^
>   ensure is a wide-character code corresponding to a valid character in
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>   the current locale, or equal to the value of the macro WEOF. If the
>   argument has any other value, the behavior is undefined.

Expressed as a formula, this sentence means that when an application
passes a 'wint_t x' to iswalpha(), x has to satisfy

  x == (wint_t) (wchar_t) x || x == WEOF
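
The condition can be tried out with a small C sketch. Since the real
widths of wchar_t and wint_t depend on the platform, the sketch models
them with fixed-width types: a 16-bit wchar_t and a 4-byte wint_t, as
described for Cygwin in this thread. The names model_wchar_t,
model_wint_t, MODEL_WEOF and is_valid_wint are hypothetical, the value
chosen for MODEL_WEOF is an assumption, and the comparison uses WEOF as
in the quoted specification text.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not the platform's real types. */
typedef uint16_t model_wchar_t;   /* models a 16-bit wchar_t */
typedef uint32_t model_wint_t;    /* models a 4-byte wint_t */

/* Assumed value of WEOF for this model. */
#define MODEL_WEOF ((model_wint_t) 0xFFFFFFFFu)

/* Returns 1 if x satisfies
     x == (wint_t) (wchar_t) x || x == WEOF
   in the model above, and 0 otherwise. */
static int is_valid_wint(model_wint_t x)
{
    return x == (model_wint_t) (model_wchar_t) x || x == MODEL_WEOF;
}
```

Under this model, every value up to 0xFFFF passes the test, while a
UTF-32 value such as 0x1F600 does not, because it does not survive the
round trip through the 16-bit type.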

> iswalpha takes wint_t, not wchar_t.  Since sizeof (wint_t) is 4 bytes,
> the function can return the correct value, provided that the application
> converts the UTF-16 surrogate to UTF-32 before calling iswalpha.

When an application does this, it passes an invalid wint_t value to
iswalpha(), according to the spec paragraph that you have just cited.
So the application uses an extension to POSIX functionality, not
POSIX itself.
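
For reference, the conversion being discussed (a UTF-16 surrogate pair
combined into one UTF-32 code point) could look roughly like the
following sketch. It uses the standard UTF-16 arithmetic; the function
names are hypothetical and do not come from Cygwin or gnulib. Whether
the result may then be passed to iswalpha() is exactly the
platform-specific question raised above.

```c
#include <assert.h>
#include <stdint.h>

/* High (leading) surrogates occupy 0xD800..0xDBFF. */
static int is_high_surrogate(uint16_t u)
{
    return u >= 0xD800 && u <= 0xDBFF;
}

/* Low (trailing) surrogates occupy 0xDC00..0xDFFF. */
static int is_low_surrogate(uint16_t u)
{
    return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a valid surrogate pair into a code point in the range
   0x10000..0x10FFFF.  The caller must pass a high surrogate followed
   by a low surrogate; this is checked with assert(). */
static uint32_t utf16_combine(uint16_t hi, uint16_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000u
           + (((uint32_t) hi - 0xD800u) << 10)
           + ((uint32_t) lo - 0xDC00u);
}
```

For example, the pair 0xD83D 0xDE00 combines to 0x1F600, a value that
does not fit into a 16-bit wchar_t.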

I see that Cygwin 1.7.x iswalpha() works in the way you describe (but
mingw's iswalpha() doesn't). So this means that gnulib's proposed
iswwalpha(wwchar_t) function could be implemented using iswalpha()
on Cygwin 1.7.x and will not cause the Unicode based tables to be
included in the executable. This is good and nice.

But if you say that the application should convert UTF-16 surrogates
to UTF-32 before calling iswalpha: That's certainly a requirement
for Cygwin 1.7.x applications that want to support the entire Unicode
character set. But it's outside of POSIX, and many GNU programs will
not want to include this added complexity. Just try to apply this
suggestion to gnulib's quotearg.c, then estimate the time someone
would need to apply it also to regcomp.c, strftime.c, mbscasestr.c,
coreutils/src/wc.c, and so on.

For this reason I propose the wwchar_t type with an API that is similar
to POSIX <wctype.h> but includes the surrogate handling, rather than
pushing it into each application's code.
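
To illustrate the idea of keeping surrogate handling behind the API, a
minimal sketch follows. It is not the gnulib proposal itself, whose
details are not given in this message: the 32-bit definition of
wwchar_t and the helper names wwc_next and wwc_count are assumptions
made purely for illustration. The caller iterates over 16-bit units
and only ever sees whole characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed definition: a type wide enough for any Unicode code point. */
typedef uint32_t wwchar_t;

/* Decode one character from the n 16-bit units starting at s and
   store it in *out.  Returns the number of units consumed (0 if
   n == 0).  An unpaired surrogate is passed through unchanged as a
   single unit. */
static size_t wwc_next(const uint16_t *s, size_t n, wwchar_t *out)
{
    if (n == 0)
        return 0;
    if (s[0] >= 0xD800 && s[0] <= 0xDBFF
        && n >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
        *out = 0x10000u
               + (((wwchar_t) s[0] - 0xD800u) << 10)
               + ((wwchar_t) s[1] - 0xDC00u);
        return 2;
    }
    *out = s[0];
    return 1;
}

/* Count characters rather than 16-bit units, in the spirit of a
   character-counting loop such as the one in wc. */
static size_t wwc_count(const uint16_t *s, size_t n)
{
    size_t count = 0;
    while (n > 0) {
        wwchar_t c;
        size_t used = wwc_next(s, n, &c);
        s += used;
        n -= used;
        count++;
    }
    return count;
}
```

With helpers of this kind, application code never tests for surrogates
itself; a classification function taking a wwchar_t could then be
layered on top.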


Bruno
-- 
In memoriam Carl Friedrich Goerdeler <
http://en.wikipedia.org/wiki/Carl_Friedrich_Goerdel...

Merged 7948 7963 7968. Request was from era eriksson <era <at> iki.fi> to control <at> debbugs.gnu.org. (Thu, 30 Aug 2012 08:08:02 GMT)

Severity set to 'wishlist' from 'normal' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 19 Oct 2018 16:42:02 GMT)

Changed bug title to 'multibyte: 16-bit wchar_t on Windows and Cygwin' from 'RE : Re: 16-bit wchar_t on Windows and Cygwin' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 19 Oct 2018 16:42:02 GMT)

This bug report was last modified 6 years and 240 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.