GNU bug report logs -
#73307
Fix ctype(3) usage
Previous Next
Reported by: Thomas Klausner <wiz <at> gatalith.at>
Date: Mon, 16 Sep 2024 21:26:01 UTC
Severity: normal
Tags: patch
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 73307 in the body.
You can then email your comments to 73307 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Mon, 16 Sep 2024 21:26:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Thomas Klausner <wiz <at> gatalith.at>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Mon, 16 Sep 2024 21:26:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Tags: patch
When compiling emacs on NetBSD-10.99.12/amd64, I get the following
warning
In file included from /usr/include/ctype.h:100,
from fns.c:29:
fns.c: In function 'Fyes_or_no_p':
fns.c:3582:33: warning: array subscript has type 'char' [-Wchar-subscripts]
3582 | if ((len > 0) && !isspace (s[len - 1]))
| ^
The NetBSD man page for ctype(3): https://man.netbsd.org/ctype.3
is quite explicit about the problems with this - ctype(3) functions only
accept -1 and "unsigned char" and you can get very weird problems if
this is disregarded.
The attached patch adds the missing cast.
In GNU Emacs 31.0.50 (build 1, x86_64--netbsd, GTK+ Version 3.24.43,
cairo version 1.18.0) of 2024-09-16
Repository revision: f27553c30a772a0103d2e6762e4d7f588f302e4b
Repository branch: HEAD
System Description: NetBSD 10.99.12/amd64
Configured using:
'configure --srcdir=/scratch/wip/emacs-git/work/emacs
--localstatedir=/var --with-native-compilation --without-ns
--without-imagemagick --without-xaw3d --with-x-toolkit=gtk3
--prefix=/usr/pkg --build=x86_64--netbsd --host=x86_64--netbsd
--infodir=/usr/pkg/info --mandir=/usr/pkg/man
--enable-option-checking=yes 'CFLAGS=-O2 -g -g -fstack-clash-protection
-I/usr/pkg/include -I/usr/include -I/usr/pkg/include/freetype2
-I/usr/pkg/include/glib-2.0 -I/usr/pkg/include/gio-unix-2.0
-I/usr/pkg/lib/glib-2.0/include -I/usr/pkg/include/harfbuzz
-I/usr/pkg/include/libdrm' 'CPPFLAGS=-g -I/usr/pkg/include
-I/usr/include -I/usr/pkg/include/freetype2 -I/usr/pkg/include/glib-2.0
-I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include
-I/usr/pkg/include/harfbuzz -I/usr/pkg/include/libdrm'
'LDFLAGS=-Wl,-R/usr/pkg/gcc14/lib -Wl,-zrelro -Wl,-znow -L/usr/pkg/lib
-Wl,-R/usr/pkg/lib -L/usr/lib -Wl,-R/usr/lib''
[patch-src_fns.c (text/x-c, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Tue, 17 Sep 2024 11:53:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 73307 <at> debbugs.gnu.org (full text, mbox):
> Date: Mon, 16 Sep 2024 23:25:16 +0200
> From: Thomas Klausner <wiz <at> gatalith.at>
>
> When compiling emacs on NetBSD-10.99.12/amd64, I get the following
> warning
>
> In file included from /usr/include/ctype.h:100,
> from fns.c:29:
> fns.c: In function 'Fyes_or_no_p':
> fns.c:3582:33: warning: array subscript has type 'char' [-Wchar-subscripts]
> 3582 | if ((len > 0) && !isspace (s[len - 1]))
> | ^
>
> The NetBSD man page for ctype(3): https://man.netbsd.org/ctype.3
> is quite explicit about the problems with this - ctype(3) functions only
> accept -1 and "unsigned char" and you can get very weird problems if
> this is disregarded.
>
> The attached patch adds the missing cast.
> [...]
>
> --- src/fns.c.orig 2024-09-16 21:11:40.908684144 +0000
> +++ src/fns.c
> @@ -3579,7 +3579,7 @@ by a mouse, or by some window-system ges
> {
> char *s = SSDATA (prompt);
> ptrdiff_t len = strlen (s);
> - if ((len > 0) && !isspace (s[len - 1]))
> + if ((len > 0) && !isspace ((unsigned char)s[len - 1]))
> prompt = CALLN (Fconcat, prompt, build_string (" "));
> }
> prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);
Thanks. However, I'm not sure this is the right fix, the function is
defined with argument type of 'int'. Paul, any comments?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Tue, 17 Sep 2024 11:57:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 73307 <at> debbugs.gnu.org (full text, mbox):
On Tue, Sep 17, 2024 at 02:52:20PM +0300, Eli Zaretskii wrote:
> Thanks. However, I'm not sure this is the right fix, the function is
> defined with argument type of 'int'. Paul, any comments?
That is exactly the problem.
If you pass a 'signed char' below -1, it gets converted to the same
integer value, also a negative number below -1, and you end up passing
an invalid argument to the function. The function is defined to only
accept EOF and non-negative values in the range of 'unsigned char'.
-1 is only special because that's usually the value of EOF.
Thomas
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Wed, 18 Sep 2024 00:06:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Thomas Klausner <wiz <at> gatalith.at>
:
bug acknowledged by developer.
(Wed, 18 Sep 2024 00:06:02 GMT)
Full text and
rfc822 format available.
Message #16 received at 73307-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 2024-09-17 04:52, Eli Zaretskii wrote:
> However, I'm not sure this is the right fix, the function is
> defined with argument type of 'int'. Paul, any comments?
Although that patch was an improvement it still had problems, as it
incorrectly assumed the string does not end in a multibyte space, and
that Emacs's locale matches the system's.
Emacs itself should not use <ctype.h> unless it knows the string is
unibyte and the system locale matches Emacs's. I scanned through its
source code looking for all problematic instances of <ctype.h> that have
crept in (except I didn't scan the MS-Windows code, where you're the
expert), and found five other places where ctype.h was obviously
misused. I installed the attached to fix these glitches and am boldly
closing this the report.
I can't easily test patch 0003, which fixes Android-specific code.
Although I think it's an improvement, in unlikely cases I suspect it
still doesn't exactly match what the Android kernel does with #! lines.
I don't know whether that matters. I'll CC this to Po Lu (my goto person
for Android) as a heads-up.
[0001-Fix-yes-or-no-p-with-multibyte-spaces.patch (text/x-patch, attachment)]
[0002-Fix-misuse-of-toupper-in-sfnt_parse_style.patch (text/x-patch, attachment)]
[0003-Fix-some-misparsing-in-check_interpreter.patch (text/x-patch, attachment)]
[0004-Use-c-ctype.h-in-lib-src.patch (text/x-patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Wed, 18 Sep 2024 02:15:01 GMT)
Full text and
rfc822 format available.
Message #19 received at 73307-done <at> debbugs.gnu.org (full text, mbox):
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> I can't easily test patch 0003, which fixes Android-specific
> code. Although I think it's an improvement, in unlikely cases I
> suspect it still doesn't exactly match what the Android kernel does
> with #! lines. I don't know whether that matters. I'll CC this to Po
> Lu (my goto person for Android) as a heads-up.
It's OK. Though I have never encountered shebangs with tabs.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Wed, 18 Sep 2024 11:47:01 GMT)
Full text and
rfc822 format available.
Message #22 received at 73307 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 17 Sep 2024 17:05:29 -0700
> Cc: 73307-done <at> debbugs.gnu.org, Thomas Klausner <wiz <at> gatalith.at>,
> Po Lu <luangruo <at> yahoo.com>
> From: Paul Eggert <eggert <at> cs.ucla.edu>
>
> On 2024-09-17 04:52, Eli Zaretskii wrote:
> > However, I'm not sure this is the right fix, the function is
> > defined with argument type of 'int'. Paul, any comments?
>
> Although that patch was an improvement it still had problems, as it
> incorrectly assumed the string does not end in a multibyte space, and
> that Emacs's locale matches the system's.
>
> Emacs itself should not use <ctype.h> unless it knows the string is
> unibyte and the system locale matches Emacs's. I scanned through its
> source code looking for all problematic instances of <ctype.h> that have
> crept in (except I didn't scan the MS-Windows code, where you're the
> expert), and found five other places where ctype.h was obviously
> misused. I installed the attached to fix these glitches and am boldly
> closing this the report.
>
> I can't easily test patch 0003, which fixes Android-specific code.
> Although I think it's an improvement, in unlikely cases I suspect it
> still doesn't exactly match what the Android kernel does with #! lines.
> I don't know whether that matters. I'll CC this to Po Lu (my goto person
> for Android) as a heads-up.
Thanks.
This part doesn't look right to me:
> - {
> - char *s = SSDATA (prompt);
> - ptrdiff_t len = strlen (s);
> - if ((len > 0) && !isspace (s[len - 1]))
> - prompt = CALLN (Fconcat, prompt, build_string (" "));
> - }
> - prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);
> + ptrdiff_t promptlen = SCHARS (prompt);
> + bool prompt_ends_in_nonspace
> + = (0 < promptlen
> + && (SYNTAX (XFIXNAT (Faref (prompt, make_fixnum (promptlen - 1))))
> + != Swhitespace));
> + AUTO_STRING (space_string, " ");
> + prompt = CALLN (Fconcat, prompt,
> + prompt_ends_in_nonspace ? space_string : empty_unibyte_string,
> + Vyes_or_no_prompt);
You are testing characters for whitespace syntax, which is AFAIU
subject to buffer-local syntax tables. Thus, a strange enough setting
of buffer syntax could make this code return unexpected results.
Stefan, am I right?
Why can't we use something closer to the original code, which doesn't
depend on buffer-local customizations? In particular, what's the
problem with using c_isspace (and what do you mean by "multibyte
space"?)
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Wed, 18 Sep 2024 15:19:01 GMT)
Full text and
rfc822 format available.
Message #25 received at 73307 <at> debbugs.gnu.org (full text, mbox):
> You are testing characters for whitespace syntax, which is AFAIU
> subject to buffer-local syntax tables. Thus, a strange enough setting
> of buffer syntax could make this code return unexpected results.
>
> Stefan, am I right?
Yup.
> Why can't we use something closer to the original code, which doesn't
> depend on buffer-local customizations?
We could use the global/default syntax table?
[ Or rely on some of the unicode tables? ]
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Wed, 18 Sep 2024 15:36:03 GMT)
Full text and
rfc822 format available.
Message #28 received at 73307 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 2024-09-18 04:45, Eli Zaretskii wrote:
> what's the
> problem with using c_isspace (and what do you mean by "multibyte
> space"
Using c_isspace would be OK, in that it'd handle most practical
examples. I wrote the fancier version to handle prompts that end in
non-ASCII spaces, e.g., (yes-or-no-p "Delete all files? ") where the
last character is actually U+00A0 NO-BREAK_SPACE instead of U+0020
SPACE, so its UTF-8 encoding is multiple bytes. If it's not important to
handle such cases we could just use c_isspace.
> You are testing characters for whitespace syntax, which is AFAIU
> subject to buffer-local syntax tables.
Thanks, I didn't think of that. How about the attached patch? It is an
alternative to just using c_isspace.
[fns.c.diff (text/x-patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Wed, 18 Sep 2024 16:12:02 GMT)
Full text and
rfc822 format available.
Message #31 received at 73307 <at> debbugs.gnu.org (full text, mbox):
> Date: Wed, 18 Sep 2024 08:35:18 -0700
> Cc: 73307 <at> debbugs.gnu.org, wiz <at> gatalith.at, luangruo <at> yahoo.com
> From: Paul Eggert <eggert <at> cs.ucla.edu>
>
> > You are testing characters for whitespace syntax, which is AFAIU
> > subject to buffer-local syntax tables.
>
> Thanks, I didn't think of that. How about the attached patch? It is an
> alternative to just using c_isspace.
That's much better, thanks. But I'm not sure we really want to allow
any Zs character there. For example, what about U+3000 IDEOGRAPHIC
SPACE? But if this is okay with us, please install.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#73307
; Package
emacs
.
(Wed, 18 Sep 2024 16:23:01 GMT)
Full text and
rfc822 format available.
Message #34 received at 73307 <at> debbugs.gnu.org (full text, mbox):
On 2024-09-18 09:11, Eli Zaretskii wrote:
> what about U+3000 IDEOGRAPHIC SPACE?
That should be OK. For (yes-or-no-p "您想删除所有文件吗? "), where the prompt
ends in U+3000, no space is needed. I installed the patch.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Thu, 17 Oct 2024 11:24:05 GMT)
Full text and
rfc822 format available.
This bug report was last modified 247 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.