GNU bug report logs - #73307
Fix ctype(3) usage

Previous Next

Package: emacs;

Reported by: Thomas Klausner <wiz <at> gatalith.at>

Date: Mon, 16 Sep 2024 21:26:01 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 73307 in the body.
You can then email your comments to 73307 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Mon, 16 Sep 2024 21:26:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Thomas Klausner <wiz <at> gatalith.at>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 16 Sep 2024 21:26:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Thomas Klausner <wiz <at> gatalith.at>
To: bug-gnu-emacs <at> gnu.org
Subject: Fix ctype(3) usage
Date: Mon, 16 Sep 2024 23:25:16 +0200
[Message part 1 (text/plain, inline)]
Tags: patch


When compiling emacs on NetBSD-10.99.12/amd64, I get the following
warning

In file included from /usr/include/ctype.h:100,
                 from fns.c:29:
fns.c: In function 'Fyes_or_no_p':
fns.c:3582:33: warning: array subscript has type 'char' [-Wchar-subscripts]
 3582 |     if ((len > 0) && !isspace (s[len - 1]))
      |                                 ^

The NetBSD man page for ctype(3): https://man.netbsd.org/ctype.3
is quite explicit about the problems with this - ctype(3) functions only
accept -1 and "unsigned char" and you can get very weird problems if
this is disregarded.

The attached patch adds the missing cast.


In GNU Emacs 31.0.50 (build 1, x86_64--netbsd, GTK+ Version 3.24.43,
 cairo version 1.18.0) of 2024-09-16
Repository revision: f27553c30a772a0103d2e6762e4d7f588f302e4b
Repository branch: HEAD
System Description: NetBSD 10.99.12/amd64

Configured using:
 'configure --srcdir=/scratch/wip/emacs-git/work/emacs
 --localstatedir=/var --with-native-compilation --without-ns
 --without-imagemagick --without-xaw3d --with-x-toolkit=gtk3
 --prefix=/usr/pkg --build=x86_64--netbsd --host=x86_64--netbsd
 --infodir=/usr/pkg/info --mandir=/usr/pkg/man
 --enable-option-checking=yes 'CFLAGS=-O2 -g -g -fstack-clash-protection
 -I/usr/pkg/include -I/usr/include -I/usr/pkg/include/freetype2
 -I/usr/pkg/include/glib-2.0 -I/usr/pkg/include/gio-unix-2.0
 -I/usr/pkg/lib/glib-2.0/include -I/usr/pkg/include/harfbuzz
 -I/usr/pkg/include/libdrm' 'CPPFLAGS=-g -I/usr/pkg/include
 -I/usr/include -I/usr/pkg/include/freetype2 -I/usr/pkg/include/glib-2.0
 -I/usr/pkg/include/gio-unix-2.0 -I/usr/pkg/lib/glib-2.0/include
 -I/usr/pkg/include/harfbuzz -I/usr/pkg/include/libdrm'
 'LDFLAGS=-Wl,-R/usr/pkg/gcc14/lib -Wl,-zrelro -Wl,-znow -L/usr/pkg/lib
 -Wl,-R/usr/pkg/lib -L/usr/lib -Wl,-R/usr/lib''

[patch-src_fns.c (text/x-c, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Tue, 17 Sep 2024 11:53:01 GMT) Full text and rfc822 format available.

Message #8 received at 73307 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Thomas Klausner <wiz <at> gatalith.at>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 73307 <at> debbugs.gnu.org
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Tue, 17 Sep 2024 14:52:20 +0300
> Date: Mon, 16 Sep 2024 23:25:16 +0200
> From: Thomas Klausner <wiz <at> gatalith.at>
> 
> When compiling emacs on NetBSD-10.99.12/amd64, I get the following
> warning
> 
> In file included from /usr/include/ctype.h:100,
>                  from fns.c:29:
> fns.c: In function 'Fyes_or_no_p':
> fns.c:3582:33: warning: array subscript has type 'char' [-Wchar-subscripts]
>  3582 |     if ((len > 0) && !isspace (s[len - 1]))
>       |                                 ^
> 
> The NetBSD man page for ctype(3): https://man.netbsd.org/ctype.3
> is quite explicit about the problems with this - ctype(3) functions only
> accept -1 and "unsigned char" and you can get very weird problems if
> this is disregarded.
> 
> The attached patch adds the missing cast.
> [...]
>
> --- src/fns.c.orig	2024-09-16 21:11:40.908684144 +0000
> +++ src/fns.c
> @@ -3579,7 +3579,7 @@ by a mouse, or by some window-system ges
>    {
>      char *s = SSDATA (prompt);
>      ptrdiff_t len = strlen (s);
> -    if ((len > 0) && !isspace (s[len - 1]))
> +    if ((len > 0) && !isspace ((unsigned char)s[len - 1]))
>        prompt = CALLN (Fconcat, prompt, build_string (" "));
>    }
>    prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);

Thanks.  However, I'm not sure this is the right fix, the function is
defined with argument type of 'int'.  Paul, any comments?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Tue, 17 Sep 2024 11:57:01 GMT) Full text and rfc822 format available.

Message #11 received at 73307 <at> debbugs.gnu.org (full text, mbox):

From: Thomas Klausner <wiz <at> gatalith.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 73307 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Tue, 17 Sep 2024 13:55:59 +0200
On Tue, Sep 17, 2024 at 02:52:20PM +0300, Eli Zaretskii wrote:
> Thanks.  However, I'm not sure this is the right fix, the function is
> defined with argument type of 'int'.  Paul, any comments?

That is exactly the problem.

If you pass a 'signed char' below -1, it gets converted to the same
integer value, also a negative number below -1, and you end up passing
an invalid argument to the function. The function is defined to only
accept EOF and non-negative values in the range of 'unsigned char'.

-1 is only special because that's usually the value of EOF.
 Thomas




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Wed, 18 Sep 2024 00:06:01 GMT) Full text and rfc822 format available.

Notification sent to Thomas Klausner <wiz <at> gatalith.at>:
bug acknowledged by developer. (Wed, 18 Sep 2024 00:06:02 GMT) Full text and rfc822 format available.

Message #16 received at 73307-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Po Lu <luangruo <at> yahoo.com>, 73307-done <at> debbugs.gnu.org,
 Thomas Klausner <wiz <at> gatalith.at>
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Tue, 17 Sep 2024 17:05:29 -0700
[Message part 1 (text/plain, inline)]
On 2024-09-17 04:52, Eli Zaretskii wrote:
> However, I'm not sure this is the right fix, the function is
> defined with argument type of 'int'.  Paul, any comments?

Although that patch was an improvement it still had problems, as it 
incorrectly assumed the string does not end in a multibyte space, and 
that Emacs's locale matches the system's.

Emacs itself should not use <ctype.h> unless it knows the string is 
unibyte and the system locale matches Emacs's. I scanned through its 
source code looking for all problematic instances of <ctype.h> that have 
crept in (except I didn't scan the MS-Windows code, where you're the 
expert), and found five other places where ctype.h was obviously 
misused. I installed the attached to fix these glitches and am boldly 
closing this the report.

I can't easily test patch 0003, which fixes Android-specific code. 
Although I think it's an improvement, in unlikely cases I suspect it 
still doesn't exactly match what the Android kernel does with #! lines. 
I don't know whether that matters. I'll CC this to Po Lu (my goto person 
for Android) as a heads-up.
[0001-Fix-yes-or-no-p-with-multibyte-spaces.patch (text/x-patch, attachment)]
[0002-Fix-misuse-of-toupper-in-sfnt_parse_style.patch (text/x-patch, attachment)]
[0003-Fix-some-misparsing-in-check_interpreter.patch (text/x-patch, attachment)]
[0004-Use-c-ctype.h-in-lib-src.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Wed, 18 Sep 2024 02:15:01 GMT) Full text and rfc822 format available.

Message #19 received at 73307-done <at> debbugs.gnu.org (full text, mbox):

From: Po Lu <luangruo <at> yahoo.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 73307-done <at> debbugs.gnu.org,
 Thomas Klausner <wiz <at> gatalith.at>
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Wed, 18 Sep 2024 10:13:26 +0800
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> I can't easily test patch 0003, which fixes Android-specific
> code. Although I think it's an improvement, in unlikely cases I
> suspect it still doesn't exactly match what the Android kernel does
> with #! lines. I don't know whether that matters. I'll CC this to Po
> Lu (my goto person for Android) as a heads-up.

It's OK.  Though I have never encountered shebangs with tabs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Wed, 18 Sep 2024 11:47:01 GMT) Full text and rfc822 format available.

Message #22 received at 73307 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: luangruo <at> yahoo.com, 73307 <at> debbugs.gnu.org, wiz <at> gatalith.at
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Wed, 18 Sep 2024 14:45:25 +0300
> Date: Tue, 17 Sep 2024 17:05:29 -0700
> Cc: 73307-done <at> debbugs.gnu.org, Thomas Klausner <wiz <at> gatalith.at>,
>  Po Lu <luangruo <at> yahoo.com>
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> On 2024-09-17 04:52, Eli Zaretskii wrote:
> > However, I'm not sure this is the right fix, the function is
> > defined with argument type of 'int'.  Paul, any comments?
> 
> Although that patch was an improvement it still had problems, as it 
> incorrectly assumed the string does not end in a multibyte space, and 
> that Emacs's locale matches the system's.
> 
> Emacs itself should not use <ctype.h> unless it knows the string is 
> unibyte and the system locale matches Emacs's. I scanned through its 
> source code looking for all problematic instances of <ctype.h> that have 
> crept in (except I didn't scan the MS-Windows code, where you're the 
> expert), and found five other places where ctype.h was obviously 
> misused. I installed the attached to fix these glitches and am boldly 
> closing this the report.
> 
> I can't easily test patch 0003, which fixes Android-specific code. 
> Although I think it's an improvement, in unlikely cases I suspect it 
> still doesn't exactly match what the Android kernel does with #! lines. 
> I don't know whether that matters. I'll CC this to Po Lu (my goto person 
> for Android) as a heads-up.

Thanks.

This part doesn't look right to me:

> -  {
> -    char *s = SSDATA (prompt);
> -    ptrdiff_t len = strlen (s);
> -    if ((len > 0) && !isspace (s[len - 1]))
> -      prompt = CALLN (Fconcat, prompt, build_string (" "));
> -  }
> -  prompt = CALLN (Fconcat, prompt, Vyes_or_no_prompt);
> +  ptrdiff_t promptlen = SCHARS (prompt);
> +  bool prompt_ends_in_nonspace
> +    = (0 < promptlen
> +       && (SYNTAX (XFIXNAT (Faref (prompt, make_fixnum (promptlen - 1))))
> +	   != Swhitespace));
> +  AUTO_STRING (space_string, " ");
> +  prompt = CALLN (Fconcat, prompt,
> +		  prompt_ends_in_nonspace ? space_string : empty_unibyte_string,
> +		  Vyes_or_no_prompt);

You are testing characters for whitespace syntax, which is AFAIU
subject to buffer-local syntax tables.  Thus, a strange enough setting
of buffer syntax could make this code return unexpected results.

Stefan, am I right?

Why can't we use something closer to the original code, which doesn't
depend on buffer-local customizations?  In particular, what's the
problem with using c_isspace (and what do you mean by "multibyte
space"?)





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Wed, 18 Sep 2024 15:19:01 GMT) Full text and rfc822 format available.

Message #25 received at 73307 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: luangruo <at> yahoo.com, 73307 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>,
 wiz <at> gatalith.at
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Wed, 18 Sep 2024 11:18:21 -0400
> You are testing characters for whitespace syntax, which is AFAIU
> subject to buffer-local syntax tables.  Thus, a strange enough setting
> of buffer syntax could make this code return unexpected results.
>
> Stefan, am I right?

Yup.

> Why can't we use something closer to the original code, which doesn't
> depend on buffer-local customizations?

We could use the global/default syntax table?
[ Or rely on some of the unicode tables?  ]


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Wed, 18 Sep 2024 15:36:03 GMT) Full text and rfc822 format available.

Message #28 received at 73307 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: luangruo <at> yahoo.com, 73307 <at> debbugs.gnu.org, wiz <at> gatalith.at
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Wed, 18 Sep 2024 08:35:18 -0700
[Message part 1 (text/plain, inline)]
On 2024-09-18 04:45, Eli Zaretskii wrote:
> what's the
> problem with using c_isspace (and what do you mean by "multibyte
> space"

Using c_isspace would be OK, in that it'd handle most practical 
examples. I wrote the fancier version to handle prompts that end in 
non-ASCII spaces, e.g., (yes-or-no-p "Delete all files? ") where the 
last character is actually U+00A0 NO-BREAK_SPACE instead of U+0020 
SPACE, so its UTF-8 encoding is multiple bytes. If it's not important to 
handle such cases we could just use c_isspace.


> You are testing characters for whitespace syntax, which is AFAIU
> subject to buffer-local syntax tables.

Thanks, I didn't think of that. How about the attached patch? It is an 
alternative to just using c_isspace.
[fns.c.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Wed, 18 Sep 2024 16:12:02 GMT) Full text and rfc822 format available.

Message #31 received at 73307 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: luangruo <at> yahoo.com, 73307 <at> debbugs.gnu.org, wiz <at> gatalith.at,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Wed, 18 Sep 2024 19:11:02 +0300
> Date: Wed, 18 Sep 2024 08:35:18 -0700
> Cc: 73307 <at> debbugs.gnu.org, wiz <at> gatalith.at, luangruo <at> yahoo.com
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> 
> > You are testing characters for whitespace syntax, which is AFAIU
> > subject to buffer-local syntax tables.
> 
> Thanks, I didn't think of that. How about the attached patch? It is an 
> alternative to just using c_isspace.

That's much better, thanks.  But I'm not sure we really want to allow
any Zs character there.  For example, what about U+3000 IDEOGRAPHIC
SPACE?  But if this is okay with us, please install.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#73307; Package emacs. (Wed, 18 Sep 2024 16:23:01 GMT) Full text and rfc822 format available.

Message #34 received at 73307 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: luangruo <at> yahoo.com, 73307 <at> debbugs.gnu.org, wiz <at> gatalith.at,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#73307: Fix ctype(3) usage
Date: Wed, 18 Sep 2024 09:22:05 -0700
On 2024-09-18 09:11, Eli Zaretskii wrote:
> what about U+3000 IDEOGRAPHIC SPACE?

That should be OK. For (yes-or-no-p "您想删除所有文件吗? "), where the prompt 
ends in U+3000, no space is needed. I installed the patch.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 17 Oct 2024 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 247 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.