GNU bug report logs - #24206
25.1; Curly quotes generate invalid strings, leading to a segfault


Package: emacs;

Reported by: Phil <p.stephani2 <at> gmail.com>

Date: Thu, 11 Aug 2016 18:57:02 UTC

Severity: normal

Found in version 25.1

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.



Message #55 received at 24206 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: p.stephani2 <at> gmail.com, johnw <at> gnu.org, nicolas <at> petton.fr,
 24206 <at> debbugs.gnu.org
Subject: Re: 25.1; Curly quotes generate invalid strings, leading to a segfault
Date: Sun, 14 Aug 2016 19:04:42 -0700
Eli Zaretskii wrote:
> Its multibyteness is entirely in Emacs's imagination.

Sure, but Emacs should not substitute "\342\200\230" for "`". The point of 
text-quoting-style is to substitute quotes, not byte string encodings of quotes.
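For concreteness: with text-quoting-style set to `curve`, the grave accent "`" is replaced by U+2018 LEFT SINGLE QUOTATION MARK, whose UTF-8 encoding is the three bytes 0xE2 0x80 0x98. Those are exactly the octal escapes \342\200\230 above. The standalone C sketch below is not Emacs code, and `utf8_encode` and `show_left_quote_bytes` are hypothetical names. It just computes that encoding. In multibyte text those three bytes form one character, but spliced into a unibyte string they are three unrelated raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encode code point CP as UTF-8 into OUT (room for 4 bytes).
   Returns the number of bytes written.  Illustration only:
   surrogate code points are not rejected.  */
static size_t
utf8_encode (uint32_t cp, unsigned char out[4])
{
  assert (cp <= 0x10FFFF);
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Print the UTF-8 bytes of U+2018 as octal escapes, the way a
   unibyte Lisp string holding them would display.  */
void
show_left_quote_bytes (void)
{
  unsigned char buf[4];
  size_t n = utf8_encode (0x2018, buf);
  for (size_t i = 0; i < n; i++)
    printf ("\\%03o", (unsigned) buf[i]);
  putchar ('\n');
}
```

Calling show_left_quote_bytes() should print \342\200\230. The same encoder gives the bytes \342\211\240 for "≠" (U+2260), the key used in the example below.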

>> > More generally, Fsubstitute_command_keys is quite confused about unibyte
>> > versus multibyte issues. It merges together a number of strings, and
>> > assumes that they are all multibyte iff the original string is
>> > multibyte, which is obviously not true in general.
> Could you please point out the specific places where this is done?

OK, here's a contrived example. Run this code in emacs-25:

(progn
  (setq km (make-keymap))
  (define-key km "≠" 'global-set-key)
  (substitute-command-keys "\200\\<km>\\[global-set-key]"))

This should return a 2-character string equal to "\200≠". But in Emacs 25 it 
dumps core, at least on my platform (Fedora 23 x86-64). And in Emacs 24 on my 
platform it returns a malformed string that prints as "\242\1340" but has length 
2. I suppose we could make Emacs 24 dump core too, though I haven't tried hard 
to do that.

The problem is that the older Emacs code incorrectly assumes that the output of 
substitution must be properly encoded if the substitution changes something. 
This assumption can fail if the input is unibyte and contains bytes that are not 
properly encoded for UTF-8. (There are other ways the assumption can fail.)
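One way to avoid that assumption is to check the substituted result instead of trusting it. The C sketch below illustrates this under stated assumptions and is not the change that actually went into Emacs. It treats plain UTF-8 as the multibyte form, although Emacs's internal representation is a superset of UTF-8 that can also hold raw bytes. `utf8_valid` and `concat_is_multibyte` are hypothetical helpers. Concatenating the raw byte \200 from the example input with the UTF-8 bytes for "≠" yields a sequence that is not valid UTF-8, so the result must not simply be labeled multibyte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the LEN bytes at S are well-formed UTF-8: no stray
   continuation bytes, truncated or overlong sequences, surrogates,
   or code points above U+10FFFF.  */
bool
utf8_valid (const unsigned char *s, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;                 /* number of continuation bytes */
      unsigned long cp;
      if (c < 0x80)
        {
          i++;
          continue;
        }
      else if (0xC2 <= c && c <= 0xDF)
        n = 1, cp = c & 0x1F;
      else if (0xE0 <= c && c <= 0xEF)
        n = 2, cp = c & 0x0F;
      else if (0xF0 <= c && c <= 0xF4)
        n = 3, cp = c & 0x07;
      else
        return false;   /* 0x80..0xC1 or 0xF5..0xFF cannot start a char */
      if (len - i - 1 < n)
        return false;   /* truncated sequence */
      for (size_t k = 1; k <= n; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return false;
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }
      if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
          || (0xD800 <= cp && cp <= 0xDFFF) || cp > 0x10FFFF)
        return false;   /* overlong, surrogate, or out of range */
      i += n + 1;
    }
  return true;
}

/* Hypothetical helper: copy PREFIX followed by the substituted bytes
   REPL into OUT (room for PLEN + RLEN bytes), then report whether the
   whole result may be treated as multibyte, by checking rather than
   assuming that a changed string is well formed.  */
bool
concat_is_multibyte (const unsigned char *prefix, size_t plen,
                     const unsigned char *repl, size_t rlen,
                     unsigned char *out)
{
  memcpy (out, prefix, plen);
  memcpy (out + plen, repl, rlen);
  return utf8_valid (out, plen + rlen);
}
```

With the unibyte prefix {0200} and the "≠" bytes {0342, 0211, 0240}, concat_is_multibyte returns false; with an ASCII prefix it returns true. A fuller treatment would presumably convert unibyte pieces to multibyte before merging, rather than only detecting the problem.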




This bug report was last modified 8 years and 339 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.