#12054 - 24.1; regression? font-lock no-break-space with nil nobreak-char-display

GNU bug report logs - #12054

Package: emacs;

Reported by: "Drew Adams" <drew.adams <at> oracle.com>

Date: Thu, 26 Jul 2012 05:51:02 UTC

Severity: normal

Found in version 24.1

Done: Chong Yidong <cyd <at> gnu.org>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: cyd <at> gnu.org, 12054 <at> debbugs.gnu.org
Subject: bug#12054: 24.1; regression? font-lock no-break-space with nil nobreak-char-display
Date: Sat, 03 Nov 2012 23:13:40 +0200

> From: "Drew Adams" <drew.adams <at> oracle.com>
> Date: Sat, 3 Nov 2012 12:01:29 -0700
> Cc: 12054 <at> debbugs.gnu.org
>
> I think I understand this (but I might be misunderstanding).  The \240 in the
> 4-char ASCII regexp string "\240" is interpreted (read?) as a raw byte, not as
> the char I wanted.

Yes.

> That is, the literal string in my code is read as a string that contains only a
> single raw byte of octal 240 in place of the 4 chars \240 (and instead of as a
> string with the multibyte char no-break space).  Is that right?

Yes.

> And putting that together with Eli's statement about insertion ("'insert' treats
> strings such as "\nnn" as unibyte strings"), I understand that the buffer text
> after I type `C-q 240' contains a unibyte raw byte, and not the multibyte char
> no-break space.

No.  It contains the NBSP.  Try it.  C-q inserts a multibyte character,
unlike '(insert "\240")', for example.

> But in that case I do not understand why `C-u C-x =' says that it _is_ the
> Unicode no-break space char.

Because it is.

> And I do not understand why Yidong's font-lock correction also shows
> that it is a no-break space char.

Chong didn't use "\240".

> So I'm confused about what is actually in the buffer.  From the doc and from
> Eli's statement, I gather that there is a unibyte raw byte (octal 240) at that
> position.  But `C-u C-x =' and font-lock seem to tell me that there is a
> (multibyte) no-break space char there.

Try '(insert "\240")' and then "C-x =" will show a unibyte byte.

> > (One reason for doing this is to allow unibyte strings to
> > be specified using string constants in Emacs Lisp source code.)
>
> I can see how that can be useful.  But I can also see how it would be useful to
> have some way of using octal syntax to match multibyte chars.  Isn't there some
> reasonable way to allow for both?

Maybe, but we didn't find one, at least not one that would be
backward-compatible.

> Is there, for example, (or could there be added) a function that one can apply
> to the unibyte string for \240 that would convert it to a string that DTRT wrt
> multibyte?

Such functions do exist, see the "Converting Representations" node in
the ELisp manual.

> (decode-coding-string "\302\240" 'utf-8)
>
> That allows use of only octal syntax - good.  But it still doesn't solve the
> problem for older Emacs versions - they raise the error (coding-system-error
> utf-8).

You don't want this, because even if you succeed in producing a NBSP in
Emacs 22 and older, the result will not match NBSP in other charsets.
It's simply impossible with those versions of Emacs.
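
The distinction discussed above can be checked from `M-:' or the
*scratch* buffer.  What follows is only a minimal sketch, assuming
Emacs 23 or later; the demo-* names are hypothetical and appear nowhere
in this report.  It contrasts the unibyte string read from the literal
"\240" with the multibyte string obtained by decoding the UTF-8 bytes
of NBSP, as in the quoted `decode-coding-string' call.

```elisp
;; Hypothetical names (demo-*), for illustration only.
;; Assumes Emacs 23 or later.

;; An octal escape in a string literal is read as a unibyte string
;; holding one raw byte, not the character U+00A0.
(defconst demo-raw "\240")

;; Decoding the UTF-8 byte sequence of NBSP gives a multibyte string
;; holding the character U+00A0, using only octal syntax in the source.
(defconst demo-nbsp (decode-coding-string "\302\240" 'utf-8))

(multibyte-string-p demo-raw)     ; nil: unibyte
(multibyte-string-p demo-nbsp)    ; t:   multibyte
(= (aref demo-nbsp 0) #xA0)       ; t:   the character is U+00A0

;; Used as a regexp, the decoded string matches NBSP in multibyte text.
(string-match-p demo-nbsp (string ?a #xA0 ?b))   ; 1
```

As the reply notes, this is not a route to a portable NBSP regexp in
Emacs 22 and older.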

This bug report was last modified 12 years and 259 days ago.

