GNU bug report logs -
#12054
24.1; regression? font-lock no-break-space with nil nobreak-char-display
Reported by: "Drew Adams" <drew.adams <at> oracle.com>
Date: Thu, 26 Jul 2012 05:51:02 UTC
Severity: normal
Found in version 24.1
Done: Chong Yidong <cyd <at> gnu.org>
Bug is archived. No further changes may be made.
> From: "Drew Adams" <drew.adams <at> oracle.com>
> Date: Sat, 3 Nov 2012 12:01:29 -0700
> Cc: 12054 <at> debbugs.gnu.org
>
> I think I understand this (but I might be misunderstanding). The \240 in the
> 4-char ASCII regexp string "\240" is interpreted (read?) as a raw byte, not as
> the char I wanted.
Yes.
> That is, the literal string in my code is read as a string that contains only a
> single raw byte of octal 240 in place of the 4 chars \240 (and instead of as a
> string with the multibyte char no-break space). Is that right?
Yes.
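To make the difference concrete, here is a small sketch one can evaluate in Emacs 23 or later. The function name `demo-string-kinds' is made up for illustration, and the "\u00A0" escape is used here only as one way to write the multibyte NBSP; it is not something proposed in this thread.

```elisp
;; Hypothetical helper: compare the octal-escape literal with a
;; Unicode-escape literal for the same code point.
(defun demo-string-kinds ()
  "Return (RAW-MULTIBYTE-P RAW-BYTES NBSP-MULTIBYTE-P NBSP-BYTES)."
  (let ((raw "\240")      ; read as a unibyte string: one raw byte
        (nbsp "\u00A0"))  ; read as a multibyte string: one NBSP char
    (list (multibyte-string-p raw)
          (string-bytes raw)
          (multibyte-string-p nbsp)
          (string-bytes nbsp))))

(demo-string-kinds)
```

Both strings have length 1, but only the second one is multibyte, and its single character occupies two bytes in the internal representation.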
> And putting that together with Eli's statement about insertion ("'insert' treats
> strings such as "\nnn" as unibyte strings"), I understand that the buffer text
> after I type `C-q 240' contains a unibyte raw byte, and not the multibyte char
> no-break space.
No. It contains the NBSP. Try it. C-q inserts a multibyte
character, unlike '(insert "\240")', for example.
> But in that case I do not understand why `C-u C-x =' says that it _is_ the
> Unicode no-break space char.
Because it is.
> And I do not understand why Yidong's font-lock correction also shows
> that it is a no-break space char.
Chong didn't use "\240".
> So I'm confused about what is actually in the buffer. From the doc and from
> Eli's statement, I gather that there is a unibyte raw byte (octal 240) at that
> position. But `C-u C-x =' and font-lock seem to tell me that there is a
> (multibyte) no-break space char there.
Try '(insert "\240")'; "C-x =" will then show a raw byte, not the NBSP.
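The same experiment can be done non-interactively. This is only a sketch for Emacs 23 or later; `demo-inserted-char' is a hypothetical helper, and the temporary buffer is explicitly made multibyte to mirror an ordinary editing buffer.

```elisp
;; Hypothetical helper: insert THING into a fresh multibyte buffer
;; and report which character actually ended up in the buffer.
(defun demo-inserted-char (thing)
  "Insert THING in a temporary multibyte buffer; return the char before point."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert thing)
    (char-before)))

;; Unibyte literal: the buffer receives a raw-byte character,
;; which lives in the range #x3FFF80..#x3FFFFF.
(demo-inserted-char "\240")

;; Multibyte literal: the buffer receives U+00A0 (decimal 160),
;; which is what C-q 240 RET inserts.
(demo-inserted-char "\u00A0")
```

The first call returns a character outside the Unicode range, which is why a regexp built from "\240" does not match the NBSP that `C-q' inserts.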
> > (One reason for doing this is to allow unibyte strings to
> > be specified using string constants in Emacs Lisp source code.)
>
> I can see how that can be useful. But I can also see how it would be useful to
> have some way of using octal syntax to match multibyte chars. Isn't there some
> reasonable way to allow for both?
Maybe, but we didn't find one, at least not one that would be
backward-compatible.
> Is there, for example, (or could there be added) a function that one can apply
> to the unibyte string for \240 that would convert it to a string that DTRT wrt
> multibyte?
Such functions do exist; see the "Converting Representations" node in
the ELisp manual.
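As a rough illustration for Emacs 23 or later: `string-to-multibyte' from that node keeps the byte as a raw-byte character, whereas decoding the byte with a coding system yields the NBSP. The helper name `demo-conversions' and the choice of `iso-latin-1' are assumptions made for this sketch only.

```elisp
;; Hypothetical helper: two ways of turning the unibyte "\240"
;; into a multibyte string, with different results.
(defun demo-conversions ()
  "Return (RAW-BYTE-CHAR DECODED-CHAR) for the unibyte string \"\\240\"."
  (list
   ;; Representation change only: the byte stays a raw byte.
   (aref (string-to-multibyte "\240") 0)
   ;; Decoding: byte 0xA0 interpreted as Latin-1 gives U+00A0.
   (aref (decode-coding-string "\240" 'iso-latin-1) 0)))

(demo-conversions)
```

Only the decoded string compares equal to a string containing the NBSP character, so decoding is the variant that is useful for matching buffer text.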
> (decode-coding-string "\302\240" 'utf-8)
>
> That allows use of only octal syntax - good. But it still doesn't solve the
> problem for older Emacs versions - they raise the error (coding-system-error
> utf-8).
You don't want this, because even if you succeed in producing an NBSP
in Emacs 22 and older, the result will not match NBSP in other
charsets. It's simply impossible with those versions of Emacs.
This bug report was last modified 12 years and 202 days ago.