GNU bug report logs - #9747
M-x untabify with "ZERO WIDTH NO-BREAK SPACE" (aka "BYTE ORDER MARK")
Reported by: noloader <at> gmail.com
Date: Thu, 13 Oct 2011 23:33:01 UTC
Severity: normal
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 9747 in the body.
You can then email your comments to 9747 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#9747; Package emacs. (Thu, 13 Oct 2011 23:33:01 GMT)
Acknowledgement sent to noloader <at> gmail.com: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 13 Oct 2011 23:33:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
I often use C-x h TAB and M-x untabify to format C, C++, and Java code.
If a document has an errant UTF-8 byte order mark (a UTF-8 BOM is EF
BB BF), Emacs cannot always format the source file.
For example, the attached Java file (JavaEncryptor.java-backup) has
1845 BOMs sprinkled throughout. I'm not sure what editor put them in,
but Emacs does not properly handle some operations with them present.
If I strip the errant BOMs with the attached program
(efbbbf-strip.cpp), Emacs will properly format the file.
[JavaEncryptor.java-backup (application/octet-stream, attachment)]
[efbbbf-strip.cpp (text/x-c++src, attachment)]
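The attached efbbbf-strip.cpp is not reproduced in this log. As an illustration only, a stripper of that kind could look roughly like the sketch below. The function name strip_utf8_boms is hypothetical and is not taken from the attachment; the sketch assumes the input is valid UTF-8, where the byte sequence EF BB BF can only be an encoded U+FEFF, and it removes every occurrence, including one at the very start of the data.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the reporter's efbbbf-strip.cpp): returns a copy
// of `input` with every UTF-8 encoded U+FEFF (bytes EF BB BF) removed.
// Assumes `input` is valid UTF-8, so the three-byte pattern cannot straddle
// two other characters.
std::string strip_utf8_boms(const std::string& input) {
    static const std::string bom = "\xEF\xBB\xBF";
    std::string output;
    output.reserve(input.size());

    std::size_t pos = 0;
    while (pos < input.size()) {
        const std::size_t hit = input.find(bom, pos);
        if (hit == std::string::npos) {
            // No further BOMs: copy the remainder and stop.
            output.append(input, pos, std::string::npos);
            break;
        }
        // Copy the text before the BOM, then skip over its three bytes.
        output.append(input, pos, hit - pos);
        pos = hit + bom.size();
    }
    return output;
}
```

To clean a file with this, one would read it in binary mode into a std::string, pass it through the function, and write the result back out. A leading BOM that a toolchain expects would need to be re-added separately.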
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9747; Package emacs. (Wed, 19 Oct 2011 23:57:01 GMT)
Message #8 received at 9747 <at> debbugs.gnu.org (full text, mbox):
> I often use C-x h TAB and M-x untabify to format C, C++, and Java code.
>
> If a document has an errant UTF-8 byte order mark (a UTF-8 BOM is EF
> BB BF), Emacs cannot always format the source file.
>
> For example, the attached Java file (JavaEncryptor.java-backup) has
> 1845 BOMs sprinkled throughout. I'm not sure what editor put them in,
> but Emacs does not properly handle some operations with them present.
> If I strip the errant BOMs with the attached program
> (efbbbf-strip.cpp), Emacs will properly format the file.
"BYTE ORDER MARK" is the old name of the U+FEFF character.
The new name is "ZERO WIDTH NO-BREAK SPACE".
You can add to your .emacs something like:
(eval-after-load "cc-mode"
'(progn (modify-syntax-entry ?\uFEFF " " java-mode-syntax-table)))
and most of the indentation code will work correctly.
However, in some places in core packages we need to replace code such as
(skip-chars-forward " \t")
with
(skip-chars-forward " \t\uFEFF")
to take into account other whitespace characters.
Changed bug title to 'M-x untabify with "ZERO WIDTH NO-BREAK SPACE" (aka "BYTE ORDER MARK")' from 'C-x h TAB and M-x untabify'
Request was from npostavs <at> users.sourceforge.net to control <at> debbugs.gnu.org. (Sat, 25 Mar 2017 01:21:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#9747; Package emacs. (Fri, 16 Jul 2021 13:59:01 GMT)
Message #13 received at 9747 <at> debbugs.gnu.org (full text, mbox):
Juri Linkov <juri <at> jurta.org> writes:
>> I often use C-x h TAB and M-x untabify to format C, C++, and Java code.
>>
>> If a document has an errant UTF-8 byte order mark (a UTF-8 BOM is EF
>> BB BF), Emacs cannot always format the source file.
>>
>> For example, the attached Java file (JavaEncryptor.java-backup) has
>> 1845 BOMs sprinkled throughout. I'm not sure what editor put them in,
>> but Emacs does not properly handle some operations with them present.
>> If I strip the errant BOMs with the attached program
>> (efbbbf-strip.cpp), Emacs will properly format the file.
>
> "BYTE ORDER MARK" is the old name of the U+FEFF character.
> The new name is "ZERO WIDTH NO-BREAK SPACE".
So I don't think there's anything here to fix on the Emacs side --
zero-width spaces aren't necessarily supposed to be handled identically
to other white space here. So I'm closing this bug report.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
bug closed, send any further explanations to 9747 <at> debbugs.gnu.org and noloader <at> gmail.com
Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 16 Jul 2021 13:59:02 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 14 Aug 2021 11:24:05 GMT)
This bug report was last modified 3 years and 314 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.