GNU bug report logs - #9747
M-x untabify with "ZERO WIDTH NO-BREAK SPACE" (aka "BYTE ORDER MARK")
Reported by: noloader <at> gmail.com
Date: Thu, 13 Oct 2011 23:33:01 UTC
Severity: normal
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #8 received at 9747 <at> debbugs.gnu.org (full text, mbox):
> I often use C-x h TAB and M-x untabify to format C, C++, and Java code.
>
> If a document has an errant UTF-8 byte order mark (a UTF-8 BOM is EF
> BB BF), Emacs cannot always format the source file.
>
> For example, the attached Java file (JavaEncryptor.java-backup) has
> 1845 BOMs sprinkled throughout. I'm not sure what editor put them in,
> but Emacs does not properly handle some operations with them present.
> If I strip the errant BOMs with the attached program
> (efbbbf-strip.cpp), Emacs will properly format the file.
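As an alternative to an external program, the stray characters can also
be removed from inside Emacs. This is only a minimal sketch: it assumes
the file was decoded as UTF-8, so that each stray BOM shows up in the
buffer as a single U+FEFF character, and `my-strip-stray-boms' is a
hypothetical name, not an existing Emacs command.

```elisp
;; Hypothetical helper, not part of Emacs or of this report.
(defun my-strip-stray-boms ()
  "Delete every U+FEFF character in the accessible part of the buffer.
Return the number of characters removed."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    (let ((count 0))
      ;; Each successful search leaves the match data on one U+FEFF.
      (while (search-forward "\uFEFF" nil t)
        (replace-match "" t t)
        (setq count (1+ count)))
      (message "Removed %d U+FEFF character(s)" count)
      count)))
```

After evaluating the definition, M-x my-strip-stray-boms can be run
before C-x h TAB and M-x untabify.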
"BYTE ORDER MARK" is the old (Unicode 1.0) name of the U+FEFF character.
Its current name is "ZERO WIDTH NO-BREAK SPACE".
You can add something like this to your .emacs:
(eval-after-load "cc-mode"
  '(progn (modify-syntax-entry ?\uFEFF " " java-mode-syntax-table)))
and most of the indentation code will work correctly.
However, in some places in the core packages we need to replace code such as
(skip-chars-forward " \t")
with
(skip-chars-forward " \t\uFEFF")
to take into account other whitespace characters.
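
The difference between the two calls can be checked in a temporary
buffer. This is only an illustrative sketch; `my-skip-demo' is a
hypothetical name, not a function in Emacs or in any core package.

```elisp
;; Hypothetical demo: insert a line that starts with space, U+FEFF and
;; tab, then return the position where `skip-chars-forward' stops.
(defun my-skip-demo (chars)
  (with-temp-buffer
    (insert " \uFEFF\tint x;")
    (goto-char (point-min))
    (skip-chars-forward chars)
    (point)))

(my-skip-demo " \t")        ; => 2, stops in front of the U+FEFF
(my-skip-demo " \t\uFEFF")  ; => 4, reaches the "i" of "int"
```

With the shorter character set the skip ends at the U+FEFF, so any code
that then looks at the character after point sees it instead of the
first real token on the line.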
This bug report was last modified 3 years and 315 days ago.