GNU bug report logs -
#48324
27.2; hexl-mode duplicates the UTF-8 BOM
Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>
Date: Sun, 9 May 2021 21:39:02 UTC
Severity: normal
Found in version 27.2
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Full log
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
I think that hexl-mode has problems with the UTF-8 BOM byte sequence at the beginning of a text file. The steps to reproduce this issue are:
Create a text file with a single line with 3 characters: 123
Do a (set-buffer-file-coding-system 'utf-8-with-signature-dos) and save the file.
The file should now have the following contents (8 bytes):
ef bb bf 31 32 33 0d 0a
That is the UTF-8 BOM (ef bb bf), the ASCII digits 1, 2 and 3, and end-of-line sequence (CR LF).
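The expected file contents can be cross-checked outside Emacs. The following is only a sketch: it assumes that Python's "utf-8-sig" codec is an acceptable stand-in for Emacs' utf-8-with-signature-dos, and the variable names are illustrative.

```python
import codecs

# Text of the test file: the single line "123" with a DOS (CR LF) line ending.
text = "123\r\n"

# Assumption: "utf-8-sig" (BOM prepended on encode) plays the role of
# Emacs' utf-8-with-signature-dos coding system here.
data = text.encode("utf-8-sig")

# The first three bytes should be the UTF-8 BOM (ef bb bf).
assert data.startswith(codecs.BOM_UTF8)

hex_dump = data.hex(" ")
print(hex_dump)   # ef bb bf 31 32 33 0d 0a
print(len(data))  # 8
```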
Now switch to hexl-mode, place the cursor on the '1' character (31 in hex), call hexl-insert-hex-char, and enter 00 in order to replace the '1' with a binary zero (NUL character).
The result is puzzling. Instead of the '1' (31) being replaced with NUL (00), the UTF-8 BOM is duplicated: the characters '1', '2' and '3' have been overwritten with the new copy of the BOM, the CR character has been replaced with NUL, and the LF character is intact:
ef bb bf ef bb bf 00 0a
If you save, close and reload the file, it gains one byte, but that is probably not important, just a consequence of having lost the CR character:
ef bb bf ef bb bf 00 0d 0a
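To make the discrepancy explicit, the byte sequences quoted in this report can be compared offset by offset. This is a sketch that works only on the hex dumps above; the helper diff_offsets and the variable names are hypothetical, and it says nothing about why hexl-mode behaves this way.

```python
# Byte sequences copied from the hex dumps in this report.
original = bytes.fromhex("ef bb bf 31 32 33 0d 0a")
observed = bytes.fromhex("ef bb bf ef bb bf 00 0a")

# Intended edit: overwrite only the byte at offset 3 ('1', 0x31) with NUL.
intended = bytearray(original)
intended[3] = 0x00
expected = bytes(intended)

def diff_offsets(a, b):
    """Return the offsets at which two equal-length byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(expected.hex(" "))                 # ef bb bf 00 32 33 0d 0a
print(diff_offsets(original, expected))  # [3]
print(diff_offsets(original, observed))  # [3, 4, 5, 6]
```

In the observed buffer the NUL sits at offset 6 rather than offset 3, and offsets 3 to 5 hold a second copy of the BOM.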
This bug report was last modified 2 years and 322 days ago.