GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM


Package: emacs;

Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>

Date: Sun, 9 May 2021 21:39:02 UTC

Severity: normal

Found in version 27.2

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


Message #5 received at submit <at> debbugs.gnu.org:

From: "R. Diez" <rdiezmail-emacs <at> yahoo.de>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Sun, 9 May 2021 23:38:18 +0200

I think that hexl-mode has problems with the UTF-8 BOM byte sequence at the beginning of a text file. The steps to reproduce this issue are:

Create a text file containing a single line with three characters: 123

Run (set-buffer-file-coding-system 'utf-8-with-signature-dos) and save the file.

The file should now have the following contents (8 bytes):

ef bb bf 31 32 33 0d 0a

That is the UTF-8 BOM (ef bb bf), the ASCII digits 1, 2 and 3, and the end-of-line sequence (CR LF).
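For readers who want the same 8-byte starting file without setting the coding system in Emacs, here is a minimal sketch in Python (3.8 or later). It assumes that Python's "utf-8-sig" codec, which prepends EF BB BF when encoding, produces the same bytes as Emacs's utf-8-with-signature-dos for this input; the CR LF line ending is written out explicitly. The file name bom-test.txt is hypothetical.

```python
from pathlib import Path

# "utf-8-sig" prepends the UTF-8 BOM (EF BB BF) when encoding.
# The DOS line ending (CR LF) is spelled out, since the codec does not add it.
CONTENT = "123\r\n".encode("utf-8-sig")


def hex_dump(data: bytes) -> str:
    """Return space-separated lowercase hex, like the dumps in this report."""
    return data.hex(" ")


if __name__ == "__main__":
    # Hypothetical file name; open it in Emacs afterwards and switch to hexl-mode.
    path = Path("bom-test.txt")
    path.write_bytes(CONTENT)
    print(len(CONTENT), hex_dump(path.read_bytes()))
```

Running it should print the byte count followed by the same dump shown above.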

Now switch to hexl-mode, place the cursor on the '1' character (31 in hex), call hexl-insert-hex-char, and enter 00 to replace the '1' with a binary zero (NUL character).

The result is puzzling. Instead of replacing the '1' (31) with NUL (00), hexl-mode duplicates the UTF-8 BOM: the characters '1', '2' and '3' are overwritten with the new copy of the BOM, the CR character is replaced with NUL, and the LF character is left intact:

ef bb bf ef bb bf 00 0a
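To make the damage explicit, the sketch below compares the bytes one would expect (only offset 3, the '1', changed to 00) with the bytes observed above. The expected sequence is derived by hand from the reproduction steps, not taken from Emacs; the helper names overwrite_byte and differing_offsets are hypothetical. It assumes Python 3.9 or later.

```python
# Bytes before editing, as listed in the report.
ORIGINAL = bytes.fromhex("ef bb bf 31 32 33 0d 0a")
# Bytes observed after hexl-insert-hex-char, as listed in the report.
OBSERVED = bytes.fromhex("ef bb bf ef bb bf 00 0a")


def overwrite_byte(data: bytes, offset: int, value: int) -> bytes:
    """Return a copy of data with the single byte at offset replaced."""
    buf = bytearray(data)
    buf[offset] = value
    return bytes(buf)


# Intended edit: replace the '1' (offset 3, just after the 3-byte BOM) with NUL.
EXPECTED = overwrite_byte(ORIGINAL, 3, 0x00)


def differing_offsets(a: bytes, b: bytes) -> list[int]:
    """Offsets at which two equal-length byte strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    print("expected:", EXPECTED.hex(" "))
    print("observed:", OBSERVED.hex(" "))
    print("differ at offsets:", differing_offsets(EXPECTED, OBSERVED))
```

The expected result is ef bb bf 00 32 33 0d 0a, and the observed bytes differ from it at offsets 3 to 6, matching the description above: the second BOM overwrites 1, 2 and 3, and NUL lands where the CR was.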

If you save, close and reload the file, it gains one byte. That is probably not important; it is just a consequence of having lost the CR character:

ef bb bf ef bb bf 00 0d 0a



