GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM

Package: emacs;

Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>

Date: Sun, 9 May 2021 21:39:02 UTC

Severity: normal

Found in version 27.2

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: rdiezmail-emacs <at> yahoo.de, larsi <at> gnus.org, 48324 <at> debbugs.gnu.org
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Tue, 11 May 2021 15:04:05 +0300
> From: Andreas Schwab <schwab <at> linux-m68k.org>
> Cc: "R. Diez" <rdiezmail-emacs <at> yahoo.de>,  larsi <at> gnus.org,
>   48324 <at> debbugs.gnu.org
> Date: Mon, 10 May 2021 20:05:33 +0200
> 
> On May 10 2021, Eli Zaretskii wrote:
> 
> > FTR, here's a shorter and easier recipe:
> >
> >   emacs -Q
> >   C-x C-f foo.txt RET
> >   C-x RET f utf-8-with-signature-dos RET
> >   1 2 3
> >   C-x C-s
> >   M-x hexl-mode RET
> >   M-x hexl-insert-hex-char RET 00 RET
> 
> I guess the gist is that hexl-mode not only needs to account for the EOL
> type, but also for the signature when computing original-point.
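
To make the offset arithmetic in the quoted remark concrete, here is a minimal Python sketch (not hexl.el code). It assumes a UTF-8 file and a buffer that uses plain "\n" line endings; the helper name file_byte_offset and its parameters are hypothetical and chosen only for illustration.

```python
import codecs


def file_byte_offset(text: str, char_index: int, *, bom: bool, crlf: bool) -> int:
    """Return the byte offset in the encoded file of text[char_index].

    `text` is the decoded buffer content with "\n" line endings.
    Hypothetical helper: it only illustrates the bookkeeping involved.
    """
    prefix = text[:char_index]
    if crlf:
        # Every newline before the position occupies two bytes on disk.
        prefix = prefix.replace("\n", "\r\n")
    offset = len(prefix.encode("utf-8"))
    if bom:
        # The signature occupies the first three bytes of the file.
        offset += len(codecs.BOM_UTF8)
    return offset


buffer_text = "123\n456"
on_disk = codecs.BOM_UTF8 + buffer_text.replace("\n", "\r\n").encode("utf-8")
pos = file_byte_offset(buffer_text, 4, bom=True, crlf=True)
print(pos, on_disk[pos:pos + 1])
```

Leaving out either correction places the computed position before the intended byte: by one byte per preceding line ending, and by three bytes for the signature.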

Actually, it turned out that wasn't the main problem.  (It was still a
problem, but the same problem happened in a buffer produced by
hexl-find-file.)  The main problems were that (a) hexl.el handled null
bytes as characters that need to be encoded before inserting them (as
if they were non-ASCII characters), and (b) its handling of non-ASCII
characters when the encoding of the original file used a BOM was
incorrect (because encode-coding-char didn't remove the BOM from the
encoded byte sequence).  By contrast, hexl-find-file visits the file
literally, so its encoding of a null byte was trivially correct.
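
As a rough analogy for problems (a) and (b), the following Python sketch shows how encoding a single character with a signature-carrying codec prepends the BOM, and how an insertion routine could strip it and pass a null byte through unchanged. This is not the hexl.el fix itself; the function name encode_char_for_insertion is hypothetical, and Python's "utf-8-sig" codec stands in for utf-8-with-signature.

```python
import codecs


def encode_char_for_insertion(ch: str, encoding: str = "utf-8-sig") -> bytes:
    """Encode one character as the bytes to insert in the middle of a file.

    Hypothetical helper, for illustration only.
    """
    if ch == "\x00":
        # A null byte is inserted as-is; it needs no encoding step.
        return b"\x00"
    encoded = ch.encode(encoding)
    if encoded.startswith(codecs.BOM_UTF8):
        # The codec emits the signature on every call; it belongs only at
        # the start of the file, so drop it from the inserted bytes.
        encoded = encoded[len(codecs.BOM_UTF8):]
    return encoded


print("é".encode("utf-8-sig"))         # signature followed by the character
print(encode_char_for_insertion("é"))  # the character's bytes only
```

Without the stripping step, every character inserted this way would carry its own copy of the signature, which matches the duplication described in the bug title.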

This should now be fixed on the master branch.

The capability of inserting multibyte characters via Hexl is somewhat
problematic, so I made a point of describing the issues in the
relevant doc strings (because the problems are intrinsic and IMO hard
or impossible to solve in general).



