GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM

Package: emacs;

Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>

Date: Sun, 9 May 2021 21:39:02 UTC

Severity: normal

Found in version 27.2

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

Message #47 received at 48324 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: rgm <at> gnu.org, schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
Subject: Re: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Sat, 02 Jul 2022 19:37:07 +0300

> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Glenn Morris <rgm <at> gnu.org>,  schwab <at> linux-m68k.org,  48324 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 18:14:39 +0200
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > This actually reveals a design flaw in string-limit: we cannot simply
> > use encode-coding-char to encode the characters one by one.  I added a
> > FIXME comment to explain why, as I don't currently have any clever
> > ideas for how to implement it more correctly, except by iterations,
> > which is inelegant.  Ideas welcome.
> 
> Hm...  do we have some way of knowing that the coding system we're using
> is one that should have a BOM?  And a function to remove the BOM?

The problem is not just with BOM.  The problem will happen with any
coding-system that produces prefix and/or suffix bytes when it encodes
strings.  The FIXME I added mentions ISO-2022 7-bit encodings as
another example.
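
To make the failure mode concrete, here is a small sketch. It uses Python's codecs as stand-ins for Emacs coding systems, so it illustrates the general issue rather than the actual string-limit code; the helper name encode_per_char is hypothetical. Encoding character by character repeats the BOM (utf-8-sig) or the escape sequences (iso2022_jp) for every character, whereas encoding the whole string emits them once.

```python
# Illustration only: Python codecs stand in for Emacs coding systems.
# encode_per_char is a hypothetical helper, not part of any library.

def encode_per_char(text, codec):
    """Encode each character separately and concatenate the results."""
    return b"".join(ch.encode(codec) for ch in text)

BOM = b"\xef\xbb\xbf"      # UTF-8 byte order mark
TO_JIS = b"\x1b$B"         # ISO-2022-JP escape that designates JIS X 0208

# A coding system with a prefix: UTF-8 with signature.
bom_whole = "ab".encode("utf-8-sig")
bom_per_char = encode_per_char("ab", "utf-8-sig")
print(bom_whole.count(BOM), bom_per_char.count(BOM))

# A stateful 7-bit coding system: prefix and suffix escapes per chunk.
jis_whole = "日本".encode("iso2022_jp")
jis_per_char = encode_per_char("日本", "iso2022_jp")
print(jis_whole.count(TO_JIS), jis_per_char.count(TO_JIS))
print(len(jis_whole), len(jis_per_char))
```

In both cases the per-character result is longer than the encoding of the whole string, so summing per-character lengths overestimates the real encoded size.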

And then there are coding-systems with pre-write-conversion, and
those can produce any additions they like.

> If we had both, then we could strip the BOM from the individual chars,
> and add one to the front.

AFAIR, what we have now already handles BOM in coding-systems that
are known to produce a BOM.  See encode-coding-char.
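
For reference, the iterative approach mentioned in the quoted text could look roughly like the sketch below. It is again written with Python codecs as an analogy, not taken from the Emacs sources, and limit_encoded is a hypothetical name. It encodes ever longer prefixes as a whole, so any prefix or suffix bytes are counted exactly once, at the cost of re-encoding on each step. It also assumes the encoded length does not shrink as the prefix grows.

```python
# Illustration only: an iterative byte-limit sketch using Python codecs.
# limit_encoded is a hypothetical helper, not an Emacs or Python API.

def limit_encoded(text, max_bytes, codec):
    """Return the encoding of the longest prefix of text that fits in max_bytes.

    Each candidate prefix is encoded as a whole, so a BOM or escape
    sequence is emitted once per candidate rather than once per character.
    Returns b"" if not even a one-character prefix fits.
    """
    best = b""
    for end in range(1, len(text) + 1):
        encoded = text[:end].encode(codec)
        if len(encoded) > max_bytes:
            break  # assumes longer prefixes never encode to fewer bytes
        best = encoded
    return best

limited = limit_encoded("abc", 5, "utf-8-sig")
print(limited)
```

With a 5-byte budget the 3-byte BOM leaves room for two ASCII characters. Re-encoding every prefix is quadratic in the worst case, which is presumably part of why the approach is called inelegant above.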

This bug report was last modified 3 years and 13 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
