GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM

Package: emacs;

Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>

Date: Sun, 9 May 2021 21:39:02 UTC

Severity: normal

Found in version 27.2

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

From: Eli Zaretskii <eliz <at> gnu.org>
To: Glenn Morris <rgm <at> gnu.org>, Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Wed, 12 May 2021 16:50:15 +0300

> From: Glenn Morris <rgm <at> gnu.org>
> Cc: Andreas Schwab <schwab <at> linux-m68k.org>,  48324 <at> debbugs.gnu.org,  rdiezmail-emacs <at> yahoo.de,  larsi <at> gnus.org
> Date: Tue, 11 May 2021 16:37:51 -0400
> 
> Eli Zaretskii wrote:
> 
> > This should be now fixed on the master branch.
> 
> The change to encode-coding-char in f3f1947e5b5b causes
> test subr-string-limit-coding to fail. Ref eg
> https://hydra.nixos.org/build/142879118

Thanks, I fixed that.

The original test results seemed strange, to say the least: it's as if
we shoot first and draw the target later so that it fits.  E.g., how
can the last 4 bytes of encoding "foóá" with UTF-16 be
"\376\377\000\341", with the first 2 bytes coming from the BOM?

This actually reveals a design flaw in string-limit: we cannot simply
use encode-coding-char to encode the characters one by one.  I added a
FIXME comment to explain why, as I don't currently have any clever
ideas for how to implement it more correctly, except by iterations,
which is inelegant.  Ideas welcome.
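
To make the problem concrete, here is a minimal sketch in Python rather than Emacs Lisp. It is only a model under stated assumptions: encode_with_bom is a hypothetical stand-in for a UTF-16 coding system that writes a big-endian BOM on every encoding call, and none of the function names below exist in Emacs. It shows how encoding character by character can produce the odd 4-byte tail quoted above, and what an iterative alternative might look like.

```python
import codecs


def encode_with_bom(text: str) -> bytes:
    """Hypothetical stand-in for a UTF-16 coding system that writes a
    big-endian BOM at the start of every encoding call."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def tail_per_char(text: str, n: int) -> bytes:
    """Flawed approach: encode each character separately, then cut.
    Every character drags its own BOM along."""
    return b"".join(encode_with_bom(ch) for ch in text)[-n:]


def tail_whole(text: str, n: int) -> bytes:
    """Encode the whole string once, then cut."""
    return encode_with_bom(text)[-n:]


def longest_prefix_within(text: str, max_bytes: int) -> str:
    """Iterative approach: drop trailing characters until the encoding
    of the remaining prefix, BOM included, fits in max_bytes."""
    for end in range(len(text), -1, -1):
        if len(encode_with_bom(text[:end])) <= max_bytes:
            return text[:end]
    return ""  # not even the bare BOM fits


print(tail_per_char("foóá", 4))  # b'\xfe\xff\x00\xe1', i.e. "\376\377\000\341"
print(tail_whole("foóá", 4))     # b'\x00\xf3\x00\xe1'
print(longest_prefix_within("foóá", 6))  # fo
```

The iterative version re-encodes a shrinking prefix on every step, so its cost grows quadratically with the string length in the worst case, which is the inelegance mentioned above.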




This bug report was last modified 2 years and 322 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.