GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM

Previous Next

Package: emacs;

Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>

Date: Sun, 9 May 2021 21:39:02 UTC

Severity: normal

Found in version 27.2

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

Full log


Message #73 received at 48324 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rgm <at> gnu.org, schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
Subject: Re: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Mon, 04 Jul 2022 12:34:29 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

> I see that it's actually 6 bytes _including_ the BOM.  So I think this
> is confusing: if we are going to return a string with the BOM, we
> should not count the BOM as part of the LENGTH bytes.  Because if I
> requested to get characters which fit into N bytes, I should get those
> N bytes of payload.  Or maybe we should have an optional argument to
> control whether LENGTH includes or excludes the BOM.

It the caller has asked for a max number of bytes in a coding system
that includes a BOM, then the BOM has to be counted -- otherwise the
bytes won't fit into whatever field the protocol they're using limits
the string to.

However, utf-16 is in a slightly special situation here, since the byte
order is often implied, and people use utf-16 instead of
utf-16be-with-signature (or something), and utf-16 (in Emacs) is defined
to have a BOM.  (And we don't have a -without-signature variant, do we?)

> In any case, we should mention this aspect in the doc string, I think.

Yes.  But should we have -without-signature variants for utf-16?  Then
the doc string could recommend using that if the caller wants BOM-less
bytes.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




This bug report was last modified 2 years and 322 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.