GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM
Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>
Date: Sun, 9 May 2021 21:39:02 UTC
Severity: normal
Found in version 27.2
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #76 received at 48324 <at> debbugs.gnu.org (full text, mbox):
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: rgm <at> gnu.org, schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
> Date: Mon, 04 Jul 2022 12:34:29 +0200
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > I see that it's actually 6 bytes _including_ the BOM. So I think this
> > is confusing: if we are going to return a string with the BOM, we
> > should not count the BOM as part of the LENGTH bytes. Because if I
> > requested to get characters which fit into N bytes, I should get those
> > N bytes of payload. Or maybe we should have an optional argument to
> > control whether LENGTH includes or excludes the BOM.
>
> If the caller has asked for a max number of bytes in a coding system
> that includes a BOM, then the BOM has to be counted -- otherwise the
> bytes won't fit into whatever field the protocol they're using limits
> the string to.
You obviously have a very specific use case in mind. But there are
others. Moreover, UTF and BOM is a special case, where the prefix is
known in advance. Other encodings, notably from the ISO-2022 family,
are harder because the exact shift-in sequence is not always easy to
guess.
Which is why I thought a way to control this aspect could be needed.
But we could just document the subtlety and wait for someone to come
up with a practical scenario where it would be needed.
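
To make the two readings of LENGTH concrete, here is a minimal sketch in Python (not Emacs Lisp, and not the code under discussion in this bug). The function name `limit_encoded` and its `count_bom` flag are hypothetical, standing in for the optional argument suggested above; little-endian UTF-16 with a BOM is assumed for the illustration.

```python
import codecs


def limit_encoded(text: str, max_bytes: int, count_bom: bool = True) -> bytes:
    """Encode the longest prefix of TEXT that fits in MAX_BYTES as UTF-16LE with a BOM.

    Hypothetical helper for illustration only. With count_bom=True the
    BOM is charged against MAX_BYTES; with count_bom=False only the
    payload is limited, so the result may be len(BOM) bytes longer.
    """
    bom = codecs.BOM_UTF16_LE  # b"\xff\xfe", 2 bytes
    budget = max_bytes - len(bom) if count_bom else max_bytes
    if budget < 0:
        # Not even the BOM fits in the requested field.
        return b""
    payload = b""
    for ch in text:
        # Encode one code point at a time so a character is never split.
        enc = ch.encode("utf-16-le")
        if len(payload) + len(enc) > budget:
            break
        payload += enc
    return bom + payload


counted = limit_encoded("abcdef", 6)                     # BOM + "ab"
uncounted = limit_encoded("abcdef", 6, count_bom=False)  # BOM + "abc"
```

With the BOM counted, a 6-byte limit leaves room for only two characters; with it excluded, the caller gets 6 bytes of payload but an 8-byte result, which is the case that would overflow a fixed-size protocol field.
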
> (And we don't have a -without-signature variant, do we?)
We do: utf-16le and utf-16be.
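
As a rough illustration of the difference, here is a Python sketch (an assumption-laden analogy, not Emacs code): Python's `utf-16` codec prepends a BOM, while its `utf-16-le` and `utf-16-be` codecs do not. The Python codec names differ from the Emacs coding-system names above, and whether the two behave identically in every detail is not claimed here.

```python
import codecs

text = "abc"

# Generic UTF-16: a 2-byte BOM followed by the payload in native byte order.
with_bom = text.encode("utf-16")

# Explicit byte order: no BOM is emitted, only the payload.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

# The BOM accounts for the whole size difference.
bom_overhead = len(with_bom) - len(little)
```
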
> > In any case, we should mention this aspect in the doc string, I think.
>
> Yes. But should we have -without-signature variants for utf-16? Then
> the doc string could recommend using that if the caller wants BOM-less
> bytes.
See above.
This bug report was last modified 2 years and 322 days ago.