GNU bug report logs -
#48324
27.2; hexl-mode duplicates the UTF-8 BOM
Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>
Date: Sun, 9 May 2021 21:39:02 UTC
Severity: normal
Found in version 27.2
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #47 received at 48324 <at> debbugs.gnu.org (full text, mbox):
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Glenn Morris <rgm <at> gnu.org>, schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 18:14:39 +0200
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > This actually reveals a design flaw in string-limit: we cannot simply
> > use encode-coding-char to encode the characters one by one. I added a
> > FIXME comment to explain why, as I don't currently have any clever
> > ideas for how to implement it more correctly, except by iterations,
> > which is inelegant. Ideas welcome.
>
> Hm... do we have some way of knowing that the coding system we're using
> is one that should have a BOM? And a function to remove the BOM?
The problem is not just with BOM. The problem will happen with any
coding-system that produces prefix and/or suffix bytes when it encodes
strings. The FIXME I added mentions ISO-2022 7-bit encodings as
another example.

And then there are coding-systems with pre-write-conversion, and
those can produce any additions they like.
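
The prefix/suffix effect described above can be reproduced outside Emacs. The following is a minimal sketch in Python, not Emacs Lisp, and it is not how string-limit or encode-coding-char is implemented; the helper names `per_char_size` and `whole_size` are hypothetical. It assumes Python's standard `utf-8-sig`, `utf-16`, and `iso2022_jp` codecs, which emit a BOM or escape sequences around each encoded string, so summing per-character encodings overcounts.

```python
def per_char_size(text, codec):
    """Total byte count when every character is encoded on its own."""
    return sum(len(ch.encode(codec)) for ch in text)


def whole_size(text, codec):
    """Byte count when the string is encoded in a single call."""
    return len(text.encode(codec))


# Codecs that add a prefix (BOM) or prefix and suffix (ISO-2022 escapes).
samples = [("utf-8-sig", "ab"), ("utf-16", "ab"), ("iso2022_jp", "あい")]

for codec, sample in samples:
    # The per-character total is larger because the prefix/suffix bytes
    # are repeated once per character instead of once per string.
    print(codec, per_char_size(sample, codec), whole_size(sample, codec))
```

In each case the per-character total exceeds the size of the string encoded as a whole, which is why adding up single-character encodings cannot give the true encoded length for such coding systems.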
> If we had both, then we could strip the BOM from the individual chars,
> and add one to the front.
AFAIR, what we have now already handles BOM in coding-systems that
are known to produce a BOM. See encode-coding-char.
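
For comparison, the iterative approach that the quoted text calls inelegant can be sketched as follows. This is an illustration in Python under stated assumptions, not the Emacs implementation: the function name `limit_encoded` is hypothetical, and it simply re-encodes ever shorter prefixes as whole strings, so any BOM or escape sequences are counted exactly once.

```python
def limit_encoded(text, max_bytes, codec):
    """Return the encoding of the longest prefix of TEXT that fits in
    MAX_BYTES when the prefix is encoded as a whole with CODEC."""
    # Try the longest prefix first and shrink one character at a time.
    for end in range(len(text), -1, -1):
        encoded = text[:end].encode(codec)
        if len(encoded) <= max_bytes:
            return encoded
    # Not even the empty prefix fits (e.g. the BOM alone is too long).
    return b""


print(limit_encoded("abcdef", 5, "utf-8-sig"))
```

The cost is one full encode per candidate prefix, which is the inelegance the thread refers to; the benefit is that it makes no assumption about what bytes a coding system adds around the text.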
This bug report was last modified 2 years and 322 days ago.