GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM
Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>
Date: Sun, 9 May 2021 21:39:02 UTC
Severity: normal
Found in version 27.2
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
Message #50 received at 48324 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> The problem is not just with BOM. The problem will happen with any
> coding-system that produces prefix and/or suffix bytes when it encodes
> strings. The FIXME I added mentions ISO-2022 7-bit encodings as
> another example.
>
> And then there are coding-systems with pre-write-conversion, and
> those can produce any additions they like.
>
>> If we had both, then we could strip the BOM from the individual chars,
>> and add one to the front.
>
> AFAIR, what we have now already handles BOM in coding-systems that
> are known to produce a BOM. See encode-coding-char.

Ah, OK, it uses (coding-system-get coding-system :bom) and then
special-cases utf-8 and -16 to remove the BOM.
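
To make the failure mode concrete outside Emacs, here is a small Python
sketch (an analogy, not the hexl-mode or encode-coding-char code).  It
assumes Python's "utf-8-sig" and "iso2022_jp" codecs as stand-ins for
coding systems that emit prefix or suffix bytes; the helper name
encode_per_char is made up for illustration.

```python
import codecs


def encode_per_char(text, codec):
    """Encode every character on its own and concatenate the results.

    This mimics an encode-every-char strategy: any prefix or suffix
    bytes the codec emits per call are repeated for each character.
    """
    return b"".join(ch.encode(codec) for ch in text)


text = "hi"

# Encoding the whole string yields a single BOM at the front.
whole = text.encode("utf-8-sig")

# Encoding per character yields one BOM per character.
per_char = encode_per_char(text, "utf-8-sig")

bom_count_whole = whole.count(codecs.BOM_UTF8)
bom_count_per_char = per_char.count(codecs.BOM_UTF8)

# The same effect with escape sequences instead of a BOM: an
# ISO-2022 7-bit codec switches character sets around the text.
jp = "\u3042\u3042"  # two hiragana characters
jp_whole = jp.encode("iso2022_jp")
jp_per_char = encode_per_char(jp, "iso2022_jp")
```

Stripping a known BOM from each piece repairs the first case, but it
does not generalize to the escape sequences in the second one.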
Hm... I guess the only reliable solution across all coding systems is
(like your comment in the code says) to drop the encode-every-char and
try encoding strings, and then see whether the result is short enough.
That could be done somewhat efficiently using a binary search. I'll
have a go at it...
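
A rough sketch of that idea, again in Python rather than Emacs Lisp and
not the code that ended up in Emacs: encode prefixes of the string as a
whole and binary-search for the longest one whose encoded form fits.
The function name longest_fitting_prefix and the max_bytes parameter
are hypothetical, and the search assumes the encoded length never
shrinks as the prefix grows.

```python
def longest_fitting_prefix(text, codec, max_bytes):
    """Return the largest n such that text[:n] encodes to <= max_bytes.

    Assumption: len(text[:n].encode(codec)) is non-decreasing in n.
    Returns 0 when not even one character fits.
    """
    lo, hi = 0, len(text)
    while lo < hi:
        # Round up so that lo = mid always makes progress.
        mid = (lo + hi + 1) // 2
        if len(text[:mid].encode(codec)) <= max_bytes:
            lo = mid          # prefix of length mid fits; try longer
        else:
            hi = mid - 1      # too long; try shorter
    return lo


# "utf-8-sig" adds a 3-byte BOM once per encoded string, so the
# budget below covers the BOM plus "h" (1 byte) plus "é" (2 bytes).
n = longest_fitting_prefix("h\u00e9llo", "utf-8-sig", 6)
```

Because each probe encodes a whole prefix, any prefix or suffix bytes
are counted once per probe, and the number of encode calls grows with
the logarithm of the string length.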
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
This bug report was last modified 2 years and 322 days ago.