GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM
Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>
Date: Sun, 9 May 2021 21:39:02 UTC
Severity: normal
Found in version 27.2
Fixed in version 29.1
Done: Lars Ingebrigtsen <larsi <at> gnus.org>
Bug is archived. No further changes may be made.
> Cc: rgm <at> gnu.org, schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
> Date: Sun, 03 Jul 2022 16:00:47 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
>
> > From: Lars Ingebrigtsen <larsi <at> gnus.org>
> > Cc: rgm <at> gnu.org, schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
> > Date: Sun, 03 Jul 2022 14:07:43 +0200
> >
> > Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> >
> > > Hm... I guess the only reliable solution across all coding systems is
> > > (like your comment in the code says) to drop the encode-every-char and
> > > try encoding strings, and then see whether the result is short enough.
> > > That could be done somewhat efficiently using a binary search. I'll
> > > have a go at it...
> >
> > And while I was at it, I changed it to return complete glyphs, not just
> > complete code points.
> >
> > There's a behavioural change, though. This:
> >
> > (string-limit "foóá" 6 t 'utf-16)
> >
> > Now returns a string with a BOM, whereas previously it didn't.
>
> So you get 6 characters + the BOM?

I see that it's actually 6 bytes _including_ the BOM. I think this
is confusing: if we are going to return a string with the BOM, we
should not count the BOM as part of the LENGTH bytes, because if I
request the characters that fit into N bytes, I should get those
N bytes of payload. Or maybe we should have an optional argument to
control whether LENGTH includes or excludes the BOM.

In any case, we should mention this aspect in the doc string, I think.
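
The two ways of counting discussed above can be compared with a small sketch. It is written in Python purely for illustration and is not the Emacs implementation: the function name limit_to_bytes and the count_bom flag are hypothetical, and it truncates at code points rather than complete glyphs. It binary-searches for the longest prefix whose encoded form fits the byte budget, as proposed earlier in the thread.

```python
def limit_to_bytes(text, max_bytes, encoding="utf-16", count_bom=True):
    """Return the longest encoded prefix of TEXT that fits MAX_BYTES.

    Hypothetical helper, for illustration only.  With count_bom=True
    the BOM is charged against MAX_BYTES; with count_bom=False the
    budget applies to the payload only.
    """
    # Python's "utf-16" codec emits a 2-byte BOM even for the empty
    # string; "utf-8" emits none.
    bom_len = len("".encode(encoding))
    budget = max_bytes if count_bom else max_bytes + bom_len

    # Binary search over prefix lengths.  This assumes the encoded
    # length never shrinks as the prefix grows.
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(text[:mid].encode(encoding)) <= budget:
            lo = mid
        else:
            hi = mid - 1

    encoded = text[:lo].encode(encoding)
    # If not even the BOM fits, return nothing rather than overshoot.
    return encoded if len(encoded) <= budget else b""


sample = "fo\u00f3\u00e1"  # the thread's example string, precomposed

with_bom = limit_to_bytes(sample, 6)                      # BOM counted
payload_only = limit_to_bytes(sample, 6, count_bom=False)  # BOM not counted

print(len(with_bom), with_bom.decode("utf-16"))
print(len(payload_only), payload_only.decode("utf-16"))
```

When the BOM is counted, a budget of 6 leaves room for only two UTF-16 code units of payload. When it is not counted, the same budget holds three, and the returned byte string is 8 bytes long in total.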
This bug report was last modified 2 years and 322 days ago.