GNU bug report logs - #48324
27.2; hexl-mode duplicates the UTF-8 BOM

Package: emacs;

Reported by: "R. Diez" <rdiezmail-emacs <at> yahoo.de>

Date: Sun, 9 May 2021 21:39:02 UTC

Severity: normal

Found in version 27.2

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.


Message #61 received at 48324 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: larsi <at> gnus.org
Cc: rgm <at> gnu.org, schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
Subject: Re: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Sun, 03 Jul 2022 16:26:54 +0300
> Cc: rgm <at> gnu.org, schwab <at> linux-m68k.org, 48324 <at> debbugs.gnu.org
> Date: Sun, 03 Jul 2022 16:00:47 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > From: Lars Ingebrigtsen <larsi <at> gnus.org>
> > Cc: rgm <at> gnu.org,  schwab <at> linux-m68k.org,  48324 <at> debbugs.gnu.org
> > Date: Sun, 03 Jul 2022 14:07:43 +0200
> > 
> > Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> > 
> > > Hm...  I guess the only reliable solution across all coding systems is
> > > (like your comment in the code says) to drop the encode-every-char and
> > > try encoding strings, and then see whether the result is short enough.
> > > That could be done somewhat efficiently using a binary search.  I'll
> > > have a go at it...
> > 
> > And while I was at it, I changed it to return complete glyphs, not just
> > complete code points.
> > 
> > There's a behavioural change, though.  This: 
> > 
> > (string-limit "foóá" 6 t 'utf-16)
> > 
> > Now returns a string with a BOM, whereas previously it didn't.
> 
> So you get 6 characters + the BOM?

I see that it's actually 6 bytes _including_ the BOM.  So I think this
is confusing: if we are going to return a string with the BOM, we
should not count the BOM as part of the LENGTH bytes.  Because if I
requested to get characters which fit into N bytes, I should get those
N bytes of payload.  Or maybe we should have an optional argument to
control whether LENGTH includes or excludes the BOM.

In any case, we should mention this aspect in the doc string, I think.
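
To make the two readings of LENGTH concrete, here is a small sketch in Python rather than Emacs Lisp. It is not the code from the Emacs patch: the function name limit_by_bytes and its count_bom flag are hypothetical, invented here to mirror the optional argument suggested above. It uses the binary search over prefixes that Lars describes, but works on code points only and makes no attempt to keep whole glyphs together.

```python
def limit_by_bytes(text, max_bytes, encoding="utf-16", count_bom=True):
    """Encode the longest prefix of TEXT that fits the byte budget.

    Hypothetical helper for illustration only.  With count_bom=True the
    BOM is charged against max_bytes; with count_bom=False max_bytes is
    payload only and the BOM comes on top of it.
    """
    # Python's BOM-writing codecs emit the BOM even for the empty string,
    # so this measures the BOM size (0 for codecs that write no BOM).
    bom_len = len("".encode(encoding))
    budget = max_bytes if count_bom else max_bytes + bom_len
    if budget < bom_len:
        return b""  # not even the BOM fits

    # Encoded size never shrinks as the prefix grows, so binary search
    # over the prefix length finds the longest prefix within budget.
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(text[:mid].encode(encoding)) <= budget:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo].encode(encoding)


sample = "fo\u00f3\u00e1"  # "foóá", four code points, 2 bytes each in UTF-16

including = limit_by_bytes(sample, 6, count_bom=True)
excluding = limit_by_bytes(sample, 6, count_bom=False)

print(len(including), including.decode("utf-16"))  # 6 bytes total: BOM + "fo"
print(len(excluding), excluding.decode("utf-16"))  # 8 bytes total: BOM + "foó"
```

With the BOM counted, a limit of 6 leaves room for only two characters of payload; with the BOM excluded, the same limit yields three characters and an 8-byte result. Whichever reading the doc string documents, the caller needs to know which of the two totals to expect.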




This bug report was last modified 2 years and 322 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.