GNU bug report logs - #60750
29.0.60; encode-coding-char fails for utf-8-auto coding system
Reported by: Robert Pluim <rpluim <at> gmail.com>
Date: Thu, 12 Jan 2023 09:09:02 UTC
Severity: normal
Found in version 29.0.60
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
Message #8 received at 60750 <at> debbugs.gnu.org (full text, mbox):
> From: Robert Pluim <rpluim <at> gmail.com>
> Date: Thu, 12 Jan 2023 10:08:31 +0100
>
>
> src/emacs -Q
> M-x toggle-debug-on-error
> M-: (setq buffer-file-coding-system 'utf-8-auto)
> C-b
> C-u C-x =
>
> =>
> Debugger entered--Lisp error: (args-out-of-range "))" 3 1)
> encode-coding-char(41 utf-8-auto ascii)
> describe-char(189)
> what-cursor-position((4))
>
> This is because utf-8-auto has a non-nil :bom property:
>
> (define-coding-system 'utf-8-auto
> "UTF-8 (auto-detect signature (BOM))"
> :coding-type 'utf-8
> :mnemonic ?U
> :charset-list '(unicode)
> :bom '(utf-8-with-signature . utf-8))
Right. This is a very old bug in encoding with the utf-8 family of
coding systems whose :bom property is a cons cell. The fix is
simple, but I wonder what this will break out there. So:
> Iʼm not sure if this needs fixing, but it was surprising, and the
> docstring of `define-coding-system' didnʼt make it clear to me whether
> a BOM should have been produced here or not.
Actually, the doc string is clear:
If the value is a cons cell, on decoding, check the first two bytes.
If they are 0xFE 0xFF, use the car part coding system of the value.
If they are 0xFF 0xFE, use the cdr part coding system of the value.
Otherwise, treat them as bytes for a normal character. On encoding,
produce BOM bytes according to the value of ‘:endian’.
Note the last sentence: it should unconditionally produce the BOM on
encoding. Which is what we do in your scenario.
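To make the BOM question concrete, here is a minimal sketch that compares the bytes produced by the two coding systems named in the :bom cons cell of the quoted `define-coding-system' form, and then repeats the failing call from the backtrace non-interactively. The helper `my/encoded-bytes' is a hypothetical name introduced only for this illustration; it is not part of Emacs or of the report. What the last form returns depends on the Emacs version, so no result is shown for it.

```elisp
;; Sketch only: evaluate in *scratch* or with M-:.
;; `my/encoded-bytes' is a hypothetical helper, not an Emacs function.
(defun my/encoded-bytes (string coding-system)
  "Return STRING encoded in CODING-SYSTEM as a list of byte values."
  (append (encode-coding-string string coding-system) nil))

;; Plain utf-8 produces no signature; ?\) is a single byte.
(my/encoded-bytes ")" 'utf-8)                 ; => (41)

;; utf-8-with-signature prepends the three BOM bytes EF BB BF.
(my/encoded-bytes ")" 'utf-8-with-signature)  ; => (239 187 191 41)

;; The call from the backtrace, wrapped so that an error is returned
;; as data instead of entering the debugger.  Whether it signals
;; depends on the Emacs version in use.
(condition-case err
    (encode-coding-char ?\) 'utf-8-auto 'ascii)
  (error (list 'signaled err)))
```

Comparing the byte lists for `utf-8-auto' against the two results above shows directly whether a given Emacs build emits the signature on encoding.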
> (Iʼm willing to be told that buffer-file-coding-system shouldnʼt be
> 'utf-8-auto, but I never set that explicitly as far as I know 😀)
Who sets utf-8-auto? Where did you originally bump into this?
This is an obscure coding-system, and the fix to make it work as
documented will produce an incompatible change in behavior. So before
I decide whether to make the change and on what branch, I'd like to
know how in the world you encountered this.
Thanks.
This bug report was last modified 2 years and 190 days ago.