GNU bug report logs - #66825
last-coding-system-used in basic-save-buffer

Previous Next

Package: emacs;

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Sun, 29 Oct 2023 18:44:02 UTC

Severity: normal

Fixed in version 30.0.50

Done: Juri Linkov <juri <at> linkov.net>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 66825 in the body.
You can then email your comments to 66825 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#66825; Package emacs. (Sun, 29 Oct 2023 18:44:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Juri Linkov <juri <at> linkov.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 29 Oct 2023 18:44:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: bug-gnu-emacs <at> gnu.org
Subject: last-coding-system-used in basic-save-buffer
Date: Sun, 29 Oct 2023 20:40:32 +0200
[Creating a separate bug report from bug#66317 since
sometimes the bug occurs even when project-mode-line is nil]

>> Ok, here is 100% reproducible minimal test case:
>> 0. emacs-30 -Q
>> 1. eval:
>> (progn
>>    (require 'project)
>>    (setq project-mode-line t)
>>    (setq set-message-functions '(set-multi-message)))
>> 2. in a temporary directory: M-! git init RET
>> 3. C-x C-f .dir-locals.el RET
>> 4. insert: ((fundamental-mode . ((mode . flyspell))))
>> 5. C-x C-s
>> At this point even buffer-file-coding-system of .dir-locals.el
>> is changed to 't' (raw-text-unix).  The same happens when saving
>> any file in that project.
>> The problem doesn't occur when flyspell-mode is enabled from
>> file-local variables, only from dir-locals.
>
> I can repro. But it's as weird a bug as they come.
>
> I guess it's a combination of using flyspell-mode and editing
> .dir-locals.el? Or have you seen other buffers' b-f-c-s changed this way
> too?
>
> If not, it might have something to do with flyspell-mode's use of
> sit-for?.. But I'm only saying that because it's the only feature of this
> mode that I'm regularly reminded of.
>
> I tried using a variable watcher:
>
>   (add-variable-watcher
>    'buffer-file-coding-system
>    (lambda (_sym value op where)
>     (message "%s %s %s" value op where)
>     (if (eq 'raw-text-unix value) (backtrace))
>    ))
>
> but it just prints
>
> prefer-utf-8-unix set  *temp*-925453 [2 times]
> raw-text-unix set .dir-locals.el
>   backtrace()
>   (if (eq 'raw-text-unix value) (backtrace))
>   (closure (t) (_sym value op where) (message "%s %s %s" value op where)
>   (if (eq 'raw-text-unix value) (backtrace)))(buffer-file-coding-system
>   raw-text-unix set #<buffer .dir-locals.el>)
>   basic-save-buffer(t)
>   save-buffer(1)
>   funcall-interactively(save-buffer 1)
>   call-interactively(save-buffer nil nil)
>   command-execute(save-buffer)
>
> OTOH, the bug is very reliable to reproduce: add the aforementioned line to
> dir-locals and save -> the coding system changes to raw-text. Delete the
> line and save -> and it's prefer-utf-8 again.

Indeed, the problem is in basic-save-buffer on the following line:

    (setq buffer-file-coding-system last-coding-system-used)

It's hard to guess why this code relies on the value that
can be changed by other functions during saving the buffer.

For example,

  (progn
    (setq last-coding-system-used 'prefer-utf-8-unix)
    (project-name (project-current))
    (message "%S" last-coding-system-used))

prints "raw-text-unix" because it enables 'flyspell-mode'
that calls:

  (defun ispell-buffer-local-parsing ()
    (ispell-send-string "!\n")

where 'process-send-string' changes 'last-coding-system-used'
to "raw-text-unix" in:

  send_process (Lisp_Object proc, const char *buf, ptrdiff_t len,
                Lisp_Object object)
  {
    ...
    Vlast_coding_system_used = CODING_ID_NAME (coding->id);

A possible workaround would be to protect the value of
last-coding-system-used in 'project-mode-line-format':

  (defun project-mode-line-format ()
    "Compose the project mode-line."
    (when-let ((project (project-current)))
      (let (last-coding-system-used)
        (concat
         " "
         (propertize
          (project-name project)
          'face project-mode-line-face
          'mouse-face 'mode-line-highlight
          'help-echo "mouse-1: Project menu"
          'local-map project-mode-line-map)))))

However, I noticed that occasionally this bug occurs even
when this function is not used.  So the proper fix needed
in 'basic-save-buffer', but I don't know if it's intended
that some function should change 'last-coding-system-used'
during saving the buffer.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66825; Package emacs. (Sun, 29 Oct 2023 19:29:02 GMT) Full text and rfc822 format available.

Message #8 received at 66825 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 66825 <at> debbugs.gnu.org
Subject: Re: bug#66825: last-coding-system-used in basic-save-buffer
Date: Sun, 29 Oct 2023 21:27:19 +0200
> From: Juri Linkov <juri <at> linkov.net>
> Date: Sun, 29 Oct 2023 20:40:32 +0200
> 
> Indeed, the problem is in basic-save-buffer on the following line:
> 
>     (setq buffer-file-coding-system last-coding-system-used)
> 
> It's hard to guess why this code relies on the value that
> can be changed by other functions during saving the buffer.

Because this is the protocol: functions that determine the encoding of
buffer text dynamically set this variable, so it could later be used
to reflect the detection in buffer-file-coding-system.

> For example,
> 
>   (progn
>     (setq last-coding-system-used 'prefer-utf-8-unix)
>     (project-name (project-current))
>     (message "%S" last-coding-system-used))
> 
> prints "raw-text-unix" because it enables 'flyspell-mode'
> that calls:
> 
>   (defun ispell-buffer-local-parsing ()
>     (ispell-send-string "!\n")
> 
> where 'process-send-string' changes 'last-coding-system-used'
> to "raw-text-unix" in:
> 
>   send_process (Lisp_Object proc, const char *buf, ptrdiff_t len,
>                 Lisp_Object object)
>   {
>     ...
>     Vlast_coding_system_used = CODING_ID_NAME (coding->id);
> 
> A possible workaround would be to protect the value of
> last-coding-system-used in 'project-mode-line-format':

That's not a workaround, that's what we should do in such cases.
Alternatively, the "important" setting of last-coding-system-used
should be done later, after the inner functions already returned.

> However, I noticed that occasionally this bug occurs even
> when this function is not used.  So the proper fix needed
> in 'basic-save-buffer', but I don't know if it's intended
> that some function should change 'last-coding-system-used'
> during saving the buffer.

Yes, it's intended: saving the buffer can change
buffer-file-coding-system if it detects characters which cannot be
encoded by the original buffer-file-coding-system.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66825; Package emacs. (Mon, 30 Oct 2023 08:11:02 GMT) Full text and rfc822 format available.

Message #11 received at 66825 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 66825 <at> debbugs.gnu.org
Subject: Re: bug#66825: last-coding-system-used in basic-save-buffer
Date: Mon, 30 Oct 2023 09:56:27 +0200
>> Indeed, the problem is in basic-save-buffer on the following line:
>>
>>     (setq buffer-file-coding-system last-coding-system-used)
>>
>> It's hard to guess why this code relies on the value that
>> can be changed by other functions during saving the buffer.
>
> Because this is the protocol: functions that determine the encoding of
> buffer text dynamically set this variable, so it could later be used
> to reflect the detection in buffer-file-coding-system.

Still it seems there is some flaw in the design when some
functions change 'last-coding-system-used' intentionally,
but some can change it unintentionally that causes this bug.

>> A possible workaround would be to protect the value of
>> last-coding-system-used in 'project-mode-line-format':
>
> That's not a workaround, that's what we should do in such cases.
> Alternatively, the "important" setting of last-coding-system-used
> should be done later, after the inner functions already returned.

I don't understand this alternative.  The mode line updating
that uses 'project-mode-line-format' that unintentionally
changes 'last-coding-system-used' is called from this line
in 'basic-save-buffer-2':

  (write-region nil nil
                buffer-file-name nil t buffer-file-truename)

because this call in 'write_region' updates the mode line:

    message_with_string ((NUMBERP (append)
			  ? "Updated %s"
			  : ! NILP (append)
			  ? "Added to %s"
			  : "Wrote %s"),
			 visit_file, 1);

>> However, I noticed that occasionally this bug occurs even
>> when this function is not used.  So the proper fix needed
>> in 'basic-save-buffer', but I don't know if it's intended
>> that some function should change 'last-coding-system-used'
>> during saving the buffer.
>
> Yes, it's intended: saving the buffer can change
> buffer-file-coding-system if it detects characters which cannot be
> encoded by the original buffer-file-coding-system.

I see this detection happens in the same 'write_region':

  /* Decide the coding-system to encode the data with.
     We used to make this choice before calling build_annotations, but that
     leads to problems when a write-annotate-function takes care of
     unsavable chars (as was the case with X-Symbol).  */
  Vlast_coding_system_used
    = choose_write_coding_system (start, end, filename,
                                 append, visit, lockname, &coding);

This means the value 'last-coding-system-used' should be preserved
afterwards.  Therefore I don't see how this could be fixed in
'basic-save-buffer' and 'write-region'.

Ok, then let's fix this in 'project-mode-line-format':

diff --git a/lisp/progmodes/project.el b/lisp/progmodes/project.el
index bb44cfefa54..da89036160a 100644
--- a/lisp/progmodes/project.el
+++ b/lisp/progmodes/project.el
@@ -2074,14 +2121,17 @@ project-mode-line-format
 (defun project-mode-line-format ()
   "Compose the project mode-line."
   (when-let ((project (project-current)))
-    (concat
-     " "
-     (propertize
-      (project-name project)
-      'face project-mode-line-face
-      'mouse-face 'mode-line-highlight
-      'help-echo "mouse-1: Project menu"
-      'local-map project-mode-line-map))))
+    ;; Preserve the original value of 'last-coding-system-used'
+    ;; that can break 'basic-save-buffer' (bug#66825)
+    (let ((last-coding-system-used nil))
+      (concat
+       " "
+       (propertize
+        (project-name project)
+        'face project-mode-line-face
+        'mouse-face 'mode-line-highlight
+        'help-echo "mouse-1: Project menu"
+        'local-map project-mode-line-map)))))
 
 (provide 'project)
 ;;; project.el ends here





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66825; Package emacs. (Mon, 30 Oct 2023 12:17:01 GMT) Full text and rfc822 format available.

Message #14 received at 66825 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 66825 <at> debbugs.gnu.org
Subject: Re: bug#66825: last-coding-system-used in basic-save-buffer
Date: Mon, 30 Oct 2023 14:15:08 +0200
> From: Juri Linkov <juri <at> linkov.net>
> Cc: 66825 <at> debbugs.gnu.org
> Date: Mon, 30 Oct 2023 09:56:27 +0200
> 
> I don't understand this alternative.  The mode line updating
> that uses 'project-mode-line-format' that unintentionally
> changes 'last-coding-system-used' is called from this line
> in 'basic-save-buffer-2':
> 
>   (write-region nil nil
>                 buffer-file-name nil t buffer-file-truename)
> 
> because this call in 'write_region' updates the mode line:
> 
>     message_with_string ((NUMBERP (append)
> 			  ? "Updated %s"
> 			  : ! NILP (append)
> 			  ? "Added to %s"
> 			  : "Wrote %s"),
> 			 visit_file, 1);

How does message_with_string update the mode line?

And why does last-coding-system-used get set to raw-text-unix in this
scenario anyway?

> +    ;; Preserve the original value of 'last-coding-system-used'
> +    ;; that can break 'basic-save-buffer' (bug#66825)
> +    (let ((last-coding-system-used nil))
> +      (concat
> +       " "
> +       (propertize
> +        (project-name project)
> +        'face project-mode-line-face
> +        'mouse-face 'mode-line-highlight
> +        'help-echo "mouse-1: Project menu"
> +        'local-map project-mode-line-map)))))

I'm confused how this avoids the problem, probably because I don't
understand the answers to the two questions above.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66825; Package emacs. (Mon, 30 Oct 2023 17:29:02 GMT) Full text and rfc822 format available.

Message #17 received at 66825 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 66825 <at> debbugs.gnu.org
Subject: Re: bug#66825: last-coding-system-used in basic-save-buffer
Date: Mon, 30 Oct 2023 19:20:19 +0200
>> I don't understand this alternative.  The mode line updating
>> that uses 'project-mode-line-format' that unintentionally
>> changes 'last-coding-system-used' is called from this line
>> in 'basic-save-buffer-2':
>>
>>   (write-region nil nil
>>                 buffer-file-name nil t buffer-file-truename)
>>
>> because this call in 'write_region' updates the mode line:
>>
>>     message_with_string ((NUMBERP (append)
>> 			  ? "Updated %s"
>> 			  : ! NILP (append)
>> 			  ? "Added to %s"
>> 			  : "Wrote %s"),
>> 			 visit_file, 1);
>
> How does message_with_string update the mode line?

I'm not know, some deeper function needs to update the mode line
when the multi-line message resizes the echo area.

> And why does last-coding-system-used get set to raw-text-unix in this
> scenario anyway?

Because send_process needs to set it to raw-text-unix for ispell:

  send_process (Lisp_Object proc, const char *buf, ptrdiff_t len, Lisp_Object object)
  {
    Vlast_coding_system_used = CODING_ID_NAME (coding->id);

Here is a backtrace:

  send_process
  process-send-string
  ispell-send-string
  ispell-buffer-local-parsing
  ispell-accept-buffer-local-defs
  flyspell-accept-buffer-local-defs
  flyspell--mode-on
  flyspell-mode
  hack-one-local-variable(mode flyspell)
  hack-local-variables-apply
  hack-dir-local-variables-non-file-buffer
  project--value-in-dir
  project-name
  project-mode-line-format
  eval((project-mode-line-format))
  write-region
  basic-save-buffer-2
  basic-save-buffer-1
  basic-save-buffer
  save-buffer




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66825; Package emacs. (Mon, 30 Oct 2023 17:47:02 GMT) Full text and rfc822 format available.

Message #20 received at 66825 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Juri Linkov <juri <at> linkov.net>
Cc: 66825 <at> debbugs.gnu.org
Subject: Re: bug#66825: last-coding-system-used in basic-save-buffer
Date: Mon, 30 Oct 2023 19:45:53 +0200
> From: Juri Linkov <juri <at> linkov.net>
> Cc: 66825 <at> debbugs.gnu.org
> Date: Mon, 30 Oct 2023 19:20:19 +0200
> 
> >>     message_with_string ((NUMBERP (append)
> >> 			  ? "Updated %s"
> >> 			  : ! NILP (append)
> >> 			  ? "Added to %s"
> >> 			  : "Wrote %s"),
> >> 			 visit_file, 1);
> >
> > How does message_with_string update the mode line?
> 
> I'm not know, some deeper function needs to update the mode line
> when the multi-line message resizes the echo area.

I think it's because message_with_string eventually calls redisplay,
and redisplay updates the mode line as part of its job.

> > And why does last-coding-system-used get set to raw-text-unix in this
> > scenario anyway?
> 
> Because send_process needs to set it to raw-text-unix for ispell:
> 
>   send_process (Lisp_Object proc, const char *buf, ptrdiff_t len, Lisp_Object object)
>   {
>     Vlast_coding_system_used = CODING_ID_NAME (coding->id);

I think this happens because the string sent to the speller is a
plain-ASCII string, and those are almost always unibyte strings.

So I think the local binding of last-coding-system-used around the
call to project-mode-line is TRT, it just needs a better comment to
explain why it's needed.

I think this should also teach us a lesson: calling arbitrary complex
code from mode-line's :eval forms is in general risky business and
should be avoided as much as possible.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#66825; Package emacs. (Tue, 31 Oct 2023 07:25:02 GMT) Full text and rfc822 format available.

Message #23 received at 66825 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 66825 <at> debbugs.gnu.org
Subject: Re: bug#66825: last-coding-system-used in basic-save-buffer
Date: Tue, 31 Oct 2023 09:23:12 +0200
close 66825 30.0.50
quit

> So I think the local binding of last-coding-system-used around the
> call to project-mode-line is TRT, it just needs a better comment to
> explain why it's needed.

Ok, pushed the fix with a better comment.




bug marked as fixed in version 30.0.50, send any further explanations to 66825 <at> debbugs.gnu.org and Juri Linkov <juri <at> linkov.net> Request was from Juri Linkov <juri <at> linkov.net> to control <at> debbugs.gnu.org. (Tue, 31 Oct 2023 07:25:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 28 Nov 2023 12:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 265 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.