GNU bug report logs - #53808
29.0.50; ansi colorization process could block indefinetly on stray ESC char

Package: emacs;

Reported by: Ioannis Kappas <ioannis.kappas <at> gmail.com>

Date: Sat, 5 Feb 2022 20:53:02 UTC

Severity: normal

Found in version 29.0.50

Fixed in version 29.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 53808 in the body.
You can then email your comments to 53808 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Sat, 05 Feb 2022 20:53:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Ioannis Kappas <ioannis.kappas <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 05 Feb 2022 20:53:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ioannis Kappas <ioannis.kappas <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 29.0.50; ansi colorization process could block indefinetly on stray
 ESC char
Date: Sat, 5 Feb 2022 20:52:25 +0000
Hi,

there appears to be an issue with `ansi-color-apply': a stray ESC
control character in the input string can block the colorization process:

(with-temp-buffer (ansi-color-apply "a\ebc"))
;; => "a"

(with-temp-buffer (concat (ansi-color-apply "a\ebc") (ansi-color-apply "xyz")))
;; => "a"

Processing stalls after the character `a'; the remaining characters are
never output. It resumes only when a CSI sequence (i.e. one starting
with ESC [) appears in the stream:

(with-temp-buffer (concat (ansi-color-apply "ab\ec") (ansi-color-apply "x\e[yz")))
;; => "ab^[cxz"

or, using a valid SGR as an example

(with-temp-buffer (concat (ansi-color-apply "ab\ec") (ansi-color-apply "x\e[3myz")))
;; => #("ab^[cxyz" 5 7
;;      (font-lock-face italic))


This behavior can pose serious problems for applications that colorize
their output streams with ansi-color but otherwise treat ESC like any
other control character (e.g. REPLs that colorize their output but also
want to display every other character as-is). Their output can be
blocked indefinitely once an ESC character appears in it.

My expectation is that a character sequence starting with ESC that is
not part of an SGR sequence should be output immediately, rather than
being treated as a potential SGR sequence (which by definition it can
never be) and blocking further processing.

e.g.
(with-temp-buffer (ansi-color-apply "a\ebc"))
;; => "a^[bc"

Analysis to follow.

Thanks

In GNU Emacs 29.0.50 (build 1, x86_64-w64-mingw32)
Repository revision: 3a8e140ad115633791d057bd10998d80c33e6dc7
Repository branch: master
Windowing system distributor 'Microsoft Corp.', version 10




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Sat, 05 Feb 2022 21:01:02 GMT) Full text and rfc822 format available.

Message #8 received at 53808 <at> debbugs.gnu.org (full text, mbox):

From: Ioannis Kappas <ioannis.kappas <at> gmail.com>
To: 53808 <at> debbugs.gnu.org
Subject: RE: 29.0.50; ansi colorization process could block indefinetly on
 stray ESC char
Date: Sat, 5 Feb 2022 21:00:21 +0000
The issue appears to be caused by the ansi-color context logic, which
tries to handle SGR sequences that are split between string fragments.
A partial SGR sequence is accumulated in the context until it is
complete, and only then is it processed with the rest of the input
string.

But currently, the beginning of an SGR sequence is identified naively
by its first character alone (ESC, aka ^[, \033 or \e), and the text is
held back until a valid control sequence (CSI) is matched in the
accumulated context string, rather than by checking whether the held
text can still become a valid SGR sequence:

(defconst ansi-color-control-seq-regexp
  ;; See ECMA 48, section 5.4 "Control Sequences".
  "\e\\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]"
  "Regexp matching an ANSI control sequence.")

(defun ansi-color-apply (string)
  "Translates SGR control sequences into text properties..."
  (let* ((context
          (ansi-color--ensure-context 'ansi-color-context nil))
         (face-vec (car context))
         (start 0)
         end result)
    ;; If context was saved and is a string, prepend it.
    (setq string (concat (cadr context) string))
    (setcar (cdr context) "")
    ;; Find the next escape sequence.
    (while (setq end (string-match ansi-color-control-seq-regexp string start))
      (let ((esc-end (match-end 0)))
        ;; ...
        (push (substring string start end) result)
        (setq start (match-end 0))
        ;; ...
        ))
    ;; ...

    ;; save context, add the remainder of the string to the result
    (if (string-match "\033" string start)
        (let ((pos (match-beginning 0)))
          (setcar (cdr context) (substring string pos))
          (push (substring string start pos) result))
      (push (substring string start) result))
    (apply 'concat (nreverse result))))
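
To make the hold-back rule easier to experiment with, here is a minimal
standalone sketch that models only the remainder handling from the
excerpt above. It is not the code in ansi-color.el: `demo-old-csi-regexp'
and `demo-old-filter' are hypothetical names, the held text is passed
explicitly instead of living in `ansi-color-context', and faces are
ignored.

```elisp
;; Standalone model of the pre-fix remainder handling (an assumption-laden
;; sketch, not ansi-color.el).  The regexp is a copy of the one quoted above.
(defconst demo-old-csi-regexp
  "\e\\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]"
  "Regexp matching a complete control sequence (demo copy).")

(defun demo-old-filter (chunk held)
  "Return (OUTPUT . NEW-HELD) for CHUNK, given HELD text from the last call.
Models only the hold-back rule; faces are ignored."
  (let ((string (concat held chunk))
        (start 0)
        (pieces nil))
    ;; Drop each complete control sequence, keeping the text around it.
    (while (string-match demo-old-csi-regexp string start)
      (push (substring string start (match-beginning 0)) pieces)
      (setq start (match-end 0)))
    ;; Old rule: hold back everything from the first remaining ESC onwards.
    (let ((pos (string-match "\e" string start)))
      (push (substring string start pos) pieces)
      (cons (apply #'concat (nreverse pieces))
            (if pos (substring string pos) "")))))
```

Calling (demo-old-filter "a\ebc" "") emits only "a" and holds "\ebc".
Feeding the held text back in with (demo-old-filter "xyz" "\ebc") emits
nothing at all, because the stray ESC is still the first thing in the
string and no complete control sequence has arrived to consume it.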


A solution (open to discussion) could be to identify a partial SGR
fragment based on its actual specification rather than only starting
with the ESC char:

modified   lisp/ansi-color.el
@@ -501,6 +501,19 @@ ansi-color-filter-apply
       (setcar (cdr context) fragment))
     (apply #'concat (nreverse result))))

+(defconst ansi-color--sgr-partial-regex
+  "\e\\(?:\\[\\|$\\)\\(?:(?:[0-9]+;?\\)*"
+  "A regexp for locating the beginning of a partial SGR
+  sequence.")
+
+(defun ansi-color--sgr-fragment-pos (string start)
+  "Check if STRING ends with a partial SGR sequence and return
+its position or nil otherwise. Start looking in STRING at position START."
+  (save-match-data
+    (when (and (string-match ansi-color--sgr-partial-regex string start)
+               (= (match-end 0) (1- (length string))))
+      (match-beginning 0))))
+
 (defun ansi-color-apply (string)
   "Translates SGR control sequences into text properties.
 Delete all other control sequences without processing them.
@@ -549,8 +562,8 @@ ansi-color-apply
       (put-text-property start (length string)
                          'font-lock-face face string))
     ;; save context, add the remainder of the string to the result
-    (if (string-match "\033" string start)
-        (let ((pos (match-beginning 0)))
+    (if-let ((pos (ansi-color--sgr-fragment-pos string start)))
+        (progn
           (setcar (cdr context) (substring string pos))
           (push (substring string start pos) result))
       (push (substring string start) result))

Let me know your thoughts. `ansi-color-filter-apply' and
`ansi-color-filter-region' would need similar treatment. I also
have a unit test in development.

Thanks




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Sat, 05 Feb 2022 21:49:02 GMT) Full text and rfc822 format available.

Message #11 received at 53808 <at> debbugs.gnu.org (full text, mbox):

From: Ioannis Kappas <ioannis.kappas <at> gmail.com>
To: 53808 <at> debbugs.gnu.org
Subject: Re: 29.0.50; ansi colorization process could block indefinetly on
 stray ESC char
Date: Sat, 5 Feb 2022 21:47:50 +0000
(Sorry, I sent out the wrong patch; the correct one is:

modified   lisp/ansi-color.el
@@ -501,6 +501,20 @@ ansi-color-filter-apply
       (setcar (cdr context) fragment))
     (apply #'concat (nreverse result))))

+(defconst ansi-color--sgr-partial-regex
+  "\e\\(?:\\[\\|$\\)\\(?:[0-9]+;?\\)*"
+  "A regexp for locating the beginning of a partial SGR
+  sequence.")
+
+(defun ansi-color--sgr-fragment-pos (string start)
+  "Check if STRING ends with a partial SGR sequence and return
+its position or nil otherwise. Start looking in STRING at position START."
+  (save-match-data
+    (when (and (string-match ansi-color--sgr-partial-regex string start)
+               (or (= (match-end 0) 0)
+                   (= (match-end 0) (length string))) )
+      (match-beginning 0))))
+
 (defun ansi-color-apply (string)
   "Translates SGR control sequences into text properties.
 Delete all other control sequences without processing them.
@@ -549,8 +563,8 @@ ansi-color-apply
       (put-text-property start (length string)
                          'font-lock-face face string))
     ;; save context, add the remainder of the string to the result
-    (if (string-match "\033" string start)
-        (let ((pos (match-beginning 0)))
+    (if-let ((pos (ansi-color--sgr-fragment-pos string start)))
+        (progn
           (setcar (cdr context) (substring string pos))
           (push (substring string start pos) result))
       (push (substring string start) result))



)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Sat, 05 Feb 2022 21:57:01 GMT) Full text and rfc822 format available.

Message #14 received at 53808 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Ioannis Kappas <ioannis.kappas <at> gmail.com>
Cc: 53808 <at> debbugs.gnu.org,
 Miha Rihtaršič <miha <at> kamnitnik.top>
Subject: Re: bug#53808: 29.0.50; ansi colorization process could block
 indefinetly on stray ESC char
Date: Sat, 05 Feb 2022 22:56:23 +0100
Ioannis Kappas <ioannis.kappas <at> gmail.com> writes:

> A solution (open to discussion) could be to identify a partial SGR
> fragment based on its actual specification rather than only starting
> with the ESC char:

Hm...  what happens if the ESC arrives in one chunk and then the rest of
the SGR sequence in the next chunk?

(Miha has done work in this area recently; added to the CCs.)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Sat, 05 Feb 2022 22:06:02 GMT) Full text and rfc822 format available.

Message #17 received at 53808 <at> debbugs.gnu.org (full text, mbox):

From: Ioannis Kappas <ioannis.kappas <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 53808 <at> debbugs.gnu.org,
 Miha Rihtaršič <miha <at> kamnitnik.top>
Subject: Re: bug#53808: 29.0.50; ansi colorization process could block
 indefinetly on stray ESC char
Date: Sat, 5 Feb 2022 22:05:05 +0000
On Sat, Feb 5, 2022 at 9:56 PM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
>
> Ioannis Kappas <ioannis.kappas <at> gmail.com> writes:
>
> > A solution (open to discussion) could be to identify a partial SGR
> > fragment based on its actual specification rather than only starting
> > with the ESC char:
>
> Hm...  what happens if the ESC arrives in one chunk and then the rest of
> the SGR sequence in the next chunk?

It is handled correctly: if the concatenated sequence is an SGR
sequence, it is processed as such, i.e. all the tests in
test/lisp/ansi-color-tests.el:ansi-color-incomplete-sequences-test
still pass.

Here are the unit tests I have come up with so far, showing what I
consider correct handling of non-SGR sequences:

(ert-deftest ansi-color-context-non-sgr ()

  (with-temp-buffer
    (let ((text (ansi-color-apply "\e[33mHello World\e[0m")))
      (should (string= "Hello World" text))
      (should (equal (get-char-property 0 'font-lock-face text)
                     '(:foreground "yellow3")))
      ))

  (with-temp-buffer
    (let ((pretext (ansi-color-apply "5"))
          (text (ansi-color-apply "\e[33mHello World\e[0m")))
      (should (string= "Hello World" text))
      (should (equal (get-char-property 0 'font-lock-face text)
                     '(:foreground "yellow3")))
      ))

  (with-temp-buffer
    (let ((pretext (ansi-color-apply "\e"))
          (text (ansi-color-apply "\e[33mHello World\e[0m")))
      (should (string= "\eHello World" text))
      (should (equal (get-char-property 1 'font-lock-face text)
                     '(:foreground "yellow3")))
      ))

  (with-temp-buffer
    (let ((pretext (ansi-color-apply "\e["))
          (text (ansi-color-apply "\e[33mHello World\e[0m")))
      (should (string= "\e[Hello World" text))
      (should (equal (get-char-property 2 'font-lock-face text)
                     '(:foreground "yellow3")))
      ))

  (with-temp-buffer
    (let ((pretext (ansi-color-apply "\e[33"))
          (text (ansi-color-apply "mHello World\e[0m")))
      (should (string= "Hello World" text))
      (should (equal (get-char-property 2 'font-lock-face text)
                     '(:foreground "yellow3")))
      ))

  (with-temp-buffer
    (let ((pretext (ansi-color-apply "\e[33m"))
          (text (ansi-color-apply "Hello World\e[0m")))
      (should (string= "Hello World" text))
      (should (equal (get-char-property 2 'font-lock-face text)
                     '(:foreground "yellow3")))
      ))
  (with-temp-buffer
    (let ((pretext (ansi-color-apply "\e[33;1"))
          (text (ansi-color-apply "mHello World\e[0m")))
      (should (string= "Hello World" text))
      (should (equal (get-char-property 2 'font-lock-face text)
                     '(ansi-color-bold (:foreground "yellow3"))))
      ))

  (with-temp-buffer
    (let ((pretext (ansi-color-apply "\e[33;"))
          (text (ansi-color-apply "1mHello World\e[0m")))
      (should (string= "Hello World" text))
      (should (equal (get-char-property 2 'font-lock-face text)
                     '(ansi-color-bold (:foreground "yellow3"))))
      ))
  )

> (Miha has done work in this area recently; added to the CCs.)

Looking forward to his feedback :) It is because of his work that I
decided to raise this against 29 instead of the 28 branch.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Sun, 06 Feb 2022 20:31:02 GMT) Full text and rfc822 format available.

Message #20 received at 53808 <at> debbugs.gnu.org (full text, mbox):

From: <miha <at> kamnitnik.top>
To: Ioannis Kappas <ioannis.kappas <at> gmail.com>, Lars Ingebrigtsen
 <larsi <at> gnus.org>
Cc: 53808 <at> debbugs.gnu.org
Subject: Re: bug#53808: 29.0.50; ansi colorization process could block
 indefinetly on stray ESC char
Date: Sun, 06 Feb 2022 21:36:59 +0100
[Message part 1 (text/plain, inline)]
Ioannis Kappas <ioannis.kappas <at> gmail.com> writes:

> It is handled correctly: if the concatenated sequence is an SGR
> sequence, it is processed as such, i.e. all the tests in
> test/lisp/ansi-color-tests.el:ansi-color-incomplete-sequences-test
> still pass.
>
> Here are the unit tests I have come up with so far, showing what I
> consider correct handling of non-SGR sequences:
>
> (ert-deftest ansi-color-context-non-sgr ()
>
> [...]
>
>   (with-temp-buffer
>     (let ((pretext (ansi-color-apply "\e[33;"))
>           (text (ansi-color-apply "1mHello World\e[0m")))
>       (should (string= "Hello World" text))
>       (should (equal (get-char-property 2 'font-lock-face text)
>                      '(ansi-color-bold (:foreground "yellow3"))))
>       ))
>   )

Thanks. I took the liberty of working on your patch, adding support for
ansi-color-apply-on-region, ansi-color-filter-region,
ansi-color-filter-apply. I also added some tests as you suggested and
made a minor simplification.
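
For readers who want to try the idea in isolation, the following is a
rough standalone model of the remainder handling that the attached patch
introduces for the string-based functions. It only sketches the approach:
`demo-new-seq-regexp', `demo-new-fragment-regexp' and `demo-new-filter'
are hypothetical names, the two regexps are copies of the ones in the
patch, the held text is passed explicitly rather than stored in
`ansi-color-context', and faces are ignored.

```elisp
;; Standalone model of the post-patch remainder handling (a sketch under
;; the assumptions stated above, not the code from ansi-color.el).
(defconst demo-new-seq-regexp
  "\e\\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]"
  "Regexp matching a complete control sequence (demo copy).")

(defconst demo-new-fragment-regexp
  "\e\\[[\x30-\x3F]*[\x20-\x2F]*\\|\e"
  "Regexp matching a partial control sequence (demo copy).")

(defun demo-new-filter (chunk held)
  "Return (OUTPUT . NEW-HELD) for CHUNK, given HELD text from the last call."
  (let ((string (concat held chunk))
        (start 0)
        (pieces nil))
    ;; Drop each complete control sequence, keeping the text around it.
    (while (string-match demo-new-seq-regexp string start)
      (push (substring string start (match-beginning 0)) pieces)
      (setq start (match-end 0)))
    ;; New rule: hold back only a fragment that runs to the end of STRING.
    (let ((pos (string-match
                (concat "\\(?:" demo-new-fragment-regexp "\\)\\'")
                string start)))
      (push (substring string start pos) pieces)
      (cons (apply #'concat (nreverse pieces))
            (if pos (substring string pos) "")))))
```

With this rule (demo-new-filter "a\ebc" "") returns all of "a\ebc" at
once and holds nothing, while (demo-new-filter "\e[33" "") still holds
"\e[33" so that a following "mHello" chunk completes the sequence. That
also covers the case of ESC arriving in one chunk and the rest of the
sequence in the next.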

[0001-ansi-color-don-t-get-stuck-on-e.patch (text/x-patch, inline)]
From 162045f83154d3df7b482871b05076a92efd02f9 Mon Sep 17 00:00:00 2001
From: Ioannis Kappas <ioannis.kappas <at> gmail.com>
Date: Sun, 6 Feb 2022 21:25:56 +0100
Subject: [PATCH] ansi-color: don't get stuck on \e

* lisp/ansi-color.el (ansi-color--control-seq-fragment-regexp): New
constant.

(ansi-color-filter-apply):
(ansi-color-apply):
(ansi-color-filter-region):
(ansi-color-apply-on-region): Don't get stuck on \e if it is
determined that it cannot start a valid ANSI escape
sequence (Bug#53808).

* test/lisp/ansi-color-tests.el (ansi-color-incomplete-sequences-test):
Test for \e that doesn't start a valid ANSI escape sequence.
---
 lisp/ansi-color.el            | 26 ++++++++++++++++++++------
 test/lisp/ansi-color-tests.el | 20 +++++++++++++++++++-
 2 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/lisp/ansi-color.el b/lisp/ansi-color.el
index 3973d9db08..e5d2e2c4ac 100644
--- a/lisp/ansi-color.el
+++ b/lisp/ansi-color.el
@@ -347,6 +347,10 @@ ansi-color-control-seq-regexp
   "\e\\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]"
   "Regexp matching an ANSI control sequence.")
 
+(defconst ansi-color--control-seq-fragment-regexp
+  "\e\\[[\x30-\x3F]*[\x20-\x2F]*\\|\e"
+  "Regexp matching a partial ANSI control sequence.")
+
 (defconst ansi-color-parameter-regexp "\\([0-9]*\\)[m;]"
   "Regexp that matches SGR control sequence parameters.")
 
@@ -492,7 +496,9 @@ ansi-color-filter-apply
     ;; save context, add the remainder of the string to the result
     (let ((fragment ""))
       (push (substring string start
-                       (if (string-match "\033" string start)
+                       (if (string-match
+                            (concat "\\(?:" ansi-color--control-seq-fragment-regexp "\\)\\'")
+                            string start)
                            (let ((pos (match-beginning 0)))
                              (setq fragment (substring string pos))
                              pos)
@@ -549,7 +555,9 @@ ansi-color-apply
       (put-text-property start (length string)
                          'font-lock-face face string))
     ;; save context, add the remainder of the string to the result
-    (if (string-match "\033" string start)
+    (if (string-match
+         (concat "\\(?:" ansi-color--control-seq-fragment-regexp "\\)\\'")
+         string start)
         (let ((pos (match-beginning 0)))
           (setcar (cdr context) (substring string pos))
           (push (substring string start pos) result))
@@ -685,7 +693,11 @@ ansi-color-filter-region
       (while (re-search-forward ansi-color-control-seq-regexp end-marker t)
         (delete-region (match-beginning 0) (match-end 0)))
       ;; save context, add the remainder of the string to the result
-      (if (re-search-forward "\033" end-marker t)
+      (set-marker start (point))
+      (while (re-search-forward ansi-color--control-seq-fragment-regexp
+                                end-marker t))
+      (if (and (/= (point) start)
+               (= (point) end-marker))
 	  (set-marker start (match-beginning 0))
         (set-marker start nil)))))
 
@@ -742,10 +754,12 @@ ansi-color-apply-on-region
             ;; Otherwise, strip.
             (delete-region esc-beg esc-end))))
       ;; search for the possible start of a new escape sequence
-      (if (re-search-forward "\033" end-marker t)
+      (while (re-search-forward ansi-color--control-seq-fragment-regexp
+                                end-marker t))
+      (if (and (/= (point) start-marker)
+               (= (point) end-marker))
           (progn
-            (while (re-search-forward "\033" end-marker t))
-            (backward-char)
+            (goto-char (match-beginning 0))
             (funcall ansi-color-apply-face-function
                      start-marker (point)
                      (ansi-color--face-vec-face face-vec))
diff --git a/test/lisp/ansi-color-tests.el b/test/lisp/ansi-color-tests.el
index 71b706c763..2ff7fc6aaf 100644
--- a/test/lisp/ansi-color-tests.el
+++ b/test/lisp/ansi-color-tests.el
@@ -171,7 +171,25 @@ ansi-color-incomplete-sequences-test
           (insert str)
           (ansi-color-apply-on-region opoint (point))))
       (should (ansi-color-tests-equal-props
-               propertized-str (buffer-string))))))
+               propertized-str (buffer-string))))
+
+    ;; \e not followed by '[' and invalid ANSI escape seqences
+    (dolist (fun (list ansi-filt ansi-app))
+      (with-temp-buffer
+        (should (equal (funcall fun "\e") ""))
+        (should (equal (funcall fun "\e[33m test \e[0m")
+                       (with-temp-buffer
+                         (concat "\e" (funcall fun "\e[33m test \e[0m"))))))
+      (with-temp-buffer
+        (should (equal (funcall fun "\e[") ""))
+        (should (equal (funcall fun "\e[33m Z \e[0m")
+                       (with-temp-buffer
+                         (concat "\e[" (funcall fun "\e[33m Z \e[0m"))))))
+      (with-temp-buffer
+        (should (equal (funcall fun "\e a \e\e[\e[") "\e a \e\e["))
+        (should (equal (funcall fun "\e[33m Z \e[0m")
+                       (with-temp-buffer
+                         (concat "\e[" (funcall fun "\e[33m Z \e[0m")))))))))
 
 (provide 'ansi-color-tests)
 
-- 
2.34.1

[Message part 3 (text/plain, inline)]
Thanks again and best regards.
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Sun, 06 Feb 2022 22:56:01 GMT) Full text and rfc822 format available.

Message #23 received at 53808 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: <miha <at> kamnitnik.top>
Cc: Ioannis Kappas <ioannis.kappas <at> gmail.com>, 53808 <at> debbugs.gnu.org
Subject: Re: bug#53808: 29.0.50; ansi colorization process could block
 indefinetly on stray ESC char
Date: Sun, 06 Feb 2022 23:55:27 +0100
<miha <at> kamnitnik.top> writes:

> Thanks. I took the liberty of working on your patch, adding support for
> ansi-color-apply-on-region, ansi-color-filter-region,
> ansi-color-filter-apply. I also added some tests as you suggested and
> made a minor simplification.

Thanks; applied to Emacs 29.

Ioannis' original code was small enough to apply without an FSF
copyright assignment, so I noted the different authors in the commit
message.

Ioannis, this change was small enough to apply without assigning
copyright to the FSF, but for future patches you want to submit, it
might make sense to get the paperwork started now, so that subsequent
patches can be applied speedily. Would you be willing to sign such
paperwork?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




bug marked as fixed in version 29.1, send any further explanations to 53808 <at> debbugs.gnu.org and Ioannis Kappas <ioannis.kappas <at> gmail.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Sun, 06 Feb 2022 22:56:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Mon, 07 Feb 2022 07:53:01 GMT) Full text and rfc822 format available.

Message #28 received at 53808 <at> debbugs.gnu.org (full text, mbox):

From: Ioannis Kappas <ioannis.kappas <at> gmail.com>
To: Miha Rihtaršič <miha <at> kamnitnik.top>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 53808 <at> debbugs.gnu.org
Subject: Re: bug#53808: 29.0.50; ansi colorization process could block
 indefinetly on stray ESC char
Date: Mon, 7 Feb 2022 07:51:44 +0000
Hi Miha,

On Sun, Feb 6, 2022 at 8:30 PM <miha <at> kamnitnik.top> wrote:

> Thanks. I took the liberty of working on your patch, adding support for
> ansi-color-apply-on-region, ansi-color-filter-region,
> ansi-color-filter-apply. I also added some tests as you suggested and
> made a minor simplification.
>

Thanks for looking into this! The patch looks good and reduces the
issue considerably, but I've noticed there is still some undesired
behaviour with non-SGR CSI sequences. I was expecting the following
test to display the non-SGR `\e[a' characters verbatim in the output
(this is in the context of
test/lisp/ansi-color-tests.el:ansi-color-incomplete-sequences-test):

(dolist (fun (list ansi-filt ansi-app))
        (with-temp-buffer
          (should (equal (funcall fun "\e[a") ""))
          (should (equal (funcall fun "\e[33m Z \e[0m")
                         (with-temp-buffer
                           (concat "\e[a" (funcall fun "\e[33m Z \e[0m")))))
          ))

but it fails with:

Test ansi-color-incomplete-sequences-test condition:
    (ert-test-failed
     ((should
       (equal
        (funcall fun "\33[33m Z \33[0m")
        (with-temp-buffer ...)))
      :form
      (equal " Z " "\33[a Z ")
      :value nil :explanation
      (arrays-of-different-length 3 6 " Z " "\33[a Z " first-mismatch-at 0)))

i.e. the "\e[a" seq does not appear in the output. Even before that, I
was expecting  (equal (funcall fun "\e[a") "") to fail and (equal
(funcall fun "\e[a") "\e[a") to be true instead (as this can't be the
start of a valid SGR expression).
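
The claim that "\e[a" cannot be an SGR sequence can be checked against
the SGR shape described further down in this message (ESC, then [, then
semicolon-separated decimal parameters, then m). The predicate below is
only a sketch of that wording: `demo-sgr-strict-regexp' and `demo-sgr-p'
are hypothetical names, and the regexp deliberately follows the
description literally, so it also rejects the parameterless form "\e[m",
which terminals accept as a reset.

```elisp
;; Literal transcription of the SGR shape described in this message
;; (hypothetical helper, not part of ansi-color.el).
(defconst demo-sgr-strict-regexp
  "\\`\e\\[\\(?:[0-9]+;\\)*[0-9]+m\\'"
  "Regexp matching a whole string that is one SGR sequence with parameters.")

(defun demo-sgr-p (string)
  "Return t if STRING is exactly one SGR sequence in the strict sense above."
  (let ((case-fold-search nil))         ; the final byte must be lowercase m
    (and (string-match-p demo-sgr-strict-regexp string) t)))
```

Under this definition (demo-sgr-p "\e[33m") and (demo-sgr-p "\e[3;31m")
hold, while (demo-sgr-p "\e[a") does not.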

Is there a reason why the ansi-color library tries to match input
against the CSI superset sequence instead of the SGR subset? The
package appears to be dealing exclusively with the latter and using
CSI regexps seems like an unnecessary complication to me.

(Just for reference, I'm using the terminology from the Wikipedia
article on ANSI escape codes at
https://en.wikipedia.org/w/index.php?title=ANSI_escape_code&oldid=1070369816#Description)

The SGR set as I understand it is the char sequence starting with the
ESC control character followed by the [ character followed by zero or
more of [0-9]+; followed by [0-9]+ followed by m. For example, ESC[33m
or ESC[3;31m. This is what I tried to capture as a fragment with the
"\e\\(?:\\[\\|$\\)\\(?:(?:[0-9]+;?\\)*"  regexp in my original patch.

Another minor observation, perhaps the following concat could be moved
into defconst in the interest of performance (it appears twice in the
patch)?

     (let ((fragment ""))
       (push (substring string start
-                       (if (string-match "\033" string start)
+                       (if (string-match
+                            (concat "\\(?:" ansi-color--control-seq-fragment-regexp "\\)\\'")
+                            string start)

Best Regards




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#53808; Package emacs. (Mon, 07 Feb 2022 11:33:01 GMT) Full text and rfc822 format available.

Message #31 received at 53808 <at> debbugs.gnu.org (full text, mbox):

From: <miha <at> kamnitnik.top>
To: Ioannis Kappas <ioannis.kappas <at> gmail.com>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 53808 <at> debbugs.gnu.org
Subject: Re: bug#53808: 29.0.50; ansi colorization process could block
 indefinetly on stray ESC char
Date: Mon, 07 Feb 2022 12:42:20 +0100
[Message part 1 (text/plain, inline)]
Ioannis Kappas <ioannis.kappas <at> gmail.com> writes:

> Thanks for looking into this! The patch looks good and reduces the
> issue considerably, but I've noticed there is still some undesired
> behaviour with non SGR CSI sequences. I was expecting the following
> test to display the non SGR `\e[a' characters verbatim in the output
> (this is in the context of the
> test/lisp/ansi-color-tests.el:ansi-color-incomplete-sequences-test()),
>
> (dolist (fun (list ansi-filt ansi-app))
>         (with-temp-buffer
>           (should (equal (funcall fun "\e[a") ""))
>           (should (equal (funcall fun "\e[33m Z \e[0m")
>                          (with-temp-buffer
>                            (concat "\e[a" (funcall fun "\e[33m Z \e[0m")))))
>           ))
>
> but fails to do so with
>
> Test ansi-color-incomplete-sequences-test condition:
>     (ert-test-failed
>      ((should
>        (equal
>         (funcall fun "\33[33m Z \33[0m")
>         (with-temp-buffer ...)))
>       :form
>       (equal " Z " "\33[a Z ")
>       :value nil :explanation
>       (arrays-of-different-length 3 6 " Z " "\33[a Z " first-mismatch-at 0)))
>
> i.e. the "\e[a" seq does not appear in the output. Even before that, I
> was expecting  (equal (funcall fun "\e[a") "") to fail and (equal
> (funcall fun "\e[a") "\e[a") to be true instead (as this can't be the
> start of a valid SGR expression).
>
> Is there a reason why the ansi-color library tries to match input
> against the CSI superset sequence instead of the SGR subset? The
> package appears to be dealing exclusively with the latter and using
> CSI regexps seems like an unnecessary complication to me.

It seems that filtering of non-SGR CSI sequences was introduced in the
commit from Sat May 29 14:25:00 2010 -0400
(bc8d33d540d079af28ea93a0cf8df829911044ca) to fix bug#6085. And indeed,
if I set 'ansi-color-control-seq-regexp' to the more specific
SGR-only regexp "\e\\[[0-9;]*m", I get a lot of distracting "^[[K" in
the output of "grep --color=always" on my system.
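
The difference between the two regexps can be reproduced without running
grep. The sketch below strips a made-up line, shaped like the colored
grep output described above, with each regexp in turn. `demo-all-csi-regexp',
`demo-sgr-only-regexp', `demo-strip' and `demo-grep-line' are hypothetical
names, and the sample line is an assumption about what such output looks
like, not captured output.

```elisp
;; Compare stripping all CSI sequences with stripping SGR sequences only.
;; Hypothetical helpers; the sample line is invented for illustration.
(defconst demo-all-csi-regexp
  "\e\\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]"
  "Copy of the general control-sequence regexp.")

(defconst demo-sgr-only-regexp
  "\e\\[[0-9;]*m"
  "The SGR-only regexp mentioned above.")

(defun demo-strip (regexp string)
  "Return STRING with every match of REGEXP removed."
  (let ((case-fold-search nil))
    (replace-regexp-in-string regexp "" string t t)))

(defconst demo-grep-line "\e[01;31m\e[Kfoo\e[m\e[K bar"
  "Sample line: SGR color on, erase-in-line, match, SGR reset, erase-in-line.")
```

(demo-strip demo-all-csi-regexp demo-grep-line) leaves "foo bar", whereas
(demo-strip demo-sgr-only-regexp demo-grep-line) leaves both "\e[K"
sequences in place, which is what shows up as "^[[K" in a buffer.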

> (Just for reference, I'm using the terminology found in the ANSI
> escape code in wikipedia at
> https://en.wikipedia.org/w/index.php?title=ANSI_escape_code&oldid=1070369816#Description)
>
> The SGR set as I understand it is the char sequence starting with the
> ESC control character followed by the [ character followed by zero or
> more of [0-9]+; followed by [0-9]+ followed by m. For example, ESC[33m
> or ESC[3;31m. This is what I tried to capture as a fragment with the
> "\e\\(?:\\[\\|$\\)\\(?:(?:[0-9]+;?\\)*"  regexp in my original patch.

I believe 'ansi-color--control-seq-fragment-regexp' should mirror
'ansi-color-control-seq-regexp' as exactly as possible. In other words,
if one matches all CSI sequences, the other shouldn't match only SGR
sequences.

> Another minor observation, perhaps the following concat could be moved
> into defconst in the interest of performance (it appears twice in the
> patch)?
>
>      (let ((fragment ""))
>        (push (substring string start
> -                       (if (string-match "\033" string start)
> +                       (if (string-match
> +                            (concat "\\(?:" ansi-color--control-seq-fragment-regexp "\\)\\'")
> +                            string start)

Thanks, noted, I will hopefully send the simple patch soon.
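
The follow-up patch itself is not part of this thread, so the following
is only a guess at the general shape of the suggestion quoted above:
build the end-anchored regexp once at load time instead of calling
`concat' on every invocation. All three names are hypothetical.

```elisp
;; Sketch of hoisting the `concat' into a constant (hypothetical names,
;; not taken from any actual patch).
(defconst demo-tail-fragment-regexp
  "\e\\[[\x30-\x3F]*[\x20-\x2F]*\\|\e"
  "Copy of the partial control-sequence regexp.")

(defconst demo-tail-fragment-at-end-regexp
  (concat "\\(?:" demo-tail-fragment-regexp "\\)\\'")
  "Partial control sequence anchored at the end of the string.")

(defun demo-tail-fragment-start (string &optional start)
  "Return the index where a trailing partial sequence begins in STRING, or nil.
Searching starts at START (default 0)."
  (string-match demo-tail-fragment-at-end-regexp string start))
```

For example, (demo-tail-fragment-start "abc\e[3") finds the fragment at
index 3, while (demo-tail-fragment-start "a\ebc") returns nil because the
ESC there is not at the end of the string.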

> Best Regards

Thanks, best regards.
[signature.asc (application/pgp-signature, inline)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 07 Mar 2022 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 106 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.