Reported by: Dima Kogan <dima <at> secretsauce.net>
Date: Wed, 4 Jan 2017 23:38:01 UTC
Severity: normal
Tags: confirmed
Found in version 26.0.50
Done: Alan Mackenzie <acm <at> muc.de>
Bug is archived. No further changes may be made.
View this message in rfc822 format
From: Alan Mackenzie <acm <at> muc.de> To: Dima Kogan <dima <at> secretsauce.net> Cc: Eli Zaretskii <eliz <at> gnu.org>, 25362 <at> debbugs.gnu.org Subject: bug#25362: 26.0.50; comment-region goes into an infinite loop Date: Mon, 9 Jan 2017 21:18:01 +0000
Hello again, Dima. On Wed, Jan 04, 2017 at 03:37:25PM -0800, Dima Kogan wrote: > Hi. I'm using a very recent emacs built from master: > eb3416016b4 > I'm observing (comment-region) go into an infinite loop when presented > with particular data. I'm attaching a small source file that > demonstrates the issue. To see it break, run this: > emacs --eval '(add-hook (quote c-mode-common-hook) (lambda () (setq comment-start "//" comment-end "")))' \ > --eval '(global-set-key (kbd "<f5>") (lambda () (interactive) (comment-region 16 45)))' \ > -Q tst6.c > This loads the demo file with a clean configuration, and runs two bits > of lisp: > 1. Sets up a c++ commenting style. This is required for the issue to > present itself > 2. Binds f5 to run the problematic (comment-region) > On my machine, invoking emacs that way, and then hitting f5 shows the problem. > void f(void) > { > > g( // output > s ); > > } As already observed, this bug happens simply in C Mode by attempting to comment out the four lines inside the function, doing this by marking the lines then C-c C-c. The cause was a rather nasty situation where a "syntactic whitespace cache", designed to facilitate moving rapidly over large areas of WS (particularly long comments at the beginning of files) got screwed up by the insertion of "// " onto a line already containing a "//" later on. Here is a patch. It is not yet in its final form (the comments need fixing). Would you please try it out and let me know how well/how badly it works: diff -r a3e0a2a33629 cc-engine.el --- a/cc-engine.el Sat Dec 31 16:59:41 2016 +0000 +++ b/cc-engine.el Mon Jan 09 21:03:52 2017 +0000 @@ -1712,46 +1712,122 @@ `((c-debug-remove-face beg end 'c-debug-is-sws-face) (c-debug-remove-face beg end 'c-debug-in-sws-face))))) -(defsubst c-invalidate-sws-region-after (beg end) +;; The type of literal position `end' is in in a `before-change-functions' +;; function - one of `c', `c++', `string', `pound', or nil. +(defvar c-sws-lit-type nil) +;; A cons (BEG . END) of the literal/CPP construct enclosing END, if any, else +;; nil. +(defvar c-sws-lit-limits nil) + +(defun c-invalidate-sws-region-before (end) + ;; Called from c-before-change. END is the end of the change region, the + ;; standard parameter given to all before-change-functions. + ;; + ;; Note whether END is inside a comment of CPP construct, and if so note its + ;; bounds. + (save-excursion + (goto-char end) + (let* ((limits (c-literal-limits)) + (lit-type (c-literal-type limits))) + (cond + ((memq lit-type '(c c++)) + (setq c-sws-lit-type lit-type + c-sws-lit-limits limits)) + ((c-beginning-of-macro) + (setq c-sws-lit-type 'pound + c-sws-lit-limits (cons (point) + (progn (c-end-of-macro) (point))))) + (t (setq c-sws-lit-type nil + c-sws-lit-limits nil)))))) + +(defun c-invalidate-sws-region-after-del (beg end old-len) + ;; Text has been deleted, OLD-LEN characters of it starting from position + ;; BEG. END is typically eq to BEG. Should there have been a comment or + ;; CPP construct open at END before the deletion, check whether this + ;; deletion deleted or "damaged" its opening delimiter. If so, return the + ;; current position of where the construct ended, otherwise return nil. + (when c-sws-lit-limits + (setcdr c-sws-lit-limits (- (cdr c-sws-lit-limits) old-len)) + (if (and (< beg (+ (car c-sws-lit-limits) 2)) + (or (get-text-property end 'c-in-sws) + (next-single-property-change end 'c-in-sws nil + (cdr c-sws-lit-limits)) + (get-text-property end 'c-is-sws) + (next-single-property-change end 'c-is-sws nil + (cdr c-sws-lit-limits)))) + (cdr c-sws-lit-limits)))) + +(defun c-invalidate-sws-region-after-ins (end) + ;; Text has been inserted, ending at buffer position END. Should there be a + ;; literal or CPP construct open at END, check whether there are `c-in-sws' + ;; or `c-is-sws' text properties inside this literal. If there are, return + ;; the buffer position of the end of the literal, else return nil. + (save-excursion + (let* ((limits (c-literal-limits)) + (lit-type (c-literal-type limits))) + (goto-char end) + (when (and (not (memq lit-type '(c c++))) + (c-beginning-of-macro)) + (setq lit-type 'pound + limits (cons (point) + (progn (c-end-of-macro) (point))))) + (when (memq lit-type '(c c++ pound)) + (let ((next-in (next-single-property-change (car limits) 'c-in-sws + nil (cdr limits))) + (next-is (next-single-property-change (car limits) 'c-is-sws + nil (cdr limits)))) + (and (or next-in next-is) + (cdr limits))))))) + +(defun c-invalidate-sws-region-after (beg end old-len) ;; Called from `after-change-functions'. Note that if ;; `c-forward-sws' or `c-backward-sws' are used outside ;; `c-save-buffer-state' or similar then this will remove the cache ;; properties right after they're added. ;; ;; This function does hidden buffer changes. - - (save-excursion - ;; Adjust the end to remove the properties in any following simple - ;; ws up to and including the next line break, if there is any - ;; after the changed region. This is necessary e.g. when a rung - ;; marked empty line is converted to a line comment by inserting - ;; "//" before the line break. In that case the line break would - ;; keep the rung mark which could make a later `c-backward-sws' - ;; move into the line comment instead of over it. - (goto-char end) - (skip-chars-forward " \t\f\v") - (when (and (eolp) (not (eobp))) - (setq end (1+ (point))))) - - (when (and (= beg end) - (get-text-property beg 'c-in-sws) - (> beg (point-min)) - (get-text-property (1- beg) 'c-in-sws)) - ;; Ensure that an `c-in-sws' range gets broken. Note that it isn't - ;; safe to keep a range that was continuous before the change. E.g: - ;; - ;; #define foo - ;; \ - ;; bar - ;; - ;; There can be a "ladder" between "#" and "b". Now, if the newline - ;; after "foo" is removed then "bar" will become part of the cpp - ;; directive instead of a syntactically relevant token. In that - ;; case there's no longer syntactic ws from "#" to "b". - (setq beg (1- beg))) - - (c-debug-sws-msg "c-invalidate-sws-region-after [%s..%s]" beg end) - (c-remove-is-and-in-sws beg end)) + (let ((del-end + (and (> old-len 0) + (c-invalidate-sws-region-after-del beg end old-len))) + (ins-end + (and (> end beg) + (c-invalidate-sws-region-after-ins end)))) + (save-excursion + ;; Adjust the end to remove the properties in any following simple + ;; ws up to and including the next line break, if there is any + ;; after the changed region. This is necessary e.g. when a rung + ;; marked empty line is converted to a line comment by inserting + ;; "//" before the line break. In that case the line break would + ;; keep the rung mark which could make a later `c-backward-sws' + ;; move into the line comment instead of over it. + (goto-char end) + (skip-chars-forward " \t\f\v") + (when (and (eolp) (not (eobp))) + (setq end (1+ (point))))) + + (when (and (= beg end) + (get-text-property beg 'c-in-sws) + (> beg (point-min)) + (get-text-property (1- beg) 'c-in-sws)) + ;; Ensure that an `c-in-sws' range gets broken. Note that it isn't + ;; safe to keep a range that was continuous before the change. E.g: + ;; + ;; #define foo + ;; \ + ;; bar + ;; + ;; There can be a "ladder" between "#" and "b". Now, if the newline + ;; after "foo" is removed then "bar" will become part of the cpp + ;; directive instead of a syntactically relevant token. In that + ;; case there's no longer syntactic ws from "#" to "b". + (setq beg (1- beg))) + + (setq end (max (or del-end end) + (or ins-end end) + end)) + + (c-debug-sws-msg "c-invalidate-sws-region-after [%s..%s]" beg end) + (c-remove-is-and-in-sws beg end))) (defun c-forward-sws () ;; Used by `c-forward-syntactic-ws' to implement the unbounded search. diff -r a3e0a2a33629 cc-langs.el --- a/cc-langs.el Sat Dec 31 16:59:41 2016 +0000 +++ b/cc-langs.el Mon Jan 09 21:03:52 2017 +0000 @@ -1428,6 +1428,15 @@ t "*/" awk nil) +(c-lang-defconst c-block-comment-ender-regexp + ;; Regexp which matches the end of a block comment (if such exists in the + ;; language) + t (if (c-lang-const c-block-comment-ender) + (regexp-quote (c-lang-const c-block-comment-ender)) + "\\<\\>")) +(c-lang-defvar c-block-comment-ender-regexp + (c-lang-const c-block-comment-ender-regexp)) + (c-lang-defconst c-comment-start-regexp ;; Regexp to match the start of any type of comment. t (let ((re (c-make-keywords-re nil diff -r a3e0a2a33629 cc-mode.el --- a/cc-mode.el Sat Dec 31 16:59:41 2016 +0000 +++ b/cc-mode.el Mon Jan 09 21:03:52 2017 +0000 @@ -1189,6 +1189,7 @@ ;; Are we coalescing two tokens together, e.g. "fo o" -> "foo"? (when (< beg end) (c-unfind-coalesced-tokens beg end)) + (c-invalidate-sws-region-before end) ;; Are we (potentially) disrupting the syntactic context which ;; makes a type a type? E.g. by inserting stuff after "foo" in ;; "foo bar;", or before "foo" in "typedef foo *bar;"? @@ -1314,7 +1315,7 @@ (c-clear-char-property-with-value beg end 'syntax-table nil))) (c-trim-found-types beg end old-len) ; maybe we don't need all of these. - (c-invalidate-sws-region-after beg end) + (c-invalidate-sws-region-after beg end old-len) ;; (c-invalidate-state-cache beg) ; moved to `c-before-change'. (c-invalidate-find-decl-cache beg) Thanks for this rather obscure bug report. :-) -- Alan Mackenzie (Nuremberg, Germany).
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.