Package: emacs
Reported by: Alan Mackenzie <acm <at> muc.de>
Date: Tue, 22 Sep 2020 09:36:02 UTC
Severity: normal
Tags: confirmed
From: Alan Mackenzie <acm <at> muc.de>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 43558 <at> debbugs.gnu.org, Mattias Engdegård <mattiase <at> acm.org>, acm <at> muc.de
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Sun, 22 Nov 2020 13:12:31 +0000
Hello, Stefan.

On Thu, Nov 19, 2020 at 17:47:40 -0500, Stefan Monnier wrote:
> >> So, yeah, you can add yet-another-hack on top of the other syntax.c
> >> hacks if you want, but there's a good chance it will only ever be used
> >> by CC-mode. It will take a lot more code changes in syntax.c than
> >> a quick tweak to your Elisp code to search for "\*/".
> [...]

> > OK, here's the patch.

> I think the patch agrees with my assessment above (even though it's
> still missing a etc/NEWS entry, adjustment to the docstring of
> modify-syntax-entry and to the .texi manual).

Here are these things:

diff --git a/doc/lispref/syntax.texi b/doc/lispref/syntax.texi
index b99b5de0b3..4e9e9207c3 100644
--- a/doc/lispref/syntax.texi
+++ b/doc/lispref/syntax.texi
@@ -287,21 +287,21 @@ Syntax Flags
 @cindex syntax flags

 In addition to the classes, entries for characters in a syntax table
-can specify flags. There are eight possible flags, represented by the
+can specify flags. There are nine possible flags, represented by the
 characters @samp{1}, @samp{2}, @samp{3}, @samp{4}, @samp{b}, @samp{c},
-@samp{n}, and @samp{p}.
+@samp{e}, @samp{n}, and @samp{p}.

 All the flags except @samp{p} are used to describe comment
 delimiters. The digit flags are used for comment delimiters made up
 of 2 characters. They indicate that a character can @emph{also} be
 part of a comment sequence, in addition to the syntactic properties
 associated with its character class. The flags are independent of the
-class and each other for the sake of characters such as @samp{*} in
-C mode, which is a punctuation character, @emph{and} the second
+class and each other for the sake of characters such as @samp{*} in C
+mode, which is a punctuation character, @emph{and} the second
 character of a start-of-comment sequence (@samp{/*}), @emph{and} the
 first character of an end-of-comment sequence (@samp{*/}). The flags
-@samp{b}, @samp{c}, and @samp{n} are used to qualify the corresponding
-comment delimiter.
+@samp{b}, @samp{c}, @samp{e}, and @samp{n} are used to qualify the
+corresponding comment delimiter.

 Here is a table of the possible flags for a character @var{c}, and
 what they mean:
@@ -332,6 +332,13 @@ Syntax Flags
 alternative ``c'' comment style. For a two-character comment
 delimiter, @samp{c} on either character makes it of style ``c''.

+@item
+@samp{e} means that when @var{c}, a comment ender or first character
+of a two character ender, is directly proceded by one or more escape
+characters, @var{c} does not act as a comment ender. Contrast this
+with the effect of variable @code{comment-end-can-be-escaped}
+(@pxref{Control Parsing}).
+
 @item
 @samp{n} on a comment delimiter character specifies that this kind of
 comment can be nested. Inside such a comment, only comments of the
@@ -357,7 +364,7 @@ Syntax Flags
 @item @samp{*}
 @samp{23b}
 @item newline
-@samp{>}
+@samp{> e}
 @end table

 This defines four comment-delimiting sequences:
@@ -377,7 +384,9 @@ Syntax Flags

 @item newline
 This is a comment-end sequence for ``a'' style, because the newline
-character does not have the @samp{b} flag.
+character does not have the @samp{b} flag. It can be escaped by one
+or more @samp{\} characters, so that an ``a'' style comment can
+continue onto the next line.
 @end table

 @item
@@ -962,9 +971,14 @@ Control Parsing
 @defvar comment-end-can-be-escaped
 If this buffer local variable is non-@code{nil}, a single character
 which usually terminates a comment doesn't do so when that character
-is escaped. This is used in C and C++ Modes, where line comments
-starting with @samp{//} can be continued onto the next line by
-escaping the newline with @samp{\}.
+is escaped. This used to be used in C and C++ Modes, where line
+comments starting with @samp{//} can be continued onto the next line
+by escaping the newline with @samp{\}.
+
+Contrast this variable with the @samp{e} syntax flag (@pxref{Syntax
+Flags}), where two consecutive escape characters escape the comment
+ender. @code{comment-end-can-be-escaped} should not be used together
+with the @samp{e} syntax flag.
 @end defvar

 You can use @code{forward-comment} to move forward or backward over
@@ -1037,6 +1051,8 @@ Syntax Table Internals
 @samp{3} @tab @code{(ash 1 18)} @tab @samp{n} @tab @code{(ash 1 22)}
 @item
 @samp{4} @tab @code{(ash 1 19)} @tab @samp{c} @tab @code{(ash 1 23)}
+@item
+@tab @tab @samp{e} @tab @code{(ash 1 24)}
 @end multitable

 @defun string-to-syntax desc
diff --git a/etc/NEWS b/etc/NEWS
index a0e72bc673..3b292e8f41 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1758,6 +1758,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el.

 * Lisp Changes in Emacs 28.1

++++
+** New syntax flag 'e'.
+This indicates that one or two (or more) escape characters escape a
+comment ender with this flag, causing the comment to be continued past
+that comment ender (typically onto the next line).
+
 +++
 ** 'set-window-configuration' now takes an optional 'dont-set-frame'
 parameter which, when non-nil, instructs the function not to select
diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..7bdbd114ba 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -1224,7 +1270,7 @@ Two-character sequences are represented as described below.
 The second character of NEWENTRY is the matching parenthesis,
  used only if the first character is `(' or `)'.
 Any additional characters are flags.
-Defined flags are the characters 1, 2, 3, 4, b, p, and n.
+Defined flags are the characters 1, 2, 3, 4, b, c, e, n, and p.
  1 means CHAR is the start of a two-char comment start sequence.
  2 means CHAR is the second character of such a sequence.
  3 means CHAR is the start of a two-char comment end sequence.
@@ -1239,6 +1285,11 @@ c (on any of its chars) using this flag:
  c means CHAR is part of comment sequence c.
  n means CHAR is part of a nestable comment sequence.
+ e means CHAR, when a comment ender or first char of a two character
+   comment ender, can be escaped by (any number of consecutive)
+   characters with escape syntax. C and C++ use this facility.
+   Compare and contrast with the variable `comment-end-can-be-escaped'.
+
  p means CHAR is a prefix character for `backward-prefix-chars';
    such characters are treated as whitespace when they occur
    between expressions.

> I really can't understand why you resist so much the use of
> a `syntax-table` property on those rare \\\n sequences.

Because syntax-table text properties are already used for so many
different things in CC Mode (I think the count is five in C++ Mode).
Adding another one would mean having to scan for this rare construct at
every buffer change, and this would slow things down, possibly a lot.

There is no slowdown (beyond a possible microscopic one) in the
modification to syntax.c and, as a bonus, I have written around 200 test
cases for syntax.c's comment features.

> Stefan

> PS: Also, I just noticed that `gcc -Wall` warns about the use of such
> multiline comments, so it doesn't seem to be a very popular feature.

It is more of a mistake that people occasionally might make than a
feature. In my opinion, having escaped newlines inside line comments is
a bug in the C/C++ language standards. Anybody might "end" a line
comment accidentally with "\" or "\\".

> PPS: For reference, I just tried to add support for it in sm-c-mode
> and this is the resulting code:

Just to emphasize Stefan Kangas's point, it is a newline preceded by a
"\" which continues the comment, not an escaped NL in the ordinary
sense. In particular two "\"s followed by NL still continue the
comment.

> @@ -312,7 +315,15 @@ E.g. a #define nested within 2 #ifs will be turned into \"# define\"."
>                                     'syntax-table (string-to-syntax "|"))
>                  (put-text-property (match-beginning 2) (match-end 2)
>                                     'syntax-table (string-to-syntax "|")))
> -            (sm-c--cpp-syntax-propertize end)))))
> +            (sm-c--cpp-syntax-propertize end))))
> +    ("\\\\\\(\n\\)"
> +     (1 (let ((ppss (save-excursion (syntax-ppss (match-beginning 0)))))
> +          (when (and (nth 4 ppss)        ;Within a comment
> +                     (null (nth 7 ppss)) ;Within a // comment
> +                     (save-excursion     ;The \ is not itself escaped
> +                       (goto-char (match-beginning 0))
> +                       (zerop (mod (skip-chars-backward "\\\\") 2))))
> +            (string-to-syntax "."))))))
>      (point) end))
>
>  (defun sm-c-syntactic-face-function (ppss)

Yes, something like this would be possible. But all these syntax-ppsss
would be slow, at least somewhat, as discussed above.

--
Alan Mackenzie (Nuremberg, Germany).
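
The two escaping rules contrasted in this message can be sketched in C. This is a rough illustration only, not code from syntax.c or CC Mode, and the function names are hypothetical. It assumes, as the thread suggests, that `comment-end-can-be-escaped' treats a newline as escaped only when an odd number of backslashes precedes it (the same parity test as the sm-c-mode snippet above), while the proposed 'e' flag treats any run of one or more backslashes as escaping it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: count the consecutive backslashes that end
   immediately before buf[pos]. */
size_t backslashes_before(const char *buf, size_t pos)
{
    size_t n = 0;
    while (pos > n && buf[pos - n - 1] == '\\')
        n++;
    return n;
}

/* Parity rule (assumed behaviour of `comment-end-can-be-escaped'):
   the comment ender at buf[pos] is escaped only by an odd number of
   preceding backslashes. */
bool ender_escaped_parity(const char *buf, size_t pos)
{
    return backslashes_before(buf, pos) % 2 == 1;
}

/* 'e' flag rule as described in the patch: one or more preceding
   backslashes stop buf[pos] from ending the comment. */
bool ender_escaped_e_flag(const char *buf, size_t pos)
{
    return backslashes_before(buf, pos) >= 1;
}
```

For a line comment ending in "\\" followed by a newline, the parity rule reports the newline as a real comment ender, whereas the 'e' rule reports the comment as continuing, which matches the C behaviour described above where two "\"s followed by NL still continue the comment.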
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.