Package: emacs
Reported by: Alan Mackenzie <acm <at> muc.de>
Date: Tue, 22 Sep 2020 09:36:02 UTC
Severity: normal
Tags: confirmed
From: Alan Mackenzie <acm <at> muc.de>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 43558 <at> debbugs.gnu.org, Mattias Engdegård <mattiase <at> acm.org>, acm <at> muc.de
Subject: bug#43558: [PATCH]: Fix (forward-comment 1) when end delimiter is escaped.
Date: Thu, 19 Nov 2020 21:18:22 +0000

Hello, Stefan.

On Thu, Sep 24, 2020 at 12:56:42 -0400, Stefan Monnier wrote:
> > As already said, this is a(n ugly) workaround.  syntax.c should handle
> > comments in all their generality.  With a bit of consideration, the
> > method to do this is clear:

> In my world, it's quite normal for a specific language's lexical rules
> not to line up 100% with syntax tables (whether for strings, comments,
> younameit).  I don't see anything very special here.

> A `syntax-propertize` rule for "\*/" should be very easy to implement
> and fairly cheap since the regexp is simple and will almost never match.

> So, yeah, you can add yet-another-hack on top of the other syntax.c
> hacks if you want, but there's a good chance it will only ever be used
> by CC-mode.  It will take a lot more code changes in syntax.c than
> a quick tweak to your Elisp code to search for "\*/".

> I do think it would be good to handle this without `syntax-table`
> text-property hacks, but I think that should come with an overhaul of
> syntax.c based on a major-mode provided DFA (or something like that) so
> it can accommodate all the various oddball cases without even the need
> to introduce the notion of escaping comment markers.

OK, here's the patch.  As a matter of interest, it's been heavily tested
by the .../test/src/syntax-tests.el unit tests, further enhancements to
which are part of the patch.

Just as a reminder, the motivation is to be able to have syntax.c
correctly parse C/C++ line comments which look like:

    foo(); // comment \\
    second line of comment.

by introducing a new syntax flag "e" as a modifier on the syntax entry
for \n:

    (modify-syntax-entry ?\n "> be")

>         Stefan


diff --git a/src/syntax.c b/src/syntax.c
index df07809aaa..c701729ba1 100644
--- a/src/syntax.c
+++ b/src/syntax.c
@@ -108,6 +108,11 @@ SYNTAX_FLAGS_COMMENT_NESTED (int flags)
 {
   return (flags >> 22) & 1;
 }
+static bool
+SYNTAX_FLAGS_COMMENT_ESCAPES (int flags)
+{
+  return (flags >> 24) & 1;
+}
 
 /* FLAGS should be the flags of the main char of the comment marker, e.g.
    the second for comstart and the first for comend.  */
@@ -673,6 +678,26 @@ prev_char_comend_first (ptrdiff_t pos, ptrdiff_t pos_byte)
   return val;
 }
 
+static bool
+comment_ender_quoted (ptrdiff_t from, ptrdiff_t from_byte, int syntax)
+{
+  int c;
+  int next_syntax;
+  if (comment_end_can_be_escaped && char_quoted (from, from_byte))
+    return true;
+  if (SYNTAX_FLAGS_COMMENT_ESCAPES (syntax))
+    {
+      dec_both (&from, &from_byte);
+      UPDATE_SYNTAX_TABLE_BACKWARD (from);
+      c = FETCH_CHAR_AS_MULTIBYTE (from_byte);
+      next_syntax = SYNTAX_WITH_FLAGS (c);
+      UPDATE_SYNTAX_TABLE_FORWARD (from + 1);
+      if (next_syntax == Sescape || next_syntax == Scharquote)
+        return true;
+    }
+  return false;
+}
+
 /* Check whether charpos FROM is at the end of a comment.
    FROM_BYTE is the bytepos corresponding to FROM.
    Do not move back before STOP.
@@ -755,6 +780,20 @@ back_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
                  && SYNTAX_FLAGS_COMEND_SECOND (prev_syntax));
       comstart = (com2start || code == Scomment);
 
+      /* Check for any current delimiter being escaped.  */
+      if (from > stop
+          && (((com2end || code == Sendcomment)
+               && comment_ender_quoted (from, from_byte, syntax))
+              || (code == Scomment
+                  && comment_end_can_be_escaped
+                  && char_quoted (from, from_byte))))
+        {
+          dec_both (&from, &from_byte);
+          UPDATE_SYNTAX_TABLE_BACKWARD (from);
+          com2end = comstart = com2start = 0;
+          syntax = Smax;
+        }
+
       /* Nasty cases with overlapping 2-char comment markers:
          - snmp-mode: -- c -- foo -- c --
                       --- c --
@@ -1191,6 +1230,10 @@ the value of a `syntax-table' text property.  */)
       case 'c':
         val |= 1 << 23;
         break;
+
+      case 'e':
+        val |= 1 << 24;
+        break;
       }
 
   if (val < ASIZE (Vsyntax_code_object) && NILP (match))
@@ -1279,7 +1322,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   (Lisp_Object syntax)
 {
   int code, syntax_code;
-  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested;
+  bool start1, start2, end1, end2, prefix, comstyleb, comstylec, comnested,
+    comescapes;
   char str[2];
   Lisp_Object first, match_lisp, value = syntax;
 
@@ -1320,6 +1364,7 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
   comstyleb = SYNTAX_FLAGS_COMMENT_STYLEB (syntax_code);
   comstylec = SYNTAX_FLAGS_COMMENT_STYLEC (syntax_code);
   comnested = SYNTAX_FLAGS_COMMENT_NESTED (syntax_code);
+  comescapes = SYNTAX_FLAGS_COMMENT_ESCAPES (syntax_code);
 
   if (Smax <= code)
     {
@@ -1353,6 +1398,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert ("c", 1);
   if (comnested)
     insert ("n", 1);
+  if (comescapes)
+    insert ("e", 1);
 
   insert_string ("\twhich means: ");
 
@@ -1416,6 +1463,8 @@ DEFUN ("internal-describe-syntax-value", Finternal_describe_syntax_value,
     insert_string (" (comment style c)");
   if (comnested)
     insert_string (" (nestable)");
+  if (comescapes)
+    insert_string (" (can be escaped)");
 
   if (prefix)
     {
@@ -2336,7 +2385,7 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
           && SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0) == style
           && (SYNTAX_FLAGS_COMMENT_NESTED (syntax) ?
               (nesting > 0 && --nesting == 0) : nesting < 0)
-          && !(comment_end_can_be_escaped && char_quoted (from, from_byte)))
+          && !comment_ender_quoted (from, from_byte, syntax))
         /* We have encountered a comment end of the same style
            as the comment sequence which began this comment
            section.  */
@@ -2354,12 +2403,12 @@ forw_comment (ptrdiff_t from, ptrdiff_t from_byte, ptrdiff_t stop,
         /* We have encountered a nested comment of the same style
            as the comment sequence which began this comment section.  */
         nesting++;
-      if (comment_end_can_be_escaped
-          && (code == Sescape || code == Scharquote))
+      if (SYNTAX_FLAGS_COMEND_FIRST (syntax)
+          && comment_ender_quoted (from, from_byte, syntax))
         {
           inc_both (&from, &from_byte);
           UPDATE_SYNTAX_TABLE_FORWARD (from);
-          if (from == stop) continue; /* Failure */
+          continue;
         }
       inc_both (&from, &from_byte);
       UPDATE_SYNTAX_TABLE_FORWARD (from);
@@ -2493,8 +2542,8 @@ between them, return t; otherwise return nil.  */)
           /* We're at the start of a comment.  */
           found = forw_comment (from, from_byte, stop, comnested, comstyle, 0,
                                 &out_charpos, &out_bytepos, &dummy, &dummy2);
-          from = out_charpos; from_byte = out_bytepos;
-          if (!found)
+          from = out_charpos; from_byte = out_bytepos;
+          if (!found)
             {
               SET_PT_BOTH (from, from_byte);
               return Qnil;
@@ -2526,21 +2575,27 @@ between them, return t; otherwise return nil.  */)
           if (code == Sendcomment)
             comstyle = SYNTAX_FLAGS_COMMENT_STYLE (syntax, 0);
           if (from > stop && SYNTAX_FLAGS_COMEND_SECOND (syntax)
-              && prev_char_comend_first (from, from_byte)
-              && !char_quoted (from - 1, dec_bytepos (from_byte)))
+              && prev_char_comend_first (from, from_byte))
             {
               int other_syntax;
-              /* We must record the comment style encountered so that
+              /* We must record the comment style encountered so that
                  later, we can match only the proper comment begin
                  sequence of the same style.  */
               dec_both (&from, &from_byte);
-              code = Sendcomment;
-              /* Calling char_quoted, above, set up global syntax position
-                 at the new value of FROM.  */
               c1 = FETCH_CHAR_AS_MULTIBYTE (from_byte);
               other_syntax = SYNTAX_WITH_FLAGS (c1);
-              comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
-              comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              if (!comment_ender_quoted (from, from_byte, other_syntax))
+                {
+                  code = Sendcomment;
+                  comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
+                  comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+                  syntax = other_syntax;
+                }
+              else
+                {
+                  inc_both (&from, &from_byte);
+                  UPDATE_SYNTAX_TABLE_FORWARD (from);
+                }
             }
 
           if (code == Scomment_fence)
@@ -2579,7 +2634,8 @@ between them, return t; otherwise return nil.  */)
             }
           else if (code == Sendcomment)
             {
-              found = (!quoted || !comment_end_can_be_escaped)
+              found =
+                !comment_ender_quoted (from, from_byte, syntax)
                 && back_comment (from, from_byte, stop, comnested, comstyle,
                                  &out_charpos, &out_bytepos);
               if (!found)
@@ -2864,6 +2920,7 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
               other_syntax = SYNTAX_WITH_FLAGS (c2);
               comstyle = SYNTAX_FLAGS_COMMENT_STYLE (other_syntax, syntax);
               comnested |= SYNTAX_FLAGS_COMMENT_NESTED (other_syntax);
+              syntax = other_syntax;
             }
 
           /* Quoting turns anything except a comment-ender
@@ -2946,7 +3003,10 @@ scan_lists (EMACS_INT from0, EMACS_INT count, EMACS_INT depth, bool sexpflag)
             case Sendcomment:
               if (!parse_sexp_ignore_comments)
                 break;
-              found = back_comment (from, from_byte, stop, comnested, comstyle,
+              found =
+                (from == stop
+                 || !comment_ender_quoted (from, from_byte, syntax))
+                && back_comment (from, from_byte, stop, comnested, comstyle,
                                     &out_charpos, &out_bytepos);
               /* FIXME: if !found, it really wasn't a comment-end.
                  For single-char Sendcomment, we can't do much about it apart
diff --git a/test/src/syntax-resources/syntax-comments.txt b/test/src/syntax-resources/syntax-comments.txt
index a292d816b9..f3357ea244 100644
--- a/test/src/syntax-resources/syntax-comments.txt
+++ b/test/src/syntax-resources/syntax-comments.txt
@@ -34,7 +34,7 @@
 54{ //74 \
 }54
 55{/* */}55
-56{ /*76 \*/ }56
+56{ /*76 \*/80 }56
 57*/77
 58}58
 60{ /*78 \\*/79}60
@@ -87,6 +87,21 @@
 110
 111#| ; |#111
 
+/* Comments and purported comments containing string delimiters.  */
+120/* "string" */120
+121/* "" */121
+122/* " */122
+130/*
+" " */130
+" "*/123
+124/* " ' */124
+126/*
+" ' */126
+127/* " " " " " */127
+128/* " ' " ' " ' */128
+129/* ' " ' " ' */129
+" ' */125
+
 Local Variables:
 mode: fundamental
 eval: (set-syntax-table (make-syntax-table))
diff --git a/test/src/syntax-tests.el b/test/src/syntax-tests.el
index edee01ec58..399986c31d 100644
--- a/test/src/syntax-tests.el
+++ b/test/src/syntax-tests.el
@@ -307,6 +307,7 @@ syntax-pps-comments
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun {-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?{ "<")
   (modify-syntax-entry ?} ">"))
@@ -336,6 +337,7 @@ {-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \;-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?\n ">")
   (modify-syntax-entry ?\; "<")
@@ -375,6 +377,7 @@ \;-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun \#|-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (modify-syntax-entry ?# ". 14")
   (modify-syntax-entry ?| ". 23n")
   (modify-syntax-entry ?\; "< b")
@@ -418,15 +421,18 @@ \#|-out
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 (defun /*-in ()
   (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
   (setq comment-end-can-be-escaped t)
   (modify-syntax-entry ?/ ". 124b")
   (modify-syntax-entry ?* ". 23")
-  (modify-syntax-entry ?\n "> b"))
+  (modify-syntax-entry ?\n "> b")
+  (modify-syntax-entry ?\' "\""))
 (defun /*-out ()
   (setq comment-end-can-be-escaped nil)
   (modify-syntax-entry ?/ ".")
   (modify-syntax-entry ?* ".")
-  (modify-syntax-entry ?\n " "))
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
 (eval-and-compile
   (setq syntax-comments-section "c"))
 
@@ -489,4 +495,142 @@ /*-out
 (syntax-pps-comments /* 56 76 77 58)
 (syntax-pps-comments /* 60 78 79)
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Emacs 28 "C" style comments - `comment-end-can-be-escaped' is nil, the
+;; "e" flag is used for line comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(defun //-in ()
+  (setq parse-sexp-ignore-comments t)
+  (setq comment-use-syntax-ppss nil)
+  (modify-syntax-entry ?/ ". 124be")
+  (modify-syntax-entry ?* ". 23")
+  (modify-syntax-entry ?\n "> be")
+  (modify-syntax-entry ?\' "\""))
+(defun //-out ()
+  (modify-syntax-entry ?/ ".")
+  (modify-syntax-entry ?* ".")
+  (modify-syntax-entry ?\n " ")
+  (modify-syntax-entry ?\' "."))
+(eval-and-compile
+  (setq syntax-comments-section "c++"))
+
+(syntax-comments // forward t 1)
+(syntax-comments // backward t 1)
+(syntax-comments // forward t 2)
+(syntax-comments // backward t 2)
+(syntax-comments // forward t 3)
+(syntax-comments // backward t 3)
+
+(syntax-comments // forward t 4)
+(syntax-comments // backward t 4)
+(syntax-comments // forward t 5 6)
+(syntax-comments // backward nil 5 0)
+(syntax-comments // forward nil 6 0)
+(syntax-comments // backward t 6 5)
+
+(syntax-comments // forward t 7)
+(syntax-comments // backward t 7)
+(syntax-comments // forward nil 8 0)
+(syntax-comments // backward nil 8 0)
+(syntax-comments // forward t 9)
+(syntax-comments // backward t 9)
+
+(syntax-comments // forward nil 10 0)
+(syntax-comments // backward nil 10 0)
+(syntax-comments // forward t 11)
+(syntax-comments // backward t 11)
+
+(syntax-comments // forward t 13)
+(syntax-comments // backward t 13)
+(syntax-comments // forward t 15)
+(syntax-comments // backward t 15)
+
+;; Emacs 28 "C" style comments inside brace lists.
+(syntax-br-comments // forward t 50)
+(syntax-br-comments // backward t 50)
+(syntax-br-comments // forward t 51)
+(syntax-br-comments // backward t 51)
+(syntax-br-comments // forward t 52)
+(syntax-br-comments // backward t 52)
+
+(syntax-br-comments // forward t 53)
+(syntax-br-comments // backward t 53)
+(syntax-br-comments // forward t 54 58)
+(syntax-br-comments // backward t 54)
+(syntax-br-comments // forward t 55)
+(syntax-br-comments // backward t 55)
+
+(syntax-br-comments // forward t 56 56)
+(syntax-br-comments // backward t 58 54)
+(syntax-br-comments // backward nil 59)
+(syntax-br-comments // forward t 60)
+(syntax-br-comments // backward t 60)
+
+;; Emacs 28 "C" style comments parsed by `parse-partial-sexp'.
+(syntax-pps-comments // 50 70 71)
+(syntax-pps-comments // 52 72 73)
+(syntax-pps-comments // 54 74 55 58)
+(syntax-pps-comments // 56 76 80)
+(syntax-pps-comments // 60 78 79)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; Comments containing string delimiters.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c-\""))
+
+(syntax-comments /* forward t 120)
+(syntax-comments /* backward t 120)
+(syntax-comments /* forward t 121)
+(syntax-comments /* backward t 121)
+(syntax-comments /* forward t 122)
+(syntax-comments /* backward t 122)
+
+(syntax-comments /* backward nil 123 0)
+(syntax-comments /* forward t 124)
+(syntax-comments /* backward t 124)
+(syntax-comments /* backward nil 125 0)
+(syntax-comments /* forward t 126)
+(syntax-comments /* backward t 126)
+
+(syntax-comments /* forward t 127)
+(syntax-comments /* backward t 127)
+(syntax-comments /* forward t 128)
+(syntax-comments /* backward t 128)
+(syntax-comments /* forward t 129)
+(syntax-comments /* backward t 129)
+
+(syntax-comments /* forward t 130)
+(syntax-comments /* backward t 130)
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;; The same again, with Emacs 28 style C comments.
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+(eval-and-compile
+  (setq syntax-comments-section "c++-\""))
+
+(syntax-comments // forward t 120)
+(syntax-comments // backward t 120)
+(syntax-comments // forward t 121)
+(syntax-comments // backward t 121)
+(syntax-comments // forward t 122)
+(syntax-comments // backward t 122)
+
+(syntax-comments // backward nil 123 0)
+(syntax-comments // forward t 124)
+(syntax-comments // backward t 124)
+(syntax-comments // backward nil 125 0)
+(syntax-comments // forward t 126)
+(syntax-comments // backward t 126)
+
+(syntax-comments // forward t 127)
+(syntax-comments // backward t 127)
+(syntax-comments // forward t 128)
+(syntax-comments // backward t 128)
+(syntax-comments // forward t 129)
+(syntax-comments // backward t 129)
+
+(syntax-comments // forward t 130)
+(syntax-comments // backward t 130)
+
 
 ;;; syntax-tests.el ends here

-- 
Alan Mackenzie (Nuremberg, Germany).
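
For readers following the thread, here is a minimal standalone sketch of the distinction the message describes, written in plain C with no Emacs internals. It assumes that the existing `char_quoted` test behaves like a backslash parity count, and that the "e" flag branch of `comment_ender_quoted` in the patch looks only at the single character before the comment ender. The names `quoted_by_parity`, `quoted_by_prev_char` and `line_comment_end` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Parity rule (assumed model of char_quoted): a position is quoted
   when an odd number of backslashes immediately precedes it.  */
static bool
quoted_by_parity (const char *buf, size_t pos)
{
  size_t n = 0;
  while (pos > n && buf[pos - n - 1] == '\\')
    n++;
  return n % 2 == 1;
}

/* Single-character rule (assumed model of the "e" flag): only the
   character just before POS is examined.  */
static bool
quoted_by_prev_char (const char *buf, size_t pos)
{
  return pos > 0 && buf[pos - 1] == '\\';
}

/* Return the index just past the newline that ends a line comment
   whose text begins at START, or the buffer length if no unquoted
   newline is found.  QUOTED selects which escaping rule applies.  */
static size_t
line_comment_end (const char *buf, size_t start,
                  bool (*quoted) (const char *, size_t))
{
  size_t len = strlen (buf);
  for (size_t i = start; i < len; i++)
    if (buf[i] == '\n' && !quoted (buf, i))
      return i + 1;
  return len;
}
```

Applied to the buffer `foo(); // comment \\` followed by a newline and `second line`, the parity rule treats the two backslashes as an escaped backslash and ends the comment at the first newline, while the single-character rule carries the comment on to the end of `second line`, which is the behaviour the message asks for.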