GNU bug report logs - #67417
29.1.50; c-ts-mode syntax issues with no brackets


Package: emacs;

Reported by: Arteen Abrishami <arteen <at> linux.ucla.edu>

Date: Thu, 23 Nov 2023 22:00:03 UTC

Severity: normal

Found in version 29.1.50

Fixed in version 29.2

Done: Dmitry Gutov <dmitry <at> gutov.dev>

Bug is archived. No further changes may be made.




From: Yuan Fu <casouri <at> gmail.com>
To: Dmitry Gutov <dmitry <at> gutov.dev>, Eli Zaretskii <eliz <at> gnu.org>, Arteen Abrishami <arteen <at> linux.ucla.edu>
Cc: 67417 <at> debbugs.gnu.org
Subject: bug#67417: 29.1.50; c-ts-mode syntax issues with no brackets
Date: Sun, 26 Nov 2023 17:47:49 -0800
On 11/24/23 10:26 AM, Dmitry Gutov wrote:
> On 24/11/2023 09:23, Eli Zaretskii wrote:
>>> Date: Thu, 23 Nov 2023 12:55:31 -0800
>>> From:  Arteen Abrishami via "Bug reports for GNU Emacs,
>>>   the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>>
>>> This is specific to `c-ts-mode` and is not a problem
>>> in `c-mode`. Sometimes, when you type something like:
>>>
>>> else
>>> break
>>>
>>> it won't indent the "break" until you type a semicolon. In the scenario
>>> below, it does not indent the break at all, but `c-mode` does. If the
>>> buffer is indented correctly in `c-mode` first, switching to `c-ts-mode`
>>> keeps that indentation, but `c-ts-mode` cannot detect or fix it itself.
>>>
>>> You can put it into a `.c` buffer all by itself and see:
>>>
>>> ```
>>> unsigned
>>> heap_pop(struct heapq * heap)
>>> {
>>>    if (heap->sz == 0)
>>>      return -1;
>>>
>>>    unsigned ret_val = heap->vals[0];
>>>    heap->vals[0] = heap->vals[heap->len];
>>>    heap->len -= 1;
>>>    unsigned i = 0;
>>>    unsigned lc;
>>>
>>>    while ((lc = HEAPQ_L_CHILD(i)) < heap->len)
>>>      {
>>>        unsigned rc = HEAPQ_R_CHILD(i);
>>>        /* no right child for our guy, special case */
>>>        if (rc == heap->len)
>>>          {
>>>            if (heap->vals[lc] < heap->vals[i])
>>>              SWAP(heap->vals[lc], heap->vals[i]);
>>>            break;
>>>          }
>>>
>>>        if (heap->vals[lc] < heap->vals[i])
>>>          {
>>>            SWAP(heap->vals[lc], heap->vals[i]);
>>>            i = lc;
>>>          }
>>>        else if (heap->vals[rc] < heap->vals[i])
>>>          {
>>>            SWAP(heap->vals[rc], heap->vals[i]);
>>>            i = rc;
>>>          }
>>>        else
>>>        break;
>>>             }
>>> }
>>> ```
>>>
>>> The very last break, on the else without brackets around it, will not
>>> indent.
>> Yuan, any comments?
>>
>> My personal take on this is that as long as typing the required
>> semi-colons fixes the indentation, we are okay in these cases, but if
>> we can do better (i.e. if the problem is not that tree-sitter returns
>> a tree with an error node), we should fix this even without relying on
>> the electric semi-colon.
>>
>> In the specific example above, it looks like tree-sitter does succeed
>> in parsing and shows a valid tree:
>>
>>        alternative:
>>         (else_clause else
>>          (break_statement break ;)))))
>>
>> So I wonder why we don't indent the "break;" part here.
>
> In my testing, it indents fine when after "else" there is either:
>
>  * some char(s) followed by closing curly
>  * or (optionally) some char(s) followed by semicolon
>
> When there is _no_ code between "else" and the closing curly, it 
> already indents fine in my testing (whether the semicolon is added or 
> not).
>
> Without either, the text after "else" isn't parsed as "alternative:" 
> -- it's parsed as a sibling of the "else" node. And, most 
> unfortunately, when "else" is followed by a closing curly, it's just 
> parsed as (ERROR else), so simply pressing RET does not indent the 
> empty line properly even when one is working with electric-pair-mode 
> enabled.
>
> I'd personally consider the last one a more definite bug in the 
> grammar, but maybe there is some good reason for it. I haven't found 
> anything relevant in the bug tracker.
>
> BTW, it seems like the latest C grammar changed how else without
> braces is parsed, so "break" isn't reindented even with a semicolon at
> the end.

I pushed two commits which should fix the indentation of "break" after
"else", and the indentation of empty lines after if/else/for/while in
general. The fix for the general case doesn't use the parse tree, since
the parse tree is often incomplete when you type `if (...)` and hit
return. Instead, it uses a plain regexp match to check whether the
previous line starts with if/else/for/while. This seems like a
reasonable heuristic to use before the user types more; at that point
the more accurate, parse-tree-based indentation rules take over, since
the tree should be more complete by then.

Yuan





This bug report was last modified 1 year and 163 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.