GNU bug report logs - #74963
Ambiguous treesit named and anonymous nodes in ruby-ts-mode

Package: emacs;

Reported by: Juri Linkov <juri <at> linkov.net>

Date: Thu, 19 Dec 2024 07:20:02 UTC

Severity: normal

Message #47 received at 74963 <at> debbugs.gnu.org (full text, mbox):

From: Yuan Fu <casouri <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 74963 <at> debbugs.gnu.org
Subject: Re: bug#74963: Ambiguous treesit named and anonymous nodes in
 ruby-ts-mode
Date: Sun, 12 Jan 2025 23:47:33 -0800

> On Jan 12, 2025, at 11:31 PM, Juri Linkov <juri <at> linkov.net> wrote:
> 
>> I see that all ts-modes solve this common problem each in its own way
>> (here 'list' indicates a list of strings that should match node names):
>> 
>>  c-ts-mode:    (regexp-opt list 'symbols)
>>  js-ts-mode:   (concat "\\_<" (regexp-opt list t) "\\_>")
>>  java-ts-mode: (rx (or list))
>>  ruby-ts-mode: (rx bol (or list) eol)
>> 
>> Currently there is no uniform way to handle this frequent need.
>> 'concat' as above looks too ugly, while 'regexp-opt' with the
>> 'symbols' argument produces an odd-looking regexp for matching symbols.
> 
> I was thinking about adding two functions, treesit-regexp-strict
> and treesit-regexp-lax.  But then I discovered that some things
> require specifying both strict and lax matches for the same thing.
> For example, take treesit-thing-settings from c-ts-mode:
> 
>    (sentence
>     ,(regexp-opt '("preproc"
>                    "declaration"
>                    "specifier"
>                    "attributed_statement"
>                    "labeled_statement"
>                    "expression_statement"
>                    "if_statement"
>                    "switch_statement"
>                    "do_statement"
>                    "while_statement"
>                    "for_statement"
>                    "return_statement"
>                    "break_statement"
>                    "continue_statement"
>                    "goto_statement"
>                    "case_statement")))
> 
> "preproc" can be lax: it is fine for it to match all preprocessor directives.
> But "declaration" should be strict and should not match "parameter_declaration".
> Also "specifier" should not match "attribute_specifier" and "storage_class_specifier",
> but only "enum_specifier" and "union_specifier", which end with a semicolon.
> Also, there is no need to list all statements separately; it should be
> sufficient to use a lax match on "statement".
> 
> The most expressive language for specifying all these requirements is the rx macro,
> so let's use it in ts-modes.  Here is what the 'sentence' thing would look like:
> 
>    (sentence
>     ,(rx (or (and bos (or "declaration"
>                           "enum_specifier"
>                           "union_specifier")
>                   eos)
>              (or "preproc"
>                  "statement"))))

Looks good. I’ve always used rx; it has the additional benefit of being macro-expanded at compile time.
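
A minimal sketch of how the proposed 'sentence' regexp behaves, assuming (as the quoted c-ts-mode setting implies) that a thing regexp is simply matched against a node's type string. The names my-c-ts-sentence-regexp and my-c-ts-sentence-p are hypothetical and exist only for this illustration. Because every argument to rx here is a literal string, the whole form expands to a constant regexp string when the code is byte-compiled.

```elisp
;; Sketch: strict vs. lax node-type matching with `rx'.
(require 'rx)

(defconst my-c-ts-sentence-regexp
  (rx (or
       ;; Strict branch: the whole node type must be one of these names.
       (and bos (or "declaration"
                    "enum_specifier"
                    "union_specifier")
            eos)
       ;; Lax branch: any node type that contains one of these substrings.
       (or "preproc"
           "statement")))
  "Hypothetical constant holding the `sentence' regexp proposed above.")

(defun my-c-ts-sentence-p (type)
  "Return t if node TYPE (a string) matches `my-c-ts-sentence-regexp'."
  (and (string-match-p my-c-ts-sentence-regexp type) t))

;; Classify a few C node types.
(mapcar (lambda (type) (cons type (my-c-ts-sentence-p type)))
        '("declaration" "parameter_declaration"
          "enum_specifier" "attribute_specifier"
          "preproc_ifdef" "if_statement"))
;; => (("declaration" . t) ("parameter_declaration")
;;     ("enum_specifier" . t) ("attribute_specifier")
;;     ("preproc_ifdef" . t) ("if_statement" . t))
```

The anchored branch rejects "parameter_declaration" and "attribute_specifier", while the unanchored branch accepts any type containing "preproc" or "statement", such as "preproc_ifdef" and "if_statement".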

Also, I finally added support for ‘and’, ‘named’ and ‘anonymous’. I haven’t tested it yet (sorry).

Yuan


