From unknown Sat Sep 06 10:20:47 2025
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
To: 66674@debbugs.gnu.org
From: Dominik Honnef
Date: Sat, 21 Oct 2023 22:36:30 +0200
Message-ID: <87edhnzp9t.fsf@honnef.co>

Both tree-sitter's CLI and the publicly hosted playground produce
different parse trees than treesit in Emacs. Specifically, the
assignment of nodes to named fields differs.

Given the following C source:

void main() {
  int x = // foo
    1+
    // comment
    2;
}

treesit-explore-mode displays the following tree:

(translation_unit
 (function_definition type: (primitive_type)
  declarator:
   (function_declarator declarator: (identifier)
    parameters: (parameter_list ( )))
  body:
   (compound_statement {
    (declaration type: (primitive_type)
     declarator:
      (init_declarator declarator: (identifier) = value: (comment)
       (binary_expression left: (number_literal) operator: + right: (comment) (number_literal)))
     ;)
    })))

Note how in the init_declarator node, the 'value' field is a comment
node, and similarly for the 'right' field in the binary_expression node.
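One mechanism that would produce exactly these labels (a hypothesis, not something established in this report) is a lookup that indexes a node's field map by the child's raw position among all of its children, while the field map itself numbers children without counting extras such as comments. The Python toy model below is a sketch under that assumption; `Child`, `render_naive`, `render_structural`, `INIT_FIELDS` and `BINARY_FIELDS` are hypothetical names and data, not Emacs or tree-sitter APIs.

```python
from dataclasses import dataclass


@dataclass
class Child:
    kind: str
    named: bool = True   # anonymous tokens such as "=" are printed bare
    extra: bool = False  # extras (e.g. comments) may appear anywhere


def show(child, name):
    """Print a child the way the trees above do, with an optional field."""
    text = f"({child.kind})" if child.named else child.kind
    return f"{name}: {text}" if name else text


def render_naive(children, field_map):
    """Hypothetical buggy lookup: index the field map by raw position,
    so extras shift every later field by one."""
    return " ".join(show(c, field_map.get(i)) for i, c in enumerate(children))


def render_structural(children, field_map):
    """Index the field map by structural position: extras are skipped
    when counting and never receive a field."""
    parts, index = [], 0
    for c in children:
        if c.extra:
            parts.append(show(c, None))
        else:
            parts.append(show(c, field_map.get(index)))
            index += 1
    return " ".join(parts)


# Children of init_declarator and binary_expression from the C example.
init_declarator = [Child("identifier"), Child("=", named=False),
                   Child("comment", extra=True), Child("binary_expression")]
binary_expression = [Child("number_literal"), Child("+", named=False),
                     Child("comment", extra=True), Child("number_literal")]

# Assumed field maps, keyed by structural (non-extra) child index.
INIT_FIELDS = {0: "declarator", 2: "value"}
BINARY_FIELDS = {0: "left", 1: "operator", 2: "right"}

print(render_naive(init_declarator, INIT_FIELDS))
print(render_structural(init_declarator, INIT_FIELDS))
print(render_naive(binary_expression, BINARY_FIELDS))
print(render_structural(binary_expression, BINARY_FIELDS))
```

The naive lookup prints `declarator: (identifier) = value: (comment) (binary_expression)` and `left: (number_literal) operator: + right: (comment) (number_literal)`, the same labeling treesit-explore-mode shows, while the structural lookup leaves each comment without a field.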
Running 'tree-sitter parse file.c', on the other hand, produces the
following tree:

(translation_unit [0, 0] - [6, 0]
  (function_definition [0, 0] - [5, 1]
    type: (primitive_type [0, 0] - [0, 4])
    declarator: (function_declarator [0, 5] - [0, 11]
      declarator: (identifier [0, 5] - [0, 9])
      parameters: (parameter_list [0, 9] - [0, 11]))
    body: (compound_statement [0, 12] - [5, 1]
      (declaration [1, 2] - [4, 6]
        type: (primitive_type [1, 2] - [1, 5])
        declarator: (init_declarator [1, 6] - [4, 5]
          declarator: (identifier [1, 6] - [1, 7])
          (comment [1, 10] - [1, 16])
          value: (binary_expression [2, 4] - [4, 5]
            left: (number_literal [2, 4] - [2, 5])
            (comment [3, 4] - [3, 14])
            right: (number_literal [4, 4] - [4, 5])))))))

Here, the two comment nodes are not assigned to any field. IMHO the
second tree is the more useful one, as the named fields contain the
semantically important subtrees (e.g. a binary expression is made up of
a left and a right subtree, not a left subtree, a right comment, and
then some subtree without a field).

Emacs's tree makes writing queries less convenient: instead of being
able to refer to well-defined field names, one has to rely on child
indices to account for comments.


Further mismatches arise from repeated fields and separators.

Consider the following Go source:

package pkg

var a, b, c = 1, 2, 3

treesit-explore-mode displays the following tree:

(source_file
 (package_clause package (package_identifier))
 \n
 (var_declaration var
  (var_spec name: (identifier) name: , (identifier) value: , (identifier) =
   (expression_list (int_literal) , (int_literal) , (int_literal))))
 \n)

Here, the var_spec node has only two fields named 'name', even though
the source declares three names. Furthermore, the second 'name' field,
as well as the 'value' field, is set to a ',' separator between the
identifiers, and two of the three identifiers are not assigned to any
field.
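A similar toy model covers the Go case, assuming that the repeated ", name" part of var_spec lives in a hidden child node whose 'name' fields are recorded in var_spec's field map as a single entry at that child's position. If the hidden child's children are spliced into the list before the parent's field map is indexed, the labels shift exactly as in the treesit-explore-mode tree above. `Node`, `tok`, `render_naive`, `render_structural` and the field maps are hypothetical illustrations, not Emacs or tree-sitter APIs, and the single flat hidden node is a simplification of how a repeated grammar rule is actually represented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    named: bool = True    # anonymous tokens such as "," are printed bare
    hidden: bool = False  # hidden (auxiliary) nodes are spliced into the parent
    children: list = field(default_factory=list)
    # Structural child index -> field name; an entry at a hidden child's
    # index marks fields inherited from that child.
    field_map: dict = field(default_factory=dict)


def tok(text):
    """Hypothetical helper for an anonymous token node."""
    return Node(text, named=False)


def show(node, name):
    text = f"({node.kind})" if node.named else node.kind
    return f"{name}: {text}" if name else text


def render_naive(node):
    """Hypothetical buggy lookup: splice hidden children in first, then
    index the parent's field map by position in the spliced list."""
    flat = []
    for c in node.children:
        flat.extend(c.children if c.hidden else [c])
    return " ".join(show(c, node.field_map.get(i)) for i, c in enumerate(flat))


def render_structural(node):
    """Index the field map by structural position and let a hidden child
    label its own children from its own field map."""
    parts = []
    for i, c in enumerate(node.children):
        parts.append(render_structural(c) if c.hidden
                     else show(c, node.field_map.get(i)))
    return " ".join(parts)


# var_spec for "a, b, c = 1, 2, 3" (assumed shape).
repeat = Node("_var_spec_repeat1", hidden=True,
              children=[tok(","), Node("identifier"), tok(","), Node("identifier")],
              field_map={1: "name", 3: "name"})
var_spec = Node("var_spec",
                children=[Node("identifier"), repeat, tok("="), Node("expression_list")],
                field_map={0: "name", 1: "name", 3: "value"})

print(render_naive(var_spec))
print(render_structural(var_spec))
```

Here `render_naive` prints `name: (identifier) name: , (identifier) value: , (identifier) = (expression_list)`, the same labeling as treesit-explore-mode, while `render_structural` assigns 'name' to all three identifiers and 'value' to the expression_list.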
'tree-sitter parse file.go', on the other hand, produces this more
accurate tree:

(source_file [0, 0] - [2, 21]
  (package_clause [0, 0] - [0, 11]
    (package_identifier [0, 8] - [0, 11]))
  (var_declaration [2, 0] - [2, 21]
    (var_spec [2, 4] - [2, 21]
      name: (identifier [2, 4] - [2, 5])
      name: (identifier [2, 7] - [2, 8])
      name: (identifier [2, 10] - [2, 11])
      value: (expression_list [2, 14] - [2, 21]
        (int_literal [2, 14] - [2, 15])
        (int_literal [2, 17] - [2, 18])
        (int_literal [2, 20] - [2, 21])))))

This reproduces with 29.1 as well as 30.0.50.

From unknown Sat Sep 06 10:20:47 2025
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
To: Dominik Honnef, Yuan Fu
Cc: 66674@debbugs.gnu.org
From: Eli Zaretskii
Date: Wed, 25 Oct 2023 16:03:10 +0300
Message-Id: <835y2ukg6p.fsf@gnu.org>

> From: Dominik Honnef
> Date: Sat, 21 Oct 2023 22:36:30 +0200
>
> [...]
>
> This reproduces with 29.1 as well as 30.0.50.

Yuan, any comments or suggestions?

From unknown Sat Sep 06 10:20:47 2025
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
To: casouri@gmail.com
Cc: 66674@debbugs.gnu.org, dominik@honnef.co
From: Eli Zaretskii
Date: Sun, 19 Nov 2023 12:08:08 +0200
Message-Id: <835y1ykqd3.fsf@gnu.org>

Ping! Yuan, any comments?

> Cc: 66674@debbugs.gnu.org
> Date: Wed, 25 Oct 2023 16:03:10 +0300
> From: Eli Zaretskii
>
> > From: Dominik Honnef
> > Date: Sat, 21 Oct 2023 22:36:30 +0200
> >
> > [...]
>
> Yuan, any comments or suggestions?

From unknown Sat Sep 06 10:20:47 2025
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
To: casouri@gmail.com
Cc: 66674@debbugs.gnu.org, dominik@honnef.co
From: Eli Zaretskii
Date: Sat, 25 Nov 2023 12:03:27 +0200
Message-Id: <83ttpacfps.fsf@gnu.org>

Ping! Ping! Yuan, please chime in.

> Cc: 66674@debbugs.gnu.org, dominik@honnef.co
> Date: Sun, 19 Nov 2023 12:08:08 +0200
> From: Eli Zaretskii
>
> Ping! Yuan, any comments?
>
> > [...]

From unknown Sat Sep 06 10:20:47 2025
Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields
To: Eli Zaretskii
Cc: 66674@debbugs.gnu.org, dominik@honnef.co
mail-pl1-x636.google.com with SMTP id d9443c01a7336-1d053c45897so30796815ad.2 for <66674@debbugs.gnu.org>; Sun, 10 Dec 2023 02:07:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1702202857; x=1702807657; darn=debbugs.gnu.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=62WsOQdhsnBRfiFan3+3nPsomkdk78mR31J5+ijqXMw=; b=BmLjv4Tt3TGPvascmTdw2zYbrEC7+20JblCTA8abOkfosnnrjqC+CGAH8ieAuWHfce iG/EJXuniU21tNOZfe7V9U/NofvqZWbp1a3s/cjxsQ5/jLSCMdJzBOmVwyS4h5vO5hrd SuK5XmyF3u9JLXqiSzGcEwjuIrvJkhekdn3H+gMo6AOSBN55I7Zzkez2ZDF0aCwNgUOB B8GK8HAI/3NNC3nlG3GPkdfU6WjWFJAEVTul6DilifIviE51n9uiiqCoH8hc+vrUSujA UYKWbSqYLKH6fETS6lVie+iA9nGBAkO894qPfGFOCSS4Sp/PSpqKMKbhR23MYRj4Srm9 lmpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702202857; x=1702807657; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=62WsOQdhsnBRfiFan3+3nPsomkdk78mR31J5+ijqXMw=; b=PBwokPbSwNJbvpvHoHQQGVr01Dik9IULOxt0NKI8K4shcXgkBXHbn6BCLOPAGRkcDt mLcb/FVLneOacWxcaho176AMGC/FdFhwsezWQD26Z2VLPiP7CzBtTcueXTk9WRENJSDK LBOZeOsYJRsYWFc7jkLAayFmYZgpchInUmed+P7Zitjf9pJsUAbkuTC5B0dkq10VfO3j z4hC/2x3kDjHQpV7qHfPWvvTrT4LXkOT7xbBt6JRvF14UHJjHa9HT8B4yQvCqfaz+zw7 kbLyeNq5zFBCvWjXEfB683qufYfVeeUWzugoD1y30ll+qb753XMnwgarP9ApG3KIZCWY 9jiQ== X-Gm-Message-State: AOJu0YxFvAAtBuKKuS8pjxz/O7DxD+/UOfbwqmeUUf4n0QkPV9DTjMw1 dMVXpggktxNb2m09z2Tp3DM= X-Google-Smtp-Source: AGHT+IEVR7xS1/U7fmT67KpTWyVGl3vJ2L0KIIOqEbjqj/kdtaBEJctQGpjM7jGAbIOIKr318MTkHw== X-Received: by 2002:a17:903:32c5:b0:1d0:a53e:2662 with SMTP id i5-20020a17090332c500b001d0a53e2662mr3194537plr.104.1702202857135; Sun, 10 Dec 2023 02:07:37 -0800 (PST) Received: from [192.168.1.7] (172-117-161-177.res.spectrum.com. 
[172.117.161.177]) by smtp.gmail.com with ESMTPSA id x4-20020a170902ec8400b001d0b6caddb1sm4553239plg.137.2023.12.10.02.07.36 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 10 Dec 2023 02:07:36 -0800 (PST) Message-ID: <5ad5f956-7533-451b-9815-1710713ee334@gmail.com> Date: Sun, 10 Dec 2023 02:07:35 -0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US References: <87edhnzp9t.fsf@honnef.co> <835y2ukg6p.fsf@gnu.org> <835y1ykqd3.fsf@gnu.org> <83ttpacfps.fsf@gnu.org> From: Yuan Fu In-Reply-To: <83ttpacfps.fsf@gnu.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) On 11/25/23 2:03 AM, Eli Zaretskii wrote: > Ping! Ping! Yuan, please chime in. > >> Cc: 66674@debbugs.gnu.org, dominik@honnef.co >> Date: Sun, 19 Nov 2023 12:08:08 +0200 >> From: Eli Zaretskii >> >> Ping! Yuan, any comments? >> >>> Cc: 66674@debbugs.gnu.org >>> Date: Wed, 25 Oct 2023 16:03:10 +0300 >>> From: Eli Zaretskii >>> >>>> From: Dominik Honnef >>>> Date: Sat, 21 Oct 2023 22:36:30 +0200 >>>> >>>> Using tree-sitter's CLI as well as the publicly hosted playground >>>> produce different parse trees than treesit in Emacs. Specifically, the >>>> assignment of nodes to named fields differs. 
>>>> >>>> Given the following C source: >>>> >>>> void main() { >>>> int x = // foo >>>> 1+ >>>> // comment >>>> 2; >>>> } >>>> >>>> treesit-explore-mode displays the following tree: >>>> >>>> (translation_unit >>>> (function_definition type: (primitive_type) >>>> declarator: >>>> (function_declarator declarator: (identifier) >>>> parameters: (parameter_list ( ))) >>>> body: >>>> (compound_statement { >>>> (declaration type: (primitive_type) >>>> declarator: >>>> (init_declarator declarator: (identifier) = value: (comment) >>>> (binary_expression left: (number_literal) operator: + right: (comment) (number_literal))) >>>> ;) >>>> }))) >>>> >>>> Note how in the init_declarator node, the 'value' field is a comment >>>> node, and similarly for the 'right' field in the binary_expression node. >>>> >>>> Running 'tree-sitter parse file.c', on the other hand, produces the >>>> following tree: >>>> >>>> (translation_unit [0, 0] - [6, 0] >>>> (function_definition [0, 0] - [5, 1] >>>> type: (primitive_type [0, 0] - [0, 4]) >>>> declarator: (function_declarator [0, 5] - [0, 11] >>>> declarator: (identifier [0, 5] - [0, 9]) >>>> parameters: (parameter_list [0, 9] - [0, 11])) >>>> body: (compound_statement [0, 12] - [5, 1] >>>> (declaration [1, 2] - [4, 6] >>>> type: (primitive_type [1, 2] - [1, 5]) >>>> declarator: (init_declarator [1, 6] - [4, 5] >>>> declarator: (identifier [1, 6] - [1, 7]) >>>> (comment [1, 10] - [1, 16]) >>>> value: (binary_expression [2, 4] - [4, 5] >>>> left: (number_literal [2, 4] - [2, 5]) >>>> (comment [3, 4] - [3, 14]) >>>> right: (number_literal [4, 4] - [4, 5]))))))) >>>> >>>> Here, the two comment nodes appear as unnamed nodes. IMHO the second >>>> tree is a more useful one, as the named fields contain the semantically >>>> important subtrees (e.g. a binary expression is made up of a left and >>>> right subtree, not a left subtree, a right comment, and then some >>>> unnamed subtree.) 
>>>> >>>> With Emacs's tree, writing queries is less convenient: instead of >>>> referring to well-defined field names, one has to rely on child indices >>>> to account for comments. >>>> >>>> >>>> Further mismatch arises from repeated fields and separators. >>>> >>>> Consider the following Go source: >>>> >>>> package pkg >>>> >>>> var a, b, c = 1, 2, 3 >>>> >>>> treesit-explore-mode displays the following tree: >>>> >>>> (source_file >>>> (package_clause package (package_identifier)) >>>> \n >>>> (var_declaration var >>>> (var_spec name: (identifier) name: , (identifier) value: , (identifier) = >>>> (expression_list (int_literal) , (int_literal) , (int_literal)))) >>>> \n) >>>> >>>> Here, the var_spec node has only two fields named 'name', even though >>>> the source declares three names. Furthermore, the second 'name' field >>>> and the 'value' field are both assigned to ',' separators between >>>> identifiers, and two of the three identifiers have no field at all. >>>> >>>> 'tree-sitter parse file.go', on the other hand, produces this more >>>> accurate tree: >>>> >>>> (source_file [0, 0] - [2, 21] >>>> (package_clause [0, 0] - [0, 11] >>>> (package_identifier [0, 8] - [0, 11])) >>>> (var_declaration [2, 0] - [2, 21] >>>> (var_spec [2, 4] - [2, 21] >>>> name: (identifier [2, 4] - [2, 5]) >>>> name: (identifier [2, 7] - [2, 8]) >>>> name: (identifier [2, 10] - [2, 11]) >>>> value: (expression_list [2, 14] - [2, 21] >>>> (int_literal [2, 14] - [2, 15]) >>>> (int_literal [2, 17] - [2, 18]) >>>> (int_literal [2, 20] - [2, 21]))))) >>>> >>>> This reproduces with 29.1 as well as 30.0.50. >>> Yuan, any comments or suggestions?
Sorry sorry sorry, another missed report. I think this is a bug in treesit-explore-mode, I'll work on fixing it!
Yuan
From unknown Sat Sep 06 10:20:47 2025 Subject: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields Resent-From: Dominik Honnef Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sun, 10 Dec 2023 14:43:02 +0000 X-GNU-PR-Message: followup 66674 X-GNU-PR-Package: emacs To: Yuan Fu, Eli Zaretskii Cc: 66674@debbugs.gnu.org
From: Dominik Honnef In-Reply-To: <5ad5f956-7533-451b-9815-1710713ee334@gmail.com> References: <87edhnzp9t.fsf@honnef.co> <835y2ukg6p.fsf@gnu.org> <835y1ykqd3.fsf@gnu.org> <83ttpacfps.fsf@gnu.org> <5ad5f956-7533-451b-9815-1710713ee334@gmail.com> Date: Sun, 10 Dec 2023 15:28:38 +0100 Message-ID: <87r0jujfmx.fsf@honnef.co> MIME-Version: 1.0 Content-Type: text/plain
Yuan Fu writes: > On 11/25/23 2:03 AM, Eli Zaretskii wrote: >> Ping! Ping! Yuan, please chime in. >> [...] > > Sorry sorry sorry, another missed report. I think this is a bug in > treesit-explore-mode, I'll work on fixing it! > > Yuan
I don't think that's the case, at least not exclusively. I used treesit-explore-mode to debug patterns that matched in the playground but not in Emacs.
The matching behavior seemed pretty in line with what treesit-explore-mode reported.
From unknown Sat Sep 06 10:20:47 2025 MIME-Version: 1.0 From: help-debbugs@gnu.org (GNU bug Tracking System) To: Dominik Honnef Subject: bug#66674: closed (Re: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields) References: <87edhnzp9t.fsf@honnef.co> X-Gnu-PR-Message: they-closed 66674 X-Gnu-PR-Package: emacs Reply-To: 66674@debbugs.gnu.org Date: Mon, 11 Dec 2023 01:04:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1702256642-14541-1" This is a multi-part message in MIME format... ------------=_1702256642-14541-1 Content-Disposition: inline Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="utf-8" Your bug report #66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields which was filed against the emacs package, has been closed. The explanation is attached below, along with your original report. If you require more details, please reply to 66674@debbugs.gnu.org. -- 66674: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66674 GNU Bug Tracking System Contact help-debbugs@gnu.org with problems ------------=_1702256642-14541-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit
Date: Sun, 10 Dec 2023 17:02:48 -0800 MIME-Version: 1.0 Subject: Re: bug#66674: 30.0.50; Upstream tree-sitter and treesit disagree about fields Content-Language: en-US To: Dominik Honnef, Eli Zaretskii References: <87edhnzp9t.fsf@honnef.co> <835y2ukg6p.fsf@gnu.org> <835y1ykqd3.fsf@gnu.org> <83ttpacfps.fsf@gnu.org> <5ad5f956-7533-451b-9815-1710713ee334@gmail.com> <87r0jujfmx.fsf@honnef.co> From: Yuan Fu In-Reply-To: <87r0jujfmx.fsf@honnef.co> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: 66674-done@debbugs.gnu.org
On 12/10/23 6:28 AM, Dominik Honnef wrote: > Yuan Fu
writes: > >> On 11/25/23 2:03 AM, Eli Zaretskii wrote: >>> Ping! Ping! Yuan, please chime in. >>> [...] >> Sorry sorry sorry, another missed report. I think this is a bug in >> treesit-explore-mode, I'll work on fixing it! >> >> Yuan > I don't think that's the case, at least not exclusively. I used > treesit-explore-mode to debug patterns that matched in the playground > but not in Emacs. The matching behavior seemed pretty in line with what > treesit-explore-mode reported.
I did find that treesit-node-field-name returns wrong field names; that's why, in the first example, you see the "value" field name given to the comment node rather than to the binary_expression after it. In the actual parse tree, "value" belongs to binary_expression. With the fix I just pushed to emacs-29, the explorer parse tree for the first example becomes (translation_unit (function_definition type: (primitive_type) declarator: (function_declarator declarator: (identifier) parameters: (parameter_list ( ))) body: (compound_statement { (declaration type: (primitive_type) declarator: (init_declarator declarator: (identifier) = (comment) value: (binary_expression left: (number_literal) operator: + operator: (comment) right: (number_literal))) ;) }))) which should match the playground. If you can find a pattern that matches in the playground but not in Emacs, please post it and I'll see if anything is wrong.
Yuan
------------=_1702256642-14541-1 Content-Type: message/rfc822 Content-Disposition: inline Content-Transfer-Encoding: 7bit From: Dominik Honnef To: bug-gnu-emacs@gnu.org Subject: 30.0.50; Upstream tree-sitter and treesit disagree about fields Date: Sat, 21 Oct 2023 22:36:30 +0200 Message-ID: <87edhnzp9t.fsf@honnef.co> MIME-Version: 1.0 Content-Type: text/plain
Tree-sitter's CLI and the publicly hosted playground both produce different parse trees than treesit in Emacs. Specifically, the assignment of nodes to named fields differs.
Given the following C source: void main() { int x = // foo 1+ // comment 2; } treesit-explore-mode displays the following tree: (translation_unit (function_definition type: (primitive_type) declarator: (function_declarator declarator: (identifier) parameters: (parameter_list ( ))) body: (compound_statement { (declaration type: (primitive_type) declarator: (init_declarator declarator: (identifier) = value: (comment) (binary_expression left: (number_literal) operator: + right: (comment) (number_literal))) ;) }))) Note how, in the init_declarator node, the 'value' field is assigned to a comment node, and similarly for the 'right' field in the binary_expression node. Running 'tree-sitter parse file.c', on the other hand, produces the following tree: (translation_unit [0, 0] - [6, 0] (function_definition [0, 0] - [5, 1] type: (primitive_type [0, 0] - [0, 4]) declarator: (function_declarator [0, 5] - [0, 11] declarator: (identifier [0, 5] - [0, 9]) parameters: (parameter_list [0, 9] - [0, 11])) body: (compound_statement [0, 12] - [5, 1] (declaration [1, 2] - [4, 6] type: (primitive_type [1, 2] - [1, 5]) declarator: (init_declarator [1, 6] - [4, 5] declarator: (identifier [1, 6] - [1, 7]) (comment [1, 10] - [1, 16]) value: (binary_expression [2, 4] - [4, 5] left: (number_literal [2, 4] - [2, 5]) (comment [3, 4] - [3, 14]) right: (number_literal [4, 4] - [4, 5]))))))) Here, the two comment nodes are not assigned to any field. IMHO the second tree is more useful, as the fields contain the semantically important subtrees (e.g. a binary expression is made up of a left and a right subtree, not a left subtree, a right comment, and then some subtree without a field). With Emacs's tree, writing queries is less convenient: instead of referring to well-defined field names, one has to rely on child indices to account for comments. Further mismatch arises from repeated fields and separators.
Consider the following Go source: package pkg var a, b, c = 1, 2, 3 treesit-explore-mode displays the following tree: (source_file (package_clause package (package_identifier)) \n (var_declaration var (var_spec name: (identifier) name: , (identifier) value: , (identifier) = (expression_list (int_literal) , (int_literal) , (int_literal)))) \n) Here, the var_spec node has only two fields named 'name', even though the source declares three names. Furthermore, the second 'name' field and the 'value' field are both assigned to ',' separators between identifiers, and two of the three identifiers have no field at all. 'tree-sitter parse file.go', on the other hand, produces this more accurate tree: (source_file [0, 0] - [2, 21] (package_clause [0, 0] - [0, 11] (package_identifier [0, 8] - [0, 11])) (var_declaration [2, 0] - [2, 21] (var_spec [2, 4] - [2, 21] name: (identifier [2, 4] - [2, 5]) name: (identifier [2, 7] - [2, 8]) name: (identifier [2, 10] - [2, 11]) value: (expression_list [2, 14] - [2, 21] (int_literal [2, 14] - [2, 15]) (int_literal [2, 17] - [2, 18]) (int_literal [2, 20] - [2, 21]))))) This reproduces with 29.1 as well as 30.0.50. ------------=_1702256642-14541-1--
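A small model can reproduce the field misassignment in the C example of this report. It makes two simplifying assumptions. First, each node records its fields against a structural child index that does not count extra nodes such as comments. Second, the faulty lookup used the raw child position instead. The types child and fieldEntry, the function label, and the field indices are hypothetical illustrations, not code from Emacs or tree-sitter. This is only one mechanism consistent with the C trees in the report, and it is not offered as an explanation of the Go example.

```go
package main

import (
	"fmt"
	"strings"
)

// child is a simplified stand-in for one child of a parse-tree node: the
// text to display and whether the grammar treats it as an "extra" node
// (comments in the C example are modeled as extras).
type child struct {
	text  string
	extra bool
}

// fieldEntry records that the child at a given structural index carries a
// field name. In this model, structural indices do not count extra nodes.
type fieldEntry struct {
	name  string
	index int
}

// fieldAt returns the field name recorded for index, or "" if there is none.
func fieldAt(fields []fieldEntry, index int) string {
	for _, f := range fields {
		if f.index == index {
			return f.name
		}
	}
	return ""
}

// label renders children with their field names, in the style of the trees
// in the report. With skipExtras set, extras get no field and do not advance
// the structural index. Without it, the raw child position is used for the
// lookup, so a field that follows an extra lands on an earlier child.
func label(children []child, fields []fieldEntry, skipExtras bool) string {
	var out []string
	structural := 0
	for raw, c := range children {
		idx := raw
		if skipExtras {
			idx = structural
		}
		name := ""
		if !(skipExtras && c.extra) {
			name = fieldAt(fields, idx)
		}
		if name != "" {
			out = append(out, name+": "+c.text)
		} else {
			out = append(out, c.text)
		}
		if !c.extra {
			structural++
		}
	}
	return strings.Join(out, " ")
}

func main() {
	// init_declarator from `int x = // foo ...`; field indices are assumed.
	initDecl := []child{{"(identifier)", false}, {"=", false}, {"(comment)", true}, {"(binary_expression)", false}}
	initFields := []fieldEntry{{"declarator", 0}, {"value", 2}}
	fmt.Println(label(initDecl, initFields, true))
	// declarator: (identifier) = (comment) value: (binary_expression)
	fmt.Println(label(initDecl, initFields, false))
	// declarator: (identifier) = value: (comment) (binary_expression)

	// binary_expression from `1+ // comment 2`; field indices are assumed.
	binExpr := []child{{"(number_literal)", false}, {"+", false}, {"(comment)", true}, {"(number_literal)", false}}
	binFields := []fieldEntry{{"left", 0}, {"operator", 1}, {"right", 2}}
	fmt.Println(label(binExpr, binFields, true))
	// left: (number_literal) operator: + (comment) right: (number_literal)
	fmt.Println(label(binExpr, binFields, false))
	// left: (number_literal) operator: + right: (comment) (number_literal)
}
```

With skipExtras set, the field placement agrees with the 'tree-sitter parse' output in the report. Without it, the model reproduces the 'value: (comment)' and 'right: (comment)' labels that treesit-explore-mode displayed.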