Package: emacs;
Reported by: Vincenzo Pupillo <v.pupillo <at> gmail.com>
Date: Fri, 29 Nov 2024 21:58:01 UTC
Severity: wishlist
Fixed in version 31.0.50
Done: Juri Linkov <juri <at> linkov.net>
Bug is archived. No further changes may be made.
From: Yuan Fu <casouri <at> gmail.com>
To: Vincenzo Pupillo <v.pupillo <at> gmail.com>
Cc: 74610 <at> debbugs.gnu.org
Subject: bug#74610: 31.0.50; Submitting mhtml-ts-mode, treesitter alternative to mhtml-mode
Date: Sat, 30 Nov 2024 22:01:21 -0800
> On Nov 29, 2024, at 1:57 PM, Vincenzo Pupillo <v.pupillo <at> gmail.com> wrote:
> 
> Hi,
> following the discussion
> https://lists.gnu.org/archive/html/emacs-devel/2024-11/msg00079.html I would
> like to ask if it would be possible to add to emacs this new mode for editing
> html files alternative to mhtml-mode.
> 
> Thank you.
> 
> Vincenzo.<0001-Add-mhtml-ts-mode.patch>

Thank you so much! This will be very helpful for others. Here’re some comments.

Yuan

From 8a1c792aaddf4daef2808f5a74212a2fb8b0a01e Mon Sep 17 00:00:00 2001
From: Vincenzo Pupillo <v.pupillo <at> gmail.com>
Date: Fri, 29 Nov 2024 22:48:45 +0100
Subject: [PATCH] Add mhtml-ts-mode.

New major-mode alternative to mhtml-mode, based on treesitter, for editing
files containing html, javascript and css.

* etc/NEWS: Mention the new mode.
* lisp/textmodes/mhtml-ts-mode.el: New file.
---
 etc/NEWS                        |   8 +
 lisp/textmodes/mhtml-ts-mode.el | 462 ++++++++++++++++++++++++++++++++
 2 files changed, 470 insertions(+)
 create mode 100644 lisp/textmodes/mhtml-ts-mode.el

diff --git a/etc/NEWS b/etc/NEWS
index 4d2a2c893d0..8f9a04dcf01 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -797,6 +797,14 @@ destination window is chosen using 'display-buffer-alist'. Example:
 
 * New Modes and Packages in Emacs 31.1
 
+** New major modes based on the tree-sitter library
+
++++
+*** New major mode 'mhtml-ts-mode'.
+An optional major mode based on the tree-sitter library for editing html
+files. This mode handles indentation, fontification, and commenting for
+embedded JavaScript and CSS.
+
 
 * Incompatible Lisp Changes in Emacs 31.1
 
diff --git a/lisp/textmodes/mhtml-ts-mode.el b/lisp/textmodes/mhtml-ts-mode.el
new file mode 100644
index 00000000000..b6b220663e3
--- /dev/null
+++ b/lisp/textmodes/mhtml-ts-mode.el
@@ -0,0 +1,100 @@
+;;; mhtml-ts-mode.el --- Major mode for HTML using tree-sitter -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2024 Free Software Foundation, Inc.
+ +;; Author: Vincenzo Pupillo <v.pupillo <at> gmail.com> +;; Maintainer: Vincenzo Pupillo <v.pupillo <at> gmail.com> +;; Created: Nov 2024 +;; Keywords: HTML language tree-sitter + +;; This file is part of GNU Emacs. + +;; GNU Emacs is free software: you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation, either version 3 of the License, or +;; (at your option) any later version. + +;; GNU Emacs is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. + +;;; Commentary: +;; +;; This package provides `mhtml-ts-mode' which is a major mode +;; for editing HTML files with embedded JavaScript and CSS. +;; Tree Sitter is used to parse each of these languages. +;; +;; Please note that this package requires `html-ts-mode', which +;; registers itself as the major mode for editing HTML. +;; +;; This package is compatible and has been tested with the following +;; tree-sitter grammars: +;; * https://github.com/tree-sitter/tree-sitter-html +;; * https://github.com/tree-sitter/tree-sitter-javascript +;; * https://github.com/tree-sitter/tree-sitter-jsdoc +;; * https://github.com/tree-sitter/tree-sitter-css +;; +;; Features +;; +;; * Indent +;; * IMenu +;; * Navigation +;; * Which-function +;; * Tree-sitter parser installation helper + +;;; Code: + +(require 'treesit) +(require 'html-ts-mode) +(require 'css-mode) ;; for embed css into html +(require 'js) ;; for embed javascript into html + +(eval-when-compile + (require 'rx)) + +;; Declare all native functions used by the major mode. +;; This tells the byte-compiler where the functions are defined. 
+(declare-function treesit-node-end "treesit.c")
+(declare-function treesit-node-parent "treesit.c")
+(declare-function treesit-node-start "treesit.c")
+(declare-function treesit-node-type "treesit.c")
+(declare-function treesit-parser-create "treesit.c")
+
+;; In a multi-language major mode can be useful to have an "installer" to
+;; simplify the installation of the grammars supported by the major-mode.
+(defvar mhtml-ts-mode--language-source-alist
+  '((html . ("https://github.com/tree-sitter/tree-sitter-html" "v0.23.0"))
+    (javascript . ("https://github.com/tree-sitter/tree-sitter-javascript" "v0.23.0"))
+    (jsdoc . ("https://github.com/tree-sitter/tree-sitter-jsdoc" "v0.23.0"))
+    (css . ("https://github.com/tree-sitter/tree-sitter-css" "v0.23.0")))
+  "Treesitter language parsers required by `mhtml-ts-mode'.
+You can customize this variable if you want to stick to a specific
+commit and/or use different parsers.")
+
+(defun mhtml-ts-mode-install-parsers ()
+  "Install all the required treesitter parsers.
+`mhtml-ts-mode--language-source-alist' defines which parsers to install."
+  (interactive)
+  (let ((treesit-language-source-alist mhtml-ts-mode--language-source-alist))
+    (dolist (item mhtml-ts-mode--language-source-alist)
+      (treesit-install-language-grammar (car item)))))
+
+;;; Custom variables
+
+(defgroup mhtml-ts-mode nil
+  "Major mode for editing HTML files, based on `html-ts-mode'.
+Works with JS and CSS and for that use `js-ts-mode' and `css-ts-mode'."
+  :prefix "html-ts-mode-"
+  :group 'languages)
+
+(defcustom mhtml-ts-mode-js-css-indent-offset 2
+  "JavaScript and CSS indent spaces related to the <script> and <style> HTML tags.
+By default should have same value as `html-ts-mode-indent-offset'."
+  :tag "HTML javascript or css indent offset"
+  :version "31.1"
+  :type 'integer
+  :safe 'integerp)

It's not uncommon to see different indent offsets for CSS and JavaScript,
so it's a good idea to have separate control for them.

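For instance, something along these lines. The option names `mhtml-ts-mode-js-indent-offset' and `mhtml-ts-mode-css-indent-offset' are only a suggestion (they are not in the patch), and the indent-rule form is the one from the mode body, with each language pointing at its own offset:

```elisp
;; Hypothetical replacement for `mhtml-ts-mode-js-css-indent-offset':
;; one user option per embedded language.  Names are only a suggestion.
(defcustom mhtml-ts-mode-js-indent-offset 2
  "Indentation of JavaScript code relative to the <script> tag."
  :version "31.1"
  :type 'integer
  :safe 'integerp
  :group 'mhtml-ts-mode)

(defcustom mhtml-ts-mode-css-indent-offset 2
  "Indentation of CSS code relative to the <style> tag."
  :version "31.1"
  :type 'integer
  :safe 'integerp
  :group 'mhtml-ts-mode)

;; In the body of `mhtml-ts-mode', each extra rule then refers to its
;; own offset variable (an OFFSET in `treesit-simple-indent-rules' may
;; be a variable whose value is an integer).
(setq-local treesit-simple-indent-rules
            (append html-ts-mode--indent-rules
                    `((javascript ((parent-is "program")
                                   mhtml-ts-mode--js-css-tag-bol
                                   mhtml-ts-mode-js-indent-offset)
                                  ,@(cdr (car js--treesit-indent-rules))))
                    `((css ((parent-is "stylesheet")
                            mhtml-ts-mode--js-css-tag-bol
                            mhtml-ts-mode-css-indent-offset)
                           ,@(cdr (car css--treesit-indent-rules))))))
```

Note that `mhtml-ts-mode-tag-relative-indent' currently works by changing the single internal copy `mhtml-ts-mode--js-css-indent-offset', so that internal variable would have to be split per language in the same way; the sketch above leaves that part out.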
+ +(defvar mhtml-ts-mode--js-css-indent-offset + mhtml-ts-mode-js-css-indent-offset + "Internal copy of `mhtml-ts-mode-js-css-indent-offset'. +The value changes, by `mhtml-ts-mode--tag-relative-indent-offset' according to +the value of `mhtml-ts-mode-tag-relative-indent'.") + +(defun mhtml-ts-mode--tag-relative-indent-offset (sym val) + "Custom setter for `mhtml-ts-mode-tag-relative-indent'. + +Apart from setting the default value of SYM to VAL, also change the +value of SYM in `mhtml-ts-mode' buffers to VAL. SYM should be +`mhtml-ts-mode-tag-relative-indent', and VAL should be t, nil or +`ignore'. When sym is `mhtml-ts-mode-tag-relative-indent' set the +value of `mhtml-ts-mode--js-css-indent-offset' to 0 if VAL is t, +otherwise to `mhtml-ts-mode-js-css-indent-offset'." + (set-default sym val) + (when (eq sym 'mhtml-ts-mode-tag-relative-indent) + (setq-local + mhtml-ts-mode--js-css-indent-offset + (if (eq val t) + mhtml-ts-mode-js-css-indent-offset + 0)))) + +(defcustom mhtml-ts-mode-tag-relative-indent t + "How <script> and <style> bodies are indented relative to the tag. + +When t, indentation looks like: + + <script> + code(); + </script> + +When nil, indentation of the script body starts just below the +tag, like: + + <script> + code(); + </script> + +When `ignore', the script body starts in the first column, like: + + <script> +code(); + </script>" + :type '(choice (const nil) (const t) (const ignore)) + :safe 'symbolp + :set #'mhtml-ts-mode--tag-relative-indent-offset + :version "31.1") + +(defcustom mhtml-ts-mode-css-fontify-colors t + "Whether CSS colors should be fontified using the color as the background. +If non-nil, text representing a CSS color will be fontified +such that its background is the color itself. +Works like `css--fontify-region'." + :tag "HTML colors the CSS properties values." 
+  :version "31.1"
+  :type 'boolean
+  :safe 'booleanp)
+
+;; To enable some basic treesiter functionality, you should define
+;; a function that recognizes which grammar is used at-point.
+;; This function should be assigned to `treesit-language-at-point-function'
+(defun mhtml-ts-mode--language-at-point (point)
+  "Return the language at POINT assuming the point is within a HTML buffer."
+  (let* ((node (treesit-node-at point 'html))
+         (parent (treesit-node-parent node))
+         (node-query (format "(%s (%s))"
+                             (treesit-node-type parent)
+                             (treesit-node-type node))))
+    (cond
+     ((string-equal "(script_element (raw_text))" node-query) 'javascript)
+     ((string-equal "(style_element (raw_text))" node-query) 'css)
+     (t 'html))))
+
+;; Sometimes you need to override some property attached to a node.
+;; The signature of the function should be conforming to signature
+;; QUERY-SPEC required by `treesit-font-lock-rules'.

"property attached to a node" is vague. I would just say

;; Custom font-lock function that's used to apply color to CSS color
;; values. This function is used below where we define font-lock rules.

+(defun mhtml-ts-mode--colorize-css-value (node override start end &rest _)
+  "Colorize CSS property value like `css--fontify-region'.
+For NODE, OVERRIDE, START, and END, see `treesit-font-lock-rules'."
+  (if (and mhtml-ts-mode-css-fontify-colors
+           (string-equal "plain_value" (treesit-node-type node)))
+      (let ((color (css--compute-color start (treesit-node-text node t))))
+        (when color
+          (with-silent-modifications
+            (add-text-properties
+             (treesit-node-start node) (treesit-node-end node)
+             (list 'face (list :background color
+                               :foreground (readable-foreground-color
+                                            color)
+                               :box '(:line-width -1)))))))
+    (treesit-fontify-with-override
+     (treesit-node-start node) (treesit-node-end node)
+     'font-lock-variable-name-face
+     override start end)))
+
+;; Embedded languages should be indented according to the language
+;; that embeds them.
+;; This function signature complies with `treesit-simple-indent-rules'
+;; ANCHOR.
+(defun mhtml-ts-mode--js-css-tag-bol (_node _parent &rest _)
+  "Find the first non-space characters of html tags <script> or <style>.
+Return `line-beginning-position' when `treesit-node-at' is html, or
+`mhtml-ts-mode-tag-relative-indent' is equal to ignore.
+NODE and PARENT are ignored."
+  (if (or (eq (treesit-language-at (point)) 'html)
+          (eq mhtml-ts-mode-tag-relative-indent 'ignore))
+      (line-beginning-position)
+    ;; Ok, we are in js or css block.
+    (save-excursion
+      (re-search-backward "<script.*>\\|<style.*>" nil t))))
+
+;; Treesit supports 4 level of decoration, `treesit-font-lock-level'
+;; define which level use. Major-modes categorize their fontification
+;; features, these categories are defined by `treesit-font-lock-rules' of
+;; each major-mode using :feature keyword.
+;; In a multiple language major-mode it's a good idea to provvide, for each
+;; level, the union of the :feature of the same level.

"which level to use", "Major modes", and "provide"

+(defvar mhtml-ts-mode--feature-list
+  '(;; level 1
+    (;; common
+     comment definition
+     ;; JS specific
+     document
+     ;; CSS specific
+     query selector)
+    ;; level 2
+    (keyword name property string type)
+    ;; level 3
+    (;; common
+     attribute assignment constant escape-sequence
+     base-clause literal variable-name variable
+     ;; Javascript specific
+     jsx number pattern string-interpolation)
+    ;; level 4
+    (bracket delimiter error operator function)))
+
+;; In order to support wich-fuction-mode we should define

"which-function-mode"

+;; a function that return the defun name.
+;; In a multilingual treesit mode, this can be implemented simply by
+;; calling language-specific functions.
+(defun mhtml-ts-mode--defun-name (node)
+  "Return the defun name of NODE.
+Return nil if there is no name or if NODE is not a defun node."
+  ;; (message "node type ""%s""" (treesit-node-type node))
+  (let ((lang (mhtml-ts-mode--language-at-point (point))))
+    (cond
+     ((eq lang 'html) (html-ts-mode--defun-name node))
+     ((eq lang 'javascript) (js--treesit-defun-name node))
+     ((eq lang 'css) (css--treesit-defun-name node)))))
+
+(define-derived-mode mhtml-ts-mode html-mode
+  '("HTML+" (:eval (let ((lang (mhtml-ts-mode--language-at-point (point))))
+                     (cond ((eq lang 'html) "")
+                           ((eq lang 'javascript) "JS")
+                           ((eq lang 'css) "CSS")))))
+  "Major mode for editing HTML with embedded JavaScript and CSS.
+Powered by tree-sitter."
+  (if (not (and
+            (treesit-ready-p 'html)
+            (treesit-ready-p 'javascript)
+            (treesit-ready-p 'css)))
+      (error "Tree-sitter parsers for HTML isn't
+  available. You can install the parsers with M-x
+  `mhtml-ts-mode-install-parsers'")
+
+    ;; When an language is embedded, you should initialize some variable
+    ;; just like it's done in the original mode.
+
+    ;; Comment.
+    ;; indenting settings for js-ts-mode.
+    (c-ts-common-comment-setup)
+    (setq-local comment-multi-line t)
+
+    ;; Font-lock.
+
+    ;; There are two kind of treesitter parser:
+    ;;   1. global parser
+    ;;   2. local parser
+    ;; The global parser considers each piece of text,
+    ;; in a multilingual buffer, as if it were a single buffer in its
+    ;; own language.  Local parsers, on the other hand, consider each
+    ;; piece of text, in a multilingual buffer, as if they were
+    ;; separate buffers.
+    ;; In a multilingual buffer you should create only global ones.
+    ;; Local ones are created automatically.
+    ;; Warning: do not create a local parser! It may cause side
+    ;; effects that are difficult to handle.
+
+  ;; There are two types of treesitter parsers:
+  ;;   1. global parsers
+  ;;   2. local parsers
+  ;; A global parser treats each piece of text,
+  ;; in a multilingual buffer, as if it were a single buffer in its
+  ;; language. Local parser, on the other hand, treat each
+  ;; piece of text, in a multilingual buffer, as if they were separate buffers.
+  ;; In a multilingual buffer you should only create global ones.
+  ;; The local ones are created automatically.
+  ;; Warning: do not create a local parser! It may cause side effects that are difficult to handle.

Seems like a duplicate? And I want to highlight the fact that we're
talking about embedded parsers here, so I would say:

There are two ways to handle embedded code:

1. Use a single parser for all the embedded code in the buffer. In this
   case, the embedded code blocks are concatenated together and are seen
   by the parser as a single continuous document.

2. Each embedded code block gets its own parser. Each parser only sees
   that particular code block.

If you go with 2 for a language, the local parsers are created and
destroyed automatically by Emacs. So don't create a global parser for
that embedded language here.

+
+  ;; Create the parsers, only the global one.

"only global ones", I think

+  ;; jsdoc is a local parser, don't create a parser for it.
+  (treesit-parser-create 'css)
+  (treesit-parser-create 'javascript)
+
+  ;; jsdoc is not mandatory for js-ts-mode, so we respect this by
+  ;; adding jsdoc range rules only when jsdoc is available.
+  (if (treesit-ready-p 'jsdoc t)
+      (setq-local treesit-range-settings
+                  (treesit-range-rules
+                   :embed 'javascript
+                   :host 'html
+                   :offset '(1 . -1)
+                   '((script_element
+                      (start_tag (tag_name))
+                      (raw_text) @cap))
+
+                   :embed 'jsdoc
+                   :host 'javascript
+                   :local t
+                   `(((comment) @cap
+                      (:match ,js--treesit-jsdoc-beginning-regexp @cap)))
+
+                   :embed 'css
+                   :host 'html
+                   :offset '(1 . -1)
+                   '((style_element
+                      (start_tag (tag_name))
+                      (raw_text) @cap))))
+    (setq-local treesit-range-settings
+                (treesit-range-rules
+                 :embed 'javascript
+                 :host 'html
+                 :offset '(1 . -1)
+                 '((script_element
+                    (start_tag (tag_name))
+                    (raw_text) @cap))
+
+                 :embed 'css
+                 :host 'html
+                 :offset '(1 . -1)
+                 '((style_element
+                    (start_tag (tag_name))
+                    (raw_text) @cap)))))

You can create range rules for each language and append them; that way
you don't need to duplicate the code here. It's just like the font-lock
settings.

+
+  ;; Many treesit fuctions need to know the language at-point.
+  ;; So you should define such a function.
+  (setq-local treesit-language-at-point-function #'mhtml-ts-mode--language-at-point)
+
+  ;; Indent.
+
+  ;; Since mhtl-ts-mode inherits indentation rules from html-ts-mode, js
+  ;; and css, if you want to change the offset you have to act on the
+  ;; *-offset variables defined for those languages.
+
+  ;; JavaScript and CSS must be indented relative to their code block.
+  ;; This is done by inserting a special rule before the normal
+  ;; indentation rules of these languages.
+  ;; The value of mhtml-ts-mode--js-css-indent-offset changes based on
+  ;; mhtml-ts-mode-tag-relative-indent and can be used to indent
+  ;; JavaScript and CSS code relative to the HTML that contains them,
+  ;; just like in mhtml-mode.
+  (setq-local treesit-simple-indent-rules
+              (append html-ts-mode--indent-rules
+                      ;; Extended rules for js and css, to
+                      ;; indent appropriately when injected
+                      ;; into html
+                      `((javascript ((parent-is "program")
+                                     mhtml-ts-mode--js-css-tag-bol
+                                     mhtml-ts-mode--js-css-indent-offset)
+                                    ,@(cdr (car js--treesit-indent-rules))))
+                      `((css ((parent-is "stylesheet")
+                              mhtml-ts-mode--js-css-tag-bol
+                              mhtml-ts-mode--js-css-indent-offset)
+                             ,@(cdr (car css--treesit-indent-rules))))))
+  ;; Navigation.
+
+  ;; This regular expression tells treesit how to match the node type
+  ;; of defun nodes.
+  ;; Used by `treesit-beginning-of-defun' and friends for
+  ;; navigations.
+  (setq-local treesit-defun-type-regexp
+              (rx (or
+                   ;; Javascript
+                   "class_declaration"
+                   "method_definition"
+                   "function_declaration"
+                   "lexical_declaration"
+                   ;; HTML
+                   "element"
+                   ;; CSS
+                   "rule_set")))

You can actually define a defun "thing" in treesit-thing-settings, and it
should work the same.

+  ;; This is for finding defun name, it's used by IMenu as default
+  ;; function no specific functions are defined.
+  (setq-local treesit-defun-name-function #'mhtml-ts-mode--defun-name)
+
+  ;; Define what are 'thing' for treesit.
+  ;; 'Thing' is a symbol representing the thing, like `defun', `sexp', or
+  ;; `sentence'.
+  (setq-local treesit-thing-settings
+              `((html
+                 (sexp ,(regexp-opt '("element"
+                                      "text"
+                                      "attribute"
+                                      "value")))
+                 (sentence "tag")
+                 (text ,(regexp-opt '("comment" "text"))))
+                (javascript
+                 (sexp ,(js--regexp-opt-symbol js--treesit-sexp-nodes))
+                 (sentence ,(js--regexp-opt-symbol js--treesit-sentence-nodes))
+                 (text ,(js--regexp-opt-symbol '("comment"
+                                                 "string_fragment"))))))
+
+  ;; Font-lock.
+
+  ;; In a multi-language scenario, font lock settings are usually a
+  ;; concatenation of language rules. As you can see, it is possible
+  ;; to extend/modify the default rule or use a different set of
+  ;; rules. See `php-ts-mode--custom-html-font-lock-settings' for more
+  ;; advanced usage.
+  (setq-local treesit-font-lock-settings
+              (append html-ts-mode--font-lock-settings
+                      js--treesit-font-lock-settings
+                      (append
+                       ;; Rule for coloring CSS property values.
+                       ;; Placed before `css--treesit-settings'
+                       ;; to win against the same rule contained therein.
+                       (treesit-font-lock-rules
+                        :language 'css
+                        :override t
+                        :feature 'variable
+                        '((plain_value) @mhtml-ts-mode--colorize-css-value))
+                       css--treesit-settings)))
+
+  ;; Tells treesit the list of features to fontify.
+  (setq-local treesit-font-lock-feature-list mhtml-ts-mode--feature-list)
+
+  ;; Imenu
+
+  ;; Setup Imenu: if no function is specified, try to find an object
+  ;; using `treesit-defun-name-function'.
+  ;; TODO: we need to see if it is possible to extend Imenu to
+  ;; embedded languages as well.
+  (setq-local treesit-simple-imenu-settings
+              `(("Element" "\\`tag_name\\'" nil nil)))
+
+  ;; This should be the last thing to do.
+  ;; Treesit tries to find out what the primary language is, but it is better
+  ;; to say it explicitly.

Correction: multi-language modes must set the primary parser
explicitly; the auto-guessing trick only works for single-language
modes. You can also move this line next to where you created the other
parsers, for better readability.

+  (setq-local treesit-primary-parser (treesit-parser-create 'html))
+  (treesit-font-lock-recompute-features)

You don't need to call treesit-font-lock-recompute-features;
treesit-major-mode-setup will do that for you.

+  (treesit-major-mode-setup)))
+
+(when (and (treesit-ready-p 'html) (treesit-ready-p 'javascript) (treesit-ready-p 'css))
+  (add-to-list
+   'auto-mode-alist '("\\.[sx]?html?\\(\\.[a-zA-Z_]+\\)?\\'" . mhtml-ts-mode)))
+
+(provide 'mhtml-ts-mode)
+;;; mhtml-ts-mode.el ends here
-- 
2.47.1
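To put the comments about the range settings, the defun "thing", and the primary parser together, here is a rough, untested sketch of how those parts of the mode body could look. It reuses names from the patch (`js--treesit-jsdoc-beginning-regexp', `js--regexp-opt-symbol', `js--treesit-sexp-nodes', `js--treesit-sentence-nodes') and assumes, as said above, that treesit navigation falls back to the `defun' thing when `treesit-defun-type-regexp' is not set:

```elisp
;; Sketch of parts of the `mhtml-ts-mode' body, not a drop-in
;; replacement.

;; Create the global parsers together.  A multi-language mode has to
;; set the primary parser explicitly.
(setq-local treesit-primary-parser (treesit-parser-create 'html))
(treesit-parser-create 'css)
(treesit-parser-create 'javascript)
;; No parser is created for jsdoc: its range rule below uses `:local t',
;; so Emacs creates and deletes those parsers itself.

;; One `treesit-range-rules' call per language, appended together,
;; instead of two nearly identical calls.
(setq-local treesit-range-settings
            (append
             (treesit-range-rules
              :embed 'javascript
              :host 'html
              :offset '(1 . -1)
              '((script_element
                 (start_tag (tag_name))
                 (raw_text) @cap)))
             ;; jsdoc stays optional, as in the patch; `append' skips nil.
             (when (treesit-ready-p 'jsdoc t)
               (treesit-range-rules
                :embed 'jsdoc
                :host 'javascript
                :local t
                `(((comment) @cap
                   (:match ,js--treesit-jsdoc-beginning-regexp @cap)))))
             (treesit-range-rules
              :embed 'css
              :host 'html
              :offset '(1 . -1)
              '((style_element
                 (start_tag (tag_name))
                 (raw_text) @cap)))))

;; Defun nodes declared per language as a `defun' thing, in place of
;; the single mixed `treesit-defun-type-regexp'.
(setq-local treesit-thing-settings
            `((html
               (defun "element")
               (sexp ,(regexp-opt '("element" "text" "attribute" "value")))
               (sentence "tag")
               (text ,(regexp-opt '("comment" "text"))))
              (javascript
               (defun ,(rx (or "class_declaration"
                               "method_definition"
                               "function_declaration"
                               "lexical_declaration")))
               (sexp ,(js--regexp-opt-symbol js--treesit-sexp-nodes))
               (sentence ,(js--regexp-opt-symbol js--treesit-sentence-nodes))
               (text ,(js--regexp-opt-symbol '("comment" "string_fragment"))))
              (css
               (defun "rule_set"))))

;; ... indentation, font-lock and Imenu settings as in the patch ...

;; `treesit-major-mode-setup' recomputes the font-lock features itself,
;; so no explicit `treesit-font-lock-recompute-features' call.
(treesit-major-mode-setup)
```

A side benefit of the per-language `defun' entries is that each regexp is only matched against node types of its own grammar, rather than one regexp mixing JavaScript, HTML and CSS node names.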