GNU bug report logs -
#63225
Compiling regexp patterns (and REGEXP_CACHE_SIZE in search.c)
Previous Next
Full log
View this message in rfc822 format
Mattias EngdegÄrd <mattias.engdegard <at> gmail.com> writes:
>> (save-excursion
>> (beginning-of-line 0)
>> (not (looking-at-p "[[:blank:]]*$"))))
>
> I wonder if that last part isn't better written as
>
> (save-excursion
> (forward-line 0) ; faster than beginning-of-line
Fair point. It is probably worth looking through Org sources and
replacing all those `beginning-of-line' and `end-of-line' calls. I doubt
that we even intend to use fields for real.
> (skip-chars-forward "[:blank:]") ; faster than looking-at-p
> (not (eolp))) ; very cheap
Hmm. I feel confused.
Does this imply that simple regexps like
(looking-at-p (rx (seq bol (zero-or-more (any "\t ")) "#" (or " " eol))))
should better be implemented as the following?
(and (bolp)
(skip-chars-forward " \t")
(eq (char-after) ?#) (forward-char)
(or (eolp) (eq (char-after) ?\s)))
(I now start thinking that it might be more efficient to create a bunch
or char tables and step over them to match "regexp", just like any
finite automaton would)
> But yes, I sort of understand what you are getting at (except the business with the MODE parameter which is still a bit mysterious to me).
MODE parameter is used because we constrain what kinds of syntax
elements are allowed inside other. For example, see
`org-element-object-restrictions'. And within
`org-element--current-element', MODE is used, for example, to constrain
planning/property drawer to be only the first child of a parent heading,
parsed earlier.
>> [now part of the giant rx]
>>
>> (rx line-start (0+ (any ?\s ?\t))
>> ":" (1+ (any ?- ?_ word)) ":"
>> (0+ (any ?\s ?\t)) line-end)
>
> Any reason you don't capture the part between the colons here, so that you don't need to match it later on?
That would demand all the callers of `org-element-drawer-parser' to set
match data appropriately. Which is doable, but headache for maintenance.
We sometimes do call parsers explicitly not from inside
`org-element--current-element'.
>> But why? Aren't (in word ?_ ?-) and (or word ?_ ?-) not the same?
>
> "[-_[:word:]]" and "\\w\\|[_-]" indeed match the same thing but they don't generate the same regexp bytecode -- the former is faster. (In this case rx makes a literal translation to those strings but we should probably make it optimise to the faster regexp.)
>
> There is a regexp disassembler for the really curious but it doesn't come with Emacs.
I really hope that I did not need to do all these workarounds specific to
current implementation pitfalls of Emacs regexp compiler.
>>> Maybe, but you still cons each time. (And remember that the plist-get equality funarg is new in Emacs 29.)
>>
>> Sure it does.
>> It is just one of the variable parts of Org syntax that might be
>> changed. There are ways to make this into constant, but it is a fragile
>> area of the code that I do not want to touch without a reason.
>> (Especially given that I am not familiar with org-list.el)
>
> So it's fine to use elisp constructs new in Emacs 29 in Org? Then the line
>
> ;; Package-Requires: ((emacs "26.1"))
>
> in org.el should probably be updated, right?
Nope. It is just that the link I shared is for WIP branch I am
developing using Emacs master. I will ensure backwards compatibility
later. For example, by converting `plist-get' to `assoc'.
>> I hope we do. If only Emacs had a way to define `case-fold-search' right
>> within the regexp itself.
>
> I would like that too, but changing that isn't easy.
I am sure that it is easy.
For example, regexp command may optionally accept a vector with first
element being regexp and second element setting a flag for
`case-fold-search'.
Of course, it is just one, maybe not the best way to implement this.
My suggestion in https://debbugs.gnu.org/cgi/bugreport.cgi?bug=63225#56
is also compatible with this kind of approach.
> By the way, it seems that org-element-node-property-parser binds case-fold-search without actually using it. Bug?
It actually does nothing given that `org-property-re' does not contain
letters. I will remove it.
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
This bug report was last modified 2 years and 37 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.