GNU bug report logs -
#78863
31.0.50; Feature request: add option for greedy looking-back to abbrev-before-point
> Date: Sun, 22 Jun 2025 19:00:51 +0000
> From: Alexander Prähauser <ahprae <at> protonmail.com>
>
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
>
> >> Date: Sun, 22 Jun 2025 16:42:59 +0000
> >> From: Alexander Prähauser <ahprae <at> protonmail.com>
> >> Cc: 78863 <at> debbugs.gnu.org
> >>
> >> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> >>
> >> >> Date: Sun, 22 Jun 2025 14:17:15 +0000
> >> >> From: Alexander Prähauser via "Bug reports for GNU Emacs,
> >> >> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> >> >>
> >> >>
> >> >> This is a feature request: I have a keyboard layout with many Unicode
> >> >> symbols like ∀ and such, which I would like to define abbrevs with, so
> >> >> for instance I would like to define "o∀" to expand to "overall". This
> >> >> doesn't work when the abbrev is isolated using backward-word
> >> >
> >> > This is because by default, word motion stops at the character-script
> >> > boundaries. But you could override that by suitable changes to
> >> > word-combining-categories, which see. Did you try that?
> >>
> >> Yeah, but I wasn't very successful. I changed the syntax-class of ∀ to
> >> "w", which lets `backward-word' consider ∀ a word-constituent, but then
> >> `backward-word' stops between the o and the ∀ in o∀. I tried to set
> >> other syntactic properties of ∀ equal to those of o using
> >> `put-char-code-property' but that didn't work either. Looking into it
> >> deeper I suspected that it was because the two belong to different
> >> categories as you say, so I used `char-category-set' and
> >> `modify-category-entry' to add and remove
> >> the categories of ∀ until it had the same categories as o, but
> >> `forward-word' still stopped between the two characters. I have no idea
> >> why.
> >
> > I told you: it's a feature. See the doc string of forward-word.
> >
> >> At that point I gave up and decided to use a regexp. I'd actually
> >> like to know why it didn't work with all categories set equally but I'm
> > >> a bit out of my depth here. I can read Lisp and use edebug to track what
> > >> happens in Lisp code, but `forward-word' and the function it uses to
> > >> determine word boundaries are C primitives and I know next to no C. I
> >> tried following the source code but, again, I have no clue why it didn't
> >> work after I equalized the categories. Maybe because I only did it for
> >> the category table of the local buffer (which was *scratch*)?
> >
> > As I told you, the way to affect this is to modify the list in
> > word-combining-categories so that a position between latin and symbol
> > script is not considered a border that requires forward-word to stop.
> > Both latin and symbol have known categories (see "M-x describe-categories")
> > so you could use them to customize word-combining-categories. Its doc
> > string is supposed to explain how; feel free to ask questions if it
> > isn't clear enough.
>
> Oh, I see what you mean now! Thanks, I think this is working!
>
> I read the documentation of `word-combining-categories' but what
> confused me was that each character has many categories, so I didn't
> know which one to add (which is why I tried to set them equal until I
> found the right one). But now I see what's meant here:
>
> "Emacs finds no word boundary between characters of different scripts
> if they have categories matching some element of this list.
>
> More precisely, if an element of this list is a cons of category CAT1
> and CAT2, and a multibyte character C1 which has CAT1 is followed by
> C2 which has CAT2, there's no word boundary between C1 and C2."
>
> So if any of the categories of C1 is CAT1 and any of the categories in
> C2 is CAT2 there is no boundary in a string C1C2 but there is one in a
> string C2C1. I think I get it now. Thanks again!
Yes, exactly.
Does this mean we can close this bug? Or would you still like to
discuss the extension of the regexp specifications of abbrevs?
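The rule Alexander summarizes is easier to see with a concrete case. What follows is a toy Python model of the rule as the doc string of `word-combining-categories' states it (quoted above). It is a sketch under stated assumptions, not Emacs's C implementation, which may apply conditions the doc string does not mention. The script names and category letters are hypothetical: `l' stands in for the Latin category and `S' is an invented letter for ∀. Use `M-x describe-char' on a character to see its real categories. The model also ignores `word-separating-categories'.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data for illustration only: the real script names and
# category letters come from Emacs (see `M-x describe-char'), not from here.
SCRIPT: Dict[str, str] = {"o": "latin", "∀": "symbol"}
CATEGORIES: Dict[str, Set[str]] = {
    "o": {"l"},  # assumed: "l" standing in for the Latin category
    "∀": {"S"},  # hypothetical category letter for the symbol
}


def word_boundary(c1: str, c2: str,
                  script: Dict[str, str],
                  categories: Dict[str, Set[str]],
                  combining: List[Tuple[str, str]]) -> bool:
    """Return True if the documented rule puts a word boundary between
    C1 and a following C2 (both assumed to be word constituents)."""
    if script[c1] == script[c2]:
        # Same script: no boundary.  This toy model ignores
        # `word-separating-categories', which can introduce one.
        return False
    # Different scripts: the boundary disappears only if some (CAT1 . CAT2)
    # element has CAT1 among C1's categories and CAT2 among C2's.
    # Order matters, so (CAT1 . CAT2) says nothing about C2 followed by C1.
    for cat1, cat2 in combining:
        if cat1 in categories[c1] and cat2 in categories[c2]:
            return False
    return True


if __name__ == "__main__":
    combining = [("l", "S")]
    print(word_boundary("o", "∀", SCRIPT, CATEGORIES, combining))  # False
    print(word_boundary("∀", "o", SCRIPT, CATEGORIES, combining))  # True
    print(word_boundary("o", "∀", SCRIPT, CATEGORIES, []))         # True
```

The second call shows the asymmetry Alexander points out. An element that joins "o∀" into one word does not join "∀o". To cover both orders, the list would also need the reversed pair.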