GNU bug report logs - #78863
31.0.50; Feature request: add option for greedy looking-back to abbrev-before-point

Package: emacs;

Reported by: Alexander Prähauser <ahprae <at> protonmail.com>

Date: Sun, 22 Jun 2025 14:18:02 UTC

Severity: wishlist

Found in version 31.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 78863 in the body.
You can then email your comments to 78863 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#78863; Package emacs. (Sun, 22 Jun 2025 14:18:02 GMT)

Acknowledgement sent to Alexander Prähauser <ahprae <at> protonmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 22 Jun 2025 14:18:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Alexander Prähauser <ahprae <at> protonmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 31.0.50; Feature request: add option for greedy looking-back to
 abbrev-before-point
Date: Sun, 22 Jun 2025 14:17:15 +0000

This is a feature request: I have a keyboard layout with many Unicode
symbols like ∀ and such, which I would like to define abbrevs with, so
for instance I would like to define "o∀" to expand to "overall". This
doesn't work when the abbrev is isolated using backward-word, but can be
implemented using the :regexp property of `define-abbrev-table'. I would
like to define many such mixed abbrevs this way, using a regexp like

(rx (group (* (not (or space "-" "_")))))

to isolate abbrevs. The trouble is that `abbrev--before-point' uses

(looking-back re (line-beginning-position))

to isolate an abbrev when a regexp is given, and without the GREEDY
argument `looking-back' stops at the first (shortest) match ending at
point. From my understanding this could be fixed fairly easily by
providing an optional argument to `expand-abbrev' that sets the GREEDY
argument of `looking-back', which would make it match greedily. That
would probably make `looking-back' even slower, but it would be
optional and a modern machine should be able to handle the matching.
It would also make the :regexp property of abbrev tables a lot more
useful.
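
The difference can be reproduced in isolation. The following is a
minimal sketch, not code from abbrev.el: the helper name
`my/name-before-point' is hypothetical, and the regexp is the one
proposed above. Without GREEDY the backward search already succeeds at
point with an empty group 1; with GREEDY the match is extended backward
until a separator is reached.

```elisp
(defun my/name-before-point (text regexp &optional greedy)
  "Insert TEXT in a temporary buffer and match REGEXP backward from its end.
Return the text of submatch 1, or nil if REGEXP does not match.
GREEDY is passed through to `looking-back'."
  (with-temp-buffer
    (insert text)
    ;; Same call shape as in the report, plus the optional GREEDY flag.
    (when (looking-back regexp (line-beginning-position) greedy)
      (match-string 1))))

(let ((re (rx (group (* (not (or space "-" "_")))))))
  (list (my/name-before-point "foo o∀" re)      ; default behavior
        (my/name-before-point "foo o∀" re t)))  ; with GREEDY
```

Evaluating the final form should give an empty string for the first
element and "o∀" for the second, which is why no abbrev is found in the
default case.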

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78863; Package emacs. (Sun, 22 Jun 2025 14:57:02 GMT)

Message #8 received at 78863 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alexander Prähauser <ahprae <at> protonmail.com>
Cc: 78863 <at> debbugs.gnu.org
Subject: Re: bug#78863: 31.0.50;
 Feature request: add option for greedy looking-back to
 abbrev-before-point
Date: Sun, 22 Jun 2025 17:55:35 +0300

> Date: Sun, 22 Jun 2025 14:17:15 +0000
> From:  Alexander Prähauser via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> 
> 
> This is a feature request: I have a keyboard layout with many Unicode
> symbols like ∀ and such, which I would like to define abbrevs with, so
> for instance I would like to define "o∀" to expand to "overall". This
> doesn't work when the abbrev is isolated using backward-word

This is because by default, word motion stops at the character-script
boundaries.  But you could override that by suitable changes to
word-combining-categories, which see.  Did you try that?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78863; Package emacs. (Sun, 22 Jun 2025 15:02:02 GMT)

Message #11 received at 78863 <at> debbugs.gnu.org:

From: Alexander Prähauser <ahprae <at> protonmail.com>
To: 78863 <at> debbugs.gnu.org
Subject: Re: Feature request: add option for greedy looking-back to
 abbrev-before-point
Date: Sun, 22 Jun 2025 15:01:14 +0000

This is really stupid, but I just realized that I can use something like

(rx (or space "-" "_" "\n" line-start string-start)
      (group (* (not (or space "-" "_" "\n")))))

for matching, which works. Sorry for overlooking that. So that kind of
lessens the importance of adding an option along the lines I described,
though it might make it less confusing for people wondering why their
abbrevs don't seem to match.
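
For reference, a sketch of how such a regexp might be attached to a
table. The table name `my-symbol-abbrev-table' and its single entry are
illustrative assumptions; the :regexp property and the use of submatch 1
as the abbrev name are as documented for `define-abbrev-table'. The
leading alternative anchors the match on a separator or line start, so
the first match found going backward already covers the whole candidate
name.

```elisp
;; Hypothetical table; only the :regexp value comes from the message above.
(define-abbrev-table 'my-symbol-abbrev-table
  '(("o∀" "overall"))
  "Abbrevs that mix letters and symbols (illustrative)."
  :regexp (rx (or space "-" "_" "\n" line-start string-start)
              (group (* (not (or space "-" "_" "\n"))))))

;; Assumption: the table is used as the local table of the current buffer.
(setq local-abbrev-table my-symbol-abbrev-table)
```

With point right after o∀, `M-x expand-abbrev' (bound to C-x ') is the
direct way to try it; automatic expansion in Abbrev mode additionally
depends on the syntax of the characters involved.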

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78863; Package emacs. (Sun, 22 Jun 2025 16:44:02 GMT)

Message #14 received at 78863 <at> debbugs.gnu.org:

From: Alexander Prähauser <ahprae <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78863 <at> debbugs.gnu.org
Subject: Re: bug#78863: 31.0.50;
 Feature request: add option for greedy looking-back to
 abbrev-before-point
Date: Sun, 22 Jun 2025 16:42:59 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

>> Date: Sun, 22 Jun 2025 14:17:15 +0000
>> From:  Alexander Prähauser via "Bug reports for GNU Emacs,
>>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>>
>>
>> This is a feature request: I have a keyboard layout with many Unicode
>> symbols like ∀ and such, which I would like to define abbrevs with, so
>> for instance I would like to define "o∀" to expand to "overall". This
>> doesn't work when the abbrev is isolated using backward-word
>
> This is because by default, word motion stops at the character-script
> boundaries.  But you could override that by suitable changes to
> word-combining-categories, which see.  Did you try that?

Yeah, but I wasn't very successful. I changed the syntax-class of ∀ to
"w", which lets `backward-word' consider ∀ a word-constituent, but then
`backward-word' stops between the o and the ∀ in o∀. I tried to set
other syntactic properties of ∀  equal to those of o using
`put-char-code-property' but that didn't work either. Looking into it
deeper I suspected that it was because the two belong to different
categories as you say, so I used `char-category-set' and
`modify-category-entry' to add and remove
the categories of ∀ until it had the same categories as o, but
`forward-word' still stopped between the two characters. I have no idea
why. At that point I gave up and decided to use a regexp. I'd actually
like to know why it didn't work with all categories set equally but I'm
a bit out of my depth here. I can read lisp and use edebug to track what
happens in lisp-code, but `forward-word' and the function it uses to
determine word-boundaries are C-primitives and I know next to no C. I
tried following the source code but, again, I have no clue why it didn't
work after I equalized the categories. Maybe because I only did it for
the category table of the local buffer (which was *scratch*)?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78863; Package emacs. (Sun, 22 Jun 2025 18:20:02 GMT)

Message #17 received at 78863 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alexander Prähauser <ahprae <at> protonmail.com>
Cc: 78863 <at> debbugs.gnu.org
Subject: Re: bug#78863: 31.0.50;
 Feature request: add option for greedy looking-back to
 abbrev-before-point
Date: Sun, 22 Jun 2025 21:18:59 +0300

> Date: Sun, 22 Jun 2025 16:42:59 +0000
> From: Alexander Prähauser <ahprae <at> protonmail.com>
> Cc: 78863 <at> debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> Date: Sun, 22 Jun 2025 14:17:15 +0000
> >> From:  Alexander Prähauser via "Bug reports for GNU Emacs,
> >>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> >>
> >>
> >> This is a feature request: I have a keyboard layout with many Unicode
> >> symbols like ∀ and such, which I would like to define abbrevs with, so
> >> for instance I would like to define "o∀" to expand to "overall". This
> >> doesn't work when the abbrev is isolated using backward-word
> >
> > This is because by default, word motion stops at the character-script
> > boundaries.  But you could override that by suitable changes to
> > word-combining-categories, which see.  Did you try that?
> 
> Yeah, but I wasn't very successful. I changed the syntax-class of ∀ to
> "w", which lets `backward-word' consider ∀ a word-constituent, but then
> `backward-word' stops between the o and the ∀ in o∀. I tried to set
> other syntactic properties of ∀  equal to those of o using
> `put-char-code-property' but that didn't work either. Looking into it
> deeper I suspected that it was because the two belong to different
> categories as you say, so I used `char-category-set' and
> `modify-category-entry' to add and remove
> the categories of ∀ until it had the same categories as o, but
> `forward-word' still stopped between the two characters. I have no idea
> why.

I told you: it's a feature.  See the doc string of forward-word.

> At that point I gave up and decided to use a regexp. I'd actually
> like to know why it didn't work with all categories set equally but I'm
> a bit out of my depth here. I can read lisp and use edebug to track what
> happens in lisp-code, but `forward-word' and the function it uses to
> determine word-boundaries are C-primitives and I know next to no C. I
> tried following the source code but, again, I have no clue why it didn't
> work after I equalized the categories. Maybe because I only did it for
> the category table of the local buffer (which was *scratch*)?

As I said, the way to affect this is to modify the list in
word-combining-categories so that a position between latin and symbol
script is not considered a border that requires forward-word to stop.
Both latin and symbol have known categories (see "M-x describe-categories")
so you could use them to customize word-combining-categories.  Its doc
string is supposed to explain how; feel free to ask questions if it
isn't clear enough.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78863; Package emacs. (Mon, 23 Jun 2025 11:21:02 GMT)

Message #20 received at 78863 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alexander Prähauser <ahprae <at> protonmail.com>
Cc: 78863 <at> debbugs.gnu.org
Subject: Re: bug#78863: 31.0.50;
 Feature request: add option for greedy looking-back to
 abbrev-before-point
Date: Mon, 23 Jun 2025 14:19:45 +0300

[Please use Reply All to reply, to keep the bug tracker CC'd.]

> Date: Sun, 22 Jun 2025 19:00:51 +0000
> From: Alexander Prähauser <ahprae <at> protonmail.com>
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> >> Date: Sun, 22 Jun 2025 16:42:59 +0000
> >> From: Alexander Prähauser <ahprae <at> protonmail.com>
> >> Cc: 78863 <at> debbugs.gnu.org
> >>
> >> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> >>
> >> >> Date: Sun, 22 Jun 2025 14:17:15 +0000
> >> >> From:  Alexander Prähauser via "Bug reports for GNU Emacs,
> >> >>  the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
> >> >>
> >> >>
> >> >> This is a feature request: I have a keyboard layout with many Unicode
> >> >> symbols like ∀ and such, which I would like to define abbrevs with, so
> >> >> for instance I would like to define "o∀" to expand to "overall". This
> >> >> doesn't work when the abbrev is isolated using backward-word
> >> >
> >> > This is because by default, word motion stops at the character-script
> >> > boundaries.  But you could override that by suitable changes to
> >> > word-combining-categories, which see.  Did you try that?
> >>
> >> Yeah, but I wasn't very successful. I changed the syntax-class of ∀ to
> >> "w", which lets `backward-word' consider ∀ a word-constituent, but then
> >> `backward-word' stops between the o and the ∀ in o∀. I tried to set
> >> other syntactic properties of ∀  equal to those of o using
> >> `put-char-code-property' but that didn't work either. Looking into it
> >> deeper I suspected that it was because the two belong to different
> >> categories as you say, so I used `char-category-set' and
> >> `modify-category-entry' to add and remove
> >> the categories of ∀ until it had the same categories as o, but
> >> `forward-word' still stopped between the two characters. I have no idea
> >> why.
> >
> > I told you: it's a feature.  See the doc string of forward-word.
> >
> >> At that point I gave up and decided to use a regexp. I'd actually
> >> like to know why it didn't work with all categories set equally but I'm
> >> a bit out of my depth here. I can read lisp and use edebug to track what
> >> happens in lisp-code, but `forward-word' and the function it uses to
> >> determine word-boundaries are C-primitives and I know next to no C. I
> >> tried following the source code but, again, I have no clue why it didn't
> >> work after I equalized the categories. Maybe because I only did it for
> >> the category table of the local buffer (which was *scratch*)?
> >
> > As I said, the way to affect this is to modify the list in
> > word-combining-categories so that a position between latin and symbol
> > script is not considered a border that requires forward-word to stop.
> > Both latin and symbol have known categories (see "M-x describe-categories")
> > so you could use them to customize word-combining-categories.  Its doc
> > string is supposed to explain how; feel free to ask questions if it
> > isn't clear enough.
> 
> Oh, I see what you mean now! Thanks, I think this is working!
> 
> I read the documentation of `word-combining-categories' but what
> confused me was that each character has many categories, so I didn't
> know which one to add (which is why I tried to set them equal until I
> found the right one). But now I see what's meant here:
> 
> "Emacs finds no word boundary between characters of different scripts
> if they have categories matching some element of this list.
> 
> More precisely, if an element of this list is a cons of category CAT1
> and CAT2, and a multibyte character C1 which has CAT1 is followed by
> C2 which has CAT2, there's no word boundary between C1 and C2."
> 
> So if any of the categories of C1 is CAT1 and any of the categories in
> C2 is CAT2 there is no boundary in a string C1C2 but there is one in a
> string C2C1. I think I get it now. Thanks again!

Yes, exactly.

Does this mean we can close this bug?  Or would you still like to
discuss the extension of the regexp specifications of abbrevs?
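
A sketch of the kind of setup this amounts to, under stated assumptions
and not the configuration actually used in this report: ∀ is given word
syntax in the current buffer, a fresh category is defined for it in the
standard category table (the category docstring and the choice of a new
category over an existing one are illustrative), and the pair
(?l . NEW-CATEGORY) is added to `word-combining-categories', ?l being
the Latin category carried by ASCII letters.

```elisp
;; Assumption: ∀ must be a word constituent before word motion
;; will treat it as part of a word at all.
(modify-syntax-entry ?∀ "w")

;; Define an otherwise unused category and put ∀ in it.
(let ((cat (get-unused-category (standard-category-table))))
  (define-category cat "Abbrev symbols (illustrative)"
                   (standard-category-table))
  (modify-category-entry ?∀ cat (standard-category-table))
  ;; A Latin character followed by such a symbol: no word boundary.
  (add-to-list 'word-combining-categories (cons ?l cat)))
```

As the quoted doc string implies, the pair is ordered: this entry
covers o∀ but not ∀o, which would need a second entry with the two
categories swapped. Evaluating the block more than once allocates a new
category each time, so it is meant to be run once per session.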

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78863; Package emacs. (Mon, 23 Jun 2025 16:28:03 GMT)

Message #23 received at 78863 <at> debbugs.gnu.org:

From: Alexander Prähauser <ahprae <at> protonmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78863 <at> debbugs.gnu.org
Subject: Re: bug#78863: 31.0.50;
 Feature request: add option for greedy looking-back to
 abbrev-before-point
Date: Mon, 23 Jun 2025 16:26:55 +0000

"Eli Zaretskii" <eliz <at> gnu.org> writes:

> [Please use Reply All to reply, to keep the bug tracker CC'd.]

Oh, sorry, I meant to.

> Does this mean we can close this bug?  Or would you still like to
> discuss the extension of the regexp specifications of abbrevs?

You can as far as I'm concerned. I'm legitimately unsure whether it
would be useful for other people to have the option I described, but
with these tools already in place I think it's probably unnecessary.

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sat, 28 Jun 2025 09:56:04 GMT)

Notification sent to Alexander Prähauser <ahprae <at> protonmail.com>:
bug acknowledged by developer. (Sat, 28 Jun 2025 09:56:04 GMT)

Message #28 received at 78863-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Alexander Prähauser <ahprae <at> protonmail.com>
Cc: 78863-done <at> debbugs.gnu.org
Subject: Re: bug#78863: 31.0.50;
 Feature request: add option for greedy looking-back to
 abbrev-before-point
Date: Sat, 28 Jun 2025 12:54:55 +0300

> Date: Mon, 23 Jun 2025 16:26:55 +0000
> From: Alexander Prähauser <ahprae <at> protonmail.com>
> Cc: 78863 <at> debbugs.gnu.org
> 
> "Eli Zaretskii" <eliz <at> gnu.org> writes:
> 
> > [Please use Reply All to reply, to keep the bug tracker CC'd.]
> 
> Oh, sorry, I meant to.
> 
> > Does this mean we can close this bug?  Or would you still like to
> > discuss the extension of the regexp specifications of abbrevs?
> 
> You can as far as I'm concerned. I'm legitimately unsure whether it
> would be useful for other people to have the option I described, but
> with these tools already in place I think it's probably unnecessary.

OK, so closing the bug.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 26 Jul 2025 11:24:10 GMT)

This bug report was last modified 56 days ago.
