GNU bug report logs -
#76108
Call `modify_text' only on the text being replaced in `subst-char-in-region'
Previous Next
To reply to this bug, email your comments to 76108 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Thu, 06 Feb 2025 23:42:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Thuna <thuna.cing <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Thu, 06 Feb 2025 23:42:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Currently `subst-char-in-region' calls `modify_text' on the entire text,
starting from the first match to the end of the text, which fires off
`before-change-functions' for (almost) every character in the region
it's called in.
The specific place where this caused a problem for me is while filling
paragraphs with latex fragments in org-mode. Latex fragments are
overlays with the `modification-hooks' property set to delete the
overlay, so when a paragraph is filled, that calls
(subst-char-in-region ?\n ?\s ...)
on the entire paragraph which then calls `modify_text' on the entire
paragraph (sans the first line), which causes all latex fragments to
disappear.
I'm assuming that there isn't any particular reason why `modify_text'
should be called on the whole paragraph, but if there is, then that's ok
too, I suppose. I don't see any reason why this should be breaking
stuff, but I haven't actually checked.
[0001-Only-modify-the-text-being-replaced-in-subst-char-in.patch (text/x-patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Fri, 07 Feb 2025 08:38:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 76108 <at> debbugs.gnu.org (full text, mbox):
> From: Thuna <thuna.cing <at> gmail.com>
> Date: Fri, 07 Feb 2025 00:41:12 +0100
>
> Currently `subst-char-in-region' calls `modify_text' on the entire text,
> starting from the first match to the end of the text, which fires off
> `before-change-functions' for (almost) every character in the region
> it's called in.
That's right. But if the character in question is substituted more
than once, we call modify_text only once, for the entire region.
> The specific place where this caused a problem for me is while filling
> paragraphs with latex fragments in org-mode. Latex fragments are
> overlays with the `modification-hooks' property set to delete the
> overlay, so when a paragraph is filled, that calls
> (subst-char-in-region ?\n ?\s ...)
> on the entire paragraph which then calls `modify_text' on the entire
> paragraph (sans the first line), which causes all latex fragments to
> disappear.
Why do you think this is a bug in Emacs core and not in how org-mode
handles latex fragments?
> I'm assuming that there isn't any particular reason why `modify_text'
> should be called on the whole paragraph, but if there is, then that's ok
> too, I suppose. I don't see any reason why this should be breaking
> stuff, but I haven't actually checked.
AFAIU, your change will cause modify_text to be called only once, like
we do now, but only for the first replacement of the character in
question. So if the loop in subst-char-in-region replaces the
character more than once, the fact that there were changes in other
places in the region will not be reported.
Or maybe I'm missing something. But to establish whether this change
introduces a regression, we need some test, and the test should
include both single and multiple substitutions, and also both ASCII
and non-ASCII characters. Then we will have a good base for assessing
this patch, and also our test suite will become better -- a nice
bonus.
So could you please write such a test, and then see whether your
change still passes it?
TIA
P.S. Adding Stefan to the discussion, because I think we once
discussed this function in this respect, and decided that the way it
calls modify_text is correct -- but I cannot find that discussion in
my archives, so maybe I was dreaming.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Fri, 07 Feb 2025 10:16:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 76108 <at> debbugs.gnu.org (full text, mbox):
> Why do you think this is a bug in Emacs core and not in how org-mode
> handles latex fragments?
I figured this was a bug in Emacs so I came here first, but I'll go
ahead and patch org-mode if this turns out to be intended behavior.
>> I'm assuming that there isn't any particular reason why `modify_text'
>> should be called on the whole paragraph, but if there is, then that's ok
>> too, I suppose. I don't see any reason why this should be breaking
>> stuff, but I haven't actually checked.
>
> AFAIU, your change will cause modify_text to be called only once, like
> we do now, but only for the first replacement of the character in
> question. So if the loop in subst-char-in-region replaces the
> character more than once, the fact that there were changes in other
> places in the region will not be reported.
It's possible that I'm reading the code wrong, but it seems to me that
it's being called for every change. If not, I could just make it so.
I'm not familiar with `modify_text' so I could be misunderstanding, but
shouldn't it only be called on the text that is actively being modified
(in this case replaced)?
> Or maybe I'm missing something. But to establish whether this change
> introduces a regression, we need some test, and the test should
> include both single and multiple substitutions, and also both ASCII
> and non-ASCII characters. Then we will have a good base for assessing
> this patch, and also our test suite will become better -- a nice
> bonus.
>
> So could you please write such a test, and then see whether your
> change still passes it?
Sure, I'll do that. For a start, just to demonstrate the problem,
here's a test which fails on a July build (I don't think anything
changed since?) and passes with my patch. Specifically, it's the
`should-not-error' below that fails. (Also, is there a canonical way to
write `should-not-error' that I don't know about?)
(ert-deftest test-subst-char-in-region-before-change-functions ()
(let ((hook (lambda (_beg _end) (error "Poor man's read-only"))))
(with-temp-buffer
(insert (propertize "foo" 'modification-hooks (list hook)) "," "bar")
(should-error (subst-char-in-region (point-min) (point-max) ?o ?e)))
(with-temp-buffer
(insert "foo" (propertize "," 'modification-hooks (list hook)) "bar")
;; My best approximation of `should-not-error'.
(should (ignore-errors
(subst-char-in-region (point-min) (point-max) ?o ?e)
t))
(should (equal "fee,bar" (buffer-string))))))
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Fri, 07 Feb 2025 13:35:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 76108 <at> debbugs.gnu.org (full text, mbox):
> The specific place where this caused a problem for me is while filling
> paragraphs with latex fragments in org-mode. Latex fragments are
> overlays with the `modification-hooks' property set to delete the
> overlay, so when a paragraph is filled, that calls
> (subst-char-in-region ?\n ?\s ...)
> on the entire paragraph which then calls `modify_text' on the entire
> paragraph (sans the first line), which causes all latex fragments to
> disappear.
I'm probably misunderstanding something, but what you describe here
sounds like a bug in Org.
> I'm assuming that there isn't any particular reason why `modify_text'
> should be called on the whole paragraph, but if there is, then that's ok
> too, I suppose. I don't see any reason why this should be breaking
> stuff, but I haven't actually checked.
The reason is that running the modification hooks N times (to replace
N chars) would be expensive.
> (ert-deftest test-subst-char-in-region-before-change-functions ()
> (let ((hook (lambda (_beg _end) (error "Poor man's read-only"))))
> (with-temp-buffer
> (insert (propertize "foo" 'modification-hooks (list hook)) "," "bar")
> (should-error (subst-char-in-region (point-min) (point-max) ?o ?e)))
>
> (with-temp-buffer
> (insert "foo" (propertize "," 'modification-hooks (list hook)) "bar")
> ;; My best approximation of `should-not-error'.
> (should (ignore-errors
> (subst-char-in-region (point-min) (point-max) ?o ?e)
> t))
> (should (equal "fee,bar" (buffer-string))))))
The `modification-hooks` text property is weaker than the
`modification-hooks` overlay property: if you use an overlay here, then
you should be able to get the behavior your test wants by checking in
your `hook` the value of the second argument (which indicates if it's
the "before change" or the "after change" case): while the "before
change" call covers the whole region (actually, IIRC it covers the
region between the first substituted chars and the end argument), the
"after change" should be more specific and cover only the region between
the first and the last chars that were substituted.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Fri, 07 Feb 2025 15:20:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 76108 <at> debbugs.gnu.org (full text, mbox):
>> I'm assuming that there isn't any particular reason why `modify_text'
>> should be called on the whole paragraph, but if there is, then that's ok
>> too, I suppose. I don't see any reason why this should be breaking
>> stuff, but I haven't actually checked.
>
> The reason is that running the modification hooks N times (to replace
> N chars) would be expensive.
What if we called `modify_text' for each contiguous region of FROMCHARs?
It should be faster than calling it for each character and even
occasionally (though I wouldn't count on it) faster than calling it over
the whole region since it would avoid running hooks unnecessarily. Do
you know of a way we could benchmark it?
> The `modification-hooks` text property is weaker than the
> `modification-hooks` overlay property: if you use an overlay here, then
> you should be able to get the behavior your test wants by checking in
> your `hook` the value of the second argument (which indicates if it's
> the "before change" or the "after change" case): while the "before
> change" call covers the whole region (actually, IIRC it covers the
> region between the first substituted chars and the end argument), the
> "after change" should be more specific and cover only the region between
> the first and the last chars that were substituted.
And that is the intended and desired behavior? Personally I find it
quite unintuitive to have `modification-hooks' be "Hooks that are called
when some characters in region _might_ be modified".
I don't think that it's just me, either; in buffer-tests.el's
`overlay-modification-hooks' there is a comment stating
> Replacing text never calls `insert-in-front-hooks' or
> `insert-behind-hooks'. It calls `modification-hooks' if the overlay
> covers any text that has changed.
which seems to suggest that the tests there were also written with the
understanding that `modification-hooks' should only be called for
characters actually being replaced. Of course, those tests are for
`replace-match', but still, I see no reason why the two should have
different semantics (sans performance considerations, of course).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Fri, 07 Feb 2025 16:07:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 76108 <at> debbugs.gnu.org (full text, mbox):
>> The reason is that running the modification hooks N times (to replace
>> N chars) would be expensive.
> What if we called `modify_text' for each contiguous region of FROMCHARs?
AFAIK, in practice most such regions would span a single char anyway, so
I don't think it would make a big difference.
[ Furthermore, IIRC, another reasons we like to call the hooks only once
is that those hooks would otherwise be called while in the middle of
`subst-char-in-region`, so we'd need to be more careful with potential
unexpected changes happening while running the hooks. Nothing that
can't be handled, but a known source of corner case bugs. ]
> Do you know of a way we could benchmark it?
I think there are too many different cases to handle it with a benchmark.
>> The `modification-hooks` text property is weaker than the
>> `modification-hooks` overlay property: if you use an overlay here, then
>> you should be able to get the behavior your test wants by checking in
>> your `hook` the value of the second argument (which indicates if it's
>> the "before change" or the "after change" case): while the "before
>> change" call covers the whole region (actually, IIRC it covers the
>> region between the first substituted chars and the end argument), the
>> "after change" should be more specific and cover only the region between
>> the first and the last chars that were substituted.
>
> And that is the intended and desired behavior?
Yes.
> Personally I find it quite unintuitive to have `modification-hooks' be
> "Hooks that are called when some characters in region _might_ be
> modified".
You have to understand the distinction between the "before" and the
"after" calls (which correspond to the `before-change-functions` and the
`after-change-functions` hooks): the "before" call is there to announce
that some change*s* *may* happen in the specified region.
In the vast majority of cases you should care only about the "after" call.
[ Sadly, for the `modification-hooks` placed on text properties (as
opposed to overlays), only the "before" call is available. ]
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Fri, 07 Feb 2025 17:18:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 76108 <at> debbugs.gnu.org (full text, mbox):
>> Personally I find it quite unintuitive to have `modification-hooks' be
>> "Hooks that are called when some characters in region _might_ be
>> modified".
>
> You have to understand the distinction between the "before" and the
> "after" calls (which correspond to the `before-change-functions` and the
> `after-change-functions` hooks): the "before" call is there to announce
> that some change*s* *may* happen in the specified region.
It still reads to me as though it's called before some changes _will_
happen - that is, when changing text, you call before-change-functions
(and co.) on the text, change it, and call after-change-functions (and
co.) afterwards - but I understand that it's up to interpretation and
mine just happens to not match with the implementation.
> In the vast majority of cases you should care only about the "after" call.
> [ Sadly, for the `modification-hooks` placed on text properties (as
> opposed to overlays), only the "before" call is available. ]
Now, I am not entirely sure how I've managed to miss this, but
`subst-char-in-region' actually _also_ messes up the after calls. I
have no idea how or why this is happening, but a buffer whose contents
are
"foo bar\nquux baz ban"
with an overlay from 14 to 17 (that is, around the word "baz") with the
modification hook
(lambda (&rest args) (print args))
reports being called with the arguments
(#<overlay from 17 to 17 in temp> nil 8 21)
(#<overlay from 14 to 17 in temp> t 8 9 1)
when I do
(subst-char-in-region (point-min) (point-max) ?\n ?\s)
which means that it's being called even though it doesn't overlap at all
with the range of characters reported as being modified.
I thought it might have been because `subst-char-in-region' calls
`signal_after_change' from the first to the last change (and that is
likely not unrelated) but even if that were the case the first and last
change in the above situation is the sole newline at the middle, so it
shouldn't have reached the overlay regardless. Maybe
`report_overlay_modification' calls everything called in the before call
as well, but I honestly can't tell.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Fri, 07 Feb 2025 19:45:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 76108 <at> debbugs.gnu.org (full text, mbox):
>> You have to understand the distinction between the "before" and the
>> "after" calls (which correspond to the `before-change-functions` and the
>> `after-change-functions` hooks): the "before" call is there to announce
>> that some change*s* *may* happen in the specified region.
> It still reads to me as though it's called before some changes _will_
> happen - that is, when changing text, you call before-change-functions
> (and co.) on the text, change it, and call after-change-functions (and
> co.) afterwards - but I understand that it's up to interpretation and
> mine just happens to not match with the implementation.
Your interpretation is right for the simple case. But the API is
designed to accommodate other cases, and `subst-char-in-region` takes
advantage of that.
Of course, calling the `before-change-functions` without calling
`after-change-functions` is something we strive to avoid, and similarly
we try to make the region passed to `before-change-functions` as small
as possible, but both of those are done for performance/optimization
reasons, they are not needed for correctness.
> Now, I am not entirely sure how I've managed to miss this, but
> `subst-char-in-region' actually _also_ messes up the after calls. I
> have no idea how or why this is happening, but a buffer whose contents
> are
>
> "foo bar\nquux baz ban"
>
> with an overlay from 14 to 17 (that is, around the word "baz") with the
> modification hook
>
> (lambda (&rest args) (print args))
>
> reports being called with the arguments
>
> (#<overlay from 17 to 17 in temp> nil 8 21)
> (#<overlay from 14 to 17 in temp> t 8 9 1)
>
> when I do
>
> (subst-char-in-region (point-min) (point-max) ?\n ?\s)
The `nil 8 21` and `t 8 9 1` are the values I'd expect, but the `17 to
17` seems very odd and the fact that the hook is called at all for the
"after" case sounds like a bug, as you point out.
> Maybe `report_overlay_modification' calls everything called in the
> before call as well, but I honestly can't tell.
That would be my guess, indeed.
[ Haven't looked at the code yet. ]
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Sat, 08 Feb 2025 18:43:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 76108 <at> debbugs.gnu.org (full text, mbox):
>> Now, I am not entirely sure how I've managed to miss this, but
>> `subst-char-in-region' actually _also_ messes up the after calls. I
>> have no idea how or why this is happening, but a buffer whose contents
>> are
>>
>> "foo bar\nquux baz ban"
>>
>> with an overlay from 14 to 17 (that is, around the word "baz") with the
>> modification hook
>>
>> (lambda (&rest args) (print args))
>>
>> reports being called with the arguments
>>
>> (#<overlay from 17 to 17 in temp> nil 8 21)
>> (#<overlay from 14 to 17 in temp> t 8 9 1)
>>
>> when I do
>>
>> (subst-char-in-region (point-min) (point-max) ?\n ?\s)
>
> The `nil 8 21` and `t 8 9 1` are the values I'd expect, but the `17 to
> 17` seems very odd and the fact that the hook is called at all for the
> "after" case sounds like a bug, as you point out.
It doesn't change anything but I should make a correction: That first
case is also an overlay from 14 to 17.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Wed, 12 Feb 2025 00:19:02 GMT)
Full text and
rfc822 format available.
Message #32 received at 76108 <at> debbugs.gnu.org (full text, mbox):
>> The `nil 8 21` and `t 8 9 1` are the values I'd expect, but the `17 to
>> 17` seems very odd and the fact that the hook is called at all for the
>> "after" case sounds like a bug, as you point out.
>
> It doesn't change anything but I should make a correction: That first
> case is also an overlay from 14 to 17.
Ah, well it does change things in the sense that it was quite weird.
A copy/paste error is a nice explanation and I can stop worrying
about it. 🙂
> I thought it might have been because `subst-char-in-region' calls
> `signal_after_change' from the first to the last change (and that is
> likely not unrelated) but even if that were the case the first and last
> change in the above situation is the sole newline at the middle, so it
> shouldn't have reached the overlay regardless. Maybe
> `report_overlay_modification' calls everything called in the before call
> as well, but I honestly can't tell.
It turns out that, indeed, the C code makes extra efforts to call the
same `modification-hooks` functions from overlays before and after
the change, hence the weird behavior (from `editfns.c`):
/* Lisp vector holding overlay hook functions to call.
Vector elements come in pairs.
Each even-index element is a list of hook functions.
The following odd-index element is the overlay they came from.
Before the buffer change, we fill in this vector
as we call overlay hook functions.
After the buffer change, we get the functions to call from this vector.
This way we always call the same functions before and after the change. */
static Lisp_Object last_overlay_modification_hooks;
/* Number of elements actually used in last_overlay_modification_hooks. */
static ptrdiff_t last_overlay_modification_hooks_used;
so, while it looks like a bug to me, it's very much done on purpose.
Not sure whether we should fix it or document it.
And I guess we could use the same approach to run the
`modification-hooks` from text properties also *after* the modification.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Wed, 12 Feb 2025 00:38:01 GMT)
Full text and
rfc822 format available.
Message #35 received at 76108 <at> debbugs.gnu.org (full text, mbox):
>> I thought it might have been because `subst-char-in-region' calls
>> `signal_after_change' from the first to the last change (and that is
>> likely not unrelated) but even if that were the case the first and last
>> change in the above situation is the sole newline at the middle, so it
>> shouldn't have reached the overlay regardless. Maybe
>> `report_overlay_modification' calls everything called in the before call
>> as well, but I honestly can't tell.
>
> It turns out that, indeed, the C code makes extra efforts to call the
> same `modification-hooks` functions from overlays before and after
> the change, hence the weird behavior (from `editfns.c`):
>
> /* Lisp vector holding overlay hook functions to call.
> Vector elements come in pairs.
> Each even-index element is a list of hook functions.
> The following odd-index element is the overlay they came from.
>
> Before the buffer change, we fill in this vector
> as we call overlay hook functions.
> After the buffer change, we get the functions to call from this vector.
> This way we always call the same functions before and after the change. */
> static Lisp_Object last_overlay_modification_hooks;
>
> /* Number of elements actually used in last_overlay_modification_hooks. */
> static ptrdiff_t last_overlay_modification_hooks_used;
>
> so, while it looks like a bug to me, it's very much done on purpose.
> Not sure whether we should fix it or document it.
>
> And I guess we could use the same approach to run the
> `modification-hooks` from text properties also *after* the modification.
I'm still not happy with the loose way in which these hooks are called.
I imagine that any modification hook that's useful will essentially be
implementing a half-working version of what calling them strictly would
achieve. My biggest problem is that I don't have a reference for how
significant the performance issues are, so the limitations end up
feeling much more impactful.
At the very least, the documentation of precisely when these functions
are called and what sort of guarantees they give should be updated,
since it currently reads
> ‘modification-hooks’
> This property's value is a list of functions to be called if any
> character within the overlay is changed or if text is inserted
> strictly within the overlay.
in (info "(elisp) Overlay Properties").
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76108
; Package
emacs
.
(Wed, 12 Feb 2025 03:44:02 GMT)
Full text and
rfc822 format available.
Message #38 received at 76108 <at> debbugs.gnu.org (full text, mbox):
>> And I guess we could use the same approach to run the
>> `modification-hooks` from text properties also *after* the modification.
>
> I'm still not happy with the loose way in which these hooks are called.
Think of it this way: while `subst-char-in-region` can turn
foo
bar
into
foo bar
by modifying a single character, the major mode needs to be able to
handle the situation where the user replaces `foo\nbar` with `foo bar`.
So while it's better for `subst-char-in-region` to use tighter bounds,
it should not be needed for correctness.
IOW, if your hook function misbehaves when the region is not tight
enough, it's (also) a problem in your hook function.
> I imagine that any modification hook that's useful will essentially be
> implementing a half-working version of what calling them strictly would
> achieve. My biggest problem is that I don't have a reference for how
> significant the performance issues are, so the limitations end up
> feeling much more impactful.
Most hook functions like coarse regions. E.g. the one used by font-lock
rounds up the region to whole lines; the one used by `syntax-ppss` only
cares about the BEG arg and considers everything after it as lost; the
one used by Eglot also likes them rather coarse to keep the description
of changes (sent to the LSP server) smaller.
The cost of running too many hook functions prompted the introduction of
`combine-after-change-calls` in Emacs-20, and `combine-change-calls` in
Emacs-27.
> At the very least, the documentation of precisely when these functions
> are called and what sort of guarantees they give should be updated,
> since it currently reads
>> ‘modification-hooks’
>> This property's value is a list of functions to be called if any
>> character within the overlay is changed or if text is inserted
>> strictly within the overlay.
> in (info "(elisp) Overlay Properties").
Agreed. I'd be in favor of changing the code also to avoid running the
after change when it turns out it's outside the bounds of the overlay.
Stefan
This bug report was last modified 125 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.