GNU bug report logs - #74243
[PATCH] Speed up vc-hg-state by treating ignored files as unregistered

Previous Next

Package: emacs;

Reported by: Spencer Baugh <sbaugh <at> janestreet.com>

Date: Thu, 7 Nov 2024 17:42:01 UTC

Severity: normal

Tags: patch

Done: Sean Whitton <spwhitton <at> spwhitton.name>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 74243 in the body.
You can then email your comments to 74243 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#74243; Package emacs. (Thu, 07 Nov 2024 17:42:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Spencer Baugh <sbaugh <at> janestreet.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 07 Nov 2024 17:42:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] Speed up vc-hg-state by treating ignored files as unregistered
Date: Thu, 07 Nov 2024 12:41:47 -0500
[Message part 1 (text/plain, inline)]
Tags: patch


The most significant slow component of "hg status" is parsing the
.hgignore file.  If we pass -mardc instead of -A to hg status, hg
doesn't list ignored or untracked files, so it skips parsing the
.hgignore.  On my large repo, this brings "hg status" from 140ms to
20ms.

For vc-hg-state, the distinction doesn't matter: nothing using the
output of vc-hg-state has significantly different behavior for ignored
files vs unregistered files:
- vc-dir-clean-files and vc-dir-recompute-file-state call vc-hg-state,
  but will never see an ignored file anyway since vc-dir shouldn't list
  ignored files for hg.
- vc-next-action checks 'ignored, but it's OK to take the
  'unregistered path instead; it will either fail when calling hg, or
  succeed.
- Other users of vc-state don't differ between 'ignored and
  'unregistered

* lisp/vc/vc-hg.el (vc-hg-state-slow): Treat ignored files as
unregistered.

In GNU Emacs 29.2.50 (build 6, x86_64-pc-linux-gnu, X toolkit, cairo
 version 1.15.12, Xaw scroll bars) of 2024-11-06 built on
 igm-qws-u22796a
Repository revision: 18ed746717c1c80e5cc9d9dc85b6e1f4013a1cec
Repository branch: emacs-29
Windowing system distributor 'The X.Org Foundation', version 11.0.12011000
System Description: Rocky Linux 8.10 (Green Obsidian)

Configured using:
 'configure --with-x-toolkit=lucid --without-gpm --without-gconf
 --without-selinux --without-imagemagick --with-modules --with-gif=no
 --with-tree-sitter --with-native-compilation=aot
 PKG_CONFIG_PATH=/usr/local/home/garnish/libtree-sitter/0.22.6-1/lib/pkgconfig/'

[0001-Speed-up-vc-hg-state-by-treating-ignored-files-as-un.patch (text/patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#74243; Package emacs. (Sun, 17 Nov 2024 01:04:02 GMT) Full text and rfc822 format available.

Message #8 received at 74243 <at> debbugs.gnu.org (full text, mbox):

From: Sean Whitton <spwhitton <at> spwhitton.name>
To: 74243 <at> debbugs.gnu.org
Cc: Spencer Baugh <sbaugh <at> janestreet.com>
Subject: Re: bug#74243: [PATCH] Speed up vc-hg-state by treating ignored
 files as unregistered
Date: Sun, 17 Nov 2024 09:03:30 +0800
Hello,

On Thu 07 Nov 2024 at 12:41pm -05, Spencer Baugh via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> The most significant slow component of "hg status" is parsing the
> .hgignore file.  If we pass -mardc instead of -A to hg status, hg
> doesn't list ignored or untracked files, so it skips parsing the
> .hgignore.  On my large repo, this brings "hg status" from 140ms to
> 20ms.

Thanks for investigating this.

> For vc-hg-state, the distinction doesn't matter: nothing using the
> output of vc-hg-state has significantly different behavior for ignored
> files vs unregistered files:
> - vc-dir-clean-files and vc-dir-recompute-file-state call vc-hg-state,
>   but will never see an ignored file anyway since vc-dir shouldn't list
>   ignored files for hg.
> - vc-next-action checks 'ignored, but it's OK to take the
>   'unregistered path instead; it will either fail when calling hg, or
>   succeed.
> - Other users of vc-state don't differ between 'ignored and
>   'unregistered

In vc-dir for a git repo, if I type 'G' on an unregistered file and then
'g' to refresh the view, the status label next to the unregistered file
changes to "ignored".  ISTM this is a nice feature that allows you to
confirm that the addition to .gitignore worked.

If that doesn't currently work for Hg, someone might want to implement
it at some point.  vc-state is a public function, so someone might well
be relying on it returning 'ignored, for some other purpose.

So, can we do this with a new optional argument to vc-state?  Like how
vc-deduce-fileset can provide more information if STATE-MODEL-ONLY-FILES
is non-nil.  It could be an optional argument that means to treat
'ignored and 'unregistered the same.  Or something similar.

It seems well-motivated to add an argument to the general vc-status
because it's an operation that can be slow in large repos regardless of
the backend.  Though we could start by just adding an optional argument
to vc-hg-status.

Or, less invasive would be a vc-hg--status-internal which does it.

-- 
Sean Whitton




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#74243; Package emacs. (Tue, 26 Nov 2024 07:53:02 GMT) Full text and rfc822 format available.

Message #11 received at 74243 <at> debbugs.gnu.org (full text, mbox):

From: Sean Whitton <spwhitton <at> spwhitton.name>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 74243 <at> debbugs.gnu.org
Subject: Re: bug#74243: [PATCH] Speed up vc-hg-state by treating ignored
 files as unregistered
Date: Tue, 26 Nov 2024 15:52:29 +0800
Hello,

On Tue 19 Nov 2024 at 08:28am -05, Spencer Baugh wrote:

> The original motivation of optimizing 'state is to speed up
> vc-refresh-state and vc-after-save, since they call vc-hg-state in
> find-file-hook and after-save-buffer, which adds very noticeable
> latency.  (I care less about the speed of vc-dir, which uses
> 'dir-status-files not 'state)
>
> So those functions would need to pass this argument.  But they (through
> vc-state-refresh) then store the returned state in 'vc-state in the VC
> per-file properties, so anyone accessing that will also see the effect
> of this argument.

I see what you mean.

Quoting vc-state,

| A return of nil from this function means we have no information on the
| status of this file.
| [...]
|  `ignored'          The file showed up in a dir-status listing with a flag
|                     indicating the version-control system is ignoring it,
|                     Note: This property is not set reliably (some VCSes
|                     don't have useful directory-status commands) so assume
|                     that any file with vc-state nil might be ignorable
|                     without VC knowing it.
|
|  `unregistered'     The file is not under version control."
|
|  ;; Note: we usually return nil here for unregistered files anyway
|  ;; when called with only one argument.  This doesn't seem to cause
|  ;; any problems.  But if we wanted to change that, we should
|  ;; probably opt for redefining the `registered' command to return
|  ;; non-nil even for unregistered files (maybe also rename it), and
|  ;; then make sure that all `state' implementations handle
|  ;; unregistered file appropriately.

(I think there's a mistake here: an ignored file is not a file "under
version control", so `unregistered' should say "not under version
control and not ignored".  Would you agree?)

Thanks for pointing out the involvement of find-file-hook and
after-save-hook.  The problem you describe is not at all Hg-specific:
vc-state gets called in a context where speed matters, but it's also the
primary entry point for any code that wants to know the state of a file,
some of which might care more about accuracy than speed.

To put it another way, the code assumes throughout that finding out the
file state will always be fast.  But it also assumes the information is
accurate if present.  This makes me queasy about your original patch.
It does not seem wise to return something we don't know to be true only
on the basis that it all works out fine for now.

The 'nil' return value might provide us with a way out, however.
Could we add an optional argument to vc-state that means "just return
nil if finding out the state properly might be slow"?
Could we make vc-after-save and the relevant find-file-hook entry pass
that option through, and do something sensible with a nil return value?

If they get nil, they would clear out the saved property, and possibly
update the mode line display to "????" or something.  Maybe we'd want a
user option (that could go in your large repo's .dir-locals.el, so it's
set-and-forget) to opt-in to not knowing the file state as often.

-- 
Sean Whitton




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#74243; Package emacs. (Tue, 26 Nov 2024 23:27:02 GMT) Full text and rfc822 format available.

Message #14 received at 74243 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Sean Whitton <spwhitton <at> spwhitton.name>,
 Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 74243 <at> debbugs.gnu.org
Subject: Re: bug#74243: [PATCH] Speed up vc-hg-state by treating ignored files
 as unregistered
Date: Wed, 27 Nov 2024 01:26:26 +0200
Hi, just to interject a little.

On 26/11/2024 09:52, Sean Whitton wrote:
> (I think there's a mistake here: an ignored file is not a file "under
> version control", so `unregistered' should say "not under version
> control and not ignored".  Would you agree?)
> 
> Thanks for pointing out the involvement of find-file-hook and
> after-save-hook.  The problem you describe is not at all Hg-specific:
> vc-state gets called in a context where speed matters, but it's also the
> primary entry point for any code that wants to know the state of a file,
> some of which might care more about accuracy than speed.
> 
> To put it another way, the code assumes throughout that finding out the
> file state will always be fast.  But it also assumes the information is
> accurate if present.  This makes me queasy about your original patch.
> It does not seem wise to return something we don't know to be true only
> on the basis that it all works out fine for now.

This FR reminds me of a similar change in vc-git-state that we ended up 
installing in bug#11757 (in 2012). Then stayed with it until 2017 when 
bug#19343 was filed and fixed (provided a recent enough Git is used) - 
see also the problem scenario described there.

So it seems both a reasonable change and ultimately not ideal. Depending 
on how many users we think might be affected by performance here.

> The 'nil' return value might provide us with a way out, however.
> Could we add an optional argument to vc-state that means "just return
> nil if finding out the state properly might be slow"?
> Could we make vc-after-save and the relevant find-file-hook entry pass
> that option through, and do something sensible with a nil return value?
> 
> If they get nil, they would clear out the saved property, and possibly
> update the mode line display to "????" or something.  Maybe we'd want a
> user option (that could go in your large repo's .dir-locals.el, so it's
> set-and-forget) to opt-in to not knowing the file state as often.

A user option seems like an easier choice.

Solutions that clear cache under some conditions or other tend to be 
more complex, and slow down at least some combined scenarios (e.g. one 
of my use cases is saving the buffer and having diff-hl-mode use its vc 
state from after-save-hook).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#74243; Package emacs. (Tue, 26 Nov 2024 23:33:01 GMT) Full text and rfc822 format available.

Message #17 received at 74243 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: 74243 <at> debbugs.gnu.org, Sean Whitton <spwhitton <at> spwhitton.name>
Subject: Re: bug#74243: [PATCH] Speed up vc-hg-state by treating ignored
 files as unregistered
Date: Tue, 26 Nov 2024 18:32:38 -0500
Dmitry Gutov <dmitry <at> gutov.dev> writes:
> Hi, just to interject a little.
>
> On 26/11/2024 09:52, Sean Whitton wrote:
>> (I think there's a mistake here: an ignored file is not a file "under
>> version control", so `unregistered' should say "not under version
>> control and not ignored".  Would you agree?)
>> Thanks for pointing out the involvement of find-file-hook and
>> after-save-hook.  The problem you describe is not at all Hg-specific:
>> vc-state gets called in a context where speed matters, but it's also the
>> primary entry point for any code that wants to know the state of a file,
>> some of which might care more about accuracy than speed.
>> To put it another way, the code assumes throughout that finding out
>> the
>> file state will always be fast.  But it also assumes the information is
>> accurate if present.  This makes me queasy about your original patch.
>> It does not seem wise to return something we don't know to be true only
>> on the basis that it all works out fine for now.
>
> This FR reminds me of a similar change in vc-git-state that we ended
> up installing in bug#11757 (in 2012). Then stayed with it until 2017
> when bug#19343 was filed and fixed (provided a recent enough Git is
> used) - see also the problem scenario described there.
>
> So it seems both a reasonable change and ultimately not
> ideal. Depending on how many users we think might be affected by
> performance here.
>
>> The 'nil' return value might provide us with a way out, however.
>> Could we add an optional argument to vc-state that means "just return
>> nil if finding out the state properly might be slow"?
>> Could we make vc-after-save and the relevant find-file-hook entry pass
>> that option through, and do something sensible with a nil return value?
>> If they get nil, they would clear out the saved property, and
>> possibly
>> update the mode line display to "????" or something.  Maybe we'd want a
>> user option (that could go in your large repo's .dir-locals.el, so it's
>> set-and-forget) to opt-in to not knowing the file state as often.
>
> A user option seems like an easier choice.
>
> Solutions that clear cache under some conditions or other tend to be
> more complex, and slow down at least some combined scenarios (e.g. one
> of my use cases is saving the buffer and having diff-hl-mode use its
> vc state from after-save-hook).

These are all reasonable concerns.

Since posting my original patch, though, I've heard from Hg developers
that they plan to eventually implement per-directory ignore files, like
in Git.  That would remove the original performance problem, so maybe
this is not so important.

That being said, it's still sad that the vc hooks in find-file-hook and
after-save-hook can hurt performance so much.

It seems to me that the ideal outcome would be to support asynchronicity
in those hooks.  That would benefit all vc backends... though this is
perhaps quite difficult.

Maybe asynchronously populating the saved vc-state property would work?
With some clever usage of nil return values as Sean describes...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#74243; Package emacs. (Wed, 27 Nov 2024 00:19:02 GMT) Full text and rfc822 format available.

Message #20 received at 74243 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 74243 <at> debbugs.gnu.org, Sean Whitton <spwhitton <at> spwhitton.name>
Subject: Re: bug#74243: [PATCH] Speed up vc-hg-state by treating ignored files
 as unregistered
Date: Wed, 27 Nov 2024 02:18:39 +0200
On 27/11/2024 01:32, Spencer Baugh wrote:

>> A user option seems like an easier choice.
>>
>> Solutions that clear cache under some conditions or other tend to be
>> more complex, and slow down at least some combined scenarios (e.g. one
>> of my use cases is saving the buffer and having diff-hl-mode use its
>> vc state from after-save-hook).
> 
> These are all reasonable concerns.
> 
> Since posting my original patch, though, I've heard from Hg developers
> that they plan to eventually implement per-directory ignore files, like
> in Git.  That would remove the original performance problem, so maybe
> this is not so important.

That's good to hear.

> That being said, it's still sad that the vc hooks in find-file-hook and
> after-save-hook can hurt performance so much.

Understandable, that's what drove me to implement that older change in 
Git back then.

> It seems to me that the ideal outcome would be to support asynchronicity
> in those hooks.  That would benefit all vc backends... though this is
> perhaps quite difficult.

The most complex part could be retaining a compatible/synchronous API.

> Maybe asynchronously populating the saved vc-state property would work?
> With some clever usage of nil return values as Sean describes...

Maybe Sean's idea is better, but to spitball different options:

- FWIW since not too long ago we've treated a similar issue in diff-hl 
by using a thread - which calls the same code inside (meaning the 
current synchronous implementation), but it happens in the background, 
so the input is unfrozen and the visual update is asynchronous.

But keeping in mind that threads' error handling is not great, so it 
seems not optimal to keep a lot of implementation code inside a thread. 
Also, threads are reportedly not good with remote calls yet.

- The mode-line update isn't going to wait asynchronously, though, but 
perhaps an update could be scheduled. If state updates are not 
synchronous, I suppose this would also need some debouncing/queueing 
mechanism for the callers as well. That is the route of migrating to a 
different calling convention, though.

- Finally, if the main scenario that we are concerned is the use in 
vc-dir, we could try switching only its updates to another backend call. 
E.g. vc-dir-resynch-file would switch to the (possibly) more precise - 
though slower - dir-status-files, just like the code that first 
populates that buffer. vc-state could then afford shortcuts more safely.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#74243; Package emacs. (Fri, 29 Nov 2024 08:18:02 GMT) Full text and rfc822 format available.

Message #23 received at 74243 <at> debbugs.gnu.org (full text, mbox):

From: Sean Whitton <spwhitton <at> spwhitton.name>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: Spencer Baugh <sbaugh <at> janestreet.com>, 74243 <at> debbugs.gnu.org
Subject: Re: bug#74243: [PATCH] Speed up vc-hg-state by treating ignored
 files as unregistered
Date: Fri, 29 Nov 2024 16:17:33 +0800
Hello,

On Wed 27 Nov 2024 at 02:18am +02, Dmitry Gutov wrote:

> Maybe Sean's idea is better, but to spitball different options:
>
> - FWIW since not too long ago we've treated a similar issue in diff-hl by
>   using a thread - which calls the same code inside (meaning the current
>   synchronous implementation), but it happens in the background, so the input
>  is unfrozen and the visual update is asynchronous.
>
> But keeping in mind that threads' error handling is not great, so it seems not
> optimal to keep a lot of implementation code inside a thread. Also, threads
> are reportedly not good with remote calls yet.
>
> - The mode-line update isn't going to wait asynchronously, though, but perhaps
>   an update could be scheduled. If state updates are not synchronous, I
>   suppose this would also need some debouncing/queueing mechanism for the
>   callers as well. That is the route of migrating to a different calling
>  convention, though.

Thanks for these ideas.

Spencer, do you mind if I close this bug?  It's clear that we could be
doing something better here, but given the news from Hg upstream, we
probably don't want to make changes along the lines of your original
patch.

> - Finally, if the main scenario that we are concerned is the use in vc-dir, we
>   could try switching only its updates to another backend
>   call. E.g. vc-dir-resynch-file would switch to the (possibly) more precise -
>   though slower - dir-status-files, just like the code that first populates
>  that buffer. vc-state could then afford shortcuts more safely.

Just to note, my concern was that vc-state is a public function and we
don't know where it's being used, so vc-dir is not the main concern.

-- 
Sean Whitton




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#74243; Package emacs. (Fri, 29 Nov 2024 12:44:01 GMT) Full text and rfc822 format available.

Message #26 received at 74243 <at> debbugs.gnu.org (full text, mbox):

From: Spencer Baugh <sbaugh <at> janestreet.com>
To: Sean Whitton <spwhitton <at> spwhitton.name>
Cc: Dmitry Gutov <dmitry <at> gutov.dev>, 74243 <at> debbugs.gnu.org
Subject: Re: bug#74243: [PATCH] Speed up vc-hg-state by treating ignored files
 as unregistered
Date: Fri, 29 Nov 2024 07:43:09 -0500
[Message part 1 (text/plain, inline)]
On Fri, Nov 29, 2024, 3:17 AM Sean Whitton <spwhitton <at> spwhitton.name> wrote:

> Spencer, do you mind if I close this bug?  It's clear that we could be
> doing something better here, but given the news from Hg upstream, we
> probably don't want to make changes along the lines of your original
> patch.
>

Agreed, go ahead.

>
[Message part 2 (text/html, inline)]

Reply sent to Sean Whitton <spwhitton <at> spwhitton.name>:
You have taken responsibility. (Fri, 29 Nov 2024 23:46:02 GMT) Full text and rfc822 format available.

Notification sent to Spencer Baugh <sbaugh <at> janestreet.com>:
bug acknowledged by developer. (Fri, 29 Nov 2024 23:46:02 GMT) Full text and rfc822 format available.

Message #31 received at 74243-close <at> debbugs.gnu.org (full text, mbox):

From: Sean Whitton <spwhitton <at> spwhitton.name>
To: Spencer Baugh <sbaugh <at> janestreet.com>
Cc: 74243-close <at> debbugs.gnu.org, Dmitry Gutov <dmitry <at> gutov.dev>
Subject: Re: bug#74243: [PATCH] Speed up vc-hg-state by treating ignored
 files as unregistered
Date: Sat, 30 Nov 2024 07:45:27 +0800
Hello,

On Fri 29 Nov 2024 at 07:43am -05, Spencer Baugh via "Bug reports for GNU Emacs, the Swiss army knife of text editors" wrote:

> On Fri, Nov 29, 2024, 3:17 AM Sean Whitton <spwhitton <at> spwhitton.name> wrote:
>
>  Spencer, do you mind if I close this bug?  It's clear that we could be
>  doing something better here, but given the news from Hg upstream, we
>  probably don't want to make changes along the lines of your original
>  patch.
>
> Agreed, go ahead.

Coolio, thanks.

-- 
Sean Whitton




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 28 Dec 2024 12:24:10 GMT) Full text and rfc822 format available.

This bug report was last modified 176 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.