GNU bug report logs -
#40704
28.0.50; Improve and speed up (Gnus) registry saving
Previous Next
To reply to this bug, email your comments to 40704 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 19 Apr 2020 02:15:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Michael Heerdegen <michael_heerdegen <at> web.de>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Sun, 19 Apr 2020 02:15:01 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
Saving the Gnus registry is quite slow currently. I profiled a bit and
for now suggest to do something like in the attached patch. In detail:
(1) We need to bind inhibit-modification-hooks -> t, this offers a good
speedup (~ 4 or so).
(2) Printing the registry which basically consists of huge hash tables,
causes a lot of garbage. Most of that garbage seems to be unavoidable
(is it created by the printing primitives?). Anyway, seems we should
temporarily increase `gc-cons-threshold' drastically, this offers
another speedup of 25% or so. The patch attached uses the value that
works well for me and the size of my registry, and I bind it in
`gnus-registry-save', because I assume other registries outside of Gnus
can be smaller. What would be a good value of `gc-cons-threshold', or
should it even scale with `gnus-registry-max-entries' instead of being
constant?
(3) I also decided to change `eieio-override-prin1' to print hash tables
"by hand" from Lisp. The eieio-persistent requires to modify how
elements in the hash tables are printed, and the current way of doing
this (make a copy of the complete table, change the elements, prin1 and
re-read the result) is not only hackish but also inefficient (it does
this recursively for nested tables).
Any comments on the suggested changes?
TIA,
Michael.
[0001-WIP-Try-to-improve-gnus-registry-saving.patch (text/x-diff, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Mon, 20 Apr 2020 04:14:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> Hello,
>
> Saving the Gnus registry is quite slow currently. I profiled a bit and
> for now suggest to do something like in the attached patch. In detail:
>
> (1) We need to bind inhibit-modification-hooks -> t, this offers a good
> speedup (~ 4 or so).
>
> (2) Printing the registry which basically consists of huge hash tables,
> causes a lot of garbage. Most of that garbage seems to be unavoidable
> (is it created by the printing primitives?). Anyway, seems we should
> temporarily increase `gc-cons-threshold' drastically, this offers
> another speedup of 25% or so. The patch attached uses the value that
> works well for me and the size of my registry, and I bind it in
> `gnus-registry-save', because I assume other registries outside of Gnus
> can be smaller. What would be a good value of `gc-cons-threshold', or
> should it even scale with `gnus-registry-max-entries' instead of being
> constant?
>
> (3) I also decided to change `eieio-override-prin1' to print hash tables
> "by hand" from Lisp. The eieio-persistent requires to modify how
> elements in the hash tables are printed, and the current way of doing
> this (make a copy of the complete table, change the elements, prin1 and
> re-read the result) is not only hackish but also inefficient (it does
> this recursively for nested tables).
>
> Any comments on the suggested changes?
Not that it's up to me, but I'm all for putting in #1 and #3 as-is, and
adjusting #2 to scale with the number of `gnus-registry-max-entries',
with the addition of a hard ceiling.
Eric
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Mon, 20 Apr 2020 23:27:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
> > Any comments on the suggested changes?
>
> Not that it's up to me,
I wonder who it's up to...
> but I'm all for putting in #1 and #3 as-is, and adjusting #2 to scale
> with the number of `gnus-registry-max-entries', with the addition of a
> hard ceiling.
Too large values of `gnus-registry-max-entries' can be harmful
(fragmentation of memory?), I wonder if this can potentially be the case
here.
I'm also not sure if scaling is a good idea at all. The value I chose
will still make gc happen much less frequently even if your registry is
huge. So it's still a win. If your value is too large, the final
cleanup may take longer as necessary -- who has experience with this
problem -- anybody?
Thanks,
Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Tue, 21 Apr 2020 03:43:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
>
>> > Any comments on the suggested changes?
>>
>> Not that it's up to me,
>
> I wonder who it's up to...
There's sort of a `with-gnus-bug-report' macro that we're working with
here, where if Lars (or in this case, Ted Zlatanov) doesn't respond
within a certain period of time, the commit automatically goes in.
>> but I'm all for putting in #1 and #3 as-is, and adjusting #2 to scale
>> with the number of `gnus-registry-max-entries', with the addition of a
>> hard ceiling.
>
> Too large values of `gnus-registry-max-entries' can be harmful
> (fragmentation of memory?), I wonder if this can potentially be the case
> here.
Well that's where the hard ceiling comes in, right?
> I'm also not sure if scaling is a good idea at all. The value I chose
> will still make gc happen much less frequently even if your registry is
> huge. So it's still a win. If your value is too large, the final
> cleanup may take longer as necessary -- who has experience with this
> problem -- anybody?
Not me, let's hope for more respondents...
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Thu, 23 Apr 2020 01:33:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
> > -- who has experience with this problem -- anybody?
Maybe the scaling idea is not a good idea at all. Even when it works as
we hope - the ideal factor could still depend on what kind of data is
saved.
> Not me, let's hope for more respondents...
Did you try how the increased gc-limit behaves for you btw? Then we
would at least have two data points.
Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Thu, 23 Apr 2020 02:37:01 GMT)
Full text and
rfc822 format available.
Message #20 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
>
>> > -- who has experience with this problem -- anybody?
>
> Maybe the scaling idea is not a good idea at all. Even when it works as
> we hope - the ideal factor could still depend on what kind of data is
> saved.
>
>> Not me, let's hope for more respondents...
>
> Did you try how the increased gc-limit behaves for you btw? Then we
> would at least have two data points.
I applied the patch a couple of days ago and have been very pleased with
the speedup! Nothing "bad" has happened, but I haven't tried
experimenting with different gc limits, either.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Wed, 29 Apr 2020 16:12:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
> Michael Heerdegen <michael_heerdegen <at> web.de> writes:
>
>> Eric Abrahamsen <eric <at> ericabrahamsen.net> writes:
>>
>>> > -- who has experience with this problem -- anybody?
>>
>> Maybe the scaling idea is not a good idea at all. Even when it works as
>> we hope - the ideal factor could still depend on what kind of data is
>> saved.
>>
>>> Not me, let's hope for more respondents...
>>
>> Did you try how the increased gc-limit behaves for you btw? Then we
>> would at least have two data points.
>
> I applied the patch a couple of days ago and have been very pleased with
> the speedup! Nothing "bad" has happened, but I haven't tried
> experimenting with different gc limits, either.
Hey let's at least stick in the modification-hook and hash-table
improvements now, there are Gnus users out there still waiting for their
registry to save! It helps noticeably with EBDB, too.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 19 Jul 2020 02:17:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> (3) I also decided to change `eieio-override-prin1' to print hash tables
> "by hand" from Lisp. The eieio-persistent requires to modify how
> elements in the hash tables are printed, and the current way of doing
> this (make a copy of the complete table, change the elements, prin1 and
> re-read the result) is not only hackish but also inefficient (it does
> this recursively for nested tables).
Yeah, the current implementation seems maximally inefficient... But I
guess this copies functionality found elsewhere?
> + (princ "#s(hash-table size ")
> + (prin1 (hash-table-size thing))
> + (princ " test ")
> + (prin1 (hash-table-test thing))
> + (princ " weakness ")
> + (prin1 (hash-table-weakness thing))
> + (princ " rehash-size ")
etc
So if the other printer changes, then this has to change, too? That
seems kinda brittle -- there should at least be references between the
two printers with a note to keep them updated if one of them changes.
> diff --git a/lisp/gnus/gnus-registry.el b/lisp/gnus/gnus-registry.el
> index 480ed80ef8..4ac3c84a80 100644
> --- a/lisp/gnus/gnus-registry.el
> +++ b/lisp/gnus/gnus-registry.el
> @@ -398,6 +398,7 @@ gnus-registry-save
> (interactive)
> (let* ((file (or file gnus-registry-cache-file))
> (db (or db gnus-registry-db))
> + (gc-cons-threshold (max gc-cons-threshold (* 800000 500)))
> (clone (clone db)))
> (gnus-message 5 "Saving Gnus registry (%d entries) to %s..."
> (registry-size db) file)
Could this have adverse consequences for people with low memory?
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 19 Jul 2020 14:53:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> > + (princ "#s(hash-table size ")
> > + (prin1 (hash-table-size thing))
> > + (princ " test ")
> > + (prin1 (hash-table-test thing))
> > + (princ " weakness ")
> > + (prin1 (hash-table-weakness thing))
> > + (princ " rehash-size ")
>
> etc
>
> So if the other printer changes, then this has to change, too? That
> seems kinda brittle -- there should at least be references between the
> two printers with a note to keep them updated if one of them changes.
What do you mean, "other printer"? The Lisp printer?
This read syntax is officially described in the Elisp manual:
(info "(elisp) Creating Hash")
(near the end of the page), so I would expect that the syntax will be
supported in the future.
> > diff --git a/lisp/gnus/gnus-registry.el b/lisp/gnus/gnus-registry.el
> > index 480ed80ef8..4ac3c84a80 100644
> > --- a/lisp/gnus/gnus-registry.el
> > +++ b/lisp/gnus/gnus-registry.el
> > @@ -398,6 +398,7 @@ gnus-registry-save
> > (interactive)
> > (let* ((file (or file gnus-registry-cache-file))
> > (db (or db gnus-registry-db))
> > + (gc-cons-threshold (max gc-cons-threshold (* 800000 500)))
> > (clone (clone db)))
> > (gnus-message 5 "Saving Gnus registry (%d entries) to %s..."
> > (registry-size db) file)
>
> Could this have adverse consequences for people with low memory?
These are 400 MB... Could be? Dunno. I wonder, though, if when you
would hit that limit when that code runs, your computer can hold that
huge hash-table at all. I don't know the relation between hash-table
size and corresponding amount of garbage. But I guess if you are low on
memory using the registry is problematic per se.
Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 19 Jul 2020 14:59:02 GMT)
Full text and
rfc822 format available.
Message #32 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
>> So if the other printer changes, then this has to change, too? That
>> seems kinda brittle -- there should at least be references between the
>> two printers with a note to keep them updated if one of them changes.
>
> What do you mean, "other printer"? The Lisp printer?
>
> This read syntax is officially described in the Elisp manual:
>
> (info "(elisp) Creating Hash")
>
> (near the end of the page), so I would expect that the syntax will be
> supported in the future.
Presumably there's already code to print a hash table somewhere in
Emacs? That would be the other printer, since your patch adds a new one.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 19 Jul 2020 15:20:02 GMT)
Full text and
rfc822 format available.
Message #35 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> Presumably there's already code to print a hash table somewhere in
> Emacs? That would be the other printer, since your patch adds a new
> one.
That's what the code currently already does. But: See what I wrote
about the purpose of the patch: our saving code needs to modify how
elements in the hash table are printed. The Lisp printer doesn't allow
that per se. My suggested patch substitutes a very inefficient hack
with a more efficient hack. The current code uses printing + re-reading
AFAIR - and that on nested hash tables.
I thought long about removing the need for this hack altogehter, but
this is very complicated, if not impossible (yes, you would need to
change the printer).
Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 19 Jul 2020 15:23:02 GMT)
Full text and
rfc822 format available.
Message #38 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> Lars Ingebrigtsen <larsi <at> gnus.org> writes:
>
>> Presumably there's already code to print a hash table somewhere in
>> Emacs? That would be the other printer, since your patch adds a new
>> one.
>
> That's what the code currently already does. But: See what I wrote
> about the purpose of the patch: our saving code needs to modify how
> elements in the hash table are printed. The Lisp printer doesn't allow
> that per se. My suggested patch substitutes a very inefficient hack
> with a more efficient hack. The current code uses printing + re-reading
> AFAIR - and that on nested hash tables.
>
> I thought long about removing the need for this hack altogehter, but
> this is very complicated, if not impossible (yes, you would need to
> change the printer).
I didn't object to this new, more efficient printer -- I just pointed
out that you should add comments to both of the hash table printers that
there are (now) two of them.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 19 Jul 2020 15:24:01 GMT)
Full text and
rfc822 format available.
Message #41 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> > > + (gc-cons-threshold (max gc-cons-threshold (* 800000 500)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Maybe it would be wise to factor this out into a defvar so that it's in
reach of users. The value could be a function whose return value could
also depend on the actual size of the registry.
Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 19 Jul 2020 15:33:02 GMT)
Full text and
rfc822 format available.
Message #44 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> Michael Heerdegen <michael_heerdegen <at> web.de> writes:
>
>> > > + (gc-cons-threshold (max gc-cons-threshold (* 800000 500)))
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Maybe it would be wise to factor this out into a defvar so that it's in
> reach of users. The value could be a function whose return value could
> also depend on the actual size of the registry.
Sure; that sounds good.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Thu, 01 Oct 2020 18:24:01 GMT)
Full text and
rfc822 format available.
Message #47 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> (3) I also decided to change `eieio-override-prin1' to print hash tables
> "by hand" from Lisp. The eieio-persistent requires to modify how
> elements in the hash tables are printed, and the current way of doing
> this (make a copy of the complete table, change the elements, prin1 and
> re-read the result) is not only hackish but also inefficient (it does
> this recursively for nested tables).
[...]
> + (princ "#s(hash-table size ")
> + (prin1 (hash-table-size thing))
> + (princ " test ")
> + (prin1 (hash-table-test thing))
> + (princ " weakness ")
> + (prin1 (hash-table-weakness thing))
> + (princ " rehash-size ")
> + (prin1 (hash-table-rehash-size thing))
> + (princ " rehash-threshold ")
> + (prin1 (hash-table-rehash-threshold thing))
> + (princ " data (")
I'm still not enthusiastic about duplicating the hash printing here.
Whenever print_vectorlike changes, this has to be changed, too.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Thu, 01 Oct 2020 23:54:01 GMT)
Full text and
rfc822 format available.
Message #50 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Lars Ingebrigtsen <larsi <at> gnus.org> writes:
> I'm still not enthusiastic about duplicating the hash printing here.
> Whenever print_vectorlike changes, this has to be changed, too.
Nothing existing fits the needs here, since printing with prin1 can't be
controlled as much as we need. I'm ok with leaving the code as is for
now, since both alternatives seem equally bad to me.
Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Fri, 02 Oct 2020 01:39:02 GMT)
Full text and
rfc822 format available.
Message #53 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> Lars Ingebrigtsen <larsi <at> gnus.org> writes:
>
>> I'm still not enthusiastic about duplicating the hash printing here.
>> Whenever print_vectorlike changes, this has to be changed, too.
>
> Nothing existing fits the needs here, since printing with prin1 can't be
> controlled as much as we need. I'm ok with leaving the code as is for
> now, since both alternatives seem equally bad to me.
I was wondering whether we could factor out the hash table bits from
print_vectorlike and reuse them here, somehow...
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#40704
; Package
emacs
.
(Sun, 17 Apr 2022 15:22:02 GMT)
Full text and
rfc822 format available.
Message #56 received at 40704 <at> debbugs.gnu.org (full text, mbox):
Michael Heerdegen <michael_heerdegen <at> web.de> writes:
> Any comments on the suggested changes?
This was almost two years ago, and I had some reservations, but the
speedups are significant, so I think you should just go ahead and push.
> + (gc-cons-threshold (max gc-cons-threshold (* 800000 500)))
Except this bit -- I don't think we should rebind gc-cons-threshold --
it should be up to the user (and may have adverse effects in
memory-constrained setups).
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no
Added tag(s) patch.
Request was from
Lars Ingebrigtsen <larsi <at> gnus.org>
to
control <at> debbugs.gnu.org
.
(Sun, 17 Apr 2022 15:23:01 GMT)
Full text and
rfc822 format available.
This bug report was last modified 3 years and 120 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.