GNU bug report logs -
#76017
31.0.50; Which args can cl-nintersection and cl-nset-difference mutate?
Previous Next
Reported by: "Basil L. Contovounesios" <basil <at> contovou.net>
Date: Sun, 2 Feb 2025 17:49:02 UTC
Severity: minor
Found in version 31.0.50
Fixed in version 31.1
Done: "Basil L. Contovounesios" <basil <at> contovou.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 76017 in the body.
You can then email your comments to 76017 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
mattias.engdegard <at> gmail.com, monnier <at> iro.umontreal.ca, bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Sun, 02 Feb 2025 17:49:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
"Basil L. Contovounesios" <basil <at> contovou.net>
:
New bug report received and forwarded. Copy sent to
mattias.engdegard <at> gmail.com, monnier <at> iro.umontreal.ca, bug-gnu-emacs <at> gnu.org
.
(Sun, 02 Feb 2025 17:49:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Since as far back as I can go in emacs.git history, the manual says:
-- Function: cl-nintersection list1 list2 &key :test :test-not :key
This is a destructive version of ‘cl-intersection’. It tries to
reuse storage of LIST1 rather than copying. It does _not_ reuse
the storage of LIST2.
-- Function: cl-nset-difference list1 list2 &key :test :test-not :key
This is a destructive ‘cl-set-difference’, which will try to reuse
LIST1 if possible.
whereas their docstrings say:
(cl-nintersection LIST1 LIST2 [KEYWORD VALUE]...)
Combine LIST1 and LIST2 using a set-intersection operation.
The resulting list contains all items that appear in both LIST1 and LIST2.
This is a destructive function; it reuses the storage of LIST1 and LIST2
whenever possible.
(cl-nset-difference LIST1 LIST2 [KEYWORD VALUE]...)
Combine LIST1 and LIST2 using a set-difference operation.
The resulting list contains all items that appear in LIST1 but not LIST2.
This is a destructive function; it reuses the storage of LIST1 and LIST2
whenever possible.
The mutates-arguments property added in
Byte-compiler warning about mutation of constant values
bfc07100d28 2023-05-13 11:53:25 +0200
https://git.sv.gnu.org/cgit/emacs.git/commit/?id=bfc07100d28
agrees with the docstrings, but CL docs I found online, and João's patch
in https://lists.gnu.org/r/emacs-devel/2023-11/msg00595.html agree with
the manual.
My arguments in favour of the manual:
- Documented for decades in both Emacs and CL docs.
- It is the weaker of the two requirements, thus backward-compatible.
- Set difference implementations do not naturally benefit from modifying
the second set (though I would love to see a counterexample).
- Callers of cl-nintersection having to ensure both arguments are safe
to mutate (e.g. through excessive copying) could diminish any
performance benefit it has over cl-intersection.
If there is agreement, I can follow up with a patch for our docs and
tests.
Thanks,
--
Basil
In GNU Emacs 31.0.50 (build 1, x86_64-pc-linux-gnu, X toolkit, cairo
version 1.18.2, Xaw3d scroll bars) of 2025-02-02 built on tais
Repository revision: c91c591f0f0cc774647c32bdcf05bb3a9551e340
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101015
System Description: Debian GNU/Linux trixie/sid
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Sun, 02 Feb 2025 18:32:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 76017 <at> debbugs.gnu.org (full text, mbox):
"Basil L. Contovounesios" <basil <at> contovou.net> writes:
> Since as far back as I can go in emacs.git history, the manual says:
>
> -- Function: cl-nintersection list1 list2 &key :test :test-not :key
> This is a destructive version of ‘cl-intersection’. It tries to
> reuse storage of LIST1 rather than copying. It does _not_ reuse
> the storage of LIST2.
> whereas their docstrings say:
>
> (cl-nintersection LIST1 LIST2 [KEYWORD VALUE]...)
> Combine LIST1 and LIST2 using a set-intersection operation.
> The resulting list contains all items that appear in both LIST1 and LIST2.
> This is a destructive function; it reuses the storage of LIST1 and LIST2
> whenever possible.
On the third hand, the implementation is simply:
(defun cl-nintersection (cl-list1 cl-list2 &rest cl-keys)
"Combine LIST1 and LIST2 using a set-intersection operation.
The resulting list contains all items that appear in both LIST1 and LIST2.
This is a destructive function; it reuses the storage of LIST1 and LIST2
whenever possible.
\nKeywords supported: :test :test-not :key
\n(fn LIST1 LIST2 [KEYWORD VALUE]...)"
(and cl-list1 cl-list2 (apply 'cl-intersection cl-list1 cl-list2 cl-keys)))
So no storage is ever reused, right?
So is this about the optimizations that cl-nintersection might in theory
one day perform, but didn't do in 1993 and doesn't do today?
Or am I missing something there?
I think the situation is similar to plist-put: you're taught always to
reassign the return value to the variable you're using, but it only ever
actually changes if the original list was nil. We reserve the right to
switch to an optimized version but, so far, there was no need to.
> My arguments in favour of the manual:
> - Documented for decades in both Emacs and CL docs.
> - It is the weaker of the two requirements, thus backward-compatible.
"weaker" in the sense that it gives the Lisp hacker more options, and
the Lisp implementor fewer options. I think we should go for "simpler":
assume both arguments may be mutated, and don't worry about specifying
them in the "right" order.
> - Set difference implementations do not naturally benefit from modifying
> the second set (though I would love to see a counterexample).
I'm not sure I agree. If you've got a cheap population count, you start
with the smaller set and throw out unnecessary elements, returning that
set.
> - Callers of cl-nintersection having to ensure both arguments are safe
> to mutate (e.g. through excessive copying) could diminish any
> performance benefit it has over cl-intersection.
Which is 0, right?
> If there is agreement, I can follow up with a patch for our docs and
> tests.
Both the "tries to" and "whenever possible" seem simply incorrect to me,
but the byte compiler warning can stay, IMHO.
Pip
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Sun, 02 Feb 2025 19:14:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 76017 <at> debbugs.gnu.org (full text, mbox):
Pip Cet [2025-02-02 18:30 +0000] wrote:
> So no storage is ever reused, right?
Right.
> So is this about the optimizations that cl-nintersection might in theory
> one day perform, but didn't do in 1993 and doesn't do today?
Yes, but more importantly about fixing the inconsistent docs.
>> - It is the weaker of the two requirements, thus backward-compatible.
>
> "weaker" in the sense that it gives the Lisp hacker more options, and
> the Lisp implementor fewer options. I think we should go for "simpler":
> assume both arguments may be mutated, and don't worry about specifying
> them in the "right" order.
It may be 'simpler' by some measure but it's also less backward-compatible.
>> - Set difference implementations do not naturally benefit from modifying
>> the second set (though I would love to see a counterexample).
>
> I'm not sure I agree. If you've got a cheap population count, you start
> with the smaller set and throw out unnecessary elements, returning that
> set.
Not sure I understand how that works - could you give an example please?
Given that set difference is not commutative, I don't immediately see
how picking the smaller set can help. And since set difference is the
complement of the second set wrt the first, I don't immediately see how
any of the population (and thus storage) of the second set can survive
the operation (i.e. be reused to greater effect than by reusing the
first set's).
>> - Callers of cl-nintersection having to ensure both arguments are safe
>> to mutate (e.g. through excessive copying) could diminish any
>> performance benefit it has over cl-intersection.
>
> Which is 0, right?
Today it's actually worse than 0, since there's the funcall overhead in
the absence of inlining. Hence 'any [potential] performance benefit'.
Thanks,
--
Basil
Severity set to 'minor' from 'normal'
Request was from
Stefan Kangas <stefankangas <at> gmail.com>
to
control <at> debbugs.gnu.org
.
(Tue, 11 Feb 2025 07:14:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Thu, 13 Feb 2025 18:14:02 GMT)
Full text and
rfc822 format available.
Message #16 received at 76017 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Basil L. Contovounesios [2025-02-02 18:48 +0100] wrote:
> If there is agreement, I can follow up with a patch for our docs and
> tests.
I propose the attached. WDYT?
[ I'm not sure whether it applies cleanly as is; it builds on the
unrelated patch proposed in https://bugs.gnu.org/75633#22. ]
Thanks,
--
Basil
[0002-Document-cl-n.-set-operations-consistently.patch (text/x-diff, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Fri, 14 Feb 2025 11:04:01 GMT)
Full text and
rfc822 format available.
Message #19 received at 76017 <at> debbugs.gnu.org (full text, mbox):
13 feb. 2025 kl. 19.13 skrev Basil L. Contovounesios <basil <at> contovou.net>:
> I propose the attached.
Thank you, this should be fine. I was clearly over-cautious in labelling the second arguments to those functions as destructive in mutates-arguments, and agree that there is no significant advantage to preserving the right to mutate it (which we never really had in the first place).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Fri, 14 Feb 2025 11:14:02 GMT)
Full text and
rfc822 format available.
Message #22 received at 76017 <at> debbugs.gnu.org (full text, mbox):
Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:
> 13 feb. 2025 kl. 19.13 skrev Basil L. Contovounesios <basil <at> contovou.net>:
>
>> I propose the attached.
>
> Thank you, this should be fine. I was clearly over-cautious in
> labelling the second arguments to those functions as destructive in
> mutates-arguments, and agree that there is no significant advantage to
> preserving the right to mutate it (which we never really had in the
> first place).
It seems to me the only possible advantage in the cl-nset-difference
case is you could sort (and possibly uniq) both lists in place and
traverse the sorted lists. But that would require many changes to the
current Emacs code, and I don't think it's going to happen. Similarly,
the "mutate the shorter list" thing for cl-intersection seems unlikely.
Maybe if feature/igc is merged, and becomes a common option, we can
formally make cl-nintersection an alias of cl-intersection, rather than
the currenet somewhat redundant wrapper.
As a followup, I would like to propose removing claims of cons cells
being reused "whenever possible", though: we reserve the right to
mutate one of the arguments, but we don't, so we shouldn't make such
claims. (And what's "whenever possible" supposed to mean, anyway? All
cons cells are equal!)
Pip
bug marked as fixed in version 31.1, send any further explanations to
76017 <at> debbugs.gnu.org and "Basil L. Contovounesios" <basil <at> contovou.net>
Request was from
"Basil L. Contovounesios" <basil <at> contovou.net>
to
control <at> debbugs.gnu.org
.
(Fri, 14 Feb 2025 14:49:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Fri, 14 Feb 2025 14:49:03 GMT)
Full text and
rfc822 format available.
Message #27 received at 76017-done <at> debbugs.gnu.org (full text, mbox):
close 76017 31.1
quit
Mattias Engdegård [2025-02-14 12:03 +0100] wrote:
> 13 feb. 2025 kl. 19.13 skrev Basil L. Contovounesios <basil <at> contovou.net>:
>
>> I propose the attached.
>
> Thank you, this should be fine. I was clearly over-cautious in labelling the
> second arguments to those functions as destructive in mutates-arguments, and
> agree that there is no significant advantage to preserving the right to mutate
> it (which we never really had in the first place).
Thanks, now applied.
Document cl-n... set operations consistently
ac143186c04 2025-02-14 15:42:52 +0100
https://git.sv.gnu.org/cgit/emacs.git/commit/?id=ac143186c04
--
Basil
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Fri, 14 Feb 2025 14:59:02 GMT)
Full text and
rfc822 format available.
Message #30 received at 76017 <at> debbugs.gnu.org (full text, mbox):
Pip Cet [2025-02-14 11:13 +0000] wrote:
> Maybe if feature/igc is merged, and becomes a common option, we can
> formally make cl-nintersection an alias of cl-intersection,
I haven't followed most of the feature/igc developments; how do they
affect the choice of whether cl-nintersection aliases cl-intersection?
> rather than
> the currenet somewhat redundant wrapper.
I would sooner attempt a destructive implementation before giving up;
maybe tomorrow™.
> As a followup, I would like to propose removing claims of cons cells
> being reused "whenever possible", though: we reserve the right to
> mutate one of the arguments, but we don't, so we shouldn't make such
> claims. (And what's "whenever possible" supposed to mean, anyway? All
> cons cells are equal!)
The right to mutate one of the arguments is the function's raison
d'être, and its users ought to be made aware of that before they choose
to use it. Today the implementation suffers from a clear performance
bug, but tomorrow that might be fixed, and at such time its already
informed users will automatically benefit from the fix.
Thanks,
--
Basil
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#76017
; Package
emacs
.
(Fri, 14 Feb 2025 15:22:02 GMT)
Full text and
rfc822 format available.
Message #33 received at 76017 <at> debbugs.gnu.org (full text, mbox):
"Basil L. Contovounesios" <basil <at> contovou.net> writes:
> Pip Cet [2025-02-14 11:13 +0000] wrote:
>
>> Maybe if feature/igc is merged, and becomes a common option, we can
>> formally make cl-nintersection an alias of cl-intersection,
>
> I haven't followed most of the feature/igc developments; how do they
> affect the choice of whether cl-nintersection aliases cl-intersection?
My hope is that an efficient GC greatly reduces the need for destructive
operations, but I'm not sure whether that's more of a dream :-)
>> As a followup, I would like to propose removing claims of cons cells
>> being reused "whenever possible", though: we reserve the right to
>> mutate one of the arguments, but we don't, so we shouldn't make such
>> claims. (And what's "whenever possible" supposed to mean, anyway? All
>> cons cells are equal!)
>
> The right to mutate one of the arguments is the function's raison
> d'être, and its users ought to be made aware of that before they choose
> to use it. Today the implementation suffers from a clear performance
> bug, but tomorrow that might be fixed, and at such time its already
> informed users will automatically benefit from the fix.
That sounds like a good way to fix it, "simply" adjust the function to
do what the docstring already claims it does!
Good luck!
Pip
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sat, 15 Mar 2025 11:24:06 GMT)
Full text and
rfc822 format available.
This bug report was last modified 95 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.