GNU bug report logs -
#69709
`sort` interface improvement and universal ordering predicate
Previous Next
To reply to this bug, email your comments to 69709 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 13:29:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Mattias Engdegård <mattias.engdegard <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Sun, 10 Mar 2024 13:29:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
The existing `sort` interface suffers from some usability problems:
1. Writing an ordering predicate is often difficult and error-prone, even for very basic tasks such as selecting the field to sort on. It's not uncommon to botch a predicate as simple as
(lambda (a b) (< (f a) (f b)))
which I've managed to do myself more than once. It gets particularly messy when sorting on multiple fields.
Having to write a custom comparison function for almost every occasion also means that performance suffers.
2. Mutability by default is a bug magnet as well.
To deal with the first problem, we could:
* Add a universal ordering predicate that will compare two values of the same type for many built-in types: numbers, strings, symbols, markers, lists, vectors, records, and a few more.
* Make this ordering the default.
* Add a key (accessor) function argument, like that in the recent `sort-on` interface, but built-in. This is important.
These work very well together: the user does not need to write or even choose an ordering predicate in most cases. Key functions are much less error-prone to write, and with the lexicographic ordering of lists, vectors and records, multi-key sorting is made much easier.
A key function combined with a standard ordering can also be used to optimise comparisons since we have all key values up front along with how they should be compared. The original timsort code that we stole from Python did this.
As a starting point, a patch implementing a universal ordering predicate is attached below.
The proposed sorting function interface would be
(new-sort seq &key key lessp destructive)
because the keyword interface is easier to read and write than a lengthening list of optional positional parameters, and can be extended more gracefully. For example, it could be handy to have a `reversed` (or `descending`) parameter. The parsing cost is not significant.
Instead of inventing a new and rather meaningless function name, I suggest we re-use `sort` and allow both
(sort seq lessp) ; old-style
(sort seq &key key lessp destructive) ; new-style
since they are easy to distinguish, and let `destructive` default to false in new-style calls, true in the old style.
[0001-value-less-p.patch (application/octet-stream, attachment)]
[Message part 3 (text/plain, inline)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 14:13:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Sun, 10 Mar 2024 14:28:02 +0100
>
> The proposed sorting function interface would be
>
> (new-sort seq &key key lessp destructive)
A nit: let's go with a name that doesn't have "new" as part of it.
Something like "lsort" or "xsort" or somesuch. (I don't suggest
"nsort" because 'n' as the first character has a special meaning in
Emacs Lisp, so I'd like to avoid confusion.)
> because the keyword interface is easier to read and write than a lengthening list of optional positional parameters, and can be extended more gracefully. For example, it could be handy to have a `reversed` (or `descending`) parameter. The parsing cost is not significant.
>
> Instead of inventing a new and rather meaningless function name, I suggest we re-use `sort` and allow both
>
> (sort seq lessp) ; old-style
> (sort seq &key key lessp destructive) ; new-style
>
> since they are easy to distinguish, and let `destructive` default to false in new-style calls, true in the old style.
Do you intend to present an implementation that replaces sort-on as
well?
And what about performance?
Thanks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 14:59:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 69709 <at> debbugs.gnu.org (full text, mbox):
10 mars 2024 kl. 15.09 skrev Eli Zaretskii <eliz <at> gnu.org>:
> A nit: let's go with a name that doesn't have "new" as part of it.
Sorry, that was meant as a placeholder. I should have used a more obvious non-name.
> Do you intend to present an implementation that replaces sort-on as
> well?
Yes, `sort-on` would be completely subsumed by the new interface.
> And what about performance?
Existing code should see little or no change. Users of the new interface would probably see performance improvements. You should trust me on neither of these points until there is actual code.
However the performance of the universal predicate seems fine, and is definitely better than hand-writing a predicate in Lisp when applicable.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 15:50:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 69709 <at> debbugs.gnu.org (full text, mbox):
On 10/03/2024 15:28, Mattias Engdegård wrote:
> 2. Mutability by default is a bug magnet as well.
>
> To deal with the first problem, we could:
>
> * Add a universal ordering predicate that will compare two values of the
> same type for many built-in types: numbers, strings, symbols, markers,
> lists, vectors, records, and a few more.
> * Make this ordering the default.
> * Add a key (accessor) function argument, like that in the recent
> `sort-on` interface, but built-in. This is important.
>
> These work very well together: the user does not need to write or even
> choose an ordering predicate in most cases. Key functions are much less
> error-prone to write, and with the lexicographic ordering of lists,
> vectors and records, multi-key sorting is made much easier.
>
> A key function combined with a standard ordering can also be used to
> optimise comparisons since we have all key values up front along with
> how they should be compared. The original timsort code that we stole
> from Python did this.
>
> As a starting point, a patch implementing a universal ordering predicate
> is attached below.
>
> The proposed sorting function interface would be
>
> (new-sort seq &key key lessp destructive)
Here's a concern: a lot of existing code is written with either
mutability in mind (the source list is a throwaway one, owned by the
caller), or coupled with copy-sequence already.
If the new 'sort' default to non-destructive, wouldn't that make those
existing callsites slower? Or at least some of them.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 15:59:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 69709 <at> debbugs.gnu.org (full text, mbox):
10 mars 2024 kl. 16.48 skrev Dmitry Gutov <dmitry <at> gutov.dev>:
> Here's a concern: a lot of existing code is written with either mutability in mind (the source list is a throwaway one, owned by the caller), or coupled with copy-sequence already.
>
> If the new 'sort' default to non-destructive, wouldn't that make those existing callsites slower? Or at least some of them.
No, with the old calling convention they would get destructive (in-place) sorting. There should be no incompatibilities at all.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 16:05:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 69709 <at> debbugs.gnu.org (full text, mbox):
On 10/03/2024 17:56, Mattias Engdegård wrote:
> 10 mars 2024 kl. 16.48 skrev Dmitry Gutov<dmitry <at> gutov.dev>:
>
>> Here's a concern: a lot of existing code is written with either mutability in mind (the source list is a throwaway one, owned by the caller), or coupled with copy-sequence already.
>>
>> If the new 'sort' default to non-destructive, wouldn't that make those existing callsites slower? Or at least some of them.
> No, with the old calling convention they would get destructive (in-place) sorting. There should be no incompatibilities at all.
I see. That sounds more confusing, though (you drop the 'pred' argument,
and the function changes behavior, possibly becoming slower too).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 16:49:01 GMT)
Full text and
rfc822 format available.
Message #23 received at 69709 <at> debbugs.gnu.org (full text, mbox):
10 mars 2024 kl. 17.03 skrev Dmitry Gutov <dmitry <at> gutov.dev>:
> That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).
Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 16:57:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> Cc: 69709 <at> debbugs.gnu.org
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Sun, 10 Mar 2024 17:46:59 +0100
>
> 10 mars 2024 kl. 17.03 skrev Dmitry Gutov <dmitry <at> gutov.dev>:
>
> > That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).
>
> Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.
I think we should delay this discussion until we see the first draft
of the code and the documentation to go with it. As long as the issue
is in Mattias's sights, I'm sure the result will be acceptable, to say
the least.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 10 Mar 2024 17:55:01 GMT)
Full text and
rfc822 format available.
Message #29 received at 69709 <at> debbugs.gnu.org (full text, mbox):
On 10/03/2024 18:55, Eli Zaretskii wrote:
>> Cc:69709 <at> debbugs.gnu.org
>> From: Mattias Engdegård<mattias.engdegard <at> gmail.com>
>> Date: Sun, 10 Mar 2024 17:46:59 +0100
>>
>> 10 mars 2024 kl. 17.03 skrev Dmitry Gutov<dmitry <at> gutov.dev>:
>>
>>> That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).
>> Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.
> I think we should delay this discussion until we see the first draft
> of the code and the documentation to go with it. As long as the issue
> is in Mattias's sights, I'm sure the result will be acceptable, to say
> the least.
I do think it's problematic, but it's not a deal-breaker, so if everyone
else is fine with it, let's proceed.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Mon, 11 Mar 2024 07:03:01 GMT)
Full text and
rfc822 format available.
Message #32 received at 69709 <at> debbugs.gnu.org (full text, mbox):
Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:
> Instead of inventing a new and rather meaningless function name, I
> suggest we re-use `sort` and allow both
>
> (sort seq lessp) ; old-style
> (sort seq &key key lessp destructive) ; new-style
FWIW, I wouldn't like overloading the name 'sort', because it feels like
magic happening. A new name would make things explicit - no magic.
Otherwise, I can't say much. The destructiveness of sort/stable-sort has
actually never been a problem for me. It has instead often been just
right, when consing a sequence and sorting it in the end.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Wed, 20 Mar 2024 19:11:02 GMT)
Full text and
rfc822 format available.
Message #35 received at 69709 <at> debbugs.gnu.org (full text, mbox):
Now the origin/scratch/sort-key branch contains a draft proposal. Summary:
* Our timsort has now the key function handling from the original code (ported to Emacs).
* New keyword-based calling convention for `sort`. The old one is still there and works as before.
* New `value-less-p` universal ordering predicate.
* No manual updates yet.
* NEWS entries are there.
* `sort-on` is now completely superfluous (slower, less convenient) and should be removed.
* Performance seems fine from initial tests. More comprehensive benchmarking will be done.
Some things that I haven't made up my mind about:
* Better name for `value-less-p`:
value<
{generic,universal,standard,lisp}{-less-p,<}
* Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.
* Should the :reverse keyword be called :reversed or even :descending ?
* Internally, `value-less-p` computes a 3-way result; it would be easy to expose that to Lisp as `value-compare`, could be useful.
* The design space for `value-less-p` is vast: the current code is an attempt at intuitive semantics without too much complexity. There are also good (and bad) arguments for:
- heterogeneous comparisons (compare objects of different types)
- total order on all values
- strict agreement with `equal`
- automatic call-out to user-defined comparison functions for classes that have them
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Wed, 20 Mar 2024 19:42:02 GMT)
Full text and
rfc822 format available.
Message #38 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> * Better name for `value-less-p`:
> value<
> {generic,universal,standard,lisp}{-less-p,<}
^^
,
:-)
IOW, I vote for `<`!
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Thu, 21 Mar 2024 14:56:02 GMT)
Full text and
rfc822 format available.
Message #41 received at 69709 <at> debbugs.gnu.org (full text, mbox):
Hi,
Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:
> Now the origin/scratch/sort-key branch contains a draft proposal. Summary:
>
> * Our timsort has now the key function handling from the original code (ported to Emacs).
> * New keyword-based calling convention for `sort`. The old one is still there and works as before.
> * New `value-less-p` universal ordering predicate.
> * No manual updates yet.
> * NEWS entries are there.
> * `sort-on` is now completely superfluous (slower, less convenient) and should be removed.
In case it hasn't already been mentioned, AFAICT this also applies to
`minibuffer--sort-by-key` (which is very similar to `sort-on`).
> * Performance seems fine from initial tests. More comprehensive benchmarking will be done.
>
> Some things that I haven't made up my mind about:
>
> * Better name for `value-less-p`:
> value<
> {generic,universal,standard,lisp}{-less-p,<}
>
> * Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.
>
> * Should the :reverse keyword be called :reversed or even :descending ?
>
> * Internally, `value-less-p` computes a 3-way result; it would be easy
> to expose that to Lisp as `value-compare`, could be useful.
>
> * The design space for `value-less-p` is vast: the current code is an
> attempt at intuitive semantics without too much complexity.
Perhaps the "standard order" of Prolog terms programs could be an
interesting reference:
https://www.swi-prolog.org/pldoc/man?section=standardorder
Best,
Eshel
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Thu, 21 Mar 2024 14:58:02 GMT)
Full text and
rfc822 format available.
Message #44 received at 69709 <at> debbugs.gnu.org (full text, mbox):
20 mars 2024 kl. 20.37 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:
> IOW, I vote for `<`!
That may actually be within the realms of possibility. Some snags:
* Not sure if code using `<` and expect a type error for non-numbers would be confused (probably not)
* Making `value<` n-ary would slow it down a bit, unclear how much
* `value<` compares markers by buffer then position, `<` by position only
* `<` can compare markers with numbers
* We expect consistency and shared code for all of < > <= >= = /= but it's unclear how `value<` would be generalised in that direction (at least if we care about NaN correctness)
Currently, `value<` treats NaN as if it were equal to any number, which is terrible and useless but consistent with `<`. Other options would include order NaN distinct from any number, perhaps just consider all NaNs to be equal and put them before or after all numbers.
`value<` is actually quite a bit faster than `<` which has a terribly cumbersome implementation, except for pairs of fixnums which apparently got special-cased in `<` and the other inequalities fairly recently.
By the way, I ran some more benchmarks: it turns out that `car-less-than-car`, the secret shortcut that you just had to know, now gives roughly the same speed as :key #'car which is a lot more discoverable (and works for many other types).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 22 Mar 2024 20:57:02 GMT)
Full text and
rfc822 format available.
Message #47 received at 69709 <at> debbugs.gnu.org (full text, mbox):
On 20/03/2024 21:01, Mattias Engdegård wrote:
> * Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.
If the function default to destructive, perhaps the argument could be
called :copy.
:in-place is not too bad. But I've looked around and don't see many
sorting functions in other stdlibs that do it non-destructively.
Even 'sort' in Clojure might mutate:
https://clojuredocs.org/clojure.core/sort
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sat, 23 Mar 2024 15:01:01 GMT)
Full text and
rfc822 format available.
Message #50 received at 69709 <at> debbugs.gnu.org (full text, mbox):
22 mars 2024 kl. 21.55 skrev Dmitry Gutov <dmitry <at> gutov.dev>:
> :in-place is not too bad.
Thank you, I'm going with :in-place because :destructive puts emphasis on the wrong aspect. Sorting in-place has predictable and useful side-effects, in contrast to the old (pre-timsort) implementation that garbled the input list.
But non-destructive should definitely be the default. All my own experience (and from observations, that of other people) shows that it's far less error-prone. This applies to other languages as well, even very imperative ones like Python.
The branch scratch/sort-key has been updated with polished changes, including updates of the Lisp manual.
`value-less-p` is now called `value<`. (We could still unify it with `<`, perhaps.)
A small tweak to the implementation of non-destructive list sorting gave a speed-up of 1.1x-1.5x, which was surprisingly good. The old code just called copy-sequence on the input.
An even bigger boost was gained from special-casing the ordering predicate `value<`: 1.5x-2x speed-up on practically all input. This alone could be worth all the trouble with the patch series. We could do even better by special-casing on common key types, such as fixnums.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sat, 23 Mar 2024 17:41:02 GMT)
Full text and
rfc822 format available.
Message #53 received at 69709 <at> debbugs.gnu.org (full text, mbox):
On 23/03/2024 16:58, Mattias Engdegård wrote:
> 22 mars 2024 kl. 21.55 skrev Dmitry Gutov <dmitry <at> gutov.dev>:
>
>> :in-place is not too bad.
>
> Thank you, I'm going with :in-place because :destructive puts emphasis on the wrong aspect. Sorting in-place has predictable and useful side-effects, in contrast to the old (pre-timsort) implementation that garbled the input list.
Okay.
> But non-destructive should definitely be the default. All my own experience (and from observations, that of other people) shows that it's far less error-prone. This applies to other languages as well, even very imperative ones like Python.
My concern is that it will make people write less performant code by
default. Which will degrade unnecessarily on larger inputs (often not
anticipated by the author in advance).
So others will have to go around each such project, ensure the original
list is "owned" and add the :in-place argument, to reach the optimum
performance. That's why, I think, 'sort' is mutating in most languages.
I suppose an extra copy-sequence is not that expensive, compared to most
things. It's still extra garbage (meaning GC pauses sooner or later).
Maybe it'll become less important with a better GC algorithm.
> The branch scratch/sort-key has been updated with polished changes, including updates of the Lisp manual.
> `value-less-p` is now called `value<`. (We could still unify it with `<`, perhaps.)
>
> A small tweak to the implementation of non-destructive list sorting gave a speed-up of 1.1x-1.5x, which was surprisingly good. The old code just called copy-sequence on the input.
>
> An even bigger boost was gained from special-casing the ordering predicate `value<`: 1.5x-2x speed-up on practically all input. This alone could be worth all the trouble with the patch series. We could do even better by special-casing on common key types, such as fixnums.
Very nice.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sat, 23 Mar 2024 20:11:02 GMT)
Full text and
rfc822 format available.
Message #56 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> So others will have to go around each such project, ensure the original list
> is "owned" and add the :in-place argument, to reach the optimum
> performance. That's why, I think, 'sort' is mutating in most languages.
I think it's mutating in "most languages" because "most languages" are
imperative and because those primitives usually operate on arrays (a
"naturally imperative" data structure) rather than on singly-linked
lists (a "naturally more immutable" data structure).
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sat, 23 Mar 2024 23:21:01 GMT)
Full text and
rfc822 format available.
Message #59 received at 69709 <at> debbugs.gnu.org (full text, mbox):
On 23/03/2024 22:09, Stefan Monnier wrote:
>> So others will have to go around each such project, ensure the original list
>> is "owned" and add the :in-place argument, to reach the optimum
>> performance. That's why, I think, 'sort' is mutating in most languages.
> I think it's mutating in "most languages" because "most languages" are
> imperative and because those primitives usually operate on arrays (a
> "naturally imperative" data structure) rather than on singly-linked
> lists (a "naturally more immutable" data structure).
There aren't many general purpose languages around which use cons cells
and have 'sort' in stdlib for them, to compare.
To get back to the example of Clojure, which touts immutability most of
the time, the 'sort' routine first copies a sequence into an array
(apparently for performance - fewer indirections and better locality
when subsequently swapping values around), and then returns that array
with a thin wrapper called ArraySeq (which it 1 extra object, not N).
https://github.com/clojure/clojure/blob/ce55092f2b2f5481d25cff6205470c1335760ef6/src/clj/clojure/core.clj#L3103-L3118
IIUC we allocate an array internally as well during the sorting phrase,
but then we can't return it as-is or with a thin wrapper (if only
because the user expects back a list, not a vector), so we'll need to
allocate a whole new list to avoid mutating the original. And our GC is
worse than JVM's.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sat, 23 Mar 2024 23:27:02 GMT)
Full text and
rfc822 format available.
Message #62 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> IIUC we allocate an array internally as well during the sorting phrase, but
> then we can't return it as-is or with a thin wrapper (if only because the
> user expects back a list, not a vector), so we'll need to allocate a whole
> new list to avoid mutating the original. And our GC is worse than JVM's.
When it matters, people can use `:in-place`.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Mon, 25 Mar 2024 16:38:01 GMT)
Full text and
rfc822 format available.
Message #65 received at 69709 <at> debbugs.gnu.org (full text, mbox):
24 mars 2024 kl. 00.25 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:
> When it matters, people can use `:in-place`.
Indeed, we follow the policy of being safe by default with a faster, more dangerous alternative on explicit request: `reverse` vs `nreverse`, for instance.
I'm quite happy with the result overall: safer, much easier to use, and faster, all at the same time. Many of the uses of `sort` in the Emacs tree would have benefitted a lot from the new design, especially the hand-written lambda monstrosities that try to implement multi-key comparisons.
It should also be quite feasible to supply a working (but slower) all-Lisp implementation for the `compat` package; I have some basic proof-of-concept code.
The plan is to add scratch/sort-key to master in a few days if no serious issues turn up. There are a few minor adjustments to the documentation that haven't been pushed yet.
Oh, and the embarrassing code in `value<` for comparing bool-vectors has now been sped up ~150x in my tree, if you were worried about that.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 29 Mar 2024 11:00:02 GMT)
Full text and
rfc822 format available.
Message #68 received at 69709 <at> debbugs.gnu.org (full text, mbox):
25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard <at> gmail.com>:
> The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.
Plan executed.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 29 Mar 2024 11:39:02 GMT)
Full text and
rfc822 format available.
Message #71 received at 69709 <at> debbugs.gnu.org (full text, mbox):
Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:
> 25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard <at> gmail.com>:
>
>> The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.
>
> Plan executed.
Thank you for this excellent improvement of the sort function! I would
like to back-port the new feature in the Compat library, specifically in
the emacs-30 branch of the Compat repository. I've seen you mentioned
some pure Lisp code, implementing the new feature, in a slower, but
backward compatible way, which we could perhaps use?
Daniel
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 29 Mar 2024 11:53:01 GMT)
Full text and
rfc822 format available.
Message #74 received at 69709 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
29 mars 2024 kl. 12.38 skrev Daniel Mendler <mail <at> daniel-mendler.de>:
> I would
> like to back-port the new feature in the Compat library, specifically in
> the emacs-30 branch of the Compat repository. I've seen you mentioned
> some pure Lisp code, implementing the new feature, in a slower, but
> backward compatible way, which we could perhaps use?
Here is my proof-of-concept hack that maybe you could use as starting point. It's incomplete but I think you can fill in the blanks. You could use the tests in fns-tests.el, perhaps adapted to fit. Let us know how it goes.
[compatsort.el (application/octet-stream, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 29 Mar 2024 12:07:02 GMT)
Full text and
rfc822 format available.
Message #77 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Fri, 29 Mar 2024 11:59:39 +0100
> Cc: Dmitry Gutov <dmitry <at> gutov.dev>,
> Eli Zaretskii <eliz <at> gnu.org>,
> 69709 <at> debbugs.gnu.org,
> Gerd Möllmann <gerd.moellmann <at> gmail.com>
>
> 25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard <at> gmail.com>:
>
> > The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.
>
> Plan executed.
Thanks.
I have a few comments/questions about value<:
. The description of value< says "lexicographic order" about some
types, but I think that is not clear enough, and we should tell
explicitly how the comparison works in those cases. AFAIU, the
corresponding elements are compared by recursively calling value<
for them? And what happens if one has more elements than the other?
These are all questions that popped up when I read the new text,
and I think they should be answered in the text.
. AFAICT, no ordering is defined for overlays, and I wonder why. I
think this could be very useful; we certainly sort overlays in C in
several places.
. Various fine details about value< are never mentioned: the fact
that there's a limit to recursion, the special treatment of nil in
some cases, etc. I think we should document them, at least in the
doc string if not in the manual as well.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 29 Mar 2024 15:03:02 GMT)
Full text and
rfc822 format available.
Message #80 received at 69709 <at> debbugs.gnu.org (full text, mbox):
29 mars 2024 kl. 13.06 skrev Eli Zaretskii <eliz <at> gnu.org>:
> . The description of value< says "lexicographic order" about some
> types, but I think that is not clear enough, and we should tell
> explicitly how the comparison works in those cases.
That's a good point. Since it is standard terminology I will explain it in the manual but leave the doc string as it is -- this is our usual practice, right?
> . AFAICT, no ordering is defined for overlays, and I wonder why. I
> think this could be very useful; we certainly sort overlays in C in
> several places.
Yes, the plan was to add ordering for more types as we figure out which ones are missing. Basically, if a type has a useful ordering that is well-defined, we should probably add it.
That's one reason why I didn't include overlays from the beginning: I couldn't easily find one obvious ordering that would make intuitive sense, and I'd rather leave them unordered than define something useless. Code that sorts overlays uses a variety of criteria: priority, starting position, application-specific properties, and so on.
> . Various fine details about value< are never mentioned: the fact
> that there's a limit to recursion, the special treatment of nil in
> some cases, etc. I think we should document them, at least in the
> doc string if not in the manual as well.
Right -- the depth limit is very much like that of `equal` which is documented in the manual but not the doc string; I'll probably do the same for `value<`. `nil` deserves a note about its dual nature (symbol and list) as well.
Thank you for the markup fixes, by the way. One question:
> -@table @asis
> +@table @code
I used @asis because that is what the entries for some other functions using key-value arguments used, like `make-process`. Clearly the keyword should be marked up as @code, but should it encompass the @var{argument} part as well? Or should we use @asis and then explicit @code in each @item?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 29 Mar 2024 15:36:02 GMT)
Full text and
rfc822 format available.
Message #83 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Fri, 29 Mar 2024 16:02:01 +0100
> Cc: monnier <at> iro.umontreal.ca,
> dmitry <at> gutov.dev,
> 69709 <at> debbugs.gnu.org,
> gerd.moellmann <at> gmail.com
>
> 29 mars 2024 kl. 13.06 skrev Eli Zaretskii <eliz <at> gnu.org>:
>
> > . The description of value< says "lexicographic order" about some
> > types, but I think that is not clear enough, and we should tell
> > explicitly how the comparison works in those cases.
>
> That's a good point. Since it is standard terminology I will explain it in the manual but leave the doc string as it is -- this is our usual practice, right?
That'd be fine, thanks.
> > . AFAICT, no ordering is defined for overlays, and I wonder why. I
> > think this could be very useful; we certainly sort overlays in C in
> > several places.
>
> Yes, the plan was to add ordering for more types as we figure out which ones are missing. Basically, if a type has a useful ordering that is well-defined, we should probably add it.
>
> That's one reason why I didn't include overlays from the beginning: I couldn't easily find one obvious ordering that would make intuitive sense, and I'd rather leave them unordered than define something useless. Code that sorts overlays uses a variety of criteria: priority, starting position, application-specific properties, and so on.
The canonical sorting order for overlays is in compare_overlays,
AFAIK.
> Thank you for the markup fixes, by the way. One question:
>
> > -@table @asis
> > +@table @code
>
> I used @asis because that is what the entries for some other functions using key-value arguments used, like `make-process`. Clearly the keyword should be marked up as @code, but should it encompass the @var{argument} part as well? Or should we use @asis and then explicit @code in each @item?
Makeinfo can handle @code{@var{..}} just fine, we have that in many
other places, so that's not a concern. It is indeed possible to use
@asis in @table, and then mark each @item with @code, but that's just
too much trouble; "@table @code" exists to save us that...
I think the places where keywords don't have the @code markup are
simply wrong. This should be most evident in the printed format of
the manual, where @code produces a distinct typeface, but doesn't have
the quotes around it. Maybe you could produce the PDF manual and see
the difference by yourself.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 29 Mar 2024 16:14:02 GMT)
Full text and
rfc822 format available.
Message #86 received at 69709 <at> debbugs.gnu.org (full text, mbox):
29 mars 2024 kl. 16.35 skrev Eli Zaretskii <eliz <at> gnu.org>:
> The canonical sorting order for overlays is in compare_overlays,
> AFAIK.
Maybe, but I haven't see much Lisp code trying to use an order like that -- sorting overlays is common but is done with all kinds of orders. Perhaps I haven't looked very carefully.
In any case, maybe compare_overlays would work as a starting point. It seems to focus on vertical order (by priority), but much Lisp code sorts overlays horizontally (by position).
> Makeinfo can handle @code{@var{..}} just fine, we have that in many
> other places, so that's not a concern. It is indeed possible to use
> @asis in @table, and then mark each @item with @code, but that's just
> too much trouble; "@table @code" exists to save us that...
Looking at the PDF makes me think that we should decide whether we prefer
@code{:keyword @var{variable}}
or
@code{:keyword} @var{variable}
in general. I can think of arguments for either but consistency would be even better.
> I think the places where keywords don't have the @code markup are
> simply wrong.
Yes, I agree.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 29 Mar 2024 18:10:01 GMT)
Full text and
rfc822 format available.
Message #89 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Fri, 29 Mar 2024 17:13:33 +0100
> Cc: monnier <at> iro.umontreal.ca,
> dmitry <at> gutov.dev,
> 69709 <at> debbugs.gnu.org,
> gerd.moellmann <at> gmail.com
>
> 29 mars 2024 kl. 16.35 skrev Eli Zaretskii <eliz <at> gnu.org>:
>
> > Makeinfo can handle @code{@var{..}} just fine, we have that in many
> > other places, so that's not a concern. It is indeed possible to use
> > @asis in @table, and then mark each @item with @code, but that's just
> > too much trouble; "@table @code" exists to save us that...
>
> Looking at the PDF makes me think that we should decide whether we prefer
>
> @code{:keyword @var{variable}}
>
> or
>
> @code{:keyword} @var{variable}
>
> in general. I can think of arguments for either but consistency would be even better.
The former is the correct one.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 14 Apr 2024 15:21:05 GMT)
Full text and
rfc822 format available.
Message #92 received at 69709 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Thank you for your excellent work on `sort` and related functionality!
Unfortunately, the new `sort` implementation occasionally crashes with a
segfault. The following code reproduces that in current master:
(dotimes (i 500)
(sort (make-list 128 42)
:key (lambda (n) (make-list i n))))
It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
currently) along with a non-NIL `:key` function. In such cases, a
`Lisp_Object` array is explicitly heap-allocated to store the keys, which is
never marked against GC. This would not be a problem if not for the fact
that
the `:key` function call may trigger GC.
I'm attaching a patch with a proposed solution, consisting of three changes:
1. Allocate with `xzalloc` to have the new array initialized to `Qnil`. This
ensures its objects can be marked properly.
2. Mark `ms->allocated_keys` in `merge_markmem`. We don't need to check that
`ms->allocated_keys != NULL`, as `merge_register_cleanup` is only called
under this exact condition.
3. Move the computation of keys (which may trigger a GC) after `merge_init`,
which registers `merge_markmem`.
I hope this is useful.
Cheers,
Aris
[Message part 2 (text/html, inline)]
[sort-segfault-fix.patch (text/x-patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 14 Apr 2024 16:27:04 GMT)
Full text and
rfc822 format available.
Message #95 received at 69709 <at> debbugs.gnu.org (full text, mbox):
> Cc: 69709 <at> debbugs.gnu.org
> From: Aris Spathis <agspathis <at> gmail.com>
> Date: Sun, 14 Apr 2024 17:03:11 +0300
>
> Thank you for your excellent work on `sort` and related functionality!
>
> Unfortunately, the new `sort` implementation occasionally crashes with a
> segfault. The following code reproduces that in current master:
>
> (dotimes (i 500)
> (sort (make-list 128 42)
> :key (lambda (n) (make-list i n))))
>
> It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
> currently) along with a non-NIL `:key` function. In such cases, a
> `Lisp_Object` array is explicitly heap-allocated to store the keys, which is
> never marked against GC. This would not be a problem if not for the fact that
> the `:key` function call may trigger GC.
>
> I'm attaching a patch with a proposed solution, consisting of three changes:
>
> 1. Allocate with `xzalloc` to have the new array initialized to `Qnil`. This
> ensures its objects can be marked properly.
>
> 2. Mark `ms->allocated_keys` in `merge_markmem`. We don't need to check that
> `ms->allocated_keys != NULL`, as `merge_register_cleanup` is only called
> under this exact condition.
>
> 3. Move the computation of keys (which may trigger a GC) after `merge_init`,
> which registers `merge_markmem`.
>
> I hope this is useful.
Thanks, I took the liberty of CC'ing Mattias using his "usual"
address, in the hope that it will help bring this to his attention
sooner.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Sun, 14 Apr 2024 16:34:03 GMT)
Full text and
rfc822 format available.
Message #98 received at 69709 <at> debbugs.gnu.org (full text, mbox):
14 apr. 2024 kl. 18.26 skrev Eli Zaretskii <eliz <at> gnu.org>:
>> Thank you for your excellent work on `sort` and related functionality!
Apparently not excellent enough!
>> Unfortunately, the new `sort` implementation occasionally crashes with a
>> segfault. The following code reproduces that in current master:
>>
>> (dotimes (i 500)
>> (sort (make-list 128 42)
>> :key (lambda (n) (make-list i n))))
>>
>> It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
>> currently) along with a non-NIL `:key` function. In such cases, a
>> `Lisp_Object` array is explicitly heap-allocated to store the keys, which is
>> never marked against GC. This would not be a problem if not for the fact that
>> the `:key` function call may trigger GC.
Oh dear, what a silly mistake. Many thanks for spotting this and providing the patch, which I applied with a few minor adjustments (and a regression test).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 17 May 2024 12:33:01 GMT)
Full text and
rfc822 format available.
Message #101 received at 69709 <at> debbugs.gnu.org (full text, mbox):
Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:
> 29 mars 2024 kl. 12.38 skrev Daniel Mendler <mail <at> daniel-mendler.de>:
>
>> I would
>> like to back-port the new feature in the Compat library, specifically in
>> the emacs-30 branch of the Compat repository. I've seen you mentioned
>> some pure Lisp code, implementing the new feature, in a slower, but
>> backward compatible way, which we could perhaps use?
>
> Here is my proof-of-concept hack that maybe you could use as starting point.
> It's incomplete but I think you can fill in the blanks. You could use the tests
> in fns-tests.el, perhaps adapted to fit. Let us know how it goes.
Just letting you know that I've implemented value< and sort in the
emacs-30 branch of Compat. It works well so far, but value< is not yet
completely implemented. See the following commit:
https://github.com/emacs-compat/compat/commit/8190769d9eb9258dd8361bd322d90228dc586770
There is one thing I'd like to ask about value<. Would it make sense to
support comparing mixed types, e.g., numbers and markers or strings and
symbols? In particular I often like to mix numbers and markers, in
particular in cases where we can avoid creating a marker for performance
reasons if the pure position is sufficient and the location doesn't
change due to edits.
Daniel
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#69709
; Package
emacs
.
(Fri, 17 May 2024 17:52:01 GMT)
Full text and
rfc822 format available.
Message #104 received at 69709 <at> debbugs.gnu.org (full text, mbox):
17 maj 2024 kl. 14.29 skrev Daniel Mendler <mail <at> daniel-mendler.de>:
> Just letting you know that I've implemented value< and sort in the
> emacs-30 branch of Compat. It works well so far, but value< is not yet
> completely implemented. See the following commit:
> https://github.com/emacs-compat/compat/commit/8190769d9eb9258dd8361bd322d90228dc586770
An excellent start! I'll post comments on the source site.
> There is one thing I'd like to ask about value<. Would it make sense to
> support comparing mixed types, e.g., numbers and markers or strings and
> symbols?
I went back and forth quite a bit, but decided to start small homogeneous comparisons which would at least give the option to extend to heterogeneous (or in your case, ad-hoc mixed) comparisons later on, without committing too much to any particular design until we have more experience.
When it comes to mixing numbers and markers, I'm still leaning against it. `<` allows mixed comparisons but it doesn't work well with markers from different buffers. `value<` orders markers by grouping them by buffer which feels more useful.
(I've never been a great fan of the way elisp allows comparison and arithmetic on markers and numbers; it probably seemed to be a good idea at the time. That feature seems to have caused more muddled than clear code.)
Same with strings and symbols: I can't remember having seen a collection where I would have wanted them to be sorted together, and `nil` being a symbol means that we would lose a useful error check.
There is a more tempting case for a truly heterogeneous comparisons, like a universal total order. Some other languages have this but I think it is more difficult to construct with so much legacy. I went with unifying all numbers for pragmatic reasons even though -0.0 and NaN are terrible.
This bug report was last modified 1 year and 89 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.