GNU bug report logs - #69709
`sort` interface improvement and universal ordering predicate

Package: emacs;

Reported by: Mattias Engdegård <mattias.engdegard <at> gmail.com>

Date: Sun, 10 Mar 2024 13:29:02 UTC

Severity: normal

To reply to this bug, email your comments to 69709 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 13:29:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Mattias Engdegård <mattias.engdegard <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 10 Mar 2024 13:29:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Emacs Bug Report <bug-gnu-emacs <at> gnu.org>
Subject: `sort` interface improvement and universal ordering predicate
Date: Sun, 10 Mar 2024 14:28:02 +0100

[Message part 1 (text/plain, inline)]

The existing `sort` interface suffers from some usability problems:

1. Writing an ordering predicate is often difficult and error-prone, even for very basic tasks such as selecting the field to sort on. It's not uncommon to botch a predicate as simple as

  (lambda (a b) (< (f a) (f b)))

which I've managed to do myself more than once. It gets particularly messy when sorting on multiple fields.
Having to write a custom comparison function for almost every occasion also means that performance suffers.

2. Mutability by default is a bug magnet as well.

To deal with the first problem, we could:

* Add a universal ordering predicate that will compare two values of the same type for many built-in types: numbers, strings, symbols, markers, lists, vectors, records, and a few more.
* Make this ordering the default.
* Add a key (accessor) function argument, like that in the recent `sort-on` interface, but built-in. This is important.

These work very well together: the user does not need to write or even choose an ordering predicate in most cases. Key functions are much less error-prone to write, and with the lexicographic ordering of lists, vectors and records, multi-key sorting is made much easier.

A key function combined with a standard ordering can also be used to optimise comparisons since we have all key values up front along with how they should be compared. The original timsort code that we stole from Python did this.

As a starting point, a patch implementing a universal ordering predicate is attached below.

The proposed sorting function interface would be

  (new-sort seq &key key lessp destructive)

because the keyword interface is easier to read and write than a lengthening list of optional positional parameters, and can be extended more gracefully. For example, it could be handy to have a `reversed` (or `descending`) parameter. The parsing cost is not significant.

Instead of inventing a new and rather meaningless function name, I suggest we re-use `sort` and allow both

  (sort seq lessp)                       ; old-style
  (sort seq &key key lessp destructive)  ; new-style

since they are easy to distinguish, and let `destructive` default to false in new-style calls, true in the old style.

[0001-value-less-p.patch (application/octet-stream, attachment)]

[Message part 3 (text/plain, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 14:13:02 GMT) Full text and rfc822 format available.

Message #8 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 69709 <at> debbugs.gnu.org
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 10 Mar 2024 16:09:32 +0200

> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Sun, 10 Mar 2024 14:28:02 +0100
> 
> The proposed sorting function interface would be
> 
>   (new-sort seq &key key lessp destructive)

A nit: let's go with a name that doesn't have "new" as part of it.
Something like "lsort" or "xsort" or somesuch.  (I don't suggest
"nsort" because 'n' as the first character has a special meaning in
Emacs Lisp, so I'd like to avoid confusion.)

> because the keyword interface is easier to read and write than a lengthening list of optional positional parameters, and can be extended more gracefully. For example, it could be handy to have a `reversed` (or `descending`) parameter. The parsing cost is not significant.
> 
> Instead of inventing a new and rather meaningless function name, I suggest we re-use `sort` and allow both
> 
>   (sort seq lessp)                       ; old-style
>   (sort seq &key key lessp destructive)  ; new-style
> 
> since they are easy to distinguish, and let `destructive` default to false in new-style calls, true in the old style.

Do you intend to present an implementation that replaces sort-on as
well?

And what about performance?

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 14:59:02 GMT) Full text and rfc822 format available.

Message #11 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 69709 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 10 Mar 2024 15:56:23 +0100

10 mars 2024 kl. 15.09 skrev Eli Zaretskii <eliz <at> gnu.org>:

> A nit: let's go with a name that doesn't have "new" as part of it.

Sorry, that was meant as a placeholder. I should have used a more obvious non-name.

> Do you intend to present an implementation that replaces sort-on as
> well?

Yes, `sort-on` would be completely subsumed by the new interface.

> And what about performance?

Existing code should see little or no change. Users of the new interface would probably see performance improvements. You should trust me on neither of these points until there is actual code.

However the performance of the universal predicate seems fine, and is definitely better than hand-writing a predicate in Lisp when applicable.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 15:50:02 GMT) Full text and rfc822 format available.

Message #14 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>,
 69709 <at> debbugs.gnu.org
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 10 Mar 2024 17:48:34 +0200

On 10/03/2024 15:28, Mattias Engdegård wrote:
> 2. Mutability by default is a bug magnet as well.
> 
> To deal with the first problem, we could:
> 
> * Add a universal ordering predicate that will compare two values of the
>   same type for many built-in types: numbers, strings, symbols, markers,
> lists, vectors, records, and a few more.
> * Make this ordering the default.
> * Add a key (accessor) function argument, like that in the recent
> `sort-on` interface, but built-in. This is important.
> 
> These work very well together: the user does not need to write or even
> choose an ordering predicate in most cases. Key functions are much less
> error-prone to write, and with the lexicographic ordering of lists,
> vectors and records, multi-key sorting is made much easier.
> 
> A key function combined with a standard ordering can also be used to
> optimise comparisons since we have all key values up front along with
> how they should be compared. The original timsort code that we stole
> from Python did this.
> 
> As a starting point, a patch implementing a universal ordering predicate
>   is attached below.
> 
> The proposed sorting function interface would be
> 
>    (new-sort seq &key key lessp destructive)

Here's a concern: a lot of existing code is written with either 
mutability in mind (the source list is a throwaway one, owned by the 
caller), or coupled with copy-sequence already.

If the new 'sort' default to non-destructive, wouldn't that make those 
existing callsites slower? Or at least some of them.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 15:59:02 GMT) Full text and rfc822 format available.

Message #17 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: 69709 <at> debbugs.gnu.org
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 10 Mar 2024 16:56:37 +0100

10 mars 2024 kl. 16.48 skrev Dmitry Gutov <dmitry <at> gutov.dev>:

> Here's a concern: a lot of existing code is written with either mutability in mind (the source list is a throwaway one, owned by the caller), or coupled with copy-sequence already.
> 
> If the new 'sort' default to non-destructive, wouldn't that make those existing callsites slower? Or at least some of them.

No, with the old calling convention they would get destructive (in-place) sorting. There should be no incompatibilities at all.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 16:05:02 GMT) Full text and rfc822 format available.

Message #20 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 10 Mar 2024 18:03:31 +0200

On 10/03/2024 17:56, Mattias Engdegård wrote:
> 10 mars 2024 kl. 16.48 skrev Dmitry Gutov<dmitry <at> gutov.dev>:
> 
>> Here's a concern: a lot of existing code is written with either mutability in mind (the source list is a throwaway one, owned by the caller), or coupled with copy-sequence already.
>>
>> If the new 'sort' default to non-destructive, wouldn't that make those existing callsites slower? Or at least some of them.
> No, with the old calling convention they would get destructive (in-place) sorting. There should be no incompatibilities at all.

I see. That sounds more confusing, though (you drop the 'pred' argument, 
and the function changes behavior, possibly becoming slower too).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 16:49:01 GMT) Full text and rfc822 format available.

Message #23 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: 69709 <at> debbugs.gnu.org
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 10 Mar 2024 17:46:59 +0100

10 mars 2024 kl. 17.03 skrev Dmitry Gutov <dmitry <at> gutov.dev>:

> That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).

Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 16:57:02 GMT) Full text and rfc822 format available.

Message #26 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org, dmitry <at> gutov.dev
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 10 Mar 2024 18:55:29 +0200

> Cc: 69709 <at> debbugs.gnu.org
> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Sun, 10 Mar 2024 17:46:59 +0100
> 
> 10 mars 2024 kl. 17.03 skrev Dmitry Gutov <dmitry <at> gutov.dev>:
> 
> > That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).
> 
> Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.

I think we should delay this discussion until we see the first draft
of the code and the documentation to go with it.  As long as the issue
is in Mattias's sights, I'm sure the result will be acceptable, to say
the least.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 10 Mar 2024 17:55:01 GMT) Full text and rfc822 format available.

Message #29 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Eli Zaretskii <eliz <at> gnu.org>,
 Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 10 Mar 2024 19:54:16 +0200

On 10/03/2024 18:55, Eli Zaretskii wrote:
>> Cc:69709 <at> debbugs.gnu.org
>> From: Mattias Engdegård<mattias.engdegard <at> gmail.com>
>> Date: Sun, 10 Mar 2024 17:46:59 +0100
>>
>> 10 mars 2024 kl. 17.03 skrev Dmitry Gutov<dmitry <at> gutov.dev>:
>>
>>> That sounds more confusing, though (you drop the 'pred' argument, and the function changes behavior, possibly becoming slower too).
>> Is it really surprising that behaviour changes if you remove a previously-mandatory argument from a call without reading the documentation? It seems unlikely that this would happen by accident.
> I think we should delay this discussion until we see the first draft
> of the code and the documentation to go with it.  As long as the issue
> is in Mattias's sights, I'm sure the result will be acceptable, to say
> the least.

I do think it's problematic, but it's not a deal-breaker, so if everyone 
else is fine with it, let's proceed.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Mon, 11 Mar 2024 07:03:01 GMT) Full text and rfc822 format available.

Message #32 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Gerd Möllmann <gerd.moellmann <at> gmail.com>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Mon, 11 Mar 2024 08:01:03 +0100

Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:

> Instead of inventing a new and rather meaningless function name, I
> suggest we re-use `sort` and allow both
>
>   (sort seq lessp)                       ; old-style
>   (sort seq &key key lessp destructive)  ; new-style

FWIW, I wouldn't like overloading the name 'sort', because it feels like
magic happening. A new name would make things explicit - no magic.

Otherwise, I can't say much. The destructiveness of sort/stable-sort has
actually never been a problem for me. It has instead often been just
right, when consing a sequence and sorting it in the end.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Wed, 20 Mar 2024 19:11:02 GMT) Full text and rfc822 format available.

Message #35 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, Dmitry Gutov <dmitry <at> gutov.dev>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Wed, 20 Mar 2024 20:01:39 +0100

Now the origin/scratch/sort-key branch contains a draft proposal. Summary:

* Our timsort has now the key function handling from the original code (ported to Emacs).
* New keyword-based calling convention for `sort`. The old one is still there and works as before.
* New `value-less-p` universal ordering predicate.
* No manual updates yet.
* NEWS entries are there.
* `sort-on` is now completely superfluous (slower, less convenient) and should be removed.
* Performance seems fine from initial tests. More comprehensive benchmarking will be done.

Some things that I haven't made up my mind about:

* Better name for `value-less-p`:
    value<
    {generic,universal,standard,lisp}{-less-p,<}

* Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.

* Should the :reverse keyword be called :reversed or even :descending ?

* Internally, `value-less-p` computes a 3-way result; it would be easy to expose that to Lisp as `value-compare`, could be useful.

* The design space for `value-less-p` is vast: the current code is an attempt at intuitive semantics without too much complexity. There are also good (and bad) arguments for:

- heterogeneous comparisons (compare objects of different types)
- total order on all values
- strict agreement with `equal`
- automatic call-out to user-defined comparison functions for classes that have them

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Wed, 20 Mar 2024 19:42:02 GMT) Full text and rfc822 format available.

Message #38 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, Dmitry Gutov <dmitry <at> gutov.dev>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Wed, 20 Mar 2024 15:37:59 -0400

> * Better name for `value-less-p`:
>     value<
>     {generic,universal,standard,lisp}{-less-p,<}
                                     ^^
                                     ,
:-)

IOW, I vote for `<`!


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Thu, 21 Mar 2024 14:56:02 GMT) Full text and rfc822 format available.

Message #41 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Eshel Yaron <me <at> eshelyaron.com>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Dmitry Gutov <dmitry <at> gutov.dev>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Thu, 21 Mar 2024 15:54:31 +0100

Hi,

Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:

> Now the origin/scratch/sort-key branch contains a draft proposal. Summary:
>
> * Our timsort has now the key function handling from the original code (ported to Emacs).
> * New keyword-based calling convention for `sort`. The old one is still there and works as before.
> * New `value-less-p` universal ordering predicate.
> * No manual updates yet.
> * NEWS entries are there.
> * `sort-on` is now completely superfluous (slower, less convenient) and should be removed.

In case it hasn't already been mentioned, AFAICT this also applies to
`minibuffer--sort-by-key` (which is very similar to `sort-on`).

> * Performance seems fine from initial tests. More comprehensive benchmarking will be done.
>
> Some things that I haven't made up my mind about:
>
> * Better name for `value-less-p`:
>     value<
>     {generic,universal,standard,lisp}{-less-p,<}
>
> * Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.
>
> * Should the :reverse keyword be called :reversed or even :descending ?
>
> * Internally, `value-less-p` computes a 3-way result; it would be easy
> to expose that to Lisp as `value-compare`, could be useful.
>
> * The design space for `value-less-p` is vast: the current code is an
> attempt at intuitive semantics without too much complexity.

Perhaps the "standard order" of Prolog terms programs could be an
interesting reference:
https://www.swi-prolog.org/pldoc/man?section=standardorder


Best,

Eshel

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Thu, 21 Mar 2024 14:58:02 GMT) Full text and rfc822 format available.

Message #44 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, Dmitry Gutov <dmitry <at> gutov.dev>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Thu, 21 Mar 2024 15:55:43 +0100

20 mars 2024 kl. 20.37 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:

> IOW, I vote for `<`!

That may actually be within the realms of possibility. Some snags:

* Not sure if code using `<` and expect a type error for non-numbers would be confused (probably not)
* Making `value<` n-ary would slow it down a bit, unclear how much
* `value<` compares markers by buffer then position, `<` by position only
* `<` can compare markers with numbers
* We expect consistency and shared code for all of < > <= >= = /= but it's unclear how `value<` would be generalised in that direction (at least if we care about NaN correctness)

Currently, `value<` treats NaN as if it were equal to any number, which is terrible and useless but consistent with `<`. Other options would include order NaN distinct from any number, perhaps just consider all NaNs to be equal and put them before or after all numbers.

`value<` is actually quite a bit faster than `<` which has a terribly cumbersome implementation, except for pairs of fixnums which apparently got special-cased in `<` and the other inequalities fairly recently.

By the way, I ran some more benchmarks: it turns out that `car-less-than-car`, the secret shortcut that you just had to know, now gives roughly the same speed as :key #'car which is a lot more discoverable (and works for many other types).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 22 Mar 2024 20:57:02 GMT) Full text and rfc822 format available.

Message #47 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 22 Mar 2024 22:55:55 +0200

On 20/03/2024 21:01, Mattias Engdegård wrote:
> * Maybe the :destructive keyword be called :inplace or :in-place instead? Shorter, less violent.

If the function default to destructive, perhaps the argument could be 
called :copy.

:in-place is not too bad. But I've looked around and don't see many 
sorting functions in other stdlibs that do it non-destructively.

Even 'sort' in Clojure might mutate: 
https://clojuredocs.org/clojure.core/sort

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sat, 23 Mar 2024 15:01:01 GMT) Full text and rfc822 format available.

Message #50 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sat, 23 Mar 2024 15:58:52 +0100

22 mars 2024 kl. 21.55 skrev Dmitry Gutov <dmitry <at> gutov.dev>:

> :in-place is not too bad.

Thank you, I'm going with :in-place because :destructive puts emphasis on the wrong aspect. Sorting in-place has predictable and useful side-effects, in contrast to the old (pre-timsort) implementation that garbled the input list.

But non-destructive should definitely be the default. All my own experience (and from observations, that of other people) shows that it's far less error-prone. This applies to other languages as well, even very imperative ones like Python.

The branch scratch/sort-key has been updated with polished changes, including updates of the Lisp manual.
`value-less-p` is now called `value<`. (We could still unify it with `<`, perhaps.)

A small tweak to the implementation of non-destructive list sorting gave a speed-up of 1.1x-1.5x, which was surprisingly good. The old code just called copy-sequence on the input.

An even bigger boost was gained from special-casing the ordering predicate `value<`: 1.5x-2x speed-up on practically all input. This alone could be worth all the trouble with the patch series. We could do even better by special-casing on common key types, such as fixnums.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sat, 23 Mar 2024 17:41:02 GMT) Full text and rfc822 format available.

Message #53 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sat, 23 Mar 2024 19:39:51 +0200

On 23/03/2024 16:58, Mattias Engdegård wrote:
> 22 mars 2024 kl. 21.55 skrev Dmitry Gutov <dmitry <at> gutov.dev>:
> 
>> :in-place is not too bad.
> 
> Thank you, I'm going with :in-place because :destructive puts emphasis on the wrong aspect. Sorting in-place has predictable and useful side-effects, in contrast to the old (pre-timsort) implementation that garbled the input list.

Okay.

> But non-destructive should definitely be the default. All my own experience (and from observations, that of other people) shows that it's far less error-prone. This applies to other languages as well, even very imperative ones like Python.

My concern is that it will make people write less performant code by 
default. Which will degrade unnecessarily on larger inputs (often not 
anticipated by the author in advance).

So others will have to go around each such project, ensure the original 
list is "owned" and add the :in-place argument, to reach the optimum 
performance. That's why, I think, 'sort' is mutating in most languages.

I suppose an extra copy-sequence is not that expensive, compared to most 
things. It's still extra garbage (meaning GC pauses sooner or later). 
Maybe it'll become less important with a better GC algorithm.

> The branch scratch/sort-key has been updated with polished changes, including updates of the Lisp manual.
> `value-less-p` is now called `value<`. (We could still unify it with `<`, perhaps.)
> 
> A small tweak to the implementation of non-destructive list sorting gave a speed-up of 1.1x-1.5x, which was surprisingly good. The old code just called copy-sequence on the input.
> 
> An even bigger boost was gained from special-casing the ordering predicate `value<`: 1.5x-2x speed-up on practically all input. This alone could be worth all the trouble with the patch series. We could do even better by special-casing on common key types, such as fixnums.

Very nice.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sat, 23 Mar 2024 20:11:02 GMT) Full text and rfc822 format available.

Message #56 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Mattias Engdegård <mattias.engdegard <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sat, 23 Mar 2024 16:09:16 -0400

> So others will have to go around each such project, ensure the original list
> is "owned" and add the :in-place argument, to reach the optimum
> performance. That's why, I think, 'sort' is mutating in most languages.

I think it's mutating in "most languages" because "most languages" are
imperative and because those primitives usually operate on arrays (a
"naturally imperative" data structure) rather than on singly-linked
lists (a "naturally more immutable" data structure).


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sat, 23 Mar 2024 23:21:01 GMT) Full text and rfc822 format available.

Message #59 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Dmitry Gutov <dmitry <at> gutov.dev>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Mattias Engdegård <mattias.engdegard <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 24 Mar 2024 01:19:09 +0200

On 23/03/2024 22:09, Stefan Monnier wrote:
>> So others will have to go around each such project, ensure the original list
>> is "owned" and add the :in-place argument, to reach the optimum
>> performance. That's why, I think, 'sort' is mutating in most languages.
> I think it's mutating in "most languages" because "most languages" are
> imperative and because those primitives usually operate on arrays (a
> "naturally imperative" data structure) rather than on singly-linked
> lists (a "naturally more immutable" data structure).

There aren't many general purpose languages around which use cons cells 
and have 'sort' in stdlib for them, to compare.

To get back to the example of Clojure, which touts immutability most of 
the time, the 'sort' routine first copies a sequence into an array 
(apparently for performance - fewer indirections and better locality 
when subsequently swapping values around), and then returns that array 
with a thin wrapper called ArraySeq (which it 1 extra object, not N).

https://github.com/clojure/clojure/blob/ce55092f2b2f5481d25cff6205470c1335760ef6/src/clj/clojure/core.clj#L3103-L3118

IIUC we allocate an array internally as well during the sorting phrase, 
but then we can't return it as-is or with a thin wrapper (if only 
because the user expects back a list, not a vector), so we'll need to 
allocate a whole new list to avoid mutating the original. And our GC is 
worse than JVM's.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sat, 23 Mar 2024 23:27:02 GMT) Full text and rfc822 format available.

Message #62 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Dmitry Gutov <dmitry <at> gutov.dev>
Cc: 69709 <at> debbugs.gnu.org,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Mattias Engdegård <mattias.engdegard <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sat, 23 Mar 2024 19:25:54 -0400

> IIUC we allocate an array internally as well during the sorting phrase, but
> then we can't return it as-is or with a thin wrapper (if only because the
> user expects back a list, not a vector), so we'll need to allocate a whole
> new list to avoid mutating the original. And our GC is worse than JVM's.

When it matters, people can use `:in-place`.


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Mon, 25 Mar 2024 16:38:01 GMT) Full text and rfc822 format available.

Message #65 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 69709 <at> debbugs.gnu.org, Dmitry Gutov <dmitry <at> gutov.dev>,
 Eli Zaretskii <eliz <at> gnu.org>,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Mon, 25 Mar 2024 12:11:57 +0100

24 mars 2024 kl. 00.25 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:

> When it matters, people can use `:in-place`.

Indeed, we follow the policy of being safe by default with a faster, more dangerous alternative on explicit request: `reverse` vs `nreverse`, for instance.

I'm quite happy with the result overall: safer, much easier to use, and faster, all at the same time. Many of the uses of `sort` in the Emacs tree would have benefitted a lot from the new design, especially the hand-written lambda monstrosities that try to implement multi-key comparisons.

It should also be quite feasible to supply a working (but slower) all-Lisp implementation for the `compat` package; I have some basic proof-of-concept code.

The plan is to add scratch/sort-key to master in a few days if no serious issues turn up. There are a few minor adjustments to the documentation that haven't been pushed yet.

Oh, and the embarrassing code in `value<` for comparing bool-vectors has now been sped up ~150x in my tree, if you were worried about that.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 29 Mar 2024 11:00:02 GMT) Full text and rfc822 format available.

Message #68 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 69709 <at> debbugs.gnu.org, Dmitry Gutov <dmitry <at> gutov.dev>,
 Eli Zaretskii <eliz <at> gnu.org>,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 29 Mar 2024 11:59:39 +0100

25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard <at> gmail.com>:

> The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.

Plan executed.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 29 Mar 2024 11:39:02 GMT) Full text and rfc822 format available.

Message #71 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Mendler <mail <at> daniel-mendler.de>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org, Dmitry Gutov <dmitry <at> gutov.dev>,
 Eli Zaretskii <eliz <at> gnu.org>,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 29 Mar 2024 12:38:21 +0100

Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:

> 25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard <at> gmail.com>:
>
>> The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.
>
> Plan executed.

Thank you for this excellent improvement of the sort function! I would
like to back-port the new feature in the Compat library, specifically in
the emacs-30 branch of the Compat repository. I've seen you mentioned
some pure Lisp code, implementing the new feature, in a slower, but
backward compatible way, which we could perhaps use?

Daniel

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 29 Mar 2024 11:53:01 GMT) Full text and rfc822 format available.

Message #74 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Daniel Mendler <mail <at> daniel-mendler.de>
Cc: 69709 <at> debbugs.gnu.org, Dmitry Gutov <dmitry <at> gutov.dev>,
 Eli Zaretskii <eliz <at> gnu.org>,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 29 Mar 2024 12:52:37 +0100

[Message part 1 (text/plain, inline)]

29 mars 2024 kl. 12.38 skrev Daniel Mendler <mail <at> daniel-mendler.de>:

> I would
> like to back-port the new feature in the Compat library, specifically in
> the emacs-30 branch of the Compat repository. I've seen you mentioned
> some pure Lisp code, implementing the new feature, in a slower, but
> backward compatible way, which we could perhaps use?

Here is my proof-of-concept hack that maybe you could use as starting point. It's incomplete but I think you can fill in the blanks. You could use the tests in fns-tests.el, perhaps adapted to fit. Let us know how it goes.

[compatsort.el (application/octet-stream, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 29 Mar 2024 12:07:02 GMT) Full text and rfc822 format available.

Message #77 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org, dmitry <at> gutov.dev, gerd.moellmann <at> gmail.com,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 29 Mar 2024 15:06:05 +0300

> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Fri, 29 Mar 2024 11:59:39 +0100
> Cc: Dmitry Gutov <dmitry <at> gutov.dev>,
>  Eli Zaretskii <eliz <at> gnu.org>,
>  69709 <at> debbugs.gnu.org,
>  Gerd Möllmann <gerd.moellmann <at> gmail.com>
> 
> 25 mars 2024 kl. 12.11 skrev Mattias Engdegård <mattias.engdegard <at> gmail.com>:
> 
> > The plan is to add scratch/sort-key to master in a few days if no serious issues turn up.
> 
> Plan executed.

Thanks.

I have a few comments/questions about value<:

 . The description of value< says "lexicographic order" about some
   types, but I think that is not clear enough, and we should tell
   explicitly how the comparison works in those cases.  AFAIU, the
   corresponding elements are compared by recursively calling value<
   for them? And what happens if one has more elements than the other?
   These are all questions that popped up when I read the new text,
   and I think they should be answered in the text.
 . AFAICT, no ordering is defined for overlays, and I wonder why.  I
   think this could be very useful; we certainly sort overlays in C in
   several places.
 . Various fine details about value< are never mentioned: the fact
   that there's a limit to recursion, the special treatment of nil in
   some cases, etc.  I think we should document them, at least in the
   doc string if not in the manual as well.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 29 Mar 2024 15:03:02 GMT) Full text and rfc822 format available.

Message #80 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 69709 <at> debbugs.gnu.org, dmitry <at> gutov.dev, gerd.moellmann <at> gmail.com,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 29 Mar 2024 16:02:01 +0100

29 mars 2024 kl. 13.06 skrev Eli Zaretskii <eliz <at> gnu.org>:

> . The description of value< says "lexicographic order" about some
>   types, but I think that is not clear enough, and we should tell
>   explicitly how the comparison works in those cases.

That's a good point. Since it is standard terminology I will explain it in the manual but leave the doc string as it is -- this is our usual practice, right?

> . AFAICT, no ordering is defined for overlays, and I wonder why.  I
>   think this could be very useful; we certainly sort overlays in C in
>   several places.

Yes, the plan was to add ordering for more types as we figure out which ones are missing. Basically, if a type has a useful ordering that is well-defined, we should probably add it.

That's one reason why I didn't include overlays from the beginning: I couldn't easily find one obvious ordering that would make intuitive sense, and I'd rather leave them unordered than define something useless. Code that sorts overlays uses a variety of criteria: priority, starting position, application-specific properties, and so on.

> . Various fine details about value< are never mentioned: the fact
>   that there's a limit to recursion, the special treatment of nil in
>   some cases, etc.  I think we should document them, at least in the
>   doc string if not in the manual as well.

Right -- the depth limit is very much like that of `equal` which is documented in the manual but not the doc string; I'll probably do the same for `value<`. `nil` deserves a note about its dual nature (symbol and list) as well.

Thank you for the markup fixes, by the way. One question:

> -@table @asis
> +@table @code

I used @asis because that is what the entries for some other functions using key-value arguments used, like `make-process`. Clearly the keyword should be marked up as @code, but should it encompass the @var{argument} part as well? Or should we use @asis and then explicit @code in each @item?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 29 Mar 2024 15:36:02 GMT) Full text and rfc822 format available.

Message #83 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org, dmitry <at> gutov.dev, gerd.moellmann <at> gmail.com,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 29 Mar 2024 18:35:10 +0300

> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Fri, 29 Mar 2024 16:02:01 +0100
> Cc: monnier <at> iro.umontreal.ca,
>  dmitry <at> gutov.dev,
>  69709 <at> debbugs.gnu.org,
>  gerd.moellmann <at> gmail.com
> 
> 29 mars 2024 kl. 13.06 skrev Eli Zaretskii <eliz <at> gnu.org>:
> 
> > . The description of value< says "lexicographic order" about some
> >   types, but I think that is not clear enough, and we should tell
> >   explicitly how the comparison works in those cases.
> 
> That's a good point. Since it is standard terminology I will explain it in the manual but leave the doc string as it is -- this is our usual practice, right?

That'd be fine, thanks.

> > . AFAICT, no ordering is defined for overlays, and I wonder why.  I
> >   think this could be very useful; we certainly sort overlays in C in
> >   several places.
> 
> Yes, the plan was to add ordering for more types as we figure out which ones are missing. Basically, if a type has a useful ordering that is well-defined, we should probably add it.
> 
> That's one reason why I didn't include overlays from the beginning: I couldn't easily find one obvious ordering that would make intuitive sense, and I'd rather leave them unordered than define something useless. Code that sorts overlays uses a variety of criteria: priority, starting position, application-specific properties, and so on.

The canonical sorting order for overlays is in compare_overlays,
AFAIK.

> Thank you for the markup fixes, by the way. One question:
> 
> > -@table @asis
> > +@table @code
> 
> I used @asis because that is what the entries for some other functions using key-value arguments used, like `make-process`. Clearly the keyword should be marked up as @code, but should it encompass the @var{argument} part as well? Or should we use @asis and then explicit @code in each @item?

Makeinfo can handle @code{@var{..}} just fine, we have that in many
other places, so that's not a concern.  It is indeed possible to use
@asis in @table, and then mark each @item with @code, but that's just
too much trouble; "@table @code" exists to save us that...

I think the places where keywords don't have the @code markup are
simply wrong.  This should be most evident in the printed format of
the manual, where @code produces a distinct typeface, but doesn't have
the quotes around it.  Maybe you could produce the PDF manual and see
the difference by yourself.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 29 Mar 2024 16:14:02 GMT) Full text and rfc822 format available.

Message #86 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 69709 <at> debbugs.gnu.org, dmitry <at> gutov.dev, gerd.moellmann <at> gmail.com,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 29 Mar 2024 17:13:33 +0100

29 mars 2024 kl. 16.35 skrev Eli Zaretskii <eliz <at> gnu.org>:

> The canonical sorting order for overlays is in compare_overlays,
> AFAIK.

Maybe, but I haven't see much Lisp code trying to use an order like that -- sorting overlays is common but is done with all kinds of orders. Perhaps I haven't looked very carefully.
In any case, maybe compare_overlays would work as a starting point. It seems to focus on vertical order (by priority), but much Lisp code sorts overlays horizontally (by position).

> Makeinfo can handle @code{@var{..}} just fine, we have that in many
> other places, so that's not a concern.  It is indeed possible to use
> @asis in @table, and then mark each @item with @code, but that's just
> too much trouble; "@table @code" exists to save us that...

Looking at the PDF makes me think that we should decide whether we prefer

  @code{:keyword @var{variable}}

or

  @code{:keyword} @var{variable}

in general. I can think of arguments for either but consistency would be even better.

> I think the places where keywords don't have the @code markup are
> simply wrong.

Yes, I agree.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 29 Mar 2024 18:10:01 GMT) Full text and rfc822 format available.

Message #89 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org, dmitry <at> gutov.dev, gerd.moellmann <at> gmail.com,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 29 Mar 2024 21:09:16 +0300

> From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
> Date: Fri, 29 Mar 2024 17:13:33 +0100
> Cc: monnier <at> iro.umontreal.ca,
>  dmitry <at> gutov.dev,
>  69709 <at> debbugs.gnu.org,
>  gerd.moellmann <at> gmail.com
> 
> 29 mars 2024 kl. 16.35 skrev Eli Zaretskii <eliz <at> gnu.org>:
> 
> > Makeinfo can handle @code{@var{..}} just fine, we have that in many
> > other places, so that's not a concern.  It is indeed possible to use
> > @asis in @table, and then mark each @item with @code, but that's just
> > too much trouble; "@table @code" exists to save us that...
> 
> Looking at the PDF makes me think that we should decide whether we prefer
> 
>   @code{:keyword @var{variable}}
> 
> or
> 
>   @code{:keyword} @var{variable}
> 
> in general. I can think of arguments for either but consistency would be even better.

The former is the correct one.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 14 Apr 2024 15:21:05 GMT) Full text and rfc822 format available.

Message #92 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Aris Spathis <agspathis <at> gmail.com>
To: mattias.engdegard <at> gmail.com
Cc: 69709 <at> debbugs.gnu.org
Subject: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 14 Apr 2024 17:03:11 +0300

[Message part 1 (text/plain, inline)]

Thank you for your excellent work on `sort` and related functionality!

Unfortunately, the new `sort` implementation occasionally crashes with a
segfault. The following code reproduces that in current master:

(dotimes (i 500)
  (sort (make-list 128 42)
        :key (lambda (n) (make-list i n))))

It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
currently) along with a non-NIL `:key` function. In such cases, a
`Lisp_Object` array is explicitly heap-allocated to store the keys, which is
never marked against GC. This would not be a problem if not for the fact
that
the `:key` function call may trigger GC.

I'm attaching a patch with a proposed solution, consisting of three changes:

1. Allocate with `xzalloc` to have the new array initialized to `Qnil`. This
   ensures its objects can be marked properly.

2. Mark `ms->allocated_keys` in `merge_markmem`. We don't need to check that
   `ms->allocated_keys != NULL`, as `merge_register_cleanup` is only called
   under this exact condition.

3. Move the computation of keys (which may trigger a GC) after `merge_init`,
   which registers `merge_markmem`.

I hope this is useful.

Cheers,
Aris

[Message part 2 (text/html, inline)]

[sort-segfault-fix.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 14 Apr 2024 16:27:04 GMT) Full text and rfc822 format available.

Message #95 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Aris Spathis <agspathis <at> gmail.com>,
 Mattias Engdegård <mattiase <at> acm.org>
Cc: 69709 <at> debbugs.gnu.org, mattias.engdegard <at> gmail.com
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 14 Apr 2024 19:26:05 +0300

> Cc: 69709 <at> debbugs.gnu.org
> From: Aris Spathis <agspathis <at> gmail.com>
> Date: Sun, 14 Apr 2024 17:03:11 +0300
> 
> Thank you for your excellent work on `sort` and related functionality!
> 
> Unfortunately, the new `sort` implementation occasionally crashes with a
> segfault. The following code reproduces that in current master:
> 
> (dotimes (i 500)
>   (sort (make-list 128 42)
>         :key (lambda (n) (make-list i n))))
> 
> It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
> currently) along with a non-NIL `:key` function. In such cases, a
> `Lisp_Object` array is explicitly heap-allocated to store the keys, which is
> never marked against GC. This would not be a problem if not for the fact that
> the `:key` function call may trigger GC.
> 
> I'm attaching a patch with a proposed solution, consisting of three changes:
> 
> 1. Allocate with `xzalloc` to have the new array initialized to `Qnil`. This
>    ensures its objects can be marked properly.
> 
> 2. Mark `ms->allocated_keys` in `merge_markmem`. We don't need to check that
>    `ms->allocated_keys != NULL`, as `merge_register_cleanup` is only called
>    under this exact condition.
> 
> 3. Move the computation of keys (which may trigger a GC) after `merge_init`,
>    which registers `merge_markmem`.
> 
> I hope this is useful.

Thanks, I took the liberty of CC'ing Mattias using his "usual"
address, in the hope that it will help bring this to his attention
sooner.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Sun, 14 Apr 2024 16:34:03 GMT) Full text and rfc822 format available.

Message #98 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 69709 <at> debbugs.gnu.org, Aris Spathis <agspathis <at> gmail.com>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Sun, 14 Apr 2024 18:33:32 +0200

14 apr. 2024 kl. 18.26 skrev Eli Zaretskii <eliz <at> gnu.org>:

>> Thank you for your excellent work on `sort` and related functionality!

Apparently not excellent enough!

>> Unfortunately, the new `sort` implementation occasionally crashes with a
>> segfault. The following code reproduces that in current master:
>> 
>> (dotimes (i 500)
>>  (sort (make-list 128 42)
>>        :key (lambda (n) (make-list i n))))
>> 
>> It happens for inputs of length >= `MERGESTATE_TEMP_SIZE / 2` (= 128
>> currently) along with a non-NIL `:key` function. In such cases, a
>> `Lisp_Object` array is explicitly heap-allocated to store the keys, which is
>> never marked against GC. This would not be a problem if not for the fact that
>> the `:key` function call may trigger GC.

Oh dear, what a silly mistake. Many thanks for spotting this and providing the patch, which I applied with a few minor adjustments (and a regression test).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 17 May 2024 12:33:01 GMT) Full text and rfc822 format available.

Message #101 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Daniel Mendler <mail <at> daniel-mendler.de>
To: Mattias Engdegård <mattias.engdegard <at> gmail.com>
Cc: 69709 <at> debbugs.gnu.org, ~pkal/compat-devel <at> lists.sr.ht,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Dmitry Gutov <dmitry <at> gutov.dev>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 17 May 2024 14:29:38 +0200

Mattias Engdegård <mattias.engdegard <at> gmail.com> writes:

> 29 mars 2024 kl. 12.38 skrev Daniel Mendler <mail <at> daniel-mendler.de>:
>
>> I would
>> like to back-port the new feature in the Compat library, specifically in
>> the emacs-30 branch of the Compat repository. I've seen you mentioned
>> some pure Lisp code, implementing the new feature, in a slower, but
>> backward compatible way, which we could perhaps use?
>
> Here is my proof-of-concept hack that maybe you could use as starting point.
> It's incomplete but I think you can fill in the blanks. You could use the tests
> in fns-tests.el, perhaps adapted to fit. Let us know how it goes.

Just letting you know that I've implemented value< and sort in the
emacs-30 branch of Compat. It works well so far, but value< is not yet
completely implemented. See the following commit:
https://github.com/emacs-compat/compat/commit/8190769d9eb9258dd8361bd322d90228dc586770

There is one thing I'd like to ask about value<. Would it make sense to
support comparing mixed types, e.g., numbers and markers or strings and
symbols? In particular I often like to mix numbers and markers, in
particular in cases where we can avoid creating a marker for performance
reasons if the pure position is sufficient and the location doesn't
change due to edits.

Daniel

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#69709; Package emacs. (Fri, 17 May 2024 17:52:01 GMT) Full text and rfc822 format available.

Message #104 received at 69709 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattias.engdegard <at> gmail.com>
To: Daniel Mendler <mail <at> daniel-mendler.de>
Cc: 69709 <at> debbugs.gnu.org, ~pkal/compat-devel <at> lists.sr.ht,
 Gerd Möllmann <gerd.moellmann <at> gmail.com>,
 Dmitry Gutov <dmitry <at> gutov.dev>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#69709: `sort` interface improvement and universal ordering
 predicate
Date: Fri, 17 May 2024 19:49:48 +0200

17 maj 2024 kl. 14.29 skrev Daniel Mendler <mail <at> daniel-mendler.de>:

> Just letting you know that I've implemented value< and sort in the
> emacs-30 branch of Compat. It works well so far, but value< is not yet
> completely implemented. See the following commit:
> https://github.com/emacs-compat/compat/commit/8190769d9eb9258dd8361bd322d90228dc586770

An excellent start! I'll post comments on the source site.

> There is one thing I'd like to ask about value<. Would it make sense to
> support comparing mixed types, e.g., numbers and markers or strings and
> symbols?

I went back and forth quite a bit, but decided to start small homogeneous comparisons which would at least give the option to extend to heterogeneous (or in your case, ad-hoc mixed) comparisons later on, without committing too much to any particular design until we have more experience.

When it comes to mixing numbers and markers, I'm still leaning against it. `<` allows mixed comparisons but it doesn't work well with markers from different buffers. `value<` orders markers by grouping them by buffer which feels more useful.

(I've never been a great fan of the way elisp allows comparison and arithmetic on markers and numbers; it probably seemed to be a good idea at the time. That feature seems to have caused more muddled than clear code.)

Same with strings and symbols: I can't remember having seen a collection where I would have wanted them to be sorted together, and `nil` being a symbol means that we would lose a useful error check.

There is a more tempting case for a truly heterogeneous comparisons, like a universal total order. Some other languages have this but I think it is more difficult to construct with so much legacy. I went with unifying all numbers for pragmatic reasons even though -0.0 and NaN are terrible.

This bug report was last modified 1 year and 89 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #69709 `sort` interface improvement and universal ordering predicate

GNU bug report logs - #69709
`sort` interface improvement and universal ordering predicate