GNU bug report logs - #41354
equal? has no sensible code path for symbols


Package: guile;

Reported by: David Kastrup <dak <at> gnu.org>

Date: Sun, 17 May 2020 10:50:02 UTC

Severity: normal

To reply to this bug, email your comments to 41354 AT debbugs.gnu.org.


Report forwarded to bug-guile <at> gnu.org:
bug#41354; Package guile. (Sun, 17 May 2020 10:50:02 GMT)

Acknowledgement sent to David Kastrup <dak <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 17 May 2020 10:50:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: David Kastrup <dak <at> gnu.org>
To: bug-guile <at> gnu.org
Subject: equal? has no sensible code path for symbols
Date: Sun, 17 May 2020 12:49:50 +0200
In Scheme, symbols can be compared using eq? for equality.  However,
since they have garbage-collected content attached, they do not meet the
predicate SCM_IMP in the short-circuit evaluation at the start of equal?
This means that unequal symbols compared using equal? fall through a
whole bunch of tests and end up in a general structural comparison
comparing their underlying string names.

This completely sabotages the semantics symbols are intended for.
Behavior for eqv? is similar but the fall-through at least is not as
expensive as it is for equal? .
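
A rough sketch in C of the situation described above.  It is a toy
model, not Guile's actual representation: the `value` type, the low-bit
immediate tag, `struct heap_object`, and every function name are
assumptions made for illustration.  The counter records how many tests
an equal?-like predicate performs before it answers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy value word (an assumption, not Guile's layout): low bit set means
   an immediate small integer, otherwise a pointer to a heap object.  */
typedef uintptr_t value;

enum heap_tag { TAG_SYMBOL, TAG_STRING };

struct heap_object
{
  enum heap_tag tag;
  const char *text;             /* symbol name or string contents */
};

static bool is_immediate (value v) { return (v & 1u) != 0; }

static value make_fixnum (intptr_t n) { return ((uintptr_t) n << 1) | 1u; }

static value from_heap (const struct heap_object *o)
{
  assert (((uintptr_t) o & 1u) == 0);   /* heap pointers must be aligned */
  return (value) o;
}

static const struct heap_object *to_heap (value v)
{
  return (const struct heap_object *) v;
}

/* equal?-like predicate for the toy model.  *checks grows by one for
   every test performed, so the caller can see how far a comparison
   travelled before it was decided.  */
static bool
toy_equal (value x, value y, int *checks)
{
  ++*checks;
  if (x == y)                           /* same word: eq? */
    return true;
  ++*checks;
  if (is_immediate (x))                 /* unequal immediates stop here */
    return false;
  ++*checks;
  if (is_immediate (y))
    return false;
  ++*checks;
  if (to_heap (x)->tag != to_heap (y)->tag)
    return false;
  ++*checks;
  if (to_heap (x)->tag == TAG_STRING)   /* structural comparison */
    return strcmp (to_heap (x)->text, to_heap (y)->text) == 0;
  return false;                         /* distinct symbols end up here */
}
```

In this model two unequal fixnums are rejected after 2 checks, while two
distinct symbol objects need 5, even though the first check already
settled the matter for symbols.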

-- 
David Kastrup




Information forwarded to bug-guile <at> gnu.org:
bug#41354; Package guile. (Wed, 27 May 2020 20:41:01 GMT)

Message #8 received at 41354 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 41354 <at> debbugs.gnu.org
Subject: Re: bug#41354: equal? has no sensible code path for symbols
Date: Wed, 27 May 2020 22:39:59 +0200
Hi David,

David Kastrup <dak <at> gnu.org> skribis:

> In Scheme, symbols can be compared using eq? for equality.  However,
> since they have garbage-collected content attached, they do not meet the
> predicate SCM_IMP in the short-circuit evaluation at the start of equal?
> This means that unequal symbols compared using equal? fall through a
> whole bunch of tests and end up in a general structural comparison
> comparing their underlying string names.

‘equal?’ starts by checking for eq-ness, which LGTM:

  SCM
  scm_equal_p (SCM x, SCM y)
  #define FUNC_NAME s_scm_i_equal_p
  {
    SCM_CHECK_STACK;
   tailrecurse:
    SCM_TICK;
    if (scm_is_eq (x, y))
      return SCM_BOOL_T;

Or were you referring to something else?

Thanks,
Ludo’.




Information forwarded to bug-guile <at> gnu.org:
bug#41354; Package guile. (Wed, 27 May 2020 20:50:02 GMT)

Message #11 received at 41354 <at> debbugs.gnu.org:

From: David Kastrup <dak <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41354 <at> debbugs.gnu.org
Subject: Re: bug#41354: equal? has no sensible code path for symbols
Date: Wed, 27 May 2020 22:49:10 +0200
Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi David,
>
> David Kastrup <dak <at> gnu.org> skribis:
>
>> In Scheme, symbols can be compared using eq? for equality.  However,
>> since they have garbage-collected content attached, they do not meet the
>> predicate SCM_IMP in the short-circuit evaluation at the start of equal?
>> This means that unequal symbols compared using equal? fall through a
>> whole bunch of tests and end up in a general structural comparison
>> comparing their underlying string names.
>
> ‘equal?’ starts by checking for eq-ness, which LGTM:
>
>   SCM
>   scm_equal_p (SCM x, SCM y)
>   #define FUNC_NAME s_scm_i_equal_p
>   {
>     SCM_CHECK_STACK;
>    tailrecurse:
>     SCM_TICK;
>     if (scm_is_eq (x, y))
>       return SCM_BOOL_T;
>
> Or were you referring to something else?

I repeat: "This means that UNEQUAL symbols compared using equal? fall
through a whole bunch of tests and end up in a general structural
comparison comparing their underlying string names".

Lots of searches _end_ with an equal comparison (which is fast) but do a
lot of unequal comparisons before that (which is slow, even though
symbols that are not eq? will also not be equal?, so if you know you are
checking _symbols_, if they are not eq? you are done).

Symbols comparing as _unequal_ have no special path in equal?.

-- 
David Kastrup




Information forwarded to bug-guile <at> gnu.org:
bug#41354; Package guile. (Thu, 28 May 2020 16:08:02 GMT)

Message #14 received at 41354 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 41354 <at> debbugs.gnu.org
Subject: Re: bug#41354: equal? has no sensible code path for symbols
Date: Thu, 28 May 2020 18:06:54 +0200
Hi,

David Kastrup <dak <at> gnu.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> Hi David,
>>
>> David Kastrup <dak <at> gnu.org> skribis:
>>
>>> In Scheme, symbols can be compared using eq? for equality.  However,
>>> since they have garbage-collected content attached, they do not meet the
>>> predicate SCM_IMP in the short-circuit evaluation at the start of equal?
>>> This means that unequal symbols compared using equal? fall through a
>>> whole bunch of tests and end up in a general structural comparison
>>> comparing their underlying string names.
>>
>> ‘equal?’ starts by checking for eq-ness, which LGTM:
>>
>>   SCM
>>   scm_equal_p (SCM x, SCM y)
>>   #define FUNC_NAME s_scm_i_equal_p
>>   {
>>     SCM_CHECK_STACK;
>>    tailrecurse:
>>     SCM_TICK;
>>     if (scm_is_eq (x, y))
>>       return SCM_BOOL_T;
>>
>> Or were you referring to something else?
>
> I repeat: "This means that UNEQUAL symbols compared using equal? fall
> through a whole bunch of tests and end up in a general structural
> comparison comparing their underlying string names".
>
> Lots of searches _end_ with an equal comparison (which is fast) but do a
> lot of unequal comparisons before that (which is slow, even though
> symbols that are not eq? will also not be equal?, so if you know you are
> checking _symbols_, if they are not eq? you are done).
>
> Symbols comparing as _unequal_ have no special path in equal?.

I was going to say that this is necessary for uninterned symbols, but it
turns out that uninterned symbols that look the same are not ‘equal?’:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (define a (make-symbol "x"))
scheme@(guile-user)> (define b (make-symbol "x"))
scheme@(guile-user)> (eq? a b)
$10 = #f
scheme@(guile-user)> (equal? a b)
$11 = #f
--8<---------------cut here---------------end--------------->8---

Thus we could go with the patch below, though I doubt it would make a
measurable difference (and it actually adds tests for other cases).

Thoughts?

Besides, in the common case where one is comparing against a symbol
literal, the question is moot:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,optimize (equal? 'x s)
$14 = (eq? 'x s)
--8<---------------cut here---------------end--------------->8---

Ludo’.

[Message part 2 (text/x-patch, inline)]
diff --git a/libguile/eq.c b/libguile/eq.c
index 627d6f09b..16c5bfb3f 100644
--- a/libguile/eq.c
+++ b/libguile/eq.c
@@ -303,6 +303,8 @@ scm_equal_p (SCM x, SCM y)
     return SCM_BOOL_F;
   if (SCM_IMP (y))
     return SCM_BOOL_F;
+  if (scm_is_symbol (x) || scm_is_symbol (y))
+    return SCM_BOOL_F;
   if (scm_is_pair (x) && scm_is_pair (y))
     {
       if (scm_is_false (scm_equal_p (SCM_CAR (x), SCM_CAR (y))))
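
To make the trade-off of the patch above concrete, here is a toy model
in C.  It does not use libguile: the `value` type, the low-bit immediate
tag, `struct heap_object`, and the `symbol_exit` switch are all
assumptions made for this sketch.  It only mimics the position of the
new test, right after the immediate checks, and counts the tests
performed with and without it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only; none of this is Guile's real representation.  */
typedef uintptr_t value;

enum heap_tag { TAG_SYMBOL, TAG_STRING };

struct heap_object
{
  enum heap_tag tag;
  const char *text;
};

static bool is_immediate (value v) { return (v & 1u) != 0; }

static value from_heap (const struct heap_object *o)
{
  assert (((uintptr_t) o & 1u) == 0);   /* heap pointers must be aligned */
  return (value) o;
}

static const struct heap_object *to_heap (value v)
{
  return (const struct heap_object *) v;
}

static bool is_symbol (value v)
{
  return !is_immediate (v) && to_heap (v)->tag == TAG_SYMBOL;
}

/* symbol_exit selects whether the early "either argument is a symbol"
   test is present.  *checks grows by one per test performed.  */
static bool
toy_equal (value x, value y, bool symbol_exit, int *checks)
{
  ++*checks;
  if (x == y)
    return true;
  ++*checks;
  if (is_immediate (x))
    return false;
  ++*checks;
  if (is_immediate (y))
    return false;
  if (symbol_exit)
    {
      ++*checks;                        /* the added test */
      if (is_symbol (x) || is_symbol (y))
        return false;
    }
  ++*checks;
  if (to_heap (x)->tag != to_heap (y)->tag)
    return false;
  ++*checks;
  if (to_heap (x)->tag == TAG_STRING)
    return strcmp (to_heap (x)->text, to_heap (y)->text) == 0;
  return false;
}
```

In this model the early exit takes distinct symbols from 5 checks down
to 4, and it takes equal strings from 5 checks up to 6, which is the
"adds tests for other cases" cost.  The real numbers in eq.c will
differ; only the direction of the trade-off is meant to carry over.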

Information forwarded to bug-guile <at> gnu.org:
bug#41354; Package guile. (Thu, 28 May 2020 16:51:01 GMT)

Message #17 received at 41354 <at> debbugs.gnu.org:

From: David Kastrup <dak <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41354 <at> debbugs.gnu.org
Subject: Re: bug#41354: equal? has no sensible code path for symbols
Date: Thu, 28 May 2020 18:50:20 +0200
Ludovic Courtès <ludo <at> gnu.org> writes:

> David Kastrup <dak <at> gnu.org> skribis:
>
>> Ludovic Courtès <ludo <at> gnu.org> writes:
>>
>>> Hi David,
>>>
>>> David Kastrup <dak <at> gnu.org> skribis:
>>
>> Symbols comparing as _unequal_ have no special path in equal?.
>
> I was going to say that this is necessary for uninterned symbols, but it
> turns out that uninterned symbols that look the same are not ‘equal?’:
>
> scheme@(guile-user)> (define a (make-symbol "x"))
> scheme@(guile-user)> (define b (make-symbol "x"))
> scheme@(guile-user)> (eq? a b)
> $10 = #f
> scheme@(guile-user)> (equal? a b)
> $11 = #f

And it would be pretty horrible if they were, in my book.

> Thus we could go with the patch below, though I doubt it would make a
> measurable difference (and it actually adds tests for other cases).

It made a considerable measurable difference in LilyPond where it slowed
down the operation of assoc when used for symbol lookup (while assoc has
a short-circuit path to assq for SCM_IMP (key) that happens to have the
same problem of not being effective for symbols).  It took some
debugging to figure out why so much time was spent in equal? .

> Thoughts?
>
> Besides, in the common case where one is comparing against a symbol
> literal, the question is moot:
>
> scheme@(guile-user)> ,optimize (equal? 'x s)
> $14 = (eq? 'x s)

That is largely irrelevant: the problem becomes visible when a large
number of comparisons is done in a row, and code that searches a large
set of keys will hardly ever be searching it for a single constant key
written as a literal.

> Ludo’.
>
> diff --git a/libguile/eq.c b/libguile/eq.c
> index 627d6f09b..16c5bfb3f 100644
> --- a/libguile/eq.c
> +++ b/libguile/eq.c
> @@ -303,6 +303,8 @@ scm_equal_p (SCM x, SCM y)
>      return SCM_BOOL_F;
>    if (SCM_IMP (y))
>      return SCM_BOOL_F;
> +  if (scm_is_symbol (x) || scm_is_symbol (y))
> +    return SCM_BOOL_F;
>    if (scm_is_pair (x) && scm_is_pair (y))
>      {
>        if (scm_is_false (scm_equal_p (SCM_CAR (x), SCM_CAR (y))))
>

Yes, that looks reasonable.  scm_is_symbol checks some tag subset that
the code for equal_p later looks at closer as well: if you worry about
the extra cost of the scm_is_symbol check, one could try folding the
symbol check into that later code passage, which would slow down the
symbol check and affect the more costly fallbacks less.  But since those
fallbacks _are_ more costly, I doubt it would be worth the trouble.

-- 
David Kastrup




Information forwarded to bug-guile <at> gnu.org:
bug#41354; Package guile. (Fri, 29 May 2020 08:06:01 GMT)

Message #20 received at 41354 <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 41354 <at> debbugs.gnu.org
Subject: Re: bug#41354: equal? has no sensible code path for symbols
Date: Fri, 29 May 2020 10:05:06 +0200
Hi,

David Kastrup <dak <at> gnu.org> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:

[...]

>> Thus we could go with the patch below, though I doubt it would make a
>> measurable difference (and it actually adds tests for other cases).
>
> It made a considerable measurable difference in LilyPond

You measured with and without the patch I sent?  Or something else?

>> diff --git a/libguile/eq.c b/libguile/eq.c
>> index 627d6f09b..16c5bfb3f 100644
>> --- a/libguile/eq.c
>> +++ b/libguile/eq.c
>> @@ -303,6 +303,8 @@ scm_equal_p (SCM x, SCM y)
>>      return SCM_BOOL_F;
>>    if (SCM_IMP (y))
>>      return SCM_BOOL_F;
>> +  if (scm_is_symbol (x) || scm_is_symbol (y))
>> +    return SCM_BOOL_F;
>>    if (scm_is_pair (x) && scm_is_pair (y))
>>      {
>>        if (scm_is_false (scm_equal_p (SCM_CAR (x), SCM_CAR (y))))
>>
>
> Yes, that looks reasonable.  scm_is_symbol checks some tag subset that
> the code for equal_p later looks at closer as well: if you worry about
> the extra cost of the scm_is_symbol check, one could try folding the
> symbol check into that later code passage, which would slow down the
> symbol check and affect the more costly fallbacks less.  But since those
> fallbacks _are_ more costly, I doubt it would be worth the trouble.

Looking at eq.c, I don’t see what “costly fallbacks” you’re referring
to.  For a symbol, AIUI, we end up here:

  switch (SCM_TYP7 (x))
    {
    default:
      /* Check equality between structs of equal type (see cell-type test above). */
      if (SCM_STRUCTP (x))
	{
	  if (SCM_INSTANCEP (x))
	    goto generic_equal;
	  else
	    return scm_i_struct_equalp (x, y);
	}
      break;   // <- here, meaning we return SCM_BOOL_F

All the checks leading to this line are type tag comparisons.

Am I overlooking something?

Thanks,
Ludo’.




Information forwarded to bug-guile <at> gnu.org:
bug#41354; Package guile. (Tue, 19 Jan 2021 21:54:02 GMT)

Message #23 received at 41354 <at> debbugs.gnu.org:

From: David Kastrup <dak <at> gnu.org>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 41354 <at> debbugs.gnu.org
Subject: Re: bug#41354: equal? has no sensible code path for symbols
Date: Tue, 19 Jan 2021 22:53:37 +0100
Ludovic Courtès <ludo <at> gnu.org> writes:

> Hi,
>
> David Kastrup <dak <at> gnu.org> skribis:
>
>> Ludovic Courtès <ludo <at> gnu.org> writes:
>
> [...]
>
>>> Thus we could go with the patch below, though I doubt it would make a
>>> measurable difference (and it actually adds tests for other cases).
>>
>> It made a considerable measurable difference in LilyPond
>
> You measured with and without the patch I sent?  Or something else?

It made a considerable measurable difference in LilyPond to use scm_eq
over scm_eqv when one variable was known to be a symbol and most
comparisons would have turned out false.

>
>>> diff --git a/libguile/eq.c b/libguile/eq.c
>>> index 627d6f09b..16c5bfb3f 100644
>>> --- a/libguile/eq.c
>>> +++ b/libguile/eq.c
>>> @@ -303,6 +303,8 @@ scm_equal_p (SCM x, SCM y)
>>>      return SCM_BOOL_F;
>>>    if (SCM_IMP (y))
>>>      return SCM_BOOL_F;
>>> +  if (scm_is_symbol (x) || scm_is_symbol (y))
>>> +    return SCM_BOOL_F;
>>>    if (scm_is_pair (x) && scm_is_pair (y))
>>>      {
>>>        if (scm_is_false (scm_equal_p (SCM_CAR (x), SCM_CAR (y))))
>>>
>>
>> Yes, that looks reasonable.  scm_is_symbol checks some tag subset that
>> the code for equal_p later looks at closer as well: if you worry about
>> the extra cost of the scm_is_symbol check, one could try folding the
>> symbol check into that later code passage, which would slow down the
>> symbol check and affect the more costly fallbacks less.  But since those
>> fallbacks _are_ more costly, I doubt it would be worth the trouble.
>
> Looking at eq.c, I don’t see what “costly fallbacks” you’re referring
> to.  For a symbol, AIUI, we end up here:
>
>   switch (SCM_TYP7 (x))
>     {
>     default:
>       /* Check equality between structs of equal type (see cell-type test above). */
>       if (SCM_STRUCTP (x))
> 	{
> 	  if (SCM_INSTANCEP (x))
> 	    goto generic_equal;
> 	  else
> 	    return scm_i_struct_equalp (x, y);
> 	}
>       break;   // <- here, meaning we return SCM_BOOL_F
>
> All the checks leading to this line are type tag comparisons.
>
> Am I overlooking something?

Only that "all the checks" amount to quite a bit when the whole point of
a symbol is to be faster to compare than structured types.  The main
surprise for me was that a symbol is a non-immediate type even though on
second thought it is clear that the symbol name has to be stored
somewhere associated with the symbol value.  However, from the
performance semantics a symbol should not be markedly different from
immediate types to avoid violating reasonable user expectations about
the Scheme type system: their whole point is to be fast to compare, and
fast comparisons are particularly important where you go through a large
number of them (which usually implies that most comparisons will end up
false).  This is particularly important when both arguments are symbols.

The normal expectation of functions like assv would be that they are
only marginally slower than assq when the search key is a symbol (and
particularly so if most key/value pairs have a symbol key).  Because the
outcome eqv(symbol1, symbol2)->#f takes considerably longer to reach
than the outcome eq(symbol1, symbol2)->#f, this expectation is not met.
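
The effect on an association-list lookup can be sketched in C as
follows.  This is a toy model under stated assumptions, not libguile
code: `struct object`, `struct entry`, `pred_eq`, `pred_general`, and
`lookup` are hypothetical names, and the general predicate performs an
arbitrary three tests per miss just to stand in for "identity test plus
some tag checks".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy heap object: only a type tag; identity is the address.  */
enum { TAG_SYMBOL = 1 };
struct object { int tag; };

struct entry
{
  const struct object *key;
  int datum;
};

typedef bool (*predicate) (const struct object *, const struct object *,
                           long *checks);

/* eq?-like: one identity test, whatever the outcome.  */
static bool
pred_eq (const struct object *a, const struct object *b, long *checks)
{
  ++*checks;
  return a == b;
}

/* General predicate: identity first, then tag tests before giving up.  */
static bool
pred_general (const struct object *a, const struct object *b, long *checks)
{
  ++*checks;
  if (a == b)
    return true;
  ++*checks;
  if (a->tag != b->tag)
    return false;
  ++*checks;                    /* type dispatch; symbols have nothing
                                   further to compare */
  return false;
}

/* Linear search in the style of assq/assv; returns NULL on a miss.  */
static const struct entry *
lookup (const struct entry *alist, size_t n, const struct object *key,
        predicate same, long *checks)
{
  assert (alist != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (same (alist[i].key, key, checks))
      return &alist[i];
  return NULL;
}
```

With four symbol keys and the match in the last position, `lookup`
performs 4 checks with `pred_eq` and 10 with `pred_general` in this
model: the hit costs the same, and every miss before it costs three
times as much.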

-- 
David Kastrup




This bug report was last modified 4 years and 145 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.