GNU bug report logs -
#56347
Optimize/simplify STRING_SET_MULTIBYTE
Previous Next
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 56347 in the body.
You can then email your comments to 56347 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Fri, 01 Jul 2022 23:33:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Stefan Monnier <monnier <at> iro.umontreal.ca>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Fri, 01 Jul 2022 23:33:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Tags: patch
Tags: patch
The patch below simplifies code around STRING_SET_MULTIBYTE.
Any objection?
Stefan
In GNU Emacs 29.0.50 (build 1, x86_64-pc-linux-gnux32, GTK+ Version 3.24.34, cairo version 1.16.0)
of 2022-06-30 built on alfajor
Repository revision: 2c31b0d5b84471e8b1fa5737b37cf4c241aec036
Repository branch: work
Windowing system distributor 'The X.Org Foundation', version 11.0.12101003
System Description: Debian GNU/Linux bookworm/sid
Configured using:
'configure -C --enable-checking --enable-check-lisp-object-type --with-modules --with-cairo --with-tiff=ifavailable
'CFLAGS=-Wall -g3 -Og -Wno-pointer-sign'
PKG_CONFIG_PATH=/home/monnier/lib/pkgconfig'
[set-multibyte.patch (text/patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 06:18:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 56347 <at> debbugs.gnu.org (full text, mbox):
> Date: Fri, 01 Jul 2022 19:32:05 -0400
> From: Stefan Monnier via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>
> The patch below simplifies code around STRING_SET_MULTIBYTE.
> Any objection?
Rationale? Simplification in these cases is minimal, almost
non-existent, so it cannot be the only rationale.
> --- a/src/composite.c
> +++ b/src/composite.c
> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
> for (i = SBYTES (string) - 1; i >= 0; i--)
> if (!ASCII_CHAR_P (SREF (string, i)))
> error ("Attempt to shape unibyte text");
> - /* STRING is a pure-ASCII string, so we can convert it (or,
> - rather, its copy) to multibyte and use that thereafter. */
> - Lisp_Object string_copy = Fconcat (1, &string);
> - STRING_SET_MULTIBYTE (string_copy);
> - string = string_copy;
> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
Did you actually try your change in the situations where this problem
pops up? AFAIR, the code makes a copy of the string for good reasons:
the rest of handling of the string down the line barfs if we keep a
multibyte string here.
> --- a/src/lisp.h
> +++ b/src/lisp.h
> @@ -1637,12 +1637,10 @@ #define STRING_SET_UNIBYTE(STR) \
>
> /* Mark STR as a multibyte string. Assure that STR contains only
> ASCII characters in advance. */
> -#define STRING_SET_MULTIBYTE(STR) \
> - do { \
> - if (XSTRING (STR)->u.s.size == 0) \
> - (STR) = empty_multibyte_string; \
> - else \
> - XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
> +#define STRING_SET_MULTIBYTE(STR) \
> + do { \
> + eassert (XSTRING (STR)->u.s.size > 0); \
> + XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
> } while (false)
>
> /* Convenience functions for dealing with Lisp strings. */
You want to disallow uses of empty_multibyte_string? why?
Severity set to 'wishlist' from 'normal'
Request was from
Stefan Kangas <stefan <at> marxist.se>
to
control <at> debbugs.gnu.org
.
(Sat, 02 Jul 2022 07:44:01 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 16:13:01 GMT)
Full text and
rfc822 format available.
Message #13 received at 56347 <at> debbugs.gnu.org (full text, mbox):
>> The patch below simplifies code around STRING_SET_MULTIBYTE.
>> Any objection?
> Rationale?
STRING_SET_MULTIBYTE is fundamentally evil because it changes the nature
of an object. Its current definition (like that of STRING_SET_UNIBYTE)
is rather scary (it sometimes changes the nature of the arg passed to
it, and sometimes replaces the arg with something else).
>> --- a/src/composite.c
>> +++ b/src/composite.c
>> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
>> for (i = SBYTES (string) - 1; i >= 0; i--)
>> if (!ASCII_CHAR_P (SREF (string, i)))
>> error ("Attempt to shape unibyte text");
>> - /* STRING is a pure-ASCII string, so we can convert it (or,
>> - rather, its copy) to multibyte and use that thereafter. */
>> - Lisp_Object string_copy = Fconcat (1, &string);
>> - STRING_SET_MULTIBYTE (string_copy);
>> - string = string_copy;
>> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
>
> Did you actually try your change in the situations where this problem
> pops up?
I don't even know how to go about doing that, no.
> AFAIR, the code makes a copy of the string for good reasons:
> the rest of handling of the string down the line barfs if we keep a
> multibyte string here.
[ I assume you meant "barfs if we keep a *uni*byte string here". ]
Where? AFAICT `string` is only used in the subsequent code by passing
it to `fill_gstring_header` and that function only passes that arg to
`fetch_string_char_advance_no_check` and that function only looks at the
string's SDATA, so as long as the sequence of bytes is consistent with
a multibyte string (which we just checked with the ASCII_CHAR_P loop),
I don't see any problem.
>> --- a/src/lisp.h
>> +++ b/src/lisp.h
>> @@ -1637,12 +1637,10 @@ #define STRING_SET_UNIBYTE(STR) \
>>
>> /* Mark STR as a multibyte string. Assure that STR contains only
>> ASCII characters in advance. */
>> -#define STRING_SET_MULTIBYTE(STR) \
>> - do { \
>> - if (XSTRING (STR)->u.s.size == 0) \
>> - (STR) = empty_multibyte_string; \
>> - else \
>> - XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
>> +#define STRING_SET_MULTIBYTE(STR) \
>> + do { \
>> + eassert (XSTRING (STR)->u.s.size > 0); \
>> + XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
>> } while (false)
>>
>> /* Convenience functions for dealing with Lisp strings. */
>
> You want to disallow uses of empty_multibyte_string? why?
No, I want to reduce the scope of semantics of the macro, e.g. so it can
be implemented as a function rather than a macro and so it doesn't
magically substitute empty_multibyte_string into a variable that held
something else.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 16:25:01 GMT)
Full text and
rfc822 format available.
Message #16 received at 56347 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: 56347 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 12:12:06 -0400
>
> STRING_SET_MULTIBYTE is fundamentally evil because it changes the nature
> of an object. Its current definition (like that of STRING_SET_UNIBYTE)
> is rather scary (it sometimes changes the nature of the arg passed to
> it, and sometimes replaces the arg with something else).
But do we have any alternatives?
> >> - /* STRING is a pure-ASCII string, so we can convert it (or,
> >> - rather, its copy) to multibyte and use that thereafter. */
> >> - Lisp_Object string_copy = Fconcat (1, &string);
> >> - STRING_SET_MULTIBYTE (string_copy);
> >> - string = string_copy;
> >> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
> >
> > Did you actually try your change in the situations where this problem
> > pops up?
>
> I don't even know how to go about doing that, no.
Make a character-composition rule that composes, say, two '-'
characters, and then display a buffer where you have adjacent dashes.
> > AFAIR, the code makes a copy of the string for good reasons:
> > the rest of handling of the string down the line barfs if we keep a
> > multibyte string here.
>
> [ I assume you meant "barfs if we keep a *uni*byte string here". ]
Yes.
> Where?
I don't remember, sorry.
> >> -#define STRING_SET_MULTIBYTE(STR) \
> >> - do { \
> >> - if (XSTRING (STR)->u.s.size == 0) \
> >> - (STR) = empty_multibyte_string; \
> >> - else \
> >> - XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
> >> +#define STRING_SET_MULTIBYTE(STR) \
> >> + do { \
> >> + eassert (XSTRING (STR)->u.s.size > 0); \
> >> + XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
> >> } while (false)
> >>
> >> /* Convenience functions for dealing with Lisp strings. */
> >
> > You want to disallow uses of empty_multibyte_string? why?
>
> No, I want to reduce the scope of semantics of the macro, e.g. so it can
> be implemented as a function rather than a macro and so it doesn't
> magically substitute empty_multibyte_string into a variable that held
> something else.
But the effect is that you disallow calling STRING_SET_MULTIBYTE on an
empty string, isn't it?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 16:51:01 GMT)
Full text and
rfc822 format available.
Message #19 received at 56347 <at> debbugs.gnu.org (full text, mbox):
>> --- a/src/composite.c
>> +++ b/src/composite.c
>> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
>> for (i = SBYTES (string) - 1; i >= 0; i--)
>> if (!ASCII_CHAR_P (SREF (string, i)))
>> error ("Attempt to shape unibyte text");
>> - /* STRING is a pure-ASCII string, so we can convert it (or,
>> - rather, its copy) to multibyte and use that thereafter. */
>> - Lisp_Object string_copy = Fconcat (1, &string);
>> - STRING_SET_MULTIBYTE (string_copy);
>> - string = string_copy;
>> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
>
> Did you actually try your change in the situations where this problem
> pops up? AFAIR, the code makes a copy of the string for good reasons:
> the rest of handling of the string down the line barfs if we keep a
> multibyte string here.
Of course, if we really do need a multibyte copy of the string, I can
change the patch to call `Fstring_to_multibyte` instead of `Fconcat`.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 17:07:01 GMT)
Full text and
rfc822 format available.
Message #22 received at 56347 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: 56347 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 12:49:53 -0400
>
> >> --- a/src/composite.c
> >> +++ b/src/composite.c
> >> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
> >> for (i = SBYTES (string) - 1; i >= 0; i--)
> >> if (!ASCII_CHAR_P (SREF (string, i)))
> >> error ("Attempt to shape unibyte text");
> >> - /* STRING is a pure-ASCII string, so we can convert it (or,
> >> - rather, its copy) to multibyte and use that thereafter. */
> >> - Lisp_Object string_copy = Fconcat (1, &string);
> >> - STRING_SET_MULTIBYTE (string_copy);
> >> - string = string_copy;
> >> + /* STRING is a pure-ASCII string, so we can treat it as multibyte. */
> >
> > Did you actually try your change in the situations where this problem
> > pops up? AFAIR, the code makes a copy of the string for good reasons:
> > the rest of handling of the string down the line barfs if we keep a
> > multibyte string here.
>
> Of course, if we really do need a multibyte copy of the string, I can
> change the patch to call `Fstring_to_multibyte` instead of `Fconcat`.
Why not call make_multibyte_string directly?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 17:59:02 GMT)
Full text and
rfc822 format available.
Message #25 received at 56347 <at> debbugs.gnu.org (full text, mbox):
> Why not call make_multibyte_string directly?
That too, of course,
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 18:01:01 GMT)
Full text and
rfc822 format available.
Message #28 received at 56347 <at> debbugs.gnu.org (full text, mbox):
>> No, I want to reduce the scope of semantics of the macro, e.g. so it can
>> be implemented as a function rather than a macro and so it doesn't
>> magically substitute empty_multibyte_string into a variable that held
>> something else.
> But the effect is that you disallow calling STRING_SET_MULTIBYTE on an
> empty string, isn't it?
Yes. In my book, STRING_SET_*IBYTE should basically not exist: a string
is created as unibyte or multibyte and never changes after that.
And indeed because we only have a single copy of the two possible empty
strings, they can't be changed between unibyte<->multibyte
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 18:32:01 GMT)
Full text and
rfc822 format available.
Message #31 received at 56347 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: 56347 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 14:00:41 -0400
>
> >> No, I want to reduce the scope of semantics of the macro, e.g. so it can
> >> be implemented as a function rather than a macro and so it doesn't
> >> magically substitute empty_multibyte_string into a variable that held
> >> something else.
> > But the effect is that you disallow calling STRING_SET_MULTIBYTE on an
> > empty string, isn't it?
>
> Yes. In my book, STRING_SET_*IBYTE should basically not exist: a string
> is created as unibyte or multibyte and never changes after that.
That's require a much larger change, I think. It is not enough just
to add an assertion in one place, because that'd just cause
maintenance headaches for no real gain.
And I'm not even sure everyone will agree with such a radical change.
It should be discussed first.
> And indeed because we only have a single copy of the two possible empty
> strings, they can't be changed between unibyte<->multibyte
I can create an empty string without those singletons any time.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 18:38:01 GMT)
Full text and
rfc822 format available.
Message #34 received at 56347 <at> debbugs.gnu.org (full text, mbox):
>> Yes. In my book, STRING_SET_*IBYTE should basically not exist: a string
>> is created as unibyte or multibyte and never changes after that.
>
> That's require a much larger change, I think. It is not enough just
> to add an assertion in one place, because that'd just cause
> maintenance headaches for no real gain.
`grep STRING_SET_MULTIBYTE` suggests it's not nearly as hard you think.
The only other use is in `aset`.
>> And indeed because we only have a single copy of the two possible empty
>> strings, they can't be changed between unibyte<->multibyte
> I can create an empty string without those singletons any time.
My experience is that it takes a fair bit of work to do so.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#56347
; Package
emacs
.
(Sat, 02 Jul 2022 18:46:02 GMT)
Full text and
rfc822 format available.
Message #37 received at 56347 <at> debbugs.gnu.org (full text, mbox):
> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: 56347 <at> debbugs.gnu.org
> Date: Sat, 02 Jul 2022 14:37:18 -0400
>
> >> Yes. In my book, STRING_SET_*IBYTE should basically not exist: a string
> >> is created as unibyte or multibyte and never changes after that.
> >
> > That's require a much larger change, I think. It is not enough just
> > to add an assertion in one place, because that'd just cause
> > maintenance headaches for no real gain.
>
> `grep STRING_SET_MULTIBYTE` suggests it's not nearly as hard you think.
>
> The only other use is in `aset`.
That's not what I meant. If you want to disallow changing the
representation of strings, STRING_SET_MULTIBYTE is just the tip of the
iceberg.
> >> And indeed because we only have a single copy of the two possible empty
> >> strings, they can't be changed between unibyte<->multibyte
> > I can create an empty string without those singletons any time.
>
> My experience is that it takes a fair bit of work to do so.
I don't see how that matters here, sorry.
bug closed, send any further explanations to
56347 <at> debbugs.gnu.org and Stefan Monnier <monnier <at> iro.umontreal.ca>
Request was from
Stefan Monnier <monnier <at> iro.umontreal.ca>
to
control <at> debbugs.gnu.org
.
(Sat, 09 Jul 2022 17:46:02 GMT)
Full text and
rfc822 format available.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sun, 07 Aug 2022 11:24:04 GMT)
Full text and
rfc822 format available.
This bug report was last modified 3 years and 8 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.