GNU bug report logs - #56347
Optimize/simplify STRING_SET_MULTIBYTE


Package: emacs;

Reported by: Stefan Monnier <monnier <at> iro.umontreal.ca>

Date: Fri, 1 Jul 2022 23:33:01 UTC

Severity: wishlist

Tags: patch

Done: Stefan Monnier <monnier <at> iro.umontreal.ca>

Bug is archived. No further changes may be made.



Message #13 received at 56347 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 56347 <at> debbugs.gnu.org
Subject: Re: bug#56347: Optimize/simplify STRING_SET_MULTIBYTE
Date: Sat, 02 Jul 2022 12:12:06 -0400
>> The patch below simplifies code around STRING_SET_MULTIBYTE.
>> Any objection?
> Rationale?

STRING_SET_MULTIBYTE is fundamentally evil because it changes the nature
of an object.  Its current definition (like that of STRING_SET_UNIBYTE)
is rather scary (it sometimes changes the nature of the arg passed to
it, and sometimes replaces the arg with something else).
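
To make the two behaviours concrete, here is a minimal C model of the pre-patch macro. The struct, the `empty_unibyte`/`empty_multibyte` singletons and `OLD_STRING_SET_MULTIBYTE` are hypothetical simplifications, not Emacs's real definitions. They keep only the convention that a negative size_byte marks a unibyte string. Zero-length strings appear to be shared singletons, which is presumably why the empty case rebinds the variable instead of mutating the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for Emacs's string object:
   only the two size fields matter here.  size_byte < 0 means unibyte,
   size_byte >= 0 means multibyte.  */
struct lisp_string
{
  ptrdiff_t size;       /* number of characters */
  ptrdiff_t size_byte;  /* number of bytes, or -1 if unibyte */
};

/* Hypothetical shared singletons standing in for empty_unibyte_string
   and empty_multibyte_string.  */
static struct lisp_string empty_unibyte = { 0, -1 };
static struct lisp_string empty_multibyte = { 0, 0 };

/* Model of the pre-patch macro: STR must be an lvalue.  For an empty
   string the *variable* is redirected to the shared empty multibyte
   string; otherwise the *object* it points to is mutated in place.  */
#define OLD_STRING_SET_MULTIBYTE(STR)           \
  do {                                          \
    if ((STR)->size == 0)                       \
      (STR) = &empty_multibyte;                 \
    else                                        \
      (STR)->size_byte = (STR)->size;           \
  } while (false)

/* Analogue of STRING_MULTIBYTE: true iff the string is multibyte.  */
static bool
string_multibyte_p (const struct lisp_string *s)
{
  return s->size_byte >= 0;
}
```

Only the empty case rebinds the caller's variable. The shared empty unibyte object itself is left untouched, so any other reference to it still sees a unibyte string.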

>> --- a/src/composite.c
>> +++ b/src/composite.c
>> @@ -1879,11 +1879,7 @@ Otherwise (for terminal display), FONT-OBJECT must be a terminal ID, a
>>  	  for (i = SBYTES (string) - 1; i >= 0; i--)
>>  	    if (!ASCII_CHAR_P (SREF (string, i)))
>>  	      error ("Attempt to shape unibyte text");
>> -	  /* STRING is a pure-ASCII string, so we can convert it (or,
>> -	     rather, its copy) to multibyte and use that thereafter.  */
>> -	  Lisp_Object string_copy = Fconcat (1, &string);
>> -	  STRING_SET_MULTIBYTE (string_copy);
>> -	  string = string_copy;
>> +	  /* STRING is a pure-ASCII string, so we can treat it as multibyte.  */
>
> Did you actually try your change in the situations where this problem
> pops up?

I don't even know how to go about doing that, no.

> AFAIR, the code makes a copy of the string for good reasons:
> the rest of handling of the string down the line barfs if we keep a
> multibyte string here.

[ I assume you meant "barfs if we keep a *uni*byte string here".  ]

Where?  AFAICT `string` is only used in the subsequent code by passing
it to `fill_gstring_header` and that function only passes that arg to
`fetch_string_char_advance_no_check` and that function only looks at the
string's SDATA, so as long as the sequence of bytes is consistent with
a multibyte string (which we just checked with the ASCII_CHAR_P loop),
I don't see any problem.
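
The argument rests on the fact that a pure-ASCII byte sequence decodes to the same characters whether it is read as unibyte or multibyte. The sketch below illustrates this with a hypothetical decoder that models only the UTF-8-compatible part of Emacs's internal multibyte encoding. It is not `fetch_string_char_advance_no_check`, and it ignores the longer and raw-byte forms the real encoding has. The `all_ascii_p` loop mirrors the ASCII_CHAR_P check in the composite.c hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same test as ASCII_CHAR_P, applied to one byte.  */
static bool
ascii_byte_p (unsigned char b)
{
  return b < 0x80;
}

/* Mirrors the loop in composite.c: true iff every byte is ASCII.  */
static bool
all_ascii_p (const unsigned char *data, ptrdiff_t nbytes)
{
  for (ptrdiff_t i = nbytes - 1; i >= 0; i--)
    if (!ascii_byte_p (data[i]))
      return false;
  return true;
}

/* Hypothetical simplified decoder: read one character at *P and advance
   *P past it, using UTF-8-style lead-byte lengths.  No validation is
   done, in the spirit of a "_no_check" fetch.  */
static int
fetch_char_advance (const unsigned char **p)
{
  const unsigned char *s = *p;
  int c, len;
  if (s[0] < 0x80)
    { c = s[0]; len = 1; }
  else if (s[0] < 0xE0)
    { c = s[0] & 0x1F; len = 2; }
  else if (s[0] < 0xF0)
    { c = s[0] & 0x0F; len = 3; }
  else
    { c = s[0] & 0x07; len = 4; }
  for (int i = 1; i < len; i++)
    c = (c << 6) | (s[i] & 0x3F);
  *p = s + len;
  return c;
}

/* Decode NBYTES of DATA as multibyte text into OUT and return the
   number of characters produced.  */
static ptrdiff_t
decode_multibyte (const unsigned char *data, ptrdiff_t nbytes, int *out)
{
  const unsigned char *p = data, *end = data + nbytes;
  ptrdiff_t n = 0;
  while (p < end)
    out[n++] = fetch_char_advance (&p);
  return n;
}
```

For pure-ASCII input the decoder consumes exactly one byte per character. The character count therefore equals the byte count, which is what setting size_byte to size records.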

>> --- a/src/lisp.h
>> +++ b/src/lisp.h
>> @@ -1637,12 +1637,10 @@ #define STRING_SET_UNIBYTE(STR)				\
>>  
>>  /* Mark STR as a multibyte string.  Assure that STR contains only
>>     ASCII characters in advance.  */
>> -#define STRING_SET_MULTIBYTE(STR)			\
>> -  do {							\
>> -    if (XSTRING (STR)->u.s.size == 0)			\
>> -      (STR) = empty_multibyte_string;			\
>> -    else						\
>> -      XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
>> +#define STRING_SET_MULTIBYTE(STR)			    \
>> +  do {							    \
>> +    eassert (XSTRING (STR)->u.s.size > 0);		    \
>> +    XSTRING (STR)->u.s.size_byte = XSTRING (STR)->u.s.size; \
>>    } while (false)
>>  
>>  /* Convenience functions for dealing with Lisp strings.  */
>
> You want to disallow uses of empty_multibyte_string? why?

No, I want to narrow the macro's semantics, e.g. so it can be
implemented as a function rather than a macro, and so it doesn't
magically substitute empty_multibyte_string into a variable that held
something else.
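
Here is a sketch of what a function version of the narrowed operation could look like. It reuses a hypothetical simplified string struct, and `string_set_multibyte` is an illustrative name, not an existing Emacs function. Plain `assert` stands in for `eassert`. The function receives a pointer to the object rather than the caller's variable, so it cannot rebind anything. Handling the empty string becomes the caller's job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified string object (see the convention above:
   size_byte < 0 means unibyte).  */
struct lisp_string
{
  ptrdiff_t size;       /* number of characters */
  ptrdiff_t size_byte;  /* number of bytes, or -1 if unibyte */
};

/* Hypothetical function form of the patched STRING_SET_MULTIBYTE.
   It only mutates the object it is given.  The caller must already
   know the string is non-empty and pure ASCII, so that one byte per
   character holds.  */
static inline void
string_set_multibyte (struct lisp_string *s)
{
  assert (s->size > 0);  /* stands in for eassert */
  s->size_byte = s->size;
}
```

Because the function has no access to the caller's variable, a caller holding an empty string must choose the shared empty multibyte string explicitly instead of relying on the macro to substitute it.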


        Stefan





This bug report was last modified 3 years and 8 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.