GNU bug report logs - #44155: Print integers as characters
Reported by: Juri Linkov <juri <at> linkov.net>
Date: Thu, 22 Oct 2020 21:12:01 UTC
Severity: normal
Tags: fixed, patch
Done: Mattias Engdegård <mattiase <at> acm.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 44155 in the body.
You can then email your comments to 44155 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Thu, 22 Oct 2020 21:12:02 GMT)
Acknowledgement sent to Juri Linkov <juri <at> linkov.net>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 22 Oct 2020 21:12:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Tags: patch
[Creating a separate feature request from bug#43866]
>> Let-binding a new variable 'print-integers-as-chars' to t:
>>
>> (let ((print-integers-as-chars t))
>> (pp '(("'A" . [?Á])
>> ("'E" . [?É])
>> ("'I" . [?Í])
>> ("'O" . [?Ó])
>> ("'U" . [?Ú])
>> ("'Y" . [?Ý]))
>> (current-buffer)))
>>
>> prints integers as characters:
>>
>> (("'A" . [?Á])
>> ("'E" . [?É])
>> ("'I" . [?Í])
>> ("'O" . [?Ó])
>> ("'U" . [?Ú])
>> ("'Y" . [?Ý]))
>>
>> with this patch:
>
> The idea is fine, but I have a few comments about implementation:
>
>> case_Lisp_Int:
>> {
>> - int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
>> - strout (buf, len, len, printcharfun);
>> + if (!NILP (Vprint_integers_as_chars) && CHARACTERP (obj))
> ^^^^^^^^^^^^^^^^^^^^^^^^
> If this is supposed to be a boolean variable, please use DEFVAR_BOOL,
> with all the consequences.
Fixed in the next patch.
>> + int len = sprintf (buf, "%s", SDATA (call1 (intern ("prin1-char"), obj)));
>
> Do we really need to call Lisp? I thought we were quite capable of
> printing characters from C, aren't we?
Thanks for the hint. Now the patch uses only C functions.
(My initial idea was to use eval-expression-print-format as a base that has
(let ((char-string
(and (characterp value)
(<= value eval-expression-print-maximum-character)
(char-displayable-p value)
(prin1-char value))))
but it seems only the condition 'characterp' is needed in the C implementation.)
>> @@ -2247,6 +2255,10 @@ syms_of_print (void)
>> that represents the number without losing information. */);
>> Vfloat_output_format = Qnil;
>>
>> + DEFVAR_LISP ("print-integers-as-chars", Vprint_integers_as_chars,
>> + doc: /* Print integers as characters. */);
>> + Vprint_integers_as_chars = Qnil;
>
> I wonder whether it wouldn't be cleaner to add another optional
> argument to prin1, and let it bind some internal variable so that
> print_object does this, instead of exposing this knob to Lisp.
> Because print_object is used all over the place, and who knows what
> this will do to other callers?
The variable 'print-integers-as-chars' is modeled after many similar
variables that affect the prin1 output:
- print-escape-control-characters
- print-escape-newlines
- print-escape-nonascii
- print-escape-multibyte
- print-length
- print-level
- print-quoted
- print-circle
- float-output-format
But now this leads me to think that maybe the new variable should be
like 'float-output-format', so it could be named 'integer-output-format'
and support options for different integer formats:
- 'character': print integers as characters;
- 'decimal': the default format;
- 'binary': print integers as e.g. #b010101;
- 'octal': print integers as e.g. #o777;
- 'hex': print integers as e.g. #x00ff;
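As a rough illustration of the radix formats listed above, here is a minimal standalone C sketch. It is not taken from any patch in this thread: `format_integer` and `enum int_format` are hypothetical names, the 'character' option is left out, and negative values simply fall back to decimal, which is an assumption of the sketch.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical format selector; these names are illustrative only and
   do not appear in Emacs.  */
enum int_format { FMT_DECIMAL, FMT_BINARY, FMT_OCTAL, FMT_HEX };

/* Write VALUE into BUF (SIZE bytes) using the Emacs Lisp read syntax
   for the chosen radix.  Negative values fall back to decimal (an
   assumption of this sketch).  Return the number of characters
   written, or -1 if BUF is too small.  */
static int
format_integer (char *buf, size_t size, long value, enum int_format fmt)
{
  int n;

  if (value < 0 || fmt == FMT_DECIMAL)
    n = snprintf (buf, size, "%ld", value);
  else if (fmt == FMT_OCTAL)
    n = snprintf (buf, size, "#o%lo", (unsigned long) value);
  else if (fmt == FMT_HEX)
    n = snprintf (buf, size, "#x%lx", (unsigned long) value);
  else
    {
      /* Standard printf has no binary conversion, so build the
         digits by hand, least significant bit first.  */
      char digits[sizeof (unsigned long) * CHAR_BIT + 1];
      size_t i = sizeof digits - 1;
      unsigned long u = (unsigned long) value;

      digits[i] = '\0';
      do
        digits[--i] = (char) ('0' + (u & 1));
      while ((u >>= 1) != 0);
      n = snprintf (buf, size, "#b%s", digits + i);
    }

  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

Calling `format_integer (buf, sizeof buf, 511, FMT_OCTAL)` leaves `#o777` in `buf`, matching the octal example in the list.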
[print-integers-as-characters.patch (text/x-diff, inline)]
diff --git a/src/print.c b/src/print.c
index dca095f281..909c55efed 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1908,8 +1908,16 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
{
case_Lisp_Int:
{
- int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
- strout (buf, len, len, printcharfun);
+ if (print_integers_as_characters && CHARACTERP (obj))
+ {
+ printchar ('?', printcharfun);
+ print_string (CALLN (Fstring, obj), printcharfun);
+ }
+ else
+ {
+ int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
+ strout (buf, len, len, printcharfun);
+ }
}
break;
@@ -2247,6 +2255,10 @@ syms_of_print (void)
that represents the number without losing information. */);
Vfloat_output_format = Qnil;
+ DEFVAR_BOOL ("print-integers-as-characters", print_integers_as_characters,
+ doc: /* Print integers as characters. */);
+ print_integers_as_characters = 0;
+
DEFVAR_LISP ("print-length", Vprint_length,
doc: /* Maximum length of list to print before abbreviating.
A value of nil means no limit. See also `eval-expression-print-length'. */);
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Thu, 22 Oct 2020 22:40:01 GMT)
Message #8 received at 44155 <at> debbugs.gnu.org (full text, mbox):
On Oct 22 2020, Juri Linkov wrote:
> diff --git a/src/print.c b/src/print.c
> index dca095f281..909c55efed 100644
> --- a/src/print.c
> +++ b/src/print.c
> @@ -1908,8 +1908,16 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
> {
> case_Lisp_Int:
> {
> - int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
> - strout (buf, len, len, printcharfun);
> + if (print_integers_as_characters && CHARACTERP (obj))
> + {
> + printchar ('?', printcharfun);
> + print_string (CALLN (Fstring, obj), printcharfun);
That will create ambiguous output.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Fri, 23 Oct 2020 08:59:02 GMT)
Message #11 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> + if (print_integers_as_characters && CHARACTERP (obj))
>> + {
>> + printchar ('?', printcharfun);
>> + print_string (CALLN (Fstring, obj), printcharfun);
>
> That will create ambiguous output.
No ambiguities found:
(let ((strings (make-hash-table :test 'equal)))
(dotimes (i (max-char))
(let ((s (string i)))
(if (gethash s strings)
(message "! %S %S" s (gethash s strings))
(puthash s i strings)))))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Fri, 23 Oct 2020 09:28:02 GMT)
Message #14 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> + if (print_integers_as_characters && CHARACTERP (obj))
>> + {
>> + printchar ('?', printcharfun);
>> + print_string (CALLN (Fstring, obj), printcharfun);
>
> That will create ambiguous output.
Or do you mean:
(dotimes (i (max-char))
(condition-case err
(unless (eq i (read (concat "?" (string i))))
(message "%d ?%s" i (string i)))
(error (message "%d ?%s ;; %s" i (string i) (error-message-string err)))))
92 ?\ ;; End of file during parsing
4194176 ?\200
...
4194302 ?\376
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sat, 24 Oct 2020 19:59:01 GMT)
Message #17 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>>> + if (print_integers_as_characters && CHARACTERP (obj))
>>> + {
>>> + printchar ('?', printcharfun);
>>> + print_string (CALLN (Fstring, obj), printcharfun);
>>
>> That will create ambiguous output.
>
> Or do you mean:
>
> (dotimes (i (max-char))
> (condition-case err
> (unless (eq i (read (concat "?" (string i))))
> (message "%d ?%s" i (string i)))
> (error (message "%d ?%s ;; %s" i (string i) (error-message-string err)))))
>
> 92 ?\ ;; End of file during parsing
> 4194176 ?\200
> ...
> 4194302 ?\376
Now, with the following patch, this code
(let ((integer-output-format t))
(pp '(?\; ?\( ?\) ?\{ ?\} ?\[ ?\] ?\" ?\' ?\\ 4194176)
(current-buffer)))
outputs
(?\; ?\( ?\) ?\{ ?\} ?\[ ?\] ?\" ?\' ?\\ 4194176)
and no ambiguities found with
(let ((integer-output-format t))
(dotimes (i (+ (max-char) 2))
(condition-case err
(unless (eq i (read (format "%S" i)))
(message "%d ?%s" i (string i)))
(error (message "%d ?%s ;; %s" i (string i) (error-message-string err))))))
The list of escaped characters was taken from 'prin1-char',
not from a similar list in the 'case Lisp_Symbol' branch of 'print_object'.
Also 'integer-output-format' prints integers in hex format when set to 16.
(let ((integer-output-format 16))
(pp '(?\; ?\( ?\) ?\{ ?\} ?\[ ?\] ?\" ?\' ?\\ 4194176)
(current-buffer)))
=>
(#x3b #x28 #x29 #x7b #x7d #x5b #x5d #x22 #x27 #x5c #x3fff80)
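The escaping rule described above can be sketched in standalone C as follows. This is only an illustration restricted to printable, non-space ASCII; `needs_escape` and `format_ascii_char` are hypothetical helpers, not functions from print.c, and the escaped set is the one listed in this message.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if C is one of the characters that get a backslash
   in the output shown above.  The range check keeps strchr from
   matching the terminating NUL or a truncated wide value.  */
static int
needs_escape (int c)
{
  return c > 0 && c < 0x80 && strchr (";(){}[]\"'\\", c) != NULL;
}

/* Write the read syntax of the ASCII character C into BUF (SIZE
   bytes): '?', an optional backslash, then the character itself.
   Only printable, non-space ASCII is handled in this sketch.
   Return the length written, or -1 on failure.  */
static int
format_ascii_char (char *buf, size_t size, int c)
{
  int n;

  if (c <= 0x20 || c > 0x7e)
    return -1;
  if (needs_escape (c))
    n = snprintf (buf, size, "?\\%c", c);
  else
    n = snprintf (buf, size, "?%c", c);
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

For example, `format_ascii_char (buf, sizeof buf, ';')` produces `?\;`, while `'A'` produces `?A`.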
[integer-output-format.patch (text/x-diff, inline)]
diff --git a/src/print.c b/src/print.c
index 53aa353769..53c8c4c91a 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1908,8 +1908,29 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
{
case_Lisp_Int:
{
- int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
- strout (buf, len, len, printcharfun);
+ EMACS_INT c = XFIXNUM (obj);
+
+ if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj) && c < 4194176)
+ {
+ printchar ('?', printcharfun);
+
+ if (escapeflag
+ && (c == ';' || c == '(' || c == ')' || c == '{' || c == '}'
+ || c == '[' || c == ']' || c == '\"' || c == '\'' || c == '\\'))
+ printchar ('\\', printcharfun);
+ print_string (Fchar_to_string (obj), printcharfun);
+ }
+ else if (INTEGERP (Vinteger_output_format)
+ && XFIXNUM (Vinteger_output_format) == 16 && c >= 0)
+ {
+ int len = sprintf (buf, "#x%"pI"x", (EMACS_UINT) c);
+ strout (buf, len, len, printcharfun);
+ }
+ else
+ {
+ int len = sprintf (buf, "%"pI"d", c);
+ strout (buf, len, len, printcharfun);
+ }
}
break;
@@ -2247,6 +2268,13 @@ syms_of_print (void)
that represents the number without losing information. */);
Vfloat_output_format = Qnil;
+ DEFVAR_LISP ("integer-output-format", Vinteger_output_format,
+ doc: /* The format used to print integers.
+When 't', print integers as characters.
+When a number 16, print numbers in hex format.
+Otherwise, print integers in decimal format. */);
+ Vinteger_output_format = Qnil;
+
DEFVAR_LISP ("print-length", Vprint_length,
doc: /* Maximum length of list to print before abbreviating.
A value of nil means no limit. See also `eval-expression-print-length'. */);
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 25 Oct 2020 17:24:02 GMT)
Message #20 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Date: Sat, 24 Oct 2020 22:53:44 +0300
> Cc: 44155 <at> debbugs.gnu.org
>
> + EMACS_INT c = XFIXNUM (obj);
There's no need to use EMACS_INT, a character code is at most 22 bits,
so it always fits into an 'int'.
> + if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj) && c < 4194176)
^^^^^^^
Please use MAX_5_BYTE_CHAR here. Or, better yet, CHAR_BYTE8_P.
And, btw, why not allow raw bytes here as well? is there some problem?
> + {
> + printchar ('?', printcharfun);
> +
> + if (escapeflag
> + && (c == ';' || c == '(' || c == ')' || c == '{' || c == '}'
> + || c == '[' || c == ']' || c == '\"' || c == '\'' || c == '\\'))
> + printchar ('\\', printcharfun);
> + print_string (Fchar_to_string (obj), printcharfun);
Why are you using print_string here instead of printchar? IOW, what
is the difference between printing a backslash and printing any other
character, that you can use printchar for the former, but not for the
latter?
> + else if (INTEGERP (Vinteger_output_format)
> + && XFIXNUM (Vinteger_output_format) == 16 && c >= 0)
If you really want to allow Vinteger_output_format to be a bignum, you
cannot use XFIXNUM with it, you need to use integer_to_intmax or
somesuch. Otherwise, you should use FIXNUMP instead of INTEGERP.
> + DEFVAR_LISP ("integer-output-format", Vinteger_output_format,
> + doc: /* The format used to print integers.
> +When 't', print integers as characters.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
But only integers that are small enough, yes?
> +When a number 16, print numbers in hex format.
This immediately begs the question: why cannot the value be 8 or 2?
Thanks.
P.S. This will eventually need a NEWS entry.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 25 Oct 2020 19:14:01 GMT)
Message #23 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> + if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj) && c < 4194176)
> ^^^^^^^
>
> Please use MAX_5_BYTE_CHAR here. Or, better yet, CHAR_BYTE8_P.
Thanks, fixed.
> And, btw, why not allow raw bytes here as well? is there some problem?
Because of ambiguity, both these return the same value:
(read (concat "?" (string 128))) => 128
(read (concat "?" (string 4194176))) => 128
>> + print_string (Fchar_to_string (obj), printcharfun);
>
> Why are you using print_string here instead of printchar? IOW, what
> is the difference between printing a backslash and printing any other
> character, that you can use printchar for the former, but not for the
> latter?
It was needed in earlier versions, but not now; fixed.
>> + else if (INTEGERP (Vinteger_output_format)
>> + && XFIXNUM (Vinteger_output_format) == 16 && c >= 0)
>
> If you really want to allow Vinteger_output_format to be a bignum, you
> cannot use XFIXNUM with it, you need to use integer_to_intmax or
> somesuch. Otherwise, you should use FIXNUMP instead of INTEGERP.
Fixed.
>> + DEFVAR_LISP ("integer-output-format", Vinteger_output_format,
>> + doc: /* The format used to print integers.
>> +When 't', print integers as characters.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> But only integers that are small enough, yes?
Fixed the docstring as well.
>> +When a number 16, print numbers in hex format.
>
> This immediately begs the question: why cannot the value be 8 or 2?
Because octal and binary are not as widely used as hex.
But the variable leaves room for later improvements to support
octal and binary too, and maybe string formats as in float-output-format.
> P.S. This will eventually need a NEWS entry.
Updates to the Info manual will also be in the final version of the patch.
[integer-output-format-2.patch (text/x-diff, inline)]
diff --git a/src/print.c b/src/print.c
index 53aa353769..b04d5023f8 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1908,8 +1908,30 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
{
case_Lisp_Int:
{
- int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
- strout (buf, len, len, printcharfun);
+ int c = XFIXNUM (obj);
+ intmax_t i;
+
+ if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj) && ! CHAR_BYTE8_P (c))
+ {
+ printchar ('?', printcharfun);
+ if (escapeflag
+ && (c == ';' || c == '(' || c == ')' || c == '{' || c == '}'
+ || c == '[' || c == ']' || c == '\"' || c == '\'' || c == '\\'))
+ printchar ('\\', printcharfun);
+ printchar (c, printcharfun);
+ }
+ else if (INTEGERP (Vinteger_output_format)
+ && integer_to_intmax (Vinteger_output_format, &i)
+ && i == 16 && XFIXNUM (obj) >= 0)
+ {
+ int len = sprintf (buf, "#x%"pI"x", (EMACS_UINT) XFIXNUM (obj));
+ strout (buf, len, len, printcharfun);
+ }
+ else
+ {
+ int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
+ strout (buf, len, len, printcharfun);
+ }
}
break;
@@ -2247,6 +2269,13 @@ syms_of_print (void)
that represents the number without losing information. */);
Vfloat_output_format = Qnil;
+ DEFVAR_LISP ("integer-output-format", Vinteger_output_format,
+ doc: /* The format used to print integers.
+When 't', print characters from integers that represent characters.
+When a number 16, print non-negative numbers in hex format.
+Otherwise, print integers in decimal format. */);
+ Vinteger_output_format = Qnil;
+
DEFVAR_LISP ("print-length", Vprint_length,
doc: /* Maximum length of list to print before abbreviating.
A value of nil means no limit. See also `eval-expression-print-length'. */);
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 25 Oct 2020 19:55:02 GMT)
Message #26 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: schwab <at> linux-m68k.org, 44155 <at> debbugs.gnu.org
> Date: Sun, 25 Oct 2020 21:09:07 +0200
>
> > And, btw, why not allow raw bytes here as well? is there some problem?
>
> Because of ambiguity, both these return the same value:
>
> (read (concat "?" (string 128))) => 128
> (read (concat "?" (string 4194176))) => 128
And why is that a problem?
Alternatively, we could print raw bytes in some special way. But not
treating them as characters sounds like a subtlety that will be hard to
explain.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Tue, 27 Oct 2020 20:54:03 GMT)
Message #29 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> > And, btw, why not allow raw bytes here as well? is there some problem?
>>
>> Because of ambiguity, both these return the same value:
>>
>> (read (concat "?" (string 128))) => 128
>> (read (concat "?" (string 4194176))) => 128
>
> And why is that a problem?
I don't know; Andreas remarked that it creates ambiguous output,
and I fixed the reported problem.
> Alternatively, we could print raw bytes in some special way. But not
> treating them as characters sounds like a subtlety that will be hard to
> explain.
The existing 'prin1-char' used as a reference implementation
doesn't print integers like 4194176 as characters, so the patch
does the same.
Anyway, here is a complete patch with tests and documentation:
[integer-output-format-3.patch (text/x-diff, inline)]
diff --git a/doc/lispref/streams.texi b/doc/lispref/streams.texi
index 2cd61ad04f..f171f13779 100644
--- a/doc/lispref/streams.texi
+++ b/doc/lispref/streams.texi
@@ -902,3 +902,11 @@ Output Variables
in the C function @code{sprintf}. For further restrictions on what
you can use, see the variable's documentation string.
@end defvar
+
+@defvar integer-output-format
+This variable specifies how to print integer numbers. The default is
+@code{nil}, meaning use the decimal format. When bound to @code{t},
+print integers as characters when an integer represents a character
+(@pxref{Basic Char Syntax}). When bound to the number @code{16},
+print non-negative integers in the hexadecimal format.
+@end defvar
diff --git a/etc/NEWS b/etc/NEWS
index a77c1c883e..2f7d08ad08 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1631,6 +1631,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el.
* Lisp Changes in Emacs 28.1
+** New variable 'integer-output-format' defines the format of integers.
+When this variable is bound to the value 't', integers are printed by
+printing functions as characters when an integer represents a character.
+When bound to the number 16, non-negative integers are printed in the
+hexadecimal format.
+
+++
** 'define-globalized-minor-mode' now takes a :predicate parameter.
This can be used to control which major modes the minor mode should be
diff --git a/src/print.c b/src/print.c
index 53aa353769..a5c56c6b48 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1908,8 +1908,31 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
{
case_Lisp_Int:
{
- int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
- strout (buf, len, len, printcharfun);
+ int c;
+ intmax_t i;
+
+ if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
+ && (c = XFIXNUM (obj)) && ! CHAR_BYTE8_P (c))
+ {
+ printchar ('?', printcharfun);
+ if (escapeflag
+ && (c == ';' || c == '(' || c == ')' || c == '{' || c == '}'
+ || c == '[' || c == ']' || c == '\"' || c == '\'' || c == '\\'))
+ printchar ('\\', printcharfun);
+ printchar (c, printcharfun);
+ }
+ else if (INTEGERP (Vinteger_output_format)
+ && integer_to_intmax (Vinteger_output_format, &i)
+ && i == 16 && Fnatnump (obj))
+ {
+ int len = sprintf (buf, "#x%"pI"x", (EMACS_UINT) XFIXNUM (obj));
+ strout (buf, len, len, printcharfun);
+ }
+ else
+ {
+ int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
+ strout (buf, len, len, printcharfun);
+ }
}
break;
@@ -2247,6 +2270,13 @@ syms_of_print (void)
that represents the number without losing information. */);
Vfloat_output_format = Qnil;
+ DEFVAR_LISP ("integer-output-format", Vinteger_output_format,
+ doc: /* The format used to print integers.
+When t, print characters from integers that represent a character.
+When a number 16, print non-negative integers in the hexadecimal format.
+Otherwise, by default print integers in the decimal format. */);
+ Vinteger_output_format = Qnil;
+
DEFVAR_LISP ("print-length", Vprint_length,
doc: /* Maximum length of list to print before abbreviating.
A value of nil means no limit. See also `eval-expression-print-length'. */);
diff --git a/test/src/print-tests.el b/test/src/print-tests.el
index eb9572dbdf..7b026b6b21 100644
--- a/test/src/print-tests.el
+++ b/test/src/print-tests.el
@@ -383,5 +383,25 @@ print-hash-table-test
(let ((print-length 1))
(format "%S" h))))))
+(print-tests--deftest print-integer-output-format ()
+ ;; Bug#44155.
+ (let ((integer-output-format t)
+ (syms (list ?? ?\; ?\( ?\) ?\{ ?\} ?\[ ?\] ?\" ?\' ?\\ ?Á)))
+ (should (equal (read (print-tests--prin1-to-string syms)) syms))
+ (should (equal (print-tests--prin1-to-string syms)
+ (concat "(" (mapconcat #'prin1-char syms " ") ")"))))
+ (let ((integer-output-format t)
+ (syms (list -1 0 1 ?\120 4194175 4194176 (max-char) (1+ (max-char)))))
+ (should (equal (read (print-tests--prin1-to-string syms)) syms)))
+ (let ((integer-output-format 16)
+ (syms (list -1 0 1 most-positive-fixnum (1+ most-positive-fixnum))))
+ (should (equal (read (print-tests--prin1-to-string syms)) syms))
+ (should (equal (print-tests--prin1-to-string syms)
+ (concat "(" (mapconcat
+ (lambda (i)
+ (if (and (>= i 0) (<= i most-positive-fixnum))
+ (format "#x%x" i) (format "%d" i)))
+ syms " ") ")")))))
+
(provide 'print-tests)
;;; print-tests.el ends here
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Wed, 28 Oct 2020 15:52:01 GMT)
Message #32 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: schwab <at> linux-m68k.org, 44155 <at> debbugs.gnu.org
> Date: Tue, 27 Oct 2020 22:08:12 +0200
>
> > Alternatively, we could print raw bytes in some special way. But not
> > treating them as characters sounds like a subtlety that will be hard to
> > explain.
>
> The existing 'prin1-char' used as a reference implementation
> doesn't print integers like 4194176 as characters, so the patch
> does the same.
I don't think it's right, FWIW. Displaying something like \100 would
be better, IMO.
> +@defvar integer-output-format
> +This variable specifies how to print integer numbers. The default is
> +@code{nil}, meaning use the decimal format. When bound to @code{t},
> +print integers as characters when an integer represents a character
> +(@pxref{Basic Char Syntax}). When bound to the number @code{16},
> +print non-negative integers in the hexadecimal format.
This should mention the functions affected by the variable.
> +** New variable 'integer-output-format' defines the format of integers.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"determines how to print integer values"
> +When this variable is bound to the value 't', integers are printed by
> +printing functions as characters when an integer represents a character.
Please give at least one example of a function affected by this.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Wed, 28 Oct 2020 20:07:01 GMT)
Message #35 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> > Alternatively, we could print raw bytes in some special way. But not
>> > treating them as characters sounds like a subtlety that will be hard to
>> > explain.
>>
>> The existing 'prin1-char' used as a reference implementation
>> doesn't print integers like 4194176 as characters, so the patch
>> does the same.
>
> I don't think it's right, FWIW. Displaying something like \100 would
> be better, IMO.
Sorry, I don't understand why 4194176 could be printed as \100.
>> +@defvar integer-output-format
>> +This variable specifies how to print integer numbers. The default is
>> +@code{nil}, meaning use the decimal format. When bound to @code{t},
>> +print integers as characters when an integer represents a character
>> +(@pxref{Basic Char Syntax}). When bound to the number @code{16},
>> +print non-negative integers in the hexadecimal format.
>
> This should mention the functions affected by the variable.
>
>> +** New variable 'integer-output-format' defines the format of integers.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> "determines how to print integer values"
>
>> +When this variable is bound to the value 't', integers are printed by
>> +printing functions as characters when an integer represents a character.
>
> Please give at least one example of a function affected by this.
Ok, fixed:
[integer-output-format-4.patch (text/x-diff, inline)]
diff --git a/doc/lispref/streams.texi b/doc/lispref/streams.texi
index 2cd61ad04f..08d8032e6f 100644
--- a/doc/lispref/streams.texi
+++ b/doc/lispref/streams.texi
@@ -902,3 +902,12 @@ Output Variables
in the C function @code{sprintf}. For further restrictions on what
you can use, see the variable's documentation string.
@end defvar
+
+@defvar integer-output-format
+This variable specifies how to print integer numbers. The default is
+@code{nil}, meaning use the decimal format. When bound to @code{t},
+print integers as characters when an integer represents a character
+(@pxref{Basic Char Syntax}). When bound to the number @code{16},
+print non-negative integers in the hexadecimal format.
+This variable affects all print functions.
+@end defvar
diff --git a/etc/NEWS b/etc/NEWS
index 5e159480e0..202e449b16 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1641,6 +1641,12 @@ ledit.el, lmenu.el, lucid.el and old-whitespace.el.
* Lisp Changes in Emacs 28.1
+** New variable 'integer-output-format' determines how to print integer values.
+When this variable is bound to the value 't', integers are printed by
+printing functions as characters when an integer represents a character.
+When bound to the number 16, non-negative integers are printed in the
+hexadecimal format.
+
+++
** 'define-globalized-minor-mode' now takes a :predicate parameter.
This can be used to control which major modes the minor mode should be
diff --git a/src/print.c b/src/print.c
index 53aa353769..7b3dc61065 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1908,8 +1908,31 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
{
case_Lisp_Int:
{
- int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
- strout (buf, len, len, printcharfun);
+ int c;
+ intmax_t i;
+
+ if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
+ && (c = XFIXNUM (obj)) && ! CHAR_BYTE8_P (c))
+ {
+ printchar ('?', printcharfun);
+ if (escapeflag
+ && (c == ';' || c == '(' || c == ')' || c == '{' || c == '}'
+ || c == '[' || c == ']' || c == '\"' || c == '\'' || c == '\\'))
+ printchar ('\\', printcharfun);
+ printchar (c, printcharfun);
+ }
+ else if (INTEGERP (Vinteger_output_format)
+ && integer_to_intmax (Vinteger_output_format, &i)
+ && i == 16 && Fnatnump (obj))
+ {
+ int len = sprintf (buf, "#x%"pI"x", (EMACS_UINT) XFIXNUM (obj));
+ strout (buf, len, len, printcharfun);
+ }
+ else
+ {
+ int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
+ strout (buf, len, len, printcharfun);
+ }
}
break;
@@ -2247,6 +2270,15 @@ syms_of_print (void)
that represents the number without losing information. */);
Vfloat_output_format = Qnil;
+ DEFVAR_LISP ("integer-output-format", Vinteger_output_format,
+ doc: /* The format used to print integers.
+When t, print characters from integers that represent a character.
+When a number 16, print non-negative integers in the hexadecimal format.
+Otherwise, by default print integers in the decimal format.
+This variable affects all print functions, for example, such function
+as `print'. */);
+ Vinteger_output_format = Qnil;
+
DEFVAR_LISP ("print-length", Vprint_length,
doc: /* Maximum length of list to print before abbreviating.
A value of nil means no limit. See also `eval-expression-print-length'. */);
diff --git a/test/src/print-tests.el b/test/src/print-tests.el
index eb9572dbdf..7b026b6b21 100644
--- a/test/src/print-tests.el
+++ b/test/src/print-tests.el
@@ -383,5 +383,25 @@ print-hash-table-test
(let ((print-length 1))
(format "%S" h))))))
+(print-tests--deftest print-integer-output-format ()
+ ;; Bug#44155.
+ (let ((integer-output-format t)
+ (syms (list ?? ?\; ?\( ?\) ?\{ ?\} ?\[ ?\] ?\" ?\' ?\\ ?Á)))
+ (should (equal (read (print-tests--prin1-to-string syms)) syms))
+ (should (equal (print-tests--prin1-to-string syms)
+ (concat "(" (mapconcat #'prin1-char syms " ") ")"))))
+ (let ((integer-output-format t)
+ (syms (list -1 0 1 ?\120 4194175 4194176 (max-char) (1+ (max-char)))))
+ (should (equal (read (print-tests--prin1-to-string syms)) syms)))
+ (let ((integer-output-format 16)
+ (syms (list -1 0 1 most-positive-fixnum (1+ most-positive-fixnum))))
+ (should (equal (read (print-tests--prin1-to-string syms)) syms))
+ (should (equal (print-tests--prin1-to-string syms)
+ (concat "(" (mapconcat
+ (lambda (i)
+ (if (and (>= i 0) (<= i most-positive-fixnum))
+ (format "#x%x" i) (format "%d" i)))
+ syms " ") ")")))))
+
(provide 'print-tests)
;;; print-tests.el ends here
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Thu, 29 Oct 2020 14:21:02 GMT)
Message #38 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: schwab <at> linux-m68k.org, 44155 <at> debbugs.gnu.org
> Date: Wed, 28 Oct 2020 21:41:46 +0200
>
> >> The existing 'prin1-char' used as a reference implementation
> >> doesn't print integers like 4194176 as characters, so the patch
> >> does the same.
> >
> > I don't think it's right, FWIW. Displaying something like \100 would
> > be better, IMO.
>
> Sorry, I don't understand why 4194176 could be printed as \100.
I meant \200, sorry. That's the raw byte that 4194176 stands for.
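To make the relation between 4194176 and \200 concrete, here is a small standalone C sketch of the arithmetic. The numeric constants mirror MAX_CHAR, MAX_5_BYTE_CHAR and the raw-byte offset from Emacs's character.h as I understand them; the lowercase names and the `format_raw_byte` helper are hypothetical, and the sketch does not claim to reproduce how print.c emits such characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed to match the values in Emacs's character.h; the lowercase
   names are local to this sketch.  */
enum
{
  max_char = 0x3FFFFF,        /* 4194303, the largest character code */
  max_5_byte_char = 0x3FFF7F, /* 4194175, the last non-raw-byte code */
  byte8_offset = 0x3FFF00     /* raw byte B is stored as B + offset */
};

/* True if C lies in the raw-byte range 0x3FFF80..0x3FFFFF.  */
static bool
char_is_raw_byte (int c)
{
  return c > max_5_byte_char && c <= max_char;
}

/* Convert between a raw-byte character code and the byte 0x80..0xFF.  */
static int
raw_byte_of_char (int c)
{
  return c - byte8_offset;
}

static int
char_of_raw_byte (int byte)
{
  return byte + byte8_offset;
}

/* Format raw-byte character C as an octal escape such as "?\200".
   Return the length written, or -1 if C is not a raw byte or BUF
   is too small.  */
static int
format_raw_byte (char *buf, size_t size, int c)
{
  int n;

  if (!char_is_raw_byte (c))
    return -1;
  n = snprintf (buf, size, "?\\%03o", (unsigned int) raw_byte_of_char (c));
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

With these definitions, 4194176 maps to byte 0x80 (octal 200) and 4194302 to byte 0xFE (octal 376), which agrees with the `?\200` ... `?\376` lines reported earlier in the thread.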
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Thu, 29 Oct 2020 21:25:01 GMT)
Message #41 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> >> The existing 'prin1-char' used as a reference implementation
>> >> doesn't print integers like 4194176 as characters, so the patch
>> >> does the same.
>> >
>> > I don't think it's right, FWIW. Displaying something like \100 would
>> > be better, IMO.
>>
>> Sorry, I don't understand why 4194176 could be printed as \100.
>
> I meant \200, sorry. That's the raw byte that 4194176 stands for.
OK, in this patch the condition !CHAR_BYTE8_P(c) is removed, so
it prints \200:
[integer-output-format-4.patch (text/x-diff, inline)]
diff --git a/src/print.c b/src/print.c
index 53aa353769..20841eba61 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1908,8 +1908,31 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
   {
     case_Lisp_Int:
       {
-        int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
-        strout (buf, len, len, printcharfun);
+        int c;
+        intmax_t i;
+
+        if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
+            && (c = XFIXNUM (obj)))
+          {
+            printchar ('?', printcharfun);
+            if (escapeflag
+                && (c == ';' || c == '(' || c == ')' || c == '{' || c == '}'
+                    || c == '[' || c == ']' || c == '\"' || c == '\'' || c == '\\'))
+              printchar ('\\', printcharfun);
+            printchar (c, printcharfun);
+          }
+        else if (INTEGERP (Vinteger_output_format)
+                 && integer_to_intmax (Vinteger_output_format, &i)
+                 && i == 16 && !NILP (Fnatnump (obj)))
+          {
+            int len = sprintf (buf, "#x%"pI"x", (EMACS_UINT) XFIXNUM (obj));
+            strout (buf, len, len, printcharfun);
+          }
+        else
+          {
+            int len = sprintf (buf, "%"pI"d", XFIXNUM (obj));
+            strout (buf, len, len, printcharfun);
+          }
       }
       break;
@@ -2247,6 +2270,13 @@ syms_of_print (void)
that represents the number without losing information. */);
Vfloat_output_format = Qnil;
+ DEFVAR_LISP ("integer-output-format", Vinteger_output_format,
+ doc: /* The format used to print integers.
+When t, print characters from integers that represent a character.
+When a number 16, print non-negative integers in the hexadecimal format.
+Otherwise, by default print integers in the decimal format. */);
+ Vinteger_output_format = Qnil;
+
DEFVAR_LISP ("print-length", Vprint_length,
doc: /* Maximum length of list to print before abbreviating.
A value of nil means no limit. See also `eval-expression-print-length'. */);
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Fri, 30 Oct 2020 07:36:02 GMT)
Message #44 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: schwab <at> linux-m68k.org, 44155 <at> debbugs.gnu.org
> Date: Thu, 29 Oct 2020 23:00:48 +0200
>
> > I meant \200, sorry. That's the raw byte that 4194176 stands for.
>
> OK, in this patch the condition !CHAR_BYTE8_P(c) is removed, so
> it prints \200:
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sat, 31 Oct 2020 20:12:02 GMT)
Message #47 received at 44155 <at> debbugs.gnu.org (full text, mbox):
tags 44155 fixed
close 44155 28.0.50
quit
>> > I meant \200, sorry. That's the raw byte that 4194176 stands for.
>>
>> OK, in this patch the condition !CHAR_BYTE8_P(c) is removed, so
>> it prints \200:
>
> Thanks.
Now pushed to master and closed.
Added tag(s) fixed. Request was from Juri Linkov <juri <at> linkov.net> to control <at> debbugs.gnu.org. (Sat, 31 Oct 2020 20:13:01 GMT)
bug marked as fixed in version 28.0.50, send any further explanations to 44155 <at> debbugs.gnu.org and Juri Linkov <juri <at> linkov.net>. Request was from Juri Linkov <juri <at> linkov.net> to control <at> debbugs.gnu.org. (Sat, 31 Oct 2020 20:13:01 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sat, 31 Oct 2020 23:28:01 GMT)
Message #54 received at 44155 <at> debbugs.gnu.org (full text, mbox):
New test fails on some systems.
Ref: https://hydra.nixos.org/build/129474379
Reproduced on CentOS 8.2.
Test print-integer-output-format condition:
(ert-test-failed
((should
(equal
(read ...)
syms))
:form
(equal
(-1 0 1 80 4194175 128 255 4194304)
(-1 0 1 80 4194175 4194176 4194303 4194304))
:value nil :explanation
(list-elt 5
(different-atoms
(128 "#x80" "?")
(4194176 "#x3fff80" "?\200")))))
FAILED 19/39 print-integer-output-format (0.002202 sec)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 08:02:02 GMT)
Message #57 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> New test fails on some systems.
>
> (equal
> (-1 0 1 80 4194175 128 255 4194304)
> (-1 0 1 80 4194175 4194176 4194303 4194304))
> :value nil :explanation
> (list-elt 5
> (different-atoms
> (128 "#x80" "?")
> (4194176 "#x3fff80" "?\200")))))
This is because 4194176 is printed as ?\200 that is parsed as 128.
This patch should fix test failures by printing integers
for ambiguous characters. I'm sure no user would complain
that numbers between 4194176 and 4194303 are printed as integers.
diff --git a/src/print.c b/src/print.c
index fa65a3cb26..49daf753bd 100644
--- a/src/print.c
+++ b/src/print.c
@@ -1912,7 +1912,7 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
intmax_t i;
if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
- && (c = XFIXNUM (obj)))
+ && (c = XFIXNUM (obj)) && ! CHAR_BYTE8_P (c))
{
printchar ('?', printcharfun);
if (escapeflag
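The round-trip failure that the test exposed can be modelled in a few lines of C. This is a toy model, not the Emacs printer or reader: it assumes raw-byte characters occupy 0x3FFF80..0x3FFFFF, prints them as ?\ooo as the pushed patch did, and reads ?\ooo back as the plain octal value, which is the behaviour reported in this message. All function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Model of the printer: raw-byte characters (assumed to be
   0x3FFF80..0x3FFFFF) become "?\ooo"; everything else is printed as
   a decimal integer to keep the model small.  */
void
model_print (int c, char *buf, size_t size)
{
  if (c >= 0x3FFF80 && c <= 0x3FFFFF)
    snprintf (buf, size, "?\\%o", c - 0x3FFF00);
  else
    snprintf (buf, size, "%d", c);
}

/* Model of the reader: "?\ooo" yields the octal value itself, so
   ?\200 reads as 128.  Anything else is read as a decimal integer.  */
int
model_read (const char *text)
{
  if (text[0] == '?' && text[1] == '\\')
    return (int) strtol (text + 2, NULL, 8);
  return (int) strtol (text, NULL, 10);
}

/* Print C and read the result back.  */
int
round_trip (int c)
{
  char buf[32];
  model_print (c, buf, sizeof buf);
  return model_read (buf);
}
```

In this model round_trip (4194176) returns 128 rather than 4194176, which is the mismatch at list element 5 in the failing test.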
bug No longer marked as fixed in versions 28.0.50 and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 01 Nov 2020 12:04:02 GMT)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 12:04:03 GMT)
Message #62 received at 44155 <at> debbugs.gnu.org (full text, mbox):
reopen 44155
stop
I don't mind the basic idea, but I'm reopening the bug since it looks like there is some unfinished business. Hope you don't mind.
> When t, print characters from integers that represent a character.
In what way does 't' suggest a character? Wouldn't something like 'character' be more suggestive?
The variable isn't named 'print-integers-as-chars'.
> When a number 16, print non-negative integers in the hexadecimal format.
Doesn't work for bignums:
(let ((integer-output-format 16))
(print 394583945873948753948539845))
394583945873948753948539845
This must be a bug since there is no reason why bignums should be treated specially. In general we try hard not to.
Since there is a read syntax for binary and octal numbers as well, why not permit 2 and 8?
(And why not print negative numbers in the selected radix?)
And C0/C1 controls aren't printed well:
(let ((integer-output-format t))
(print 10)
(print 127))
?
?^?
I strongly suggest that the controls that have special escapes, like \n, use them. What to use for the rest depends on the user's preference really -- for example, 31 might be printed as 31, ?\037, #o37 or #x1f.
Whether to print 32 as ?‹SPACE› or ?\s is a matter of taste.
For that matter, the variable name should perhaps start with 'print-' like other variables that control printing. Maybe we should separate the default radix and print integers as characters? Thus, we'd have:
print-integer-radix -- 2, 8, 16, 10 or nil (which means 10)
print-integers-as-characters -- nil or t
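To make the radix half of this proposal concrete, here is a rough C sketch of formatting a fixed-width integer in radix 2, 8, 10 or 16 with the #b, #o and #x prefixes. It is not taken from any patch in this thread. The function name is invented, bignums are out of scope, and placing the minus sign after the prefix (as in #x-1f) is an assumption about the read syntax that should be checked against the reader.

```c
#include <stdio.h>
#include <string.h>

/* Format N in RADIX (2, 8, 10 or 16) into OUT, using the #b, #o and
   #x prefixes for the non-decimal radixes.  A negative number carries
   its sign after the prefix, e.g. "#x-1f" (assumed syntax).
   Return the length written, or -1 if RADIX is unsupported or OUT is
   too small.  */
int
format_in_radix (long long n, int radix, char *out, size_t size)
{
  const char *prefix;
  switch (radix)
    {
    case 2:  prefix = "#b"; break;
    case 8:  prefix = "#o"; break;
    case 10: prefix = "";   break;
    case 16: prefix = "#x"; break;
    default: return -1;
    }

  /* Take the magnitude in unsigned arithmetic so that the most
     negative value does not overflow.  */
  unsigned long long mag
    = n < 0 ? 0ULL - (unsigned long long) n : (unsigned long long) n;

  /* Collect digits, least significant first.  */
  char digits[64];
  int ndigits = 0;
  do
    {
      digits[ndigits++] = "0123456789abcdef"[mag % (unsigned) radix];
      mag /= (unsigned) radix;
    }
  while (mag != 0);

  size_t plen = strlen (prefix);
  if (plen + (n < 0) + (size_t) ndigits + 1 > size)
    return -1;

  char *p = out;
  memcpy (p, prefix, plen);
  p += plen;
  if (n < 0)
    *p++ = '-';
  while (ndigits > 0)
    *p++ = digits[--ndigits];
  *p = '\0';
  return (int) (p - out);
}
```

With this sketch, 31 formats as #x1f, #o37 or #b11111 depending on the radix, and a radix of 10 gives the plain decimal form.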
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 15:15:01 GMT)
Message #65 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 44155 <at> debbugs.gnu.org,
> schwab <at> linux-m68k.org
> Date: Sun, 01 Nov 2020 09:58:25 +0200
>
> > New test fails on some systems.
> >
> > (equal
> > (-1 0 1 80 4194175 128 255 4194304)
> > (-1 0 1 80 4194175 4194176 4194303 4194304))
> > :value nil :explanation
> > (list-elt 5
> > (different-atoms
> > (128 "#x80" "?")
> > (4194176 "#x3fff80" "?\200")))))
>
> This is because 4194176 is printed as ?\200 that is parsed as 128.
>
> This patch should fix test failures by printing integers
> for ambiguous characters. I'm sure no user would complain
> that numbers between 4194176 and 4194303 are printed as integers.
>
> diff --git a/src/print.c b/src/print.c
> index fa65a3cb26..49daf753bd 100644
> --- a/src/print.c
> +++ b/src/print.c
> @@ -1912,7 +1912,7 @@ print_object (Lisp_Object obj, Lisp_Object printcharfun, bool escapeflag)
> intmax_t i;
>
> if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
> - && (c = XFIXNUM (obj)))
> + && (c = XFIXNUM (obj)) && ! CHAR_BYTE8_P (c))
> {
> printchar ('?', printcharfun);
> if (escapeflag
If a test fails, it is better to fix the test and not make the code
less powerful, don't you agree?
To produce 4194176 from ?\200, one way is this:
(decode-char 'eight-bit ?\200)
Can't this be used in the test?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 18:41:01 GMT)
Message #68 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> reopen 44155
> stop
>
> I don't mind the basic idea, but I'm reopening the bug since it looks
> like there is some unfinished business. Hope you don't mind.
Thanks for bringing a fresh perspective to this feature request.
>> When t, print characters from integers that represent a character.
>
> In what way does 't' suggest a character? Wouldn't something like 'character' be more suggestive?
> The variable isn't named 'print-integers-as-chars'.
As the most frequent usage pattern, 't' is more convenient to use in code:
(let ((integer-output-format t))
whereas this would be uglier and harder to type with:
(let ((integer-output-format 'character))
>> When a number 16, print non-negative integers in the hexadecimal format.
>
> Doesn't work for bignums:
>
> (let ((integer-output-format 16))
> (print 394583945873948753948539845))
>
> 394583945873948753948539845
Yes, this is a known current limitation.
> This must be a bug since there is no reason why bignums should be treated specially.
> In general we try hard not to.
I agree, support for big numbers should be added as well.
> Since there is a read syntax for binary and octal numbers as well, why not permit 2 and 8?
> (And why not print negative numbers in the selected radix?)
2 and 8 could be added as well.
> And C0/C1 controls aren't printed well:
>
> (let ((integer-output-format t))
> (print 10)
> (print 127))
>
> ?
>
>
> ?^?
>
> I strongly suggest that the controls that have special escapes, like
> \n, use them.
prin1-char uses a more readable format; is this better?
(prin1-char 10) ?\C-j
(prin1-char 127) ?\C-?
Or should 10 be printed as '?\n'?
> What to use for the rest depends on the user's preference really --
> for example, 31 might be printed as 31, ?\037, #o37 or #x1f.
Maybe more user choices should be supported by the variable?
> Whether to print 32 as ?‹SPACE› or ?\s is a matter of taste.
?\s is less error-prone.
> For that matter, the variable name should perhaps start with 'print-'
> like other variables that control printing. Maybe we should separate
> the default radix and print integers as characters? Thus, we'd have:
The variable name was modeled after the similar variable float-output-format.
> print-integer-radix -- 2, 8, 16, 10 or nil (which means 10)
>
> print-integers-as-characters -- nil or t
What should be printed when both variables are bound to non-default values,
e.g. print-integers-as-characters to t, and print-integer-radix to 16?
Maybe to print with character syntax and the given radix, e.g. '?\x1f'.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 18:41:02 GMT)
Message #71 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> This is because 4194176 is printed as ?\200 that is parsed as 128.
>>
>> This patch should fix test failures by printing integers
>> for ambiguous characters. I'm sure no user would complain
>> that numbers between 4194176 and 4194303 are printed as integers.
>>
>> if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
>> - && (c = XFIXNUM (obj)))
>> + && (c = XFIXNUM (obj)) && ! CHAR_BYTE8_P (c))
>
> If a test fails, it is better to fix the test and not make the code
> less powerful, don't you agree?
This means sweeping the problems under the carpet.
> To produce 4194176 from ?\200, one way is this:
>
> (decode-char 'eight-bit ?\200)
>
> Can't this be used in the test?
Using this code in tests means that the users should use the same code
in their programs. Thus 'print' should print '(33 4194176) as such ugly code:
`(?! ,(decode-char 'eight-bit ?\200))
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 18:53:01 GMT)
Message #74 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: rgm <at> gnu.org, 44155 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
> Date: Sun, 01 Nov 2020 20:39:48 +0200
>
> >> if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
> >> - && (c = XFIXNUM (obj)))
> >> + && (c = XFIXNUM (obj)) && ! CHAR_BYTE8_P (c))
> >
> > If a test fails, it is better to fix the test and not make the code
> > less powerful, don't you agree?
>
> This means sweeping the problems under the carpet.
Which problem?
> > (decode-char 'eight-bit ?\200)
> >
> > Can't this be used in the test?
>
> Using this code in tests means that the users should use the same code
> in their programs.
Why would they need to do that? The test needs it because it wants to
verify the result, but "normal" programs don't need to read back the
values they printed.
> Thus 'print' should print '(33 4194176) as such ugly code:
> `(?! ,(decode-char 'eight-bit ?\200))
I don't see why. ?\200 and 4194176 are two forms of the same
character.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 19:15:01 GMT)
Message #77 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> >> if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
>> >> - && (c = XFIXNUM (obj)))
>> >> + && (c = XFIXNUM (obj)) && ! CHAR_BYTE8_P (c))
>> >
>> > If a test fails, it is better to fix the test and not make the code
>> > less powerful, don't you agree?
>>
>> This means sweeping the problems under the carpet.
>
> Which problem?
Problem of ambiguous numbers 128 and 4194176 that are both printed as ?\200.
>> > (decode-char 'eight-bit ?\200)
>> >
>> > Can't this be used in the test?
>>
>> Using this code in tests means that the users should use the same code
>> in their programs.
>
> Why would they need to do that? The test needs it because it wants to
> verify the result, but "normal" programs don't need to read back the
> values they printed.
Programs print the lists of characters, and other programs read them.
>> Thus 'print' should print '(33 4194176) as such ugly code:
>> `(?! ,(decode-char 'eight-bit ?\200))
>
> I don't see why. ?\200 and 4194176 are two forms of the same
> character.
?\200 and 128 are two forms of the same character too.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 19:43:02 GMT)
Message #80 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Juri Linkov <juri <at> linkov.net>
> Cc: rgm <at> gnu.org, 44155 <at> debbugs.gnu.org, schwab <at> linux-m68k.org
> Date: Sun, 01 Nov 2020 21:13:03 +0200
>
> >> >> if (EQ (Vinteger_output_format, Qt) && CHARACTERP (obj)
> >> >> - && (c = XFIXNUM (obj)))
> >> >> + && (c = XFIXNUM (obj)) && ! CHAR_BYTE8_P (c))
> >> >
> >> > If a test fails, it is better to fix the test and not make the code
> >> > less powerful, don't you agree?
> >>
> >> This means sweeping the problems under the carpet.
> >
> > Which problem?
>
> Problem of ambiguous numbers 128 and 4194176 that are both printed as ?\200.
Octal escapes are generally a sign of a raw byte. This is no
different from buffer display -- how do you know what ?\200 means
inside buffer text?
> ?\200 and 128 are two forms of the same character too.
See my question above. I don't think what you say is true.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 20:18:02 GMT)
Message #83 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> Problem of ambiguous numbers 128 and 4194176 that are both printed as ?\200.
>
> Octal escapes are generally a sign of a raw byte. This is not
> different from buffer display -- how do you know what does ?\200 mean
> inside buffer text?
>
>> ?\200 and 128 are two forms of the same character too.
>
> See my question above. I don't think what you say is true.
Typing 'C-x C-e' after ?\200 displays: 128 (#o200, #x80, ?\x80)
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Sun, 01 Nov 2020 20:53:02 GMT)
Message #86 received at 44155 <at> debbugs.gnu.org (full text, mbox):
1 nov. 2020 kl. 19.35 skrev Juri Linkov <juri <at> linkov.net>:
> Thanks for bringing a fresh perspective to this feature request.
You are very gracious. The devil is in the details, as always!
> (prin1-char 10) ?\C-j
> (prin1-char 127) ?\C-?
>
> Or should 10 be printed as '?\n'?
Yes, I think ?\n is more useful. As a character, 10 is more commonly thought of as newline than as control-j.
>> What to use for the rest depends on the user's preference really --
>> for example, 31 might be printed as 31, ?\037, #o37 or #x1f.
>
> Maybe more user choices should be supported by the variable?
Maybe, but only if we can identify sensible such choices. Otherwise we should just try to pick the best representation in each case. Giving users too much choice isn't necessarily doing them a favour!
I'd suggest plain number syntax for control characters without named escapes, for several reasons:
* Such numbers are less likely to represent characters and more likely to be, well, numbers.
* It would allow a separate radix control to govern their output format.
* Writing ?\x1f is no clearer than #x1f, and sometimes more confusing: \xff is a raw byte in a string, but ?\xff is always 255.
Thus we would have 10 -> ?\n, 13 -> ?\r, 127 -> ?\d, 65 -> ?A, 255 -> ?ÿ, but 31 -> 31, 129 -> 129, 4194303 -> 4194303.
>> Whether to print 32 as ?‹SPACE› or ?\s is a matter of taste.
>
> ?\s is less error-prone.
Yes, I agree. (I prefer ?\s or 32 as characters, but " " in strings.)
>> For that matter, the variable name should perhaps start with 'print-'
>> like other variables that control printing. Maybe we should separate
>> the default radix and print integers as characters? Thus, we'd have:
>
> The variable name was modeled after the similar variable float-output-format.
I see, interesting! One possibility would be to use a string in the same way, thus "%x", "%c" etc, but it makes less sense for integers than floating-point: no precision field, and many format alternatives such as %#x do not produce valid Lisp read syntax. Better keep it simple.
>> print-integer-radix -- 2, 8, 16, 10 or nil (which means 10)
>>
>> print-integers-as-characters -- nil or t
>
> What should be printed when both variables are bound to non-default values,
> e.g. print-integers-as-characters to t, and print-integer-radix to 16?
> Maybe to print with character syntax and the given radix, e.g. '?\x1f'.
Well, it should clearly use character syntax for printable characters and the given radix for non-characters. As you correctly point out, what to use for non-printable characters (C0 and C1 controls, raw bytes) is less obvious. I'd probably just use the given radix; I see no readability advantage in printing ?\x1f rather than #x1f.
Since your original motivation was to print characters in pretty-printed nested Lisp expressions, perhaps we should just define print-integers-as-characters as a Boolean and skip the radix for the time being? We could add a print radix control later on if desired. (That would save us the hassle to deal with bignums, for that matter.)
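The mapping proposed in this message (10 -> ?\n, 13 -> ?\r, 127 -> ?\d, 65 -> ?A, 255 -> ?ÿ, but 31 -> 31, 129 -> 129, 4194303 -> 4194303) can be sketched as a standalone C function. This is a rough model, not the patch that was eventually pushed. The set of named escapes is one possible choice, since the thread leaves it open. The backslash-escaped syntax characters are the ones from the earlier patch. Values outside the Unicode scalar range are simply printed as numbers. All names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Letter of the named escape for C, or 0 if it has none.
   The chosen set is an assumption of this sketch.  */
char
named_escape_letter (long long c)
{
  switch (c)
    {
    case 9:   return 't';
    case 10:  return 'n';
    case 12:  return 'f';
    case 13:  return 'r';
    case 32:  return 's';
    case 127: return 'd';
    default:  return 0;
    }
}

/* True if N is printed as a plain number: outside the Unicode scalar
   range (this covers raw bytes), or a C0/C1 control.  */
int
prints_as_number (long long n)
{
  if (n < 0 || n > 0x10FFFF || (n >= 0xD800 && n <= 0xDFFF))
    return 1;
  return n < 32 || (n >= 127 && n <= 159);
}

/* Encode scalar value C as UTF-8 into OUT; return the byte count.  */
int
encode_utf8 (int c, char *out)
{
  if (c < 0x80)
    {
      out[0] = (char) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (char) (0xC0 | (c >> 6));
      out[1] = (char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (char) (0xE0 | (c >> 12));
      out[1] = (char) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (char) (0x80 | (c & 0x3F));
      return 3;
    }
  out[0] = (char) (0xF0 | (c >> 18));
  out[1] = (char) (0x80 | ((c >> 12) & 0x3F));
  out[2] = (char) (0x80 | ((c >> 6) & 0x3F));
  out[3] = (char) (0x80 | (c & 0x3F));
  return 4;
}

/* Write the printed form of N into BUF.  */
void
print_integer_as_char (long long n, char *buf, size_t size)
{
  char esc = named_escape_letter (n);
  if (esc != 0)
    snprintf (buf, size, "?\\%c", esc);       /* e.g. 10 -> ?\n */
  else if (prints_as_number (n))
    snprintf (buf, size, "%lld", n);          /* e.g. 31 -> 31 */
  else
    {
      char utf8[5];
      utf8[encode_utf8 ((int) n, utf8)] = '\0';
      /* Characters with syntactic meaning get a backslash.  */
      int special = n < 128 && strchr (";()[]{}\"'\\", (int) n) != NULL;
      snprintf (buf, size, "?%s%s", special ? "\\" : "", utf8);
    }
}
```

Because the controls without a named escape and the raw bytes fall through to the numeric branch, a separate radix setting could later govern how they are printed without touching the character branch.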
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Mon, 02 Nov 2020 21:44:02 GMT)
Message #89 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> Thus we would have 10 -> ?\n, 13 -> ?\r, 127 -> ?\d, 65 -> ?A,
> 255 -> ?ÿ, but 31 -> 31, 129 -> 129, 4194303 -> 4194303.
Hopefully, printing some characters as numbers will fix
the currently broken test.
> Since your original motivation was to print characters in pretty-printed
> nested Lisp expressions, perhaps we should just define
> print-integers-as-characters as a Boolean and skip the radix for the time
> being? We could add a print radix control later on if desired. (That would
> save us the hassle to deal with bignums, for that matter.)
This was my intention - to start with something simple that does only
what was needed (to print integers as characters), then extend it later
when such a need arises as printing hex numbers. I added hex numbers only
as a proof that the variable integer-output-format is extensible enough
to support more formats in the future.
But as you point out, this is achievable by adding another variable like
print-integer-radix.
PS: I noticed an inconsistency in these names: "integer" in print-integer-radix
is singular, but "integers" in print-integers-as-characters is plural.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Mon, 02 Nov 2020 23:04:01 GMT)
Message #92 received at 44155 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
2 nov. 2020 kl. 22.36 skrev Juri Linkov <juri <at> linkov.net>:
>
>> Thus we would have 10 -> ?\n, 13 -> ?\r, 127 -> ?\d, 65 -> ?A,
>> 255 -> ?ÿ, but 31 -> 31, 129 -> 129, 4194303 -> 4194303.
>
> Hopefully, printing some characters as numbers will fix
> the currently broken test.
It does! Here is a proposed patch. We could add a separate radix control later if you like.
One detail that I'm undecided about is whether to remove the more obscure control escapes \f, \a, \v, \e and \d, on the grounds that they are less likely to be used as actual characters and that users may prefer to see them as numbers instead. C, and most languages inheriting them from C, lack \e or \d; \f and \a are rare today, and \v is an anachronism.
> PS: I notices inconsistency in these names: "integer" in print-integer-radix
> is singular, but "integers" in print-integers-as-characters is plural.
Actually, 'integer' in 'integer radix' plays the part of adjective!
[0001-Reduce-integer-output-format-to-print-integers-as-ch.patch (application/octet-stream, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Tue, 03 Nov 2020 08:32:01 GMT)
Message #95 received at 44155 <at> debbugs.gnu.org (full text, mbox):
>> Hopefully, printing some characters as numbers will fix
>> the currently broken test.
>
> It does! Here is a proposed patch. We could add a separate radix control later if you like.
Thanks, I like your patch, hope that Eli will like it too.
> One detail that I'm undecided about is whether to remove the more obscure
> control escapes \f, \a, \v, \e and \d, on the grounds that they are less
> likely to be used as actual characters and that users may prefer to see
> them as numbers instead. C, and most languages inheriting them from C, lack
> \e or \d; \f and \a are rare today, and \v is an anachronism.
I don't think that \f is rare, it's used as a page separator
in many Emacs Lisp files. But it would be surprising to me to see
127 printed as ?\d, maybe because C lacks it.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Tue, 03 Nov 2020 15:25:01 GMT)
Message #98 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Tue, 3 Nov 2020 00:03:31 +0100
> Cc: Eli Zaretskii <eliz <at> gnu.org>, Andreas Schwab <schwab <at> suse.de>,
> 44155 <at> debbugs.gnu.org
>
> +@defvar print-integers-as-characters
> +When this variable is non-@code{nil}, integers that represent
> +printable characters or control characters with their own escape
> +syntax such as newline will be printed using Lisp character syntax
What is meant by "printable characters" here? One could think you
mean [:print:], but that doesn't seem to be what the code does.
> + DEFVAR_BOOL ("print-integers-as-characters", print_integers_as_characters,
> + doc: /* Non-nil means integers are printed using characters syntax.
> +Only non-control characters, and control characters with named escape
> +sequences such as newline, are printed this way. Other integers,
> +including those corresponding to raw bytes, are not affected. */);
And here, what does "non-control characters" mean?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Tue, 03 Nov 2020 18:48:02 GMT)
Message #101 received at 44155 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
3 nov. 2020 kl. 16.24 skrev Eli Zaretskii <eliz <at> gnu.org>:
> What is meant by "printable characters" here? One could think you
> mean [:print:], but that doesn't seem to be what then code does.
Non-control characters (characters other than control characters), in this case. I wanted to keep things simple and not involve the Unicode database in the printer.
(For that matter, [:print:] is a regexp feature and doesn't really define the meaning of 'printable', but your question was valid.)
On the other hand, printing all non-controls using the ?X syntax is maybe not ideal. Attached is a new patch that uses Unicode properties to select only printable base characters.
This patch also removes \a, \v, \e and \d from the characters printed as escaped controls.
[0001-Reduce-integer-output-format-to-print-integers-as-ch.patch (application/octet-stream, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Tue, 03 Nov 2020 19:37:02 GMT)
Message #104 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Tue, 3 Nov 2020 19:47:17 +0100
> Cc: juri <at> linkov.net, schwab <at> suse.de, 44155 <at> debbugs.gnu.org
>
> > What is meant by "printable characters" here? One could think you
> > mean [:print:], but that doesn't seem to be what then code does.
>
> Non-control characters (characters other than control characters), in this case. I wanted to keep things simple and not involve the Unicode database in the printer.
>
> (For that matter, [:print:] is a regexp feature and doesn't really define the meaning of 'printable', but your question was valid.)
>
> On the other hand, printing all non-controls using the ?X syntax is maybe not ideal. Attached is a new patch that uses Unicode properties to select only printable base characters.
Thanks, but my main question is still not answered. I asked it from
the POV of documentation: we should provide a more specific
description of which characters will be printed as characters, so that
users are not surprised. The text in NEWS still says "printable
characters" without defining that term, and so does the doc string of
print-integers-as-characters.
And now there's another question, which is what caused you to filter
characters like you did? E.g., what's wrong with combining classes?
why not simply use graphicp?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Wed, 04 Nov 2020 11:04:02 GMT)
Message #107 received at 44155 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
3 nov. 2020 kl. 20.36 skrev Eli Zaretskii <eliz <at> gnu.org>:
> Thanks, but my main question is still not answered. I asked it from
> the POV of documentation: we should provide a more specific
> description of which characters will be printed as characters, so that
> users are not surprised. The text in NEWS still says "printable
> characters" without defining that term, and so does the doc string of
> print-integers-as-characters.
'Printable' was used informally, not in an exact technical meaning. Intuitively, it should be the set of characters that make sense to print using the '?X' syntax. I initially thought that 'graphic' was too technical but it is more precise. 'Independently printable graphic character' is descriptive but a mouthful; perhaps 'independent graphic char' would do?
> And now there's another question, which is what caused you to filter
> characters like you did? E.g., what's wrong with combining classes?
> why not simply use graphicp?
For the ?X syntax to make sense, X must be visible; thus controls are out, and so are formatting chars (language tags etc). Spaces should probably have been excluded as well since it's typically not possible to see what kind of space follows the '?' (SPC is explicitly rendered as ?\s).
Furthermore, X must be independent since it isn't a grapheme cluster but a single code point. Therefore combining chars cannot be included as they would attach to the '?'.
'graphicp' cannot be used because it includes combining, enclosing and nonspacing marks (M) and formats (Cf); otherwise it's fine.
While we could put the exact list of excluded general categories in the documentation, it is not very important because the selection only matters for usability and aesthetics, not (realistically) for code behaviour.
The attached patch excludes spaces (Zs) and revises the terminology.
[0001-Reduce-integer-output-format-to-print-integers-as-ch.patch (application/octet-stream, attachment)]
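The selection rule described in this message can be sketched as a predicate over Unicode general categories. This is an illustration only, since the attached patch is not reproduced in this log. The function name is invented, and looking up the category of a code point is left to whatever Unicode database is available. Marks (M), controls (Cc), formats (Cf) and spaces (Zs) are excluded as stated above. Excluding Zl, Zp, Cs and Cn as well is an assumption about what a graphic-character test already rejects.

```c
#include <stdbool.h>
#include <string.h>

/* Decide, from a two-letter Unicode general category such as "Lu" or
   "Mn", whether a character would be printed as ?X under the criteria
   sketched in this message.  */
bool
is_independent_graphic (const char *general_category)
{
  if (general_category == NULL || strlen (general_category) != 2)
    return false;

  /* Marks (Mn, Mc, Me) would attach to the preceding '?'.  */
  if (general_category[0] == 'M')
    return false;

  /* Named in the thread: controls, formats, spaces.
     Assumed in addition: line and paragraph separators, surrogates,
     unassigned code points.  */
  static const char *const excluded[]
    = { "Cc", "Cf", "Zs", "Zl", "Zp", "Cs", "Cn" };
  for (size_t i = 0; i < sizeof excluded / sizeof excluded[0]; i++)
    if (strcmp (general_category, excluded[i]) == 0)
      return false;

  return true;
}
```

As the message notes, the exact list matters for readability of the output rather than for program behaviour, so a sketch like this can afford to be conservative.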
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Wed, 04 Nov 2020 15:39:02 GMT)
Message #110 received at 44155 <at> debbugs.gnu.org (full text, mbox):
> From: Mattias Engdegård <mattiase <at> acm.org>
> Date: Wed, 4 Nov 2020 12:03:32 +0100
> Cc: juri <at> linkov.net, schwab <at> suse.de, 44155 <at> debbugs.gnu.org
>
> 'Printable' was used informally, not in an exact technical meaning. Intuitively, it should be the set of characters that make sense to print using the '?X' syntax. I initially thought that 'graphic' was too technical but it is more precise. 'Independently printable graphic character' is descriptive but a mouthful; perhaps 'independent graphic char' would do?
I'm not sure. I think we should use something more familiar, or
explain it in more detail. We already mention Unicode properties
elsewhere in the manual, so we could define this in those terms, and
send the reader there for the details, for example.
> For the ?X syntax to make sense, X must be visible; thus controls are out, and so are formatting chars (language tags etc). Spaces should probably have been excluded as well since it's typically not possible to see what kind of space follows the '?' (SPC is explicitly rendered as ?\s).
>
> Furthermore, X must be independent since it isn't a grapheme cluster but a single code point. Therefore combining chars cannot be included as they would attach to the '?'.
>
> 'graphicp' cannot be used because it includes combining, enclosing and nonspacing marks (M) and formats (Cf); otherwise it's fine.
>
> While we could put the exact list of excluded general categories in the documentation, it is not very important because the selection only matters for usability and aesthetics, not (realistically) for code behaviour.
>
> The attached patch excludes spaces (Zs) and revises the terminology.
I'm not going to argue about this aspect, but just FTR: whether to
include combining characters is a decision that we make here, it is
not a necessity. Because we are perfectly capable of displaying
combining characters without risking them to become composed with
surrounding characters: we could either precede them with U+25CC
DOTTED CIRCLE, or use the technique describe-char-padded-string in
descr-text.el uses.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#44155; Package emacs. (Wed, 04 Nov 2020 16:47:01 GMT)
Message #113 received at 44155 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
4 nov. 2020 kl. 16.38 skrev Eli Zaretskii <eliz <at> gnu.org>:
> I'm not sure. I think we should use something more familiar, or
> explain it in more detail. We already mention Unicode properties
> elsewhere in the manual, so we could define this in those terms, and
> send the reader there for the details, for example.
Thanks for the review. Please look at the revised patch below with your requested changes.
> I'm not going to argue about this aspect, but just FTR: whether to
> include combining characters is a decision that we make here, not a
> necessity. We are perfectly capable of displaying combining
> characters without risking their composition with surrounding
> characters: we could either precede them with U+25CC DOTTED CIRCLE,
> or use the technique that describe-char-padded-string in
> descr-text.el uses.
No, we cannot, because the output must be valid Lisp.
[0001-Reduce-integer-output-format-to-print-integers-as-ch.patch (application/octet-stream, attachment)]
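The constraint behind that reply is that printed output has to read back as the same integer. A minimal sketch of such a round-trip check follows; the helper name `my-char-syntax-round-trips-p' is hypothetical and is not part of the patch.

```elisp
;; Hypothetical helper for illustration only; not taken from the patch.
(defun my-char-syntax-round-trips-p (c)
  "Return non-nil if reading the two-character text \"?C\" yields C."
  (condition-case nil
      (let ((result (read-from-string (concat "?" (string c)))))
        (and (eql (car result) c)   ; the reader returned the same integer
             (= (cdr result) 2)))   ; and consumed the whole text
    (error nil)))                   ; a read error counts as failure

(my-char-syntax-round-trips-p ?A)
```

If a padding character such as U+25CC were placed between the `?' and a combining mark, the character following the `?' would be U+25CC rather than the mark, so the text would no longer denote the original integer.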
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#44155
; Package
emacs
.
(Wed, 04 Nov 2020 16:59:01 GMT)
Message #116 received at 44155 <at> debbugs.gnu.org (full text, mbox):
The last patch was incorrect; here is the right one. Apologies for the confusion.
[0001-Reduce-integer-output-format-to-print-integers-as-ch.patch (application/octet-stream, attachment)]
Reply sent
to
Mattias Engdegård <mattiase <at> acm.org>
:
You have taken responsibility.
(Fri, 06 Nov 2020 13:04:02 GMT)
Notification sent
to
Juri Linkov <juri <at> linkov.net>
:
bug acknowledged by developer.
(Fri, 06 Nov 2020 13:04:02 GMT)
Message #121 received at 44155-done <at> debbugs.gnu.org (full text, mbox):
On 4 Nov 2020 at 17:58, Mattias Engdegård <mattiase <at> acm.org> wrote:
> The last patch was incorrect; here is the right one. Apologies for the confusion.
Pushed to master, since there wasn't much left to discuss. As usual, it can be modified or reverted as needed.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Sat, 05 Dec 2020 12:24:05 GMT)
This bug report was last modified 4 years and 258 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.