GNU bug report logs - #78528
[PATCH v1] calc: Allow strings with higher character codes

Package: emacs;

Reported by: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>

Date: Wed, 21 May 2025 07:04:03 UTC

Severity: normal

Tags: patch

Done: Eli Zaretskii <eliz <at> gnu.org>

To reply to this bug, email your comments to 78528 AT debbugs.gnu.org.
There is no need to reopen the bug first.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#78528; Package emacs. (Wed, 21 May 2025 07:04:03 GMT)

Acknowledgement sent to "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 21 May 2025 07:04:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH v1] calc: Allow strings with higher character codes
Date: Tue, 20 May 2025 21:34:02 -0400
Tags: patch

Hello all,

Please find below a feature proposal for strings in `calc', and a first
draft of a patch attached to this message.

Motivation
==========

Suppose you're working with Unicode code points in `calc', and you end
up with the following vector on the stack. You'd like to know what a
string composed of these character codes would look like, so you toggle
`calc-display-strings' (`d "') and … nothing happens.

,----
| 1:  [383, 117, 99, 99, 101, 383, 115]
`----

Later, in an `org-mode' file you have the following table with a list of
dates in the first column. Since [formulas] can be any algebraic
expression understood by `calc', and `calc' [understands dates], you try
to insert a Unicode character for rows where the first column is in the
past. When you evaluate the formula (`C-c C-c' on the `#+TBLFM:' line)
`calc' stops short of displaying the string.

,----
| | Date             | Past?           |
| |------------------+-----------------|
| | [2025-05-01 Thu] | string([10003]) |
| | [2026-05-01 Fri] |                 |
| #+TBLFM: $2 = if($1 < now(), string("✓"), string(""))
`----

Both of these problems are due to the fact that some or all of the
character codes are outside the `Latin-1' (8-bit) range. If we replace
this hard-coded limitation with a custom variable and increase its
value, both of these use-cases can be supported.

,----
| 1:  "ſucceſs"
`----

,----
| | Date             | Past? |
| |------------------+-------|
| | [2025-05-01 Thu] | ✓     |
| | [2026-05-01 Fri] |       |
| #+TBLFM: $2 = if($1 < now(), string("✓"), string(""))
`----

The alternative is that the user has to exit `calc' (or its syntax) and
dip into `Lisp':

,----
| (concat '(383 117 99 99 101 383 115))
`----

,----
| | Date             | Past? |
| |------------------+-------|
| | [2025-05-01 Thu] | ✓     |
| | [2026-05-01 Fri] |       |
| #+TBLFM: $2 = '(if (time-less-p (org-read-date t t $1) (current-time)) "✓" "")
`----

[formulas] <https://orgmode.org/manual/Formula-syntax-for-Calc.html>

[understands dates]
<https://www.gnu.org/software/emacs/manual/html_node/calc/Date-Forms.html>

Proposal & Impact
=================

The attached patch introduces a custom variable
`calc-string-maximum-character' (optimistically versioned for `31.1'),
which replaces a hard-coded maximum in the function
`math-vector-is-string'. This variable defaults to `0xFF' in order to
preserve the current behaviour, but otherwise can be any character up to
`(max-char)'. Since the vector contents are passed to
`math-vector-to-string', the Unicode-aware `concat' has no problem with
the higher characters:

,----
| (defun math-vector-to-string (a &optional quoted)
|   (setq a (concat (mapcar (lambda (x) (if (consp x) (nth 1 x) x))
|                           (cdr a))))
|   […])
`----
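
As a rough illustration of the idea (not the code from the attached patch), the check and the conversion could look like the sketch below. The names `my-calc-max-char', `my-vector-is-string-p', and `my-vector-to-string' are hypothetical stand-ins, and the sketch assumes only that a `calc' vector is a list of the form `(vec N1 N2 ...)', which is what the `(cdr a)' above relies on.

```elisp
;; Hypothetical sketch, NOT the attached patch.
;; `my-calc-max-char' stands in for `calc-string-maximum-character'.
(require 'seq)

(defvar my-calc-max-char #xFF
  "Largest character code accepted in a string (assumed name).")

(defun my-vector-is-string-p (vec)
  "Return non-nil if VEC is a Calc vector of character codes.
VEC has the form (vec N1 N2 ...).  Every element must be a natural
number no larger than `my-calc-max-char'."
  (and (eq (car-safe vec) 'vec)
       (seq-every-p (lambda (x)
                      (and (natnump x) (<= x my-calc-max-char)))
                    (cdr vec))))

(defun my-vector-to-string (vec)
  "Convert the character codes in VEC to a string with `concat'."
  (concat (cdr vec)))

;; With the default limit the example vector is rejected, because 383
;; is above #xFF.  Raising the limit lets it through.
(let ((my-calc-max-char (max-char)))
  (when (my-vector-is-string-p '(vec 383 117 99 99 101 383 115))
    (my-vector-to-string '(vec 383 117 99 99 101 383 115))))
```

The sketch ignores the other element forms that the real predicate may accept; it is only meant to show where a configurable limit would replace the constant.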

Here are the outstanding issues I've identified for discussion:

1. Since users can blow past the variable type and set
   `calc-string-maximum-character' to /anything/, I'm not sure the
   patch's error handling is enough. If a hapless user sets it to
   something invalid like a string (`"invalid"', let's say), then with
   the current patch they'll encounter at least two kinds of errors:

   a) With the following vector on the stack, executing
      `calc-display-strings' (`d "') will display `Wrong type argument:
      number-or-marker-p, "invalid"' in the minibuffer, /and/ enter a
      string display mode where the vector isn't rendered as seen in the
      second block below.

      ,----
      | 1:  [0, 1, 2]
      `----

      ,----
      | 1:  .
      `----

      Only executing `calc-display-strings' (`d "') again will toggle
      the display mode and show the original vector. This is a bad
      experience for the user, and should be mitigated by raising an
      error in `calc-display-strings' before the display mode is
      toggled.

   b) If a user tries to enter a string algebraically with
      `calc-algebraic-entry' (`''), say `string("abc")', the same
      message from the first error will appear in the minibuffer, but
      the string is not added to the stack. This is slightly cryptic,
      but not as bad an experience as the first error.

2. With a higher value of `calc-string-maximum-character', the displayed
   string could contain right-to-left or a bidirectional mixture of
   characters that could conceivably interfere with the `calc' alignment
   functions `calc-left-justify' (`d <'), `calc-center-justify' (`d ='),
   and `calc-right-justify' (`d >'). Toggling the display of the
   following vectors reveals a misalignment of the fully Arabic string
   under center justification, and misalignment of the full- and
   mixed-Arabic strings under right justification. None of these contain
   any bidirectional Unicode control characters, so I'm not sure whether
   there are other problems lurking.

   ,----
   | 3:  [108, 101, 102, 116, 45, 116, 111, 45, 114, 105, 103, 104, 116]
   | 2:  [1605, 1606, 32, 1575, 1604, 1610, 1605, 1610, 1606, 32, 1573, 1604, 1609, 32, 1575, 1604, 1610, 1587, 1575, 1585]
   | 1:  [108, 101, 102, 116, 45, 1610, 1605, 1610, 1606]
   `----

   ,----
   | 3:  "left-to-right"
   | 2:  "من اليمين إلى اليسار"
   | 1:  "left-يمين"
   `----

   ,----
   | 3:                       "left-to-right"
   | 2:                   "من اليمين إلى اليسار"
   | 1:                         "left-يمين"
   `----

   ,----
   | 3:                                               "left-to-right"
   | 2:                                        "من اليمين إلى اليسار"
   | 1:                                                   "left-يمين"
   `----

   Also, combining diacritical marks appear as separate characters, but
   I'm not sure if this is the expected behaviour and/or related to my
   configuration.

   ,----
   | 1:  [117, 776]
   `----

   ,----
   | 1:  "ü"
   `----

3. I haven't found any internal references to `math-vector-is-string'
   that look like they could conflict with this change
   (`math-format-flat-expr-fancy', `math-compose-expr',
   `calc-kbd-query'). Existing references are mostly related to
   displaying strings from vectors, `string' or `bstring' objects, and
   composite objects involving vectors or strings, but I could use an
   extra set of eyes to confirm. Since `org-mode' uses `calc'
   expressions in tables, I might need to get their concurrence with the
   change. I'm unaware of any third-party dependencies on this function.

4. For unit tests, are there any naming conventions I should follow? I
   just stuck all of the tests in one place for `math-vector-is-string'.
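
For issue 1, one possible mitigation is to validate the value before any display mode is toggled. The sketch below is only an assumption about how that could be done: `my-calc-valid-max-char-p' and `my-calc-check-max-char' are hypothetical names, while `characterp', `max-char', and `user-error' are standard Emacs Lisp.

```elisp
;; Hypothetical sketch: validate a candidate maximum before using it.
(defun my-calc-valid-max-char-p (value)
  "Return non-nil if VALUE is a valid character code.
`characterp' accepts integers from 0 up to (max-char) only."
  (characterp value))

(defun my-calc-check-max-char (value)
  "Return VALUE if it is a valid character, else signal `user-error'."
  (unless (my-calc-valid-max-char-p value)
    (user-error "Invalid maximum character: %S (expected 0..%d)"
                value (max-char)))
  value)

;; A command such as `calc-display-strings' could call the check first,
;; so that a bad value is reported before the display mode changes.
(my-calc-check-max-char #xFF)
```

Signalling a `user-error' up front would replace the `number-or-marker-p' message with one that names the offending value.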


Thanks for your consideration!

--
Jacob S. Gordon
jacob.as.gordon <at> gmail.com

=========================

Please avoid sending me HTML emails and MS Office documents.
https://useplaintext.email/#etiquette
https://www.gnu.org/philosophy/no-word-attachments.html

In GNU Emacs 30.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.24.49,
cairo version 1.18.4)
System Description: Arch Linux

Configured using:
 'configure --with-pgtk --sysconfdir=/etc --prefix=/usr
 --libexecdir=/usr/lib --localstatedir=/var --disable-build-details
 --with-cairo --with-harfbuzz --with-libsystemd --with-modules
 --with-native-compilation=aot --with-tree-sitter 'CFLAGS=-march=x86-64
 -mtune=generic -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=3
 -Wformat -Werror=format-security -fstack-clash-protection
 -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -g
 -ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto'
 'LDFLAGS=-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro
 -Wl,-z,now -Wl,-z,pack-relative-relocs -flto=auto'
 'CXXFLAGS=-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions
 -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security
 -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer
 -mno-omit-leaf-frame-pointer -Wp,-D_GLIBCXX_ASSERTIONS -g
 -ffile-prefix-map=/build/emacs/src=/usr/src/debug/emacs -flto=auto''

[v1-0001-calc-Allow-strings-with-higher-character-codes.patch (text/patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78528; Package emacs. (Fri, 30 May 2025 18:12:02 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: [PATCH v1] calc: Allow strings with higher character codes
Date: Fri, 30 May 2025 09:05:01 -0400

Hello again,

On 2025-05-20 21:34, Jacob S. Gordon wrote:
> Please find below a feature proposal for strings in `calc', and a first
> draft of a patch attached to this message.

Any thoughts on this patch?

-- 
Jacob S. Gordon
jacob.as.gordon <at> gmail.com





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78528; Package emacs. (Sat, 31 May 2025 06:28:04 GMT)

Message #11 received at 78528 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
Cc: 78528 <at> debbugs.gnu.org
Subject: Re: bug#78528: [PATCH v1] calc: Allow strings with higher character codes
Date: Sat, 31 May 2025 09:27:11 +0300

> Date: Fri, 30 May 2025 09:05:01 -0400
> From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
> 
> Hello again,
> 
> On 2025-05-20 21:34, Jacob S. Gordon wrote:
> > Please find below a feature proposal for strings in `calc', and a first
> > draft of a patch attached to this message.
> 
> Any thoughts on this patch?

It's in my (too long, admittedly) queue.

Can you think of any possible downsides to installing the patch?

In any case, to accept such a large contribution we'd need you to sign
the copyright assignment agreement (which you currently don't have,
AFAICT).  If you are willing to do that, I will send you the form to
fill and the instructions to go with it.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78528; Package emacs. (Tue, 03 Jun 2025 04:01:06 GMT)

Message #14 received at 78528 <at> debbugs.gnu.org:

From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78528 <at> debbugs.gnu.org
Subject: Re: bug#78528: [PATCH v1] calc: Allow strings with higher character codes
Date: Mon, 2 Jun 2025 15:28:16 -0400

Hello,

On 2025-05-31 09:27 Eli Zaretskii wrote:
> It's in my (too long, admittedly) queue.

No problem, thanks for the confirmation.

> Can you think of any possible downsides to installing the patch?

Nothing that I think can’t be ironed out.

+ The custom variable defaults to the previously hard‐coded value, so
unless users change it, `calc' will act the same as before.

+ This variable only affects the display of vectors‐of‐chars, and
touches none of the underlying types (e.g., algebraic variables are
still restricted to a basic Latin-Greek range through an independent
parsing step).

+ Allowing a higher maximum means that users can encounter characters
without a fixed width, or contextual forms that change the rendered
string length. Alignment/justification, and some elements of
“compositions” assume fixed-width characters for their calculations,
so their results can be off. Here are some representative examples
from all the affected compositions (the extent is font‐dependent):

  + `choriz' (horizontal composition) optionally takes a `SEP' vector:

  #+begin_src calc
  choriz([a b/c],"✕")
  #+end_src

  #+begin_src text
  1:  a✕b / c
  #+end_src

  + Only the `crule' component of vertical compositions is affected,
  which optionally takes a character to form the horizontal rule. For
  example, comparing the em dash, hyphen-minus, and hyphen,
  respectively, the hyphen rule doesn’t span the full width:

  #+begin_src calc
  cvert([a + 1, cbase(crule("—")), b^2])
  cvert([a + 1, cbase(crule("-")), b^2])
  cvert([a + 1, cbase(crule("‐")), b^2])
  #+end_src

  #+begin_src text
  3:  a + 1
      —————
       b^2
  2:  a + 1
      -----
       b^2
  1:  a + 1
      ‐‐‐‐‐
       b^2
  #+end_src

  + `cspace', `cvspace', `ctspace', `cbspace' all take strings as an
  optional second argument to repeat some number of times, and will
  behave similarly to `string' with respect to alignment.

  + `cwidth' counts characters, and will be different from the actual
  length with variable-width characters or contextual forms. I’m less
  familiar with vertically‐oriented scripts, but I imagine `cheight'
  can suffer similarly with something like `cvspace'.

  + Any user‐defined compositions involving strings may be affected if
  they make the same assumptions about string width, increase the
  custom variable, and include offending characters.

+ With the `calc-big-language' display mode (`d B'), but none of the
other modes, pure RTL strings are aligned opposite to the LTR strings.
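
To make the width problem concrete, here is a small sketch contrasting the character count with the column count that Emacs reports. The helper names are hypothetical, and the sketch assumes that padding is computed from the number of characters, as the `cwidth' item above describes. `string-width' measures display columns rather than pixels, so it would not account for font-dependent shaping or for bidirectional reordering.

```elisp
;; Hypothetical sketch: character count versus display columns.
(defun my-string-chars-and-columns (s)
  "Return a cons (CHARS . COLUMNS) for string S.
CHARS is the number of characters; COLUMNS is what `string-width'
reports, which can differ for combining or wide characters."
  (cons (length s) (string-width s)))

(defun my-right-justify (s width)
  "Pad S on the left with spaces so that it occupies WIDTH columns.
The padding is based on `string-width' instead of `length'."
  (concat (make-string (max 0 (- width (string-width s))) ?\s) s))

;; For plain ASCII the two measures agree, so padding is unchanged.
(my-string-chars-and-columns "abc")
(my-right-justify "abc" 5)

;; A base letter followed by a combining diaeresis (codes 117, 776)
;; is two characters long; its column count may be smaller.
(my-string-chars-and-columns (concat '(117 776)))
```

Whether a column-based measure is enough for `calc' alignment is an open question; proportional fonts would still need pixel-level measurement.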

> In any case, to accept such a large contribution we'd need you to
> sign the copyright assignment agreement (which you currently don't
> have, AFAICT). If you are willing to do that, I will send you the
> form to fill and the instructions to go with it.

That’s right, I haven’t signed the copyright assignment agreement yet,
but I’m willing.

Thanks,

-- 
Jacob S. Gordon
jacob.as.gordon <at> gmail.com





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78528; Package emacs. (Tue, 03 Jun 2025 11:43:04 GMT)

Message #17 received at 78528 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
Cc: 78528 <at> debbugs.gnu.org
Subject: Re: bug#78528: [PATCH v1] calc: Allow strings with higher character codes
Date: Tue, 03 Jun 2025 14:42:01 +0300

> Date: Mon, 2 Jun 2025 15:28:16 -0400
> Cc: 78528 <at> debbugs.gnu.org
> From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
> 
> > In any case, to accept such a large contribution we'd need you to
> > sign the copyright assignment agreement (which you currently don't
> > have, AFAICT). If you are willing to do that, I will send you the
> > form to fill and the instructions to go with it.
> 
> That’s right, I haven’t signed the copyright assignment agreement yet,
> but I’m willing.

Thanks, form sent off-list.




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sat, 14 Jun 2025 14:14:12 GMT)

Notification sent to "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>:
bug acknowledged by developer. (Sat, 14 Jun 2025 14:14:13 GMT)

Message #22 received at 78528-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
Cc: 78528-done <at> debbugs.gnu.org
Subject: Re: bug#78528: [PATCH v1] calc: Allow strings with higher character codes
Date: Sat, 14 Jun 2025 17:12:34 +0300

> Date: Mon, 2 Jun 2025 15:28:16 -0400
> Cc: 78528 <at> debbugs.gnu.org
> From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
> 
> > In any case, to accept such a large contribution we'd need you to
> > sign the copyright assignment agreement (which you currently don't
> > have, AFAICT). If you are willing to do that, I will send you the
> > form to fill and the instructions to go with it.
> 
> That’s right, I haven’t signed the copyright assignment agreement yet,
> but I’m willing.

Thanks, since the copyright assignment paperwork is now done, I've
installed this on the master branch, and I'm closing the bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78528; Package emacs. (Sat, 14 Jun 2025 14:55:02 GMT)

Message #25 received at 78528-done <at> debbugs.gnu.org:

From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 78528-done <at> debbugs.gnu.org
Subject: Re: bug#78528: [PATCH v1] calc: Allow strings with higher character codes
Date: Sat, 14 Jun 2025 10:54:18 -0400

On 2025-06-14 10:12, Eli Zaretskii wrote:
> I've installed this on the master branch, and I'm closing the bug.

Oh, I wasn’t expecting that due to a few rough edges I raised (e.g.
alignment, type-checking on the variable). But since it’s opt-in and
only appears in a few places, there’s no issue.

I’ll continue looking at this and send a patch in a new thread when
ready?

Thanks,

-- 
Jacob S. Gordon
jacob.as.gordon <at> gmail.com





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78528; Package emacs. (Sat, 14 Jun 2025 15:19:02 GMT)

Message #28 received at 78528 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
Cc: 78528 <at> debbugs.gnu.org
Subject: Re: bug#78528: [PATCH v1] calc: Allow strings with higher character codes
Date: Sat, 14 Jun 2025 18:18:07 +0300

> Date: Sat, 14 Jun 2025 10:54:18 -0400
> From: "Jacob S. Gordon" <jacob.as.gordon <at> gmail.com>
> Cc: 78528-done <at> debbugs.gnu.org
> 
> On 2025-06-14 10:12, Eli Zaretskii wrote:
> > I've installed this on the master branch, and I'm closing the bug.
> 
> Oh, I wasn’t expecting that due to a few rough edges I raised (e.g.
> alignment, type-checking on the variable). But since it’s opt-in and
> only appears in a few places, there’s no issue.
> 
> I’ll continue looking at this and send a patch in a new thread when
> ready?

Sure, please do.




This bug report was last modified 2 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.