GNU bug report logs - #55815
[PATCH] bindat: Improve str, strz documentation

Package: emacs;

Reported by: Richard Hansen <rhansen <at> rhansen.org>

Date: Mon, 6 Jun 2022 02:23:02 UTC

Severity: normal

Tags: patch

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 55815 in the body.
You can then email your comments to 55815 AT debbugs.gnu.org in the normal way.


Report forwarded to monnier <at> iro.umontreal.ca, bug-gnu-emacs <at> gnu.org:
bug#55815; Package emacs. (Mon, 06 Jun 2022 02:23:02 GMT)

Acknowledgement sent to Richard Hansen <rhansen <at> rhansen.org>:
New bug report received and forwarded. Copy sent to monnier <at> iro.umontreal.ca, bug-gnu-emacs <at> gnu.org. (Mon, 06 Jun 2022 02:23:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Richard Hansen <rhansen <at> rhansen.org>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH] bindat: Improve str, strz documentation
Date: Sun, 5 Jun 2022 22:22:01 -0400
X-Debbugs-CC: monnier <at> iro.umontreal.ca

* doc/lispref/processes.texi (Bindat Types): Expand the documentation
for the `str' and `strz' types to clarify expectations and explain
edge case behavior.
---
 doc/lispref/processes.texi | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/doc/lispref/processes.texi b/doc/lispref/processes.texi
index 668a577870..68621d32a8 100644
--- a/doc/lispref/processes.texi
+++ b/doc/lispref/processes.texi
@@ -3479,11 +3479,31 @@ Bindat Types
 @var{bitlen} has to be a multiple of 8.
 
 @item str @var{len}
-String of bytes of length @var{len}.
+String of length @var{len}.  When packing, the first @var{len} bytes
+of the input string are copied to the packed output.  If the input
+string is shorter than @var{len}, the remaining bytes are set to zero.
+The input string must be unibyte (@pxref{Text Representations}).  When
+unpacking, any zero bytes in the packed input string will appear in
+the unpacked output.
 
 @item strz &optional @var{len}
-Zero-terminated string of bytes, can be of arbitrary length or in a fixed-size
-field with length @var{len}.
+If @var{len} is not provided: Variable-length null-terminated string.
+When packing, the entire input string is copied to the packed output
+followed by a zero byte (null terminator).  The input string must be
+unibyte (@pxref{Text Representations}) and must not contain any zero
+bytes.  When unpacking, the resulting string contains all bytes up to
+(but excluding) the null terminator.
+
+If @var{len} is provided: @code{strz} behaves the same as @code{str}
+with one difference. When unpacking, the first zero byte (null
+terminator) encountered in the packed string and all subsequent bytes
+are excluded from the unpacked result.
+
+@quotation Caution
+The packed output will not be null-terminated unless the input string
+is shorter than @var{len} or it contains a zero byte within the first
+@var{len} bytes.
+@end quotation
 
 @item vec @var{len} [@var{type}]
 Vector of @var{len} elements.  The type of the elements is given by
-- 
2.36.1
[0001-bindat-Improve-str-strz-documentation.patch (text/x-patch, attachment)]
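
For readers following the thread, the packing and unpacking rules described in this first revision of the patch can be modeled outside Emacs. The sketch below is Python, not bindat's implementation; all function names are hypothetical, and it assumes the packed output is a fresh, zero-filled buffer. Raising an error for a missing terminator in the variable-length case is a choice of this model, since the patch text does not say what happens there.

```python
from typing import Optional


def pack_str(data: bytes, length: int) -> bytes:
    """Model of `str LEN`: copy the first LENGTH bytes, zero-fill the rest."""
    return data[:length].ljust(length, b"\x00")


def unpack_str(packed: bytes, length: int) -> bytes:
    """Model of `str LEN`: zero bytes in the field are kept in the result."""
    return packed[:length]


def pack_strz(data: bytes, length: Optional[int] = None) -> bytes:
    """Model of `strz [LEN]` as described in the patch."""
    if length is None:
        # Variable-length form: the input must not contain zero bytes.
        if b"\x00" in data:
            raise ValueError("input must not contain zero bytes")
        return data + b"\x00"
    # Fixed-length form: packs exactly like `str LEN`.
    return pack_str(data, length)


def unpack_strz(packed: bytes, length: Optional[int] = None) -> bytes:
    """Stop at the first zero byte; drop it and everything after it."""
    field = packed if length is None else packed[:length]
    end = field.find(b"\x00")
    if end == -1:
        if length is None:
            # Assumption of this model only: reject unterminated input.
            raise ValueError("no terminator found")
        return field
    return field[:end]


print(pack_str(b"ab", 4))        # b'ab\x00\x00'
print(pack_strz(b"ab"))          # b'ab\x00'
print(pack_strz(b"abcd", 4))     # b'abcd' (no terminator: the Caution case)
print(unpack_strz(b"ab\x00\x00", 4))  # b'ab'
```

The third call illustrates the caution in the patch: when the input fills the whole fixed-length field, nothing in the packed output marks the end of the string.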

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55815; Package emacs. (Mon, 06 Jun 2022 11:01:02 GMT)

Message #8 received at 55815 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Richard Hansen <rhansen <at> rhansen.org>
Cc: 55815 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#55815: [PATCH] bindat: Improve str, strz documentation
Date: Mon, 06 Jun 2022 13:59:44 +0300
> Cc: monnier <at> iro.umontreal.ca
> Date: Sun, 5 Jun 2022 22:22:01 -0400
> From: Richard Hansen <rhansen <at> rhansen.org>
> 
>   @item str @var{len}
> -String of bytes of length @var{len}.
> +String of length @var{len}.

I think it is better to say

  Unibyte string that is @var{len} bytes long.

>   @item strz &optional @var{len}
> -Zero-terminated string of bytes, can be of arbitrary length or in a fixed-size
> -field with length @var{len}.
> +If @var{len} is not provided: Variable-length null-terminated string.

Same here: it is better to mention the unibyte-ness up front, since
it's important.

> +If @var{len} is provided: @code{strz} behaves the same as @code{str}
> +with one difference. When unpacking, the first zero byte (null
                      ^^
Our conventions are to leave two spaces between sentences.

Also, for consistency, I suggest to use "null byte" everywhere, to
avoid potential confusion of non-native English speakers.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55815; Package emacs. (Mon, 06 Jun 2022 23:32:02 GMT)

Message #11 received at 55815 <at> debbugs.gnu.org:

From: Richard Hansen <rhansen <at> rhansen.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 55815 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#55815: [PATCH] bindat: Improve str, strz documentation
Date: Mon, 6 Jun 2022 19:31:35 -0400
Thanks for the review.  A new revision is attached.

On 6/6/22 06:59, Eli Zaretskii wrote:
> I think it is better to say
> 
>    Unibyte string that is @var{len} bytes long.

Done.  I may have gone overboard though -- I did so because there are three representations that matter:

  1. The input string to be packed.
  2. The packed output.
  3. The result of unpacking.

Right now all three of those are unibyte, but in a future patch I plan on changing the first to accept unibyte-convertible multibyte input strings.

> Our conventions are to leave two spaces between sentences.

Done.

> Also, for consistency, I suggest to use "null byte" everywhere, to 
> avoid potential confusion of non-native English speakers.

Done.

I also fixed a flaw in the previous revision: packing to a fixed-length field doesn't actually write a null byte if the input is shorter than the field. This only matters if the caller provided a pre-allocated string that doesn't have null bytes.

Thanks,
Richard
[v2-0001-bindat-Improve-str-strz-documentation.patch (text/x-patch, attachment)]
[OpenPGP_signature (application/pgp-signature, attachment)]
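
The flaw described in this message (a short input leaves the tail of a fixed-length field untouched, which only shows when the caller supplies a pre-allocated string) can be illustrated with a small model. This is a Python sketch under that reading of the message, not bindat's code; `pack_fixed` and `unpack_strz_fixed` are hypothetical names, and a `bytearray` stands in for the pre-allocated string passed to `bindat-pack`.

```python
from typing import Optional


def pack_fixed(data: bytes, length: int,
               prealloc: Optional[bytearray] = None) -> bytes:
    """Model of packing into a fixed-length `str`/`strz` field.

    Only the bytes of DATA are written; the rest of the field keeps
    whatever the buffer already contained.
    """
    buf = bytearray(length) if prealloc is None else prealloc
    chunk = data[:length]
    buf[:len(chunk)] = chunk  # same-size slice assignment, no resizing
    return bytes(buf[:length])


def unpack_strz_fixed(packed: bytes, length: int) -> bytes:
    """Model of unpacking `strz LEN`: stop at the first zero byte, if any."""
    field = packed[:length]
    end = field.find(b"\x00")
    return field if end == -1 else field[:end]


fresh = pack_fixed(b"ab", 4)                        # zero-filled buffer
reused = pack_fixed(b"ab", 4, bytearray(b"xxxx"))   # caller-supplied buffer

print(fresh)                         # b'ab\x00\x00'
print(reused)                        # b'abxx'
print(unpack_strz_fixed(fresh, 4))   # b'ab'
print(unpack_strz_fixed(reused, 4))  # b'abxx'
```

With a fresh buffer the short input round-trips; with the reused buffer the stale bytes become part of the unpacked string in this model, which is why the revised documentation calls the case out.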

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55815; Package emacs. (Tue, 07 Jun 2022 16:31:02 GMT)

Message #14 received at 55815 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Richard Hansen <rhansen <at> rhansen.org>
Cc: 55815 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#55815: [PATCH] bindat: Improve str, strz documentation
Date: Tue, 07 Jun 2022 19:30:25 +0300
> Date: Mon, 6 Jun 2022 19:31:35 -0400
> Cc: 55815 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Richard Hansen <rhansen <at> rhansen.org>
> 
> > I think it is better to say
> > 
> >    Unibyte string that is @var{len} bytes long.
> 
> Done.  I may have gone overboard though -- I did so because there are three representations that matter:
> 
>    1. The input string to be packed.
>    2. The packed output.
>    3. The result of unpacking.
> 
> Right now all three of those are unibyte, but in a future patch I plan on changing the first to accept unibyte-convertible multibyte input strings.

Not sure I understand: what do you mean by "unibyte-convertible
multibyte input strings", and how do they differ from the other kinds?

In any case, you say "unibyte input string" too many times, and that's
unnecessary.  One example:

> +Unibyte string of length @var{len}.  When packing, the first @var{len}
> +bytes of the input string are copied to the packed output.  If the
> +input string is shorter than @var{len}, the remaining bytes will be
> +null (zero) unless a pre-allocated string was provided to
> +@code{bindat-pack}, in which case the remaining bytes are left
> +unmodified.  The input string must be unibyte (@pxref{Text

Why do we need to say the input must be unibyte when we already said
that up front?

(There's more of this redundancy in the patch.)

Stefan, any further comments?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55815; Package emacs. (Tue, 07 Jun 2022 18:18:01 GMT)

Message #17 received at 55815 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 55815 <at> debbugs.gnu.org, Richard Hansen <rhansen <at> rhansen.org>
Subject: Re: bug#55815: [PATCH] bindat: Improve str, strz documentation
Date: Tue, 07 Jun 2022 14:17:43 -0400
> Stefan, any further comments?

Nothing specific, no.  The patch sounds good (it's important to clarify
what kind of "zero-terminated strings" we're supporting).


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#55815; Package emacs. (Wed, 08 Jun 2022 04:18:02 GMT)

Message #20 received at 55815 <at> debbugs.gnu.org:

From: Richard Hansen <rhansen <at> rhansen.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 55815 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#55815: [PATCH] bindat: Improve str, strz documentation
Date: Wed, 8 Jun 2022 00:16:51 -0400
On 6/7/22 12:30, Eli Zaretskii wrote:
>> Right now all three of those are unibyte, but in a future patch I 
>> plan on changing the first to accept unibyte-convertible multibyte 
>> input strings.
> 
> Not sure I understand: what do you mean by "unibyte-convertible 
> multibyte input strings", and how do they differ from the other kinds?

I mean multibyte strings that do not contain characters that will cause string-to-unibyte to signal an error.

> In any case, you say "unibyte input string" too many times, and that's
> unnecessary.

Done, see attached.
[v3-0001-bindat-Improve-str-strz-documentation.patch (text/x-patch, attachment)]
[OpenPGP_signature (application/pgp-signature, attachment)]

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 09 Jun 2022 07:31:02 GMT)

Notification sent to Richard Hansen <rhansen <at> rhansen.org>:
bug acknowledged by developer. (Thu, 09 Jun 2022 07:31:02 GMT)

Message #25 received at 55815-done <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Richard Hansen <rhansen <at> rhansen.org>
Cc: 55815-done <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#55815: [PATCH] bindat: Improve str, strz documentation
Date: Thu, 09 Jun 2022 10:30:34 +0300
> Date: Wed, 8 Jun 2022 00:16:51 -0400
> Cc: 55815 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
> From: Richard Hansen <rhansen <at> rhansen.org>
> 
> On 6/7/22 12:30, Eli Zaretskii wrote:
> >> Right now all three of those are unibyte, but in a future patch I 
> >> plan on changing the first to accept unibyte-convertible multibyte 
> >> input strings.
> > 
> > Not sure I understand: what do you mean by "unibyte-convertible 
> > multibyte input strings", and how do they differ from the other kinds?
> 
> I mean multibyte strings that do not contain characters that will cause string-to-unibyte to signal an error.

IOW, multibyte strings that contain only ASCII characters and
characters of the 'eight-bit' charset.

> > In any case, you say "unibyte input string" too many times, and that's
> > unnecessary.
> 
> Done, see attached.

Thanks, installed.
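
The notion of a "unibyte-convertible" multibyte string clarified in this message (only ASCII characters and characters of the 'eight-bit' charset) can be sketched as a check over character codes. The Python below is an illustrative model, not Emacs code: the function names are hypothetical, a string is represented as a sequence of integer character codes, and it relies on Emacs representing raw 8-bit bytes in multibyte text as codes #x3FFF80 through #x3FFFFF.

```python
from typing import Iterable

# Character codes Emacs uses for raw bytes 0x80..0xFF in multibyte text.
RAW_BYTE_FIRST = 0x3FFF80
RAW_BYTE_LAST = 0x3FFFFF


def to_unibyte(chars: Iterable[int]) -> bytes:
    """Model of the conversion: ASCII maps to itself, raw-byte
    characters map back to their byte, anything else is an error."""
    out = bytearray()
    for c in chars:
        if 0 <= c <= 0x7F:
            out.append(c)
        elif RAW_BYTE_FIRST <= c <= RAW_BYTE_LAST:
            out.append(c - 0x3FFF00)  # 0x3FFF80 -> 0x80, 0x3FFFFF -> 0xFF
        else:
            raise ValueError(f"cannot convert character code {c:#x}")
    return bytes(out)


def is_unibyte_convertible(chars: Iterable[int]) -> bool:
    """True when every character is ASCII or a raw-byte character."""
    try:
        to_unibyte(chars)
    except ValueError:
        return False
    return True


print(to_unibyte([0x61, 0x3FFF80]))    # b'a\x80'
print(is_unibyte_convertible([0xE9]))  # False (U+00E9 is neither kind)
```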




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 07 Jul 2022 11:24:05 GMT)

This bug report was last modified 3 years and 35 days ago.
