GNU bug report logs - #55777
[PATCH] Improve documentation of `string-to-multibyte', `string-to-unibyte'


Package: emacs

Reported by: Richard Hansen <rhansen <at> rhansen.org>

Date: Fri, 3 Jun 2022 06:21:02 UTC

Severity: minor

Tags: patch

Done: Stefan Kangas <stefankangas <at> gmail.com>

Bug is archived. No further changes may be made.



Message #23 received at 55777 <at> debbugs.gnu.org (full text, mbox):

From: Richard Hansen <rhansen <at> rhansen.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 55777 <at> debbugs.gnu.org
Subject: Re: bug#55777: [PATCH] Improve documentation of
 `string-to-multibyte', `string-to-unibyte'
Date: Sun, 5 Jun 2022 22:00:35 -0400

On 6/5/22 01:37, Eli Zaretskii wrote:
> Could you please state what is confusing in the current wording?

  * "Raw 8-bit bytes" isn't really defined. It's mentioned earlier in
    the chapter -- the term is even in a @dfn{} -- but there's no
    definition there.

  * The term "raw 8-bit bytes" is misleading. It suggests binary data
    (bytes with values 0-255), but it is actually meant to cover only
    the values 128-255.

  * The term "raw 8-bit bytes" is not used consistently. Sometimes "8"
    is spelled out as "eight", sometimes "raw" comes after "8-bit",
    and sometimes it refers to all byte values 0-255 (see the first
    sentence under `@cindex unibyte text`).

  * It's not clear whether "raw 8-bit bytes" is meant to refer to
    bytes with values 128-255, or to the *characters* that map to
    those byte values.

  * The following phrasing is weird: "The function assumes that
    @var{string} includes ASCII characters and raw 8-bit bytes". The
    purpose of "raw 8-bit bytes" is to cover non-ASCII byte values, so
    by definition that assumption is always true. Saying "the function
    assumes" leaves the reader wondering about cases where the
    assumption does not hold. That in turn raises the question of
    whether "raw 8-bit bytes" fully covers non-ASCII byte values and,
    if it does not, how to handle the values that are not covered
    (whatever they are).

    Maybe something like this:

        By definition, unibyte strings contain only @acronym{ASCII}
        characters (bytes with values 0-127) and raw 8-bit bytes
        (bytes with values 128-255); the latter are converted to their
        corresponding multibyte representations in the
        @code{eight-bit} character set (@pxref{Text Representations,
        codepoints}).
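
    To make the mapping in that wording concrete, here is a rough
    Python model of it. This is only a sketch, not Emacs code: the
    function names are hypothetical, and it assumes the raw-byte
    character range #x3FFF80..#x3FFFFF described in the "Text
    Representations" node, i.e. byte B in 128-255 corresponds to
    character #x3FFF00 + B.

```python
from typing import List

# Illustrative model only; this is not how Emacs is implemented.
# Assumption: raw byte B (128-255) corresponds to character 0x3FFF00 + B,
# which gives the range 0x3FFF80..0x3FFFFF.
RAW_BYTE_BASE = 0x3FFF00


def string_to_multibyte_model(data: bytes) -> List[int]:
    """Model of `string-to-multibyte' applied to a unibyte string.

    ASCII bytes (0-127) keep their value; bytes 128-255 become the
    corresponding characters in the eight-bit range.
    """
    return [b if b < 0x80 else RAW_BYTE_BASE + b for b in data]


def string_to_unibyte_model(chars: List[int]) -> bytes:
    """Model of `string-to-unibyte' applied to a list of character codes.

    Only ASCII characters and eight-bit characters can be converted;
    anything else is rejected.
    """
    out = bytearray()
    for c in chars:
        if c < 0x80:
            out.append(c)                  # ASCII: unchanged
        elif 0x3FFF80 <= c <= 0x3FFFFF:
            out.append(c - RAW_BYTE_BASE)  # eight-bit char: back to its byte
        else:
            raise ValueError(f"cannot convert character {c:#x} to a byte")
    return bytes(out)


if __name__ == "__main__":
    codes = string_to_multibyte_model(b"A\xc8")
    print([hex(c) for c in codes])
    print(string_to_unibyte_model(codes))
```

    The point of the model is that every one of the 256 byte values
    has a defined conversion, so there is no input for which the
    "assumption" in the current wording could fail.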




