GNU bug report logs - #6576
documentation `string-to-char' is incorrect
Reported by: MON KEY <monkey <at> sandpframing.com>
Date: Tue, 6 Jul 2010 21:35:01 UTC
Severity: minor
Done: Chong Yidong <cyd <at> stupidchicken.com>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6576 in the body.
You can then email your comments to 6576 AT debbugs.gnu.org in the normal way.
Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Tue, 06 Jul 2010 21:35:01 GMT)
Acknowledgement sent to MON KEY <monkey <at> sandpframing.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 06 Jul 2010 21:35:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
First sentence of doc string for `string-to-char' is incorrect.
,---- (documentation 'string-to-char )
| "Convert arg STRING to a character, the first character of that
| string. A multibyte character is handled correctly.
|
| (fn STRING)"
`----
Should be something more like:
"Return decimal integer value of first character in STRING."
The rationale for the proposed docstring change is as follows:
- The second clause of the sentence doesn't parse;
- Neither the arg STRING nor its first char is _converted_, e.g.:
  (let ((not-cnvrtd "bubba"))
    (string-to-char not-cnvrtd)
    not-cnvrtd) ;; <- value of not-cnvrtd is a string, not a char.
- It is more in keeping with what the manual says:
,---- (info "(elisp)Basic Char Syntax")
| Since characters are really integers, the printed representation
| of a character is a decimal number.
`----
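To make the rationale above concrete, here is a small Emacs Lisp sketch, assuming Emacs 23 or later. The helper name `my-string-to-char-demo' is hypothetical and exists only for illustration. It shows that `string-to-char' returns a plain integer, that decimal, hex and `%c' output are just different printed representations of that one integer, and that the argument string is left unchanged.

```elisp
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my-string-to-char-demo (s)
  "Return a list (CHAR PRINTED S) for the first character of string S."
  (let ((c (string-to-char s)))
    ;; C is an ordinary integer; %d, %x and %c are three printed
    ;; representations of that one value.
    (list c (format "%d #x%x %c" c c c) s)))

(my-string-to-char-demo "bubba")
;; => (98 "98 #x62 b" "bubba")   ; S is still the string "bubba"
(eq (string-to-char "bubba") ?b)
;; => t   ; ?b is read syntax for the same integer, 98
```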
The affected docstring appears in GNU Emacs 23.2.1 and is still present
as of Bzr-100633.
--
/s_P\
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Wed, 07 Jul 2010 07:15:02 GMT)
Message #8 received at 6576 <at> debbugs.gnu.org (full text, mbox):
> Date: Tue, 6 Jul 2010 17:34:41 -0400
> From: MON KEY <monkey <at> sandpframing.com>
> Cc:
>
> First sentence of doc string for `string-to-char' is incorrect.
>
> ,---- (documentation 'string-to-char )
> | "Convert arg STRING to a character, the first character of that
> | string. A multibyte character is handled correctly.
> |
> | (fn STRING)"
> `----
>
> Should be something more like:
>
> "Return decimal integer value of first character in STRING."
The "decimal" thing has no place here, because we are not talking
about the printed representation of that integer.
I would suggest the following, which also takes care of the second
sentence in the current doc string:
"Return the Unicode codepoint of the first character of STRING.
Note: eight-bit characters are returned as single-byte values in the
range 160 to 255, inclusive."
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Wed, 07 Jul 2010 08:41:02 GMT)
Message #11 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> "Return the Unicode codepoint of the first character of STRING.
This is not correct. The value is just the internal encoding of the
character. It's identical to (aref STRING 0) except that it returns 0
for the empty string.
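The equivalence described above can be written down as a short Emacs Lisp sketch. The name `my-first-char' is hypothetical; the function only restates the claim made here (same result as `(aref STRING 0)', except 0 for the empty string) and is not the actual C implementation of `string-to-char'.

```elisp
;; Hypothetical restatement of the claim above; not Emacs's own code.
(defun my-first-char (string)
  "Return the first character of STRING, or 0 if STRING is empty."
  (if (> (length string) 0)
      (aref string 0)  ; characters are integers, so this is an integer too
    0))                ; (aref "" 0) would signal `args-out-of-range'

(my-first-char "bubba")  ; => 98, same as (string-to-char "bubba")
(my-first-char "")       ; => 0, same as (string-to-char "")
```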
> Note: eight-bit characters are returned as single-byte values in the
> range 160 to 255, inclusive."
That depends on the multibyteness of the string.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Wed, 07 Jul 2010 10:35:01 GMT)
Message #14 received at 6576 <at> debbugs.gnu.org (full text, mbox):
> From: Andreas Schwab <schwab <at> linux-m68k.org>
> Cc: MON KEY <monkey <at> sandpframing.com>, 6576 <at> debbugs.gnu.org
> Date: Wed, 07 Jul 2010 10:40:00 +0200
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > "Return the Unicode codepoint of the first character of STRING.
>
> This is not correct. The value is just the internal encoding of the
> character.
Which is Unicode, AFAIK. The note takes care of the extension that
is specific to Emacs. If there are other extensions that I forgot, we
can add more notes.
> It's identical to (aref STRING 0)
I don't think talking about `(aref STRING 0)' in a doc string is a
good idea. Only people who know quite a lot about the internal
representation and what aref does in this case will understand such
documentation.
> except that it returns 0 for the empty string
This fact should probably be mentioned in the doc string.
> > Note: eight-bit characters are returned as single-byte values in the
> > range 160 to 255, inclusive."
>
> That depends on the multibyteness of the string.
Eight-bit characters are defined as such only in multibyte strings.
But I think the note is correct for unibyte strings as well, because
they by definition include raw bytes.
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Wed, 07 Jul 2010 12:17:01 GMT)
Message #17 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: Andreas Schwab <schwab <at> linux-m68k.org>
>> Cc: MON KEY <monkey <at> sandpframing.com>, 6576 <at> debbugs.gnu.org
>> Date: Wed, 07 Jul 2010 10:40:00 +0200
>>
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>>
>> > "Return the Unicode codepoint of the first character of STRING.
>>
>> This is not correct. The value is just the internal encoding of the
>> character.
>
> Which is Unicode, AFAIK.
No, it is an extension of Unicode. Eight-bit characters, for example,
are not part of Unicode.
>> > Note: eight-bit characters are returned as single-byte values in the
>> > range 160 to 255, inclusive."
>>
>> That depends on the multibyteness of the string.
>
> Eight-bit characters are defined as such only in multibyte strings.
That makes it even more incorrect. For multibyte strings you'll get the
internal encoding, which is not in the range 160 to 255.
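The dependence on multibyteness is easy to observe. The following sketch, assuming Emacs 23 or later, uses a hypothetical helper `my-byte-as-first-char'. It puts the same byte at the start of a unibyte string and of a multibyte string, using the standard `unibyte-string' and `string-to-multibyte' functions, and asks `string-to-char' for the first character of each.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-byte-as-first-char (byte)
  "Return (UNIBYTE MULTIBYTE) results of `string-to-char' for BYTE."
  (let* ((uni (unibyte-string byte))         ; unibyte string holding BYTE
         (multi (string-to-multibyte uni)))  ; bytes >= 128 become eight-bit chars
    (list (string-to-char uni)
          (string-to-char multi))))

(my-byte-as-first-char #xFF)  ; => (255 4194303), and 4194303 is #x3FFFFF
(my-byte-as-first-char ?a)    ; => (97 97): ASCII is the same either way
```

With a multibyte string the raw byte #xFF comes back as #x3FFFFF, not 255, which is consistent with the remark above.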
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Wed, 07 Jul 2010 14:27:01 GMT)
Message #20 received at 6576 <at> debbugs.gnu.org (full text, mbox):
> From: Andreas Schwab <schwab <at> linux-m68k.org>
> Cc: monkey <at> sandpframing.com, 6576 <at> debbugs.gnu.org
> Date: Wed, 07 Jul 2010 14:16:28 +0200
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> >> From: Andreas Schwab <schwab <at> linux-m68k.org>
> >> Cc: MON KEY <monkey <at> sandpframing.com>, 6576 <at> debbugs.gnu.org
> >> Date: Wed, 07 Jul 2010 10:40:00 +0200
> >>
> >> Eli Zaretskii <eliz <at> gnu.org> writes:
> >>
> >> > "Return the Unicode codepoint of the first character of STRING.
> >>
> >> This is not correct. The value is just the internal encoding of the
> >> character.
> >
> > Which is Unicode, AFAIK.
>
> No, it is an extension of Unicode. Eight-bit characters, for example,
> are not part of Unicode.
And that's why there's this note:
> >> > Note: eight-bit characters are returned as single-byte values in the
> >> > range 160 to 255, inclusive."
> >>
> >> That depends on the multibyteness of the string.
> >
> > Eight-bit characters are defined as such only in multibyte strings.
>
> That makes it even more incorrect. For multibyte strings you'll get the
> internal encoding, which is not in the range 160 to 255.
Sounds like a bug, assuming it's true.
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Wed, 07 Jul 2010 15:49:01 GMT)
Message #23 received at 6576 <at> debbugs.gnu.org (full text, mbox):
> Date: Wed, 07 Jul 2010 17:23:40 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: monkey <at> sandpframing.com, 6576 <at> debbugs.gnu.org
>
> > >> > Note: eight-bit characters are returned as single-byte values in the
> > >> > range 160 to 255, inclusive."
> > >>
> > >> That depends on the multibyteness of the string.
> > >
> > > Eight-bit characters are defined as such only in multibyte strings.
> >
> > That makes it even more incorrect. For multibyte strings you'll get the
> > internal encoding, which is not in the range 160 to 255.
>
> Sounds like a bug, assuming it's true.
Actually, there's no way we could return the eight-bit characters in
the 160 to 255 range, since that range is already taken by Unicode
codepoints of Latin characters. So how about
"Return the codepoint of the first character of STRING.
Value is the Unicode codepoint, if it is below #x110000 (in hex).
Codepoints beyond that are Emacs extensions of Unicode. In
particular, eight-bit characters are returned as codepoints in
the range #x3FFF80 through #x3FFFFF, inclusive."
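The ranges named in this proposed doc string can be checked with a small classifier. This is a sketch only: `my-classify-char' is a hypothetical name, and its boundaries are taken directly from the proposal above (Unicode below #x110000, eight-bit characters from #x3FFF80 through #x3FFFFF, anything else above #x10FFFF treated as another Emacs extension).

```elisp
;; Hypothetical classifier built from the ranges in the proposal above.
(defun my-classify-char (c)
  "Classify character C by the code ranges in the proposed doc string."
  (cond ((< c #x110000) 'unicode)
        ((and (>= c #x3FFF80) (<= c #x3FFFFF)) 'eight-bit)
        (t 'emacs-extension)))

(my-classify-char (string-to-char "\u00e9"))  ; => unicode (the value is 233)
(my-classify-char (string-to-char (string-to-multibyte "\240")))
;; => eight-bit: the byte #xA0 in a multibyte string is #x3FFFA0
```

The second call illustrates why a raw byte in the 160 to 255 range cannot come back as 160 to 255 from a multibyte string: that range already belongs to Latin-1 codepoints such as U+00E9.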
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Wed, 13 Jul 2011 23:52:03 GMT)
Message #26 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> Actually, there's no way we could return the eight-bit characters in
> the 160 to 255 range, since that range is already taken by Unicode
> codepoints of Latin characters. So how about
>
> "Return the codepoint of the first character of STRING.
>
> Value is the Unicode codepoint, if it is below #x110000 (in hex).
> Codepoints beyond that are Emacs extensions of Unicode. In
> particular, eight-bit characters are returned as codepoints in
> the range #x3FFF80 through #x3FFFFF, inclusive."
I've now installed a slight variation on this in Emacs 24.
But after checking it in, I started wondering whether this doc string
really makes sense. The function returns an Emacs character, and it
would be rather weird if all functions that take or return an Emacs
character went through that entire explanation.
Is there a specific reason this particular function deserves this
detailed explanation?
If not, I'd rather just revert the change I just checked in...
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 02:14:01 GMT)
Message #29 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Lars Magne Ingebrigtsen <larsi <at> gnus.org> writes:
> I've now installed a slight variation on this in Emacs 24.
>
> But after checking it in, I started wondering whether this doc string
> really makes sense. The function returns an Emacs character, and it
> would be rather weird if all functions that take or return an Emacs
> character went through that entire explanation.
>
> Is there a specific reason this particular function deserves this
> detailed explanation?
How about linking to the `String and Character Basics' node in the Elisp
manual instead?
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 03:05:03 GMT)
Message #32 received at 6576 <at> debbugs.gnu.org (full text, mbox):
> From: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
> Cc: schwab <at> linux-m68k.org, monkey <at> sandpframing.com, 6576 <at> debbugs.gnu.org
> Date: Thu, 14 Jul 2011 01:50:59 +0200
>
> > "Return the codepoint of the first character of STRING.
> >
> > Value is the Unicode codepoint, if it is below #x110000 (in hex).
> > Codepoints beyond that are Emacs extensions of Unicode. In
> > particular, eight-bit characters are returned as codepoints in
> > the range #x3FFF80 through #x3FFFFF, inclusive."
>
> I've now installed a slight variation on this in Emacs 24.
>
> But after checking it in, I started wondering whether this doc string
> really makes sense. The function returns an Emacs character, and it
> would be rather weird if all functions that take or return an Emacs
> character went through that entire explanation.
Which other functions would need this?
> Is there a specific reason this particular function deserves this
> detailed explanation?
If you can suggest a better one that takes care of the original bug
report, please show your suggestion.
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 13:04:02 GMT)
Message #35 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> But after checking it in, I started wondering whether this doc string
>> really makes sense. The function returns an Emacs character, and it
>> would be rather weird if all functions that take or return an Emacs
>> character went through that entire explanation.
>
> Which other functions would need this?
`char-after', `aref' on a string, `following-char'... Basically
anything that returns a character.
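The claim that other character-returning primitives behave the same way can be checked directly. In this sketch (`my-first-char-four-ways' is a hypothetical helper), the first character of a non-empty string is fetched with `string-to-char' and `aref', and, after inserting the string into a temporary buffer, with `char-after' and `following-char'.

```elisp
;; Hypothetical helper, for illustration only.  S must be non-empty,
;; because (aref "" 0) signals an error.
(defun my-first-char-four-ways (s)
  "Return the first character of S as seen by four different primitives."
  (with-temp-buffer
    (insert s)
    (goto-char (point-min))
    (list (string-to-char s)
          (aref s 0)
          (char-after)        ; character after point
          (following-char)))) ; same, but 0 instead of nil at end of buffer

(my-first-char-four-ways "bubba")          ; => (98 98 98 98)
(my-first-char-four-ways "\u00e9t\u00e9")  ; => (233 233 233 233)
```

At the end of a buffer `char-after' returns nil while `following-char' returns 0, which parallels `string-to-char' returning 0 for the empty string.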
>> Is there a specific reason this particular function deserves this
>> detailed explanation?
>
> If you can suggest a better one that takes care of the original bug
> report, please show your suggestion.
I think "close, notabug" would have taken care of the bug report. :-)
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 13:04:03 GMT)
Message #38 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Chong Yidong <cyd <at> stupidchicken.com> writes:
> How about linking to the `String and Character Basics' node in the Elisp
> manual instead?
I can do that if you want, but wouldn't it be better just to leave it as
--
Return the first character in STRING.
A multibyte character is handled correctly.
--
If somebody then wonders (and they shouldn't) "hm, what's a character,
then?", they probably already know that a manual exists.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 13:35:04 GMT)
Message #41 received at 6576 <at> debbugs.gnu.org (full text, mbox):
> From: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
> Cc: schwab <at> linux-m68k.org, monkey <at> sandpframing.com, 6576 <at> debbugs.gnu.org
> Date: Thu, 14 Jul 2011 15:00:17 +0200
>
> > Which other functions would need this?
>
> `char-after', `aref' on a string, `following-char'... Basically
> anything that returns a character.
>
> >> Is there a specific reason this particular function deserves this
> >> detailed explanation?
> >
> > If you can suggest a better one that takes care of the original bug
> > report, please show your suggestion.
>
> I think "close, notabug" would have taken care of the bug report. :-)
The original problem which triggered the report was this part, and
this part only:
A multibyte character is handled correctly.
To do a decent job on this bug, we need, as a minimum, to do
something with this unparsable sentence. I admit that my suggestion
went well beyond that, but if we want to take a step back, please
suggest what to say instead. I hope you agree that this sentence
cannot be left as-is.
(To know what is meant by this sentence, look at the source of the
function.)
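One plausible reading of "A multibyte character is handled correctly" (an interpretation, not something stated in the thread) is that for a multibyte string the function returns the whole first character rather than the first byte of its encoding. The sketch below uses a hypothetical helper `my-first-char-vs-first-byte' and encodes to UTF-8 with the standard `encode-coding-string' purely for comparison.

```elisp
;; Hypothetical helper, for illustration only.  S must be non-empty.
(defun my-first-char-vs-first-byte (s)
  "Return (FIRST-CHARACTER FIRST-UTF-8-BYTE) for the string S."
  (list (string-to-char s)                          ; whole character
        (aref (encode-coding-string s 'utf-8) 0)))  ; first encoded byte

(my-first-char-vs-first-byte "\u00e9t\u00e9")  ; => (233 195): U+00E9 vs byte #xC3
(my-first-char-vs-first-byte "bubba")          ; => (98 98): ASCII is one byte
```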
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 14:08:01 GMT)
Message #44 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> The original problem which triggered the report was this part, and
> this part only:
>
> A multibyte character is handled correctly.
I understand why it is there, because (presumably) in the olden times,
this wasn't always true?
But I think that line can be deleted, too. If the function returns the
first character, then it returns the first character...
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 15:58:02 GMT)
Message #47 received at 6576 <at> debbugs.gnu.org (full text, mbox):
> From: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
> Cc: schwab <at> linux-m68k.org, monkey <at> sandpframing.com, 6576 <at> debbugs.gnu.org
> Date: Thu, 14 Jul 2011 16:06:34 +0200
>
> > A multibyte character is handled correctly.
>
> I understand why it is there, because (presumably) in the olden times,
> this wasn't always true?
>
> But I think that line can be deleted, too.
If you delete it, how would users know what this function does with
unibyte strings? What numerical value will it return when passed a
unibyte string?
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 16:13:01 GMT)
Message #50 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> If you delete it, how would users know what this function does with
> unibyte strings? What numerical value will it return when passed a
> unibyte string?
Isn't the same the case with `char-after' in a unibyte buffer, for
instance?
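The comparison with `char-after' in a unibyte buffer can be tried out. In this sketch, assuming Emacs 23 or later, the helper `my-char-after-both-ways' is hypothetical, while `set-buffer-multibyte', `unibyte-string' and `string-to-multibyte' are standard functions. The same byte is placed at the start of a unibyte buffer and of a multibyte buffer.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-char-after-both-ways (byte)
  "Return `char-after' results for BYTE in a unibyte and a multibyte buffer."
  (list (with-temp-buffer
          (set-buffer-multibyte nil)            ; make this buffer unibyte
          (insert (unibyte-string byte))
          (char-after (point-min)))
        (with-temp-buffer                       ; multibyte by default
          (insert (string-to-multibyte (unibyte-string byte)))
          (char-after (point-min)))))

(my-char-after-both-ways #xFF)  ; => (255 4194303), as with `string-to-char'
```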
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 19:37:02 GMT)
Message #53 received at 6576 <at> debbugs.gnu.org (full text, mbox):
> From: Lars Magne Ingebrigtsen <larsi <at> gnus.org>
> Cc: schwab <at> linux-m68k.org, monkey <at> sandpframing.com, 6576 <at> debbugs.gnu.org
> Date: Thu, 14 Jul 2011 18:12:24 +0200
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > If you delete it, how would users know what this function does with
> > unibyte strings? What numerical value will it return when passed a
> > unibyte string?
>
> Isn't the same the case with `char-after' in a unibyte buffer, for
> instance?
Yes, but so what? Are you sure users know what that does?
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 14 Jul 2011 19:40:03 GMT)
Message #56 received at 6576 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
> Yes, but so what? Are you sure users know what that does?
No, not really. :-) I think these functions should say that they
return a character, and if that's not sufficient, they should refer you
to the manual to explain what Emacs means by a character.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog http://lars.ingebrigtsen.no/
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Fri, 15 Jul 2011 19:07:02 GMT)
Message #59 received at 6576 <at> debbugs.gnu.org (full text, mbox):
On Thu, Jul 14, 2011 at 3:39 PM, Lars Magne Ingebrigtsen <larsi <at> gnus.org> wrote:
> No, not really. :-) I think these functions should say that they
> return a character, and if that's not sufficient, they should refer you
> to the manual to explain what Emacs means by a character.
If it doesn't do what it says, then a pointer into the manual won't
help clarify the underlying issue; it only glosses over it.
--
/s_P\
Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org: bug#6576; Package emacs. (Thu, 21 Jul 2011 18:39:02 GMT)
Message #62 received at 6576 <at> debbugs.gnu.org (full text, mbox):
MON KEY <monkey <at> sandpframing.com> writes:
>> I think these functions should say that they return a character, and
>> if that's not sufficient, they should refer you to the manual to
>> explain what Emacs means by a character.
>
> If it doesn't do what it says, then a pointer into the manual won't
> help clarify the underlying issue; it only glosses over it.
I removed the explanation of what a character is; there's no need to do
that here, or in all the other character-related functions like
char-after.
bug closed, send any further explanations to 6576 <at> debbugs.gnu.org and MON KEY <monkey <at> sandpframing.com>
Request was from Chong Yidong <cyd <at> stupidchicken.com> to control <at> debbugs.gnu.org. (Thu, 21 Jul 2011 18:39:03 GMT)
bug archived.
Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 19 Aug 2011 11:24:04 GMT)
This bug report was last modified 14 years and 1 day ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997, 2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.