Package: emacs;
Reported by: MON KEY <monkey <at> sandpframing.com>
Date: Thu, 27 May 2010 17:29:02 UTC
Severity: minor
Done: Chong Yidong <cyd <at> stupidchicken.com>
Bug is archived. No further changes may be made.
From: MON KEY <monkey <at> sandpframing.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 6283 <at> debbugs.gnu.org
Subject: bug#6283: doc/lispref/searching.texi reference to octal code `0377' correct?
Date: Mon, 31 May 2010 01:35:41 -0400
On Sat, May 29, 2010 at 2:45 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>
> It's not an Emacs convention to represent characters by their
> codepoints expressed in octal. It's a widely accepted practice. If
> we were to describe every convention in the world in the manual, 99%
> of the manual would be devoted to describing conventions.
>

That it is widely accepted practice is what makes it a convention.
Within Emacs Lisp it is also widely accepted practice to write numbers
in other radixes with the #<radix> read syntax (for example, #o377).
This is a conflict of conventions. The point of stating which
convention is in use, rather than another, is to make clear when one
should be preferred over the other. It is unusual for the manual to use
conflicting conventions side by side without saying which one applies.
This is my concern.

> Again, this part of the manual is not about how Emacs represents
> characters or reads them. It's about their codes.

This is how I understood this portion of the manual. Maybe I'm
misunderstanding something fundamental about this distinction. If so, I
would greatly appreciate it if you could help me see it more clearly.

>> 0377 doesn't have a character that I'm aware of.
>
> In Unicode, it's a codepoint of LATIN SMALL LETTER Y WITH DIAERESIS.

I don't understand this.

>
> But the text says "...many non-ASCII characters have codes above octal
> 0377". It doesn't talk about a specific character here, just about
> which codepoints are below it and which are above it.

Yes, but the regexp is "[\200-\377]".

>
> I didn't say that we are going to remove these features any time soon.
> Just that the manual doesn't talk too much about this, to avoid
> confusing users with issues that are both very complicated and very
> obscure, and are rarely if at all needed on the Lisp level.
>

I certainly agree they are confusing and easily misunderstood. I
disagree, however, that these issues are all that obscure.
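The following is a minimal sketch of the notations at issue. It assumes Emacs 23 or later, where raw 8-bit bytes in multibyte text are represented by the characters #x3FFF80 through #x3FFFFF.

It shows two things. First, the manual's "octal 0377", the reader syntaxes #o377 and ?\377, and a plain 255 all denote the same integer, which is also the Unicode codepoint of LATIN SMALL LETTER Y WITH DIAERESIS. Second, a string literal written with the same octal escape, "\377", holds a raw byte rather than that character. Behavior in older Emacs versions is not covered here.

```elisp
;; Assumes Emacs 23 or later.
;; Several spellings of the same integer, 255 (octal 0377).
(list 255 #o377 #xff ?\377 (string-to-number "377" 8))
;; => (255 255 255 255 255)

;; Going the other way: print 255 in octal.
(format "%o" 255)
;; => "377"

;; A string literal is different.  "\377" contains only an octal
;; escape, so the reader makes it a unibyte string holding raw byte 255.
(multibyte-string-p "\377")
;; => nil
(aref "\377" 0)
;; => 255

;; Converted to a multibyte string, that byte becomes the raw-byte
;; character #x3FFFFF (4194303), not character 255 (U+00FF).
(aref (string-to-multibyte "\377") 0)
;; => 4194303
;; Likewise octal \255 (decimal 173) becomes #x3FFFAD (4194221).
(aref (string-to-multibyte "\255") 0)
;; => 4194221
```

These are the same pairs that appear side by side in the test data of the code further down: 255 and ?\377 (one character), then "\377" and 4194303 (one raw byte).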
You seem to suggest that the notation "octal 0NNN" is commonplace, yet I
personally find this notation obscure.

tomato|potato <-> potato|tomato

>
> Of course. But why do you expect to find the description of such
> abuse in the manual?
>

I _do_ find them, whereas I don't find any such reference regarding the
0377 convention. This is, I guess, my concern.

Following is my attempt to come to grips with the distinction between
the numeric codepoint, integer character representations, reader
conventions, etc. with respect to the manual, and particularly their use
in conjunction with regexps. I believe this example shows reasonable
familiarity with aspects of char/code representation. But maybe this
bit of code can help show whether there is something I am not getting.

;;; ================================================================
(let (chars-found frob-found)
  ;; Insert six test characters, one per line, into a temporary buffer
  ;; and record each character matched by the regexp, with its bounds.
  (with-temp-buffer
    (save-excursion
      (insert 10 255 10 ?\377 10 "\255" 10 4194221 10 "\377" 10 4194303))
    (while (search-forward-regexp "[\200-\377]" nil t)
      (let* ((md (match-data t))
             (md-char (char-before (cadr md))))
        (push `(,md-char ,(car md) ,(cadr md)) chars-found))))
  (setq chars-found (nreverse chars-found))
  ;; Print each matched character as #oNNN and read it back as an integer.
  (dolist (cf chars-found
              (setq chars-found
                    `(,(setq frob-found (nreverse frob-found)) ,chars-found)))
    (push (car (read-from-string (format "#o%o" (car cf)))) frob-found))
  (setq frob-found nil)
  ;; Pair each integer with the string `char-to-string' makes from it.
  (dolist (ints (car chars-found)
                (setq chars-found
                      `(,(setq frob-found (nreverse frob-found)) ,@chars-found)))
    (push `(,ints . ,(char-to-string ints)) frob-found))
  (setq frob-found nil)
  ;; For each pair, collect several conversions of it in a plist.
  (dolist (d (car chars-found)
             (setq chars-found
                   `(,(setq frob-found (nreverse frob-found)) ,@chars-found)))
    (let* ((mltb-int (car d))
           (unib-str (cdr d))
           (unib-str->mchar (string-to-char (symbol-name (read unib-str))))
           (mltb-int->uchar (multibyte-char-to-unibyte mltb-int)))
      (push `(:mltb-int ,mltb-int :unib-str ,unib-str
              :unib-str->mchar ,unib-str->mchar
              :mltb-int->uchar ,mltb-int->uchar)
            frob-found)))
  ;; Insert a report into the current buffer: the test characters, the
  ;; pretty-printed results, and for each plist some forms to try by hand.
  (insert 10 (make-string 68 59) 10
          ";; With this regexp:" 10
          ";; \(search-forward-regexp \"[\\200-\\377]\" nil t\)" 10
          ";; Matched these chars:" 10
          255 10 ?\377 10 "\255" 10 4194221 10 "\377" 10 4194303 10
          (make-string 68 59) 10)
  (pp chars-found (current-buffer))
  (insert (make-string 68 59) "\n")
  (let ((cnt 0))
    (dolist (pl (car chars-found))
      (setq cnt (1+ cnt))
      (insert 10 (make-string 68 59) 10
              (format (concat ";; :MATCH-DATA-#%d\n"
                              "\n(char-to-string (unibyte-char-to-multibyte %d)) ;<-\"%c%d\"\n"
                              "\n(insert (char-to-string (unibyte-char-to-multibyte %d))) ;<- multibyte-char\n"
                              "\n(insert (identity %S)) ;<- raw-byte\n"
                              "\n(insert (string-to-char (identity %S))) ;<- multibyte-char\n"
                              "\n(insert-byte %d 1) ;<-raw-byte unibyte-char\n"
                              "\n(insert (format \"(insert (identity #o%%o))\" (unibyte-char-to-multibyte %d)))\n")
                      cnt
                      (plist-get pl :mltb-int->uchar)
                      92
                      (string-to-number (format "%o" (plist-get pl :mltb-int->uchar)))
                      (plist-get pl :mltb-int->uchar)
                      (plist-get pl :unib-str)
                      (plist-get pl :unib-str)
                      (plist-get pl :mltb-int->uchar)
                      (plist-get pl :mltb-int->uchar))))))
;;; ================================================================

--
/s_P\