GNU bug report logs - #6283: doc/lispref/searching.texi reference to octal code `0377' correct?
Reported by: MON KEY <monkey <at> sandpframing.com>
Date: Thu, 27 May 2010 17:29:02 UTC
Severity: minor
Done: Chong Yidong <cyd <at> stupidchicken.com>
Bug is archived. No further changes may be made.
Message #17 received at 6283 <at> debbugs.gnu.org (full text, mbox):
On Fri, May 28, 2010 at 3:15 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
> Sorry, I don't see the relevance. The manual talks about the
> _numeric_ code of characters, not about their read syntax.
I must be misunderstanding something.
What is the numeric code of \255 ?
> It uses "octal 0377" to present values because octal notation of
> single-byte characters is something many people are familiar with,
Where is this convention detailed/discussed in the manual?
I don't find it mentioned in the (info "(elisp)Conventions").
Should it be, especially as 0377 is not a representation exposed by the
Emacs user-level interface (at least none that I'm aware of)?
> After all, that is the codepoint of the character.
Of which character?
0377 doesn't have a character that I'm aware of.
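For reference, the arithmetic behind the two octal values under discussion can be checked with a short sketch. It is written in Python rather than Emacs Lisp, purely as a neutral illustration, and it assumes the code is read as a Unicode (Latin-1 range) code point. How Emacs represents the raw byte 0377 inside a multibyte buffer is a separate matter that this sketch does not cover.

```python
# Octal values quoted in the thread, converted to decimal.
code_377 = int("377", 8)  # 3*64 + 7*8 + 7
code_255 = int("255", 8)  # 2*64 + 5*8 + 5

print(code_377, code_255)

# Assumption: the number is interpreted as a Unicode code point.
# Under that reading, 0377 is U+00FF and \255 is U+00AD.
print(hex(code_377), hex(code_255))
print(repr(chr(code_377)))
```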
> This is explained in "Non-ASCII Characters". But we generally try not
But this is my point: that section (being the most relevant to
non-ASCII notation) tends to use the #<radix> notation.
> to advertise this issue too much, because there should be no good
> reason for a Lisp program to create raw bytes. Emacs is a text
> editor, while raw bytes are not text
That's just silly. Emacs accommodates noodling w/ raw-bytes because it
is necessary to edit them on occasion. Heck, Emacs on w32 is distributed
with a dedicated executable just to edit binary data in hexadecimal
form.
>> whenever I need to manually revert some raw-bytes or improperly
>> encoded bit-rotted text using regexps.
>
> It's hard to believe Emacs couldn't handle any such text in some other
> way.
It generally can. However, sometimes file encodings get out of whack
over time and once they are more than a generation away from
rightedness Emacs isn't always able to revert them.
The good thing is Emacs can do this and I'm very glad it does :)
Besides, it's my prerogative how I choose to abuse Emacs into abusing
my data.
> What "improper encoding" was that which Emacs couldn't handle?
The "mixed bag encoding". Not all of my files originated in Emacs. Not
all of them get read into an Emacs buffer without problems.
GIGO c'est la vie.
FWIW I have entire SQL databases of multi-lingual, multi-encoding data
that was improperly uploaded into them via a misconfigured PHP script
with a funky encoding declaration, which itself got its input from a
certain legacy proprietary w32 web-browser that understood (read:
willfully misinterpreted) UTF-8 according to its own whims. I can
assure you that encodings don't translate perfectly, nor are the
mistranslations always easily caught or corrected.
Stuff like this can sometimes happen with system locales too.
Transitioning files from vfat will clobber file names too if you're not careful.
Sometimes I need to find the raw-bytes and replace them with their
character equivalent.
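As an illustration of that kind of repair, here is a minimal sketch in Python (not Emacs Lisp, and not the method actually used in this report). It assumes the data is mostly valid UTF-8 with stray single bytes that were originally Latin-1; the function name `repair_stray_bytes` and the `fallback` parameter are hypothetical, and real mixed-encoding data may well need a different fallback or manual review.

```python
def repair_stray_bytes(data: bytes, fallback: str = "latin-1") -> str:
    """Decode data as UTF-8, mapping each undecodable byte run via fallback.

    Assumption: stray bytes belong to a single-byte encoding (Latin-1 here).
    Latin-1 can decode any byte; other fallbacks may raise on some bytes.
    """
    pieces = []
    pos = 0
    while pos < len(data):
        try:
            # Try to decode everything that is left as UTF-8.
            pieces.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are offsets into the slice data[pos:].
            pieces.append(data[pos:pos + err.start].decode("utf-8"))
            stray = data[pos + err.start:pos + err.end]
            pieces.append(stray.decode(fallback))
            pos += err.end
    return "".join(pieces)


# A lone Latin-1 byte 0xE9 next to a well-formed UTF-8 sequence.
sample = b"caf\xe9 na\xc3\xafve"
print(repair_stray_bytes(sample))
```

The loop only reinterprets the bytes that fail UTF-8 decoding, so text that was already well-formed is left alone. It cannot detect bytes that happen to form valid UTF-8 but were meant as something else.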
> Could it be that you simply gave up too early and tried to solve the
> problem by treating text as bytes, while it really wasn't?
Nope. I'm usually pretty good about _not_ approaching these problems
with this type of hammer unless it is a last resort.
--
/s_P\
This bug report was last modified 14 years and 358 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.