GNU bug report logs - #7962
23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1

Package: emacs;

Reported by: Emmanuel Bigler <Emmanuel.Bigler <at> ens2m.fr>

Date: Wed, 2 Feb 2011 14:42:03 UTC

Severity: normal

Found in version 23.2

Done: Stefan Monnier <monnier <at> iro.umontreal.ca>

Bug is archived. No further changes may be made.


From: Emmanuel Bigler <Emmanuel.Bigler <at> ens2m.fr>
To: Eli Zaretskii <eliz <at> gnu.org>, Lennart Borgman <lennart.borgman <at> gmail.com>, schwab <at> linux-m68k.org, 7962 <at> debbugs.gnu.org
Cc: svenjoac <at> gmx.de
Subject: bug#7962: 23.2; capitalize letters ISO-8859-1 with diacritic signs in emacs 23.2.1
Date: Fri, 04 Feb 2011 18:08:51 +0100
[Message part 1 (text/plain, inline)]
>>
>> I see this:
>>    buffer-file-coding-system is a variable defined in `C source code'.
>>    Its value is iso-latin-1-dos
>
> See "M-: (coding-system-priority-list) RET".
>
> The highest-priority encoding is set from your locale, but look what
> is the next one.
>

Hello again.
I think I'm starting to understand what is going on.

A long time ago I created a unibyte file containing the 1-byte 
characters I want to test within emacs. The file was created with a 
program over which I have total byte-by-byte control, so I know exactly 
what is inside the file. I have attached the file to this mail; I am not 
sure that this is allowed on the gnu-debug mailing list, but it is a 
simple and very short .txt file that reads as follows (this mail itself 
is typeset and displayed here as iso-8859-1):

------- mytestchars-224-255-iso-8859.txt ---------------------

  224 \340  à   225 \341  á   226 \342  â   227 \343  ã
  228 \344  ä   229 \345  å   230 \346  æ   231 \347  ç
  232 \350  è   233 \351  é   234 \352  ê   235 \353  ë
  236 \354  ì   237 \355  í   238 \356  î   239 \357  ï
  240 \360  ð   241 \361  ñ   242 \362  ò   243 \363  ó
  244 \364  ô   245 \365  õ   246 \366  ö   247 \367  ÷
  248 \370  ø   249 \371  ù   250 \372  ú   251 \373  û
  252 \374  ü   253 \375  ý   254 \376  þ   255 \377  ÿ

éèçàù  < test strings to see how they behave
Éèçàù

----------------------------------------------------------
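For anyone who wants to rebuild a similar test file, here is a minimal Python sketch. It is not the program originally used to create the attachment; the function name build_table and its parameters are hypothetical, and the output only mimics the table part of the file (decimal code, octal escape, character).

```python
# Hypothetical helper: rebuilds a table like the one in the attached test file.
def build_table(first=224, last=255, per_row=4):
    rows = []
    for start in range(first, last + 1, per_row):
        cells = []
        for code in range(start, min(start + per_row, last + 1)):
            # Decimal code, octal escape, then the ISO-8859-1 character itself.
            cells.append(f"{code} \\{code:o}  {chr(code)}")
        rows.append("  " + "   ".join(cells))
    return "\n".join(rows) + "\n"


table = build_table()
# Encoding as ISO-8859-1 stores every character of the table in one byte.
data = table.encode("iso-8859-1")
print(table, end="")
```

Writing data to disk in binary mode, for example with open(name, "wb"), should give a unibyte file in which each accented letter occupies a single byte.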


I started /usr/local/bin/emacs -Q mytestchars-224-255-iso-8859.txt
under emacs  23.2.93.1 (i686-pc-linux-gnu)

The file displays perfectly correctly. (describe-char (point)) gives me 
exactly what I want, i.e. an extended ASCII decimal code between 224 and 
255. Almost all operations (except capitalize, see below) work exactly 
as I wish and exactly as in older emacs versions. This is no mystery, 
since the priority list returned by
M-: (coding-system-priority-list) RET reads:
(iso-latin-1 utf-8 iso-2022-7bit iso-2022-7bit-lock iso-2022-8bit-ss2 
emacs-mule raw-text iso-2022-jp in-is13194-devanagari chinese-iso-8bit 
utf-8-auto utf-8-with-signature ...)

Again I'm perfectly happy, since I see that iso-latin-1 comes first. Is 
this what I want? Certainly yes: my locale environment variables look 
like this:
LC_ALL=fr_FR.ISO8859-1
LC_COLLATE=fr_FR.ISO8859-1
LANG=fr_FR.ISO8859-1
GDM_LANG=fr_FR.iso88591
LC_CTYPE=fr_FR.ISO8859-1
XTERM_LOCALE=fr_FR.ISO8859-1

However, in this emacs -Q session, with a correct unibyte display of
a unibyte file, *capitalize does not work*.
At the beginning of this discussion, Sven explained that capitalize 
would only work on 2-byte characters. I tested that, of course, and of 
course it works, but I simply wish I could continue to capitalize 
unibyte words with M-c, as in the good old iso-8859 days!
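To make the 1-byte versus 2-byte distinction concrete, here is a small Python sketch that encodes the first test string both ways. It only illustrates the two on-disk encodings; it is an assumption-free byte count, but it says nothing about how emacs represents the buffer internally.

```python
# The test string "éèçàù", written with escapes so the source stays pure ASCII.
text = "\u00e9\u00e8\u00e7\u00e0\u00f9"

latin1 = text.encode("iso-8859-1")  # one byte per accented letter
utf8 = text.encode("utf-8")         # two bytes per accented letter

print(len(latin1), latin1)
print(len(utf8), utf8)
```

The same five letters take 5 bytes in ISO-8859-1 and 10 bytes in UTF-8.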

Additional info : when applying the M-c command to a letter above
decimal ascii 224, nothing happens on the display as reported, *although 
the buffer is marked as being changed.*

Incidentally, in a good ol' xterm window (fitted with gnu readline and
obeying my LOCALE preferences as listed above), M-c works perfectly, as
it should. If I cut and paste from the xterm into the emacs buffer,
everything looks fine & unibyte ... except that I can no longer change
the case of the pasted string with 'capitalize' or a similar 'case'
command.
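The result I expect from M-c on a unibyte word can be sketched in Python as follows. This is only an illustration of the expected byte-level outcome, not of how emacs or readline implement it; capitalize_latin1 is a hypothetical helper, and it assumes that a letter whose uppercase form has no single-byte ISO-8859-1 code (such as \377, ÿ) is simply left alone.

```python
def capitalize_latin1(raw: bytes) -> bytes:
    """Capitalize one ISO-8859-1 word given as bytes (hypothetical helper)."""
    word = raw.decode("iso-8859-1")
    if not word:
        return raw
    try:
        head = word[0].upper().encode("iso-8859-1")
    except UnicodeEncodeError:
        # Uppercase form lies outside ISO-8859-1 (e.g. U+0178 for \377).
        head = raw[:1]
    if len(head) != 1:
        # Uppercase form is more than one character (e.g. "SS" for \337).
        head = raw[:1]
    # Like M-c: first letter uppercased, remaining letters lowercased.
    return head + word[1:].lower().encode("iso-8859-1")


# The test string from the file: \351\350\347\340\371
print(capitalize_latin1(b"\xe9\xe8\xe7\xe0\xf9"))
```

For the letters in the table, the expected uppercase byte is the lowercase byte minus 32, except for \367 (the division sign) and \377.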

Bug, or UTF-8 emacs 23.2 feature ?

--
Emmanuel
[mytestchars-224-255-iso-8859.txt (text/plain, attachment)]

This bug report was last modified 14 years and 160 days ago.


