GNU bug report logs - #6974
Emacs doesn't like Swedish ä (on w32)

Package: emacs;

Reported by: Lennart Borgman <lennart.borgman <at> gmail.com>

Date: Thu, 2 Sep 2010 21:57:02 UTC

Severity: normal

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 6974 in the body.
You can then email your comments to 6974 AT debbugs.gnu.org in the normal way.

Report forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Thu, 02 Sep 2010 21:57:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Lennart Borgman <lennart.borgman <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 02 Sep 2010 21:57:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Lennart Borgman <lennart.borgman <at> gmail.com>
To: Emacs Bugs <bug-gnu-emacs <at> gnu.org>
Subject: Emacs doesn't like Swedish ä (on w32)
Date: Thu, 2 Sep 2010 23:58:04 +0200

I have a file whose name contains the Swedish char ä (a with two dots
above, 228 in latin-1).

I have seen several strange things with this. Here are some I remember:


I start from "emacs -Q". In an org-mode file I try to insert a link to
this file with "C-c C-l file FILENAME".


* After M-x set-language-environment RET RET (i.e. "English") and then
opening a new org file.

Works nicely. Examining the char "ä" in the link in the org buffer
gives the expected result:

        character: ä (228, #o344, #xe4)
preferred charset: unicode-bmp
		   (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
       code point: 0xE4
           syntax: w 	which means: word
         category: .:Base, j:Japanese, l:Latin
      buffer code: #xC3 #xA4
        file code: #xE4 (encoded by coding system iso-latin-1-dos)
          display: by this font (glyph code)
    uniscribe:-outline-Courier
New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-1 (#x6C)

In this case I can save the file as usual.


* After M-x set-language-environment RET utf-8 RET and then opening a
new org file.

When choosing the file the char "ä" is shown as \344. It looks the
same when inserted in the buffer as an org link to the file.

        character:   (4194276, #o17777744, #x3fffe4)
preferred charset: eight-bit (Raw bytes 128-255)
       code point: 0xE4
           syntax: w 	which means: word
      buffer code: #xE4
        file code: not encodable by coding system utf-8-dos
          display: no font available

After this I can not save the file (or rather Emacs prompts me for a
coding system).


I also saw that pasting the file name in some circumstances converts
the char "ä" (or was it \344) to a space. Unfortunately I do not
remember how that happened and I can not reproduce it now. (But I am
very sure it happened this morning.)


In GNU Emacs 24.0.50.1 (i386-mingw-nt5.1.2600)
 of 2010-08-10
Windowing system distributor `Microsoft Corp.', version 5.1.2600
configured using `configure --with-gcc (3.4) --no-opt --cflags
-Ic:/g/include -fno-crossjumping'

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 01:44:02 GMT) Full text and rfc822 format available.

Message #8 received at 6974 <at> debbugs.gnu.org (full text, mbox):

From: Jason Rumney <jasonr <at> gnu.org>
To: 6974 <at> debbugs.gnu.org
Cc: Lennart Borgman <lennart.borgman <at> gmail.com>
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 09:44:37 +0800

On 03/09/2010 05:58, Lennart Borgman wrote:
> I have a file whose name contains the Swedish char ä (a with two dots
> above, 228 in latin-1).
>
> I have seen several strange things with this. Here are some I remember:
>
>
> I start from "emacs -Q". In an org-mode file I try to insert a link to
> this file with "C-c C-l file FILENAME".
>    
> * After M-x set-language-environment RET RET (i.e. "English") and then
> opening a new org file.
>
> Works nicely. Examining the char "ä" in the link in the org buffer
> gives as expected.
>    

OK, but what happens in the default case you tried above, before you 
started messing with the language environment?


> * After M-x set-language-environment RET utf-8 RET and then opening a
> new org file.
>    

Of course it is possible to break things by setting your language 
environment inappropriately. But what happens in the default case?

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 08:03:02 GMT) Full text and rfc822 format available.

Message #11 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 10:03:32 +0200

On 02.09.2010 23:58, Lennart Borgman wrote:
> I have a file whose name contains the Swedish char ä (a with two dots
> above, 228 in latin-1).
>
> I have seen several strange things with this. Here are some I remember:
>
>
> I start from "emacs -Q". In an org-mode file I try to insert a link to
> this file with "C-c C-l file FILENAME".
>
>
> * After M-x set-language-environment RET RET (i.e. "English") and then
> opening a new org file.
>
> Works nicely. Examining the char "ä" in the link in the org buffer
> gives as expected.
>
>          character: ä (228, #o344, #xe4)
> preferred charset: unicode-bmp
> 		   (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
>         code point: 0xE4
>             syntax: w 	which means: word
>           category: .:Base, j:Japanese, l:Latin
>        buffer code: #xC3 #xA4
>          file code: #xE4 (encoded by coding system iso-latin-1-dos)
>            display: by this font (glyph code)
>      uniscribe:-outline-Courier
> New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-1 (#x6C)
>
> In this case I can save the file as usual.
>
>
> * After M-x set-language-environment RET utf-8 RET and then opening a
> new org file.
>
> When choosing the file the char "ä" is shown as \344. It looks the
> same when inserted in the buffer as an org link to the file.
>
>          character:   (4194276, #o17777744, #x3fffe4)
> preferred charset: eight-bit (Raw bytes 128-255)
>         code point: 0xE4
>             syntax: w 	which means: word
>        buffer code: #xE4
>          file code: not encodable by coding system utf-8-dos
>            display: no font available
>
> After this I can not save the file (or rather Emacs prompts me for a
> coding system).
>
>
> I also saw that pasting the file name in some circumstances converts
> the char "ä" (or was it \344) to a space. Unfortunately I do not
> remember how that happened and I can not reproduce it now. (But I am
> very sure it happened this morning.)
>
>
> In GNU Emacs 24.0.50.1 (i386-mingw-nt5.1.2600)
>   of 2010-08-10
> Windowing system distributor `Microsoft Corp.', version 5.1.2600
> configured using `configure --with-gcc (3.4) --no-opt --cflags
> -Ic:/g/include -fno-crossjumping'
>
>
>
>


Hi,

I am seeing something similar:

When opening a file containing non-ASCII chars, German
umlauts for example, in some cases these aren't shown
as glyphs but as numbers.

See the attached screenshot for how the following code
looks:

(define-abbrev-table
  'global-abbrev-table
  '(("Infinity" "∞" nil 0)
    ("alpha" "α" nil 2)
    ("beta" "β" nil 1)
    ("gamma" "γ" nil 1)
    ("theta" "θ" nil 0)))

As the only change I remember since then is the use of

GNU Emacs 24.0.50.1 (i686-pc-linux-gnu, GTK+ Version 2.12.0) of 2010-08-28

I assume the bug is there.

BTW if I paste the wrongly shown text into this mail for example,
glyphs are shown correctly.

Andreas

--
https://code.launchpad.net/~a-roehler/python-mode
https://code.launchpad.net/s-x-emacs-werkstatt/

[zeichen.png (image/png, attachment)]

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 08:05:02 GMT) Full text and rfc822 format available.

Message #14 received at 6974 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lennart Borgman <lennart.borgman <at> gmail.com>
Cc: 6974 <at> debbugs.gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 11:08:33 +0300

> From: Lennart Borgman <lennart.borgman <at> gmail.com>
> Date: Thu, 2 Sep 2010 23:58:04 +0200
> Cc: 
> 
> * After M-x set-language-environment RET utf-8 RET and then opening a
> new org file.
> 
> When choosing the file the char "ä" is shown as \344. It looks the
> same when inserted in the buffer as an org link to the file.

set-language-environment changes the defaults for various
coding-systems, including file-name-coding-system that's used for
decoding file names.  On Windows, you should _never_ have
file-name-coding-system different from the current codepage, because
that's the only encoding of file names Emacs can currently support on
Windows.  (The other one is UTF-16, which is how Windows encodes file
names on the disk, but Emacs does not yet support that, because such
support would need to switch all the file APIs to use wide
characters.)

So the question is: what is your value of file-name-coding-system,
after you invoke set-language-environment?  If it's anything but your
current Windows codepage, set it back with "C-x RET F" (note: capital
F).
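
The mismatch described above can be reproduced outside Emacs. The following is a minimal Python sketch, not Emacs code: it assumes the file name arrives as Latin-1 bytes (the report gives ä as 228 in latin-1), and the name `raw_name` and the `.org` suffix are made up for illustration.

```python
# Hypothetical file name as the bytes a Latin-1 codepage would deliver.
raw_name = b"\xe4.org"

# Decoding with the matching encoding recovers the character.
print(raw_name.decode("latin-1"))          # ä.org

# Decoding as UTF-8 fails: 0xE4 announces a 3-byte sequence that never follows.
try:
    raw_name.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)                             # False

# The stray byte written in octal is what the report shows as \344.
escape = "\\%o" % raw_name[0]
print(escape)                              # \344

# A correctly decoded ä is stored as two UTF-8 bytes (the "buffer code"
# in the working case of the report).
buffer_code = "ä".encode("utf-8")
print(" ".join(f"#x{b:02X}" for b in buffer_code))   # #xC3 #xA4
```

The same byte is therefore either a proper ä or an undecodable raw byte, depending only on which encoding is assumed for file names.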

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 08:59:01 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 12:01:52 +0300

> Date: Fri, 03 Sep 2010 10:03:32 +0200
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> Cc: 
> 
> seeing the similar:
> 
> when opening a file containing non-ascii chars, german
> umlauts for example, in some case these aren't shown
> as glyphs but as numbers.

This is a different problem entirely, please file a separate bug
report (although my guess is that this is some cockpit error on your
part, so perhaps discussing this on emacs-devel is a better way of
resolving it).

> See screenshot attached how the following code looks
> like:
> 
> (define-abbrev-table
>    'global-abbrev-table
>    '(("Infinity" "∞" nil 0)
>      ("alpha" "α" nil 2)
>      ("beta" "β" nil 1)
>      ("gamma" "γ" nil 1)
>      ("theta" "θ" nil 0)))

The screenshot shows "t" at the mode-line's left edge, which means
Emacs decoded the file's contents with raw-text coding-system.
raw-text interprets all non-ASCII characters as raw bytes, and
displays them as such, with octal escapes.

The most probable reason for Emacs not to decode the file correctly
(as UTF-8) is that the file includes some bytes that are invalid UTF-8
sequences.  What happens if you force UTF-8 with "C-x RET c" before
visiting the file with "C-x C-f"?
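
Whether a file contains bytes that are not valid UTF-8 can also be checked outside Emacs. This is a minimal Python sketch under the assumption that strict UTF-8 validity is the only question; Emacs's own coding-system detection is more involved. The function name `first_invalid_utf8_offset` and the sample data are hypothetical.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that could not be decoded.
        return err.start
    return None


good = "alpha α, beta β".encode("utf-8")
bad = good + b" \xe4 tail"   # a lone Latin-1 ä byte appended after a space

print(first_invalid_utf8_offset(good))   # None
print(first_invalid_utf8_offset(bad) == len(good) + 1)   # True
```

To check a real file, the bytes could be read with `open(path, "rb").read()` and passed to the function.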

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 09:45:02 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 11:46:00 +0200

On 03.09.2010 11:01, Eli Zaretskii wrote:
>> Date: Fri, 03 Sep 2010 10:03:32 +0200
>> From: Andreas Röhler<andreas.roehler <at> easy-emacs.de>
>> Cc:
>>
>> seeing the similar:
>>
>> when opening a file containing non-ascii chars, german
>> umlauts for example, in some case these aren't shown
>> as glyphs but as numbers.
>
> This is a different problem entirely, please file a separate bug
> report

done with bug#6941

> (although my guess is that this is some cockpit error on your
> part,

A theoretically possible source might be code from some
auto-saved buffer file that got mangled in (?), even though I did not
notice that.

> so perhaps discussing this on emacs-devel is a better way of
> resolving it).
>
>> See screenshot attached how the following code looks
>> like:
>>
>> (define-abbrev-table
>>     'global-abbrev-table
>>     '(("Infinity" "∞" nil 0)
>>       ("alpha" "α" nil 2)
>>       ("beta" "β" nil 1)
>>       ("gamma" "γ" nil 1)
>>       ("theta" "θ" nil 0)))
>
> The screenshot shows "t" at the mode-line's left edge, which means
> Emacs decoded the file's contents with raw-text coding-system.
> raw-text interprets all non-ASCII characters as raw bytes, and
> displays them as such, with octal escapes.
>
> The most probable reason for Emacs not to decode the file correctly
> (as UTF-8) is that the file includes some bytes that are invalid UTF-8
> sequences.  What happens if you force UTF-8 with "C-x RET c" before
> visiting the file with "C-x C-f"?
>

All looks fine at first glance, then.

However, re-opening the newly saved buffer repeats the wrong display.

Also, when saving, it always prompts for a coding system and suggests
raw-text first.

Setting buffer-file-coding-system explicitly to utf-8-unix, followed by
a save, doesn't change the wrong display after reopening.

The difference from earlier, so far: it now accepts a save at all.

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 11:19:02 GMT) Full text and rfc822 format available.

Message #23 received at 6974 <at> debbugs.gnu.org (full text, mbox):

From: Lennart Borgman <lennart.borgman <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 6974 <at> debbugs.gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 3 Sep 2010 13:19:42 +0200

On Fri, Sep 3, 2010 at 10:08 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>> From: Lennart Borgman <lennart.borgman <at> gmail.com>
>> Date: Thu, 2 Sep 2010 23:58:04 +0200
>> Cc:
>>
>> * After M-x set-language-environment RET utf-8 RET and then opening a
>> new org file.
>>
>> When choosing the file the char "ä" is shown as \344. It looks the
>> same when inserted in the buffer as an org link to the file.
>
> set-language-environment changes the defaults for various
> coding-systems, including file-name-coding-system that's used for
> decoding file names.  On Windows, you should _never_ have
> file-name-coding-system different from the current codepage, because
> that's the only encoding of file names Emacs can currently support on
> Windows.  (The other one is UTF-16, which is how Windows encodes file
> names on the disk, but Emacs does not yet support that, because such
> support would need to switch all the file APIs to use wide
> characters.)

Using "M-x set-language-environment" was just a way to try to
reproduce the problem. I do not know how to do that otherwise. (I know
very little about coding systems.)

> So the question is: what is your value of file-name-coding-system,
> after you invoke set-language-environment?

It is nil both before and after "M-x set-language-environment".

But something clearly has changed, see what I wrote initially.

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 11:20:03 GMT) Full text and rfc822 format available.

Message #26 received at 6974 <at> debbugs.gnu.org (full text, mbox):

From: Lennart Borgman <lennart.borgman <at> gmail.com>
To: Jason Rumney <jasonr <at> gnu.org>
Cc: 6974 <at> debbugs.gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 3 Sep 2010 13:20:59 +0200

On Fri, Sep 3, 2010 at 3:44 AM, Jason Rumney <jasonr <at> gnu.org> wrote:
> On 03/09/2010 05:58, Lennart Borgman wrote:
>>
>> I have a file whose name contains the Swedish char ä (a with two dots
>> above, 228 in latin-1).
>>
>> I have seen several strange things with this. Here are some I remember:
>>
>>
>> I start from "emacs -Q". In an org-mode file I try to insert a link to
>> this file with "C-c C-l file FILENAME".
>>   * After M-x set-language-environment RET RET (i.e. "English") and then
>> opening a new org file.
>>
>> Works nicely. Examining the char "ä" in the link in the org buffer
>> gives as expected.
>>
>
> OK, but what happens in the default case you tried above, before you started
> messing with the language environment?

It works ok.

>> * After M-x set-language-environment RET utf-8 RET and then opening a
>> new org file.
>>
>
> Of course it is possible to break things by setting your language
> environment inappropriately. But what happens in the default case?

Thanks, but please see my reply to Eli.

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 11:59:02 GMT) Full text and rfc822 format available.

Message #29 received at 6974 <at> debbugs.gnu.org (full text, mbox):

From: Lennart Borgman <lennart.borgman <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 6974 <at> debbugs.gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 3 Sep 2010 13:59:59 +0200

On Fri, Sep 3, 2010 at 1:19 PM, Lennart Borgman
<lennart.borgman <at> gmail.com> wrote:
> On Fri, Sep 3, 2010 at 10:08 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>>> From: Lennart Borgman <lennart.borgman <at> gmail.com>
>>> Date: Thu, 2 Sep 2010 23:58:04 +0200
>>> Cc:
>>>
>>> * After M-x set-language-environment RET utf-8 RET and then opening a
>>> new org file.
>>>
>>> When choosing the file the char "ä" is shown as \344. It looks the
>>> same when inserted in the buffer as an org link to the file.
>>
>> set-language-environment changes the defaults for various
>> coding-systems, including file-name-coding-system that's used for
>> decoding file names.  On Windows, you should _never_ have
>> file-name-coding-system different from the current codepage, because
>> that's the only encoding of file names Emacs can currently support on
>> Windows.  (The other one is UTF-16, which is how Windows encodes file
>> names on the disk, but Emacs does not yet support that, because such
>> support would need to switch all the file APIs to use wide
>> characters.)
>
> Using "M-x set-language-environment" was just a way to try to
> reproduce the problem. I do not know how to do that otherwise. (I know
> very little about coding systems.)
>
>> So the question is: what is your value of file-name-coding-system,
>> after you invoke set-language-environment?
>
> It is nil both before and after "M-x set-language-environment".
>
> But something clearly has changed, see what I wrote initially.


It is default-file-name-coding-system that has changed.

I found that my problem was caused by a left over
current-language-environment (set to UTF-8) in my custom file.

Thanks for the help. Sorry for the noise.

Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Fri, 03 Sep 2010 13:37:02 GMT) Full text and rfc822 format available.

Notification sent to Lennart Borgman <lennart.borgman <at> gmail.com>:
bug acknowledged by developer. (Fri, 03 Sep 2010 13:37:02 GMT) Full text and rfc822 format available.

Message #34 received at 6974-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lennart Borgman <lennart.borgman <at> gmail.com>
Cc: 6974-done <at> debbugs.gnu.org
Subject: Re: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 16:38:42 +0300

> From: Lennart Borgman <lennart.borgman <at> gmail.com>
> Date: Fri, 3 Sep 2010 13:59:59 +0200
> Cc: 6974 <at> debbugs.gnu.org
> 
> It is default-file-name-coding-system that has changed.

Right, that would be my next question.

> Thanks for the help. Sorry for the noise.

Okay, closing the bug.

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 13:49:01 GMT) Full text and rfc822 format available.

Message #37 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 16:49:29 +0300

> Date: Fri, 03 Sep 2010 11:46:00 +0200
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> CC: bug-gnu-emacs <at> gnu.org
> 
> > The most probable reason for Emacs not to decode the file correctly
> > (as UTF-8) is that the file includes some bytes that are invalid UTF-8
> > sequences.  What happens if you force UTF-8 with "C-x RET c" before
> > visiting the file with "C-x C-f"?
> >
> 
> All fine at the first glance than.
> 
> However, re-opening the newly saved buffer repeats the wrong display.

Sure, because the problem that caused Emacs to decode the file as
raw-text is still in the file.

> Also when saving, it always prompts for coding-system, suggests raw-text 
> first.

Expected, since there are problematic characters in the file.  Try
this:

 M-: (unencodable-char-position (point-min) (point-max) 'utf-8) RET

It should show you the first position in the buffer where you have a
character that cannot be encoded by UTF-8.  If all the characters can
be encoded by UTF-8, this will evaluate to nil.

> Setting buffer-file-coding-system explicitly to utf-8-unix, followed by 
> a save, doesn't change the wrong display after new opening.

And it shouldn't.
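
As a rough analogue of what `unencodable-char-position` reports, the sketch below finds the first character in a decoded text that cannot be written back as UTF-8. It is Python, not Emacs Lisp, and rests on an assumption: Python's `surrogateescape` placeholders stand in for the raw bytes Emacs keeps in a buffer, which is only a loose comparison. The helper name `first_unencodable_index` is made up, and its result is 0-based, whereas Emacs buffer positions start at 1.

```python
def first_unencodable_index(text: str, encoding: str = "utf-8"):
    """Return the 0-based index of the first unencodable character, or None."""
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            return index
    return None


# errors="surrogateescape" keeps undecodable bytes as placeholder characters
# instead of raising, so the text can still be inspected afterwards.
text = b'("a" "\xe4")'.decode("utf-8", errors="surrogateescape")

print(first_unencodable_index(text))          # 6
print(first_unencodable_index('("a" "ä")'))   # None
```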

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 16:24:02 GMT) Full text and rfc822 format available.

Message #40 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 18:23:33 +0200

On 03.09.2010 15:49, Eli Zaretskii wrote:
>> Date: Fri, 03 Sep 2010 11:46:00 +0200
>> From: Andreas Röhler<andreas.roehler <at> easy-emacs.de>
>> CC: bug-gnu-emacs <at> gnu.org
>>
>>> The most probable reason for Emacs not to decode the file correctly
>>> (as UTF-8) is that the file includes some bytes that are invalid UTF-8
>>> sequences.  What happens if you force UTF-8 with "C-x RET c" before
>>> visiting the file with "C-x C-f"?
>>>
>>
>> All fine at the first glance than.
>>
>> However, re-opening the newly saved buffer repeats the wrong display.
>
> Sure, because the problem that caused Emacs to decode the file as
> raw-text is still in the file.
>
>> Also when saving, it always prompts for coding-system, suggests raw-text
>> first.
>
> Expected, since there are problematic characters in the file.  Try
> this:
>
>   M-: (unencodable-char-position (point-min) (point-max) 'utf-8) RET
>
> It should show you the first position in the buffer where you have a
> character that cannot be encoded by UTF-8.  If all the characters can
> be encoded by UTF-8, this will evaluate to nil.

With the buffer narrowed to the code from the screenshot, it says:

3993 (#o7631, #xf99)


>
>> Setting buffer-file-coding-system explicitly to utf-8-unix, followed by
>> a save, doesn't change the wrong display after new opening.
>
> And it shouldn't.
>

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 17:59:02 GMT) Full text and rfc822 format available.

Message #43 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 20:59:56 +0300

> Date: Fri, 03 Sep 2010 18:23:33 +0200
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> CC: bug-gnu-emacs <at> gnu.org
> 
> >   M-: (unencodable-char-position (point-min) (point-max) 'utf-8) RET
> >
> > It should show you the first position in the buffer where you have a
> > character that cannot be encoded by UTF-8.  If all the characters can
> > be encoded by UTF-8, this will evaluate to nil.
> 
> With buffer narrowed to the code from screen-shot it says:
> 
> 3993 (#o7631, #xf99)

Well, what is at buffer position 3993?

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 19:33:01 GMT) Full text and rfc822 format available.

Message #46 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Fri, 03 Sep 2010 21:33:20 +0200

On 03.09.2010 19:59, Eli Zaretskii wrote:
>> Date: Fri, 03 Sep 2010 18:23:33 +0200
>> From: Andreas Röhler<andreas.roehler <at> easy-emacs.de>
>> CC: bug-gnu-emacs <at> gnu.org
>>
>>>    M-: (unencodable-char-position (point-min) (point-max) 'utf-8) RET
>>>
>>> It should show you the first position in the buffer where you have a
>>> character that cannot be encoded by UTF-8.  If all the characters can
>>> be encoded by UTF-8, this will evaluate to nil.
>>
>> With buffer narrowed to the code from screen-shot it says:
>>
>> 3993 (#o7631, #xf99)
>
> Well, what is at buffer position 3993?
>

The first char that is not displayed correctly;
it should display the infinity symbol.

If I copy these three numbers here,
it's shown as:
"∞"

That's funny.

In this Emacs buffer it's displayed as "\342\210\236"
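
The three numbers are the octal values of the bytes that UTF-8 uses for the infinity sign, which would explain why pasting them into a UTF-8-aware program shows the glyph. A minimal Python sketch of that conversion follows, offered as an illustration rather than as what Emacs does internally; the variable names are arbitrary.

```python
# The escapes exactly as displayed in the buffer.
escaped = r"\342\210\236"

# Turn each octal escape back into one byte.
raw = bytes(int(part, 8) for part in escaped.split("\\") if part)
print(raw.hex())             # e2889e

# Those three bytes are the UTF-8 encoding of U+221E.
glyph = raw.decode("utf-8")
print(glyph)                 # ∞
print(f"U+{ord(glyph):04X}") # U+221E
```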

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Fri, 03 Sep 2010 21:03:02 GMT) Full text and rfc822 format available.

Message #49 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Sat, 04 Sep 2010 00:05:33 +0300

> Date: Fri, 03 Sep 2010 21:33:20 +0200
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> CC: bug-gnu-emacs <at> gnu.org
> 
> On 03.09.2010 19:59, Eli Zaretskii wrote:
> >> Date: Fri, 03 Sep 2010 18:23:33 +0200
> >> From: Andreas Röhler<andreas.roehler <at> easy-emacs.de>
> >> CC: bug-gnu-emacs <at> gnu.org
> >>
> >>>    M-: (unencodable-char-position (point-min) (point-max) 'utf-8) RET
> >>>
> >>> It should show you the first position in the buffer where you have a
> >>> character that cannot be encoded by UTF-8.  If all the characters can
> >>> be encoded by UTF-8, this will evaluate to nil.
> >>
> >> With buffer narrowed to the code from screen-shot it says:
> >>
> >> 3993 (#o7631, #xf99)
> >
> > Well, what is at buffer position 3993?
> >
> 
> the first char not displayed correctly
> should display the infinity-symbol,

Can you use some kind of hex dump program (e.g., `od') to show what's
in the file at that place?

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Sat, 04 Sep 2010 06:22:01 GMT) Full text and rfc822 format available.

Message #52 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Sat, 04 Sep 2010 08:22:51 +0200

On 03.09.2010 23:05, Eli Zaretskii wrote:
>> Date: Fri, 03 Sep 2010 21:33:20 +0200
>> From: Andreas Röhler<andreas.roehler <at> easy-emacs.de>
>> CC: bug-gnu-emacs <at> gnu.org
>>
>> On 03.09.2010 19:59, Eli Zaretskii wrote:
>>>> Date: Fri, 03 Sep 2010 18:23:33 +0200
>>>> From: Andreas Röhler<andreas.roehler <at> easy-emacs.de>
>>>> CC: bug-gnu-emacs <at> gnu.org
>>>>
>>>>>     M-: (unencodable-char-position (point-min) (point-max) 'utf-8) RET
>>>>>
>>>>> It should show you the first position in the buffer where you have a
>>>>> character that cannot be encoded by UTF-8.  If all the characters can
>>>>> be encoded by UTF-8, this will evaluate to nil.
>>>>
>>>> With buffer narrowed to the code from screen-shot it says:
>>>>
>>>> 3993 (#o7631, #xf99)
>>>
>>> Well, what is at buffer position 3993?
>>>
>>
>> the first char not displayed correctly
>> should display the infinity-symbol,
>
> Can you use some kind of hex dump program (e.g., `od') to show what's
> in the file at that place?
>

0000000 062050 063145 067151 026545 061141 071142 073145 072055
0000020 061141 062554 020012 023440 066147 061157 066141 060455
0000040 061142 062562 026566 060564 066142 005145 020040 024047
0000060 021050 067111 064546 064556 074564 020042 161042 117210
0000100 020042 064556 020154 024460 020012 020040 024040 060442
0000120 070154 060550 020042 147042 021261 067040 066151 031040
0000140 005051 020040 020040 021050 071141 021061 021040 103342
0000160 021222 067040 066151 030040 005051 020040 020040 021050
0000200 071141 021062 021040 103742 021222 067040 066151 030040
0000220 005051 020040 020040 021050 062542 060564 020042 147042
0000240 021262 067040 066151 030440 005051 020040 020040 021050
0000260 060547 066555 021141 021040 131716 020042 064556 020154
0000300 024461 020012 020040 024040 072042 062550 060564 020042
0000320 147042 021270 067040 066151 030040 024451 005051
0000336
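
The dump can be turned back into text to see what the file holds at that place. Below is a minimal Python sketch, assuming the dump is `od`'s default output of 16-bit octal words from a little-endian machine (the i686 GNU/Linux build mentioned earlier in the thread fits that assumption). The function name `od_words_to_bytes` is made up, and only the first four dump lines are included.

```python
import struct

# First four lines of the dump, copied verbatim.
OD_LINES = """\
0000000 062050 063145 067151 026545 061141 071142 073145 072055
0000020 061141 062554 020012 023440 066147 061157 066141 060455
0000040 061142 062562 026566 060564 066142 005145 020040 024047
0000060 021050 067111 064546 064556 074564 020042 161042 117210
"""


def od_words_to_bytes(dump: str) -> bytes:
    """Rebuild the byte stream from octal 16-bit words (assumed little-endian)."""
    out = bytearray()
    for line in dump.splitlines():
        fields = line.split()
        for word in fields[1:]:          # fields[0] is the octal offset
            out += struct.pack("<H", int(word, 8))
    return bytes(out)


data = od_words_to_bytes(OD_LINES)
text = data.decode("utf-8")              # raises if the bytes are not valid UTF-8
print(text)
print(data[-3:].hex())                   # e2889e
```

Under those assumptions this part of the file decodes cleanly, and the infinity sign is stored as the bytes E2 88 9E, which is valid UTF-8. That would suggest that whatever keeps the file from being detected as UTF-8 lies elsewhere in the file.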

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Sat, 04 Sep 2010 06:41:02 GMT) Full text and rfc822 format available.

Message #55 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Sat, 04 Sep 2010 09:43:27 +0300

> Date: Sat, 04 Sep 2010 08:22:51 +0200
> From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
> CC: bug-gnu-emacs <at> gnu.org
> 
> > Can you use some kind of hex dump program (e.g., `od') to show what's
> > in the file at that place?
> >
> 
> 0000000 062050 063145 067151 026545 061141 071142 073145 072055
> 0000020 061141 062554 020012 023440 066147 061157 066141 060455
> 0000040 061142 062562 026566 060564 066142 005145 020040 024047
> 0000060 021050 067111 064546 064556 074564 020042 161042 117210
> 0000100 020042 064556 020154 024460 020012 020040 024040 060442
> 0000120 070154 060550 020042 147042 021261 067040 066151 031040
> 0000140 005051 020040 020040 021050 071141 021061 021040 103342
> 0000160 021222 067040 066151 030040 005051 020040 020040 021050
> 0000200 071141 021062 021040 103742 021222 067040 066151 030040
> 0000220 005051 020040 020040 021050 062542 060564 020042 147042
> 0000240 021262 067040 066151 030440 005051 020040 020040 021050
> 0000260 060547 066555 021141 021040 131716 020042 064556 020154
> 0000300 024461 020012 020040 024040 072042 062550 060564 020042
> 0000320 147042 021270 067040 066151 030040 024451 005051
> 0000336

Please post the file as an attachment.

Information forwarded to owner <at> debbugs.gnu.org, bug-gnu-emacs <at> gnu.org:
bug#6974; Package emacs. (Sat, 04 Sep 2010 07:30:03 GMT) Full text and rfc822 format available.

Message #58 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Röhler <andreas.roehler <at> easy-emacs.de>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#6974: Emacs doesn't like Swedish ä (on w32)
Date: Sat, 04 Sep 2010 09:30:55 +0200

On 04.09.2010 08:43, Eli Zaretskii wrote:
>> Date: Sat, 04 Sep 2010 08:22:51 +0200
>> From: Andreas Röhler<andreas.roehler <at> easy-emacs.de>
>> CC: bug-gnu-emacs <at> gnu.org
>>
>>> Can you use some kind of hex dump program (e.g., `od') to show what's
>>> in the file at that place?
>>>
>>
>> 0000000 062050 063145 067151 026545 061141 071142 073145 072055
>> 0000020 061141 062554 020012 023440 066147 061157 066141 060455
>> 0000040 061142 062562 026566 060564 066142 005145 020040 024047
>> 0000060 021050 067111 064546 064556 074564 020042 161042 117210
>> 0000100 020042 064556 020154 024460 020012 020040 024040 060442
>> 0000120 070154 060550 020042 147042 021261 067040 066151 031040
>> 0000140 005051 020040 020040 021050 071141 021061 021040 103342
>> 0000160 021222 067040 066151 030040 005051 020040 020040 021050
>> 0000200 071141 021062 021040 103742 021222 067040 066151 030040
>> 0000220 005051 020040 020040 021050 062542 060564 020042 147042
>> 0000240 021262 067040 066151 030440 005051 020040 020040 021050
>> 0000260 060547 066555 021141 021040 131716 020042 064556 020154
>> 0000300 024461 020012 020040 024040 072042 062550 060564 020042
>> 0000320 147042 021270 067040 066151 030040 024451 005051
>> 0000336
>
> Please post the file as an attachment.
>

Attached.

BTW, it's my personal notes file, some Emacs issues to remember. So it
doesn't contain really private stuff, even if it was never meant to be
public... :)

The hex dump is from the narrowed section containing the example code.

There are encoding errors at other positions too.

[befehle.txt (text/plain, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 02 Oct 2010 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 14 years and 314 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
