GNU bug report logs - #3607
23.0.94; odd character in fringe.el

Package: emacs;

Reported by: "Drew Adams" <drew.adams <at> oracle.com>

Date: Thu, 18 Jun 2009 18:10:06 UTC

Severity: normal

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 3607 in the body.
You can then email your comments to 3607 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Thu, 18 Jun 2009 18:10:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Drew Adams" <drew.adams <at> oracle.com>:
New bug report received and forwarded. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 18 Jun 2009 18:10:07 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> emacsbugs.donarmstrong.com (full text, mbox):

From: "Drew Adams" <drew.adams <at> oracle.com>
To: <emacs-pretest-bug <at> gnu.org>
Subject: 23.0.94; odd character in fringe.el
Date: Thu, 18 Jun 2009 11:00:52 -0700
I don't claim this is a bug, but perhaps someone could take a look to
be sure.
 
In fringe.el, I see this text: Pavel Jan=c3=adk. The next-to-last
character shows in my Emacs with face `escape-glyph'. This is what
`C-u C-x =' shows:
 
        character: =ad (173, #o255, #xad)
preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
       code point: 0xAD
           syntax: _  which means: symbol
         category: b:Arabic, h:Korean, j:Japanese, l:Latin
      buffer code: #xC2 #xAD
        file code: #xAD (encoded by coding system iso-latin-1-unix)
          display: by this font (glyph code)
    uniscribe:-outline-Courier New-normal-normal-normal-mono-13-*-*-*-c-*-iso8859-1 (#x10)
   hardcoded face: escape-glyph
 
Character code properties: customize what to show
  name: SOFT HYPHEN
  general-category: Cf (Other, Format)
 
There are text properties here:
  charset              iso-8859-1
  face                 font-lock-comment-face
  fontified            t
 
Is this something weird, or is it OK? Since I have customized face
`escape-glyph', I notice this easily, but it is not really noticeable
in emacs -Q. Why would a soft hyphen character be displayed using face
`escape-glyph'?
 
BTW, when trying to send this, Emacs asks if I want to convert
non-ASCII chars to hexadecimal. Dunno which would be more helpful, so
I'll guess yes. The *Help* text quoted should give enough info,
anyway.
 
 
 
In GNU Emacs 23.0.94.1 (i386-mingw-nt5.1.2600)
 of 2009-05-24 on SOFT-MJASON
Windowing system distributor `Microsoft Corp.', version 5.1.2600
configured using `configure --with-gcc (3.4)'
 




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Thu, 18 Jun 2009 18:45:10 GMT) Full text and rfc822 format available.

Acknowledgement sent to Teemu Likonen <tlikonen <at> iki.fi>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 18 Jun 2009 18:45:10 GMT) Full text and rfc822 format available.

Message #10 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Teemu Likonen <tlikonen <at> iki.fi>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 3607 <at> debbugs.gnu.org
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Thu, 18 Jun 2009 21:41:19 +0300
On 2009-06-18 11:00 (-0700), Drew Adams wrote:

> I don't claim this is a bug, but perhaps someone could take a look to
> be sure.
>  
> In fringe.el, I see this text: Pavel Jan=c3=adk. The next-to-last
> character shows in my Emacs with face `escape-glyph'. This is what
> `C-u C-x =' shows:

Perhaps you know most of this already but here's some information
anyway. The name is "Pavel Janík" and it displays just fine in my
system. File fringe.el is UTF-8-encoded.

When encoded in UTF-8 the character í (U+00ED) consists of two bytes,
0xC3 and 0xAD. When some system interprets those bytes as separate
ISO-8859-1-encoded characters they are Ã (0xC3) and a soft hyphen
(0xAD). This is what your system seems to have done:

> preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
>        code point: 0xAD

>   name: SOFT HYPHEN


So it sounds like some kind of singlebyte-multibyte encoding problem.
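
The byte-level mix-up described in this message can be reproduced outside Emacs. The following is a minimal Python sketch, not anything taken from the Emacs sources; it uses only the standard library, and the sample character is the one from the name in fringe.el.

```python
import unicodedata

# "í" is U+00ED; its UTF-8 encoding is the two bytes 0xC3 0xAD.
utf8_bytes = "í".encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))  # c3 ad

# Reading each byte as a separate ISO-8859-1 character yields two characters.
misread = utf8_bytes.decode("iso-8859-1")
names = [unicodedata.name(ch) for ch in misread]
print(names)  # ['LATIN CAPITAL LETTER A WITH TILDE', 'SOFT HYPHEN']
```

The second character is the soft hyphen (code point 0xAD) that `C-u C-x =' reported.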



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Thu, 18 Jun 2009 19:00:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Drew Adams" <drew.adams <at> oracle.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 18 Jun 2009 19:00:04 GMT) Full text and rfc822 format available.

Message #15 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: "Drew Adams" <drew.adams <at> oracle.com>
To: "'Teemu Likonen'" <tlikonen <at> iki.fi>
Cc: <3607 <at> debbugs.gnu.org>
Subject: RE: bug#3607: 23.0.94; odd character in fringe.el
Date: Thu, 18 Jun 2009 11:54:06 -0700
> Perhaps you know most of this already but here's some information
> anyway. The name is "Pavel Janík" and it displays just fine in my
> system. File fringe.el is UTF-8-encoded.
> 
> When encoded in UTF-8 the character í (U+00ED) consists of two bytes,
> 0xC3 and 0xAD. When some system interprets those bytes as separate
> ISO-8859-1-encoded characters they are Ã (0xC3) and a soft hyphen
> (0xAD). This is what your system seems to have done:
> 
> > preferred charset: iso-8859-1 (Latin-1 (ISO/IEC 8859-1))
> >        code point: 0xAD
> >   name: SOFT HYPHEN
> 
> So it sounds like some kind of singlebyte-multibyte encoding problem.

I see. Thanks for the info. I'm pretty ignorant about this stuff.

I see this in emacs -Q (on MS Windows), however, so I wonder if it isn't a bug.

And I wonder how you can see it as being UTF-8 encoded - are you using emacs -Q?

I don't see any local-variable thingy that would specify that the file is to be
UTF-8 encoded.




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Thu, 18 Jun 2009 19:25:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Teemu Likonen <tlikonen <at> iki.fi>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 18 Jun 2009 19:25:05 GMT) Full text and rfc822 format available.

Message #20 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Teemu Likonen <tlikonen <at> iki.fi>
To: "Drew Adams" <drew.adams <at> oracle.com>
Cc: <3607 <at> debbugs.gnu.org>
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Thu, 18 Jun 2009 22:20:15 +0300
On 2009-06-18 11:54 (-0700), Drew Adams wrote:

> I see this in emacs -Q (on MS Windows), however, so I wonder if it
> isn't a bug.
>
> And I wonder how you can see it as being UTF-8 encoded - are you using
> emacs -Q?
>
> I don't see any local-variable thingy that would specify that the file
> is to be UTF-8 encoded.

Yes, the file shows correctly with "emacs -Q lisp/fringe.el" too. And it
is a UTF-8 encoded file.

    $ file lisp/fringe.el
    lisp/fringe.el: UTF-8 Unicode English text

My Debian GNU/Linux system uses UTF-8 locale (as all GNU/Linux systems
do these days). Emacs probably detects my environment and uses correct
encoding settings. But I don't know how Emacs works - except everything
just works. :-)

It really seems that your default environment is something other than
UTF-8, something single-byte.



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Thu, 18 Jun 2009 21:10:08 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Drew Adams" <drew.adams <at> oracle.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 18 Jun 2009 21:10:09 GMT) Full text and rfc822 format available.

Message #25 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: "Drew Adams" <drew.adams <at> oracle.com>
To: "'Teemu Likonen'" <tlikonen <at> iki.fi>
Cc: <3607 <at> debbugs.gnu.org>
Subject: RE: bug#3607: 23.0.94; odd character in fringe.el
Date: Thu, 18 Jun 2009 14:08:14 -0700
> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
> too. And it is a UTF-8 encoded file.
> 
>     $ file lisp/fringe.el
>     lisp/fringe.el: UTF-8 Unicode English text
> 
> My Debian GNU/Linux system uses UTF-8 locale (as all GNU/Linux systems
> do these days). Emacs probably detects my environment and uses correct
> encoding settings. But I don't know how Emacs works - except everything
> just works. :-)
> 
> It really seems that your default environment is something other than
> UTF-8, something single-byte.

OK, thanks for checking.

IMO, if the file should be encoded in UTF-8, then the file itself should control
that - as buff-menu.el does, for instance. The user's locale shouldn't enter
into it at this level. Seems like a bug, to me. (But I'm no expert on this.)
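
The idea that the file itself should declare its encoding can be sketched as follows. This is a simplified Python illustration, not how Emacs implements it: the helper `decode_with_cookie`, the regular expression, and the sample file contents are all hypothetical, only a first-line `-*- coding: NAME -*-` cookie is handled, and NAME is assumed to be a name Python's codecs also understand.

```python
import re

# Hypothetical pattern: a "coding: NAME" entry inside a "-*- ... -*-" cookie.
COOKIE = re.compile(rb"-\*-.*?coding:\s*([-\w.]+).*?-\*-")

def decode_with_cookie(data: bytes, locale_default: str) -> str:
    """Decode data using a first-line coding cookie, else the locale default."""
    first_line = data.split(b"\n", 1)[0]
    match = COOKIE.search(first_line)
    encoding = match.group(1).decode("ascii") if match else locale_default
    return data.decode(encoding)

# Two made-up UTF-8 files, one with a cookie and one without.
with_cookie = ";;; fringe.el  -*- coding: utf-8 -*-\n;; Pavel Janík\n".encode("utf-8")
without_cookie = ";;; fringe.el\n;; Pavel Janík\n".encode("utf-8")

# With the cookie, the Latin-1 locale default is ignored and the name is intact.
print(decode_with_cookie(with_cookie, "iso-8859-1"))
# Without it, the locale default applies and "í" becomes "Ã" plus a soft hyphen.
print(repr(decode_with_cookie(without_cookie, "iso-8859-1")))
```

With a cookie present, the result no longer depends on the user's locale, which is the behaviour asked for here.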




Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Thu, 18 Jun 2009 21:40:08 GMT) Full text and rfc822 format available.

Acknowledgement sent to Lennart Borgman <lennart.borgman <at> gmail.com>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Thu, 18 Jun 2009 21:40:08 GMT) Full text and rfc822 format available.

Message #30 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Lennart Borgman <lennart.borgman <at> gmail.com>
To: Drew Adams <drew.adams <at> oracle.com>, 3607 <at> debbugs.gnu.org
Cc: Teemu Likonen <tlikonen <at> iki.fi>
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Thu, 18 Jun 2009 23:34:08 +0200
On Thu, Jun 18, 2009 at 11:08 PM, Drew Adams<drew.adams <at> oracle.com> wrote:
>> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
>> too. And it is a UTF-8 encoded file.
>>
>>     $ file lisp/fringe.el
>>     lisp/fringe.el: UTF-8 Unicode English text
>>
>> My Debian GNU/Linux system uses UTF-8 locale (as all GNU/Linux systems
>> do these days). Emacs probably detects my environment and uses correct
>> encoding settings. But I don't know how Emacs works - except everything
>> just works. :-)
>>
>> It really seems that your default environment is something other than
>> UTF-8, something single-byte.
>
> OK, thanks for checking.
>
> IMO, if the file should be encoded in UTF-8, then the file itself should control
> that - as buff-menu.el does, for instance. The user's locale shouldn't enter
> into it at this level. Seems like a bug, to me. (But I'm no expert on this.)


Yes, that must be a bug.



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Fri, 19 Jun 2009 00:50:04 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Fri, 19 Jun 2009 00:50:04 GMT) Full text and rfc822 format available.

Message #35 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Lennart Borgman <lennart.borgman <at> gmail.com>,
        3607 <at> debbugs.gnu.org
Cc: drew.adams <at> oracle.com, 3607 <at> debbugs.gnu.org, tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Fri, 19 Jun 2009 09:47:28 +0900
I've just added "coding: utf-8" cookie to fringe.el.

---
Kenichi Handa
handa <at> m17n.org


In article <e01d8a50906181434t6e6c296ega704dc11fbb77b31 <at> mail.gmail.com>, Lennart Borgman <lennart.borgman <at> gmail.com> writes:

> On Thu, Jun 18, 2009 at 11:08 PM, Drew Adams<drew.adams <at> oracle.com> wrote:
>>> Yes, the file shows correctly with "emacs -Q lisp/fringe.el"
>>> too. And it is a UTF-8 encoded file.
>>> 
>>>     $ file lisp/fringe.el
>>>     lisp/fringe.el: UTF-8 Unicode English text
>>> 
>>> My Debian GNU/Linux system uses UTF-8 locale (as all GNU/Linux systems
>>> do these days). Emacs probably detects my environment and uses correct
>>> encoding settings. But I don't know how Emacs works - except everything
>>> just works. :-)
>>> 
>>> It really seems that your default environment is something other than
>>> UTF-8, something single-byte.
> >
> > OK, thanks for checking.
> >
> > IMO, if the file should be encoded in UTF-8, then the file itself should control
> > that - as buff-menu.el does, for instance. The user's locale shouldn't enter
> > into it at this level. Seems like a bug, to me. (But I'm no expert on this.)


> Yes, that must be a bug.



bug closed, send any further explanations to "Drew Adams" <drew.adams <at> oracle.com> Request was from Glenn Morris <rgm <at> gnu.org> to control <at> emacsbugs.donarmstrong.com. (Fri, 19 Jun 2009 18:50:08 GMT) Full text and rfc822 format available.

Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Sat, 27 Jun 2009 01:15:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Sat, 27 Jun 2009 01:15:07 GMT) Full text and rfc822 format available.

Message #42 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 3607 <at> debbugs.gnu.org,
        Lennart Borgman <lennart.borgman <at> gmail.com>, tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Sat, 27 Jun 2009 03:07:23 +0200
> I've just added "coding: utf-8" cookie to fringe.el.

Thanks.  But there is another bug here: a utf-8 file should be
recognized as such even in a latin-1 locale (i.e. utf-8 should always
have (one of) the highest priority).  IIUC this is done right in
GNU/Linux but not under Windows.


        Stefan



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Sat, 27 Jun 2009 01:30:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Sat, 27 Jun 2009 01:30:06 GMT) Full text and rfc822 format available.

Message #47 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 3607 <at> debbugs.gnu.org, lennart.borgman <at> gmail.com,
        tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Sat, 27 Jun 2009 10:25:47 +0900
In article <jwveit6iizt.fsf-monnier+emacsbugreports <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> > I've just added "coding: utf-8" cookie to fringe.el.

> Thanks.  But there is another bug here: a utf-8 file should be
> recognized as such even in a latin-1 locale (i.e. utf-8 should always
> have (one of) the highest priority).

Current Emacs doesn't give utf-8 higher priority than
iso-8859-1 in Latin-X language environments.  Are you
proposing such a change?  I can't decide whether that is good
or not because I'm not that familiar with such locales.

> IIUC this is done right in GNU/Linux but not under
> Windows.

?? Even on GNU/Linux (ubuntu), when I start emacs as this:

% LANG=de_DE emacs

iso-8859-1 has higher priority than utf-8.

Or, do you mean the other applications on GNU/Linux?

---
Kenichi Handa
handa <at> m17n.org



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Sat, 27 Jun 2009 21:50:05 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Sat, 27 Jun 2009 21:50:05 GMT) Full text and rfc822 format available.

Message #52 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 3607 <at> debbugs.gnu.org, lennart.borgman <at> gmail.com,
        tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Sat, 27 Jun 2009 23:44:44 +0200
> Current Emacs doesn't give utf-8 higher priority than
> iso-8859-1 in Latin-X language environments.  Are you
> proposing such a change?  I can't decide whether that is good
> or not because I'm not that familiar with such locales.

It is a good change, because the likelihood of a valid utf-8 file being
a proper latin-1 file is extremely low.

> ?? Even on GNU/Linux (ubuntu), when I start emacs as this:
> % LANG=de_DE emacs
> iso-8859-1 has higher priority than utf-8.

Duh, you're right.  Even Emacs-22 still does that.  It's wrong.

> Or, do you mean the other applications on GNU/Linux?

No, I meant Emacs, I was convinced we'd fixed it in Emacs-22 for
GNU/Linux, but it appears I was confused.


        Stefan
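
Why the order of the priority list matters can be sketched in a few lines of Python. This is only a toy model under a stated assumption: detection is reduced to "take the first candidate that decodes without error", which is much cruder than what Emacs does. The function `detect` and the sample data are hypothetical.

```python
from typing import Sequence

def detect(data: bytes, priority: Sequence[str]) -> str:
    """Return the first encoding in priority that decodes data without error."""
    for encoding in priority:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("no candidate encoding fits")

utf8_file = "Pavel Janík".encode("utf-8")
latin1_file = "Pavel Janík".encode("iso-8859-1")

# Latin-1 first: every byte string is valid Latin-1, so UTF-8 is never tried.
print(detect(utf8_file, ["iso-8859-1", "utf-8"]))    # iso-8859-1
# UTF-8 first: the UTF-8 file is recognized as such ...
print(detect(utf8_file, ["utf-8", "iso-8859-1"]))    # utf-8
# ... and the Latin-1 file is rejected as UTF-8 and still detected correctly.
print(detect(latin1_file, ["utf-8", "iso-8859-1"]))  # iso-8859-1
```

In this model, putting utf-8 first costs nothing for the Latin-1 sample, because the lone byte 0xED followed by an ASCII letter is not a valid UTF-8 sequence.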



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Mon, 29 Jun 2009 07:55:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Mon, 29 Jun 2009 07:55:07 GMT) Full text and rfc822 format available.

Message #57 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 3607 <at> debbugs.gnu.org, lennart.borgman <at> gmail.com,
        tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Mon, 29 Jun 2009 16:49:32 +0900
In article <jwvbpo9gxsu.fsf-monnier+emacsbugreports <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> > Current Emacs doesn't give utf-8 higher priority than
> > iso-8859-1 in Latin-X language environments.  Are you
> > proposing such a change?  I can't decide whether that is good
> > or not because I'm not that familiar with such locales.

> It is a good change, because the likelihood of a valid utf-8 file being
> a proper latin-1 file is extremely low.

Ok.  For that, we must do:

  (set-coding-system-priority 'utf-8) 

somewhere.  I at first thought it could be done by
`setup-function' of Latin-1 language environment.  Actually,
when a user does C-x C-m L Latin-1 RET, it works.

But, when emacs starts up, it calls set-locale-environment,
and it at first calls set-language-environment then
overrides coding-system setups.  So, at the moment, I don't
have a good idea other than this very ad-hoc change for 23.1.

--- mule-cmds.el.~1.360.~	2009-04-09 03:03:17.000000000 +0900
+++ mule-cmds.el	2009-06-29 16:45:08.000000000 +0900
@@ -2643,6 +2643,10 @@
 		   (not (coding-system-equal coding-system
 					     locale-coding-system)))
 	  (prefer-coding-system coding-system)
+	  ;; Even if we prefer "iso-latin-1", it is better to detect
+	  ;; UTF-8.
+	  (if (eq (coding-system-base coding-system) 'iso-latin-1)
+	      (set-coding-system-priority 'utf-8))
 	  ;; Fixme: perhaps prefer-coding-system should set this too.
 	  ;; But it's not the time to do such a fundamental change.
 	  (setq default-sendmail-coding-system coding-system)

For 23.2, I think we should re-design language-info-alist.

---
Kenichi Handa
handa <at> m17n.org



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Mon, 29 Jun 2009 09:00:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Mon, 29 Jun 2009 09:00:03 GMT) Full text and rfc822 format available.

Message #62 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Kenichi Handa <handa <at> m17n.org>
Cc: 3607 <at> debbugs.gnu.org, lennart.borgman <at> gmail.com,
        tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Mon, 29 Jun 2009 10:52:20 +0200
>> It is a good change, because the likelihood of a valid utf-8 file being
>> a proper latin-1 file is extremely low.

> Ok.  For that, we must do:
>   (set-coding-system-priority 'utf-8) 
> somewhere.  I at first thought it could be done by
> `setup-function' of Latin-1 language environment.  Actually,
> when a user does C-x C-m L Latin-1 RET, it works.

> But, when emacs starts up, it calls set-locale-environment,
> and it at first calls set-language-environment then
> overrides coding-system setups.  So, at the moment, I don't
> have a good idea other than this very ad-hoc change for 23.1.

Actually, AFAIK the "unlikely false positives" property of the utf-8
encoding is not only true when applied to latin-1 files but also to most
other encodings.  So really utf-8 should probably always be first (not
only for latin-1 environments), except maybe for some envs where there's
a known non-negligible risk of false positives.


        Stefan



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Mon, 29 Jun 2009 11:45:08 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kenichi Handa <handa <at> m17n.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Mon, 29 Jun 2009 11:45:08 GMT) Full text and rfc822 format available.

Message #67 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Kenichi Handa <handa <at> m17n.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 3607 <at> debbugs.gnu.org, lennart.borgman <at> gmail.com,
        tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Mon, 29 Jun 2009 20:39:34 +0900
In article <jwv63ef4e6w.fsf-monnier+emacsbugreports <at> gnu.org>, Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> Actually, AFAIK the "unlikely false positives" property of the utf-8
> encoding is not only true when applied to latin-1 files but also to most
> other encodings.  So really utf-8 should probably always be first (not
> only for latin-1 environments), except maybe for some envs where there's
> a known non-negligible risk of false positives.

I think it's only in Latin-X (and perhaps Vietnamese too) environments
that it is mostly safe to give utf-8 higher priority in code
detection, because only they use Latin script, in which 8-bit
characters rarely appear in succession.

---
Kenichi Handa
handa <at> m17n.org
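
The concern about encodings in which 8-bit bytes routinely appear in succession can be illustrated with a small Python sketch. It is an illustration only: EUC-KR is picked here as an assumed example of such an encoding (the thread does not name one), and a single two-byte sequence says nothing about how often the ambiguity occurs in real files.

```python
# The same two bytes, read under three different encodings.
data = b"\xc3\xad"

as_utf8 = data.decode("utf-8")         # one character: "í"
as_euc_kr = data.decode("euc_kr")      # one character: a Hangul syllable
as_latin1 = data.decode("iso-8859-1")  # two characters: "Ã" and a soft hyphen

for label, text in [("utf-8", as_utf8), ("euc-kr", as_euc_kr), ("latin-1", as_latin1)]:
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

In EUC-KR the bytes form one ordinary character, so the data is plausible text under both EUC-KR and UTF-8, and a validity check alone cannot tell them apart. Under Latin-1 the bytes decode to an accented capital followed by a soft hyphen, a pair that rarely occurs in real text.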



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Mon, 29 Jun 2009 18:25:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to Eli Zaretskii <eliz <at> gnu.org>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Mon, 29 Jun 2009 18:25:06 GMT) Full text and rfc822 format available.

Message #72 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kenichi Handa <handa <at> m17n.org>, 3607 <at> debbugs.gnu.org
Cc: monnier <at> iro.umontreal.ca, tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Mon, 29 Jun 2009 21:16:32 +0300
> From: Kenichi Handa <handa <at> m17n.org>
> Date: Mon, 29 Jun 2009 16:49:32 +0900
> Cc: tlikonen <at> iki.fi, 3607 <at> emacsbugs.donarmstrong.com
> 
> But, when emacs starts up, it calls set-locale-environment,
> and it at first calls set-language-environment then
> overrides coding-system setups.  So, at the moment, I don't
> have a good idea other than this very ad-hoc change for 23.1.

PLEEEEAAAAASE do _not_ make such ad-hoc changes on the branch at this time.
Experience shows that there be dragons, and we _do_ want to release
Emacs 23.1 some time this year...

> --- mule-cmds.el.~1.360.~	2009-04-09 03:03:17.000000000 +0900
> +++ mule-cmds.el	2009-06-29 16:45:08.000000000 +0900
> @@ -2643,6 +2643,10 @@
>  		   (not (coding-system-equal coding-system
>  					     locale-coding-system)))
>  	  (prefer-coding-system coding-system)
> +	  ;; Even if we prefer "iso-latin-1", it is better to detect
> +	  ;; UTF-8.
> +	  (if (eq (coding-system-base coding-system) 'iso-latin-1)
> +	      (set-coding-system-priority 'utf-8))
>  	  ;; Fixme: perhaps prefer-coding-system should set this too.
>  	  ;; But it's not the time to do such a fundamental change.
>  	  (setq default-sendmail-coding-system coding-system)
> 
> For 23.2, I think we should re-design language-info-alist.

Then let's defer the whole thing to Emacs 23.2.  It's not a grave
problem, IMO, certainly not worth taking a risk of unintended
consequences.



Information forwarded to bug-submit-list <at> lists.donarmstrong.com, Emacs Bugs <bug-gnu-emacs <at> gnu.org>:
bug#3607; Package emacs. (Mon, 29 Jun 2009 20:55:06 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stefan Monnier <monnier <at> iro.umontreal.ca>:
Extra info received and forwarded to list. Copy sent to Emacs Bugs <bug-gnu-emacs <at> gnu.org>. (Mon, 29 Jun 2009 20:55:07 GMT) Full text and rfc822 format available.

Message #77 received at 3607 <at> emacsbugs.donarmstrong.com (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Kenichi Handa <handa <at> m17n.org>, 3607 <at> debbugs.gnu.org,
        tlikonen <at> iki.fi
Subject: Re: bug#3607: 23.0.94; odd character in fringe.el
Date: Mon, 29 Jun 2009 22:48:00 +0200
>> But, when emacs starts up, it calls set-locale-environment,
>> and it at first calls set-language-environment then
>> overrides coding-system setups.  So, at the moment, I don't
>> have a good idea other than this very ad-hoc change for 23.1.

> PLEEEEAAAAASE do _not_ make such ad-hoc changes on the branch at this time.
> Experience shows that there be dragons, and we _do_ want to release
> Emacs 23.1 some time this year...

Indeed, those things should go on the trunk only.


        Stefan



bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> emacsbugs.donarmstrong.com. (Tue, 28 Jul 2009 14:24:21 GMT) Full text and rfc822 format available.

This bug report was last modified 15 years and 334 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.