GNU bug report logs - #12008
Alphabetic sorting respect user's language and/or locale


Package: emacs;

Reported by: martin rudalics <rudalics <at> gmx.at>

Date: Sat, 21 Jul 2012 14:39:02 UTC

Severity: wishlist

Merged with 2263

Fixed in version 25.1

Done: Glenn Morris <rgm <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 12008 in the body.
You can then email your comments to 12008 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Sat, 21 Jul 2012 14:39:02 GMT)

Acknowledgement sent to martin rudalics <rudalics <at> gmx.at>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 21 Jul 2012 14:39:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: martin rudalics <rudalics <at> gmx.at>
To: Bug-Gnu-Emacs <bug-gnu-emacs <at> gnu.org>
Subject: Alphabetic sorting respect user's language and/or locale
Date: Sat, 21 Jul 2012 16:32:50 +0200
Currently, specifying alphabetic order for output produced by functions
like `dired' and `sort-subr' makes that output appear in ASCII-code
order.  This means that such output deviates from the order expected by
users of Latin-derived alphabets like French, German or Spanish and
can make working with these functions very awkward.

Please consider adding a predicate which makes it possible to produce
such output in alphabetic order respecting the language and/or locale
of the user.

Thank you, martin




Severity set to 'wishlist' from 'normal' Request was from martin rudalics <rudalics <at> gmx.at> to control <at> debbugs.gnu.org. (Sat, 21 Jul 2012 14:43:02 GMT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Sat, 21 Jul 2012 16:08:02 GMT)

Message #10 received at 12008 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: 12008 <at> debbugs.gnu.org
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Sat, 21 Jul 2012 19:01:11 +0300
> Date: Sat, 21 Jul 2012 16:32:50 +0200
> From: martin rudalics <rudalics <at> gmx.at>
> 
> Currently, specifying alphabetic order for output produced by functions
> like `dired' and `sort-subr' makes that output appear in ASCII-code
> order.  This means that such output deviates from the order expected by
> users of Latin-derived alphabets like French, German or Spanish and
> can make working with these functions very awkward.
> 
> Please consider adding a predicate which makes it possible to produce
> such output in alphabetic order respecting the language and/or locale
> of the user.

A simple way of doing this goes along the following lines:

  Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
  Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);

  return make_number (strcoll (enc_str1, enc_str2));

However, there are 2 potential issues with this:

 . do typical libc implementations of strcoll handle multibyte
   characters correctly, if ENCODE_SYSTEM happens to produce multibyte
   encoding, such as UTF-8?

 . is the above efficient enough, when ENCODE_SYSTEM is not a no-op
   (which it is for UTF-8 locales)?

I don't know the answer to these, mainly to the first.  (The
MS-Windows implementation is claimed to handle multibyte strings.)
Anyone?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Sat, 21 Jul 2012 16:45:02 GMT)

Message #13 received at 12008 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: rudalics <at> gmx.at
Cc: 12008 <at> debbugs.gnu.org
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Sat, 21 Jul 2012 19:38:21 +0300
> Date: Sat, 21 Jul 2012 19:01:11 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 12008 <at> debbugs.gnu.org
> 
>   Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
>   Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
> 
>   return make_number (strcoll (enc_str1, enc_str2));

Err... make that

   return make_number (strcoll (SDATA (enc_str1), SDATA (enc_str2)));

Sorry.
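
Outside the Emacs sources, the corrected call can be sketched in plain C.
This is only an illustration under stated assumptions: `collate_cmp' and
`sort_collated' are hypothetical names, the strings are assumed to be
already encoded for the active locale (the job ENCODE_SYSTEM does above),
and the caller is assumed to have selected a collation locale with
setlocale beforehand.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator (hypothetical name): compare two C strings in the
   collation order of the current LC_COLLATE locale.  The array
   elements are of type `const char *'.  */
int
collate_cmp (const void *a, const void *b)
{
  const char *s1 = *(const char *const *) a;
  const char *s2 = *(const char *const *) b;

  assert (s1 != NULL && s2 != NULL);
  return strcoll (s1, s2);
}

/* Sort the N strings in STRINGS in place, in collation order.  */
void
sort_collated (const char **strings, size_t n)
{
  qsort (strings, n, sizeof *strings, collate_cmp);
}
```

In the default "C" locale strcoll orders strings like strcmp, so the
result only differs from byte order once a locale such as the user's own
has been selected.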




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Sun, 22 Jul 2012 10:32:01 GMT)

Message #16 received at 12008 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: martin rudalics <rudalics <at> gmx.at>, 12008 <at> debbugs.gnu.org
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Sun, 22 Jul 2012 06:24:31 -0400
> A simple way of doing this goes along the following lines:
>   Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
>   Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
>   return make_number (strcoll (enc_str1, enc_str2));

That's probably OK for dired's sorting but not for sort-subr where we
need to be independent from the system locale.  So better would be to
switch the locale to utf-8 and call strcoll without calling
ENCODE_SYSTEM (tho of course, only if the strings are multibyte).


        Stefan
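
A rough sketch of the "switch the locale, then call strcoll" idea in
standalone C follows.  It is not Emacs code: `strcoll_in_locale' is a
hypothetical helper, and whether a name such as "fr_FR.UTF-8" is
accepted depends on the locales installed on the system.  The strings
are assumed to be encoded as that locale expects.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Compare S1 and S2 with strcoll under LOCALE_NAME, then restore the
   previous LC_COLLATE setting.  Store the strcoll result in *RESULT
   and return 0; return -1 if the locale cannot be selected.  */
int
strcoll_in_locale (const char *locale_name, const char *s1,
                   const char *s2, int *result)
{
  assert (locale_name && s1 && s2 && result);

  /* setlocale returns storage that a later call may overwrite,
     so keep a private copy of the current setting.  */
  const char *current = setlocale (LC_COLLATE, NULL);
  if (current == NULL)
    return -1;
  char *saved = malloc (strlen (current) + 1);
  if (saved == NULL)
    return -1;
  strcpy (saved, current);

  if (setlocale (LC_COLLATE, locale_name) == NULL)
    {
      free (saved);
      return -1;
    }

  *result = strcoll (s1, s2);

  setlocale (LC_COLLATE, saved);   /* Put the old setting back.  */
  free (saved);
  return 0;
}
```

Note that setlocale changes process-wide state, so a helper like this is
not safe to call while other threads depend on the locale.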




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Sun, 22 Jul 2012 15:32:01 GMT)

Message #19 received at 12008 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: rudalics <at> gmx.at, 12008 <at> debbugs.gnu.org
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Sun, 22 Jul 2012 18:25:20 +0300
> From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> Cc: martin rudalics <rudalics <at> gmx.at>, 12008 <at> debbugs.gnu.org
> Date: Sun, 22 Jul 2012 06:24:31 -0400
> 
> > A simple way of doing this goes along the following lines:
> >   Lisp_Object enc_str1 = ENCODE_SYSTEM (string1);
> >   Lisp_Object enc_str2 = ENCODE_SYSTEM (string2);
> >   return make_number (strcoll (enc_str1, enc_str2));
> 
> That's probably OK for dired's sorting but not for sort-subr where we
> need to be independent from the system locale.

How do you mean "independent of the system locale"?  We already have
locale-independent string comparison: compare-strings, string<, etc.
By contrast, sorting strings in collation order is AFAIK inherently
locale-specific.  Or am I missing something?

> So better would be to switch the locale to utf-8 and call strcoll
> without calling ENCODE_SYSTEM (tho of course, only if the strings
> are multibyte).

How will this be different from string< etc., that we already have?

(And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
'ls' uses 'strcoll' to sort file names.  Only on MS-Windows, where we
use ls-lisp.el, do we need to collate in Lisp as part of Dired.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Mon, 23 Jul 2012 09:04:01 GMT)

Message #22 received at 12008 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rudalics <at> gmx.at, 12008 <at> debbugs.gnu.org
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Mon, 23 Jul 2012 04:57:14 -0400
>> So better would be to switch the locale to utf-8 and call strcoll
>> without calling ENCODE_SYSTEM (tho of course, only if the strings
>> are multibyte).
> How will this be different from string< etc., that we already have?

I'd expect strcoll in a utf-8 locale to sort e é è ê and such
close together.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Mon, 23 Jul 2012 09:42:02 GMT)

Message #25 received at 12008 <at> debbugs.gnu.org:

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12008 <at> debbugs.gnu.org, Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Mon, 23 Jul 2012 11:34:53 +0200
> How do you mean "independent of the system locale"?  We already have
> locale-independent string comparison: compare-strings, string<, etc.
> By contrast, sorting strings in collation order is AFAIK inherently
> locale-specific.  Or am I missing something?

Ideally, it should be possible to specify a locale-independent behavior.
But using the locale-specific one would be already a great improvement
for me.

> (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
> 'ls' uses 'strcoll' to sort file names.

I suppose this doesn't hold for the `ls' coming with GnuWin32.

> Only on MS-Windows, where we
> use ls-lisp.el, do we need to collate in Lisp as part of Dired.)

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Mon, 23 Jul 2012 15:42:01 GMT)

Message #28 received at 12008 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
Cc: rudalics <at> gmx.at, 12008 <at> debbugs.gnu.org
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Mon, 23 Jul 2012 18:34:59 +0300
> From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
> Cc: rudalics <at> gmx.at, 12008 <at> debbugs.gnu.org
> Date: Mon, 23 Jul 2012 04:57:14 -0400
> 
> >> So better would be to switch the locale to utf-8 and call strcoll
> >> without calling ENCODE_SYSTEM (tho of course, only if the strings
> >> are multibyte).
> > How will this be different from string< etc., that we already have?
> 
> I'd expect strcoll in a utf-8 locale to sort e é è ê and such
> close together.

Does the encoding really matter for strcoll?  That is, won't
de_DE.UTF-8 and de_DE.iso8859-1 produce the same collation order for
the same characters?

If the encoding doesn't matter, then why do you keep mentioning utf-8
locales in this context?  The request, as I understood it, was to be
able to sort arbitrary strings in the locale-dependent collating
order.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Mon, 23 Jul 2012 15:46:02 GMT)

Message #31 received at 12008 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: 12008 <at> debbugs.gnu.org, monnier <at> IRO.UMontreal.CA
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Mon, 23 Jul 2012 18:38:19 +0300
> Date: Mon, 23 Jul 2012 11:34:53 +0200
> From: martin rudalics <rudalics <at> gmx.at>
> CC: Stefan Monnier <monnier <at> IRO.UMontreal.CA>, 12008 <at> debbugs.gnu.org
> 
>  > How do you mean "independent of the system locale"?  We already have
>  > locale-independent string comparison: compare-strings, string<, etc.
>  > By contrast, sorting strings in collation order is AFAIK inherently
>  > locale-specific.  Or am I missing something?
> 
> Ideally, it should be possible to specify a locale-independent behavior.

I'm confused: what is "locale-independent behavior" in this context?
Do you mean the ability to request a collation specific for the German
locale when your current system locale is something else?  If so, this
is not locale-independent behavior as I understand it.  If you mean
something else, please elaborate.

>  > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
>  > 'ls' uses 'strcoll' to sort file names.
> 
> I suppose this doesn't hold for the `ls' coming with GnuWin32.

Why do you think so?  The code calls strcoll even on Windows.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Mon, 23 Jul 2012 15:56:02 GMT)

Message #34 received at 12008 <at> debbugs.gnu.org:

From: martin rudalics <rudalics <at> gmx.at>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12008 <at> debbugs.gnu.org, monnier <at> IRO.UMontreal.CA
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Mon, 23 Jul 2012 17:49:49 +0200
>> Ideally, it should be possible to specify a locale-independent behavior.
>
> I'm confused: what is "locale-independent behavior" in this context?
> Do you mean the ability to request a collation specific for the German
> locale when your current system locale is something else?

Yes.
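
On systems that implement POSIX.1-2008, requesting the collation of one
specific locale without touching the process-wide locale could look
roughly like the sketch below.  The assumptions: newlocale, strcoll_l
and freelocale are available (on some systems strcoll_l is declared in
<xlocale.h> instead), `strcoll_with' is a hypothetical name, and the
named locale, e.g. "de_DE.UTF-8", must be installed for the call to
succeed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Compare S1 and S2 in the collation order of LOCALE_NAME without
   changing the global locale.  Store the comparison result in *RESULT
   and return 0; return -1 if the locale object cannot be created.  */
int
strcoll_with (const char *locale_name, const char *s1,
              const char *s2, int *result)
{
  assert (locale_name && s1 && s2 && result);

  /* Build a locale object that carries only the collation category.  */
  locale_t loc = newlocale (LC_COLLATE_MASK, locale_name, (locale_t) 0);
  if (loc == (locale_t) 0)
    return -1;

  *result = strcoll_l (s1, s2, loc);
  freelocale (loc);
  return 0;
}
```

Because the locale is passed explicitly, the system locale stays as it
was, which is the behavior asked about here.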

> If so, this
> is not locale-independent behavior as I understand it.  If you mean
> something else, please elaborate.
>
>>  > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
>>  > 'ls' uses 'strcoll' to sort file names.
>>
>> I suppose this doesn't hold for the `ls' coming with GnuWin32.
>
> Why do you think so?  The code calls strcoll even on Windows.

But how do I request it when calling `ls'?  Or, maybe better, what can I
do to have `ls' respect my locale?

martin




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Mon, 23 Jul 2012 16:03:02 GMT)

Message #37 received at 12008 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: martin rudalics <rudalics <at> gmx.at>
Cc: 12008 <at> debbugs.gnu.org, monnier <at> IRO.UMontreal.CA
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Mon, 23 Jul 2012 18:55:24 +0300
> Date: Mon, 23 Jul 2012 17:49:49 +0200
> From: martin rudalics <rudalics <at> gmx.at>
> CC: monnier <at> IRO.UMontreal.CA, 12008 <at> debbugs.gnu.org
> 
>  >>  > (And btw, AFAIK Dired doesn't sort, it relies on 'ls' to do so, and
>  >>  > 'ls' uses 'strcoll' to sort file names.
>  >>
>  >> I suppose this doesn't hold for the `ls' coming with GnuWin32.
>  >
>  > Why do you think so?  The code calls strcoll even on Windows.
> 
> But how do I request it when calling `ls'?

It works automagically.  Or, shall I say, "should work" (keeping in
mind how buggy GnuWin32 ports are).

On Unix, you can set LC_COLLATE in the environment to control the
collation order, but I don't think it works on Windows.

> Or, maybe better, what can I do to have `ls' respect my locale?

Barring any bugs, it should do so already.
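
For the Unix case, the way the environment selects the collation locale
can be sketched as follows.  The lookup order (LC_ALL, then the category
variable such as LC_COLLATE, then LANG, with empty values treated as
unset) is the one POSIX specifies; `effective_locale_setting' is a
hypothetical helper, and the "C" fallback is an assumption, since POSIX
leaves the default implementation-defined.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the locale name the environment selects for the category
   whose variable is CATEGORY_VAR (for example "LC_COLLATE").  */
const char *
effective_locale_setting (const char *category_var)
{
  assert (category_var != NULL);

  /* POSIX precedence: LC_ALL overrides the category variable, which
     in turn overrides LANG.  */
  const char *vars[] = { "LC_ALL", category_var, "LANG" };

  for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);
      if (value != NULL && strlen (value) > 0)
        return value;
    }

  /* Assumed fallback; the real default is implementation-defined.  */
  return "C";
}
```

A program that calls setlocale (LC_COLLATE, "") gets the locale named
this way, provided that locale exists on the system.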




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12008; Package emacs. (Mon, 23 Jul 2012 23:38:02 GMT)

Message #40 received at 12008 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> IRO.UMontreal.CA>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: rudalics <at> gmx.at, 12008 <at> debbugs.gnu.org
Subject: Re: bug#12008: Alphabetic sorting respect user's language and/or
	locale
Date: Mon, 23 Jul 2012 19:30:18 -0400
> Does the encoding really matter for strcoll?  That is, won't
> de_DE.UTF-8 and de_DE.iso8859-1 produce the same collation order for
> the same characters?

The encoding matters because de_DE.iso8859-1's strcoll won't work right
if your strings include λ, Π, τ, ⊢, →, ↦, ≡, ...


        Stefan




Merged 2263 12008. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 29 Jul 2012 23:01:02 GMT)

bug marked as fixed in version 24.5, send any further explanations to 2263 <at> debbugs.gnu.org and Glenn Morris <rgm <at> gnu.org> Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Tue, 26 Aug 2014 18:07:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 24 Sep 2014 11:24:03 GMT)

bug unarchived. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 04 Oct 2014 16:36:02 GMT)

bug Marked as fixed in versions 25.1. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 04 Oct 2014 16:36:02 GMT)

bug No longer marked as fixed in versions 24.5. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Sat, 04 Oct 2014 16:36:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 02 Nov 2014 12:24:04 GMT)

This bug report was last modified 10 years and 290 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.