GNU bug report logs - #18051
24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?


Package: emacs;

Reported by: michael_heerdegen <at> web.de

Date: Fri, 18 Jul 2014 06:24:01 UTC

Severity: wishlist

Found in version 24.3.92

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.


From: Eli Zaretskii <eliz <at> gnu.org>
To: Michael Albinus <michael.albinus <at> gmx.de>
Cc: dmantipov <at> yandex.ru, 18051 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
Date: Mon, 25 Aug 2014 18:03:32 +0300
> From: Michael Albinus <michael.albinus <at> gmx.de>
> Date: Mon, 25 Aug 2014 08:41:03 +0200
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18051 <at> debbugs.gnu.org
> 
> > BTW, I think that collation functions with 3rd optional argument
> > to specify locale settings will be a bit more versatile, e.g.
> >
> > (string-collate-lessp a b "es_ES.UTF-8")
>
> We discussed this already; see
> <http://lists.gnu.org/archive/html/bug-gnu-emacs/2014-08/msg00623.html>
>
> My major reservation about this approach is that it doesn't fit well with
> using string-collate-lessp as the predicate of sort. That's why I have
> proposed a global variable as an alternative, which could be let-bound.

I think that binding a variable will indeed be cleaner.  Using
process-environment for that purpose should be reserved for the
application level.  Also, what if LC_COLLATE is not set in the
environment, but 'setlocale' does return some value for it?  Shouldn't
we use that?
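
To make that last question concrete, here is a rough C sketch of the
fallback I have in mind.  The function name effective_collate_locale is
made up for illustration and is not in the Emacs sources; the sketch
also ignores LC_ALL and LANG, which a real implementation would have to
consider.  It relies only on the fact that setlocale (LC_COLLATE, NULL)
queries the current setting without changing it.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Emacs): copy into BUF the name of
   the collation locale we would use.  Prefer LC_COLLATE from the
   environment; if it is unset or empty, fall back to whatever
   setlocale reports for LC_COLLATE.  Return BUF, or NULL on failure.
   Assumption: LC_ALL and LANG are deliberately ignored here.  */
const char *
effective_collate_locale (char *buf, size_t buflen)
{
  const char *name = getenv ("LC_COLLATE");

  if (name == NULL || *name == '\0')
    /* A NULL second argument only queries; it does not modify.  */
    name = setlocale (LC_COLLATE, NULL);

  if (name == NULL || buflen == 0)
    return NULL;

  /* setlocale may return a pointer to static storage, so copy it.  */
  if (snprintf (buf, buflen, "%s", name) >= (int) buflen)
    return NULL;		/* Name did not fit.  */
  return buf;
}
```

A caller would pass its own buffer, e.g. a char array of 64 bytes, and
treat a NULL result as "no usable locale name".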

Here are a few more thoughts about related issues:

1. Why does str_collate return a ptrdiff_t value?  AFAIK, wcscoll
   etc. return an int, and the values are rather small.

2. Should we signal an error if the input strings are not pure-ASCII
   or multibyte?  Unibyte strings will at best cause incorrect
   results.  And what about strings with invalid codepoints,
   e.g. those outside of the Unicode range, which can happen inside
   Lisp strings?

3. What about errors in wcscoll?  The current code ignores them;
   however, the value returned by wcscoll in case of an error is not
   documented, so it could be random.  Should we signal an error if
   errno gets set by wcscoll?

4. How to control the optional features of the collating sequence?  I
   mean, for example, the fact that punctuation characters are ignored
   in the .UTF-8 locales on glibc hosts (or so it seems).  At least on
   Windows, a somewhat higher degree of control is available, but it
   must be specified separately from the locale ID.  E.g., the
   comparison function accepts flags to ignore punctuation and
   symbols, width differences, diacritics, etc. Should we have another
   variable, perhaps w32-specific, to request these features?
   Alternatively, we could use .UTF-8 on Windows to communicate that,
   although that sounds like a kludge.

5. The locale names on Windows are different from Posix: Windows uses
   3-letter abbreviations of the country and the language,
   e.g. "fra_FRA" instead of the Posix "fr_FR".  Do we want the locale
   string values used for let-binding the above-mentioned variable to
   be portable across systems?  Then we'd need some conversion
   database on MS-Windows.

6. I think we will want a case-insensitive version of this function.
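
For points 1 and 3 above, a minimal C sketch of what I mean by
checking for errors.  The name collate_checked is hypothetical and this
is not the current str_collate code; it only illustrates the usual
idiom of clearing errno before the call, since wcscoll reserves no
return value for errors, and it returns a plain int.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical wrapper (not the Emacs implementation): compare A and
   B with wcscoll under the current LC_COLLATE locale.  Store in *ERR
   the errno value observed after the call (0 if none), so that the
   caller can decide whether to signal an error.  The result is an
   int, like that of wcscoll itself.  */
int
collate_checked (const wchar_t *a, const wchar_t *b, int *err)
{
  int result;

  errno = 0;			/* wcscoll has no error return value.  */
  result = wcscoll (a, b);
  *err = errno;
  return result;
}
```

If *ERR comes back nonzero, the comparison result should not be
trusted; whether to signal a Lisp error at that point is exactly the
open question.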




This bug report was last modified 10 years and 224 days ago.
