GNU bug report logs -
#18051
24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
Reported by: michael_heerdegen <at> web.de
Date: Fri, 18 Jul 2014 06:24:01 UTC
Severity: wishlist
Found in version 24.3.92
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
> From: Michael Albinus <michael.albinus <at> gmx.de>
> Date: Mon, 25 Aug 2014 08:41:03 +0200
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 18051 <at> debbugs.gnu.org
>
> > BTW, I think that collation functions with 3rd optional argument
> > to specify locale settings will be a bit more versatile, e.g.
> >
> > (string-collate-lessp a b "es_ES.UTF-8")
>
> We discussed this already, see
> <http://lists.gnu.org/archive/html/bug-gnu-emacs/2014-08/msg00623.html>
>
> My major reservation about this approach is that it doesn't fit well with
> using string-collate-lessp as the predicate of sort. That's why I have
> proposed a global variable as an alternative, which could be let-bound.
I think that binding a variable will indeed be cleaner. Using
process-environment for that purpose should be reserved for the
application level. Also, what if LC_COLLATE is not set in the
environment, but 'setlocale' does return some value for it? Shouldn't
we use that?
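To make the LC_COLLATE question concrete, here is a minimal C sketch of
such a fallback. The helper name effective_collate_locale is hypothetical
(it is not an Emacs function), and the sketch deliberately ignores LC_ALL
and LANG, which a real implementation would also have to consider.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Emacs: return the name of the
   collation locale to use.  Prefer LC_COLLATE from the environment;
   if it is unset or empty, ask the C library what it currently has.  */
static const char *
effective_collate_locale (void)
{
  const char *env = getenv ("LC_COLLATE");
  if (env != NULL && *env != '\0')
    return env;
  /* A null second argument queries the locale without changing it.  */
  return setlocale (LC_COLLATE, NULL);
}
```

Since setlocale with a null locale argument only queries the current
setting, the helper has no side effects on the running process.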
Here are a few more thoughts about related issues:
1. Why does str_collate return a ptrdiff_t value? AFAIK, wcscoll
etc. return values of type int, and rather small ones at that.
2. Should we signal an error if the input strings are neither pure-ASCII
nor multibyte? Unibyte strings will at best cause incorrect
results. And what about strings with invalid codepoints,
e.g. those outside of the Unicode range, which can happen inside
Lisp strings?
3. What about errors in wcscoll? The current code ignores them;
however, the value returned by wcscoll in case of an error is not
documented, so it could be random. Should we signal an error if
errno gets set by wcscoll?
4. How to control the optional features of the collating sequence? I
mean, for example, the fact that punctuation characters are ignored
in the .UTF-8 locales on glibc hosts (or so it seems). At least on
Windows, a somewhat higher degree of control is available, but it
must be specified separately from the locale ID. E.g., the
comparison function accepts flags to ignore punctuation and
symbols, width differences, diacritics, etc. Should we have another
variable, perhaps w32-specific, to request these features?
Alternatively, we could use .UTF-8 on Windows to communicate that,
although that sounds like a kludge.
5. The locale names on Windows are different from Posix: Windows uses
3-letter abbreviations of the country and the language,
e.g. "fra_FRA" instead of the Posix "fr_FR". Do we want the locale
string values used for let-binding the above-mentioned variable to
be portable across systems? Then we'd need some conversion
database on MS-Windows.
6. I think we will want a case-insensitive version of this function.
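Regarding items 1 and 3: POSIX reserves no return value of wcscoll to
indicate an error, and advises callers to set errno to 0 before the call
and check it afterwards. A rough sketch of that idiom follows, using a
hypothetical wrapper named collate_checked (this is not the actual
str_collate code); note that it returns a plain int.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical wrapper, not the actual str_collate from Emacs.
   Compare two wide strings with wcscoll under the current LC_COLLATE
   locale.  Store 0 in *ERR on success; otherwise store the errno
   value that wcscoll set, in which case the returned value should
   not be trusted.  */
static int
collate_checked (const wchar_t *a, const wchar_t *b, int *err)
{
  /* wcscoll has no reserved error return, so clear errno first.  */
  errno = 0;
  int res = wcscoll (a, b);
  /* Any nonzero errno at this point was set by wcscoll.  */
  *err = errno;
  return res;
}
```

A caller would inspect the value stored through the err pointer before
using the result, and could signal a Lisp error when it is nonzero.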
This bug report was last modified 10 years and 224 days ago.