GNU bug report logs -
#18051
24.3.92; ls-lisp: Sorting; make ls-lisp-string-lessp a normal function?
Reported by: michael_heerdegen <at> web.de
Date: Fri, 18 Jul 2014 06:24:01 UTC
Severity: wishlist
Found in version 24.3.92
Done: Eli Zaretskii <eliz <at> gnu.org>
Bug is archived. No further changes may be made.
Eli Zaretskii <eliz <at> gnu.org> writes:
> Here are a few more thoughts about related issues:
>
> 1. Why does str_collate return a ptrdiff_t value? AFAIK, wcscoll
> etc. return int data type, and of rather small values.
Hm, yes. Both wcscoll and w32_compare_strings return int, so I've
changed that for str_collate accordingly.
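Because wcscoll and w32_compare_strings both return int, a wrapper can return int as well. The following is a minimal C sketch, not Emacs's actual str_collate, and the name collate_sign is hypothetical. The C standard specifies only the sign of wcscoll's result, so the sketch normalizes the value to -1, 0 or 1 before passing it on.

```c
#include <wchar.h>

/* Hypothetical wrapper in the spirit of str_collate: returns int, like
   wcscoll itself.  Only the sign of wcscoll's result is meaningful, so
   it is normalized to -1, 0 or 1 here.  */
int collate_sign(const wchar_t *a, const wchar_t *b)
{
  int r = wcscoll(a, b);
  return (r > 0) - (r < 0);
}
```

Normalizing the value also means callers cannot come to depend on the magnitude of a particular C library's result.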
> 2. Should we signal an error if the input strings are not pure-ASCII
> or multibyte? Unibyte strings will at best cause incorrect
> results.
Maybe we should convert the strings to multibyte via string_to_multibyte()?
If a string is already multibyte, this does no harm.
> And what about strings with invalid codepoints,
> e.g. those outside of the Unicode range, which can happen inside
> Lisp strings?
> 3. What about errors in wcscoll? The current code ignores them;
> however, the value returned by wcscoll in case of an error is not
> documented, so it could be random. Should we signal an error if
> errno gets set by wcscoll?
wcscoll sets EINVAL when the codepoint is out of range. I've added a
check for this case, returning an error.
(string-collate-equalp (string 1) (string ?\U0020FFFF))
=> error: Non-Unicode character: 0x20ffff
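The following is a minimal C sketch of this kind of error handling. It assumes a 32-bit wchar_t, as on glibc, and the names collate_checked and has_non_unicode are hypothetical. POSIX reserves no return value of wcscoll for errors, so the caller clears errno before the call and inspects it afterwards. The explicit range check additionally covers C libraries that do not set EINVAL for out-of-range values.

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if S holds a value outside the Unicode
   range (assumes a 32-bit wchar_t; negative values are also rejected
   because the cast makes them huge).  */
static int has_non_unicode(const wchar_t *s)
{
  for (; *s; s++)
    if ((unsigned long) *s > 0x10FFFFUL)
      return 1;
  return 0;
}

/* Compare A and B with wcscoll and store the result in *RESULT.
   Returns 0 on success, or an errno value on failure, in which case
   *RESULT is left untouched.  */
int collate_checked(const wchar_t *a, const wchar_t *b, int *result)
{
  if (has_non_unicode(a) || has_non_unicode(b))
    return EINVAL;
  errno = 0;                 /* wcscoll reports errors only via errno */
  int r = wcscoll(a, b);
  if (errno != 0)
    return errno;
  *result = r;
  return 0;
}
```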
> 4. How to control the optional features of the collating sequence? I
> mean, for example, the fact that punctuation characters are ignored
> in the .UTF-8 locales on glibc hosts (or so it seems). At least on
> Windows, a somewhat higher degree of control is available, but it
> must be specified separately from the locale ID. E.g., the
> comparison function accepts flags to ignore punctuation and
> symbols, width differences, diacritics, etc. Should we have another
> variable, perhaps w32-specific, to request these features?
> Alternatively, we could use .UTF-8 on Windows to communicate that,
> although that sounds like a kludge.
On POSIX systems, I'm not aware of any way to configure such optional
features via glibc. The finest granularity available is what you select
with LC_COLLATE.
If we want to offer more granular settings, we would need to use a library
like libicu (http://icu-project.org/). Could be done, but should be optional.
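Selecting a collation through LC_COLLATE does not require changing the process-wide locale. On a C library that implements POSIX.1-2008 (glibc does), newlocale and wcscoll_l let each comparison use its own locale. That is roughly what let-binding a locale variable around a call would need. This is a sketch only, and collate_in_locale is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <locale.h>
#include <wchar.h>

/* Compare A and B using the LC_COLLATE rules of LOCALE_NAME, leaving the
   global locale untouched.  Returns 0 on success (result in *RESULT), or
   an errno value if the locale cannot be loaded, e.g. not installed.  */
int collate_in_locale(const char *locale_name,
                      const wchar_t *a, const wchar_t *b, int *result)
{
  locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t) 0);
  if (loc == (locale_t) 0)
    return errno ? errno : ENOENT;
  *result = wcscoll_l(a, b, loc);
  freelocale(loc);
  return 0;
}
```

Creating and freeing the locale object on every call is simple but not cheap. A real implementation would probably cache it while the locale name stays the same.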
> 5. The locale names on Windows are different from Posix: Windows uses
> 3-letter abbreviations of the country and the language,
> e.g. "fra_FRA" instead of the Posix "fr_FR". Do we want the locale
> string values used for let-binding the above-mentioned variable to
> be portable across systems? Then we'd need some conversion
> database on MS-Windows.
Here I'm a bit undecided. We could leave it to the users to find the
proper locale name, but this is inconvenient. OTOH, installing a mapping
system would be a lot of work, and we would need to maintain it. What if
a new "en_SC" (Scotland) locale appeared? We would need to track such
changes in Emacs forever ...
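To illustrate the maintenance cost, a conversion table could be as simple as the following sketch. The structure and function names are hypothetical, and the table lists only the fr_FR/fra_FRA pair quoted in this thread. Every further locale would need its own hand-written entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical POSIX-to-Windows locale name table.  Only the pair quoted
   in the discussion is included; each new locale needs a manual entry.  */
struct locale_alias { const char *posix; const char *w32; };

static const struct locale_alias locale_aliases[] = {
  { "fr_FR", "fra_FRA" },
};

/* Return the Windows-style name for POSIX_NAME, or NULL if unknown.  */
const char *w32_locale_name(const char *posix_name)
{
  for (size_t i = 0; i < sizeof locale_aliases / sizeof locale_aliases[0]; i++)
    if (strcmp(locale_aliases[i].posix, posix_name) == 0)
      return locale_aliases[i].w32;
  return NULL;
}
```

Any locale missing from the table, such as the hypothetical "en_SC", simply yields NULL. Every such gap is one more entry someone has to add.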
> 6. I think we will want a case-insensitive version of this function.
That's also on my todo list. But I'm a little bit undecided whether we
should add it to the string-collate-* functions, or whether there should
be separate functions.
Maybe we could use sort-fold-case as an indication for this? Or is this
too specific?
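A case-insensitive variant could be implemented at the C level by folding both strings with towlower before calling wcscoll. The following is a sketch under that assumption, with the hypothetical name collate_fold_case. Note that towlower performs only simple one-to-one case mapping, so full Unicode case folding (for example German ß to "ss") is not covered.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Return a freshly allocated lowercase copy of S, or NULL on failure.  */
static wchar_t *fold_copy(const wchar_t *s)
{
  size_t n = wcslen(s);
  wchar_t *copy = malloc((n + 1) * sizeof *copy);
  if (copy == NULL)
    return NULL;
  for (size_t i = 0; i <= n; i++)   /* also copies the terminating L'\0' */
    copy[i] = (wchar_t) towlower((wint_t) s[i]);
  return copy;
}

/* Case-insensitive collation: fold both strings, then call wcscoll.
   Returns 0 on success (result in *RESULT), -1 if allocation fails.  */
int collate_fold_case(const wchar_t *a, const wchar_t *b, int *result)
{
  wchar_t *fa = fold_copy(a);
  wchar_t *fb = fold_copy(b);
  int ok = fa != NULL && fb != NULL;
  if (ok)
    *result = wcscoll(fa, fb);
  free(fa);                          /* free(NULL) is harmless */
  free(fb);
  return ok ? 0 : -1;
}
```

Folding before collating keeps the comparison itself locale-driven. Whether the folding is requested through an extra argument or through a variable such as sort-fold-case is independent of this C-level detail.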
Best regards, Michael.
This bug report was last modified 10 years and 224 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.