GNU bug report logs -
#9086
'dircolors' request: support UPPERCASE suffixes, also, please.
Previous Next
Reported by: SciFi <sci-fi <at> hush.ai>
Date: Thu, 14 Jul 2011 22:43:02 UTC
Severity: wishlist
Tags: fixed
Done: Assaf Gordon <assafgordon <at> gmail.com>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
On 10/09/2012 01:32 PM, Jim Meyering wrote:
> Pádraig Brady wrote:
>> On 10/09/2012 12:52 PM, Jim Meyering wrote:> Eric Blake wrote:
>>>> On 07/26/2011 02:33 PM, marcel partap wrote:
>>>>> Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extension
>>>>> matching.
>>>>> #regards/marcel.
>>>>
>>>> Your patch would make the new behavior unconditional. But I like case
>>>> sensitivity, and think that case insensitivity should be an opt-in
>>>> process that I request, with coordination between dircolors to
>>>> generate a new string for LS_COLORS to be honored by ls. Furthermore,
>>>> the patch is lacking in NEWS, documentation, and testsuite coverage.
>>>>
>>>> Additionally, you should be aware that strncasecmp() has undefined
>>>> behavior in non-C multibyte locales. It would probably be better to
>>>> use c_strncasecmp(), so that you are guaranteed defined behavior
>>>> regardless of the current locale.
>>>>
>>>> Would you care to tackle those additional issues? And are you set up
>>>> for copyright assignment, since the patch will probably be non-trivial
>>>> by that point in time?
>>>
>>> I saw this while going back through old bugs, so started poking around.
>>> How about this: if a suffix is not matched, convert it to lower case
>>> and check again. That way, anyone who cares to highlight .TaR or .TAR
>>> differently from .tar can still easily do so, yet names ending with
>>> uppercase .TAR, .JPG, .FLAC, etc. will now be highlighted by default.
>>>
>>> Does anyone think it's worth making this new fallback-to-case-insensitivity
>>> an option?
>>
>> Not me anyway.
>>
>>> While looking at that, I realized that comparing each name in an ls'd
>>> directory with each of nearly 100 suffixes should leave nontrivial room
>>> for improvement. Sure enough, once I'd replaced that linear search
>>> with a hash-table lookup, I see a measurable improvement.
>>
>>> and taking best-of-10 times, I see a 0.34 -> 0.23 (~30%) improvement.
>>
>> Very nice.
>>
>>> Of course, I have deliberately used the case that shows the most improvement:
>>> - a directory with very many files
>>> - no suffix match, so the old code searches all suffixes
>>> - using tmpfs: minimal stat overhead
>>
>>> @@ -4343,21 +4412,11 @@ print_color_indicator (const struct fileinfo *f, bool symlink_target)
>>> }
>>>
>>> /* Check the file's suffix only if still classified as C_FILE. */
>>> - ext = NULL;
>>> - if (type == C_FILE)
>>> - {
>>> - /* Test if NAME has a recognized suffix. */
>>> -
>>> - len = strlen (name);
>>> - name += len; /* Pointer to final \0. */
>>> - for (ext = color_ext_list; ext != NULL; ext = ext->next)
>>> - {
>>> - if (ext->ext.len <= len
>>> - && STREQ_LEN (name - ext->ext.len, ext->ext.string,
>>> - ext->ext.len))
>>> - break;
>>> - }
>>> - }
>>> + char *suffix;
>>> + struct sufco *ext; /* Color extension */
>>> + ext = (type == C_FILE && (suffix = strrchr (name, '.'))
>>> + ? suffix_color_lookup (suffix)
>>> + : NULL);
>>
>> So do we now only support suffixes delimited by '.' ?
>> Previously the delimiter was arbitrary or optional:
>>
>> touch star; LS_COLORS="*tar=01;31" /bin/ls --color *tar
>
> Good catch. I realized that early on, but then forgot to mention it.
> Yes, I would have to document that the "." is now required.
> It seems like a reasonable restriction, but technically
> it could be called a regression.
>
> Also, with these changes, a multiple-"." suffix will no longer work.
> I.e., before, if you wanted to give *.tar.xz files a color different
> from plain *.xz files, you could.
>
> Does anyone object to that?
It's marginal, though I'd be inclined to keep the existing
support for arbitrary suffixes. We could fall back to the
slower linear scan iff an entension entry in LS_COLORS didn't
contain a single '.' To be more generally performant and
support a longest suffix match we'd have to use something
like a trie I think.
cheers,
Pádraig.
This bug report was last modified 6 years and 218 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.