From debbugs-submit-bounces@debbugs.gnu.org Thu Jul 14 18:42:05 2011 Received: (at submit) by debbugs.gnu.org; 14 Jul 2011 22:42:06 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QhUbd-0003s2-9t for submit@debbugs.gnu.org; Thu, 14 Jul 2011 18:42:05 -0400 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1QhUbb-0003ra-ND for submit@debbugs.gnu.org; Thu, 14 Jul 2011 18:42:04 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QhUbV-0001Hq-9k for submit@debbugs.gnu.org; Thu, 14 Jul 2011 18:41:58 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM, T_TO_NO_BRKTS_FREEMAIL autolearn=unavailable version=3.3.1 Received: from lists.gnu.org ([140.186.70.17]:56522) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QhUbU-0001Hm-SL for submit@debbugs.gnu.org; Thu, 14 Jul 2011 18:41:56 -0400 Received: from eggs.gnu.org ([140.186.70.92]:49461) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QhUbT-0007U6-58 for bug-coreutils@gnu.org; Thu, 14 Jul 2011 18:41:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QhUbR-0001HW-6v for bug-coreutils@gnu.org; Thu, 14 Jul 2011 18:41:54 -0400 Received: from smtp12.hushmail.com ([65.39.178.135]:44531) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QhUbQ-0001HI-Kj for bug-coreutils@gnu.org; Thu, 14 Jul 2011 18:41:52 -0400 Received: from smtp12.hushmail.com (localhost.localdomain [127.0.0.1]) by smtp12.hushmail.com (Postfix) with SMTP id AE7D9C030A for ; Thu, 14 Jul 2011 22:41:49 +0000 (UTC) X-Hush-Verified-Domain: hush.ai Received: from smtp.hushmail.com (w7.hushmail.com [65.39.178.32]) by smtp12.hushmail.com (Postfix) with ESMTP; Thu, 14 Jul 2011 22:41:49 +0000 (UTC) Received: from [192.168.1.65] (unknown [65.39.178.78]) by smtp.hushmail.com (Postfix) with ESMTP id 2F8E36F43F; Thu, 14 Jul 2011 22:41:49 +0000 (UTC) Content-Type: multipart/signed; protocol="application/pgp-signature"; micalg=pgp-sha1; boundary="Apple-Mail-3--483861024" Mime-Version: 1.0 (Apple Message framework v1084) Subject: 'dircolors' request: support UPPERCASE suffixes, also, please. From: SciFi X-Priority: 5 Date: Thu, 14 Jul 2011 17:41:44 -0500 Content-Transfer-Encoding: 7bit Message-Id: To: bug-coreutils@gnu.org X-Pgp-Agent: GPGMail 1.3.3 X-Mailer: Apple Mail (2.1084) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -6.0 (------) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.0 (------) This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --Apple-Mail-3--483861024 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Hi, For those of us who use case-SENsitive file systems, and then come = across filenames with UPPERCASE suffixes, the provided 'dircolors' = function does not colorize those listings. What I usually do, every time a new release of coreutils comes out, is = to duplicate by hand the list of suffixes in src/dircolors.hin to add = the UPPERCASE versions of all those entries. But this does not 'fix' any Mixed-casE suffixes, tho. And I wonder if doubling this list of suffixes has a bit less = performance, and eats-up a bit more RAM for the LS_COLORS env-var etc. So, I need to request a change to color processing in various commands = (esp'ly 'ls') to internally process a case-neutral version of the = filename to determine the highlighting strings to be sent to the = screen/terminal/whatever. The coding logic seems way-too complex for my lil mind to try providing = a patch. ;) Thanks for considering this request. --Apple-Mail-3--483861024 content-type: application/pgp-signature; x-mac-type=70674453; name=PGP.sig content-description: This is a digitally signed message part content-disposition: inline; filename=PGP.sig content-transfer-encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (Darwin) Comment: GPGTools - http://gpgtools.org iQEcBAEBAgAGBQJOH3CsAAoJEKkeWNKet7+KUY4H/ipHfXbLDLqZeVFLtd7W4KFW dMtvwZYGReboV2r/a9MyucHZV6ZHr7lkqDwziwMK5f6se59ctOOIH820HdyjhjL8 EzBFgM1xqj/cAOCZbqxDigXQVn/6Bl9qFlm09y9Qjet9vPYeN4InXzeYgYB0IesC 2dTmQQx3nPT4eCY8PAaq6/DmSRo8msGDMxcxuDSBZHZNw0+q+z5XeHRU86rQHfNV mDkBLd7Tbq3RYtd1rjHqCcYbgWrpE/bsvUVI+Z9UtF14N7wITt7jGj0rJql3QxoW Yc58oGtb+8w8T7KZISzxnXOieHEBZbnSvqJ6DqfDIGT4B2ZoGFDEdQQPtdVlh6M= =QTtV -----END PGP SIGNATURE----- --Apple-Mail-3--483861024-- From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 15:17:14 2011 Received: (at 9086) by debbugs.gnu.org; 27 Jul 2011 19:17:15 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Qm9bW-0007Fw-64 for submit@debbugs.gnu.org; Wed, 27 Jul 2011 15:17:14 -0400 Received: from mailout-de.gmx.net ([213.165.64.22]) by debbugs.gnu.org with smtp (Exim 4.69) (envelope-from ) id 1QloK6-0007Az-OQ for 9086@debbugs.gnu.org; Tue, 26 Jul 2011 16:33:51 -0400 Received: (qmail invoked by alias); 26 Jul 2011 20:33:44 -0000 Received: from 149-042.oih.RWTH-Aachen.DE (EHLO [137.226.149.42]) [137.226.149.42] by mail.gmx.net (mp034) with SMTP; 26 Jul 2011 22:33:44 +0200 X-Authenticated: #724076 X-Provags-ID: V01U2FsdGVkX19MP94Sp3JMc3p4J1TnKOF5hiJsJej/p7x2bLRHEc Y0GOYEQARD0Slb Message-ID: <4E2F24A7.3000206@gmx.net> Date: Tue, 26 Jul 2011 22:33:43 +0200 From: marcel partap User-Agent: Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9.2.18) Gecko/20110616 Thunderbird/3.1.11 ThunderBrowse/3.8 MIME-Version: 1.0 To: 9086@debbugs.gnu.org Subject: [PATCH] ls --color case insensitive extension matching Content-Type: multipart/mixed; boundary="------------000907010504010302050508" X-Y-GMX-Trusted: 0 X-Spam-Score: -2.6 (--) X-Debbugs-Envelope-To: 9086 X-Mailman-Approved-At: Wed, 27 Jul 2011 15:17:12 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.6 (--) This is a multi-part message in MIME format. --------------000907010504010302050508 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extension matching. #regards/marcel. --------------000907010504010302050508 Content-Type: text/plain; name="ls-color-case-insensitive-extension-matching.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="ls-color-case-insensitive-extension-matching.patch" --- src/ls.c.orig 2011-04-12 12:07:43.000000000 +0200 +++ src/ls.c 2011-07-26 22:27:08.503523893 +0200 @@ -4209,7 +4209,7 @@ for (ext = color_ext_list; ext != NULL; ext = ext->next) { if (ext->ext.len <= len - && STREQ_LEN (name - ext->ext.len, ext->ext.string, + && STRCASEEQ_LEN (name - ext->ext.len, ext->ext.string, ext->ext.len)) break; } --- src/system.h.orig 2011-04-25 11:45:49.000000000 +0200 +++ src/system.h 2011-07-26 22:26:48.539795894 +0200 @@ -258,6 +258,7 @@ #define STREQ(a, b) (strcmp (a, b) == 0) #define STREQ_LEN(a, b, n) (strncmp (a, b, n) == 0) +#define STRCASEEQ_LEN(a, b, n) (strncasecmp (a, b, n) == 0) #define STRPREFIX(a, b) (strncmp(a, b, strlen (b)) == 0) /* Just like strncmp, but the first argument must be a literal string --------------000907010504010302050508-- From debbugs-submit-bounces@debbugs.gnu.org Wed Jul 27 15:25:57 2011 Received: (at 9086) by debbugs.gnu.org; 27 Jul 2011 19:25:57 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Qm9jx-0007T9-1d for submit@debbugs.gnu.org; Wed, 27 Jul 2011 15:25:57 -0400 Received: from mx1.redhat.com ([209.132.183.28]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Qm9ju-0007T0-72 for 9086@debbugs.gnu.org; Wed, 27 Jul 2011 15:25:55 -0400 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p6RJPrfF010222 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 27 Jul 2011 15:25:53 -0400 Received: from [10.3.113.141] (ovpn-113-141.phx2.redhat.com [10.3.113.141]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p6RJPqOV023973; Wed, 27 Jul 2011 15:25:52 -0400 Message-ID: <4E306640.1080908@redhat.com> Date: Wed, 27 Jul 2011 13:25:52 -0600 From: Eric Blake Organization: Red Hat User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110621 Fedora/3.1.11-1.fc14 Lightning/1.0b3pre Mnenhy/0.8.3 Thunderbird/3.1.11 MIME-Version: 1.0 To: marcel partap Subject: Re: bug#9086: [PATCH] ls --color case insensitive extension matching References: <4E2F24A7.3000206@gmx.net> In-Reply-To: <4E2F24A7.3000206@gmx.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-Spam-Score: -10.3 (----------) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -10.3 (----------) On 07/26/2011 02:33 PM, marcel partap wrote: > Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extension > matching. > #regards/marcel. Your patch would make the new behavior unconditional. But I like case sensitivity, and think that case insensitivity should be an opt-in process that I request, with coordination between dircolors to generate a new string for LS_COLORS to be honored by ls. Furthermore, the patch is lacking in NEWS, documentation, and testsuite coverage. Additionally, you should be aware that strncasecmp() has undefined behavior in non-C multibyte locales. It would probably be better to use c_strncasecmp(), so that you are guaranteed defined behavior regardless of the current locale. Would you care to tackle those additional issues? And are you set up for copyright assignment, since the patch will probably be non-trivial by that point in time? -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 09 07:53:23 2012 Received: (at 9086) by debbugs.gnu.org; 9 Oct 2012 11:53:23 +0000 Received: from localhost ([127.0.0.1]:34783 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLYNF-00014o-Hx for submit@debbugs.gnu.org; Tue, 09 Oct 2012 07:53:22 -0400 Received: from mx.meyering.net ([88.168.87.75]:47094) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLYN9-00014W-8E for 9086@debbugs.gnu.org; Tue, 09 Oct 2012 07:53:19 -0400 Received: from rho.meyering.net (rho.meyering.net [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id 07337600EE; Tue, 9 Oct 2012 13:52:37 +0200 (CEST) From: Jim Meyering To: Eric Blake Subject: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 In-Reply-To: <4E306640.1080908@redhat.com> (Eric Blake's message of "Wed, 27 Jul 2011 13:25:52 -0600") References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> Date: Tue, 09 Oct 2012 13:52:37 +0200 Message-ID: <87y5jfooai.fsf_-_@rho.meyering.net> Lines: 521 MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 9086 Cc: sci-fi@hush.ai, 9086@debbugs.gnu.org, marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.3 (-) Eric Blake wrote: > On 07/26/2011 02:33 PM, marcel partap wrote: >> Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extension >> matching. >> #regards/marcel. > > Your patch would make the new behavior unconditional. But I like case > sensitivity, and think that case insensitivity should be an opt-in > process that I request, with coordination between dircolors to > generate a new string for LS_COLORS to be honored by ls. Furthermore, > the patch is lacking in NEWS, documentation, and testsuite coverage. > > Additionally, you should be aware that strncasecmp() has undefined > behavior in non-C multibyte locales. It would probably be better to > use c_strncasecmp(), so that you are guaranteed defined behavior > regardless of the current locale. > > Would you care to tackle those additional issues? And are you set up > for copyright assignment, since the patch will probably be non-trivial > by that point in time? I saw this while going back through old bugs, so started poking around. How about this: if a suffix is not matched, convert it to lower case and check again. That way, anyone who cares to highlight .TaR or .TAR differently from .tar can still easily do so, yet names ending with uppercase .TAR, .JPG, .FLAC, etc. will now be highlighted by default. Does anyone think it's worth making this new fallback-to-case-insensitivity an option? While looking at that, I realized that comparing each name in an ls'd directory with each of nearly 100 suffixes should leave nontrivial room for improvement. Sure enough, once I'd replaced that linear search with a hash-table lookup, I see a measurable improvement. This is on a tmpfs file system, with 100,000 .c files in one directory, created like this: seq --f %g.c 100000|xargs touch Comparing old to new: $ for i in $(seq 10); do env time --f=%e ls --color=always > /t/old; done 0.35 0.34 0.48 0.36 0.35 0.35 0.35 0.35 0.36 0.35 $ for i in $(seq 10);do env time --f=%e /cu/src/ls --color=always >/t/new;done 0.24 0.24 0.23 0.23 0.23 0.33 0.23 0.23 0.24 0.23 and taking best-of-10 times, I see a 0.34 -> 0.23 (~30%) improvement. Of course, I have deliberately used the case that shows the most improvement: - a directory with very many files - no suffix match, so the old code searches all suffixes - using tmpfs: minimal stat overhead In general, the improvement will be smaller. The first patch is the O(100-strncmp) -> O(hash-lookup) speed-up. The second adds the case-insensitive fallback. >From be82e6866e35320a3c228d7c8af5416a31c6a14e Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Mon, 8 Oct 2012 19:06:18 +0200 Subject: [PATCH 1/2] ls: --color: speed up suffix-coloring Prior to this change, when coloring a file name based on its suffix, and with default dircolors-generated LS_COLORS, ls would compare the file name to each of nearly 100 suffixes. Convert that process to use a hash-table lookup. On a tmpfs file system, with 100,000 .c files in one directory, created via "seq --f %g.c 100000|xargs touch", compare old to new: $ for i in $(seq 10);do env time --f=%e ls --col=always >/t/old; done ... 0.34 $ for i in $(seq 10);do env time --f=%e /t/ls --col=al >/t/new; done ... 0.23 Taking best-of-10 times, that's a 0.34 -> 0.23 (~30%) improvement. * src/ls.c: Include "hash-pjw.h". (color_ext_type): Remove global. (suffix_map, struct sufco): Declare. (sufco_hasher, sufco_comparator, suffix_color_init): New functions. (suffix_color_insert, suffix_color_lookup): New functions. (main): Call suffix_color_init. (parse_ls_color): Rather than creating a linked list of suffix/color-code pairs, insert each pair into our new suffix_map. (print_color_indicator): Use map-lookup, rather than linear search. * NEWS (Improvements): Mention it. --- NEWS | 2 + src/ls.c | 157 +++++++++++++++++++++++++++++++++++++++++++-------------------- 2 files changed, 110 insertions(+), 49 deletions(-) diff --git a/NEWS b/NEWS index aff5bf1..16d7dc8 100644 --- a/NEWS +++ b/NEWS @@ -57,6 +57,8 @@ GNU coreutils NEWS -*- outline -*- deterministic primality test for each prime factor, not just a probabilistic test. + ls --color is as much as 30% faster + seq is now up to 70 times faster than it was in coreutils-8.19 and prior, but only with non-negative whole numbers, an increment of 1, and no format-changing options. diff --git a/src/ls.c b/src/ls.c index 106d234..acebd11 100644 --- a/src/ls.c +++ b/src/ls.c @@ -91,6 +91,7 @@ #include "filenamecat.h" #include "hard-locale.h" #include "hash.h" +#include "hash-pjw.h" #include "human.h" #include "filemode.h" #include "filevercmp.h" @@ -599,9 +600,6 @@ static struct bin_str color_indicator[] = { LEN_STR_PAIR ("\033[K") }, /* cl: clear to end of line */ }; -/* FIXME: comment */ -static struct color_ext_type *color_ext_list = NULL; - /* Buffer for color sequences */ static char *color_buf; @@ -1018,6 +1016,83 @@ dired_dump_obstack (const char *prefix, struct obstack *os) } } +/* ********************************************************************* */ +/* Map each suffix from LS_COLORS to its escape code. + A filename's suffix is the key, the value is a byte sequence + with which to color-highlight the corresponding name. */ +static Hash_table *suffix_map; + +/* Initial size of that table. */ +enum { INITIAL_SUFFIX_COLOR_TABLE_SIZE = 131 }; + +/* An entry in the suffix_map table. */ +struct sufco + { + char *suffix; /* file name suffix */ + struct bin_str color_code; /* terminal codes to elicit desired color */ + }; + +static size_t +sufco_hasher (void const *entry, size_t table_size) +{ + struct sufco const *s = entry; + return hash_pjw (s->suffix, table_size); +} + +static bool +sufco_comparator (void const *e1, void const *e2) +{ + struct sufco const *s1 = e1; + struct sufco const *s2 = e2; + return STREQ (s1->suffix, s2->suffix); +} + +/* Initialize a hash table to implement our suffix/color-code map. */ +static void +suffix_color_init (void) +{ + suffix_map = hash_initialize (INITIAL_SUFFIX_COLOR_TABLE_SIZE, NULL, + sufco_hasher, + sufco_comparator, + NULL); + if (suffix_map == NULL) + xalloc_die (); +} + +/* Insert S, a malloc'd suffix/color-code pair. If the specified suffix is + initially in the table, change the mapping to refer to the new entry. + Either insert S or, when its suffix is a duplicate, record this new + color code in the table and free S's memory. */ +static bool +suffix_color_insert (struct sufco *s) +{ + struct sufco *match = NULL; + if (hash_insert_if_absent (suffix_map, s, (const void **) &match) < 0) + return false; + + if (match) + { + /* This suffix is already in the table. + Incoming color code takes precedence. */ + match->color_code.len = s->color_code.len; + match->color_code.string = s->color_code.string; + free (s); + } + + return true; +} + +/* Map a suffix string from LS_COLORS to its color code bytes. */ +static struct sufco * +suffix_color_lookup (char const *suffix) +{ + struct sufco s; + s.suffix = (char *) suffix; + return hash_lookup (suffix_map, &s); +} + +/* ********************************************************************* */ + /* Read the abbreviated month names from the locale, to align them and to determine the max width of the field and to truncate names greater than our max allowed. @@ -1287,7 +1362,10 @@ main (int argc, char **argv) i = decode_switches (argc, argv); if (print_with_color) - parse_ls_color (); + { + suffix_color_init (); + parse_ls_color (); + } /* Test print_with_color again, because the call to parse_ls_color may have just reset it -- e.g., if LS_COLORS is invalid. */ @@ -2100,7 +2178,7 @@ decode_switches (int argc, char **argv) } /* Parse a string as part of the LS_COLORS variable; this may involve - decoding all kinds of escape characters. If equals_end is set an + decoding all kinds of escape characters. If equals_end is set, an unescaped equal sign ends the string, otherwise only a : or \0 does. Set *OUTPUT_COUNT to the number of bytes output. Return true if successful. @@ -2324,7 +2402,7 @@ parse_ls_color (void) char *buf; /* color_buf buffer pointer */ int ind_no; /* Indicator number */ char label[3]; /* Indicator label */ - struct color_ext_type *ext; /* Extension we are working on */ + struct sufco *ext; /* suffix/color-code we are working on */ if ((p = getenv ("LS_COLORS")) == NULL || *p == '\0') return; @@ -2351,20 +2429,18 @@ parse_ls_color (void) break; case '*': - /* Allocate new extension block and add to head of - linked list (this way a later definition will - override an earlier one, which can be useful for - having terminal-specific defs override global). */ - - ext = xmalloc (sizeof *ext); - ext->next = color_ext_list; - color_ext_list = ext; + { + ext = xmalloc (sizeof *ext); + ext->suffix = buf; - ++p; - ext->ext.string = buf; + ++p; + size_t suff_len; + state = (get_funky_string (&buf, &p, true, &suff_len) + ? PS_4 : PS_FAIL); - state = (get_funky_string (&buf, &p, true, &ext->ext.len) - ? PS_4 : PS_FAIL); + /* Insert '\0', to NUL-terminate ext->suffix. */ + *buf++ = 0; + } break; case '\0': @@ -2411,9 +2487,13 @@ parse_ls_color (void) case PS_4: /* Equal sign after *.ext */ if (*(p++) == '=') { - ext->seq.string = buf; - state = (get_funky_string (&buf, &p, false, &ext->seq.len) + ext->color_code.string = buf; + state = (get_funky_string (&buf, &p, false, &ext->color_code.len) ? PS_START : PS_FAIL); + + if ( ! suffix_color_insert (ext)) + goto done; + ext = NULL; } else state = PS_FAIL; @@ -2430,18 +2510,10 @@ parse_ls_color (void) if (state == PS_FAIL) { - struct color_ext_type *e; - struct color_ext_type *e2; - error (0, 0, _("unparsable value for LS_COLORS environment variable")); free (color_buf); - for (e = color_ext_list; e != NULL; /* empty */) - { - e2 = e; - e = e->next; - free (e2); - } + free (ext); print_with_color = false; } @@ -4265,10 +4337,6 @@ print_type_indicator (bool stat_ok, mode_t mode, enum filetype type) static bool print_color_indicator (const struct fileinfo *f, bool symlink_target) { - enum indicator_no type; - struct color_ext_type *ext; /* Color extension */ - size_t len; /* Length of name */ - const char* name; mode_t mode; int linkok; @@ -4287,6 +4355,7 @@ print_color_indicator (const struct fileinfo *f, bool symlink_target) /* Is this a nonexistent file? If so, linkok == -1. */ + enum indicator_no type; if (linkok == -1 && is_colored (C_MISSING)) type = C_MISSING; else if (!f->stat_ok) @@ -4343,21 +4412,11 @@ print_color_indicator (const struct fileinfo *f, bool symlink_target) } /* Check the file's suffix only if still classified as C_FILE. */ - ext = NULL; - if (type == C_FILE) - { - /* Test if NAME has a recognized suffix. */ - - len = strlen (name); - name += len; /* Pointer to final \0. */ - for (ext = color_ext_list; ext != NULL; ext = ext->next) - { - if (ext->ext.len <= len - && STREQ_LEN (name - ext->ext.len, ext->ext.string, - ext->ext.len)) - break; - } - } + char *suffix; + struct sufco *ext; /* Color extension */ + ext = (type == C_FILE && (suffix = strrchr (name, '.')) + ? suffix_color_lookup (suffix) + : NULL); /* Adjust the color for orphaned symlinks. */ if (type == C_LINK && !linkok) @@ -4368,7 +4427,7 @@ print_color_indicator (const struct fileinfo *f, bool symlink_target) { const struct bin_str *const s - = ext ? &(ext->seq) : &color_indicator[type]; + = ext ? &(ext->color_code) : &color_indicator[type]; if (s->string != NULL) { /* Need to reset so not dealing with attribute combinations */ -- 1.8.0.rc1 >From 30a762cacf2031ac4431b427a195c9f535c745e2 Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Mon, 8 Oct 2012 10:05:32 +0200 Subject: [PATCH 2/2] ls: --color suffix-match: fall back to case-insensitive look-up * src/ls.c: FIXME * tests/ls/color-icase.sh: Test the change. * tests/local.mk (all_tests): Add it. * NEWS (Changes in behavior): Mention it. Suggested by SciFi in http://bugs.gnu.org/9086 --- NEWS | 5 +++++ src/ls.c | 27 ++++++++++++++++++++++++++- tests/local.mk | 1 + tests/ls/color-icase.sh | 31 +++++++++++++++++++++++++++++++ 4 files changed, 63 insertions(+), 1 deletion(-) create mode 100755 tests/ls/color-icase.sh diff --git a/NEWS b/NEWS index 16d7dc8..8364fd4 100644 --- a/NEWS +++ b/NEWS @@ -46,6 +46,11 @@ GNU coreutils NEWS -*- outline -*- ** Changes in behavior + ls --color's suffix-matching code now uses a case-insensitive fall-back, + which means a file named e.g., "PKG.TAR" is now colored red, just like + "pkg.tar", but if you prefer to highlight names ending in .TAR differently + from those ending in .tar you may. + nproc now diagnoses with an error, non option command line parameters. ** Improvements diff --git a/src/ls.c b/src/ls.c index acebd11..3a1ba3c 100644 --- a/src/ls.c +++ b/src/ls.c @@ -86,6 +86,7 @@ #include "acl.h" #include "argmatch.h" +#include "c-ctype.h" #include "dev-ino.h" #include "error.h" #include "filenamecat.h" @@ -1088,7 +1089,31 @@ suffix_color_lookup (char const *suffix) { struct sufco s; s.suffix = (char *) suffix; - return hash_lookup (suffix_map, &s); + struct sufco *e = hash_lookup (suffix_map, &s); + if (e) + return e; + + size_t s_len = strlen (suffix); + /* If there is no uppercase byte, then stop here. */ + if (strcspn (suffix, "ABCDEFGHIJKLMNOPQRSTUVWXYZ") == s_len) + return NULL; + + char *lower_case = malloc (s_len + 1); + if (lower_case == NULL) + return NULL; + char *p = lower_case; + while (*suffix) + { + *p++ = c_tolower (*suffix); + suffix++; + } + *p = 0; + + s.suffix = lower_case; + e = hash_lookup (suffix_map, &s); + free (lower_case); + + return e; } /* ********************************************************************* */ diff --git a/tests/local.mk b/tests/local.mk index 486bf31..12c0ac2 100644 --- a/tests/local.mk +++ b/tests/local.mk @@ -516,6 +516,7 @@ all_tests = \ tests/ls/block-size.sh \ tests/ls/color-clear-to-eol.sh \ tests/ls/color-dtype-dir.sh \ + tests/ls/color-icase.sh \ tests/ls/color-norm.sh \ tests/ls/dangle.sh \ tests/ls/dired.sh \ diff --git a/tests/ls/color-icase.sh b/tests/ls/color-icase.sh new file mode 100755 index 0000000..726dee5 --- /dev/null +++ b/tests/ls/color-icase.sh @@ -0,0 +1,31 @@ +#!/bin/sh +# after 8.19, ls --color recognized suffixes regardless of case + +# Copyright (C) 2012 Free Software Foundation, Inc. + +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. + +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. + +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +. "${srcdir=.}/tests/init.sh"; path_prepend_ ./src +print_ver_ ls + +color_code='01;31' + +echo x > red.TAR || framework_failure_ +printf '\e[0m\e['"$color_code"'mred.TAR\e[0m\n' > exp || fail=1 + +LS_COLORS="*.tar=$color_code" ls --color=always red.TAR > out || fail=1 + +compare exp out || fail=1 + +Exit $fail -- 1.8.0.rc1 From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 09 08:26:00 2012 Received: (at 9086) by debbugs.gnu.org; 9 Oct 2012 12:26:00 +0000 Received: from localhost ([127.0.0.1]:34825 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLYsp-0001qp-Uy for submit@debbugs.gnu.org; Tue, 09 Oct 2012 08:26:00 -0400 Received: from mx1.redhat.com ([209.132.183.28]:16286) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLYsn-0001qf-0D for 9086@debbugs.gnu.org; Tue, 09 Oct 2012 08:25:58 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q99CPMaF023092 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 9 Oct 2012 08:25:22 -0400 Received: from [10.36.116.82] (ovpn-116-82.ams2.redhat.com [10.36.116.82]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q99CPJ2f000569 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 9 Oct 2012 08:25:21 -0400 Message-ID: <507417AF.4010605@draigBrady.com> Date: Tue, 09 Oct 2012 13:25:19 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> In-Reply-To: <87y5jfooai.fsf_-_@rho.meyering.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id q99CPMaF023092 X-Spam-Score: -4.2 (----) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.2 (----) On 10/09/2012 12:52 PM, Jim Meyering wrote:> Eric Blake wrote: >> On 07/26/2011 02:33 PM, marcel partap wrote: >>> Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extens= ion >>> matching. >>> #regards/marcel. >> >> Your patch would make the new behavior unconditional. But I like cas= e >> sensitivity, and think that case insensitivity should be an opt-in >> process that I request, with coordination between dircolors to >> generate a new string for LS_COLORS to be honored by ls. Furthermore= , >> the patch is lacking in NEWS, documentation, and testsuite coverage. >> >> Additionally, you should be aware that strncasecmp() has undefined >> behavior in non-C multibyte locales. It would probably be better to >> use c_strncasecmp(), so that you are guaranteed defined behavior >> regardless of the current locale. >> >> Would you care to tackle those additional issues? And are you set up >> for copyright assignment, since the patch will probably be non-trivia= l >> by that point in time? > > I saw this while going back through old bugs, so started poking around. > How about this: if a suffix is not matched, convert it to lower case > and check again. That way, anyone who cares to highlight .TaR or .TAR > differently from .tar can still easily do so, yet names ending with > uppercase .TAR, .JPG, .FLAC, etc. will now be highlighted by default. > > Does anyone think it's worth making this new fallback-to-case-insensit= ivity > an option? Not me anyway. > While looking at that, I realized that comparing each name in an ls'd > directory with each of nearly 100 suffixes should leave nontrivial roo= m > for improvement. Sure enough, once I'd replaced that linear search > with a hash-table lookup, I see a measurable improvement. > and taking best-of-10 times, I see a 0.34 -> 0.23 (~30%) improvement. Very nice. > Of course, I have deliberately used the case that shows the most impro= vement: > - a directory with very many files > - no suffix match, so the old code searches all suffixes > - using tmpfs: minimal stat overhead > @@ -4343,21 +4412,11 @@ print_color_indicator (const struct fileinfo *= f, bool symlink_target) > } > > /* Check the file's suffix only if still classified as C_FILE. */ > - ext =3D NULL; > - if (type =3D=3D C_FILE) > - { > - /* Test if NAME has a recognized suffix. */ > - > - len =3D strlen (name); > - name +=3D len; /* Pointer to final \0. */ > - for (ext =3D color_ext_list; ext !=3D NULL; ext =3D ext->next) > - { > - if (ext->ext.len <=3D len > - && STREQ_LEN (name - ext->ext.len, ext->ext.string, > - ext->ext.len)) > - break; > - } > - } > + char *suffix; > + struct sufco *ext; /* Color extension */ > + ext =3D (type =3D=3D C_FILE && (suffix =3D strrchr (name, '.')) > + ? suffix_color_lookup (suffix) > + : NULL); So do we now only support suffixes delimited by '.' ? Previously the delimiter was arbitrary or optional: touch star; LS_COLORS=3D"*tar=3D01;31" /bin/ls --color *tar cheers, P=E1draig. From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 09 08:32:46 2012 Received: (at 9086) by debbugs.gnu.org; 9 Oct 2012 12:32:46 +0000 Received: from localhost ([127.0.0.1]:34834 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLYzN-00021z-S7 for submit@debbugs.gnu.org; Tue, 09 Oct 2012 08:32:46 -0400 Received: from mx.meyering.net ([88.168.87.75]:47201) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLYzL-00021o-Vu for 9086@debbugs.gnu.org; Tue, 09 Oct 2012 08:32:45 -0400 Received: from rho.meyering.net (rho.meyering.net [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id 0DC3A601EB; Tue, 9 Oct 2012 14:32:07 +0200 (CEST) From: Jim Meyering To: =?iso-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 In-Reply-To: <507417AF.4010605@draigBrady.com> (=?iso-8859-1?Q?=22P=E1drai?= =?iso-8859-1?Q?g?= Brady"'s message of "Tue, 09 Oct 2012 13:25:19 +0100") References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> <507417AF.4010605@draigBrady.com> Date: Tue, 09 Oct 2012 14:32:07 +0200 Message-ID: <87bogbomgo.fsf@rho.meyering.net> Lines: 87 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.3 (-) P=E1draig Brady wrote: > On 10/09/2012 12:52 PM, Jim Meyering wrote:> Eric Blake wrote: >>> On 07/26/2011 02:33 PM, marcel partap wrote: >>>> Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive extension >>>> matching. >>>> #regards/marcel. >>> >>> Your patch would make the new behavior unconditional. But I like case >>> sensitivity, and think that case insensitivity should be an opt-in >>> process that I request, with coordination between dircolors to >>> generate a new string for LS_COLORS to be honored by ls. Furthermore, >>> the patch is lacking in NEWS, documentation, and testsuite coverage. >>> >>> Additionally, you should be aware that strncasecmp() has undefined >>> behavior in non-C multibyte locales. It would probably be better to >>> use c_strncasecmp(), so that you are guaranteed defined behavior >>> regardless of the current locale. >>> >>> Would you care to tackle those additional issues? And are you set up >>> for copyright assignment, since the patch will probably be non-trivial >>> by that point in time? >> >> I saw this while going back through old bugs, so started poking around. >> How about this: if a suffix is not matched, convert it to lower case >> and check again. That way, anyone who cares to highlight .TaR or .TAR >> differently from .tar can still easily do so, yet names ending with >> uppercase .TAR, .JPG, .FLAC, etc. will now be highlighted by default. >> >> Does anyone think it's worth making this new fallback-to-case-insensitiv= ity >> an option? > > Not me anyway. > >> While looking at that, I realized that comparing each name in an ls'd >> directory with each of nearly 100 suffixes should leave nontrivial room >> for improvement. Sure enough, once I'd replaced that linear search >> with a hash-table lookup, I see a measurable improvement. > >> and taking best-of-10 times, I see a 0.34 -> 0.23 (~30%) improvement. > > Very nice. > >> Of course, I have deliberately used the case that shows the most improve= ment: >> - a directory with very many files >> - no suffix match, so the old code searches all suffixes >> - using tmpfs: minimal stat overhead > >> @@ -4343,21 +4412,11 @@ print_color_indicator (const struct fileinfo *f,= bool symlink_target) >> } >> >> /* Check the file's suffix only if still classified as C_FILE. */ >> - ext =3D NULL; >> - if (type =3D=3D C_FILE) >> - { >> - /* Test if NAME has a recognized suffix. */ >> - >> - len =3D strlen (name); >> - name +=3D len; /* Pointer to final \0. */ >> - for (ext =3D color_ext_list; ext !=3D NULL; ext =3D ext->next) >> - { >> - if (ext->ext.len <=3D len >> - && STREQ_LEN (name - ext->ext.len, ext->ext.string, >> - ext->ext.len)) >> - break; >> - } >> - } >> + char *suffix; >> + struct sufco *ext; /* Color extension */ >> + ext =3D (type =3D=3D C_FILE && (suffix =3D strrchr (name, '.')) >> + ? suffix_color_lookup (suffix) >> + : NULL); > > So do we now only support suffixes delimited by '.' ? > Previously the delimiter was arbitrary or optional: > > touch star; LS_COLORS=3D"*tar=3D01;31" /bin/ls --color *tar Good catch. I realized that early on, but then forgot to mention it. Yes, I would have to document that the "." is now required. It seems like a reasonable restriction, but technically it could be called a regression. Also, with these changes, a multiple-"." suffix will no longer work. I.e., before, if you wanted to give *.tar.xz files a color different from plain *.xz files, you could. Does anyone object to that? From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 09 08:51:45 2012 Received: (at 9086) by debbugs.gnu.org; 9 Oct 2012 12:51:45 +0000 Received: from localhost ([127.0.0.1]:34879 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLZHk-0002Wf-Fl for submit@debbugs.gnu.org; Tue, 09 Oct 2012 08:51:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:57073) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLZHh-0002WW-F4 for 9086@debbugs.gnu.org; Tue, 09 Oct 2012 08:51:43 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q99Cp6SK006703 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 9 Oct 2012 08:51:07 -0400 Received: from [10.36.116.82] (ovpn-116-82.ams2.redhat.com [10.36.116.82]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q99Cp34i024011 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 9 Oct 2012 08:51:05 -0400 Message-ID: <50741DB7.3080407@draigBrady.com> Date: Tue, 09 Oct 2012 13:51:03 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> <507417AF.4010605@draigBrady.com> <87bogbomgo.fsf@rho.meyering.net> In-Reply-To: <87bogbomgo.fsf@rho.meyering.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id q99Cp6SK006703 X-Spam-Score: -4.2 (----) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.2 (----) On 10/09/2012 01:32 PM, Jim Meyering wrote: > P=E1draig Brady wrote: >> On 10/09/2012 12:52 PM, Jim Meyering wrote:> Eric Blake wrote: >>>> On 07/26/2011 02:33 PM, marcel partap wrote: >>>>> Here's a patch. Adds STRCASEEQ_LEN macro for case insensitive exten= sion >>>>> matching. >>>>> #regards/marcel. >>>> >>>> Your patch would make the new behavior unconditional. But I like ca= se >>>> sensitivity, and think that case insensitivity should be an opt-in >>>> process that I request, with coordination between dircolors to >>>> generate a new string for LS_COLORS to be honored by ls. Furthermor= e, >>>> the patch is lacking in NEWS, documentation, and testsuite coverage. >>>> >>>> Additionally, you should be aware that strncasecmp() has undefined >>>> behavior in non-C multibyte locales. It would probably be better to >>>> use c_strncasecmp(), so that you are guaranteed defined behavior >>>> regardless of the current locale. >>>> >>>> Would you care to tackle those additional issues? And are you set u= p >>>> for copyright assignment, since the patch will probably be non-trivi= al >>>> by that point in time? >>> >>> I saw this while going back through old bugs, so started poking aroun= d. >>> How about this: if a suffix is not matched, convert it to lower case >>> and check again. That way, anyone who cares to highlight .TaR or .TA= R >>> differently from .tar can still easily do so, yet names ending with >>> uppercase .TAR, .JPG, .FLAC, etc. will now be highlighted by default. >>> >>> Does anyone think it's worth making this new fallback-to-case-insensi= tivity >>> an option? >> >> Not me anyway. >> >>> While looking at that, I realized that comparing each name in an ls'd >>> directory with each of nearly 100 suffixes should leave nontrivial ro= om >>> for improvement. Sure enough, once I'd replaced that linear search >>> with a hash-table lookup, I see a measurable improvement. >> >>> and taking best-of-10 times, I see a 0.34 -> 0.23 (~30%) improvement. >> >> Very nice. >> >>> Of course, I have deliberately used the case that shows the most impr= ovement: >>> - a directory with very many files >>> - no suffix match, so the old code searches all suffixes >>> - using tmpfs: minimal stat overhead >> >>> @@ -4343,21 +4412,11 @@ print_color_indicator (const struct fileinfo = *f, bool symlink_target) >>> } >>> >>> /* Check the file's suffix only if still classified as C_FILE. = */ >>> - ext =3D NULL; >>> - if (type =3D=3D C_FILE) >>> - { >>> - /* Test if NAME has a recognized suffix. */ >>> - >>> - len =3D strlen (name); >>> - name +=3D len; /* Pointer to final \0. */ >>> - for (ext =3D color_ext_list; ext !=3D NULL; ext =3D ext->next) >>> - { >>> - if (ext->ext.len <=3D len >>> - && STREQ_LEN (name - ext->ext.len, ext->ext.string, >>> - ext->ext.len)) >>> - break; >>> - } >>> - } >>> + char *suffix; >>> + struct sufco *ext; /* Color extension */ >>> + ext =3D (type =3D=3D C_FILE && (suffix =3D strrchr (name, '.')) >>> + ? suffix_color_lookup (suffix) >>> + : NULL); >> >> So do we now only support suffixes delimited by '.' ? >> Previously the delimiter was arbitrary or optional: >> >> touch star; LS_COLORS=3D"*tar=3D01;31" /bin/ls --color *tar > > Good catch. I realized that early on, but then forgot to mention it. > Yes, I would have to document that the "." is now required. > It seems like a reasonable restriction, but technically > it could be called a regression. > > Also, with these changes, a multiple-"." suffix will no longer work. > I.e., before, if you wanted to give *.tar.xz files a color different > from plain *.xz files, you could. > > Does anyone object to that? It's marginal, though I'd be inclined to keep the existing support for arbitrary suffixes. We could fall back to the slower linear scan iff an entension entry in LS_COLORS didn't contain a single '.' To be more generally performant and support a longest suffix match we'd have to use something like a trie I think. cheers, P=E1draig. From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 09 09:33:18 2012 Received: (at 9086) by debbugs.gnu.org; 9 Oct 2012 13:33:18 +0000 Received: from localhost ([127.0.0.1]:34952 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLZvx-0003We-LF for submit@debbugs.gnu.org; Tue, 09 Oct 2012 09:33:18 -0400 Received: from mx.meyering.net ([88.168.87.75]:47347) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLZvt-0003WQ-1m for 9086@debbugs.gnu.org; Tue, 09 Oct 2012 09:33:15 -0400 Received: from rho.meyering.net (rho.meyering.net [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id DA120600EE; Tue, 9 Oct 2012 15:32:36 +0200 (CEST) From: Jim Meyering To: =?iso-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 In-Reply-To: <50741DB7.3080407@draigBrady.com> (=?iso-8859-1?Q?=22P=E1drai?= =?iso-8859-1?Q?g?= Brady"'s message of "Tue, 09 Oct 2012 13:51:03 +0100") References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> <507417AF.4010605@draigBrady.com> <87bogbomgo.fsf@rho.meyering.net> <50741DB7.3080407@draigBrady.com> Date: Tue, 09 Oct 2012 15:32:36 +0200 Message-ID: <87y5jfn53f.fsf@rho.meyering.net> Lines: 34 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.3 (-) P=E1draig Brady wrote: > On 10/09/2012 01:32 PM, Jim Meyering wrote: >> P=E1draig Brady wrote: ... >>> So do we now only support suffixes delimited by '.' ? >>> Previously the delimiter was arbitrary or optional: >>> >>> touch star; LS_COLORS=3D"*tar=3D01;31" /bin/ls --color *tar >> >> Good catch. I realized that early on, but then forgot to mention it. >> Yes, I would have to document that the "." is now required. >> It seems like a reasonable restriction, but technically >> it could be called a regression. >> >> Also, with these changes, a multiple-"." suffix will no longer work. >> I.e., before, if you wanted to give *.tar.xz files a color different >> from plain *.xz files, you could. >> >> Does anyone object to that? > > It's marginal, though I'd be inclined to keep the existing > support for arbitrary suffixes. We could fall back to the > slower linear scan iff an entension entry in LS_COLORS didn't > contain a single '.' To be more generally performant and > support a longest suffix match we'd have to use something > like a trie I think. Well, I confess that I am not inclined to spend more time on this, and don't think the dot-less or longest-suffix use cases are worth the added code (we were already using most of hash.c already, so my change induced almost no bloat), so I'm tempted to go ahead with the patch and wait for complaints before adding trie-based lookup. wdyt? From debbugs-submit-bounces@debbugs.gnu.org Tue Oct 09 10:19:40 2012 Received: (at 9086) by debbugs.gnu.org; 9 Oct 2012 14:19:40 +0000 Received: from localhost ([127.0.0.1]:35740 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLaep-0004ne-7c for submit@debbugs.gnu.org; Tue, 09 Oct 2012 10:19:39 -0400 Received: from mx1.redhat.com ([209.132.183.28]:31654) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TLaen-0004nV-J8 for 9086@debbugs.gnu.org; Tue, 09 Oct 2012 10:19:38 -0400 Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q99EIxCN010779 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Tue, 9 Oct 2012 10:19:00 -0400 Received: from [10.36.116.82] (ovpn-116-82.ams2.redhat.com [10.36.116.82]) by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id q99EIvsr010325 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 9 Oct 2012 10:18:58 -0400 Message-ID: <50743250.8040109@draigBrady.com> Date: Tue, 09 Oct 2012 15:18:56 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> <507417AF.4010605@draigBrady.com> <87bogbomgo.fsf@rho.meyering.net> <50741DB7.3080407@draigBrady.com> <87y5jfn53f.fsf@rho.meyering.net> In-Reply-To: <87y5jfn53f.fsf@rho.meyering.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed X-Scanned-By: MIMEDefang 2.67 on 10.5.11.12 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id q99EIxCN010779 X-Spam-Score: -4.2 (----) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.2 (----) On 10/09/2012 02:32 PM, Jim Meyering wrote: > P=E1draig Brady wrote: >> On 10/09/2012 01:32 PM, Jim Meyering wrote: >>> P=E1draig Brady wrote: > ... >>>> So do we now only support suffixes delimited by '.' ? >>>> Previously the delimiter was arbitrary or optional: >>>> >>>> touch star; LS_COLORS=3D"*tar=3D01;31" /bin/ls --color *tar >>> >>> Good catch. I realized that early on, but then forgot to mention it. >>> Yes, I would have to document that the "." is now required. >>> It seems like a reasonable restriction, but technically >>> it could be called a regression. >>> >>> Also, with these changes, a multiple-"." suffix will no longer work. >>> I.e., before, if you wanted to give *.tar.xz files a color different >>> from plain *.xz files, you could. >>> >>> Does anyone object to that? >> >> It's marginal, though I'd be inclined to keep the existing >> support for arbitrary suffixes. We could fall back to the >> slower linear scan iff an entension entry in LS_COLORS didn't >> contain a single '.' To be more generally performant and >> support a longest suffix match we'd have to use something >> like a trie I think. > > Well, I confess that I am not inclined to spend more time on this, > and don't think the dot-less or longest-suffix use cases are worth the > added code (we were already using most of hash.c already, so my change > induced almost no bloat), so I'm tempted to go ahead with the patch and > wait for complaints before adding trie-based lookup. Sure. I wasn't really suggesting we do trie now, just mentioning that would be a possible route in future. As for enabling the faster and slightly more limited code for now... It's a hard decision. I suppose it's OK for now but it'll need a change in behavior note in NEWS. cheers, P=E1draig. From debbugs-submit-bounces@debbugs.gnu.org Sat Oct 13 10:42:17 2012 Received: (at 9086) by debbugs.gnu.org; 13 Oct 2012 14:42:17 +0000 Received: from localhost ([127.0.0.1]:42682 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TN2uv-0004h3-Bc for submit@debbugs.gnu.org; Sat, 13 Oct 2012 10:42:17 -0400 Received: from mx.meyering.net ([88.168.87.75]:33420) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TN2ur-0004gt-9t for 9086@debbugs.gnu.org; Sat, 13 Oct 2012 10:42:14 -0400 Received: from rho.meyering.net (rho.meyering.net [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id A4AD8601ED; Sat, 13 Oct 2012 16:41:15 +0200 (CEST) From: Jim Meyering To: =?iso-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 In-Reply-To: <50743250.8040109@draigBrady.com> (=?iso-8859-1?Q?=22P=E1drai?= =?iso-8859-1?Q?g?= Brady"'s message of "Tue, 09 Oct 2012 15:18:56 +0100") References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> <507417AF.4010605@draigBrady.com> <87bogbomgo.fsf@rho.meyering.net> <50741DB7.3080407@draigBrady.com> <87y5jfn53f.fsf@rho.meyering.net> <50743250.8040109@draigBrady.com> Date: Sat, 13 Oct 2012 16:41:15 +0200 Message-ID: <87bog6e8ok.fsf@rho.meyering.net> Lines: 51 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.3 (-) P=E1draig Brady wrote: > On 10/09/2012 02:32 PM, Jim Meyering wrote: >> P=E1draig Brady wrote: >>> On 10/09/2012 01:32 PM, Jim Meyering wrote: >>>> P=E1draig Brady wrote: >> ... >>>>> So do we now only support suffixes delimited by '.' ? >>>>> Previously the delimiter was arbitrary or optional: >>>>> >>>>> touch star; LS_COLORS=3D"*tar=3D01;31" /bin/ls --color *tar >>>> >>>> Good catch. I realized that early on, but then forgot to mention it. >>>> Yes, I would have to document that the "." is now required. >>>> It seems like a reasonable restriction, but technically >>>> it could be called a regression. >>>> >>>> Also, with these changes, a multiple-"." suffix will no longer work. >>>> I.e., before, if you wanted to give *.tar.xz files a color different >>>> from plain *.xz files, you could. >>>> >>>> Does anyone object to that? >>> >>> It's marginal, though I'd be inclined to keep the existing >>> support for arbitrary suffixes. We could fall back to the >>> slower linear scan iff an entension entry in LS_COLORS didn't >>> contain a single '.' To be more generally performant and >>> support a longest suffix match we'd have to use something >>> like a trie I think. >> >> Well, I confess that I am not inclined to spend more time on this, >> and don't think the dot-less or longest-suffix use cases are worth the >> added code (we were already using most of hash.c already, so my change >> induced almost no bloat), so I'm tempted to go ahead with the patch and >> wait for complaints before adding trie-based lookup. > > Sure. I wasn't really suggesting we do trie now, > just mentioning that would be a possible route in future. > > As for enabling the faster and slightly more limited code for now... > It's a hard decision. I suppose it's OK for now but it'll need a > change in behavior note in NEWS. Thinking more about it, there are at least a few common file name suffixes that can be usefully colored, but that are not delimited by ".", like ",v" and "~", so I will discard that patch, for now at least. Can anyone point to a good trie implementation in C (preferably GPL'd)? (other than the ones referred to in https://en.wikipedia.org/wiki/Trie) I was surprised to see that there is none in gnulib. If not, does anyone want to write one for gnulib? From debbugs-submit-bounces@debbugs.gnu.org Sat Oct 13 17:05:18 2012 Received: (at 9086) by debbugs.gnu.org; 13 Oct 2012 21:05:18 +0000 Received: from localhost ([127.0.0.1]:42842 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TN8ta-0005OZ-BZ for submit@debbugs.gnu.org; Sat, 13 Oct 2012 17:05:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:1710) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TN8tX-0005OO-B9 for 9086@debbugs.gnu.org; Sat, 13 Oct 2012 17:05:17 -0400 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q9DL4DUA004690 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Sat, 13 Oct 2012 17:04:13 -0400 Received: from [10.36.116.20] (ovpn-116-20.ams2.redhat.com [10.36.116.20]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q9DL45fU007716 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 13 Oct 2012 17:04:08 -0400 Message-ID: <5079D744.2070704@draigBrady.com> Date: Sat, 13 Oct 2012 22:04:04 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120615 Thunderbird/13.0.1 MIME-Version: 1.0 To: Jim Meyering Subject: Re: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> <507417AF.4010605@draigBrady.com> <87bogbomgo.fsf@rho.meyering.net> <50741DB7.3080407@draigBrady.com> <87y5jfn53f.fsf@rho.meyering.net> <50743250.8040109@draigBrady.com> <87bog6e8ok.fsf@rho.meyering.net> In-Reply-To: <87bog6e8ok.fsf@rho.meyering.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id q9DL4DUA004690 X-Spam-Score: -4.2 (----) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -4.2 (----) On 10/13/2012 03:41 PM, Jim Meyering wrote: > P=E1draig Brady wrote: > >> On 10/09/2012 02:32 PM, Jim Meyering wrote: >>> P=E1draig Brady wrote: >>>> On 10/09/2012 01:32 PM, Jim Meyering wrote: >>>>> P=E1draig Brady wrote: >>> ... >>>>>> So do we now only support suffixes delimited by '.' ? >>>>>> Previously the delimiter was arbitrary or optional: >>>>>> >>>>>> touch star; LS_COLORS=3D"*tar=3D01;31" /bin/ls --color *tar >>>>> >>>>> Good catch. I realized that early on, but then forgot to mention i= t. >>>>> Yes, I would have to document that the "." is now required. >>>>> It seems like a reasonable restriction, but technically >>>>> it could be called a regression. >>>>> >>>>> Also, with these changes, a multiple-"." suffix will no longer work. >>>>> I.e., before, if you wanted to give *.tar.xz files a color differen= t >>>>> from plain *.xz files, you could. >>>>> >>>>> Does anyone object to that? >>>> >>>> It's marginal, though I'd be inclined to keep the existing >>>> support for arbitrary suffixes. We could fall back to the >>>> slower linear scan iff an entension entry in LS_COLORS didn't >>>> contain a single '.' To be more generally performant and >>>> support a longest suffix match we'd have to use something >>>> like a trie I think. >>> >>> Well, I confess that I am not inclined to spend more time on this, >>> and don't think the dot-less or longest-suffix use cases are worth th= e >>> added code (we were already using most of hash.c already, so my chang= e >>> induced almost no bloat), so I'm tempted to go ahead with the patch a= nd >>> wait for complaints before adding trie-based lookup. >> >> Sure. I wasn't really suggesting we do trie now, >> just mentioning that would be a possible route in future. >> >> As for enabling the faster and slightly more limited code for now... >> It's a hard decision. I suppose it's OK for now but it'll need a >> change in behavior note in NEWS. > > Thinking more about it, there are at least a few common file name > suffixes that can be usefully colored, but that are not delimited by ".= ", > like ",v" and "~", so I will discard that patch, for now at least. Agreed. I presume the case insensitive fallback can still be done independent from this speedup patch. cheers, P=E1draig. From debbugs-submit-bounces@debbugs.gnu.org Sat Oct 13 17:11:47 2012 Received: (at 9086) by debbugs.gnu.org; 13 Oct 2012 21:11:47 +0000 Received: from localhost ([127.0.0.1]:42846 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TN8zr-0005X6-8M for submit@debbugs.gnu.org; Sat, 13 Oct 2012 17:11:47 -0400 Received: from mx.meyering.net ([88.168.87.75]:34205) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TN8zo-0005Wy-KC for 9086@debbugs.gnu.org; Sat, 13 Oct 2012 17:11:46 -0400 Received: from rho.meyering.net (rho.meyering.net [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id 75101600BD; Sat, 13 Oct 2012 23:10:45 +0200 (CEST) From: Jim Meyering To: =?iso-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#9086: ls --color: 30% speed-up and case-insensitive suffixes [Re: bug#9086 In-Reply-To: <5079D744.2070704@draigBrady.com> (=?iso-8859-1?Q?=22P=E1drai?= =?iso-8859-1?Q?g?= Brady"'s message of "Sat, 13 Oct 2012 22:04:04 +0100") References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> <507417AF.4010605@draigBrady.com> <87bogbomgo.fsf@rho.meyering.net> <50741DB7.3080407@draigBrady.com> <87y5jfn53f.fsf@rho.meyering.net> <50743250.8040109@draigBrady.com> <87bog6e8ok.fsf@rho.meyering.net> <5079D744.2070704@draigBrady.com> Date: Sat, 13 Oct 2012 23:10:45 +0200 Message-ID: <87k3uucc2y.fsf@rho.meyering.net> Lines: 57 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.3 (-) P=E1draig Brady wrote: > On 10/13/2012 03:41 PM, Jim Meyering wrote: >> P=E1draig Brady wrote: >> >>> On 10/09/2012 02:32 PM, Jim Meyering wrote: >>>> P=E1draig Brady wrote: >>>>> On 10/09/2012 01:32 PM, Jim Meyering wrote: >>>>>> P=E1draig Brady wrote: >>>> ... >>>>>>> So do we now only support suffixes delimited by '.' ? >>>>>>> Previously the delimiter was arbitrary or optional: >>>>>>> >>>>>>> touch star; LS_COLORS=3D"*tar=3D01;31" /bin/ls --color *tar >>>>>> >>>>>> Good catch. I realized that early on, but then forgot to mention it. >>>>>> Yes, I would have to document that the "." is now required. >>>>>> It seems like a reasonable restriction, but technically >>>>>> it could be called a regression. >>>>>> >>>>>> Also, with these changes, a multiple-"." suffix will no longer work. >>>>>> I.e., before, if you wanted to give *.tar.xz files a color different >>>>>> from plain *.xz files, you could. >>>>>> >>>>>> Does anyone object to that? >>>>> >>>>> It's marginal, though I'd be inclined to keep the existing >>>>> support for arbitrary suffixes. We could fall back to the >>>>> slower linear scan iff an entension entry in LS_COLORS didn't >>>>> contain a single '.' To be more generally performant and >>>>> support a longest suffix match we'd have to use something >>>>> like a trie I think. >>>> >>>> Well, I confess that I am not inclined to spend more time on this, >>>> and don't think the dot-less or longest-suffix use cases are worth the >>>> added code (we were already using most of hash.c already, so my change >>>> induced almost no bloat), so I'm tempted to go ahead with the patch and >>>> wait for complaints before adding trie-based lookup. >>> >>> Sure. I wasn't really suggesting we do trie now, >>> just mentioning that would be a possible route in future. >>> >>> As for enabling the faster and slightly more limited code for now... >>> It's a hard decision. I suppose it's OK for now but it'll need a >>> change in behavior note in NEWS. >> >> Thinking more about it, there are at least a few common file name >> suffixes that can be usefully colored, but that are not delimited by ".", >> like ",v" and "~", so I will discard that patch, for now at least. > > Agreed. > > I presume the case insensitive fallback can still > be done independent from this speedup patch. Sure, it could be, but with the existing linear search, that would double the search overhead that was nearly eliminated by my patch. From debbugs-submit-bounces@debbugs.gnu.org Mon Oct 15 10:01:23 2018 Received: (at 9086) by debbugs.gnu.org; 15 Oct 2018 14:01:23 +0000 Received: from localhost ([127.0.0.1]:50824 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gC3Qw-0005ky-Nl for submit@debbugs.gnu.org; Mon, 15 Oct 2018 10:01:23 -0400 Received: from mail-pg1-f176.google.com ([209.85.215.176]:41898) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1gC3Qu-0005f8-SP; Mon, 15 Oct 2018 10:01:21 -0400 Received: by mail-pg1-f176.google.com with SMTP id 23-v6so9206158pgc.8; Mon, 15 Oct 2018 07:01:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=Y9AxkuY5afMwb43Xi+OMopMoHGiUU7jXQC5eWW1aiEA=; b=WPU5vl9fw13Kr9+LBJu3rW7RT1o72ot7h3rQaFpcw+VawmwDREC2xxykom7fO9mVeS D/op4656LKxfp9nhxgTuKYCinMPnDMdT5GiUvplGo2ExRvid0CxzHhH0OsD9gvQcTn1h 3KxnnIx36Q8a4/0PxzCbqc3bRJvk1VjjefYHZH38vr5sYEgHCqBnr06kmJqPMCUq3C/p MLk8YZ8OnpHKgpkucxcO9w95VpNO44Fr5IbgPU43R3Rb9cCuRIPOztU/iqnjW3eiq5Gx T+zB9E1BFw5Xxh2fu/qwYcY9BWF3zbfJ2o04JA47nmRrQuyXfpUQ9GG3lf1X3Wv6/vZk a81A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=Y9AxkuY5afMwb43Xi+OMopMoHGiUU7jXQC5eWW1aiEA=; b=d4kGfN2U5TtrcqgmmjZpOFCQCp3V82Zh1flXyDlhWno3iB7KBYX30KSSoBcp+Zkqjd EdOBAPsCT4Y6lkBmSxRBsd0f2sVUbXoJB5O38BRAAXsHQQGaVLoB3Rr4X+QNQyZhT0UO xuVoUUDsn3nMjL4N/gyI4vstcryRAIROBqAbiEESqUJb860hJo3618pa0TCh8zmpw1px Npon5FE0maa+H6ncXtM0aeESjs/l05AeoDKOJIFio/Bmr/qh6XVDzvi7kKOAWeSIz7Fw 1aP/gGR9CGVGqq7O6VrqWq0xDGTmbpzOqWFFcMAOyW2OIM2RkmTapl3yFOoGYseIKh97 RQig== X-Gm-Message-State: ABuFfohrSXQja/tA5aK0Z5bZ2M7IZgQqDT8atMzq0/QsRq/oWjDuufpF XQmALmt/ElE1yUrbgcqFeEFIeUenrts= X-Google-Smtp-Source: ACcGV62P7KxcQZGKjOof1Kq0w8ammBhFIVCJJOCKDFMLUrU7qLKsv5aPl1UHp/eMs7TMJ/l0LJWeHw== X-Received: by 2002:a62:3106:: with SMTP id x6-v6mr8300565pfx.154.1539612074513; Mon, 15 Oct 2018 07:01:14 -0700 (PDT) Received: from tomato.housegordon.com (moose.housegordon.com. [184.68.105.38]) by smtp.googlemail.com with ESMTPSA id x23-v6sm12797865pfh.56.2018.10.15.07.01.12 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 15 Oct 2018 07:01:13 -0700 (PDT) Subject: Re: bug#9086: ls --color case insensitive extension matching To: Jim Meyering , =?UTF-8?Q?P=c3=a1draig_Brady?= References: <4E2F24A7.3000206@gmx.net> <4E306640.1080908@redhat.com> <87y5jfooai.fsf_-_@rho.meyering.net> <507417AF.4010605@draigBrady.com> <87bogbomgo.fsf@rho.meyering.net> <50741DB7.3080407@draigBrady.com> <87y5jfn53f.fsf@rho.meyering.net> <50743250.8040109@draigBrady.com> <87bog6e8ok.fsf@rho.meyering.net> <5079D744.2070704@draigBrady.com> <87k3uucc2y.fsf@rho.meyering.net> From: Assaf Gordon Message-ID: <1349c910-1f90-a9a5-d612-a5364e62f571@gmail.com> Date: Mon, 15 Oct 2018 08:01:11 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 In-Reply-To: <87k3uucc2y.fsf@rho.meyering.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 9086 Cc: 9086@debbugs.gnu.org, Eric Blake , marcel partap X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) severity 9086 wishlist tags 9086 fixed close 9086 stop (triaging old bugs) Six years later, "ls --color" just gained case-insensitive extension matching. https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24053fbd Closing the bug. -assaf From unknown Tue Jun 17 01:39:42 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Tue, 13 Nov 2018 12:24:13 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator