From debbugs-submit-bounces@debbugs.gnu.org Sun Dec 25 14:32:09 2011 Received: (at submit) by debbugs.gnu.org; 25 Dec 2011 19:32:09 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Retnk-0002fL-Vv for submit@debbugs.gnu.org; Sun, 25 Dec 2011 14:32:09 -0500 Received: from eggs.gnu.org ([140.186.70.92]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1ReoAg-000356-0X for submit@debbugs.gnu.org; Sun, 25 Dec 2011 08:31:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Reo8H-0008Sn-BC for submit@debbugs.gnu.org; Sun, 25 Dec 2011 08:28:58 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable version=3.3.2 Received: from lists.gnu.org ([140.186.70.17]:59063) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Reo8H-0008Sj-9h for submit@debbugs.gnu.org; Sun, 25 Dec 2011 08:28:57 -0500 Received: from eggs.gnu.org ([140.186.70.92]:34035) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Reo8F-0007Y8-TC for bug-coreutils@gnu.org; Sun, 25 Dec 2011 08:28:57 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Reo8E-0008SN-Bb for bug-coreutils@gnu.org; Sun, 25 Dec 2011 08:28:55 -0500 Received: from xvm-20-226.ghst.net ([92.243.20.226]:57641 helo=fruli.krunch.be) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Reo8D-0008Rs-VI for bug-coreutils@gnu.org; Sun, 25 Dec 2011 08:28:54 -0500 Received: from localhost (localhost [127.0.0.1]) by fruli.krunch.be (Postfix) with ESMTP id 1E35E227F7; Sun, 25 Dec 2011 12:49:05 +0000 (UTC) Date: Sun, 25 Dec 2011 12:54:18 +0000 From: Adrien Kunysz To: bug-coreutils@gnu.org Subject: [PATCH] uniq: add ability to skip last N chars or fields Message-ID: <20111225125418.GA1488@chouffe> MIME-Version: 1.0 
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DocE+STaALJfprDB" Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-06-14) X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6 (newer, 3) X-Received-From: 140.186.70.17 X-Spam-Score: -6.6 (------) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Sun, 25 Dec 2011 14:32:08 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.6 (------) --DocE+STaALJfprDB Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable * doc/coreutils.texi: document the new feature * src/uniq.c (find_end): new function (check_file): use find_end() to determine when to stop comparing (usage): document the new feature (main): expose the new feature to user * tests/misc/uniq: add tests to exercise the new code --- doc/coreutils.texi | 17 +++++++++++++ src/uniq.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++--- tests/misc/uniq | 15 +++++++++++ 3 files changed, 97 insertions(+), 4 deletions(-) I have recently found myself wishing I could have uniq(1) skip the last N fields before comparison. I am aware of the rev(1) trick but I don't find it very satisfactory. So I ended up patching uniq and implementing the feature for characters skipping as well. Documentation and tests included. Tests have also been run within Valgrind on x86_64. diff --git a/doc/coreutils.texi b/doc/coreutils.texi index c229f98..b2ca430 100644 --- a/doc/coreutils.texi +++ b/doc/coreutils.texi @@ -4680,6 +4680,15 @@ each other by at least one space or tab. 
For compatibility @command{uniq} supports an obsolete option syntax @option{-@var{n}}. New scripts should use @option{-f @var{n}} instead. +@item -F @var{n} +@itemx --ignore-fields=@var{n} +@opindex -F +@opindex --ignore-fields +Ignore last @var{n} fields on each line before checking for uniqueness. Use +a null string for comparison if a line has fewer than @var{n} fields. Fields +are sequences of non-space non-tab characters that are separated from +each other by at least one space or tab. + @item -s @var{n} @itemx --skip-chars=@var{n} @opindex -s @@ -4698,6 +4707,14 @@ behavior depends on this variable. For example, use @samp{uniq ./+10} or @samp{uniq -s 10} rather than the ambiguous @samp{uniq +10}. +@item -S @var{n} +@itemx --ignore-chars=@var{n} +@opindex -S +@opindex --ignore-chars +Ignore last @var{n} characters before checking for uniqueness. Use a null +string for comparison if a line has fewer than @var{n} characters. If you use +both the field and character ignoring options, fields are ignored over first. + @item -c @itemx --count @opindex -c diff --git a/src/uniq.c b/src/uniq.c index db717b1..31205f4 100644 --- a/src/uniq.c +++ b/src/uniq.c @@ -60,6 +60,12 @@ static size_t skip_fields; /* Number of chars to skip after skipping any fields. */ static size_t skip_chars; +/* Number of fields to ignore at the end. */ +static size_t ignore_fields; + +/* Number of chars to ignore at the end after ignoring any fields. */ +static size_t ignore_chars; + /* Number of chars to compare. 
*/ static size_t check_chars; @@ -116,7 +122,9 @@ static struct option const longopts[] = {"ignore-case", no_argument, NULL, 'i'}, {"unique", no_argument, NULL, 'u'}, {"skip-fields", required_argument, NULL, 'f'}, + {"ignore-fields", required_argument, NULL, 'F'}, {"skip-chars", required_argument, NULL, 's'}, + {"ignore-chars", required_argument, NULL, 'S'}, {"check-chars", required_argument, NULL, 'w'}, {"zero-terminated", no_argument, NULL, 'z'}, {GETOPT_HELP_OPTION_DECL}, @@ -155,8 +163,10 @@ Mandatory arguments to long options are mandatory for short options too.\n\ delimit-method={none(default),prepend,separate}\n\ Delimiting is done with blank lines\n\ -f, --skip-fields=N avoid comparing the first N fields\n\ + -F, --ignore-fields=N avoid comparing the last N fields\n\ -i, --ignore-case ignore differences in case when comparing\n\ -s, --skip-chars=N avoid comparing the first N characters\n\ + -S, --ignore-chars=N avoid comparing the last N characters\n\ -u, --unique only print unique lines\n\ -z, --zero-terminated end lines with 0 byte, not newline\n\ "), stdout); @@ -227,6 +237,29 @@ find_field (struct linebuffer const *line) return line->buffer + i; } +/* Given a linebuffer LINE, + return the offset of the first character that doesn't need to be compared. */ + +static size_t +find_end (struct linebuffer const *line) +{ + size_t count; + char const *lp = line->buffer; + size_t i = line->length - 1; + + for (count = 0; count < ignore_fields && 0 < i; count++) + { + while (0 < i && isblank (to_uchar (lp[i]))) + i--; + while (0 < i && !isblank (to_uchar (lp[i]))) + i--; + } + + i -= MIN (ignore_chars, i); + + return i; +} + /* Return false if two strings OLD and NEW match, true if not. OLD and NEW point not to the beginnings of the lines but rather to the beginnings of the fields to compare. 
@@ -310,10 +343,15 @@ check_file (const char *infile, const char *outfile, char delimiter) { char *thisfield; size_t thislen; + size_t thisend; if (readlinebuffer_delim (thisline, stdin, delimiter) == 0) break; thisfield = find_field (thisline); - thislen = thisline->length - 1 - (thisfield - thisline->buffer); + thisend = find_end (thisline); + if (thisend <= thisfield - thisline->buffer) + thislen = 0; + else + thislen = thisend - (thisfield - thisline->buffer); if (prevline->length == 0 || different (thisfield, prevfield, thislen, prevlen)) { @@ -330,19 +368,25 @@ check_file (const char *infile, const char *outfile, char delimiter) { char *prevfield; size_t prevlen; + size_t prevend; uintmax_t match_count = 0; bool first_delimiter = true; if (readlinebuffer_delim (prevline, stdin, delimiter) == 0) goto closefiles; prevfield = find_field (prevline); - prevlen = prevline->length - 1 - (prevfield - prevline->buffer); + prevend = find_end (prevline); + if (prevend <= prevfield - prevline->buffer) + prevlen = 0; + else + prevlen = prevend - (prevfield - prevline->buffer); while (!feof (stdin)) { bool match; char *thisfield; size_t thislen; + size_t thisend; if (readlinebuffer_delim (thisline, stdin, delimiter) == 0) { if (ferror (stdin)) @@ -350,7 +394,11 @@ check_file (const char *infile, const char *outfile, char delimiter) break; } thisfield = find_field (thisline); - thislen = thisline->length - 1 - (thisfield - thisline->buffer); + thisend = find_end (thisline); + if (thisend <= thisfield - thisline->buffer) + thislen = 0; + else + thislen = thisend - (thisfield - thisline->buffer); match = !different (thisfield, prevfield, thislen, prevlen); match_count += match; @@ -430,6 +478,8 @@ main (int argc, char **argv) skip_chars = 0; skip_fields = 0; + ignore_chars = 0; + ignore_fields = 0; check_chars = SIZE_MAX; output_unique = output_first_repeated = true; 
output_later_repeated = false; @@ -445,7 +495,8 @@ main (int argc, char **argv) if (optc == -1 || (posixly_correct && nfiles != 0) || ((optc = getopt_long (argc, argv, - "-0123456789Dcdf:is:uw:z", longopts, NULL)) + "-0123456789Dcdf:F:is:S:uw:z", + longopts, NULL)) == -1)) { if (argc <= optind) @@ -523,6 +574,11 @@ main (int argc, char **argv) N_("invalid number of fields to skip")); break; + case 'F': + ignore_fields = size_opt (optarg, + N_("invalid number of fields to ignore")); + break; + case 'i': ignore_case = true; break; @@ -532,6 +588,11 @@ main (int argc, char **argv) N_("invalid number of bytes to skip")); break; + case 'S': + ignore_chars = size_opt (optarg, + N_("invalid number of bytes to ignore")); + break; + case 'u': output_first_repeated = false; break; diff --git a/tests/misc/uniq b/tests/misc/uniq index 99aa8ed..0817b2f 100755 --- a/tests/misc/uniq +++ b/tests/misc/uniq @@ -199,6 +199,21 @@ my @Tests = # Check that --zero-terminated is synonymous with -z. 
['123', '--zero-terminated', {IN=>"a\na\nb"}, {OUT=>"a\na\nb\0"}], ['124', '--zero-terminated', {IN=>"a\0a\0b"}, {OUT=>"a\0b\0"}], + # Skip last N characters/fields + ['125', qw(-F 1), {IN=>"a a\na b\n"}, {OUT=>"a a\n"}], + ['126', qw(-F 1), {IN=>"a a\nb b\n"}, {OUT=>"a a\nb b\n"}], + ['127', qw(-F 1), {IN=>"a a a\nc a b\n"}, {OUT=>"a a a\nc a b\n"}], + ['128', qw(-F 1), {IN=>"a b\na a\n"}, {OUT=>"a b\n"}], + ['129', qw(-F 2), {IN=>"c a a\nc a b\n"}, {OUT=>"c a a\n"}], + ['130', qw(-S 1), {IN=>"aaa\naaa\n"}, {OUT=>"aaa\n"}], + ['131', qw(-S 2), {IN=>"aab\naaa\n"}, {OUT=>"aab\n"}], + ['132', qw(-F 1 -S 1), {IN=>"aaa a\nba b\n"}, {OUT=>"aaa a\nba b\n"}], + ['133', qw(-F 1 -S 1), {IN=>"aaa a\naaa b\n"}, {OUT=>"aaa a\n"}], + ['134', qw(-S 1 -F 1), {IN=>"aaa a\nba b\n"}, {OUT=>"aaa a\nba b\n"}], + ['135', qw(-S 1 -F 1), {IN=>"aaa a\naaa b\n"}, {OUT=>"aaa a\n"}], + ['136', qw(-S 4), {IN=>"cba\ndcba\n"}, {OUT=>"cba\n"}], + ['137', qw(-S 0), {IN=>"cba\ndcba\n"}, {OUT=>"cba\ndcba\n"}], + ['138', qw(-S 0), {IN=>"cba\n"}, {OUT=>"cba\n"}], ); # Set _POSIX2_VERSION=199209 in the environment of each obs-plus* test. 
--=20 1.7.2.5 --DocE+STaALJfprDB Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iEYEARECAAYFAk73HPoACgkQKLX03ZSPZGxpGwCfRLuQCaTmBa873qnxUTCfp31n w6oAnjLpolJlrCP/y3vyCbKgiYGZ2ZBU =xu9V -----END PGP SIGNATURE----- --DocE+STaALJfprDB-- From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 11:38:15 2011 Received: (at 10365) by debbugs.gnu.org; 26 Dec 2011 16:38:15 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfDZ1-0000c2-5E for submit@debbugs.gnu.org; Mon, 26 Dec 2011 11:38:15 -0500 Received: from mail3.vodafone.ie ([213.233.128.45]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfDYy-0000bt-G6 for 10365@debbugs.gnu.org; Mon, 26 Dec 2011 11:38:13 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApMBAH+h+E5tTn9m/2dsb2JhbAAMMAavNQEBAQQyAUYQCw0LCRYPCQMCAQIBRQYNAQcBAb0ZiGGDLgSacYUDh0I Received: from unknown (HELO [192.168.1.79]) ([109.78.127.102]) by mail3.vodafone.ie with ESMTP; 26 Dec 2011 16:35:38 +0000 Message-ID: <4EF8A259.6020404@draigBrady.com> Date: Mon, 26 Dec 2011 16:35:37 +0000 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0 MIME-Version: 1.0 To: Adrien Kunysz Subject: Re: bug#10365: [PATCH] uniq: add ability to skip last N chars or fields References: <20111225125418.GA1488@chouffe> In-Reply-To: <20111225125418.GA1488@chouffe> X-Enigmail-Version: 1.3.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 10365 Cc: 10365@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org 
Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.5 (--) On 12/25/2011 12:54 PM, Adrien Kunysz wrote: > * doc/coreutils.texi: document the new feature > * src/uniq.c (find_end): new function > (check_file): use find_end() to determine when to stop comparing > (usage): document the new feature > (main): expose the new feature to user > * tests/misc/uniq: add tests to exercise the new code > --- > doc/coreutils.texi | 17 +++++++++++++ > src/uniq.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++--- > tests/misc/uniq | 15 +++++++++++ > 3 files changed, 97 insertions(+), 4 deletions(-) > > I have recently found myself wishing I could have uniq(1) skip > the last N fields before comparison. I am aware of the rev(1) trick > but I don't find it very satisfactory. So I ended up patching uniq > and implementing the feature for characters skipping as well. > > Documentation and tests included. Tests have also been run within > Valgrind on x86_64. Thank you for being so thorough. Hmm, this is quite unusual functionality. I was about to merge this with a previous feature request: http://debbugs.gnu.org/5832 But in fact supporting --key would not provide this functionality. Why does `rev | uniq -f | rev` not suffice for you? BTW you would need to start the copyright assignment process for this feature, but we'd have to decide if it generally useful enough to proceed. Perhaps a concrete example would help. cheers, Pádraig. 
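The `rev | uniq -f | rev` pipeline asked about above can be sketched as follows. This is a minimal illustration, assuming rev(1) from util-linux and a POSIX uniq(1) are on the PATH; the function name uniq_ignore_last_fields is hypothetical and is not part of coreutils or of the submitted patch.

```shell
#!/bin/sh
# Hypothetical helper (not part of coreutils): drop adjacent lines that are
# equal once the last N blank-separated fields are ignored.
# Assumes rev(1) from util-linux and uniq(1) are available.
uniq_ignore_last_fields() {
    n=$1
    # Reverse each line so the trailing fields become leading fields,
    # let uniq -f skip them, then reverse again to restore the text.
    rev | uniq -f "$n" | rev
}

# "x 1" and "x 2" differ only in their last field, so the second is dropped.
printf 'x 1\nx 2\ny 3\n' | uniq_ignore_last_fields 1
```

Reversing each line turns the last N fields into the first N, which `uniq -f` already knows how to skip; the second rev restores the original text. On this input the pipeline prints "x 1" and "y 3", which should match what the proposed `-F 1` is meant to produce.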
From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 11:45:03 2011 Received: (at 10365) by debbugs.gnu.org; 26 Dec 2011 16:45:03 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfDfa-0000lv-Ie for submit@debbugs.gnu.org; Mon, 26 Dec 2011 11:45:03 -0500 Received: from mx.meyering.net ([88.168.87.75]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfDfY-0000lS-7L for 10365@debbugs.gnu.org; Mon, 26 Dec 2011 11:45:01 -0500 Received: from rho.meyering.net (localhost.localdomain [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id F2464600E3; Mon, 26 Dec 2011 17:42:25 +0100 (CET) From: Jim Meyering To: =?iso-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#10365: [PATCH] uniq: add ability to skip last N chars or fields In-Reply-To: <4EF8A259.6020404@draigBrady.com> (=?iso-8859-1?Q?=22P=E1drai?= =?iso-8859-1?Q?g?= Brady"'s message of "Mon, 26 Dec 2011 16:35:37 +0000") References: <20111225125418.GA1488@chouffe> <4EF8A259.6020404@draigBrady.com> Date: Mon, 26 Dec 2011 17:42:25 +0100 Message-ID: <87y5tzb7ym.fsf@rho.meyering.net> Lines: 45 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.7 (--) X-Debbugs-Envelope-To: 10365 Cc: Adrien Kunysz , 10365@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.7 (--) Pádraig Brady wrote: > On 12/25/2011 12:54 PM, Adrien Kunysz wrote: >> * doc/coreutils.texi: document the new feature >> * src/uniq.c (find_end): new function >> (check_file): use find_end() to determine when to stop comparing >> (usage): document the new feature >> (main): expose the new feature to user >> * tests/misc/uniq: add tests to exercise 
the new code >> --- >> doc/coreutils.texi | 17 +++++++++++++ >> src/uniq.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++--- >> tests/misc/uniq | 15 +++++++++++ >> 3 files changed, 97 insertions(+), 4 deletions(-) >> >> I have recently found myself wishing I could have uniq(1) skip >> the last N fields before comparison. I am aware of the rev(1) trick >> but I don't find it very satisfactory. So I ended up patching uniq >> and implementing the feature for characters skipping as well. >> >> Documentation and tests included. Tests have also been run within >> Valgrind on x86_64. > > Thank you for being so thorough. > > Hmm, this is quite unusual functionality. > I was about to merge this with a previous feature request: > http://debbugs.gnu.org/5832 > But in fact supporting --key would not provide this functionality. > > Why does `rev | uniq -f | rev` not suffice for you? > > BTW you would need to start the copyright assignment process for > this feature, but we'd have to decide if it generally useful enough > to proceed. Perhaps a concrete example would help. I agree that it's borderline. If we add this functionality, I'd prefer to do it without adding new options. 
Instead, just accept negative values for N in the three options that accept counts: $ uniq --help|grep -w N -f, --skip-fields=N avoid comparing the first N fields -s, --skip-chars=N avoid comparing the first N characters -w, --check-chars=N compare no more than N characters in lines From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 12:42:10 2011 Received: (at 10365) by debbugs.gnu.org; 26 Dec 2011 17:42:10 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfEYr-0002pm-Mn for submit@debbugs.gnu.org; Mon, 26 Dec 2011 12:42:09 -0500 Received: from smtp.cs.ucla.edu ([131.179.128.62]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfEYo-0002pa-IP; Mon, 26 Dec 2011 12:42:07 -0500 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 970DFA60008; Mon, 26 Dec 2011 09:39:31 -0800 (PST) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uLml8sLPEteI; Mon, 26 Dec 2011 09:39:31 -0800 (PST) Received: from [192.168.1.10] (pool-71-189-109-235.lsanca.fios.verizon.net [71.189.109.235]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 310FCA60007; Mon, 26 Dec 2011 09:39:31 -0800 (PST) Message-ID: <4EF8B14E.6030004@cs.ucla.edu> Date: Mon, 26 Dec 2011 09:39:26 -0800 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20111124 Thunderbird/8.0 MIME-Version: 1.0 To: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= Subject: Re: bug#10365: [PATCH] uniq: add ability to skip last N chars or fields References: <20111225125418.GA1488@chouffe> <4EF8A259.6020404@draigBrady.com> In-Reply-To: <4EF8A259.6020404@draigBrady.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 10365 Cc: 
Adrien Kunysz , 10365@debbugs.gnu.org, 5832@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.9 (--) On 12/26/11 08:35, Pádraig Brady wrote: > supporting --key would not provide this functionality. It would support it in the most common cases, no? That is, if every line has (say) 10 fields, then the proposed 'uniq -F3' would be equivalent to the proposed 'uniq -k1,7'. I can't offhand think of good use cases for uniq -F that would not be subsumed by uniq -k. From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 13:06:18 2011 Received: (at 10365) by debbugs.gnu.org; 26 Dec 2011 18:06:18 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfEwE-00048I-FB for submit@debbugs.gnu.org; Mon, 26 Dec 2011 13:06:18 -0500 Received: from mail3.vodafone.ie ([213.233.128.45]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfEwC-000487-FB; Mon, 26 Dec 2011 13:06:17 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApMBAKi2+E5tTn9m/2dsb2JhbAAMMQYOhQGkXYVSAQEBAwEjDwFGBQsLDQsCAgUWCwICCQMCAQIBRQYNAQcBAYd2pBuRB4EvhzKCGIEWBJpxjA43 Received: from unknown (HELO [192.168.1.79]) ([109.78.127.102]) by mail3.vodafone.ie with ESMTP; 26 Dec 2011 18:03:41 +0000 Message-ID: <4EF8B6FC.3080407@draigBrady.com> Date: Mon, 26 Dec 2011 18:03:40 +0000 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0 MIME-Version: 1.0 To: Paul Eggert Subject: Re: bug#10365: [PATCH] uniq: add ability to skip last N chars or fields References: <20111225125418.GA1488@chouffe> <4EF8A259.6020404@draigBrady.com> <4EF8B14E.6030004@cs.ucla.edu> In-Reply-To: <4EF8B14E.6030004@cs.ucla.edu> 
X-Enigmail-Version: 1.3.2 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 10365 Cc: Adrien Kunysz , 10365@debbugs.gnu.org, 5832@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.5 (--) On 12/26/2011 05:39 PM, Paul Eggert wrote: > On 12/26/11 08:35, Pádraig Brady wrote: >> supporting --key would not provide this functionality. > > It would support it in the most common cases, no? > That is, if every line has (say) 10 fields, then > the proposed 'uniq -F3' would be equivalent to > the proposed 'uniq -k1,7'. That's what I thought at first too, but then why didn't Adrien propose the more normal --check-fields=7 rather than the unusual -F3. > I can't offhand think of good use cases for uniq -F > that would not be subsumed by uniq -k. Me too, Having a variable number of fields per line, but ignoring the last constant N fields is very unusual, and why I asked for a concrete example. Personally I'm leaning towards suggesting `the rev| uniq -f | rev` is fine for this edge case. cheers, Pádraig. 
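Neither `uniq -k` nor `--check-fields` is an existing uniq option in this thread; both are proposals. The fixed-field-count case discussed above (compare only the first K fields) can be approximated with awk. A minimal sketch, assuming a POSIX awk; the function name uniq_first_fields is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep the first line of each run of adjacent lines
# whose first K blank-separated fields are identical.
uniq_first_fields() {
    awk -v k="$1" '
    {
        # Build the comparison key from fields 1..k (fewer if the line is short).
        key = ""
        for (i = 1; i <= k && i <= NF; i++)
            key = key $i "\t"
        # Print a line only when its key differs from the previous line.
        if (NR == 1 || key != prev)
            print
        prev = key
    }'
}

# Every line has 3 fields, so comparing the first 2 ignores the last 1.
printf 'a 1 p\na 1 q\nb 1 q\n' | uniq_first_fields 2
```

This prints "a 1 p" and "b 1 q". Unlike uniq, the sketch compares fields rather than raw text, so lines that differ only in the amount of blank space between fields are treated as equal.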
From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 14:00:41 2011 Received: (at 10365) by debbugs.gnu.org; 26 Dec 2011 19:00:41 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfFmr-0006s4-Js for submit@debbugs.gnu.org; Mon, 26 Dec 2011 14:00:41 -0500 Received: from xvm-20-226.ghst.net ([92.243.20.226] helo=fruli.krunch.be) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfFmp-0006rx-Bl for 10365@debbugs.gnu.org; Mon, 26 Dec 2011 14:00:41 -0500 Received: from localhost (localhost [127.0.0.1]) by fruli.krunch.be (Postfix) with ESMTP id D89FD22804; Mon, 26 Dec 2011 18:55:18 +0000 (UTC) Date: Mon, 26 Dec 2011 19:00:36 +0000 From: Adrien Kunysz To: Jim Meyering Subject: Re: bug#10365: [PATCH] uniq: add ability to skip last N chars or fields Message-ID: <20111226185944.GB1438@chouffe> References: <20111225125418.GA1488@chouffe> <4EF8A259.6020404@draigBrady.com> <87y5tzb7ym.fsf@rho.meyering.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="EY/WZ/HvNxOox07X" Content-Disposition: inline In-Reply-To: <87y5tzb7ym.fsf@rho.meyering.net> User-Agent: Mutt/1.5.20 (2009-06-14) X-Spam-Score: -4.6 (----) X-Debbugs-Envelope-To: 10365 Cc: 10365@debbugs.gnu.org, =?iso-8859-1?Q?P=E1draig?= Brady X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -3.9 (---) --EY/WZ/HvNxOox07X Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Mon, Dec 26, 2011 at 05:42:25PM +0100, Jim Meyering wrote: > Pádraig Brady wrote: > > > On 12/25/2011 12:54 PM, Adrien Kunysz wrote: > >> * doc/coreutils.texi: document the new feature > >> * 
src/uniq.c (find_end): new function > >> (check_file): use find_end() to determine when to stop comparing > >> (usage): document the new feature > >> (main): expose the new feature to user > >> * tests/misc/uniq: add tests to exercise the new code > >> --- > >> doc/coreutils.texi | 17 +++++++++++++ > >> src/uniq.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++--- > >> tests/misc/uniq | 15 +++++++++++ > >> 3 files changed, 97 insertions(+), 4 deletions(-) > >> > >> I have recently found myself wishing I could have uniq(1) skip > >> the last N fields before comparison. I am aware of the rev(1) trick > >> but I don't find it very satisfactory. So I ended up patching uniq > >> and implementing the feature for characters skipping as well. > >> > >> Documentation and tests included. Tests have also been run within > >> Valgrind on x86_64. > > > > Thank you for being so thorough. > > > > Hmm, this is quite unusual functionality. > > I was about to merge this with a previous feature request: > > http://debbugs.gnu.org/5832 > > But in fact supporting --key would not provide this functionality. > > > > Why does `rev | uniq -f | rev` not suffice for you? It just doesn't look very nice to me but I admit it actually works fine. > > BTW you would need to start the copyright assignment process for > > this feature, but we'd have to decide if it generally useful enough > > to proceed. Perhaps a concrete example would help. I ended up refactoring my script in such a way that I don't need either so I don't even have a concrete use case for this any more :) If anybody finds this useful enough to be merged I am happy to go through the copyright assignment process. > I agree that it's borderline. > If we add this functionality, I'd prefer to do it without adding new > options. 
Instead, just accept negative values for N in the three > options that accept counts: > > $ uniq --help|grep -w N > -f, --skip-fields=N avoid comparing the first N fields > -s, --skip-chars=N avoid comparing the first N characters > -w, --check-chars=N compare no more than N characters in lines I initially wanted to implement it by using negative values for -f but then realised it would mean you can't say "-f2 -F3" for example. I wasn't aware of the feature request for --key and I think that certainly looks more useful (with or without supporting negative field indexes). I might try to write a patch for that later but don't hold your breath. --EY/WZ/HvNxOox07X Content-Type: application/pgp-signature; name="signature.asc" Content-Description: Digital signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iEYEARECAAYFAk74xFQACgkQKLX03ZSPZGyyeACfVNgjvLl2J6P9FJLXAA9879s0 HdUAn13Qo2D7LGPKVJCinV9x7zbwVV62 =PIgO -----END PGP SIGNATURE----- --EY/WZ/HvNxOox07X-- From debbugs-submit-bounces@debbugs.gnu.org Mon Dec 26 14:38:31 2011 Received: (at 10365-done) by debbugs.gnu.org; 26 Dec 2011 19:38:31 +0000 Received: from localhost ([127.0.0.1] helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfGNT-0007jf-4J for submit@debbugs.gnu.org; Mon, 26 Dec 2011 14:38:31 -0500 Received: from mail3.vodafone.ie ([213.233.128.45]) by debbugs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1RfGNQ-0007jW-Lz for 10365-done@debbugs.gnu.org; Mon, 26 Dec 2011 14:38:29 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApMBAMDL+E5tTn9m/2dsb2JhbAAMMQavPgEBAQQyAUYQCw0LCRYPCQMCAQIBRQYNAQcBAb0riGGDLgSacYUDh0I Received: from unknown (HELO [192.168.1.79]) ([109.78.127.102]) by mail3.vodafone.ie with ESMTP; 26 Dec 2011 19:35:53 +0000 Message-ID: <4EF8CC98.6060100@draigBrady.com> Date: Mon, 26 Dec 2011 19:35:52 +0000 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux 
x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0 MIME-Version: 1.0 To: Adrien Kunysz Subject: Re: bug#10365: [PATCH] uniq: add ability to skip last N chars or fields References: <20111225125418.GA1488@chouffe> <4EF8A259.6020404@draigBrady.com> <87y5tzb7ym.fsf@rho.meyering.net> <20111226185944.GB1438@chouffe> In-Reply-To: <20111226185944.GB1438@chouffe> X-Enigmail-Version: 1.3.2 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: -2.5 (--) X-Debbugs-Envelope-To: 10365-done Cc: 10365-done@debbugs.gnu.org, Jim Meyering X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -2.5 (--) On 12/26/2011 07:00 PM, Adrien Kunysz wrote: > On Mon, Dec 26, 2011 at 05:42:25PM +0100, Jim Meyering wrote: >> Pádraig Brady wrote: >> >>> On 12/25/2011 12:54 PM, Adrien Kunysz wrote: >>>> * doc/coreutils.texi: document the new feature >>>> * src/uniq.c (find_end): new function >>>> (check_file): use find_end() to determine when to stop comparing >>>> (usage): document the new feature >>>> (main): expose the new feature to user >>>> * tests/misc/uniq: add tests to exercise the new code >>>> --- >>>> doc/coreutils.texi | 17 +++++++++++++ >>>> src/uniq.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++--- >>>> tests/misc/uniq | 15 +++++++++++ >>>> 3 files changed, 97 insertions(+), 4 deletions(-) >>>> >>>> I have recently found myself wishing I could have uniq(1) skip >>>> the last N fields before comparison. I am aware of the rev(1) trick >>>> but I don't find it very satisfactory. So I ended up patching uniq >>>> and implementing the feature for characters skipping as well. >>>> >>>> Documentation and tests included. Tests have also been run within >>>> Valgrind on x86_64. >>> >>> Thank you for being so thorough. 
>>> >>> Hmm, this is quite unusual functionality. >>> I was about to merge this with a previous feature request: >>> http://debbugs.gnu.org/5832 >>> But in fact supporting --key would not provide this functionality. >>> >>> Why does `rev | uniq -f | rev` not suffice for you? > > It just doesn't look very nice to me but I admit it actually works fine. OK given this, and that --key should handle most of these cases, I'm going to close this request. cheers, Pádraig. From unknown Sun Jun 22 03:57:26 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Tue, 24 Jan 2012 12:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator