From debbugs-submit-bounces@debbugs.gnu.org Sun Dec 09 05:28:33 2012
Received: (at submit) by debbugs.gnu.org; 9 Dec 2012 10:28:33 +0000
Received: from localhost ([127.0.0.1]:33641 helo=debbugs.gnu.org)
    by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from )
    id 1The7c-0004qE-CY
    for submit@debbugs.gnu.org; Sun, 09 Dec 2012 05:28:33 -0500
Received: from eggs.gnu.org ([208.118.235.92]:43569)
    by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from )
    id 1The7Y-0004q6-Vu
    for submit@debbugs.gnu.org; Sun, 09 Dec 2012 05:28:31 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1The72-00030w-1t
    for submit@debbugs.gnu.org; Sun, 09 Dec 2012 05:27:59 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM,
    RCVD_IN_DNSWL_NONE, RP_MATCHES_RCVD autolearn=unavailable version=3.3.2
Received: from lists.gnu.org ([208.118.235.17]:46727)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1The71-00030q-UK
    for submit@debbugs.gnu.org; Sun, 09 Dec 2012 05:27:55 -0500
Received: from eggs.gnu.org ([208.118.235.92]:54931)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1The6y-0000UR-5l
    for bug-coreutils@gnu.org; Sun, 09 Dec 2012 05:27:55 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1The6t-00030X-TR
    for bug-coreutils@gnu.org; Sun, 09 Dec 2012 05:27:52 -0500
Received: from mailout-eu.gmx.com ([213.165.64.42]:49853)
    by eggs.gnu.org with smtp (Exim 4.71) (envelope-from )
    id 1The6t-0002zl-BZ
    for bug-coreutils@gnu.org; Sun, 09 Dec 2012 05:27:47 -0500
Received: (qmail invoked by alias); 09 Dec 2012 10:27:44 -0000
Received: from unknown (EHLO smag-R59-R60-R61) [151.65.149.48]
    by mail.gmx.com (mp-eu002) with SMTP; 09 Dec 2012 11:27:44 +0100
X-Authenticated: #130707387
X-Provags-ID: V01U2FsdGVkX18aAzgP/+sal1mdLrY9MX5nHoFwGitsDsXyo5ks0G
    zeb9mXglY5TsY0
Date: Sun, 9 Dec 2012 11:28:05 +0100
From: Cojocaru Alexandru
To: bug-coreutils@gnu.org
Subject: [PATCH] cut: use only one data strucutre
Message-Id: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com>
X-Mailer: Sylpheed 3.3.0 (GTK+ 2.24.13; i686-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: multipart/mixed;
    boundary="Multipart=_Sun__9_Dec_2012_11_28_05_+0100_qC9IODJNeE9dkjEt"
X-Y-GMX-Trusted: 0
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 208.118.235.17
X-Spam-Score: -6.9 (------)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
Reply-To: xojoc@gmx.com
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: -6.9 (------)

This is a multi-part message in MIME format.

--Multipart=_Sun__9_Dec_2012_11_28_05_+0100_qC9IODJNeE9dkjEt
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

>From 678c2ecfebbf7a278c14b7e6fcb815e87569cd20 Mon Sep 17 00:00:00 2001
From: Cojocaru Alexandru
Date: Sun, 9 Dec 2012 10:43:10 +0100
Subject: [PATCH] cut: use only one data structure

The current implementation of cut, uses a bit array,
an array of `struct range_pair's, and (when --output-delimiter
is specified) a hash_table. The new implementation will use
only an array of `struct range_pair's.
The old implementation is inefficient for the following reasons:
 1. When -b with a big num is specified, it allocates a lot of useless
    memory for `printable_field'.
 2. When --output-delimiter is specified, it will allocate 31 buckets.
    Even if a few ranges are specified.

* src/cut.c (set_fields): Set and initialize RP
instead of printable_field.
* src/cut.c (print_kth): Split it. Check *only* if a given field
or byte is printable.
* src/cut.c (is_range_start): New function.
* tests/misc/cut.pl: Check if `eol_range_start' is set correctly.
---
 src/cut.c         | 312 +++++++++++++++++++-----------------------------------
 tests/misc/cut.pl |   4 +
 2 files changed, 113 insertions(+), 203 deletions(-)

diff --git a/src/cut.c b/src/cut.c
index de9320c..545639d 100644
--- a/src/cut.c
+++ b/src/cut.c
@@ -53,8 +53,31 @@
     }                                                                   \
   while (0)
 
+
+struct range_pair
+  {
+    size_t lo;
+    size_t hi;
+  };
+
+/* Array of `struct range_pair' holding all the finite ranges. */
+static struct range_pair *rp;
+
+/* Pointer inside RP. When checking if a byte or field is selected
+   by a finite range, we check if it is between CURRENT_RP.LO
+   and CURRENT_RP.HI. If the byte or field index is greater than
+   CURRENT_RP.HI then we make CURRENT_RP to point to the next range pair. */
+static struct range_pair *current_rp;
+
+/* Number of finite ranges specified by the user. */
+static size_t n_rp;
+
+/* Number of `struct range_pair's allocated. */
+static size_t n_rp_allocated;
+
+
 /* Append LOW, HIGH to the list RP of range pairs, allocating additional
-   space if necessary.  Update local variable N_RP.  When allocating,
+   space if necessary.  Update global variable N_RP.  When allocating,
    update global variable N_RP_ALLOCATED.  */
 
 #define ADD_RANGE_PAIR(rp, low, high)                   \
@@ -72,11 +95,6 @@
     }                                                   \
   while (0)
 
-struct range_pair
-  {
-    size_t lo;
-    size_t hi;
-  };
 
 /* This buffer is used to support the semantics of the -s option
    (or lack of same) when the specified field list includes (does
@@ -90,26 +108,11 @@ static char *field_1_buffer;
 /* The number of bytes allocated for FIELD_1_BUFFER.  */
 static size_t field_1_bufsize;
 
-/* The largest field or byte index used as an endpoint of a closed
-   or degenerate range specification;  this doesn't include the starting
-   index of right-open-ended ranges.  For example, with either range spec
-   '2-5,9-', '2-3,5,9-' this variable would be set to 5.  */
-static size_t max_range_endpoint;
 
 /* If nonzero, this is the index of the first field in a range that goes
    to end of line. */
 static size_t eol_range_start;
 
-/* This is a bit vector.
-   In byte mode, which bytes to output.
-   In field mode, which DELIM-separated fields to output.
-   Both bytes and fields are numbered starting with 1,
-   so the zeroth bit of this array is unused.
-   A field or byte K has been selected if
-   (K <= MAX_RANGE_ENDPOINT and is_printable_field(K))
-    || (EOL_RANGE_START > 0 && K >= EOL_RANGE_START).  */
-static unsigned char *printable_field;
-
 enum operating_mode
   {
     undefined_mode,
@@ -148,15 +151,6 @@ static char *output_delimiter_string;
 /* True if we have ever read standard input. */
 static bool have_read_stdin;
 
-#define HT_RANGE_START_INDEX_INITIAL_CAPACITY 31
-
-/* The set of range-start indices.  For example, given a range-spec list like
-   '-b1,3-5,4-9,15-', the following indices will be recorded here: 1, 3, 15.
-   Note that although '4' looks like a range-start index, it is in the middle
-   of the '3-5' range, so it doesn't count.
-   This table is created/used IFF output_delimiter_specified is set.  */
-static Hash_table *range_start_ht;
-
 /* For long options that have no equivalent short option, use a
    non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
 enum
@@ -240,73 +234,33 @@ With no FILE, or when FILE is -, read standard input.\n\
   exit (status);
 }
 
-static inline void
-mark_range_start (size_t i)
-{
-  /* Record the fact that 'i' is a range-start index.  */
-  void *ent_from_table = hash_insert (range_start_ht, (void*) i);
-  if (ent_from_table == NULL)
-    {
-      /* Insertion failed due to lack of memory.  */
-      xalloc_die ();
-    }
-  assert ((size_t) ent_from_table == i);
-}
-
-static inline void
-mark_printable_field (size_t i)
-{
-  size_t n = i / CHAR_BIT;
-  printable_field[n] |= (1 << (i % CHAR_BIT));
-}
-
-static inline bool
-is_printable_field (size_t i)
-{
-  size_t n = i / CHAR_BIT;
-  return (printable_field[n] >> (i % CHAR_BIT)) & 1;
-}
-
-static size_t
-hash_int (const void *x, size_t tablesize)
-{
-#ifdef UINTPTR_MAX
-  uintptr_t y = (uintptr_t) x;
-#else
-  size_t y = (size_t) x;
-#endif
-  return y % tablesize;
-}
+/* Return nonzero if the K'th field or byte is printable. */
 
 static bool
-hash_compare_ints (void const *x, void const *y)
+print_kth (size_t k)
 {
-  return (x == y) ? true : false;
-}
+  bool k_selected = false;
+  if (0 < eol_range_start && eol_range_start <= k)
+    k_selected = true;
+  else if (current_rp->lo <= k && k <= current_rp->hi)
+    k_selected = true;
 
-static bool
-is_range_start_index (size_t i)
-{
-  return hash_lookup (range_start_ht, (void *) i) ? true : false;
+  return k_selected ^ complement;
 }
 
-/* Return nonzero if the K'th field or byte is printable.
-   When returning nonzero, if RANGE_START is non-NULL,
-   set *RANGE_START to true if K is the beginning of a range, and to
-   false otherwise.  */
+/* Return nonzero if K'th byte is the beginning of a range. */
 
-static bool
-print_kth (size_t k, bool *range_start)
+static inline bool
+is_range_start (size_t k)
 {
-  bool k_selected
-    = ((0 < eol_range_start && eol_range_start <= k)
-       || (k <= max_range_endpoint && is_printable_field (k)));
+  bool is_start = false;
 
-  bool is_selected = k_selected ^ complement;
-  if (range_start && is_selected)
-    *range_start = is_range_start_index (k);
+  if (!complement)
+    is_start = (k == eol_range_start || k == current_rp->lo);
+  else
+    is_start = (k == (current_rp - 1)->hi + 1);
 
-  return is_selected;
+  return is_start;
 }
 
 /* Comparison function for qsort to order the list of
@@ -319,24 +273,14 @@ compare_ranges (const void *a, const void *b)
   return a_start < b_start ? -1 : a_start > b_start;
 }
 
-/* Given the list of field or byte range specifications FIELDSTR, set
-   MAX_RANGE_ENDPOINT and allocate and initialize the PRINTABLE_FIELD
-   array.  If there is a right-open-ended range, set EOL_RANGE_START
-   to its starting index.  FIELDSTR should be composed of one or more
-   numbers or ranges of numbers, separated by blanks or commas.
-   Incomplete ranges may be given: '-m' means '1-m'; 'n-' means 'n'
-   through end of line.  Return true if FIELDSTR contains at least
-   one field specification, false otherwise.  */
-
-/* FIXME-someday:  What if the user wants to cut out the 1,000,000-th
-   field of some huge input file?  This function shouldn't have to
-   allocate a table of a million bits just so we can test every
-   field < 10^6 with an array dereference.  Instead, consider using
-   an adaptive approach: if the range of selected fields is too large,
-   but only a few fields/byte-offsets are actually selected, use a
-   hash table.  If the range of selected fields is too large, and
-   too many are selected, then resort to using the range-pairs (the
-   'rp' array) directly.  */
+/* Given the list of field or byte range specifications FIELDSTR,
+   allocate and initialize the RP array. If there is a right-open-ended
+   range, set EOL_RANGE_START to its starting index. FIELDSTR should
+   be composed of one or more numbers or ranges of numbers, separated
+   by blanks or commas. Incomplete ranges may be given: '-m' means '1-m';
+   'n-' means 'n' through end of line.
+   Return true if FIELDSTR contains at least one field specification,
+   false otherwise.  */
 
 static bool
 set_fields (const char *fieldstr)
@@ -349,9 +293,6 @@ set_fields (const char *fieldstr)
   bool field_found = false;     /* True if at least one field spec
                                    has been processed.  */
 
-  struct range_pair *rp = NULL;
-  size_t n_rp = 0;
-  size_t n_rp_allocated = 0;
   size_t i;
   bool in_digits = false;
 
@@ -403,41 +344,10 @@ set_fields (const char *fieldstr)
                   if (value < initial)
                     FATAL_ERROR (_("invalid decreasing range"));
 
-                  /* Is there already a range going to end of line? */
-                  if (eol_range_start != 0)
-                    {
-                      /* Yes.  Is the new sequence already contained
-                         in the old one?  If so, no processing is
-                         necessary. */
-                      if (initial < eol_range_start)
-                        {
-                          /* No, the new sequence starts before the
-                             old.  Does the old range going to end of line
-                             extend into the new range?  */
-                          if (eol_range_start <= value)
-                            {
-                              /* Yes.  Simply move the end of line marker. */
-                              eol_range_start = initial;
-                            }
-                          else
-                            {
-                              /* No.  A simple range, before and disjoint from
-                                 the range going to end of line.  Fill it. */
-                              ADD_RANGE_PAIR (rp, initial, value);
-                            }
-
-                          /* In any case, some fields were selected. */
-                          field_found = true;
-                        }
-                    }
-                  else
-                    {
-                      /* There is no range going to end of line. */
-                      ADD_RANGE_PAIR (rp, initial, value);
-                      field_found = true;
-                    }
-                  value = 0;
+                  ADD_RANGE_PAIR (rp, initial, value);
+                  field_found = true;
                 }
+              value = 0;
             }
           else
             {
@@ -448,9 +358,7 @@ set_fields (const char *fieldstr)
             }
 
           if (*fieldstr == '\0')
-            {
-              break;
-            }
+            break;
 
           fieldstr++;
           lhs_specified = false;
@@ -494,47 +402,43 @@ set_fields (const char *fieldstr)
         FATAL_ERROR (_("invalid byte, character or field list"));
     }
 
-  max_range_endpoint = 0;
-  for (i = 0; i < n_rp; i++)
+  qsort (rp, n_rp, sizeof (rp[0]), compare_ranges);
+
+  /* Omit finite ranges subsumed by a to-EOL range. */
+  if (eol_range_start && n_rp)
     {
-      if (rp[i].hi > max_range_endpoint)
-        max_range_endpoint = rp[i].hi;
+      i = n_rp;
+      while (i && eol_range_start <= rp[i - 1].hi)
+        {
+          eol_range_start = MIN (rp[i - 1].lo, eol_range_start);
+          --n_rp;
+          --i;
+        }
     }
 
-  /* Allocate an array large enough so that it may be indexed by
-     the field numbers corresponding to all finite ranges
-     (i.e. '2-6' or '-4', but not '5-') in FIELDSTR.  */
-
-  if (max_range_endpoint)
-    printable_field = xzalloc (max_range_endpoint / CHAR_BIT + 1);
-
-  qsort (rp, n_rp, sizeof (rp[0]), compare_ranges);
-
-  /* Set the array entries corresponding to integers in the ranges of RP.  */
-  for (i = 0; i < n_rp; i++)
+  /* Merge finite range pairs (e.g. `2-5,3-4' becomes `2-5'). */
+  for (i = 0; i < n_rp; ++i)
     {
-      /* Ignore any range that is subsumed by the to-EOL range.  */
-      if (eol_range_start && eol_range_start <= rp[i].lo)
-        continue;
-
-      /* Record the range-start indices, i.e., record each start
-         index that is not part of any other (lo..hi] range.  */
-      size_t rsi_candidate = complement ? rp[i].hi + 1 : rp[i].lo;
-      if (output_delimiter_specified
-          && !is_printable_field (rsi_candidate))
-        mark_range_start (rsi_candidate);
-
-      for (size_t j = rp[i].lo; j <= rp[i].hi; j++)
-        mark_printable_field (j);
+      for (size_t j = i + 1; j < n_rp; ++j)
+        {
+          if (rp[j].lo <= rp[i].hi)
+            {
+              rp[i].hi = MAX (rp[j].hi, rp[i].hi);
+              memmove (rp + j, rp + j + 1,
+                       (n_rp - j - 1) * sizeof (struct range_pair));
+              --n_rp;
+            }
+          else
+            break;
+        }
     }
 
-  if (output_delimiter_specified
-      && !complement
-      && eol_range_start
-      && max_range_endpoint && !is_printable_field (eol_range_start))
-    mark_range_start (eol_range_start);
 
-  free (rp);
+  /* After merging, reallocate RP so we realise memory to the system.
+     Also add a sentinel at the end of RP, so we never get memory segfault. */
+  ++n_rp;
+  rp = xrealloc (rp, n_rp * sizeof (struct range_pair));
+  rp[n_rp - 1].lo = rp[n_rp - 1].hi = 0;
 
   return field_found;
 }
@@ -551,7 +455,8 @@ cut_bytes (FILE *stream)
 
   byte_idx = 0;
   print_delimiter = false;
-  while (1)
+  current_rp = rp;
+  while (true)
     {
       int c;            /* Each character from the file. */
 
@@ -562,6 +467,7 @@ cut_bytes (FILE *stream)
           putchar ('\n');
           byte_idx = 0;
           print_delimiter = false;
+          current_rp = rp;
         }
       else if (c == EOF)
         {
@@ -571,16 +477,21 @@ cut_bytes (FILE *stream)
         }
       else
         {
-          bool range_start;
-          bool *rs = output_delimiter_specified ? &range_start : NULL;
-          if (print_kth (++byte_idx, rs))
+          ++byte_idx;
+          if ((current_rp->hi < byte_idx) && (current_rp < rp + n_rp - 1))
+            ++current_rp;
+          if (print_kth (byte_idx))
             {
-              if (rs && *rs && print_delimiter)
+              if (output_delimiter_specified)
                 {
-                  fwrite (output_delimiter_string, sizeof (char),
-                          output_delimiter_length, stdout);
+                  if (print_delimiter && is_range_start (byte_idx))
+                    {
+                      fwrite (output_delimiter_string, sizeof (char),
+                              output_delimiter_length, stdout);
+                    }
+                  print_delimiter = true;
                 }
-              print_delimiter = true;
+
               putchar (c);
             }
         }
@@ -597,6 +508,8 @@ cut_fields (FILE *stream)
   bool found_any_selected_field = false;
   bool buffer_first_field;
 
+  current_rp = rp;
+
   c = getc (stream);
   if (c == EOF)
     return;
@@ -609,7 +522,7 @@ cut_fields (FILE *stream)
      and the first field has been selected, or if non-delimited lines
      must be suppressed and the first field has *not* been selected.
      That is because a non-delimited line has exactly one field.  */
-  buffer_first_field = (suppress_non_delimited ^ !print_kth (1, NULL));
+  buffer_first_field = (suppress_non_delimited ^ !print_kth (1));
 
   while (1)
     {
@@ -650,7 +563,7 @@ cut_fields (FILE *stream)
                 }
               continue;
             }
-          if (print_kth (1, NULL))
+          if (print_kth (1))
             {
               /* Print the field, but not the trailing delimiter.  */
               fwrite (field_1_buffer, sizeof (char), n_bytes - 1, stdout);
@@ -661,7 +574,7 @@ cut_fields (FILE *stream)
 
       if (c != EOF)
         {
-          if (print_kth (field_idx, NULL))
+          if (print_kth (field_idx))
             {
               if (found_any_selected_field)
                 {
@@ -695,7 +608,11 @@ cut_fields (FILE *stream)
         }
 
       if (c == delim)
-        ++field_idx;
+        {
+          ++field_idx;
+          if ((field_idx > current_rp->hi) && (current_rp < rp + n_rp - 1))
+            ++current_rp;
+        }
       else if (c == '\n' || c == EOF)
         {
           if (found_any_selected_field
@@ -704,6 +621,7 @@ cut_fields (FILE *stream)
           if (c == EOF)
             break;
           field_idx = 1;
+          current_rp = rp;
           found_any_selected_field = false;
         }
     }
@@ -854,16 +772,6 @@ main (int argc, char **argv)
     FATAL_ERROR (_("suppressing non-delimited lines makes sense\n\
 \tonly when operating on fields"));
 
-  if (output_delimiter_specified)
-    {
-      range_start_ht = hash_initialize (HT_RANGE_START_INDEX_INITIAL_CAPACITY,
-                                        NULL, hash_int,
-                                        hash_compare_ints, NULL);
-      if (range_start_ht == NULL)
-        xalloc_die ();
-
-    }
-
   if (! set_fields (spec_list_string))
     {
       if (operating_mode == field_mode)
@@ -890,8 +798,6 @@ main (int argc, char **argv)
   for (ok = true; optind < argc; optind++)
     ok &= cut_file (argv[optind]);
 
-  if (range_start_ht)
-    hash_free (range_start_ht);
 
   if (have_read_stdin && fclose (stdin) == EOF)
     {
diff --git a/tests/misc/cut.pl b/tests/misc/cut.pl
index 0f0a3a3..120880c 100755
--- a/tests/misc/cut.pl
+++ b/tests/misc/cut.pl
@@ -182,6 +182,10 @@ my @Tests =
    {IN=>"123456\n"}, {OUT=>"23456\n"}],
   ['EOL-subsumed-3', '--complement -b3,4-4,5,2-',
    {IN=>"123456\n"}, {OUT=>"1\n"}],
+
+  ['EOL-subsumed-4', '--output-d=: -b1-2,2-3,3-',
+   {IN=>"1234\n"}, {OUT=>"1234\n"}],
+
   );
 
 if ($mb_locale ne 'C')
-- 
1.8.0.1

Best regards,
Cojocaru Alexandru

--Multipart=_Sun__9_Dec_2012_11_28_05_+0100_qC9IODJNeE9dkjEt
Content-Type: application/octet-stream;
    name="DIFF-new-impl"
Content-Disposition: attachment;
    filename="DIFF-new-impl"
Content-Transfer-Encoding: base64

RnJvbSA2NzhjMmVjZmViYmY3YTI3OGMxNGI3ZTZmY2I4MTVlODc1NjljZDIwIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBDb2pvY2FydSBBbGV4YW5kcnUgPHhvam9jQGdteC5jb20+CkRh dGU6IFN1biwgOSBEZWMgMjAxMiAxMDo0MzoxMCArMDEwMApTdWJqZWN0OiBbUEFUQ0hdIGN1dDog dXNlIG9ubHkgb25lIGRhdGEgc3RydWN0dXJlCgpUaGUgY3VycmVudCBpbXBsZW1lbnRhdGlvbiBv ZiBjdXQsIHVzZXMgYSBiaXQgYXJyYXksCmFuIGFycmF5IG9mIGBzdHJ1Y3QgcmFuZ2VfcGFpcidz LCBhbmQgKHdoZW4gLS1vdXRwdXQtZGVsaW1pdGVyCmlzIHNwZWNpZmllZCkgYSBoYXNoX3RhYmxl LiBUaGUgbmV3IGltcGxlbWVudGF0aW9uIHdpbGwgdXNlCm9ubHkgYW4gYXJyYXkgb2YgYHN0cnVj dCByYW5nZV9wYWlyJ3MuClRoZSBvbGQgaW1wbGVtZW50YXRpb24gaXMgaW5lZmZpY2llbnQgZm9y IHRoZSBmb2xsb3dpbmcgcmVhc29uczoKIDEuIFdoZW4gLWIgd2l0aCBhIGJpZyBudW0gaXMgc3Bl Y2lmaWVkLCBpdCBhbGxvY2F0ZXMgYSBsb3Qgb2YgdXNlbGVzcwogICAgbWVtb3J5IGZvciBgcHJp bnRhYmxlX2ZpZWxkJy4KIDIuIFdoZW4gLS1vdXRwdXQtZGVsaW1pdGVyIGlzIHNwZWNpZmllZCwg aXQgd2lsbCBhbGxvY2F0ZSAzMSBidWNrZXRzLgogICAgRXZlbiBpZiBhIGZldyByYW5nZXMgYXJl IHNwZWNpZmllZC4KCiogc3JjL2N1dC5jIChzZXRfZmllbGRzKTogU2V0IGFuZCBpbml0aWFsaXpl IFJQCmluc3RlYWQgb2YgcHJpbnRhYmxlX2ZpZWxkLgoqIHNyYy9jdXQuYyAocHJpbnRfa3RoKTog U3BsaXQgaXQuIENoZWNrICpvbmx5KiBpZiBhIGdpdmVuIGZpZWxkCm9yIGJ5dGUgaXMgcHJpbnRh YmxlLgoqIHNyYy9jdXQuYyAoaXNfcmFuZ2Vfc3RhcnQpOiBOZXcgZnVuY3Rpb24uCiogdGVzdHMv bWlzYy9jdXQucGw6IENoZWNrIGlmIGBlb2xfcmFuZ2Vfc3RhcnQnIGlzIHNldCBjb3JyZWN0bHku Ci0tLQogc3JjL2N1dC5jICAgICAgICAgfCAzMTIgKysrKysrKysrKysrKysrKysrKy0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tCiB0ZXN0cy9taXNjL2N1dC5wbCB8ICAgNCArCiAy IGZpbGVzIGNoYW5nZWQsIDExMyBpbnNlcnRpb25zKCspLCAyMDMgZGVsZXRpb25zKC0pCgpkaWZm IC0tZ2l0IGEvc3JjL2N1dC5jIGIvc3JjL2N1dC5jCmluZGV4IGRlOTMyMGMuLjU0NTYzOWQgMTAw NjQ0Ci0tLSBhL3NyYy9jdXQuYworKysgYi9zcmMvY3V0LmMKQEAgLTUzLDggKzUzLDMxIEBACiAg ICAgfQkJCQkJCQkJCVwKICAgd2hpbGUgKDApCiAKKworc3RydWN0IHJhbmdlX3BhaXIKKyAgewor ICAgIHNpemVfdCBsbzsKKyAgICBzaXplX3QgaGk7CisgIH07CisKKy8qIEFycmF5IG9mIGBzdHJ1 Y3QgcmFuZ2VfcGFpcicgaG9sZGluZyBhbGwgdGhlIGZpbml0ZSByYW5nZXMuICovCitzdGF0aWMg 
c3RydWN0IHJhbmdlX3BhaXIgKnJwOworCisvKiBQb2ludGVyIGluc2lkZSBSUC4gV2hlbiBjaGVj a2luZyBpZiBhIGJ5dGUgb3IgZmllbGQgaXMgc2VsZWN0ZWQKKyAgIGJ5IGEgZmluaXRlIHJhbmdl LCB3ZSBjaGVjayBpZiBpdCBpcyBiZXR3ZWVuIENVUlJFTlRfUlAuTE8KKyAgIGFuZCBDVVJSRU5U X1JQLkhJLiBJZiB0aGUgYnl0ZSBvciBmaWVsZCBpbmRleCBpcyBncmVhdGVyIHRoYW4KKyAgIENV UlJFTlRfUlAuSEkgdGhlbiB3ZSBtYWtlIENVUlJFTlRfUlAgdG8gcG9pbnQgdG8gdGhlIG5leHQg cmFuZ2UgcGFpci4gKi8KK3N0YXRpYyBzdHJ1Y3QgcmFuZ2VfcGFpciAqY3VycmVudF9ycDsKKwor LyogTnVtYmVyIG9mIGZpbml0ZSByYW5nZXMgc3BlY2lmaWVkIGJ5IHRoZSB1c2VyLiAqLworc3Rh dGljIHNpemVfdCBuX3JwOworCisvKiBOdW1iZXIgb2YgYHN0cnVjdCByYW5nZV9wYWlyJ3MgYWxs b2NhdGVkLiAqLworc3RhdGljIHNpemVfdCBuX3JwX2FsbG9jYXRlZDsKKworCiAvKiBBcHBlbmQg TE9XLCBISUdIIHRvIHRoZSBsaXN0IFJQIG9mIHJhbmdlIHBhaXJzLCBhbGxvY2F0aW5nIGFkZGl0 aW9uYWwKLSAgIHNwYWNlIGlmIG5lY2Vzc2FyeS4gIFVwZGF0ZSBsb2NhbCB2YXJpYWJsZSBOX1JQ LiAgV2hlbiBhbGxvY2F0aW5nLAorICAgc3BhY2UgaWYgbmVjZXNzYXJ5LiAgVXBkYXRlIGdsb2Jh bCB2YXJpYWJsZSBOX1JQLiAgV2hlbiBhbGxvY2F0aW5nLAogICAgdXBkYXRlIGdsb2JhbCB2YXJp YWJsZSBOX1JQX0FMTE9DQVRFRC4gICovCiAKICNkZWZpbmUgQUREX1JBTkdFX1BBSVIocnAsIGxv dywgaGlnaCkJCQlcCkBAIC03MiwxMSArOTUsNiBAQAogICAgIH0JCQkJCQkJXAogICB3aGlsZSAo MCkKIAotc3RydWN0IHJhbmdlX3BhaXIKLSAgewotICAgIHNpemVfdCBsbzsKLSAgICBzaXplX3Qg aGk7Ci0gIH07CiAKIC8qIFRoaXMgYnVmZmVyIGlzIHVzZWQgdG8gc3VwcG9ydCB0aGUgc2VtYW50 aWNzIG9mIHRoZSAtcyBvcHRpb24KICAgIChvciBsYWNrIG9mIHNhbWUpIHdoZW4gdGhlIHNwZWNp ZmllZCBmaWVsZCBsaXN0IGluY2x1ZGVzIChkb2VzCkBAIC05MCwyNiArMTA4LDExIEBAIHN0YXRp YyBjaGFyICpmaWVsZF8xX2J1ZmZlcjsKIC8qIFRoZSBudW1iZXIgb2YgYnl0ZXMgYWxsb2NhdGVk IGZvciBGSUVMRF8xX0JVRkZFUi4gICovCiBzdGF0aWMgc2l6ZV90IGZpZWxkXzFfYnVmc2l6ZTsK IAotLyogVGhlIGxhcmdlc3QgZmllbGQgb3IgYnl0ZSBpbmRleCB1c2VkIGFzIGFuIGVuZHBvaW50 IG9mIGEgY2xvc2VkCi0gICBvciBkZWdlbmVyYXRlIHJhbmdlIHNwZWNpZmljYXRpb247ICB0aGlz IGRvZXNuJ3QgaW5jbHVkZSB0aGUgc3RhcnRpbmcKLSAgIGluZGV4IG9mIHJpZ2h0LW9wZW4tZW5k ZWQgcmFuZ2VzLiAgRm9yIGV4YW1wbGUsIHdpdGggZWl0aGVyIHJhbmdlIHNwZWMKLSAgICcyLTUs 
OS0nLCAnMi0zLDUsOS0nIHRoaXMgdmFyaWFibGUgd291bGQgYmUgc2V0IHRvIDUuICAqLwotc3Rh dGljIHNpemVfdCBtYXhfcmFuZ2VfZW5kcG9pbnQ7CiAKIC8qIElmIG5vbnplcm8sIHRoaXMgaXMg dGhlIGluZGV4IG9mIHRoZSBmaXJzdCBmaWVsZCBpbiBhIHJhbmdlIHRoYXQgZ29lcwogICAgdG8g ZW5kIG9mIGxpbmUuICovCiBzdGF0aWMgc2l6ZV90IGVvbF9yYW5nZV9zdGFydDsKIAotLyogVGhp cyBpcyBhIGJpdCB2ZWN0b3IuCi0gICBJbiBieXRlIG1vZGUsIHdoaWNoIGJ5dGVzIHRvIG91dHB1 dC4KLSAgIEluIGZpZWxkIG1vZGUsIHdoaWNoIERFTElNLXNlcGFyYXRlZCBmaWVsZHMgdG8gb3V0 cHV0LgotICAgQm90aCBieXRlcyBhbmQgZmllbGRzIGFyZSBudW1iZXJlZCBzdGFydGluZyB3aXRo IDEsCi0gICBzbyB0aGUgemVyb3RoIGJpdCBvZiB0aGlzIGFycmF5IGlzIHVudXNlZC4KLSAgIEEg ZmllbGQgb3IgYnl0ZSBLIGhhcyBiZWVuIHNlbGVjdGVkIGlmCi0gICAoSyA8PSBNQVhfUkFOR0Vf RU5EUE9JTlQgYW5kIGlzX3ByaW50YWJsZV9maWVsZChLKSkKLSAgICB8fCAoRU9MX1JBTkdFX1NU QVJUID4gMCAmJiBLID49IEVPTF9SQU5HRV9TVEFSVCkuICAqLwotc3RhdGljIHVuc2lnbmVkIGNo YXIgKnByaW50YWJsZV9maWVsZDsKLQogZW51bSBvcGVyYXRpbmdfbW9kZQogICB7CiAgICAgdW5k ZWZpbmVkX21vZGUsCkBAIC0xNDgsMTUgKzE1MSw2IEBAIHN0YXRpYyBjaGFyICpvdXRwdXRfZGVs aW1pdGVyX3N0cmluZzsKIC8qIFRydWUgaWYgd2UgaGF2ZSBldmVyIHJlYWQgc3RhbmRhcmQgaW5w dXQuICovCiBzdGF0aWMgYm9vbCBoYXZlX3JlYWRfc3RkaW47CiAKLSNkZWZpbmUgSFRfUkFOR0Vf U1RBUlRfSU5ERVhfSU5JVElBTF9DQVBBQ0lUWSAzMQotCi0vKiBUaGUgc2V0IG9mIHJhbmdlLXN0 YXJ0IGluZGljZXMuICBGb3IgZXhhbXBsZSwgZ2l2ZW4gYSByYW5nZS1zcGVjIGxpc3QgbGlrZQot ICAgJy1iMSwzLTUsNC05LDE1LScsIHRoZSBmb2xsb3dpbmcgaW5kaWNlcyB3aWxsIGJlIHJlY29y ZGVkIGhlcmU6IDEsIDMsIDE1LgotICAgTm90ZSB0aGF0IGFsdGhvdWdoICc0JyBsb29rcyBsaWtl IGEgcmFuZ2Utc3RhcnQgaW5kZXgsIGl0IGlzIGluIHRoZSBtaWRkbGUKLSAgIG9mIHRoZSAnMy01 JyByYW5nZSwgc28gaXQgZG9lc24ndCBjb3VudC4KLSAgIFRoaXMgdGFibGUgaXMgY3JlYXRlZC91 c2VkIElGRiBvdXRwdXRfZGVsaW1pdGVyX3NwZWNpZmllZCBpcyBzZXQuICAqLwotc3RhdGljIEhh c2hfdGFibGUgKnJhbmdlX3N0YXJ0X2h0OwotCiAvKiBGb3IgbG9uZyBvcHRpb25zIHRoYXQgaGF2 ZSBubyBlcXVpdmFsZW50IHNob3J0IG9wdGlvbiwgdXNlIGEKICAgIG5vbi1jaGFyYWN0ZXIgYXMg YSBwc2V1ZG8gc2hvcnQgb3B0aW9uLCBzdGFydGluZyB3aXRoIENIQVJfTUFYICsgMS4gICovCiBl 
bnVtCkBAIC0yNDAsNzMgKzIzNCwzMyBAQCBXaXRoIG5vIEZJTEUsIG9yIHdoZW4gRklMRSBpcyAt LCByZWFkIHN0YW5kYXJkIGlucHV0LlxuXAogICBleGl0IChzdGF0dXMpOwogfQogCi1zdGF0aWMg aW5saW5lIHZvaWQKLW1hcmtfcmFuZ2Vfc3RhcnQgKHNpemVfdCBpKQotewotICAvKiBSZWNvcmQg dGhlIGZhY3QgdGhhdCAnaScgaXMgYSByYW5nZS1zdGFydCBpbmRleC4gICovCi0gIHZvaWQgKmVu dF9mcm9tX3RhYmxlID0gaGFzaF9pbnNlcnQgKHJhbmdlX3N0YXJ0X2h0LCAodm9pZCopIGkpOwot ICBpZiAoZW50X2Zyb21fdGFibGUgPT0gTlVMTCkKLSAgICB7Ci0gICAgICAvKiBJbnNlcnRpb24g ZmFpbGVkIGR1ZSB0byBsYWNrIG9mIG1lbW9yeS4gICovCi0gICAgICB4YWxsb2NfZGllICgpOwot ICAgIH0KLSAgYXNzZXJ0ICgoc2l6ZV90KSBlbnRfZnJvbV90YWJsZSA9PSBpKTsKLX0KLQotc3Rh dGljIGlubGluZSB2b2lkCi1tYXJrX3ByaW50YWJsZV9maWVsZCAoc2l6ZV90IGkpCi17Ci0gIHNp emVfdCBuID0gaSAvIENIQVJfQklUOwotICBwcmludGFibGVfZmllbGRbbl0gfD0gKDEgPDwgKGkg JSBDSEFSX0JJVCkpOwotfQotCi1zdGF0aWMgaW5saW5lIGJvb2wKLWlzX3ByaW50YWJsZV9maWVs ZCAoc2l6ZV90IGkpCi17Ci0gIHNpemVfdCBuID0gaSAvIENIQVJfQklUOwotICByZXR1cm4gKHBy aW50YWJsZV9maWVsZFtuXSA+PiAoaSAlIENIQVJfQklUKSkgJiAxOwotfQotCi1zdGF0aWMgc2l6 ZV90Ci1oYXNoX2ludCAoY29uc3Qgdm9pZCAqeCwgc2l6ZV90IHRhYmxlc2l6ZSkKLXsKLSNpZmRl ZiBVSU5UUFRSX01BWAotICB1aW50cHRyX3QgeSA9ICh1aW50cHRyX3QpIHg7Ci0jZWxzZQotICBz aXplX3QgeSA9IChzaXplX3QpIHg7Ci0jZW5kaWYKLSAgcmV0dXJuIHkgJSB0YWJsZXNpemU7Ci19 CisvKiBSZXR1cm4gbm9uemVybyBpZiB0aGUgSyd0aCBmaWVsZCBvciBieXRlIGlzIHByaW50YWJs ZS4gKi8KIAogc3RhdGljIGJvb2wKLWhhc2hfY29tcGFyZV9pbnRzICh2b2lkIGNvbnN0ICp4LCB2 b2lkIGNvbnN0ICp5KQorcHJpbnRfa3RoIChzaXplX3QgaykKIHsKLSAgcmV0dXJuICh4ID09IHkp ID8gdHJ1ZSA6IGZhbHNlOwotfQorICBib29sIGtfc2VsZWN0ZWQgPSBmYWxzZTsKKyAgaWYgKDAg PCBlb2xfcmFuZ2Vfc3RhcnQgJiYgZW9sX3JhbmdlX3N0YXJ0IDw9IGspCisgICAga19zZWxlY3Rl ZCA9IHRydWU7CisgIGVsc2UgaWYgKGN1cnJlbnRfcnAtPmxvIDw9IGsgJiYgayA8PSBjdXJyZW50 X3JwLT5oaSkKKyAgICBrX3NlbGVjdGVkID0gdHJ1ZTsKIAotc3RhdGljIGJvb2wKLWlzX3Jhbmdl X3N0YXJ0X2luZGV4IChzaXplX3QgaSkKLXsKLSAgcmV0dXJuIGhhc2hfbG9va3VwIChyYW5nZV9z dGFydF9odCwgKHZvaWQgKikgaSkgPyB0cnVlIDogZmFsc2U7CisgIHJldHVybiBrX3NlbGVjdGVk 
IF4gY29tcGxlbWVudDsKIH0KIAotLyogUmV0dXJuIG5vbnplcm8gaWYgdGhlIEsndGggZmllbGQg b3IgYnl0ZSBpcyBwcmludGFibGUuCi0gICBXaGVuIHJldHVybmluZyBub256ZXJvLCBpZiBSQU5H RV9TVEFSVCBpcyBub24tTlVMTCwKLSAgIHNldCAqUkFOR0VfU1RBUlQgdG8gdHJ1ZSBpZiBLIGlz IHRoZSBiZWdpbm5pbmcgb2YgYSByYW5nZSwgYW5kIHRvCi0gICBmYWxzZSBvdGhlcndpc2UuICAq LworLyogUmV0dXJuIG5vbnplcm8gaWYgSyd0aCBieXRlIGlzIHRoZSBiZWdpbm5pbmcgb2YgYSBy YW5nZS4gKi8KIAotc3RhdGljIGJvb2wKLXByaW50X2t0aCAoc2l6ZV90IGssIGJvb2wgKnJhbmdl X3N0YXJ0KQorc3RhdGljIGlubGluZSBib29sCitpc19yYW5nZV9zdGFydCAoc2l6ZV90IGspCiB7 Ci0gIGJvb2wga19zZWxlY3RlZAotICAgID0gKCgwIDwgZW9sX3JhbmdlX3N0YXJ0ICYmIGVvbF9y YW5nZV9zdGFydCA8PSBrKQotICAgICAgIHx8IChrIDw9IG1heF9yYW5nZV9lbmRwb2ludCAmJiBp c19wcmludGFibGVfZmllbGQgKGspKSk7CisgIGJvb2wgaXNfc3RhcnQgPSBmYWxzZTsKIAotICBi b29sIGlzX3NlbGVjdGVkID0ga19zZWxlY3RlZCBeIGNvbXBsZW1lbnQ7Ci0gIGlmIChyYW5nZV9z dGFydCAmJiBpc19zZWxlY3RlZCkKLSAgICAqcmFuZ2Vfc3RhcnQgPSBpc19yYW5nZV9zdGFydF9p bmRleCAoayk7CisgIGlmICghY29tcGxlbWVudCkKKyAgICBpc19zdGFydCA9IChrID09IGVvbF9y YW5nZV9zdGFydCB8fCBrID09IGN1cnJlbnRfcnAtPmxvKTsKKyAgZWxzZQorICAgIGlzX3N0YXJ0 ID0gKGsgPT0gKGN1cnJlbnRfcnAgLSAxKS0+aGkgKyAxKTsKIAotICByZXR1cm4gaXNfc2VsZWN0 ZWQ7CisgIHJldHVybiBpc19zdGFydDsKIH0KIAogLyogQ29tcGFyaXNvbiBmdW5jdGlvbiBmb3Ig cXNvcnQgdG8gb3JkZXIgdGhlIGxpc3Qgb2YKQEAgLTMxOSwyNCArMjczLDE0IEBAIGNvbXBhcmVf cmFuZ2VzIChjb25zdCB2b2lkICphLCBjb25zdCB2b2lkICpiKQogICByZXR1cm4gYV9zdGFydCA8 IGJfc3RhcnQgPyAtMSA6IGFfc3RhcnQgPiBiX3N0YXJ0OwogfQogCi0vKiBHaXZlbiB0aGUgbGlz dCBvZiBmaWVsZCBvciBieXRlIHJhbmdlIHNwZWNpZmljYXRpb25zIEZJRUxEU1RSLCBzZXQKLSAg IE1BWF9SQU5HRV9FTkRQT0lOVCBhbmQgYWxsb2NhdGUgYW5kIGluaXRpYWxpemUgdGhlIFBSSU5U QUJMRV9GSUVMRAotICAgYXJyYXkuICBJZiB0aGVyZSBpcyBhIHJpZ2h0LW9wZW4tZW5kZWQgcmFu Z2UsIHNldCBFT0xfUkFOR0VfU1RBUlQKLSAgIHRvIGl0cyBzdGFydGluZyBpbmRleC4gIEZJRUxE U1RSIHNob3VsZCBiZSBjb21wb3NlZCBvZiBvbmUgb3IgbW9yZQotICAgbnVtYmVycyBvciByYW5n ZXMgb2YgbnVtYmVycywgc2VwYXJhdGVkIGJ5IGJsYW5rcyBvciBjb21tYXMuCi0gICBJbmNvbXBs 
ZXRlIHJhbmdlcyBtYXkgYmUgZ2l2ZW46ICctbScgbWVhbnMgJzEtbSc7ICduLScgbWVhbnMgJ24n Ci0gICB0aHJvdWdoIGVuZCBvZiBsaW5lLiAgUmV0dXJuIHRydWUgaWYgRklFTERTVFIgY29udGFp bnMgYXQgbGVhc3QKLSAgIG9uZSBmaWVsZCBzcGVjaWZpY2F0aW9uLCBmYWxzZSBvdGhlcndpc2Uu ICAqLwotCi0vKiBGSVhNRS1zb21lZGF5OiAgV2hhdCBpZiB0aGUgdXNlciB3YW50cyB0byBjdXQg b3V0IHRoZSAxLDAwMCwwMDAtdGgKLSAgIGZpZWxkIG9mIHNvbWUgaHVnZSBpbnB1dCBmaWxlPyAg VGhpcyBmdW5jdGlvbiBzaG91bGRuJ3QgaGF2ZSB0bwotICAgYWxsb2NhdGUgYSB0YWJsZSBvZiBh IG1pbGxpb24gYml0cyBqdXN0IHNvIHdlIGNhbiB0ZXN0IGV2ZXJ5Ci0gICBmaWVsZCA8IDEwXjYg d2l0aCBhbiBhcnJheSBkZXJlZmVyZW5jZS4gIEluc3RlYWQsIGNvbnNpZGVyIHVzaW5nCi0gICBh biBhZGFwdGl2ZSBhcHByb2FjaDogaWYgdGhlIHJhbmdlIG9mIHNlbGVjdGVkIGZpZWxkcyBpcyB0 b28gbGFyZ2UsCi0gICBidXQgb25seSBhIGZldyBmaWVsZHMvYnl0ZS1vZmZzZXRzIGFyZSBhY3R1 YWxseSBzZWxlY3RlZCwgdXNlIGEKLSAgIGhhc2ggdGFibGUuICBJZiB0aGUgcmFuZ2Ugb2Ygc2Vs ZWN0ZWQgZmllbGRzIGlzIHRvbyBsYXJnZSwgYW5kCi0gICB0b28gbWFueSBhcmUgc2VsZWN0ZWQs IHRoZW4gcmVzb3J0IHRvIHVzaW5nIHRoZSByYW5nZS1wYWlycyAodGhlCi0gICAncnAnIGFycmF5 KSBkaXJlY3RseS4gICovCisvKiBHaXZlbiB0aGUgbGlzdCBvZiBmaWVsZCBvciBieXRlIHJhbmdl IHNwZWNpZmljYXRpb25zIEZJRUxEU1RSLAorICAgYWxsb2NhdGUgYW5kIGluaXRpYWxpemUgdGhl IFJQIGFycmF5LiBJZiB0aGVyZSBpcyBhIHJpZ2h0LW9wZW4tZW5kZWQKKyAgIHJhbmdlLCBzZXQg RU9MX1JBTkdFX1NUQVJUIHRvIGl0cyBzdGFydGluZyBpbmRleC4gRklFTERTVFIgc2hvdWxkCisg ICBiZSBjb21wb3NlZCBvZiBvbmUgb3IgbW9yZSBudW1iZXJzIG9yIHJhbmdlcyBvZiBudW1iZXJz LCBzZXBhcmF0ZWQKKyAgIGJ5IGJsYW5rcyBvciBjb21tYXMuIEluY29tcGxldGUgcmFuZ2VzIG1h eSBiZSBnaXZlbjogJy1tJyBtZWFucyAnMS1tJzsKKyAgICduLScgbWVhbnMgJ24nIHRocm91Z2gg ZW5kIG9mIGxpbmUuCisgICBSZXR1cm4gdHJ1ZSBpZiBGSUVMRFNUUiBjb250YWlucyBhdCBsZWFz dCBvbmUgZmllbGQgc3BlY2lmaWNhdGlvbiwKKyAgIGZhbHNlIG90aGVyd2lzZS4gICovCiAKIHN0 YXRpYyBib29sCiBzZXRfZmllbGRzIChjb25zdCBjaGFyICpmaWVsZHN0cikKQEAgLTM0OSw5ICsy OTMsNiBAQCBzZXRfZmllbGRzIChjb25zdCBjaGFyICpmaWVsZHN0cikKICAgYm9vbCBmaWVsZF9m b3VuZCA9IGZhbHNlOwkvKiBUcnVlIGlmIGF0IGxlYXN0IG9uZSBmaWVsZCBzcGVjCiAgICAgICAg 
ICAgICAgICAgICAgICAgICAgICAgICAgICAgIGhhcyBiZWVuIHByb2Nlc3NlZC4gICovCiAKLSAg c3RydWN0IHJhbmdlX3BhaXIgKnJwID0gTlVMTDsKLSAgc2l6ZV90IG5fcnAgPSAwOwotICBzaXpl X3Qgbl9ycF9hbGxvY2F0ZWQgPSAwOwogICBzaXplX3QgaTsKICAgYm9vbCBpbl9kaWdpdHMgPSBm YWxzZTsKIApAQCAtNDAzLDQxICszNDQsMTAgQEAgc2V0X2ZpZWxkcyAoY29uc3QgY2hhciAqZmll bGRzdHIpCiAgICAgICAgICAgICAgICAgICBpZiAodmFsdWUgPCBpbml0aWFsKQogICAgICAgICAg ICAgICAgICAgICBGQVRBTF9FUlJPUiAoXygiaW52YWxpZCBkZWNyZWFzaW5nIHJhbmdlIikpOwog Ci0gICAgICAgICAgICAgICAgICAvKiBJcyB0aGVyZSBhbHJlYWR5IGEgcmFuZ2UgZ29pbmcgdG8g ZW5kIG9mIGxpbmU/ICovCi0gICAgICAgICAgICAgICAgICBpZiAoZW9sX3JhbmdlX3N0YXJ0ICE9 IDApCi0gICAgICAgICAgICAgICAgICAgIHsKLSAgICAgICAgICAgICAgICAgICAgICAvKiBZZXMu ICBJcyB0aGUgbmV3IHNlcXVlbmNlIGFscmVhZHkgY29udGFpbmVkCi0gICAgICAgICAgICAgICAg ICAgICAgICAgaW4gdGhlIG9sZCBvbmU/ICBJZiBzbywgbm8gcHJvY2Vzc2luZyBpcwotICAgICAg ICAgICAgICAgICAgICAgICAgIG5lY2Vzc2FyeS4gKi8KLSAgICAgICAgICAgICAgICAgICAgICBp ZiAoaW5pdGlhbCA8IGVvbF9yYW5nZV9zdGFydCkKLSAgICAgICAgICAgICAgICAgICAgICAgIHsK LSAgICAgICAgICAgICAgICAgICAgICAgICAgLyogTm8sIHRoZSBuZXcgc2VxdWVuY2Ugc3RhcnRz IGJlZm9yZSB0aGUKLSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgb2xkLiAgRG9lcyB0aGUg b2xkIHJhbmdlIGdvaW5nIHRvIGVuZCBvZiBsaW5lCi0gICAgICAgICAgICAgICAgICAgICAgICAg ICAgIGV4dGVuZCBpbnRvIHRoZSBuZXcgcmFuZ2U/ICAqLwotICAgICAgICAgICAgICAgICAgICAg ICAgICBpZiAoZW9sX3JhbmdlX3N0YXJ0IDw9IHZhbHVlKQotICAgICAgICAgICAgICAgICAgICAg ICAgICAgIHsKLSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC8qIFllcy4gIFNpbXBseSBt b3ZlIHRoZSBlbmQgb2YgbGluZSBtYXJrZXIuICovCi0gICAgICAgICAgICAgICAgICAgICAgICAg ICAgICBlb2xfcmFuZ2Vfc3RhcnQgPSBpbml0aWFsOwotICAgICAgICAgICAgICAgICAgICAgICAg ICAgIH0KLSAgICAgICAgICAgICAgICAgICAgICAgICAgZWxzZQotICAgICAgICAgICAgICAgICAg ICAgICAgICAgIHsKLSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIC8qIE5vLiAgQSBzaW1w bGUgcmFuZ2UsIGJlZm9yZSBhbmQgZGlzam9pbnQgZnJvbQotICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgdGhlIHJhbmdlIGdvaW5nIHRvIGVuZCBvZiBsaW5lLiAgRmlsbCBpdC4gKi8K 
LSAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIEFERF9SQU5HRV9QQUlSIChycCwgaW5pdGlh bCwgdmFsdWUpOwotICAgICAgICAgICAgICAgICAgICAgICAgICAgIH0KLQotICAgICAgICAgICAg ICAgICAgICAgICAgICAvKiBJbiBhbnkgY2FzZSwgc29tZSBmaWVsZHMgd2VyZSBzZWxlY3RlZC4g Ki8KLSAgICAgICAgICAgICAgICAgICAgICAgICAgZmllbGRfZm91bmQgPSB0cnVlOwotICAgICAg ICAgICAgICAgICAgICAgICAgfQotICAgICAgICAgICAgICAgICAgICB9Ci0gICAgICAgICAgICAg ICAgICBlbHNlCi0gICAgICAgICAgICAgICAgICAgIHsKLSAgICAgICAgICAgICAgICAgICAgICAv KiBUaGVyZSBpcyBubyByYW5nZSBnb2luZyB0byBlbmQgb2YgbGluZS4gKi8KLSAgICAgICAgICAg ICAgICAgICAgICBBRERfUkFOR0VfUEFJUiAocnAsIGluaXRpYWwsIHZhbHVlKTsKLSAgICAgICAg ICAgICAgICAgICAgICBmaWVsZF9mb3VuZCA9IHRydWU7Ci0gICAgICAgICAgICAgICAgICAgIH0K LSAgICAgICAgICAgICAgICAgIHZhbHVlID0gMDsKKyAgICAgICAgICAgICAgICAgIEFERF9SQU5H RV9QQUlSIChycCwgaW5pdGlhbCwgdmFsdWUpOworICAgICAgICAgICAgICAgICAgZmllbGRfZm91 bmQgPSB0cnVlOwogICAgICAgICAgICAgICAgIH0KKyAgICAgICAgICAgICAgdmFsdWUgPSAwOwog ICAgICAgICAgICAgfQogICAgICAgICAgIGVsc2UKICAgICAgICAgICAgIHsKQEAgLTQ0OCw5ICsz NTgsNyBAQCBzZXRfZmllbGRzIChjb25zdCBjaGFyICpmaWVsZHN0cikKICAgICAgICAgICAgIH0K IAogICAgICAgICAgIGlmICgqZmllbGRzdHIgPT0gJ1wwJykKLSAgICAgICAgICAgIHsKLSAgICAg ICAgICAgICAgYnJlYWs7Ci0gICAgICAgICAgICB9CisgICAgICAgICAgICBicmVhazsKIAogICAg ICAgICAgIGZpZWxkc3RyKys7CiAgICAgICAgICAgbGhzX3NwZWNpZmllZCA9IGZhbHNlOwpAQCAt NDk0LDQ3ICs0MDIsNDMgQEAgc2V0X2ZpZWxkcyAoY29uc3QgY2hhciAqZmllbGRzdHIpCiAgICAg ICAgIEZBVEFMX0VSUk9SIChfKCJpbnZhbGlkIGJ5dGUsIGNoYXJhY3RlciBvciBmaWVsZCBsaXN0 IikpOwogICAgIH0KIAotICBtYXhfcmFuZ2VfZW5kcG9pbnQgPSAwOwotICBmb3IgKGkgPSAwOyBp IDwgbl9ycDsgaSsrKQorICBxc29ydCAocnAsIG5fcnAsIHNpemVvZiAocnBbMF0pLCBjb21wYXJl X3Jhbmdlcyk7CisKKyAgLyogT21pdCBmaW5pdGUgcmFuZ2VzIHN1YnN1bWVkIGJ5IGEgdG8tRU9M IHJhbmdlLiAqLworICBpZiAoZW9sX3JhbmdlX3N0YXJ0ICYmIG5fcnApCiAgICAgewotICAgICAg aWYgKHJwW2ldLmhpID4gbWF4X3JhbmdlX2VuZHBvaW50KQotICAgICAgICBtYXhfcmFuZ2VfZW5k cG9pbnQgPSBycFtpXS5oaTsKKyAgICAgIGkgPSBuX3JwOworICAgICAgd2hpbGUgKGkgJiYgZW9s 
X3JhbmdlX3N0YXJ0IDw9IHJwW2kgLSAxXS5oaSkKKyAgICAgICAgeworICAgICAgICAgIGVvbF9y YW5nZV9zdGFydCA9IE1JTiAocnBbaSAtIDFdLmxvLCBlb2xfcmFuZ2Vfc3RhcnQpOworICAgICAg ICAgIC0tbl9ycDsKKyAgICAgICAgICAtLWk7CisgICAgICAgIH0KICAgICB9CiAKLSAgLyogQWxs b2NhdGUgYW4gYXJyYXkgbGFyZ2UgZW5vdWdoIHNvIHRoYXQgaXQgbWF5IGJlIGluZGV4ZWQgYnkK LSAgICAgdGhlIGZpZWxkIG51bWJlcnMgY29ycmVzcG9uZGluZyB0byBhbGwgZmluaXRlIHJhbmdl cwotICAgICAoaS5lLiAnMi02JyBvciAnLTQnLCBidXQgbm90ICc1LScpIGluIEZJRUxEU1RSLiAg Ki8KLQotICBpZiAobWF4X3JhbmdlX2VuZHBvaW50KQotICAgIHByaW50YWJsZV9maWVsZCA9IHh6 YWxsb2MgKG1heF9yYW5nZV9lbmRwb2ludCAvIENIQVJfQklUICsgMSk7Ci0KLSAgcXNvcnQgKHJw LCBuX3JwLCBzaXplb2YgKHJwWzBdKSwgY29tcGFyZV9yYW5nZXMpOwotCi0gIC8qIFNldCB0aGUg YXJyYXkgZW50cmllcyBjb3JyZXNwb25kaW5nIHRvIGludGVnZXJzIGluIHRoZSByYW5nZXMgb2Yg UlAuICAqLwotICBmb3IgKGkgPSAwOyBpIDwgbl9ycDsgaSsrKQorICAvKiBNZXJnZSBmaW5pdGUg cmFuZ2UgcGFpcnMgKGUuZy4gYDItNSwzLTQnIGJlY29tZXMgYDItNScpLiAqLworICBmb3IgKGkg PSAwOyBpIDwgbl9ycDsgKytpKQogICAgIHsKLSAgICAgIC8qIElnbm9yZSBhbnkgcmFuZ2UgdGhh dCBpcyBzdWJzdW1lZCBieSB0aGUgdG8tRU9MIHJhbmdlLiAgKi8KLSAgICAgIGlmIChlb2xfcmFu Z2Vfc3RhcnQgJiYgZW9sX3JhbmdlX3N0YXJ0IDw9IHJwW2ldLmxvKQotICAgICAgICBjb250aW51 ZTsKLQotICAgICAgLyogUmVjb3JkIHRoZSByYW5nZS1zdGFydCBpbmRpY2VzLCBpLmUuLCByZWNv cmQgZWFjaCBzdGFydAotICAgICAgICAgaW5kZXggdGhhdCBpcyBub3QgcGFydCBvZiBhbnkgb3Ro ZXIgKGxvLi5oaV0gcmFuZ2UuICAqLwotICAgICAgc2l6ZV90IHJzaV9jYW5kaWRhdGUgPSBjb21w bGVtZW50ID8gcnBbaV0uaGkgKyAxIDogcnBbaV0ubG87Ci0gICAgICBpZiAob3V0cHV0X2RlbGlt aXRlcl9zcGVjaWZpZWQKLSAgICAgICAgICAmJiAhaXNfcHJpbnRhYmxlX2ZpZWxkIChyc2lfY2Fu ZGlkYXRlKSkKLSAgICAgICAgbWFya19yYW5nZV9zdGFydCAocnNpX2NhbmRpZGF0ZSk7Ci0KLSAg ICAgIGZvciAoc2l6ZV90IGogPSBycFtpXS5sbzsgaiA8PSBycFtpXS5oaTsgaisrKQotICAgICAg ICBtYXJrX3ByaW50YWJsZV9maWVsZCAoaik7CisgICAgICBmb3IgKHNpemVfdCBqID0gaSArIDE7 IGogPCBuX3JwOyArK2opCisgICAgICAgIHsKKyAgICAgICAgICBpZiAocnBbal0ubG8gPD0gcnBb aV0uaGkpCisgICAgICAgICAgICB7CisgICAgICAgICAgICAgIHJwW2ldLmhpID0gTUFYIChycFtq 
XS5oaSwgcnBbaV0uaGkpOworICAgICAgICAgICAgICBtZW1tb3ZlIChycCArIGosIHJwICsgaiAr IDEsCisgICAgICAgICAgICAgICAgICAgICAgIChuX3JwIC0gaiAtIDEpICogc2l6ZW9mIChzdHJ1 Y3QgcmFuZ2VfcGFpcikpOworICAgICAgICAgICAgICAtLW5fcnA7CisgICAgICAgICAgICB9Cisg ICAgICAgICAgZWxzZQorICAgICAgICAgICAgYnJlYWs7CisgICAgICAgIH0KICAgICB9CiAKLSAg aWYgKG91dHB1dF9kZWxpbWl0ZXJfc3BlY2lmaWVkCi0gICAgICAmJiAhY29tcGxlbWVudAotICAg ICAgJiYgZW9sX3JhbmdlX3N0YXJ0Ci0gICAgICAmJiBtYXhfcmFuZ2VfZW5kcG9pbnQgJiYgIWlz X3ByaW50YWJsZV9maWVsZCAoZW9sX3JhbmdlX3N0YXJ0KSkKLSAgICBtYXJrX3JhbmdlX3N0YXJ0 IChlb2xfcmFuZ2Vfc3RhcnQpOwogCi0gIGZyZWUgKHJwKTsKKyAgLyogQWZ0ZXIgbWVyZ2luZywg cmVhbGxvY2F0ZSBSUCBzbyB3ZSByZWFsaXNlIG1lbW9yeSB0byB0aGUgc3lzdGVtLgorICAgICBB bHNvIGFkZCBhIHNlbnRpbmVsIGF0IHRoZSBlbmQgb2YgUlAsIHNvIHdlIG5ldmVyIGdldCBtZW1v cnkgc2VnZmF1bHQuICovCisgICsrbl9ycDsKKyAgcnAgPSB4cmVhbGxvYyAocnAsIG5fcnAgKiBz aXplb2YgKHN0cnVjdCByYW5nZV9wYWlyKSk7CisgIHJwW25fcnAgLSAxXS5sbyA9IHJwW25fcnAg LSAxXS5oaSA9IDA7CiAKICAgcmV0dXJuIGZpZWxkX2ZvdW5kOwogfQpAQCAtNTUxLDcgKzQ1NSw4 IEBAIGN1dF9ieXRlcyAoRklMRSAqc3RyZWFtKQogCiAgIGJ5dGVfaWR4ID0gMDsKICAgcHJpbnRf ZGVsaW1pdGVyID0gZmFsc2U7Ci0gIHdoaWxlICgxKQorICBjdXJyZW50X3JwID0gcnA7CisgIHdo aWxlICh0cnVlKQogICAgIHsKICAgICAgIGludCBjOwkJLyogRWFjaCBjaGFyYWN0ZXIgZnJvbSB0 aGUgZmlsZS4gKi8KIApAQCAtNTYyLDYgKzQ2Nyw3IEBAIGN1dF9ieXRlcyAoRklMRSAqc3RyZWFt KQogICAgICAgICAgIHB1dGNoYXIgKCdcbicpOwogICAgICAgICAgIGJ5dGVfaWR4ID0gMDsKICAg ICAgICAgICBwcmludF9kZWxpbWl0ZXIgPSBmYWxzZTsKKyAgICAgICAgICBjdXJyZW50X3JwID0g cnA7CiAgICAgICAgIH0KICAgICAgIGVsc2UgaWYgKGMgPT0gRU9GKQogICAgICAgICB7CkBAIC01 NzEsMTYgKzQ3NywyMSBAQCBjdXRfYnl0ZXMgKEZJTEUgKnN0cmVhbSkKICAgICAgICAgfQogICAg ICAgZWxzZQogICAgICAgICB7Ci0gICAgICAgICAgYm9vbCByYW5nZV9zdGFydDsKLSAgICAgICAg ICBib29sICpycyA9IG91dHB1dF9kZWxpbWl0ZXJfc3BlY2lmaWVkID8gJnJhbmdlX3N0YXJ0IDog TlVMTDsKLSAgICAgICAgICBpZiAocHJpbnRfa3RoICgrK2J5dGVfaWR4LCBycykpCisgICAgICAg ICAgKytieXRlX2lkeDsKKyAgICAgICAgICBpZiAoKGN1cnJlbnRfcnAtPmhpIDwgYnl0ZV9pZHgp 
ICYmIChjdXJyZW50X3JwIDwgcnAgKyBuX3JwIC0gMSkpCisgICAgICAgICAgICArK2N1cnJlbnRf cnA7CisgICAgICAgICAgaWYgKHByaW50X2t0aCAoYnl0ZV9pZHgpKQogICAgICAgICAgICAgewot ICAgICAgICAgICAgICBpZiAocnMgJiYgKnJzICYmIHByaW50X2RlbGltaXRlcikKKyAgICAgICAg ICAgICAgaWYgKG91dHB1dF9kZWxpbWl0ZXJfc3BlY2lmaWVkKQogICAgICAgICAgICAgICAgIHsK LSAgICAgICAgICAgICAgICAgIGZ3cml0ZSAob3V0cHV0X2RlbGltaXRlcl9zdHJpbmcsIHNpemVv ZiAoY2hhciksCi0gICAgICAgICAgICAgICAgICAgICAgICAgIG91dHB1dF9kZWxpbWl0ZXJfbGVu Z3RoLCBzdGRvdXQpOworICAgICAgICAgICAgICAgICAgaWYgKHByaW50X2RlbGltaXRlciAmJiBp c19yYW5nZV9zdGFydCAoYnl0ZV9pZHgpKQorICAgICAgICAgICAgICAgICAgICB7CisgICAgICAg ICAgICAgICAgICAgICAgZndyaXRlIChvdXRwdXRfZGVsaW1pdGVyX3N0cmluZywgc2l6ZW9mIChj aGFyKSwKKyAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIG91dHB1dF9kZWxpbWl0ZXJfbGVu Z3RoLCBzdGRvdXQpOworICAgICAgICAgICAgICAgICAgICB9CisgICAgICAgICAgICAgICAgICBw cmludF9kZWxpbWl0ZXIgPSB0cnVlOwogICAgICAgICAgICAgICAgIH0KLSAgICAgICAgICAgICAg cHJpbnRfZGVsaW1pdGVyID0gdHJ1ZTsKKwogICAgICAgICAgICAgICBwdXRjaGFyIChjKTsKICAg ICAgICAgICAgIH0KICAgICAgICAgfQpAQCAtNTk3LDYgKzUwOCw4IEBAIGN1dF9maWVsZHMgKEZJ TEUgKnN0cmVhbSkKICAgYm9vbCBmb3VuZF9hbnlfc2VsZWN0ZWRfZmllbGQgPSBmYWxzZTsKICAg Ym9vbCBidWZmZXJfZmlyc3RfZmllbGQ7CiAKKyAgY3VycmVudF9ycCA9IHJwOworCiAgIGMgPSBn ZXRjIChzdHJlYW0pOwogICBpZiAoYyA9PSBFT0YpCiAgICAgcmV0dXJuOwpAQCAtNjA5LDcgKzUy Miw3IEBAIGN1dF9maWVsZHMgKEZJTEUgKnN0cmVhbSkKICAgICAgYW5kIHRoZSBmaXJzdCBmaWVs ZCBoYXMgYmVlbiBzZWxlY3RlZCwgb3IgaWYgbm9uLWRlbGltaXRlZCBsaW5lcwogICAgICBtdXN0 IGJlIHN1cHByZXNzZWQgYW5kIHRoZSBmaXJzdCBmaWVsZCBoYXMgKm5vdCogYmVlbiBzZWxlY3Rl ZC4KICAgICAgVGhhdCBpcyBiZWNhdXNlIGEgbm9uLWRlbGltaXRlZCBsaW5lIGhhcyBleGFjdGx5 IG9uZSBmaWVsZC4gICovCi0gIGJ1ZmZlcl9maXJzdF9maWVsZCA9IChzdXBwcmVzc19ub25fZGVs aW1pdGVkIF4gIXByaW50X2t0aCAoMSwgTlVMTCkpOworICBidWZmZXJfZmlyc3RfZmllbGQgPSAo c3VwcHJlc3Nfbm9uX2RlbGltaXRlZCBeICFwcmludF9rdGggKDEpKTsKIAogICB3aGlsZSAoMSkK ICAgICB7CkBAIC02NTAsNyArNTYzLDcgQEAgY3V0X2ZpZWxkcyAoRklMRSAqc3RyZWFtKQogICAg 
ICAgICAgICAgICAgIH0KICAgICAgICAgICAgICAgY29udGludWU7CiAgICAgICAgICAgICB9Ci0g ICAgICAgICAgaWYgKHByaW50X2t0aCAoMSwgTlVMTCkpCisgICAgICAgICAgaWYgKHByaW50X2t0 aCAoMSkpCiAgICAgICAgICAgICB7CiAgICAgICAgICAgICAgIC8qIFByaW50IHRoZSBmaWVsZCwg YnV0IG5vdCB0aGUgdHJhaWxpbmcgZGVsaW1pdGVyLiAgKi8KICAgICAgICAgICAgICAgZndyaXRl IChmaWVsZF8xX2J1ZmZlciwgc2l6ZW9mIChjaGFyKSwgbl9ieXRlcyAtIDEsIHN0ZG91dCk7CkBA IC02NjEsNyArNTc0LDcgQEAgY3V0X2ZpZWxkcyAoRklMRSAqc3RyZWFtKQogCiAgICAgICBpZiAo YyAhPSBFT0YpCiAgICAgICAgIHsKLSAgICAgICAgICBpZiAocHJpbnRfa3RoIChmaWVsZF9pZHgs IE5VTEwpKQorICAgICAgICAgIGlmIChwcmludF9rdGggKGZpZWxkX2lkeCkpCiAgICAgICAgICAg ICB7CiAgICAgICAgICAgICAgIGlmIChmb3VuZF9hbnlfc2VsZWN0ZWRfZmllbGQpCiAgICAgICAg ICAgICAgICAgewpAQCAtNjk1LDcgKzYwOCwxMSBAQCBjdXRfZmllbGRzIChGSUxFICpzdHJlYW0p CiAgICAgICAgIH0KIAogICAgICAgaWYgKGMgPT0gZGVsaW0pCi0gICAgICAgICsrZmllbGRfaWR4 OworICAgICAgICB7CisgICAgICAgICAgKytmaWVsZF9pZHg7CisgICAgICAgICAgaWYgKChmaWVs ZF9pZHggPiBjdXJyZW50X3JwLT5oaSkgJiYgKGN1cnJlbnRfcnAgPCBycCArIG5fcnAgLSAxKSkK KyAgICAgICAgICAgICsrY3VycmVudF9ycDsKKyAgICAgICAgfQogICAgICAgZWxzZSBpZiAoYyA9 PSAnXG4nIHx8IGMgPT0gRU9GKQogICAgICAgICB7CiAgICAgICAgICAgaWYgKGZvdW5kX2FueV9z ZWxlY3RlZF9maWVsZApAQCAtNzA0LDYgKzYyMSw3IEBAIGN1dF9maWVsZHMgKEZJTEUgKnN0cmVh bSkKICAgICAgICAgICBpZiAoYyA9PSBFT0YpCiAgICAgICAgICAgICBicmVhazsKICAgICAgICAg ICBmaWVsZF9pZHggPSAxOworICAgICAgICAgIGN1cnJlbnRfcnAgPSBycDsKICAgICAgICAgICBm b3VuZF9hbnlfc2VsZWN0ZWRfZmllbGQgPSBmYWxzZTsKICAgICAgICAgfQogICAgIH0KQEAgLTg1 NCwxNiArNzcyLDYgQEAgbWFpbiAoaW50IGFyZ2MsIGNoYXIgKiphcmd2KQogICAgIEZBVEFMX0VS Uk9SIChfKCJzdXBwcmVzc2luZyBub24tZGVsaW1pdGVkIGxpbmVzIG1ha2VzIHNlbnNlXG5cCiBc dG9ubHkgd2hlbiBvcGVyYXRpbmcgb24gZmllbGRzIikpOwogCi0gIGlmIChvdXRwdXRfZGVsaW1p dGVyX3NwZWNpZmllZCkKLSAgICB7Ci0gICAgICByYW5nZV9zdGFydF9odCA9IGhhc2hfaW5pdGlh bGl6ZSAoSFRfUkFOR0VfU1RBUlRfSU5ERVhfSU5JVElBTF9DQVBBQ0lUWSwKLSAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICBOVUxMLCBoYXNoX2ludCwKLSAgICAgICAgICAg 
ICAgICAgICAgICAgICAgICAgICAgICAgICAgICBoYXNoX2NvbXBhcmVfaW50cywgTlVMTCk7Ci0g ICAgICBpZiAocmFuZ2Vfc3RhcnRfaHQgPT0gTlVMTCkKLSAgICAgICAgeGFsbG9jX2RpZSAoKTsK LQotICAgIH0KLQogICBpZiAoISBzZXRfZmllbGRzIChzcGVjX2xpc3Rfc3RyaW5nKSkKICAgICB7 CiAgICAgICBpZiAob3BlcmF0aW5nX21vZGUgPT0gZmllbGRfbW9kZSkKQEAgLTg5MCw4ICs3OTgs NiBAQCBtYWluIChpbnQgYXJnYywgY2hhciAqKmFyZ3YpCiAgICAgZm9yIChvayA9IHRydWU7IG9w dGluZCA8IGFyZ2M7IG9wdGluZCsrKQogICAgICAgb2sgJj0gY3V0X2ZpbGUgKGFyZ3Zbb3B0aW5k XSk7CiAKLSAgaWYgKHJhbmdlX3N0YXJ0X2h0KQotICAgIGhhc2hfZnJlZSAocmFuZ2Vfc3RhcnRf aHQpOwogCiAgIGlmIChoYXZlX3JlYWRfc3RkaW4gJiYgZmNsb3NlIChzdGRpbikgPT0gRU9GKQog ICAgIHsKZGlmZiAtLWdpdCBhL3Rlc3RzL21pc2MvY3V0LnBsIGIvdGVzdHMvbWlzYy9jdXQucGwK aW5kZXggMGYwYTNhMy4uMTIwODgwYyAxMDA3NTUKLS0tIGEvdGVzdHMvbWlzYy9jdXQucGwKKysr IGIvdGVzdHMvbWlzYy9jdXQucGwKQEAgLTE4Miw2ICsxODIsMTAgQEAgbXkgQFRlc3RzID0KICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAge0lOPT4iMTIzNDU2XG4ifSwg e09VVD0+IjIzNDU2XG4ifV0sCiAgIFsnRU9MLXN1YnN1bWVkLTMnLCAnLS1jb21wbGVtZW50IC1i Myw0LTQsNSwyLScsCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIHtJ Tj0+IjEyMzQ1NlxuIn0sIHtPVVQ9PiIxXG4ifV0sCisKKyAgWydFT0wtc3Vic3VtZWQtNCcsICct LW91dHB1dC1kPTogLWIxLTIsMi0zLDMtJywKKyAgICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICB7SU49PiIxMjM0XG4ifSwge09VVD0+IjEyMzRcbiJ9XSwKKwogICk7CiAKIGlm ICgkbWJfbG9jYWxlIG5lICdDJykKLS0gCjEuOC4wLjEKCg== --Multipart=_Sun__9_Dec_2012_11_28_05_+0100_qC9IODJNeE9dkjEt-- From debbugs-submit-bounces@debbugs.gnu.org Sun Dec 09 15:45:41 2012 Received: (at 13127) by debbugs.gnu.org; 9 Dec 2012 20:45:41 +0000 Received: from localhost ([127.0.0.1]:34371 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Thnkq-0003wT-3M for submit@debbugs.gnu.org; Sun, 09 Dec 2012 15:45:41 -0500 Received: from mx.meyering.net ([88.168.87.75]:44361) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1Thnkm-0003tx-Is for 13127@debbugs.gnu.org; Sun, 09 Dec 2012 15:45:38 -0500 Received: from rho.meyering.net 
(rho.meyering.net [127.0.0.1]) by rho.meyering.net (Acme Bit-Twister) with ESMTP id 6949760085; Sun, 9 Dec 2012 21:45:03 +0100 (CET)
From: Jim Meyering
To: Cojocaru Alexandru
Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre
In-Reply-To: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> (Cojocaru Alexandru's message of "Sun, 9 Dec 2012 11:28:05 +0100")
References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com>
Date: Sun, 09 Dec 2012 21:45:03 +0100
Message-ID: <87fw3fosz4.fsf@rho.meyering.net>
Lines: 48
MIME-Version: 1.0
Content-Type: text/plain
X-Spam-Score: -1.5 (-)
X-Debbugs-Envelope-To: 13127
Cc: 13127@debbugs.gnu.org
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.13
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Sender: debbugs-submit-bounces@debbugs.gnu.org
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
X-Spam-Score: -1.5 (-)

Cojocaru Alexandru wrote:
> Subject: [PATCH] cut: use only one data structure
>
> The current implementation of cut, uses a bit array,
> an array of `struct range_pair's, and (when --output-delimiter
> is specified) a hash_table. The new implementation will use
> only an array of `struct range_pair's.
> The old implementation is inefficient for the following reasons:
> 1. When -b with a big num is specified, it allocates a lot of useless
> memory for `printable_field'.
> 2. When --output-delimiter is specified, it will allocate 31 buckets.
> Even if a few ranges are specified.
...
> -/* Given the list of field or byte range specifications FIELDSTR, set
> - MAX_RANGE_ENDPOINT and allocate and initialize the PRINTABLE_FIELD
> - array. If there is a right-open-ended range, set EOL_RANGE_START
> - to its starting index. FIELDSTR should be composed of one or more
> - numbers or ranges of numbers, separated by blanks or commas.
> - Incomplete ranges may be given: '-m' means '1-m'; 'n-' means 'n'
> - through end of line. Return true if FIELDSTR contains at least
> - one field specification, false otherwise. */
> -
> -/* FIXME-someday: What if the user wants to cut out the 1,000,000-th
> - field of some huge input file? This function shouldn't have to
> - allocate a table of a million bits just so we can test every
> - field < 10^6 with an array dereference. Instead, consider using
> - an adaptive approach: if the range of selected fields is too large,
> - but only a few fields/byte-offsets are actually selected, use a
> - hash table. If the range of selected fields is too large, and
> - too many are selected, then resort to using the range-pairs (the
> - 'rp' array) directly. */

Thanks for the patch.
This is large enough that you'll have to file a copyright assignment.
For details, see the "Copyright assignment" section in the file
named HACKING.

Have you considered performance in the common case?
I suspect that a byte or field number larger than 1000 is
not common. That is why, in the FIXME comment above,
I suggested to use an adaptive approach. I had the feeling
(don't remember if I profiled it) that testing a bit per
input field would be more efficient than an in-range test.

If you construct test cases and gather timing data, please do so
in a reproducible manner and include details when you report back,
so we can compare on different types of systems.

Cache matters a lot, these days.
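
[Annotation, not part of the original message.] The trade-off raised in this message, a bit test per input index versus an in-range test against sorted range pairs, can be sketched in a few lines of C. This is a minimal illustration under stated assumptions, not the code from src/cut.c: every name below (`struct range`, `struct bitset`, `struct cursor`, `bitset_from_ranges`, `bitset_selected`, `cursor_selected`, `count_disagreements`) is hypothetical, the ranges are assumed to be already sorted and non-overlapping, and to-end-of-line ranges and --complement are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the code from src/cut.c.  */

/* An inclusive, 1-based range with lo <= hi.  */
struct range { size_t lo, hi; };

/* Approach 1: one bit per index, up to the largest range endpoint.  */
struct bitset { unsigned char *bits; size_t max; };

static struct bitset
bitset_from_ranges (const struct range *r, size_t n)
{
  struct bitset b = { NULL, 0 };
  for (size_t i = 0; i < n; i++)
    if (r[i].hi > b.max)
      b.max = r[i].hi;
  /* Memory grows with the largest endpoint, not with the number of ranges.  */
  b.bits = calloc (b.max / CHAR_BIT + 1, 1);
  assert (b.bits != NULL);
  for (size_t i = 0; i < n; i++)
    for (size_t k = r[i].lo; k <= r[i].hi; k++)
      b.bits[k / CHAR_BIT] |= 1u << (k % CHAR_BIT);
  return b;
}

static bool
bitset_selected (const struct bitset *b, size_t k)
{
  return k <= b->max && ((b->bits[k / CHAR_BIT] >> (k % CHAR_BIT)) & 1);
}

/* Approach 2: a cursor over sorted, non-overlapping ranges.
   Memory grows with the number of ranges only.  */
struct cursor { const struct range *r; size_t n; size_t i; };

/* K must be queried in non-decreasing order; reset I to 0 at each new line.  */
static bool
cursor_selected (struct cursor *c, size_t k)
{
  /* Skip ranges that end before K.  */
  while (c->i < c->n && c->r[c->i].hi < k)
    c->i++;
  /* The current range, if any, ends at or after K; check its start.  */
  return c->i < c->n && c->r[c->i].lo <= k;
}

/* Return how many indices in 1..LIMIT the two tests disagree on.  */
static size_t
count_disagreements (const struct range *r, size_t n, size_t limit)
{
  struct bitset b = bitset_from_ranges (r, n);
  struct cursor c = { r, n, 0 };
  size_t bad = 0;
  for (size_t k = 1; k <= limit; k++)
    if (bitset_selected (&b, k) != cursor_selected (&c, k))
      bad++;
  free (b.bits);
  return bad;
}
```

With a list such as 2-5,9,20-22 both tests select the same indices. The difference is in cost: a list that names only index 1000000 needs a bit vector of roughly 125 kB in the first approach and a single pair in the second, while the per-index work is one shift and mask versus one or two comparisons. Which is faster on real input is the question the requested timing data is meant to settle.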
From debbugs-submit-bounces@debbugs.gnu.org Tue Dec 11 09:26:04 2012 Received: (at 13127) by debbugs.gnu.org; 11 Dec 2012 14:26:04 +0000 Received: from localhost ([127.0.0.1]:36675 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1TiQmX-0008R2-9z for submit@debbugs.gnu.org; Tue, 11 Dec 2012 09:26:03 -0500 Received: from mailout-us.gmx.com ([74.208.5.67]:35851) by debbugs.gnu.org with smtp (Exim 4.72) (envelope-from ) id 1TiQmP-0008Qg-8J for 13127@debbugs.gnu.org; Tue, 11 Dec 2012 09:25:57 -0500 Received: (qmail invoked by alias); 11 Dec 2012 14:24:17 -0000 Received: from unknown (EHLO smag-R59-R60-R61) [151.65.245.223] by mail.gmx.com (mp-us004) with SMTP; 11 Dec 2012 09:24:17 -0500 X-Authenticated: #130707387 X-Provags-ID: V01U2FsdGVkX1+3WuBR5Fm32nYKWkWk4o6no17FCEHgokhS27l/L8 HF6AO/m0zbe+Gq Date: Tue, 11 Dec 2012 15:24:36 +0100 From: Cojocaru Alexandru To: Jim Meyering Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre Message-Id: <20121211152436.ada365e55fa617e0a41255e8@gmx.com> In-Reply-To: <87fw3fosz4.fsf@rho.meyering.net> References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> X-Mailer: Sylpheed 3.3.0 (GTK+ 2.24.13; i686-pc-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Multipart=_Tue__11_Dec_2012_15_24_36_+0100_86acGuYK1jCMzl/3" X-Y-GMX-Trusted: 0 X-Spam-Score: 0.8 (/) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: xojoc@gmx.com List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -0.0 (/) This is a multi-part message in MIME format. 

--Multipart=_Tue__11_Dec_2012_15_24_36_+0100_86acGuYK1jCMzl/3
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Sun, 09 Dec 2012 21:45:03 +0100
Jim Meyering wrote:

> Thanks for the patch.
> This is large enough that you'll have to file a copyright assignment.
> For details, see the "Copyright assignment" section in the file
> named HACKING.
Fine.

> Have you considered performance in the common case?
> I suspect that a byte or field number larger than 1000 is
> not common. That is why, in the FIXME comment above,
> I suggested to use an adaptive approach. I had the feeling
> (don't remember if I profiled it) that testing a bit per
> input field would be more efficient than an in-range test.
Yes, it was the first thing I checked. And there's no performance loss.

> If you construct test cases and gather timing data, please do so
> in a reproducible manner and include details when you report back,
> so we can compare on different types of systems.
Here are my benchmarks:

OS: Parabola GNU/linux-libre (linux-libre v3.6.8-1)
Compiler: GCC 4.7.2
Cflags: -O2
LANG: C
CPU: Intel Core Duo (1.86 GHz) (L1 Cache 64KiB) (L2 Cache 2MiB)
Main memory:
- Bank 0: DIMM DRAM Synchronous (1GiB) (width 64 bits)
- Bank 1: DIMM DRAM Synchronous (1GiB) (width 64 bits)
NOTE: information gathered with `lshw'.

Summary (see the attached file for complete data):

### small ranges
cut-pre:   0:01.84
cut-post:  0:01.36
cut-split: 0:01.25

### bigger ranges
cut-pre:   0:11.74
cut-post:  0:09.20
cut-split: 0:07.91 ***

### fields
cut-pre:   0:02.90
cut-post:  0:02.68
cut-split: 0:02.85

### --output-delimiter
cut-pre:   0:02.90
cut-post:  0:02.74
cut-split: 0:02.80

NOTES:
cut-pre is the current implementation and was compiled from commit ec48beadf.
cut-post was compiled after applying the above patch to commit ec48beadf.
cut-split was compiled after applying the `split-print_kth' patch to commit ec48beadf.

The main advantage comes from splitting `print_kth' into two separate
functions, so now `print_kth' does fewer checks.

Best regards,
Cojocaru Alexandru

--Multipart=_Tue__11_Dec_2012_15_24_36_+0100_86acGuYK1jCMzl/3
Content-Type: text/plain; name="full-data.txt"
Content-Disposition: attachment; filename="full-data.txt"
Content-Transfer-Encoding: 7bit

OS: Parabola GNU/linux-libre (linux-libre v3.6.8-1)
Compiler: GCC 4.7.2
Cflags: -O2
LANG: C
CPU: Intel Core Duo (1.86 GHz) (L1 64KiB) (L2 2MiB)
Main memory:
- Bank 0: DIMM DRAM Synchronous (1GiB) (width 64 bits)
- Bank 1: DIMM DRAM Synchronous (1GiB) (width 64 bits)
NOTE: information gathered with `lshw'.

bash$ ./cut-pre 2> /dev/null # try not to count caching of shared libraries

### small ranges
bash$ for i in `seq 1 1000000`; do echo "abcdfeg" >> big-file; done
bash$ for i in 1 2 3; do /usr/bin/time ./cut-pre -b1,3 big-file > /dev/null; echo ; done
1.72user 0.11system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

1.75user 0.08system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps

1.76user 0.07system 0:01.84elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k
0inputs+0outputs (0major+167minor)pagefaults 0swaps

bash$ for i in 1 2 3; do /usr/bin/time ./cut-post -b1,3 big-file > /dev/null; echo; done
1.23user 0.12system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps

1.25user 0.09system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps

1.25user 0.09system 0:01.36elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k
0inputs+0outputs (0major+164minor)pagefaults 0swaps

bash$ for i in 1 2 3; do /usr/bin/time ./cut-split -b1,3 big-file > /dev/null; echo ; done
1.15user 0.09system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k
0inputs+0outputs (0major+168minor)pagefaults 0swaps
1.15user 0.08system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k 0inputs+0outputs (0major+167minor)pagefaults 0swaps 1.14user 0.10system 0:01.25elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k 0inputs+0outputs (0major+167minor)pagefaults 0swaps ### bigger ranges bash$ yes $(for i in $(seq 1 100000); do echo -n a; done) | dd of=big-lines ibs=100001 count=10000 iflag=fullblock bash$ for i in 1 2 3; do /usr/bin/time ./cut-pre -b50-100,101-105,9999 big-lines > /dev/null; echo; done 11.01user 0.70system 0:11.74elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps 11.02user 0.70system 0:11.74elapsed 99%CPU (0avgtext+0avgdata 576maxresident)k 0inputs+0outputs (0major+169minor)pagefaults 0swaps 11.04user 0.66system 0:11.73elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps bash$ for i in 1 2 3; do /usr/bin/time ./cut-post -b50-100,101-105,9999 big-lines > /dev/null; echo; done 8.65user 0.52system 0:09.20elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k 0inputs+0outputs (0major+165minor)pagefaults 0swaps 8.59user 0.58system 0:09.20elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k 0inputs+0outputs (0major+164minor)pagefaults 0swaps 8.53user 0.65system 0:09.21elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k 0inputs+0outputs (0major+165minor)pagefaults 0swaps bash$ for i in 1 2 3; do /usr/bin/time ./cut-split -b50-100,101-105,9999 big-lines > /dev/null; echo; done 7.22user 0.66system 0:07.91elapsed 99%CPU (0avgtext+0avgdata 576maxresident)k 0inputs+0outputs (0major+169minor)pagefaults 0swaps 7.26user 0.61system 0:07.90elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps 7.24user 0.64system 0:07.91elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps ### fields bash$ yes "a:b:c:d:e" | dd of=fields ibs=10 count=1000000 iflag=fullblock bash$ for i in 1 2 3; do 
/usr/bin/time ./cut-pre -f2,3 -d: fields > /dev/null; echo; done 2.82user 0.06system 0:02.90elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k 0inputs+0outputs (0major+167minor)pagefaults 0swaps 2.80user 0.05system 0:02.87elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k 0inputs+0outputs (0major+167minor)pagefaults 0swaps 2.79user 0.05system 0:02.85elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps bash$ for i in 1 2 3; do /usr/bin/time ./cut-post -f2,3 -d: fields > /dev/null; echo; done 2.58user 0.09system 0:02.68elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k 0inputs+0outputs (0major+164minor)pagefaults 0swaps 2.63user 0.05system 0:02.69elapsed 99%CPU (0avgtext+0avgdata 560maxresident)k 0inputs+0outputs (0major+165minor)pagefaults 0swaps 2.61user 0.07system 0:02.69elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k 0inputs+0outputs (0major+164minor)pagefaults 0swaps bash$ for i in 1 2 3; do /usr/bin/time ./cut-split -f2,3 -d: fields > /dev/null; echo; done 2.79user 0.05system 0:02.85elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps 2.61user 0.06system 0:02.69elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps 2.63user 0.09system 0:02.73elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k 0inputs+0outputs (0major+167minor)pagefaults 0swaps ### --output-delimiter bash$ for i in 1 2 3; do /usr/bin/time ./cut-pre -f2,3 -d: --output-d=' ' fields > /dev/null; echo; done 2.82user 0.06system 0:02.90elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps 2.81user 0.06system 0:02.88elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps 2.80user 0.05system 0:02.86elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps bash$ for i in 1 2 3; do /usr/bin/time ./cut-post -f2,3 -d: 
--output-d=' ' fields > /dev/null; echo; done 2.67user 0.05system 0:02.74elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k 0inputs+0outputs (0major+164minor)pagefaults 0swaps 2.60user 0.09system 0:02.70elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k 0inputs+0outputs (0major+164minor)pagefaults 0swaps 2.59user 0.08system 0:02.68elapsed 99%CPU (0avgtext+0avgdata 556maxresident)k 0inputs+0outputs (0major+164minor)pagefaults 0swaps bash$ for i in 1 2 3; do /usr/bin/time ./cut-split -f2,3 -d: --output-d=' ' fields > /dev/null; echo; done 2.75user 0.04system 0:02.80elapsed 99%CPU (0avgtext+0avgdata 568maxresident)k 0inputs+0outputs (0major+167minor)pagefaults 0swaps 2.63user 0.05system 0:02.69elapsed 99%CPU (0avgtext+0avgdata 572maxresident)k 0inputs+0outputs (0major+168minor)pagefaults 0swaps 2.61user 0.08system 0:02.70elapsed 99%CPU (0avgtext+0avgdata 576maxresident)k 0inputs+0outputs (0major+169minor)pagefaults 0swaps NOTES: cut-pre is the current implementation and was compiled from commit ec48beadf. cut-post was compiled after applying the above patch to commit ec48beadf. cut-split was compiled after applying the attached patch to commit ec48bead. 
--Multipart=_Tue__11_Dec_2012_15_24_36_+0100_86acGuYK1jCMzl/3 Content-Type: application/octet-stream; name="split-print_kth" Content-Disposition: attachment; filename="split-print_kth" Content-Transfer-Encoding: base64 RnJvbSA0MDZlOTAxYmY3NmNjMTg3NzA1ZGE1OGM1OGUxYTE1M2NjN2IwZjY5IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBDb2pvY2FydSBBbGV4YW5kcnUgPHhvam9jQGdteC5jb20+CkRh dGU6IE1vbiwgMTAgRGVjIDIwMTIgMTY6NDI6NDcgKzAxMDAKU3ViamVjdDogW1BBVENIXSBjdXQ6 IHNwbGl0IHByaW50X2t0aCB0byBhdm9pZCBleHRyYSBjaGVja3MuCgpzcmMvY3V0LmMgKHByaW50 X2t0aCk6IFNwbGl0IGl0LiBDaGVjayAqb25seSogaWYgYSBnaXZlbgpmaWVsZCBvciBieXRlIGlz IHByaW50YWJsZS4KLS0tCiBzcmMvY3V0LmMgfCAyOSArKysrKysrKysrKystLS0tLS0tLS0tLS0t LS0tLQogMSBmaWxlIGNoYW5nZWQsIDEyIGluc2VydGlvbnMoKyksIDE3IGRlbGV0aW9ucygtKQoK ZGlmZiAtLWdpdCBhL3NyYy9jdXQuYyBiL3NyYy9jdXQuYwppbmRleCBkZTkzMjBjLi5lN2JkM2Jm IDEwMDY0NAotLS0gYS9zcmMvY3V0LmMKKysrIGIvc3JjL2N1dC5jCkBAIC0yODQsMjkgKzI4NCwy NSBAQCBoYXNoX2NvbXBhcmVfaW50cyAodm9pZCBjb25zdCAqeCwgdm9pZCBjb25zdCAqeSkKICAg cmV0dXJuICh4ID09IHkpID8gdHJ1ZSA6IGZhbHNlOwogfQogCisvKiBSZXR1cm4gbm9uemVybyBp ZiB0aGUgSyd0aCBmaWVsZCBvciBieXRlIGlzIHRoZSBiZWdpbm5pbmcKKyAgIG9mIGEgcmFuZ2Uu ICovCisKIHN0YXRpYyBib29sCiBpc19yYW5nZV9zdGFydF9pbmRleCAoc2l6ZV90IGkpCiB7CiAg IHJldHVybiBoYXNoX2xvb2t1cCAocmFuZ2Vfc3RhcnRfaHQsICh2b2lkICopIGkpID8gdHJ1ZSA6 IGZhbHNlOwogfQogCi0vKiBSZXR1cm4gbm9uemVybyBpZiB0aGUgSyd0aCBmaWVsZCBvciBieXRl IGlzIHByaW50YWJsZS4KLSAgIFdoZW4gcmV0dXJuaW5nIG5vbnplcm8sIGlmIFJBTkdFX1NUQVJU IGlzIG5vbi1OVUxMLAotICAgc2V0ICpSQU5HRV9TVEFSVCB0byB0cnVlIGlmIEsgaXMgdGhlIGJl Z2lubmluZyBvZiBhIHJhbmdlLCBhbmQgdG8KLSAgIGZhbHNlIG90aGVyd2lzZS4gICovCisvKiBS ZXR1cm4gbm9uemVybyBpZiB0aGUgSyd0aCBmaWVsZCBvciBieXRlIGlzIHByaW50YWJsZS4gKi8K IAogc3RhdGljIGJvb2wKLXByaW50X2t0aCAoc2l6ZV90IGssIGJvb2wgKnJhbmdlX3N0YXJ0KQor cHJpbnRfa3RoIChzaXplX3QgaykKIHsKICAgYm9vbCBrX3NlbGVjdGVkCiAgICAgPSAoKDAgPCBl b2xfcmFuZ2Vfc3RhcnQgJiYgZW9sX3JhbmdlX3N0YXJ0IDw9IGspCiAgICAgICAgfHwgKGsgPD0g bWF4X3JhbmdlX2VuZHBvaW50ICYmIGlzX3ByaW50YWJsZV9maWVsZCAoaykpKTsKIAotICBib29s 
IGlzX3NlbGVjdGVkID0ga19zZWxlY3RlZCBeIGNvbXBsZW1lbnQ7Ci0gIGlmIChyYW5nZV9zdGFy dCAmJiBpc19zZWxlY3RlZCkKLSAgICAqcmFuZ2Vfc3RhcnQgPSBpc19yYW5nZV9zdGFydF9pbmRl eCAoayk7Ci0KLSAgcmV0dXJuIGlzX3NlbGVjdGVkOworICByZXR1cm4ga19zZWxlY3RlZCBeIGNv bXBsZW1lbnQ7CiB9CiAKIC8qIENvbXBhcmlzb24gZnVuY3Rpb24gZm9yIHFzb3J0IHRvIG9yZGVy IHRoZSBsaXN0IG9mCkBAIC01NzEsMTEgKzU2NywxMCBAQCBjdXRfYnl0ZXMgKEZJTEUgKnN0cmVh bSkKICAgICAgICAgfQogICAgICAgZWxzZQogICAgICAgICB7Ci0gICAgICAgICAgYm9vbCByYW5n ZV9zdGFydDsKLSAgICAgICAgICBib29sICpycyA9IG91dHB1dF9kZWxpbWl0ZXJfc3BlY2lmaWVk ID8gJnJhbmdlX3N0YXJ0IDogTlVMTDsKLSAgICAgICAgICBpZiAocHJpbnRfa3RoICgrK2J5dGVf aWR4LCBycykpCisgICAgICAgICAgaWYgKHByaW50X2t0aCAoKytieXRlX2lkeCkpCiAgICAgICAg ICAgICB7Ci0gICAgICAgICAgICAgIGlmIChycyAmJiAqcnMgJiYgcHJpbnRfZGVsaW1pdGVyKQor ICAgICAgICAgICAgICBpZiAob3V0cHV0X2RlbGltaXRlcl9zcGVjaWZpZWQgJiYgcHJpbnRfZGVs aW1pdGVyICYmCisgICAgICAgICAgICAgICAgICBpc19yYW5nZV9zdGFydF9pbmRleCAoYnl0ZV9p ZHgpKQogICAgICAgICAgICAgICAgIHsKICAgICAgICAgICAgICAgICAgIGZ3cml0ZSAob3V0cHV0 X2RlbGltaXRlcl9zdHJpbmcsIHNpemVvZiAoY2hhciksCiAgICAgICAgICAgICAgICAgICAgICAg ICAgIG91dHB1dF9kZWxpbWl0ZXJfbGVuZ3RoLCBzdGRvdXQpOwpAQCAtNjA5LDcgKzYwNCw3IEBA IGN1dF9maWVsZHMgKEZJTEUgKnN0cmVhbSkKICAgICAgYW5kIHRoZSBmaXJzdCBmaWVsZCBoYXMg YmVlbiBzZWxlY3RlZCwgb3IgaWYgbm9uLWRlbGltaXRlZCBsaW5lcwogICAgICBtdXN0IGJlIHN1 cHByZXNzZWQgYW5kIHRoZSBmaXJzdCBmaWVsZCBoYXMgKm5vdCogYmVlbiBzZWxlY3RlZC4KICAg ICAgVGhhdCBpcyBiZWNhdXNlIGEgbm9uLWRlbGltaXRlZCBsaW5lIGhhcyBleGFjdGx5IG9uZSBm aWVsZC4gICovCi0gIGJ1ZmZlcl9maXJzdF9maWVsZCA9IChzdXBwcmVzc19ub25fZGVsaW1pdGVk IF4gIXByaW50X2t0aCAoMSwgTlVMTCkpOworICBidWZmZXJfZmlyc3RfZmllbGQgPSAoc3VwcHJl c3Nfbm9uX2RlbGltaXRlZCBeICFwcmludF9rdGggKDEpKTsKIAogICB3aGlsZSAoMSkKICAgICB7 CkBAIC02NTAsNyArNjQ1LDcgQEAgY3V0X2ZpZWxkcyAoRklMRSAqc3RyZWFtKQogICAgICAgICAg ICAgICAgIH0KICAgICAgICAgICAgICAgY29udGludWU7CiAgICAgICAgICAgICB9Ci0gICAgICAg ICAgaWYgKHByaW50X2t0aCAoMSwgTlVMTCkpCisgICAgICAgICAgaWYgKHByaW50X2t0aCAoMSkp 
CiAgICAgICAgICAgICB7CiAgICAgICAgICAgICAgIC8qIFByaW50IHRoZSBmaWVsZCwgYnV0IG5v dCB0aGUgdHJhaWxpbmcgZGVsaW1pdGVyLiAgKi8KICAgICAgICAgICAgICAgZndyaXRlIChmaWVs ZF8xX2J1ZmZlciwgc2l6ZW9mIChjaGFyKSwgbl9ieXRlcyAtIDEsIHN0ZG91dCk7CkBAIC02NjEs NyArNjU2LDcgQEAgY3V0X2ZpZWxkcyAoRklMRSAqc3RyZWFtKQogCiAgICAgICBpZiAoYyAhPSBF T0YpCiAgICAgICAgIHsKLSAgICAgICAgICBpZiAocHJpbnRfa3RoIChmaWVsZF9pZHgsIE5VTEwp KQorICAgICAgICAgIGlmIChwcmludF9rdGggKGZpZWxkX2lkeCkpCiAgICAgICAgICAgICB7CiAg ICAgICAgICAgICAgIGlmIChmb3VuZF9hbnlfc2VsZWN0ZWRfZmllbGQpCiAgICAgICAgICAgICAg ICAgewotLSAKMS44LjAuMQoK --Multipart=_Tue__11_Dec_2012_15_24_36_+0100_86acGuYK1jCMzl/3-- From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 26 12:07:14 2013 Received: (at 13127) by debbugs.gnu.org; 26 Apr 2013 16:07:14 +0000 Received: from localhost ([127.0.0.1]:45074 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVlB3-0001tA-CV for submit@debbugs.gnu.org; Fri, 26 Apr 2013 12:07:14 -0400 Received: from mx1.redhat.com ([209.132.183.28]:40534) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVlAy-0001sj-5J for 13127@debbugs.gnu.org; Fri, 26 Apr 2013 12:07:11 -0400 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3QG753D010566 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 26 Apr 2013 12:07:05 -0400 Received: from [10.36.116.82] (ovpn-116-82.ams2.redhat.com [10.36.116.82]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r3QG72jc001783 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 26 Apr 2013 12:07:04 -0400 Message-ID: <517AA625.5080802@draigBrady.com> Date: Fri, 26 Apr 2013 17:07:01 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: xojoc@gmx.com Subject: Re: bug#13127: [PATCH] cut: use 
only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> In-Reply-To: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> X-Enigmail-Version: 1.5.1 Content-Type: multipart/mixed; boundary="------------050105080902070905040402" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------) This is a multi-part message in MIME format. --------------050105080902070905040402 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id r3QG753D010566 I've rebased this to master and attached. The rebase wasn't trivial so I might have messed up. The cut.pl test is now failing on master. Could you have a look. Also could you add a test (or just a bit of shell) to demonstrate which options the memory is not allocated for example. Ideally some pathological option combo that no longer allocates huge amounts of RAM. thanks, P=E1draig. --------------050105080902070905040402 Content-Type: text/x-patch; name="cut-mem.patch" Content-Disposition: attachment; filename="cut-mem.patch" Content-Transfer-Encoding: 7bit >From 6d3cac6544670fb5dac27be71e2aa3e2eb502989 Mon Sep 17 00:00:00 2001 From: Cojocaru Alexandru Date: Sun, 9 Dec 2012 10:43:10 +0100 Subject: [PATCH] cut: use less memory The current implementation of cut, uses a bit array, an array of `struct range_pair's, and (when --output-delimiter is specified) a hash_table. The new implementation will use only an array of `struct range_pair's. The old implementation is inefficient for the following reasons: 1. 
When -b with a big num is specified, it allocates a lot of useless memory for `printable_field'. 2. When --output-delimiter is specified, it will allocate 31 buckets. Even if a few ranges are specified. * src/cut.c (set_fields): Set and initialize RP instead of printable_field. * src/cut.c (print_kth): Split it. Check *only* if a given field or byte is printable. * src/cut.c (is_range_start): New function. * tests/misc/cut.pl: Check if `eol_range_start' is set correctly. --- src/cut.c | 317 ++++++++++++++++++---------------------------------- tests/misc/cut.pl | 4 + 2 files changed, 114 insertions(+), 207 deletions(-) diff --git a/src/cut.c b/src/cut.c index 494aad7..b42b405 100644 --- a/src/cut.c +++ b/src/cut.c @@ -53,8 +53,31 @@ } \ while (0) + +struct range_pair + { + size_t lo; + size_t hi; + }; + +/* Array of `struct range_pair' holding all the finite ranges. */ +static struct range_pair *rp; + +/* Pointer inside RP. When checking if a byte or field is selected + by a finite range, we check if it is between CURRENT_RP.LO + and CURRENT_RP.HI. If the byte or field index is greater than + CURRENT_RP.HI then we make CURRENT_RP to point to the next range pair. */ +static struct range_pair *current_rp; + +/* Number of finite ranges specified by the user. */ +static size_t n_rp; + +/* Number of `struct range_pair's allocated. */ +static size_t n_rp_allocated; + + /* Append LOW, HIGH to the list RP of range pairs, allocating additional - space if necessary. Update local variable N_RP. When allocating, + space if necessary. Update global variable N_RP. When allocating, update global variable N_RP_ALLOCATED. */ #define ADD_RANGE_PAIR(rp, low, high) \ @@ -72,11 +95,6 @@ } \ while (0) -struct range_pair - { - size_t lo; - size_t hi; - }; /* This buffer is used to support the semantics of the -s option (or lack of same) when the specified field list includes (does @@ -90,26 +108,11 @@ static char *field_1_buffer; /* The number of bytes allocated for FIELD_1_BUFFER. 
*/ static size_t field_1_bufsize; -/* The largest field or byte index used as an endpoint of a closed - or degenerate range specification; this doesn't include the starting - index of right-open-ended ranges. For example, with either range spec - '2-5,9-', '2-3,5,9-' this variable would be set to 5. */ -static size_t max_range_endpoint; /* If nonzero, this is the index of the first field in a range that goes to end of line. */ static size_t eol_range_start; -/* This is a bit vector. - In byte mode, which bytes to output. - In field mode, which DELIM-separated fields to output. - Both bytes and fields are numbered starting with 1, - so the zeroth bit of this array is unused. - A field or byte K has been selected if - (K <= MAX_RANGE_ENDPOINT and is_printable_field(K)) - || (EOL_RANGE_START > 0 && K >= EOL_RANGE_START). */ -static unsigned char *printable_field; - enum operating_mode { undefined_mode, @@ -148,15 +151,6 @@ static char *output_delimiter_string; /* True if we have ever read standard input. */ static bool have_read_stdin; -#define HT_RANGE_START_INDEX_INITIAL_CAPACITY 31 - -/* The set of range-start indices. For example, given a range-spec list like - '-b1,3-5,4-9,15-', the following indices will be recorded here: 1, 3, 15. - Note that although '4' looks like a range-start index, it is in the middle - of the '3-5' range, so it doesn't count. - This table is created/used IFF output_delimiter_specified is set. */ -static Hash_table *range_start_ht; - /* For long options that have no equivalent short option, use a non-character as a pseudo short option, starting with CHAR_MAX + 1. */ enum @@ -239,73 +233,33 @@ With no FILE, or when FILE is -, read standard input.\n\ exit (status); } -static inline void -mark_range_start (size_t i) -{ - /* Record the fact that 'i' is a range-start index. */ - void *ent_from_table = hash_insert (range_start_ht, (void*) i); - if (ent_from_table == NULL) - { - /* Insertion failed due to lack of memory. 
*/ - xalloc_die (); - } - assert ((size_t) ent_from_table == i); -} - -static inline void -mark_printable_field (size_t i) -{ - size_t n = i / CHAR_BIT; - printable_field[n] |= (1 << (i % CHAR_BIT)); -} - -static inline bool -is_printable_field (size_t i) -{ - size_t n = i / CHAR_BIT; - return (printable_field[n] >> (i % CHAR_BIT)) & 1; -} - -static size_t -hash_int (const void *x, size_t tablesize) -{ -#ifdef UINTPTR_MAX - uintptr_t y = (uintptr_t) x; -#else - size_t y = (size_t) x; -#endif - return y % tablesize; -} +/* Return nonzero if the K'th field or byte is printable. */ static bool -hash_compare_ints (void const *x, void const *y) +print_kth (size_t k) { - return (x == y) ? true : false; -} + bool k_selected = false; + if (0 < eol_range_start && eol_range_start <= k) + k_selected = true; + else if (current_rp->lo <= k && k <= current_rp->hi) + k_selected = true; -static bool -is_range_start_index (size_t i) -{ - return hash_lookup (range_start_ht, (void *) i) ? true : false; + return k_selected ^ complement; } -/* Return nonzero if the K'th field or byte is printable. - When returning nonzero, if RANGE_START is non-NULL, - set *RANGE_START to true if K is the beginning of a range, and to - false otherwise. */ +/* Return nonzero if K'th byte is the beginning of a range. 
*/ -static bool -print_kth (size_t k, bool *range_start) +static inline bool +is_range_start (size_t k) { - bool k_selected - = ((0 < eol_range_start && eol_range_start <= k) - || (k <= max_range_endpoint && is_printable_field (k))); + bool is_start = false; - bool is_selected = k_selected ^ complement; - if (range_start && is_selected) - *range_start = is_range_start_index (k); + if (!complement) + is_start = (k == eol_range_start || k == current_rp->lo); + else + is_start = (k == (current_rp - 1)->hi + 1); - return is_selected; + return is_start; } /* Comparison function for qsort to order the list of @@ -318,24 +272,14 @@ compare_ranges (const void *a, const void *b) return a_start < b_start ? -1 : a_start > b_start; } -/* Given the list of field or byte range specifications FIELDSTR, set - MAX_RANGE_ENDPOINT and allocate and initialize the PRINTABLE_FIELD - array. If there is a right-open-ended range, set EOL_RANGE_START - to its starting index. FIELDSTR should be composed of one or more - numbers or ranges of numbers, separated by blanks or commas. - Incomplete ranges may be given: '-m' means '1-m'; 'n-' means 'n' - through end of line. Return true if FIELDSTR contains at least - one field specification, false otherwise. */ - -/* FIXME-someday: What if the user wants to cut out the 1,000,000-th - field of some huge input file? This function shouldn't have to - allocate a table of a million bits just so we can test every - field < 10^6 with an array dereference. Instead, consider using - an adaptive approach: if the range of selected fields is too large, - but only a few fields/byte-offsets are actually selected, use a - hash table. If the range of selected fields is too large, and - too many are selected, then resort to using the range-pairs (the - 'rp' array) directly. */ +/* Given the list of field or byte range specifications FIELDSTR, + allocate and initialize the RP array. If there is a right-open-ended + range, set EOL_RANGE_START to its starting index. 
FIELDSTR should + be composed of one or more numbers or ranges of numbers, separated + by blanks or commas. Incomplete ranges may be given: '-m' means '1-m'; + 'n-' means 'n' through end of line. + Return true if FIELDSTR contains at least one field specification, + false otherwise. */ static bool set_fields (const char *fieldstr) @@ -348,9 +292,6 @@ set_fields (const char *fieldstr) bool field_found = false; /* True if at least one field spec has been processed. */ - struct range_pair *rp = NULL; - size_t n_rp = 0; - size_t n_rp_allocated = 0; size_t i; bool in_digits = false; @@ -402,41 +343,10 @@ set_fields (const char *fieldstr) if (value < initial) FATAL_ERROR (_("invalid decreasing range")); - /* Is there already a range going to end of line? */ - if (eol_range_start != 0) - { - /* Yes. Is the new sequence already contained - in the old one? If so, no processing is - necessary. */ - if (initial < eol_range_start) - { - /* No, the new sequence starts before the - old. Does the old range going to end of line - extend into the new range? */ - if (eol_range_start <= value) - { - /* Yes. Simply move the end of line marker. */ - eol_range_start = initial; - } - else - { - /* No. A simple range, before and disjoint from - the range going to end of line. Fill it. */ - ADD_RANGE_PAIR (rp, initial, value); - } - - /* In any case, some fields were selected. */ - field_found = true; - } - } - else - { - /* There is no range going to end of line. 
*/ - ADD_RANGE_PAIR (rp, initial, value); - field_found = true; - } - value = 0; + ADD_RANGE_PAIR (rp, initial, value); + field_found = true; } + value = 0; } else { @@ -447,9 +357,7 @@ set_fields (const char *fieldstr) } if (*fieldstr == '\0') - { - break; - } + break; fieldstr++; lhs_specified = false; @@ -493,49 +401,42 @@ set_fields (const char *fieldstr) FATAL_ERROR (_("invalid byte, character or field list")); } - max_range_endpoint = 0; - for (i = 0; i < n_rp; i++) - { - if (rp[i].hi > max_range_endpoint) - max_range_endpoint = rp[i].hi; - } - - /* Allocate an array large enough so that it may be indexed by - the field numbers corresponding to all finite ranges - (i.e. '2-6' or '-4', but not '5-') in FIELDSTR. */ - - if (max_range_endpoint) - printable_field = xzalloc (max_range_endpoint / CHAR_BIT + 1); - qsort (rp, n_rp, sizeof (rp[0]), compare_ranges); - /* Set the array entries corresponding to integers in the ranges of RP. */ - for (i = 0; i < n_rp; i++) + /* Omit finite ranges subsumed by a to-EOL range. */ + if (eol_range_start && n_rp) { - /* Ignore any range that is subsumed by the to-EOL range. */ - if (eol_range_start && eol_range_start <= rp[i].lo) - continue; - - /* Record the range-start indices, i.e., record each start - index that is not part of any other (lo..hi] range. */ - size_t rsi_candidate = complement ? rp[i].hi + 1 : rp[i].lo; - if (output_delimiter_specified - && !is_printable_field (rsi_candidate)) - mark_range_start (rsi_candidate); - - for (size_t j = rp[i].lo; j <= rp[i].hi; j++) - mark_printable_field (j); + i = n_rp; + while (i && eol_range_start <= rp[i - 1].hi) + { + eol_range_start = MIN (rp[i - 1].lo, eol_range_start); + --n_rp; + --i; + } } - if (output_delimiter_specified - && !complement - && eol_range_start - && max_range_endpoint - && (max_range_endpoint < eol_range_start - || !is_printable_field (eol_range_start))) - mark_range_start (eol_range_start); + /* Merge finite range pairs (e.g. `2-5,3-4' becomes `2-5'). 
*/ + for (i = 0; i < n_rp; ++i) + { + for (size_t j = i + 1; j < n_rp; ++j) + { + if (rp[j].lo <= rp[i].hi) + { + rp[i].hi = MAX (rp[j].hi, rp[i].hi); + memmove (rp + j, rp + j + 1, + (n_rp - j - 1) * sizeof (struct range_pair)); + --n_rp; + } + else + break; + } + } - free (rp); + /* After merging, reallocate RP so we realise memory to the system. + Also add a sentinel at the end of RP, so we never get memory segfault. */ + ++n_rp; + rp = xrealloc (rp, n_rp * sizeof (struct range_pair)); + rp[n_rp - 1].lo = rp[n_rp - 1].hi = 0; return field_found; } @@ -552,7 +453,8 @@ cut_bytes (FILE *stream) byte_idx = 0; print_delimiter = false; - while (1) + current_rp = rp; + while (true) { int c; /* Each character from the file. */ @@ -563,6 +465,7 @@ cut_bytes (FILE *stream) putchar ('\n'); byte_idx = 0; print_delimiter = false; + current_rp = rp; } else if (c == EOF) { @@ -572,16 +475,21 @@ cut_bytes (FILE *stream) } else { - bool range_start; - bool *rs = output_delimiter_specified ? &range_start : NULL; - if (print_kth (++byte_idx, rs)) + ++byte_idx; + if ((current_rp->hi < byte_idx) && (current_rp < rp + n_rp - 1)) + ++current_rp; + if (print_kth (byte_idx)) { - if (rs && *rs && print_delimiter) + if (output_delimiter_specified) { - fwrite (output_delimiter_string, sizeof (char), - output_delimiter_length, stdout); + if (print_delimiter && is_range_start (byte_idx)) + { + fwrite (output_delimiter_string, sizeof (char), + output_delimiter_length, stdout); + } + print_delimiter = true; } - print_delimiter = true; + putchar (c); } } @@ -598,6 +506,8 @@ cut_fields (FILE *stream) bool found_any_selected_field = false; bool buffer_first_field; + current_rp = rp; + c = getc (stream); if (c == EOF) return; @@ -611,7 +521,7 @@ cut_fields (FILE *stream) and the first field has been selected, or if non-delimited lines must be suppressed and the first field has *not* been selected. That is because a non-delimited line has exactly one field. 
*/ - buffer_first_field = (suppress_non_delimited ^ !print_kth (1, NULL)); + buffer_first_field = (suppress_non_delimited ^ !print_kth (1)); while (1) { @@ -657,18 +567,18 @@ cut_fields (FILE *stream) } continue; } - if (print_kth (1, NULL)) + if (print_kth (1)) { /* Print the field, but not the trailing delimiter. */ fwrite (field_1_buffer, sizeof (char), n_bytes - 1, stdout); found_any_selected_field = true; } - ++field_idx; + field_idx++; } int prev_c = c; - if (print_kth (field_idx, NULL)) + if (print_kth (field_idx)) { if (found_any_selected_field) { @@ -702,10 +612,15 @@ cut_fields (FILE *stream) if (c == EOF) break; field_idx = 1; + current_rp = rp; found_any_selected_field = false; } else if (c == delim) - field_idx++; + { + field_idx++; + if ((field_idx > current_rp->hi) && (current_rp < rp + n_rp - 1)) + current_rp++; + } } } @@ -854,16 +769,6 @@ main (int argc, char **argv) FATAL_ERROR (_("suppressing non-delimited lines makes sense\n\ \tonly when operating on fields")); - if (output_delimiter_specified) - { - range_start_ht = hash_initialize (HT_RANGE_START_INDEX_INITIAL_CAPACITY, - NULL, hash_int, - hash_compare_ints, NULL); - if (range_start_ht == NULL) - xalloc_die (); - - } - if (! 
set_fields (spec_list_string)) { if (operating_mode == field_mode) @@ -890,8 +795,6 @@ main (int argc, char **argv) for (ok = true; optind < argc; optind++) ok &= cut_file (argv[optind]); - if (range_start_ht) - hash_free (range_start_ht); if (have_read_stdin && fclose (stdin) == EOF) { diff --git a/tests/misc/cut.pl b/tests/misc/cut.pl index 41e9e20..1543faf 100755 --- a/tests/misc/cut.pl +++ b/tests/misc/cut.pl @@ -210,6 +210,10 @@ my @Tests = {IN=>"123456\n"}, {OUT=>"23456\n"}], ['EOL-subsumed-3', '--complement -b3,4-4,5,2-', {IN=>"123456\n"}, {OUT=>"1\n"}], + + ['EOL-subsumed-4', '--output-d=: -b1-2,2-3,3-', + {IN=>"1234\n"}, {OUT=>"1234\n"}], + ); if ($mb_locale ne 'C') -- 1.7.7.6 --------------050105080902070905040402-- From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 26 12:11:58 2013 Received: (at 13127) by debbugs.gnu.org; 26 Apr 2013 16:11:59 +0000 Received: from localhost ([127.0.0.1]:45081 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVlFe-00023K-6L for submit@debbugs.gnu.org; Fri, 26 Apr 2013 12:11:58 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34855) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVlFb-000236-5y for 13127@debbugs.gnu.org; Fri, 26 Apr 2013 12:11:56 -0400 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3QGBrkG010095 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 26 Apr 2013 12:11:53 -0400 Received: from [10.36.116.82] (ovpn-116-82.ams2.redhat.com [10.36.116.82]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r3QGBodq003639 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 26 Apr 2013 12:11:51 -0400 Message-ID: <517AA745.3070506@draigBrady.com> Date: Fri, 26 Apr 2013 17:11:49 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) 
Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: xojoc@gmx.com Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> In-Reply-To: <20121211152436.ada365e55fa617e0a41255e8@gmx.com> X-Enigmail-Version: 1.5.1 Content-Type: multipart/mixed; boundary="------------050606050805050106090101" X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------)

This is a multi-part message in MIME format.
--------------050606050805050106090101
Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id r3QGBrkG010095

This separate patch to simplify the print_kth() function
by removing the comparison from it, is simple and
has a significant perf advantage. Tests pass so I'll apply.

I'll adjust the commit log to summarise the perf change,
but I notice the change isn't as great as yours on my sandybridge i3 system.
Benchmark results for both the rebased memory rework and
the simple print_kth() optimization attached.

thanks!
Pádraig.
--------------050606050805050106090101 Content-Type: text/plain; charset=UTF-8; name="split-pb-results" Content-Disposition: attachment; filename="split-pb-results" Content-Transfer-Encoding: 7bit ### small ranges $ yes abcdfeg | head -n1MB > big-file $ for c in orig split mem; do src/cut-$c 2>/dev/null; time src/cut-$c -b1,3 big-file > /dev/null; done real 0m0.083s user 0m0.080s sys 0m0.003s real 0m0.069s user 0m0.061s sys 0m0.007s real 0m0.068s user 0m0.066s sys 0m0.001s ### bigger ranges $ yes $(yes a | head -n100000 | tr -d '\n') | head -n10000 > big-lines $ for c in orig split mem; do src/cut-$c 2>/dev/null; time src/cut-$c -b50-100,101-105,9999 big-lines > /dev/null; done real 0m9.951s user 0m9.065s sys 0m0.810s real 0m8.542s user 0m7.586s sys 0m0.876s real 0m10.149s user 0m8.875s sys 0m1.145s ### fields yes "a:b:c:d:e" | head -n1MB > fields $ for c in orig split mem; do src/cut-$c 2>/dev/null; time src/cut-$c -f2,3 -d: fields > /dev/null; done real 0m0.172s user 0m0.167s sys 0m0.004s real 0m0.149s user 0m0.146s sys 0m0.003s real 0m0.145s user 0m0.141s sys 0m0.004s ### --output-delimiter $ for c in orig split mem; do src/cut-$c 2>/dev/null; time src/cut-$c -f2,3 -d: --output-d=' ' fields > /dev/null; done real 0m0.159s user 0m0.153s sys 0m0.005s real 0m0.144s user 0m0.141s sys 0m0.003s real 0m0.137s user 0m0.133s sys 0m0.004s --------------050606050805050106090101-- From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 26 14:52:45 2013 Received: (at 13127) by debbugs.gnu.org; 26 Apr 2013 18:52:45 +0000 Received: from localhost ([127.0.0.1]:45325 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVnlF-00042N-BC for submit@debbugs.gnu.org; Fri, 26 Apr 2013 14:52:45 -0400 Received: from mout.gmx.net ([212.227.15.19]:53523) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVnlB-00042E-MA for 13127@debbugs.gnu.org; Fri, 26 Apr 2013 14:52:43 -0400 Received: from mailout-eu.gmx.com ([10.1.101.213]) by mrigmx.server.lan 
(mrigmx002) with ESMTP (Nemesis) id 0MdINx-1UDrV70ky3-00IWWq for <13127@debbugs.gnu.org>; Fri, 26 Apr 2013 20:52:38 +0200 Received: (qmail 10494 invoked by uid 0); 26 Apr 2013 18:52:38 -0000 Received: from 151.65.151.251 by rms-eu015 with HTTP Content-Type: text/plain; charset="utf-8" Date: Fri, 26 Apr 2013 20:52:35 +0200 From: "Alexandru Cojocaru" Message-ID: <20130426185235.221370@gmx.com> MIME-Version: 1.0 Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre To: =?iso-8859-1?Q?=22P=E1draig_Brady=22?= X-Flags: 0001 X-Mailer: GMX.com Web Mailer x-registered: 0 Content-Transfer-Encoding: 8bit X-GMX-UID: pKkIcR03eSEqKtoMoXMhLxJ+IGRvb4Ds X-Spam-Score: 0.8 (/) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) Hi, sorry for the delay. From: Pádraig Brady > Sent: 04/26/13 04:07 PM > To: xojoc@gmx.com > Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre > The rebase wasn't trivial so I might have messed up. Hum, I had problems only with `cut.pl'. > The cut.pl test is now failing on master. Could you have a look. I had no problems. Could you show me your output? > Also could you add a test (or just a bit of shell) to demonstrate > which options the memory is not allocated for example. > Ideally some pathological option combo that no longer > allocates huge amounts of RAM. $ echo a | cut -b1-$(echo '2^32-1'|bc) cut: memory exhausted Could you please write the test? It seems that I should use $limits but I don't know how exactly :-). Thanks. > thanks, > Pádraig. 
Best regards, Cojocaru Alexandru From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 26 15:25:06 2013 Received: (at 13127) by debbugs.gnu.org; 26 Apr 2013 19:25:06 +0000 Received: from localhost ([127.0.0.1]:45363 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVoGV-00066w-7b for submit@debbugs.gnu.org; Fri, 26 Apr 2013 15:25:05 -0400 Received: from mx1.redhat.com ([209.132.183.28]:20242) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVoGT-00066X-5S for 13127@debbugs.gnu.org; Fri, 26 Apr 2013 15:25:02 -0400 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3QJOv8M017097 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 26 Apr 2013 15:24:58 -0400 Received: from [10.36.116.82] (ovpn-116-82.ams2.redhat.com [10.36.116.82]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r3QJOsgp014401 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 26 Apr 2013 15:24:56 -0400 Message-ID: <517AD486.8000306@draigBrady.com> Date: Fri, 26 Apr 2013 20:24:54 +0100 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Alexandru Cojocaru Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <20130426185235.221370@gmx.com> In-Reply-To: <20130426185235.221370@gmx.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 X-Scanned-By: MIMEDefang 2.68 on 10.5.11.24 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id r3QJOv8M017097 X-Spam-Score: -6.9 (------) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------)

On 04/26/2013 07:52 PM, Alexandru Cojocaru wrote:
> Hi,
> sorry for the delay.
>
> From: Pádraig Brady
>> Sent: 04/26/13 04:07 PM
>> To: xojoc@gmx.com
>> Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre
>
>> The rebase wasn't trivial so I might have messed up.
> Hum, I had problems only with `cut.pl'.

Did you pull the latest master?
The last patch I sent is against that.

>> The cut.pl test is now failing on master. Could you have a look.
> I had no problems. Could you show me your output?

Ah the failures are in tests I added in the meantime:
http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=51ce0bf8

Specifically this is now only outputting the first field,
rather than both fields like it should:

  printf '%s\n' a:1 b:2 | src/cut -s -d: -f1,2

>> Also could you add a test (or just a bit of shell) to demonstrate
>> which options the memory is not allocated for example.
>> Ideally some pathological option combo that no longer
>> allocates huge amounts of RAM.
> $ echo a | cut -b1-$(echo '2^32-1'|bc)
> cut: memory exhausted

Ok cool, I was just ensuring I didn't miss anything.

> Could you please write the test? It seems that I should use $limits
> but I don't know how exactly :-). Thanks.

I'll write a test based on:

  (ulimit -v 20000; : | cut -b1-$((2**32-1))) || echo fail

thanks,
Pádraig.
From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 26 18:49:45 2013 Received: (at 13127) by debbugs.gnu.org; 26 Apr 2013 22:49:45 +0000 Received: from localhost ([127.0.0.1]:45546 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVrSa-0005nM-RJ for submit@debbugs.gnu.org; Fri, 26 Apr 2013 18:49:45 -0400 Received: from mout.gmx.net ([212.227.17.20]:60270) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVrSX-0005nB-HY for 13127@debbugs.gnu.org; Fri, 26 Apr 2013 18:49:42 -0400 Received: from mailout-eu.gmx.com ([10.1.101.213]) by mrigmx.server.lan (mrigmx002) with ESMTP (Nemesis) id 0LzVZk-1UZyFv10So-014n5X for <13127@debbugs.gnu.org>; Sat, 27 Apr 2013 00:49:37 +0200 Received: (qmail 28875 invoked by uid 0); 26 Apr 2013 22:49:37 -0000 Received: from 151.65.151.251 by rms-eu018 with HTTP Content-Type: multipart/mixed; boundary="========GMX221351367016574845150" Date: Sat, 27 Apr 2013 00:49:34 +0200 From: "Alexandru Cojocaru" Message-ID: <20130426224934.221350@gmx.com> MIME-Version: 1.0 Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre To: =?iso-8859-1?Q?=22P=E1draig_Brady=22?= X-Flags: 0001 X-Mailer: GMX.com Web Mailer x-registered: 0 X-GMX-UID: KWEJcRM3eSEqKtoMoXMhxbh+IGRvb4BU X-Spam-Score: -0.5 (/) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) --========GMX221351367016574845150 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 8bit From: Pádraig Brady > Sent: 04/26/13 07:24 PM > To: Alexandru Cojocaru > Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre > > On 04/26/2013 07:52 PM, Alexandru Cojocaru wrote: > > Hi, > > sorry for the delay. 
> >
> > From: Pádraig Brady
> >> Sent: 04/26/13 04:07 PM
> >> To: xojoc@gmx.com
> >> Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre
> >
> >> The rebase wasn't trivial so I might have messed up.
> > Hum, I had problems only with `cut.pl'.
>
> Did you pull the latest master?
> The last patch I sent is against that.

Ah, yeah I used your patch. This is why it worked.

> >> The cut.pl test is now failing on master. Could you have a look.
> > I had no problems. Could you show me your output?
>
> Ah the failures are in tests I added in the meantime:
> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=51ce0bf8
>
> Specifically this is now only outputting the first field,
> rather than both fields like it should:
>
> printf '%s\n' a:1 b:2 | src/cut -s -d: -f1,2

The problem was caused by `current_rp', which wasn't incremented as needed.
See attachment for patch.
My tests were successful; can you recheck?

> >> Also could you add a test (or just a bit of shell) to demonstrate
> >> which options the memory is not allocated for example.
> >> Ideally some pathological option combo that no longer
> >> allocates huge amounts of RAM.
> > $ echo a | cut -b1-$(echo '2^32-1'|bc)
> > cut: memory exhausted
>
> Ok cool, I was just ensuring I didn't miss anything.
>
> > Could you please write the test? It seems that I should use $limits
> > but I don't know how exactly :-). Thanks.
>
> I'll write a test based on:
> (ulimit -v 20000; : | cut -b1-$((2**32-1))) || echo fail

Thanks for the test case.

> thanks,
> Pádraig.
Best regards, Cojocaru Alexandru --========GMX221351367016574845150 Content-Type: application/octet-stream; charset="utf-8"; name="cut-fix-bug.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="cut-fix-bug.patch" Content-Description: Attachment: cut-fix-bug.patch RnJvbSBlZmE2MWUxZTNjYmFlOGM0YWQ2ZDk0M2M4Nzc0YzQ4N2ZjMmRlYWFjIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBDb2pvY2FydSBBbGV4YW5kcnUgPHhvam9jQGdteC5jb20+CkRh dGU6IEZyaSwgMjYgQXByIDIwMTMgMjI6Mzk6MzYgKzAwMDAKU3ViamVjdDogW1BBVENIXSBjdXQ6 IGBuZXh0X2ZpZWxkJyBuZXcgZnVuY3Rpb24KCkluY3JlbWVudCBib3RoIGBmaWVsZF9pZHgnIGFu ZCBgY3VycmVudF9ycCcsCmNoZWNraW5nIGF0IHRoZSBzYW1lIHRpbWUgZm9yIGFycmF5IGJvdW5k YXJpZXMuCi0tLQogc3JjL2N1dC5jIHwgMTcgKysrKysrKysrKystLS0tLS0KIDEgZmlsZSBjaGFu Z2VkLCAxMSBpbnNlcnRpb25zKCspLCA2IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9j dXQuYyBiL3NyYy9jdXQuYwppbmRleCBiNDJiNDA1Li43NTM1MTJlIDEwMDY0NAotLS0gYS9zcmMv Y3V0LmMKKysrIGIvc3JjL2N1dC5jCkBAIC00OTYsNiArNDk2LDE1IEBAIGN1dF9ieXRlcyAoRklM RSAqc3RyZWFtKQogICAgIH0KIH0KIAorLyogQWRkIG9uZSB0byBgZmllbGRfaWR4JyBhbmQgaW5j cmVtZW50IGBjdXJyZW50X3JwJy4gKi8KK3N0YXRpYyB2b2lkCituZXh0X2ZpZWxkIChzaXplX3Qg KmZpZWxkX2lkeCkKK3sKKwkrKypmaWVsZF9pZHg7CisgICAgaWYgKCgqZmllbGRfaWR4ID4gY3Vy cmVudF9ycC0+aGkpICYmIChjdXJyZW50X3JwIDwgcnAgKyBuX3JwIC0gMSkpCisgICAgICAgICAg ICArK2N1cnJlbnRfcnA7Cit9CisKIC8qIFJlYWQgZnJvbSBzdHJlYW0gU1RSRUFNLCBwcmludGlu ZyB0byBzdGFuZGFyZCBvdXRwdXQgYW55IHNlbGVjdGVkIGZpZWxkcy4gICovCiAKIHN0YXRpYyB2 b2lkCkBAIC01NzMsNyArNTgyLDcgQEAgY3V0X2ZpZWxkcyAoRklMRSAqc3RyZWFtKQogICAgICAg ICAgICAgICBmd3JpdGUgKGZpZWxkXzFfYnVmZmVyLCBzaXplb2YgKGNoYXIpLCBuX2J5dGVzIC0g MSwgc3Rkb3V0KTsKICAgICAgICAgICAgICAgZm91bmRfYW55X3NlbGVjdGVkX2ZpZWxkID0gdHJ1 ZTsKICAgICAgICAgICAgIH0KLSAgICAgICAgICBmaWVsZF9pZHgrKzsKKyAgICAgICAgICBuZXh0 X2ZpZWxkICgmZmllbGRfaWR4KTsKICAgICAgICAgfQogCiAgICAgICBpbnQgcHJldl9jID0gYzsK QEAgLTYxNiwxMSArNjI1LDcgQEAgY3V0X2ZpZWxkcyAoRklMRSAqc3RyZWFtKQogICAgICAgICAg IGZvdW5kX2FueV9zZWxlY3RlZF9maWVsZCA9IGZhbHNlOwogICAgICAgICB9CiAgICAgICBlbHNl 
IGlmIChjID09IGRlbGltKQotICAgICAgICB7Ci0gICAgICAgICAgZmllbGRfaWR4Kys7Ci0gICAg ICAgICAgaWYgKChmaWVsZF9pZHggPiBjdXJyZW50X3JwLT5oaSkgJiYgKGN1cnJlbnRfcnAgPCBy cCArIG5fcnAgLSAxKSkKLSAgICAgICAgICAgIGN1cnJlbnRfcnArKzsKLSAgICAgICAgfQorICAg ICAgICBuZXh0X2ZpZWxkICgmZmllbGRfaWR4KTsKICAgICB9CiB9CiAKLS0gCjEuOC4yLjEKCg== --========GMX221351367016574845150-- From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 26 19:46:55 2013 Received: (at 13127-done) by debbugs.gnu.org; 26 Apr 2013 23:46:55 +0000 Received: from localhost ([127.0.0.1]:45591 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVsLt-0007r9-Ok for submit@debbugs.gnu.org; Fri, 26 Apr 2013 19:46:55 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41028) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UVsLr-0007r0-MJ for 13127-done@debbugs.gnu.org; Fri, 26 Apr 2013 19:46:52 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3QNkliA016508 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 26 Apr 2013 19:46:47 -0400 Received: from [10.36.116.82] (ovpn-116-82.ams2.redhat.com [10.36.116.82]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r3QNkh5N009804 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 26 Apr 2013 19:46:46 -0400 Message-ID: <517B11E3.8030904@draigBrady.com> Date: Sat, 27 Apr 2013 00:46:43 +0100 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Alexandru Cojocaru Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <20130426224934.221350@gmx.com> In-Reply-To: <20130426224934.221350@gmx.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 X-Scanned-By: MIMEDefang 2.68 on 
10.5.11.22 Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by mx1.redhat.com id r3QNkliA016508 X-Spam-Score: -5.5 (-----) X-Debbugs-Envelope-To: 13127-done Cc: 13127-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------)

On 04/26/2013 11:49 PM, Alexandru Cojocaru wrote:
>>>> The cut.pl test is now failing on master. Could you have a look.
>>> I had no problems. Could you show me your output?
>>
>> Ah the failures are in tests I added in the meantime:
>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=51ce0bf8
>>
>> Specifically this is now only outputting the first field,
>> rather than both fields like it should:
>>
>> printf '%s\n' a:1 b:2 | src/cut -s -d: -f1,2
> The problem was caused by `current_rp' which wasn't
> incremented as needed. See attachment for patch.
> My tests were successful, can you recheck?

Great, tests now pass here.
I'll give it a thorough review tomorrow and apply.

thanks,
Pádraig.
From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 27 21:51:28 2013 Received: (at 13127) by debbugs.gnu.org; 28 Apr 2013 01:51:29 +0000 Received: from localhost ([127.0.0.1]:47603 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWGm0-0000JO-5w for submit@debbugs.gnu.org; Sat, 27 Apr 2013 21:51:28 -0400 Received: from mail2.vodafone.ie ([213.233.128.44]:9964) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWGlw-0000JF-Tf for 13127@debbugs.gnu.org; Sat, 27 Apr 2013 21:51:26 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjkDAIt/fFFtTb4F/2dsb2JhbAANRoZ0uCqCYQMBgRaDEwEBAQQjDwE5CgMQCw0LAgIFFgsCAgkDAgECAUUTAQcBAbRScpBQgSONGCwzBxaCJoETA51UjgY Received: from unknown (HELO [192.168.1.79]) ([109.77.190.5]) by mail2.vodafone.ie with ESMTP; 28 Apr 2013 02:51:14 +0100 Message-ID: <517C8091.7090106@draigBrady.com> Date: Sun, 28 Apr 2013 02:51:13 +0100 From: =?UTF-8?B?UMOhZHJhaWcgQnJhZHk=?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: xojoc@gmx.com Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> In-Reply-To: <517AA745.3070506@draigBrady.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 04/26/2013 05:11 PM, Pádraig Brady wrote: > This separate patch to simplify the print_kth() function > by removing the comparison from it, is simple 
> and has a significant perf advantage. Tests pass so I'll apply.
>
> I'll adjust the commit log to summarise the perf change,
> but I notice the change isn't as great as yours on my sandybridge i3 system.
> Benchmark results for both the rebased memory rework and
> the simple print_kth() optimization attached.

So looking in detail, this central print_kth function is of most importance to performance.
I thought that your simplification of it might allow it to be auto inlined,
but I confirmed that gcc 4.6.0 -O2 does not do this at present by doing:

  objdump -d src/cut.o | grep -q '<print_kth>:' && echo called || echo inlined

Marking it as inlined gives another gain as shown below.

Testing these combinations, we have:
  orig = bit array implementation
  split = ditto + simplified print_kth
  split-inline = ditto + inlined print_kth
  mem = no bit array
  mem-split = ditto + simplified print_kth
  mem-split-inline = ditto + inlined print_kth

$ yes abcdfeg | head -n1MB > big-file
$ for c in orig split split-inline mem mem-split mem-split-inline; do
    src/cut-$c 2>/dev/null
    echo -ne "\n== $c =="
    time src/cut-$c -b1,3 big-file > /dev/null
  done

== orig ==
real    0m0.084s
user    0m0.078s
sys     0m0.006s

== split ==
real    0m0.077s
user    0m0.070s
sys     0m0.006s

== split-inline ==
real    0m0.055s
user    0m0.049s
sys     0m0.006s

== mem ==
real    0m0.111s
user    0m0.108s
sys     0m0.002s

== mem-split ==
real    0m0.088s
user    0m0.081s
sys     0m0.007s

== mem-split-inline ==
real    0m0.070s
user    0m0.060s
sys     0m0.009s

So in summary, removing the bit array does slow things down,
but with the advantage of disassociating mem usage from range width.
I'll split the patch into two for the mem change and the cpu change,
and might follow up with a subsequent patch to reinstate the bit array
for the common case of small -[bcf] and no --output-delim.

That's a common trend in these mem adjustment patches.
I.e. find a point to switch from the more CPU efficient method
to one which is more memory efficient.

thanks,
Pádraig.
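
As an illustration of the trade-off summarised in the message above, here is a minimal sketch (it is not the code from the patch) contrasting the two lookup strategies: a bit array indexed by position, whose size grows with the largest finite endpoint, and a sorted array of range pairs walked with a cursor, whose size grows only with the number of ranges. The names bitarray_bytes, rangepair_bytes, bit_mark, bit_selected, struct cursor and range_selected are hypothetical and chosen for this example; only struct range_pair and the `max_range_endpoint / CHAR_BIT + 1' sizing are taken from the patch text. The sketch assumes the ranges are sorted, non-overlapping and non-empty, and that indices are queried in increasing order.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

struct range_pair
{
  size_t lo;
  size_t hi;
};

/* Bytes needed by a bit array covering indices 0..MAX_ENDPOINT.
   Grows with the range width, not with the number of ranges.  */
static size_t
bitarray_bytes (size_t max_endpoint)
{
  return max_endpoint / CHAR_BIT + 1;
}

/* Bytes needed by an array of N_RANGES range pairs (hypothetical helper).
   Independent of how wide each range is.  */
static size_t
rangepair_bytes (size_t n_ranges)
{
  return n_ranges * sizeof (struct range_pair);
}

/* Bit array approach: mark index K as selected.  */
static void
bit_mark (unsigned char *bits, size_t k)
{
  bits[k / CHAR_BIT] |= (unsigned char) (1u << (k % CHAR_BIT));
}

/* Bit array approach: constant-time test, no branches on range data.  */
static bool
bit_selected (const unsigned char *bits, size_t k)
{
  return (bits[k / CHAR_BIT] >> (k % CHAR_BIT)) & 1;
}

/* Range pair approach: a cursor into a sorted array of N ranges
   (hypothetical type; N must be at least 1).  */
struct cursor
{
  const struct range_pair *rp;
  size_t n;
  size_t cur;
};

/* Range pair approach: advance the cursor past ranges that end before K,
   then compare K against the current range.  K must not decrease
   between calls unless CUR is reset to 0 (e.g. at each new line).  */
static bool
range_selected (struct cursor *c, size_t k)
{
  while (c->cur + 1 < c->n && k > c->rp[c->cur].hi)
    c->cur++;
  return c->rp[c->cur].lo <= k && k <= c->rp[c->cur].hi;
}
```

With a list like -b1,3 both approaches need only a few bytes. With -b1-$INT_MAX, and assuming 8-bit bytes and a 32-bit int, bitarray_bytes would ask for 256MiB, while the range array still holds a single pair; the price is the extra comparisons in range_selected for every byte or field.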
From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 27 22:40:59 2013 Received: (at 13127) by debbugs.gnu.org; 28 Apr 2013 02:40:59 +0000 Received: from localhost ([127.0.0.1]:47647 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWHXe-00027K-67 for submit@debbugs.gnu.org; Sat, 27 Apr 2013 22:40:52 -0400 Received: from mail2.vodafone.ie ([213.233.128.44]:34244) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWHXI-00026U-T0 for 13127@debbugs.gnu.org; Sat, 27 Apr 2013 22:40:24 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AoADACiLfFFda9WB/2dsb2JhbAANRoM9RIhRtS0DAYEWgxMBAQEEJ1IQCw0EAwECAQkWDwkDAgECAT0IEwEFAgEBiAADFqxBkUGMRIEJFlhOEQcJg0YDj2mDZoR1hRCFWIgugWo Received: from unknown (HELO [192.168.1.79]) ([93.107.213.129]) by mail2.vodafone.ie with ESMTP; 28 Apr 2013 03:40:10 +0100 Message-ID: <517C8C09.5050700@draigBrady.com> Date: Sun, 28 Apr 2013 03:40:09 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: xojoc@gmx.com Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <517AA625.5080802@draigBrady.com> In-Reply-To: <517AA625.5080802@draigBrady.com> X-Enigmail-Version: 1.5.1 Content-Type: multipart/mixed; boundary="------------020607060301050201020508" X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) This is a multi-part message in MIME format. 
--------------020607060301050201020508 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit On 04/26/2013 05:07 PM, Pádraig Brady wrote: > I've rebased this to master and attached. > The rebase wasn't trivial so I might have messed up. > The cut.pl test is now failing on master. Could you have a look. > Also could you add a test (or just a bit of shell) to demonstrate > which options the memory is not allocated for example. > Ideally some pathological option combo that no longer > allocates huge amounts of RAM. I refactored a little more (see next_item()). Also split to two patches and added some benchmarks. Will push the attached in a few hours. thanks, Pádraig. --------------020607060301050201020508 Content-Type: text/x-patch; name="cut-mem.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="cut-mem.patch" >From d28d239bec5371f88d0779836af2107abd624258 Mon Sep 17 00:00:00 2001 From: Cojocaru Alexandru Date: Sun, 9 Dec 2012 10:43:10 +0100 Subject: [PATCH 1/2] cut: make memory allocation independent of range width The current implementation of cut, uses a bit array, an array of `struct range_pair's, and (when --output-delimiter is specified) a hash_table. The new implementation will use only an array of `struct range_pair's. The old implementation is memory inefficient because: 1. When -b with a big num is specified, it allocates a lot of memory for `printable_field'. 2. When --output-delimiter is specified, it will allocate 31 buckets. Even if a few ranges are specified. 
Note CPU overhead is increased to determine if an item is to be printed, as shown by: $ yes abcdfeg | head -n1MB > big-file $ for c in with-bitarray without-bitarray; do src/cut-$c 2>/dev/null echo -ne "\n== $c ==" time src/cut-$c -b1,3 big-file > /dev/null done == with-bitarray == real 0m0.084s user 0m0.078s sys 0m0.006s == without-bitarray == real 0m0.111s user 0m0.108s sys 0m0.002s Further patches will reduce this overhead without using a bit array. * src/cut.c (set_fields): Set and initialize RP instead of printable_field. * src/cut.c (is_range_start_index): Use CURRENT_RP rather than a hash. * tests/misc/cut.pl: Check if `eol_range_start' is set correctly. * tests/misc/cut-huge-range.sh: Rename from cut-huge-to-eol-range.sh, and add a test to verify large amounts of mem aren't allocated. Fixes http://bugs.gnu.org/13127 --- src/cut.c | 294 +++++++------------- tests/local.mk | 2 +- ...{cut-huge-to-eol-range.sh => cut-huge-range.sh} | 4 + tests/misc/cut.pl | 4 + 4 files changed, 111 insertions(+), 193 deletions(-) rename tests/misc/{cut-huge-to-eol-range.sh => cut-huge-range.sh} (84%) diff --git a/src/cut.c b/src/cut.c index 494aad7..8738c46 100644 --- a/src/cut.c +++ b/src/cut.c @@ -53,8 +53,31 @@ } \ while (0) + +struct range_pair + { + size_t lo; + size_t hi; + }; + +/* Array of `struct range_pair' holding all the finite ranges. */ +static struct range_pair *rp; + +/* Pointer inside RP. When checking if a byte or field is selected + by a finite range, we check if it is between CURRENT_RP.LO + and CURRENT_RP.HI. If the byte or field index is greater than + CURRENT_RP.HI then we make CURRENT_RP to point to the next range pair. */ +static struct range_pair *current_rp; + +/* Number of finite ranges specified by the user. */ +static size_t n_rp; + +/* Number of `struct range_pair's allocated. */ +static size_t n_rp_allocated; + + /* Append LOW, HIGH to the list RP of range pairs, allocating additional - space if necessary. Update local variable N_RP. 
When allocating, + space if necessary. Update global variable N_RP. When allocating, update global variable N_RP_ALLOCATED. */ #define ADD_RANGE_PAIR(rp, low, high) \ @@ -72,11 +95,6 @@ } \ while (0) -struct range_pair - { - size_t lo; - size_t hi; - }; /* This buffer is used to support the semantics of the -s option (or lack of same) when the specified field list includes (does @@ -90,26 +108,11 @@ static char *field_1_buffer; /* The number of bytes allocated for FIELD_1_BUFFER. */ static size_t field_1_bufsize; -/* The largest field or byte index used as an endpoint of a closed - or degenerate range specification; this doesn't include the starting - index of right-open-ended ranges. For example, with either range spec - '2-5,9-', '2-3,5,9-' this variable would be set to 5. */ -static size_t max_range_endpoint; /* If nonzero, this is the index of the first field in a range that goes to end of line. */ static size_t eol_range_start; -/* This is a bit vector. - In byte mode, which bytes to output. - In field mode, which DELIM-separated fields to output. - Both bytes and fields are numbered starting with 1, - so the zeroth bit of this array is unused. - A field or byte K has been selected if - (K <= MAX_RANGE_ENDPOINT and is_printable_field(K)) - || (EOL_RANGE_START > 0 && K >= EOL_RANGE_START). */ -static unsigned char *printable_field; - enum operating_mode { undefined_mode, @@ -148,15 +151,6 @@ static char *output_delimiter_string; /* True if we have ever read standard input. */ static bool have_read_stdin; -#define HT_RANGE_START_INDEX_INITIAL_CAPACITY 31 - -/* The set of range-start indices. For example, given a range-spec list like - '-b1,3-5,4-9,15-', the following indices will be recorded here: 1, 3, 15. - Note that although '4' looks like a range-start index, it is in the middle - of the '3-5' range, so it doesn't count. - This table is created/used IFF output_delimiter_specified is set. 
*/ -static Hash_table *range_start_ht; - /* For long options that have no equivalent short option, use a non-character as a pseudo short option, starting with CHAR_MAX + 1. */ enum @@ -239,73 +233,37 @@ With no FILE, or when FILE is -, read standard input.\n\ exit (status); } -static inline void -mark_range_start (size_t i) -{ - /* Record the fact that 'i' is a range-start index. */ - void *ent_from_table = hash_insert (range_start_ht, (void*) i); - if (ent_from_table == NULL) - { - /* Insertion failed due to lack of memory. */ - xalloc_die (); - } - assert ((size_t) ent_from_table == i); -} - -static inline void -mark_printable_field (size_t i) -{ - size_t n = i / CHAR_BIT; - printable_field[n] |= (1 << (i % CHAR_BIT)); -} +/* Return nonzero if K'th byte is the beginning of a range. */ static inline bool -is_printable_field (size_t i) +is_range_start_index (size_t k) { - size_t n = i / CHAR_BIT; - return (printable_field[n] >> (i % CHAR_BIT)) & 1; -} + bool is_start = false; -static size_t -hash_int (const void *x, size_t tablesize) -{ -#ifdef UINTPTR_MAX - uintptr_t y = (uintptr_t) x; -#else - size_t y = (size_t) x; -#endif - return y % tablesize; -} + if (!complement) + is_start = (k == eol_range_start || k == current_rp->lo); + else + is_start = (k == (current_rp - 1)->hi + 1); -static bool -hash_compare_ints (void const *x, void const *y) -{ - return (x == y) ? true : false; + return is_start; } -static bool -is_range_start_index (size_t i) -{ - return hash_lookup (range_start_ht, (void *) i) ? true : false; -} - -/* Return nonzero if the K'th field or byte is printable. - When returning nonzero, if RANGE_START is non-NULL, - set *RANGE_START to true if K is the beginning of a range, and to - false otherwise. */ +/* Return nonzero if the K'th field or byte is printable. 
*/ static bool print_kth (size_t k, bool *range_start) { - bool k_selected - = ((0 < eol_range_start && eol_range_start <= k) - || (k <= max_range_endpoint && is_printable_field (k))); + bool k_selected = false; + if (0 < eol_range_start && eol_range_start <= k) + k_selected = true; + else if (current_rp->lo <= k && k <= current_rp->hi) + k_selected = true; bool is_selected = k_selected ^ complement; if (range_start && is_selected) *range_start = is_range_start_index (k); - return is_selected; + return k_selected ^ complement; } /* Comparison function for qsort to order the list of @@ -318,24 +276,25 @@ compare_ranges (const void *a, const void *b) return a_start < b_start ? -1 : a_start > b_start; } -/* Given the list of field or byte range specifications FIELDSTR, set - MAX_RANGE_ENDPOINT and allocate and initialize the PRINTABLE_FIELD - array. If there is a right-open-ended range, set EOL_RANGE_START - to its starting index. FIELDSTR should be composed of one or more - numbers or ranges of numbers, separated by blanks or commas. - Incomplete ranges may be given: '-m' means '1-m'; 'n-' means 'n' - through end of line. Return true if FIELDSTR contains at least - one field specification, false otherwise. */ - -/* FIXME-someday: What if the user wants to cut out the 1,000,000-th - field of some huge input file? This function shouldn't have to - allocate a table of a million bits just so we can test every - field < 10^6 with an array dereference. Instead, consider using - an adaptive approach: if the range of selected fields is too large, - but only a few fields/byte-offsets are actually selected, use a - hash table. If the range of selected fields is too large, and - too many are selected, then resort to using the range-pairs (the - 'rp' array) directly. */ +/* Increment *ITEM_IDX (i.e. a field or byte index), + and if required CURRENT_RP. 
*/ + +static void +next_item (size_t *item_idx) +{ + (*item_idx)++; + if ((*item_idx > current_rp->hi) && (current_rp < rp + n_rp - 1)) + current_rp++; +} + +/* Given the list of field or byte range specifications FIELDSTR, + allocate and initialize the RP array. If there is a right-open-ended + range, set EOL_RANGE_START to its starting index. FIELDSTR should + be composed of one or more numbers or ranges of numbers, separated + by blanks or commas. Incomplete ranges may be given: '-m' means '1-m'; + 'n-' means 'n' through end of line. + Return true if FIELDSTR contains at least one field specification, + false otherwise. */ static bool set_fields (const char *fieldstr) @@ -348,9 +307,6 @@ set_fields (const char *fieldstr) bool field_found = false; /* True if at least one field spec has been processed. */ - struct range_pair *rp = NULL; - size_t n_rp = 0; - size_t n_rp_allocated = 0; size_t i; bool in_digits = false; @@ -402,41 +358,10 @@ set_fields (const char *fieldstr) if (value < initial) FATAL_ERROR (_("invalid decreasing range")); - /* Is there already a range going to end of line? */ - if (eol_range_start != 0) - { - /* Yes. Is the new sequence already contained - in the old one? If so, no processing is - necessary. */ - if (initial < eol_range_start) - { - /* No, the new sequence starts before the - old. Does the old range going to end of line - extend into the new range? */ - if (eol_range_start <= value) - { - /* Yes. Simply move the end of line marker. */ - eol_range_start = initial; - } - else - { - /* No. A simple range, before and disjoint from - the range going to end of line. Fill it. */ - ADD_RANGE_PAIR (rp, initial, value); - } - - /* In any case, some fields were selected. */ - field_found = true; - } - } - else - { - /* There is no range going to end of line. 
*/ - ADD_RANGE_PAIR (rp, initial, value); - field_found = true; - } - value = 0; + ADD_RANGE_PAIR (rp, initial, value); + field_found = true; } + value = 0; } else { @@ -447,9 +372,7 @@ set_fields (const char *fieldstr) } if (*fieldstr == '\0') - { - break; - } + break; fieldstr++; lhs_specified = false; @@ -493,49 +416,42 @@ set_fields (const char *fieldstr) FATAL_ERROR (_("invalid byte, character or field list")); } - max_range_endpoint = 0; - for (i = 0; i < n_rp; i++) - { - if (rp[i].hi > max_range_endpoint) - max_range_endpoint = rp[i].hi; - } - - /* Allocate an array large enough so that it may be indexed by - the field numbers corresponding to all finite ranges - (i.e. '2-6' or '-4', but not '5-') in FIELDSTR. */ - - if (max_range_endpoint) - printable_field = xzalloc (max_range_endpoint / CHAR_BIT + 1); - qsort (rp, n_rp, sizeof (rp[0]), compare_ranges); - /* Set the array entries corresponding to integers in the ranges of RP. */ - for (i = 0; i < n_rp; i++) + /* Omit finite ranges subsumed by a to-EOL range. */ + if (eol_range_start && n_rp) { - /* Ignore any range that is subsumed by the to-EOL range. */ - if (eol_range_start && eol_range_start <= rp[i].lo) - continue; - - /* Record the range-start indices, i.e., record each start - index that is not part of any other (lo..hi] range. */ - size_t rsi_candidate = complement ? rp[i].hi + 1 : rp[i].lo; - if (output_delimiter_specified - && !is_printable_field (rsi_candidate)) - mark_range_start (rsi_candidate); - - for (size_t j = rp[i].lo; j <= rp[i].hi; j++) - mark_printable_field (j); + i = n_rp; + while (i && eol_range_start <= rp[i - 1].hi) + { + eol_range_start = MIN (rp[i - 1].lo, eol_range_start); + --n_rp; + --i; + } } - if (output_delimiter_specified - && !complement - && eol_range_start - && max_range_endpoint - && (max_range_endpoint < eol_range_start - || !is_printable_field (eol_range_start))) - mark_range_start (eol_range_start); + /* Merge finite range pairs (e.g. `2-5,3-4' becomes `2-5'). 
*/ + for (i = 0; i < n_rp; ++i) + { + for (size_t j = i + 1; j < n_rp; ++j) + { + if (rp[j].lo <= rp[i].hi) + { + rp[i].hi = MAX (rp[j].hi, rp[i].hi); + memmove (rp + j, rp + j + 1, + (n_rp - j - 1) * sizeof (struct range_pair)); + --n_rp; + } + else + break; + } + } - free (rp); + /* After merging, reallocate RP so we release memory to the system. + Also add a sentinel at the end of RP, to avoid out of bounds access. */ + ++n_rp; + rp = xrealloc (rp, n_rp * sizeof (struct range_pair)); + rp[n_rp - 1].lo = rp[n_rp - 1].hi = 0; return field_found; } @@ -552,7 +468,8 @@ cut_bytes (FILE *stream) byte_idx = 0; print_delimiter = false; - while (1) + current_rp = rp; + while (true) { int c; /* Each character from the file. */ @@ -563,6 +480,7 @@ cut_bytes (FILE *stream) putchar ('\n'); byte_idx = 0; print_delimiter = false; + current_rp = rp; } else if (c == EOF) { @@ -572,9 +490,10 @@ cut_bytes (FILE *stream) } else { + next_item (&byte_idx); bool range_start; bool *rs = output_delimiter_specified ? 
&range_start : NULL; - if (print_kth (++byte_idx, rs)) + if (print_kth (byte_idx, rs)) { if (rs && *rs && print_delimiter) { @@ -598,6 +517,8 @@ cut_fields (FILE *stream) bool found_any_selected_field = false; bool buffer_first_field; + current_rp = rp; + c = getc (stream); if (c == EOF) return; @@ -663,7 +584,7 @@ cut_fields (FILE *stream) fwrite (field_1_buffer, sizeof (char), n_bytes - 1, stdout); found_any_selected_field = true; } - ++field_idx; + next_item (&field_idx); } int prev_c = c; @@ -702,10 +623,11 @@ cut_fields (FILE *stream) if (c == EOF) break; field_idx = 1; + current_rp = rp; found_any_selected_field = false; } else if (c == delim) - field_idx++; + next_item (&field_idx); } } @@ -854,16 +776,6 @@ main (int argc, char **argv) FATAL_ERROR (_("suppressing non-delimited lines makes sense\n\ \tonly when operating on fields")); - if (output_delimiter_specified) - { - range_start_ht = hash_initialize (HT_RANGE_START_INDEX_INITIAL_CAPACITY, - NULL, hash_int, - hash_compare_ints, NULL); - if (range_start_ht == NULL) - xalloc_die (); - - } - if (! 
set_fields (spec_list_string)) { if (operating_mode == field_mode) @@ -890,8 +802,6 @@ main (int argc, char **argv) for (ok = true; optind < argc; optind++) ok &= cut_file (argv[optind]); - if (range_start_ht) - hash_free (range_start_ht); if (have_read_stdin && fclose (stdin) == EOF) { diff --git a/tests/local.mk b/tests/local.mk index f47da8d..fb5cc63 100644 --- a/tests/local.mk +++ b/tests/local.mk @@ -245,7 +245,7 @@ all_tests = \ tests/misc/pwd-option.sh \ tests/misc/chcon-fail.sh \ tests/misc/cut.pl \ - tests/misc/cut-huge-to-eol-range.sh \ + tests/misc/cut-huge-range.sh \ tests/misc/wc.pl \ tests/misc/wc-files0-from.pl \ tests/misc/wc-files0.sh \ diff --git a/tests/misc/cut-huge-to-eol-range.sh b/tests/misc/cut-huge-range.sh similarity index 84% rename from tests/misc/cut-huge-to-eol-range.sh rename to tests/misc/cut-huge-range.sh index 42cecfd..552ccc8 100755 --- a/tests/misc/cut-huge-to-eol-range.sh +++ b/tests/misc/cut-huge-range.sh @@ -25,6 +25,10 @@ getlimits_ # a 256MiB bit vector. With a 20MB limit on VM, the following would fail. (ulimit -v 20000; : | cut -b$INT_MAX- > err 2>&1) || fail=1 +# Up to and including coreutils-8.21, cut would allocate possibly needed +# memory upfront. Subsequently memory is allocated as required. 
+(ulimit -v 20000; : | cut -b1-$INT_MAX > err 2>&1) || fail=1 + compare /dev/null err || fail=1 Exit $fail diff --git a/tests/misc/cut.pl b/tests/misc/cut.pl index 41e9e20..1543faf 100755 --- a/tests/misc/cut.pl +++ b/tests/misc/cut.pl @@ -210,6 +210,10 @@ my @Tests = {IN=>"123456\n"}, {OUT=>"23456\n"}], ['EOL-subsumed-3', '--complement -b3,4-4,5,2-', {IN=>"123456\n"}, {OUT=>"1\n"}], + + ['EOL-subsumed-4', '--output-d=: -b1-2,2-3,3-', + {IN=>"1234\n"}, {OUT=>"1234\n"}], + ); if ($mb_locale ne 'C') -- 1.7.7.6 >From 99f7302babe54b49546bb69cc45fe7805505f60e Mon Sep 17 00:00:00 2001 From: Cojocaru Alexandru Date: Sun, 28 Apr 2013 03:03:45 +0100 Subject: [PATCH 2/2] cut: reduce CPU overhead in determining item to output print_kth() is the central function of cut used to determine if an item is to be output or not, so simplify it by moving some logic outside and also inline it. Benchmark results for both aspects of this change are: $ yes abcdfeg | head -n1MB > big-file $ for c in orig split split-inline; do src/cut-$c 2>/dev/null echo -ne "\n== $c ==" time src/cut-$c -b1,3 big-file > /dev/null done == orig == real 0m0.111s user 0m0.108s sys 0m0.002s == split == real 0m0.088s user 0m0.081s sys 0m0.007s == split-inline == real 0m0.070s user 0m0.060s sys 0m0.009s * src/cut.c (print_kth): Refactor a branch to be outside of the function, and also mark the function as inline. --- src/cut.c | 54 ++++++++++++++++++++++++++---------------------------- 1 files changed, 26 insertions(+), 28 deletions(-) diff --git a/src/cut.c b/src/cut.c index 8738c46..853b087 100644 --- a/src/cut.c +++ b/src/cut.c @@ -233,6 +233,20 @@ With no FILE, or when FILE is -, read standard input.\n\ exit (status); } +/* Return nonzero if the K'th field or byte is printable. 
*/ + +static inline bool +print_kth (size_t k) +{ + bool k_selected = false; + if (0 < eol_range_start && eol_range_start <= k) + k_selected = true; + else if (current_rp->lo <= k && k <= current_rp->hi) + k_selected = true; + + return k_selected ^ complement; +} + /* Return nonzero if K'th byte is the beginning of a range. */ static inline bool @@ -248,24 +262,6 @@ is_range_start_index (size_t k) return is_start; } -/* Return nonzero if the K'th field or byte is printable. */ - -static bool -print_kth (size_t k, bool *range_start) -{ - bool k_selected = false; - if (0 < eol_range_start && eol_range_start <= k) - k_selected = true; - else if (current_rp->lo <= k && k <= current_rp->hi) - k_selected = true; - - bool is_selected = k_selected ^ complement; - if (range_start && is_selected) - *range_start = is_range_start_index (k); - - return k_selected ^ complement; -} - /* Comparison function for qsort to order the list of struct range_pairs. */ static int @@ -491,16 +487,18 @@ cut_bytes (FILE *stream) else { next_item (&byte_idx); - bool range_start; - bool *rs = output_delimiter_specified ? &range_start : NULL; - if (print_kth (byte_idx, rs)) + if (print_kth (byte_idx)) { - if (rs && *rs && print_delimiter) + if (output_delimiter_specified) { - fwrite (output_delimiter_string, sizeof (char), - output_delimiter_length, stdout); + if (print_delimiter && is_range_start_index (byte_idx)) + { + fwrite (output_delimiter_string, sizeof (char), + output_delimiter_length, stdout); + } + print_delimiter = true; } - print_delimiter = true; + putchar (c); } } @@ -532,7 +530,7 @@ cut_fields (FILE *stream) and the first field has been selected, or if non-delimited lines must be suppressed and the first field has *not* been selected. That is because a non-delimited line has exactly one field. 
*/ - buffer_first_field = (suppress_non_delimited ^ !print_kth (1, NULL)); + buffer_first_field = (suppress_non_delimited ^ !print_kth (1)); while (1) { @@ -578,7 +576,7 @@ cut_fields (FILE *stream) } continue; } - if (print_kth (1, NULL)) + if (print_kth (1)) { /* Print the field, but not the trailing delimiter. */ fwrite (field_1_buffer, sizeof (char), n_bytes - 1, stdout); @@ -589,7 +587,7 @@ cut_fields (FILE *stream) int prev_c = c; - if (print_kth (field_idx, NULL)) + if (print_kth (field_idx)) { if (found_any_selected_field) { -- 1.7.7.6 --------------020607060301050201020508-- From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 28 07:44:29 2013 Received: (at 13127) by debbugs.gnu.org; 28 Apr 2013 11:44:29 +0000 Received: from localhost ([127.0.0.1]:47924 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWQ1t-00008M-DJ for submit@debbugs.gnu.org; Sun, 28 Apr 2013 07:44:29 -0400 Received: from mout.gmx.net ([212.227.15.15]:50077) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWQ1r-000083-1S for 13127@debbugs.gnu.org; Sun, 28 Apr 2013 07:44:28 -0400 Received: from mailout-eu.gmx.com ([10.1.101.215]) by mrigmx.server.lan (mrigmx001) with ESMTP (Nemesis) id 0MQsuO-1U3MrA0nNO-00UGak for <13127@debbugs.gnu.org>; Sun, 28 Apr 2013 13:44:14 +0200 Received: (qmail invoked by alias); 28 Apr 2013 11:44:13 -0000 Received: from unknown (EHLO COMPUTER-1) [151.65.151.251] by mail.gmx.com (mp-eu015) with SMTP; 28 Apr 2013 13:44:13 +0200 X-Authenticated: #130707387 X-Provags-ID: V01U2FsdGVkX1/BSBl9kN7BPBHyJBl6mDq1wPeuzN6FUUaIfjXEE/ ZbjP2LJAh896RH Date: Sun, 28 Apr 2013 13:44:09 +0200 From: Cojocaru Alexandru To: =?ISO-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre Message-Id: <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> In-Reply-To: <517C8091.7090106@draigBrady.com> References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> 
<20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> X-Mailer: Sylpheed 3.3.0 (GTK+ 2.10.14; i686-pc-mingw32) Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Y-GMX-Trusted: 0 X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-)

On Sun, 28 Apr 2013 02:51:13 +0100
Pádraig Brady wrote:

> So looking in detail, this central print_kth function is of most importance to performance.
I came to the same conclusion, see:
http://lists.gnu.org/archive/html/bug-coreutils/2012-12/msg00045.html

> I thought that your simplification of it might allow it to be auto inlined.
> but I confirmed that gcc 4.6.0 -O2 does not do this at present by doing:
>
>   objdump -d src/cut.o | grep -q '<print_kth>:' && echo called || echo inlined
With gcc 4.8.0 -O2 both `print_kth' and `is_range_start' are inlined
even without the `inline' keyword:
    nm src/cut | grep print_kth
    nm src/cut | grep is_range_start
both of the above commands give me no output.

> Marking it as inlined gives another gain as shown below.
>
> Testing these combinations, we have:
>   orig = bit array implementation
>   split = ditto + simplified print_kth
>   split-inline = ditto + inlined print_kth
>   mem = no bit array
>   mem-split = ditto + simplified print_kth
>   mem-split-inline = ditto + inlined print_kth
>
> $ yes abcdfeg | head -n1MB > big-file
> $ for c in orig split split-inline mem mem-split mem-split-inline; do
>     src/cut-$c 2>/dev/null
>     echo -ne "\n== $c =="
>     time src/cut-$c -b1,3 big-file > /dev/null
>   done
>
> == orig ==
> real    0m0.084s
> user    0m0.078s
> sys     0m0.006s
>
> == split ==
> real    0m0.077s
> user    0m0.070s
> sys     0m0.006s
>
> == split-inline ==
> real    0m0.055s
> user    0m0.049s
> sys     0m0.006s
>
> == mem ==
> real    0m0.111s
> user    0m0.108s
> sys     0m0.002s
>
> == mem-split ==
> real    0m0.088s
> user    0m0.081s
> sys     0m0.007s
>
> == mem-split-inline ==
> real    0m0.070s
> user    0m0.060s
> sys     0m0.009s
>
> So in summary, removing the bit array does slow things down,
I think that the problem lies in `print_kth' again. I wrongly put
a useless branch in it. See the attachment for a patch.
Another problem may be the merging and the call to `xrealloc' that
we do at the end of `set_fields'.

> but with the advantage of disassociating mem usage from range width.
> I'll split the patch into two for the mem change and the cpu change,
> and might follow up with a subsequent patch to reinstate the bit array
> for the common case of small -[bcf] and no --output-delim.
My primary goal was to simplify the code. Even if the attached patch
doesn't work, I think that detecting small -[bcf] ranges would just make
the code more cumbersome.

> That's a common trend in these mem adjustment patches.
> I.E. Find a point to switch from the more CPU efficient method,
> to one which is more memory efficient.
>
> thanks,
> Pádraig.

Please could you re-run the benchmarks after applying the patch?
Could you also try with a bigger file (for example 100MB)? So as to make the difference among the various patches more clear. Unfortunately I'm under an emulator and the benchmarks aren't very faithful. Best regards, Cojocaru Alexandru From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 28 10:04:50 2013 Received: (at 13127) by debbugs.gnu.org; 28 Apr 2013 14:04:50 +0000 Received: from localhost ([127.0.0.1]:48586 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWSDh-0006Aa-PF for submit@debbugs.gnu.org; Sun, 28 Apr 2013 10:04:50 -0400 Received: from mail3.vodafone.ie ([213.233.128.45]:17942) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWSDd-0006AG-JV for 13127@debbugs.gnu.org; Sun, 28 Apr 2013 10:04:47 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjkDAIUrfVFtTdmC/2dsb2JhbAANRoM9vkIDAYEUgxMBAQEDAQECLwE5CgMFCwsNAQoJFg8JAwIBAgEWLwYNAQUCAQGIDBKrKJEijjssMweDTwOUXINohRCOBg Received: from unknown (HELO [192.168.1.79]) ([109.77.217.130]) by mail3.vodafone.ie with ESMTP; 28 Apr 2013 15:04:31 +0100 Message-ID: <517D2C6F.8040006@draigBrady.com> Date: Sun, 28 Apr 2013 15:04:31 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Cojocaru Alexandru Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> In-Reply-To: <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list 
List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-)

On 04/28/2013 12:44 PM, Cojocaru Alexandru wrote:
> On Sun, 28 Apr 2013 02:51:13 +0100
> Pádraig Brady wrote:
>> So looking in detail, this central print_kth function is of most importance to performance.
> I came to the same conclusion, see:
> http://lists.gnu.org/archive/html/bug-coreutils/2012-12/msg00045.html
>
>> I thought that your simplification of it might allow it to be auto inlined,
>> but I confirmed that gcc 4.6.0 -O2 does not do this at present by doing:
>>
>> objdump -d src/cut.o | grep -q ':' && echo called || echo inlined
> With gcc 4.8.0 -O2 both `print_kth' and `is_range_start' are inlined
> even without the `inline' keyword:
> nm src/cut | grep print_kth
> nm src/cut | grep is_range_start
> Both of the above commands give me no output.

Good, gcc is getting better.
We'll leave the inline for now at least, to aid non-bleeding-edge gcc.

>> Marking it as inlined gives another gain as shown below.
>>
>> Testing these combinations, we have:
>> orig = bit array implementation
>> split = ditto + simplified print_kth
>> split-inline = ditto + inlined print_kth
>> mem = no bit array
>> mem-split = ditto + simplified print_kth
>> mem-split-inline = ditto + inlined print_kth
>>
>> $ yes abcdfeg | head -n1MB > big-file
>> $ for c in orig split split-inline mem mem-split mem-split-inline; do
>>     src/cut-$c 2>/dev/null
>>     echo -ne "\n== $c =="
>>     time src/cut-$c -b1,3 big-file > /dev/null
>>   done
>>
>> == orig ==
>> real 0m0.084s
>> user 0m0.078s
>> sys 0m0.006s
>>
>> == split ==
>> real 0m0.077s
>> user 0m0.070s
>> sys 0m0.006s
>>
>> == split-inline ==
>> real 0m0.055s
>> user 0m0.049s
>> sys 0m0.006s
>>
>> == mem ==
>> real 0m0.111s
>> user 0m0.108s
>> sys 0m0.002s
>>
>> == mem-split ==
>> real 0m0.088s
>> user 0m0.081s
>> sys 0m0.007s
>>
>> == mem-split-inline ==
>> real 0m0.070s
>> user 0m0.060s
>> sys 0m0.009s
>>
>> So in summary, removing the bit array does slow things down,
> I think that the problem lies in `print_kth' again. I wrongly put
> a useless branch in it. See the attachment for a patch.

Did you forget to attach?

> Another problem may be the merging and the call to `xrealloc' that
> we do at the end of `set_fields'.

That's only called at startup so I wouldn't worry too much.
What was your specific concern here?

>> but with the advantage of disassociating mem usage from range width.
>> I'll split the patch into two for the mem change and the cpu change,
>> and might follow up with a subsequent patch to reinstate the bit array
>> for the common case of small -[bcf] and no --output-delim.
> My primary goal was to simplify the code. Even if the attached patch
> doesn't work, I think that detecting small -[bcf] ranges would just make
> the code more cumbersome.

Yes, it's a trade-off. For often-used tools such as coreutils though,
it's sometimes worth a little extra complexity for performance reasons.
Here we might be able to guide the compiler around the branches like:

  print_kth()
  {
    if likely(bitarray_used)
      ...
    else
      ...
  }

Anyway, I'll wait for your patch before carefully considering
whether to reinstate the bit array.

>> That's a common trend in these mem adjustment patches,
>> i.e. find a point to switch from the more CPU efficient method
>> to one which is more memory efficient.
>>
>> thanks,
>> Pádraig.
>
> Please could you re-run the benchmarks after applying the patch?
> Could you also try with a bigger file (for example 100MB)? So as
> to make the difference among the various patches more clear.
> Unfortunately I'm under an emulator and the benchmarks aren't very
> faithful.

Sure. Eagerly awaiting the patch :)

Pádraig.

From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 28 13:21:55 2013 Received: (at submit) by debbugs.gnu.org; 28 Apr 2013 17:21:56 +0000 Received: from localhost ([127.0.0.1]:48753 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWVIQ-0004UZ-Sg for submit@debbugs.gnu.org; Sun, 28 Apr 2013 13:21:55 -0400 Received: from eggs.gnu.org ([208.118.235.92]:52422) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWLyr-0005HH-N7 for submit@debbugs.gnu.org; Sun, 28 Apr 2013 03:25:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UWLyg-0002nr-9o for submit@debbugs.gnu.org; Sun, 28 Apr 2013 03:24:55 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-100.5 required=5.0 tests=BAYES_05, RCVD_IN_DNSWL_NONE, USER_IN_WHITELIST autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([208.118.235.17]:51770) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UWLyg-0002nn-6l for submit@debbugs.gnu.org; Sun, 28 Apr 2013 03:24:54 -0400 Received: from eggs.gnu.org ([208.118.235.92]:35547) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UWLye-0007Q7-7W for bug-coreutils@gnu.org;
Sun, 28 Apr 2013 03:24:54 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1UWLyd-0002nJ-Bz for bug-coreutils@gnu.org; Sun, 28 Apr 2013 03:24:52 -0400 Received: from mout.gmx.net ([212.227.15.19]:49535) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1UWLyd-0002mz-32 for bug-coreutils@gnu.org; Sun, 28 Apr 2013 03:24:51 -0400 Received: from mailout-de.gmx.net ([10.1.76.34]) by mrigmx.server.lan (mrigmx001) with ESMTP (Nemesis) id 0LygpN-1UaJkd0wXH-0164B8 for ; Sun, 28 Apr 2013 09:24:48 +0200 Received: (qmail invoked by alias); 28 Apr 2013 07:24:48 -0000 Received: from dslb-092-074-108-027.pools.arcor-ip.net (EHLO [127.0.0.1]) [92.74.108.27] by mail.gmx.net (mp034) with SMTP; 28 Apr 2013 09:24:48 +0200 X-Authenticated: #46577751 X-Provags-ID: V01U2FsdGVkX1/Xsycan1XX3/F/ELD1kq0ttvtAQXO3UoLbuOH79i fsESKCW6BzvCyv Message-ID: <517CCEBF.6020608@thogro.org> Date: Sun, 28 Apr 2013 09:24:47 +0200 From: Philipp Thomas User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130328 Thunderbird/17.0.5 MIME-Version: 1.0 To: bug-coreutils@gnu.org Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <517AA625.5080802@draigBrady.com> <517C8C09.5050700@draigBrady.com> In-Reply-To: <517C8C09.5050700@draigBrady.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Y-GMX-Trusted: 0 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.4.x-2.6.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 208.118.235.17 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Sun, 28 Apr 2013 13:21:53 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: 
debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -6.9 (------)

On 28.04.2013 04:40, Pádraig Brady wrote:
> 2. When --output-delimiter is specified, it will allocate 31 buckets.
> Even if a few ranges are specified.

Shouldn't this be "Even if only a few ranges are specified"?

Philipp

From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 28 14:14:56 2013 Received: (at 13127) by debbugs.gnu.org; 28 Apr 2013 18:14:56 +0000 Received: from localhost ([127.0.0.1]:48788 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWW7k-0006WH-CS for submit@debbugs.gnu.org; Sun, 28 Apr 2013 14:14:56 -0400 Received: from mout.gmx.net ([212.227.17.22]:65460) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWW7h-0006W7-Kj for 13127@debbugs.gnu.org; Sun, 28 Apr 2013 14:14:55 -0400 Received: from mailout-eu.gmx.com ([10.1.101.214]) by mrigmx.server.lan (mrigmx001) with ESMTP (Nemesis) id 0MSFoz-1U462m1R07-00TRjG for <13127@debbugs.gnu.org>; Sun, 28 Apr 2013 20:14:39 +0200 Received: (qmail invoked by alias); 28 Apr 2013 18:14:38 -0000 Received: from unknown (EHLO COMPUTER-1) [151.65.151.251] by mail.gmx.com (mp-eu014) with SMTP; 28 Apr 2013 20:14:38 +0200 X-Authenticated: #130707387 X-Provags-ID: V01U2FsdGVkX18ylvwpxUwyrmZA7mC7DrL3fbOON97dUq9XS9dvo/ 9wBRwOB2lE8+n3 Date: Sun, 28 Apr 2013 20:14:33 +0200 From: Cojocaru Alexandru To: =?ISO-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre Message-Id: <20130428201433.1720fe6d54581458391fc6fe@gmx.com> In-Reply-To: <517D2C6F.8040006@draigBrady.com> References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> <517D2C6F.8040006@draigBrady.com> X-Mailer: Sylpheed 3.3.0 (GTK+ 2.10.14; i686-pc-mingw32) Mime-Version: 1.0 Content-Type: multipart/mixed;
boundary="Multipart=_Sun__28_Apr_2013_20_14_33_+0200_vyr/j=ue=lUbfb3L" X-Y-GMX-Trusted: 0 X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-)

This is a multi-part message in MIME format.

--Multipart=_Sun__28_Apr_2013_20_14_33_+0200_vyr/j=ue=lUbfb3L
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On Sun, 28 Apr 2013 15:04:31 +0100
Pádraig Brady wrote:

> On 04/28/2013 12:44 PM, Cojocaru Alexandru wrote:
> > Another problem may be the merging and the call to `xrealloc' that
> > we do at the end of `set_fields'.
>
> That's only called at startup so I wouldn't worry too much.
> What was your specific concern here?
The file you used with the benchmarks was quite small, so I was
worrying that both the loop used for the merging and the call to
`xrealloc' were affecting the benchmarks too much.

> >> but with the advantage of disassociating mem usage from range width.
> >> I'll split the patch into two for the mem change and the cpu change,
> >> and might follow up with a subsequent patch to reinstate the bit array
> >> for the common case of small -[bcf] and no --output-delim.
> > My primary goal was to simplify the code. Even if the attached patch
> > doesn't work, I think that detecting small -[bcf] ranges would just make
> > the code more cumbersome.
>
> Yes it's a trade off. For often used tools such as coreutils though,
> it's sometimes worth a little bit extra complexity for performance reasons.
> Here we might be able to guide the compiler around the branches like:
>
> print_kth()
> {
>   if likely(bitarray_used)
>     ...
>   else
>     ...
> }
Ok.
> Anyway I'll wait for your patch before carefully considering > to reinstate the bit array. Please, give me some more time. I think that it would be possible to use the sentinel to speed up things a bit. [...] > Sure. Eagerly waiting the patch :) Attached. Best regards, Cojocaru Alexandru --Multipart=_Sun__28_Apr_2013_20_14_33_+0200_vyr/j=ue=lUbfb3L Content-Type: application/octet-stream; name="avoid-branch.patch" Content-Disposition: attachment; filename="avoid-branch.patch" Content-Transfer-Encoding: base64 RnJvbSAyY2ZkOWU4ZjRkZGM2YTkzZWU4ZDAwYzhhOWYyZjQzZjU2YmM5ZTA2IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBDb2pvY2FydSBBbGV4YW5kcnUgPHhvam9jQGdteC5jb20+CkRh dGU6IFN1biwgMjggQXByIDIwMTMgMTE6MjY6MTcgKzAwMDAKU3ViamVjdDogW1BBVENIXSBjdXQ6 IGF2b2lkIGEgYnJhbmNoIGluIGBwcmludF9rdGgnCgpVc2UgfHwgaW5zdGVhZCBvZiBhbiAoaWYs IGVsc2VpZikgcGFpcgotLS0KIHNyYy9jdXQuYyB8IDkgKysrKystLS0tCiAxIGZpbGUgY2hhbmdl ZCwgNSBpbnNlcnRpb25zKCspLCA0IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9jdXQu YyBiL3NyYy9jdXQuYwppbmRleCA3NTM1MTJlLi5lMDUxOTc2IDEwMDY0NAotLS0gYS9zcmMvY3V0 LmMKKysrIGIvc3JjL2N1dC5jCkBAIC0yMzksMTAgKzIzOSwxMSBAQCBzdGF0aWMgYm9vbAogcHJp bnRfa3RoIChzaXplX3QgaykKIHsKICAgYm9vbCBrX3NlbGVjdGVkID0gZmFsc2U7Ci0gIGlmICgw IDwgZW9sX3JhbmdlX3N0YXJ0ICYmIGVvbF9yYW5nZV9zdGFydCA8PSBrKQotICAgIGtfc2VsZWN0 ZWQgPSB0cnVlOwotICBlbHNlIGlmIChjdXJyZW50X3JwLT5sbyA8PSBrICYmIGsgPD0gY3VycmVu dF9ycC0+aGkpCi0gICAga19zZWxlY3RlZCA9IHRydWU7CisgIGlmICgoMCA8IGVvbF9yYW5nZV9z dGFydCAmJiBlb2xfcmFuZ2Vfc3RhcnQgPD0gaykgfHwKKyAgICAgKGN1cnJlbnRfcnAtPmxvIDw9 IGsgJiYgayA8PSBjdXJyZW50X3JwLT5oaSkpCisgICAgIHsKKyAgICAgICBrX3NlbGVjdGVkID0g dHJ1ZTsKKyAgICAgfQogCiAgIHJldHVybiBrX3NlbGVjdGVkIF4gY29tcGxlbWVudDsKIH0KLS0g CjEuOC4yLjEKCg== --Multipart=_Sun__28_Apr_2013_20_14_33_+0200_vyr/j=ue=lUbfb3L-- From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 28 16:11:36 2013 Received: (at 13127) by debbugs.gnu.org; 28 Apr 2013 20:11:36 +0000 Received: from localhost ([127.0.0.1]:48913 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) 
(envelope-from ) id 1UWXwZ-0002So-EK for submit@debbugs.gnu.org; Sun, 28 Apr 2013 16:11:36 -0400 Received: from mail2.vodafone.ie ([213.233.128.44]:5639) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWXwT-0002SV-IX for 13127@debbugs.gnu.org; Sun, 28 Apr 2013 16:11:30 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AjcDACaBfVFtTsRc/2dsb2JhbAANRsF/AwGBFoMTAQEBBDIBRhALDQEKCRYPCQMCAQIBRQYNAQUCAQGzOpEQjjtfB4NPA51UjgY Received: from unknown (HELO [192.168.1.79]) ([109.78.196.92]) by mail2.vodafone.ie with ESMTP; 28 Apr 2013 21:11:10 +0100 Message-ID: <517D825E.3040308@draigBrady.com> Date: Sun, 28 Apr 2013 21:11:10 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Cojocaru Alexandru Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> <517D2C6F.8040006@draigBrady.com> <20130428201433.1720fe6d54581458391fc6fe@gmx.com> In-Reply-To: <20130428201433.1720fe6d54581458391fc6fe@gmx.com> X-Enigmail-Version: 1.5.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) On 04/28/2013 07:14 PM, Cojocaru Alexandru wrote: > On Sun, 28 Apr 2013 15:04:31 +0100 > Pádraig Brady wrote: > >> On 04/28/2013 12:44 PM, Cojocaru Alexandru wrote: >>> Another problem may be the merging and the call to 
`xrealloc' that
>>> we do at the end of `set_fields'.
>>
>> That's only called at startup so I wouldn't worry too much.
>> What was your specific concern here?
> The file you used with the benchmarks was quite small, so I was
> worrying that both the loop used for the merging and the call to
> `xrealloc' were affecting the benchmarks too much.

Ah right. I've enough to see relative differences quite easily,
but will increase benchmark sizes further if needed.

>>>> but with the advantage of disassociating mem usage from range width.
>>>> I'll split the patch into two for the mem change and the cpu change,
>>>> and might follow up with a subsequent patch to reinstate the bit array
>>>> for the common case of small -[bcf] and no --output-delim.
>>> My primary goal was to simplify the code. Even if the attached patch
>>> doesn't work, I think that detecting small -[bcf] ranges would just make
>>> the code more cumbersome.
>>
>> Yes it's a trade off. For often used tools such as coreutils though,
>> it's sometimes worth a little bit extra complexity for performance reasons.
>> Here we might be able to guide the compiler around the branches like:
>>
>> print_kth()
>> {
>>   if likely(bitarray_used)
>>     ...
>>   else
>>     ...
>> }
> Ok.
>
>> Anyway I'll wait for your patch before carefully considering
>> to reinstate the bit array.
> Please, give me some more time. I think that it would be possible to
> use the sentinel to speed up things a bit.

Sure.

> [...]
>> Sure. Eagerly waiting the patch :)
> Attached.

That changes the else to an ||.
I thought gcc would optimize that to the same code.
While the generated assembly is a little different,
the performance of both is essentially the same.

thanks,
Pádraig.
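
Editorial aside on the exchange above: the following is a minimal sketch, not code from the patch series, of the two points discussed. It places the `if`/`else if` form of print_kth next to the `||` form from avoid-branch.patch so they can be compared, and it shows one way the `likely' hint from the pseudo-code could be spelled. The names print_kth_else, print_kth_or, print_kth_hinted and variants_agree are hypothetical, as are passing bitarray_used and the bit array as parameters; the `likely' macro is assumed to wrap the GCC/Clang builtin __builtin_expect and to do nothing on other compilers.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: `likely' wraps the GCC/Clang builtin; elsewhere it is a no-op.  */
#if defined __GNUC__
# define likely(x) (__builtin_expect (!!(x), 1))
#else
# define likely(x) (x)
#endif

struct range_pair
{
  size_t lo;
  size_t hi;
};

/* Stand-ins for the globals of the same names in src/cut.c.  */
static size_t eol_range_start;
static bool complement;
static struct range_pair const *current_rp;

/* The if / else if form quoted in the thread.  */
bool
print_kth_else (size_t k)
{
  bool k_selected = false;
  if (0 < eol_range_start && eol_range_start <= k)
    k_selected = true;
  else if (current_rp->lo <= k && k <= current_rp->hi)
    k_selected = true;

  return k_selected ^ complement;
}

/* The || form of avoid-branch.patch, written as a single expression.  */
bool
print_kth_or (size_t k)
{
  bool k_selected = ((0 < eol_range_start && eol_range_start <= k)
                     || (current_rp->lo <= k && k <= current_rp->hi));

  return k_selected ^ complement;
}

/* Hypothetical dispatcher in the spirit of `if likely(bitarray_used)'.
   BITS must cover index K whenever BITARRAY_USED is true.  */
bool
print_kth_hinted (size_t k, bool bitarray_used, unsigned char const *bits)
{
  if (likely (bitarray_used))
    return ((bits[k / CHAR_BIT] >> (k % CHAR_BIT)) & 1) ^ complement;
  else
    return print_kth_or (k);
}

/* Hypothetical helper: set the globals, then compare both forms
   for every K in 0..LIMIT.  Returns true if they never differ.  */
bool
variants_agree (struct range_pair const *r, size_t eol, bool comp,
                size_t limit)
{
  current_rp = r;
  eol_range_start = eol;
  complement = comp;
  for (size_t k = 0; k <= limit; k++)
    if (print_kth_else (k) != print_kth_or (k))
      return false;
  return true;
}
```

Both forms compute the same predicate, which fits the observation that their measured performance was essentially the same. The hint only tells the compiler which branch to lay out as the fall-through path; whether that helps would need to be measured in the same way as the other variants.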
From debbugs-submit-bounces@debbugs.gnu.org Sun Apr 28 19:59:46 2013 Received: (at 13127) by debbugs.gnu.org; 28 Apr 2013 23:59:46 +0000 Received: from localhost ([127.0.0.1]:49074 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWbVP-0003Fe-Ol for submit@debbugs.gnu.org; Sun, 28 Apr 2013 19:59:46 -0400 Received: from mail2.vodafone.ie ([213.233.128.44]:13377) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UWbVJ-0003Eq-Vk for 13127@debbugs.gnu.org; Sun, 28 Apr 2013 19:59:42 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AoMEADy3fVFtTsRc/2dsb2JhbAANRYM9RIhRtS0DAYEVgxMBAQEEGg1SEAsNAQMDAQIKFg8JAwIBAgE9CAYNAQUCAQGIAAMWqyWRI4xEgQkWWE4RBwYDg0YDj2mDZoR1hRCFWIgugWo Received: from unknown (HELO [192.168.1.79]) ([109.78.196.92]) by mail2.vodafone.ie with ESMTP; 29 Apr 2013 00:59:21 +0100 Message-ID: <517DB7D8.8080900@draigBrady.com> Date: Mon, 29 Apr 2013 00:59:20 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Cojocaru Alexandru Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> <517D2C6F.8040006@draigBrady.com> <20130428201433.1720fe6d54581458391fc6fe@gmx.com> <517D825E.3040308@draigBrady.com> In-Reply-To: <517D825E.3040308@draigBrady.com> X-Enigmail-Version: 1.5.1 Content-Type: multipart/mixed; boundary="------------080609000706090507050402" X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: 
debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-)

This is a multi-part message in MIME format.
--------------080609000706090507050402
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit

So I reinstated the bit vector which was a little tricky to do
while maintaining performance, but it works very well.
So in summary with the attached 3-patch series, the CPU usage of the
common cut path is nearly halved, while the max memory that will be
allocated for the bit vector is 64KiB.

I'll apply this series in the morning.

thanks,
Pádraig.

p.s. I doubt adding a sentinel to the range pair structure would
outperform the bit vector approach, given the significant
benefit shown in the benchmark in the commit message.

--------------080609000706090507050402
Content-Type: text/x-patch; name="cut-mem.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cut-mem.patch"

>From 27a9eb50df1f5a49cb4c373de2b24eaa66cc2a11 Mon Sep 17 00:00:00 2001
From: Cojocaru Alexandru
Date: Sun, 9 Dec 2012 10:43:10 +0100
Subject: [PATCH 1/3] cut: make memory allocation independent of range width

The current implementation of cut uses a bit array,
an array of `struct range_pair's, and (when --output-delimiter
is specified) a hash_table. The new implementation will use
only an array of `struct range_pair's.
The old implementation is memory inefficient because:
 1. When -b with a big num is specified, it allocates a lot of
    memory for `printable_field'.
 2. When --output-delimiter is specified, it will allocate 31 buckets.
    Even if only a few ranges are specified.
Note CPU overhead is increased to determine if an item is to be printed, as shown by: $ yes abcdfeg | head -n1MB > big-file $ for c in with-bitarray without-bitarray; do src/cut-$c 2>/dev/null echo -ne "\n== $c ==" time src/cut-$c -b1,3 big-file > /dev/null done == with-bitarray == real 0m0.084s user 0m0.078s sys 0m0.006s == without-bitarray == real 0m0.111s user 0m0.108s sys 0m0.002s Subsequent patches will reduce this overhead. * src/cut.c (set_fields): Set and initialize RP instead of printable_field. * src/cut.c (is_range_start_index): Use CURRENT_RP rather than a hash. * tests/misc/cut.pl: Check if `eol_range_start' is set correctly. * tests/misc/cut-huge-range.sh: Rename from cut-huge-to-eol-range.sh, and add a test to verify large amounts of mem aren't allocated. Fixes http://bugs.gnu.org/13127 --- src/cut.c | 294 +++++++------------- tests/local.mk | 2 +- ...{cut-huge-to-eol-range.sh => cut-huge-range.sh} | 4 + tests/misc/cut.pl | 4 + 4 files changed, 111 insertions(+), 193 deletions(-) rename tests/misc/{cut-huge-to-eol-range.sh => cut-huge-range.sh} (84%) diff --git a/src/cut.c b/src/cut.c index 494aad7..8738c46 100644 --- a/src/cut.c +++ b/src/cut.c @@ -53,8 +53,31 @@ } \ while (0) + +struct range_pair + { + size_t lo; + size_t hi; + }; + +/* Array of `struct range_pair' holding all the finite ranges. */ +static struct range_pair *rp; + +/* Pointer inside RP. When checking if a byte or field is selected + by a finite range, we check if it is between CURRENT_RP.LO + and CURRENT_RP.HI. If the byte or field index is greater than + CURRENT_RP.HI then we make CURRENT_RP to point to the next range pair. */ +static struct range_pair *current_rp; + +/* Number of finite ranges specified by the user. */ +static size_t n_rp; + +/* Number of `struct range_pair's allocated. */ +static size_t n_rp_allocated; + + /* Append LOW, HIGH to the list RP of range pairs, allocating additional - space if necessary. Update local variable N_RP. 
When allocating, + space if necessary. Update global variable N_RP. When allocating, update global variable N_RP_ALLOCATED. */ #define ADD_RANGE_PAIR(rp, low, high) \ @@ -72,11 +95,6 @@ } \ while (0) -struct range_pair - { - size_t lo; - size_t hi; - }; /* This buffer is used to support the semantics of the -s option (or lack of same) when the specified field list includes (does @@ -90,26 +108,11 @@ static char *field_1_buffer; /* The number of bytes allocated for FIELD_1_BUFFER. */ static size_t field_1_bufsize; -/* The largest field or byte index used as an endpoint of a closed - or degenerate range specification; this doesn't include the starting - index of right-open-ended ranges. For example, with either range spec - '2-5,9-', '2-3,5,9-' this variable would be set to 5. */ -static size_t max_range_endpoint; /* If nonzero, this is the index of the first field in a range that goes to end of line. */ static size_t eol_range_start; -/* This is a bit vector. - In byte mode, which bytes to output. - In field mode, which DELIM-separated fields to output. - Both bytes and fields are numbered starting with 1, - so the zeroth bit of this array is unused. - A field or byte K has been selected if - (K <= MAX_RANGE_ENDPOINT and is_printable_field(K)) - || (EOL_RANGE_START > 0 && K >= EOL_RANGE_START). */ -static unsigned char *printable_field; - enum operating_mode { undefined_mode, @@ -148,15 +151,6 @@ static char *output_delimiter_string; /* True if we have ever read standard input. */ static bool have_read_stdin; -#define HT_RANGE_START_INDEX_INITIAL_CAPACITY 31 - -/* The set of range-start indices. For example, given a range-spec list like - '-b1,3-5,4-9,15-', the following indices will be recorded here: 1, 3, 15. - Note that although '4' looks like a range-start index, it is in the middle - of the '3-5' range, so it doesn't count. - This table is created/used IFF output_delimiter_specified is set. 
*/ -static Hash_table *range_start_ht; - /* For long options that have no equivalent short option, use a non-character as a pseudo short option, starting with CHAR_MAX + 1. */ enum @@ -239,73 +233,37 @@ With no FILE, or when FILE is -, read standard input.\n\ exit (status); } -static inline void -mark_range_start (size_t i) -{ - /* Record the fact that 'i' is a range-start index. */ - void *ent_from_table = hash_insert (range_start_ht, (void*) i); - if (ent_from_table == NULL) - { - /* Insertion failed due to lack of memory. */ - xalloc_die (); - } - assert ((size_t) ent_from_table == i); -} - -static inline void -mark_printable_field (size_t i) -{ - size_t n = i / CHAR_BIT; - printable_field[n] |= (1 << (i % CHAR_BIT)); -} +/* Return nonzero if K'th byte is the beginning of a range. */ static inline bool -is_printable_field (size_t i) +is_range_start_index (size_t k) { - size_t n = i / CHAR_BIT; - return (printable_field[n] >> (i % CHAR_BIT)) & 1; -} + bool is_start = false; -static size_t -hash_int (const void *x, size_t tablesize) -{ -#ifdef UINTPTR_MAX - uintptr_t y = (uintptr_t) x; -#else - size_t y = (size_t) x; -#endif - return y % tablesize; -} + if (!complement) + is_start = (k == eol_range_start || k == current_rp->lo); + else + is_start = (k == (current_rp - 1)->hi + 1); -static bool -hash_compare_ints (void const *x, void const *y) -{ - return (x == y) ? true : false; + return is_start; } -static bool -is_range_start_index (size_t i) -{ - return hash_lookup (range_start_ht, (void *) i) ? true : false; -} - -/* Return nonzero if the K'th field or byte is printable. - When returning nonzero, if RANGE_START is non-NULL, - set *RANGE_START to true if K is the beginning of a range, and to - false otherwise. */ +/* Return nonzero if the K'th field or byte is printable. 
*/ static bool print_kth (size_t k, bool *range_start) { - bool k_selected - = ((0 < eol_range_start && eol_range_start <= k) - || (k <= max_range_endpoint && is_printable_field (k))); + bool k_selected = false; + if (0 < eol_range_start && eol_range_start <= k) + k_selected = true; + else if (current_rp->lo <= k && k <= current_rp->hi) + k_selected = true; bool is_selected = k_selected ^ complement; if (range_start && is_selected) *range_start = is_range_start_index (k); - return is_selected; + return k_selected ^ complement; } /* Comparison function for qsort to order the list of @@ -318,24 +276,25 @@ compare_ranges (const void *a, const void *b) return a_start < b_start ? -1 : a_start > b_start; } -/* Given the list of field or byte range specifications FIELDSTR, set - MAX_RANGE_ENDPOINT and allocate and initialize the PRINTABLE_FIELD - array. If there is a right-open-ended range, set EOL_RANGE_START - to its starting index. FIELDSTR should be composed of one or more - numbers or ranges of numbers, separated by blanks or commas. - Incomplete ranges may be given: '-m' means '1-m'; 'n-' means 'n' - through end of line. Return true if FIELDSTR contains at least - one field specification, false otherwise. */ - -/* FIXME-someday: What if the user wants to cut out the 1,000,000-th - field of some huge input file? This function shouldn't have to - allocate a table of a million bits just so we can test every - field < 10^6 with an array dereference. Instead, consider using - an adaptive approach: if the range of selected fields is too large, - but only a few fields/byte-offsets are actually selected, use a - hash table. If the range of selected fields is too large, and - too many are selected, then resort to using the range-pairs (the - 'rp' array) directly. */ +/* Increment *ITEM_IDX (i.e. a field or byte index), + and if required CURRENT_RP. 
*/ + +static void +next_item (size_t *item_idx) +{ + (*item_idx)++; + if ((*item_idx > current_rp->hi) && (current_rp < rp + n_rp - 1)) + current_rp++; +} + +/* Given the list of field or byte range specifications FIELDSTR, + allocate and initialize the RP array. If there is a right-open-ended + range, set EOL_RANGE_START to its starting index. FIELDSTR should + be composed of one or more numbers or ranges of numbers, separated + by blanks or commas. Incomplete ranges may be given: '-m' means '1-m'; + 'n-' means 'n' through end of line. + Return true if FIELDSTR contains at least one field specification, + false otherwise. */ static bool set_fields (const char *fieldstr) @@ -348,9 +307,6 @@ set_fields (const char *fieldstr) bool field_found = false; /* True if at least one field spec has been processed. */ - struct range_pair *rp = NULL; - size_t n_rp = 0; - size_t n_rp_allocated = 0; size_t i; bool in_digits = false; @@ -402,41 +358,10 @@ set_fields (const char *fieldstr) if (value < initial) FATAL_ERROR (_("invalid decreasing range")); - /* Is there already a range going to end of line? */ - if (eol_range_start != 0) - { - /* Yes. Is the new sequence already contained - in the old one? If so, no processing is - necessary. */ - if (initial < eol_range_start) - { - /* No, the new sequence starts before the - old. Does the old range going to end of line - extend into the new range? */ - if (eol_range_start <= value) - { - /* Yes. Simply move the end of line marker. */ - eol_range_start = initial; - } - else - { - /* No. A simple range, before and disjoint from - the range going to end of line. Fill it. */ - ADD_RANGE_PAIR (rp, initial, value); - } - - /* In any case, some fields were selected. */ - field_found = true; - } - } - else - { - /* There is no range going to end of line. 
*/ - ADD_RANGE_PAIR (rp, initial, value); - field_found = true; - } - value = 0; + ADD_RANGE_PAIR (rp, initial, value); + field_found = true; } + value = 0; } else { @@ -447,9 +372,7 @@ set_fields (const char *fieldstr) } if (*fieldstr == '\0') - { - break; - } + break; fieldstr++; lhs_specified = false; @@ -493,49 +416,42 @@ set_fields (const char *fieldstr) FATAL_ERROR (_("invalid byte, character or field list")); } - max_range_endpoint = 0; - for (i = 0; i < n_rp; i++) - { - if (rp[i].hi > max_range_endpoint) - max_range_endpoint = rp[i].hi; - } - - /* Allocate an array large enough so that it may be indexed by - the field numbers corresponding to all finite ranges - (i.e. '2-6' or '-4', but not '5-') in FIELDSTR. */ - - if (max_range_endpoint) - printable_field = xzalloc (max_range_endpoint / CHAR_BIT + 1); - qsort (rp, n_rp, sizeof (rp[0]), compare_ranges); - /* Set the array entries corresponding to integers in the ranges of RP. */ - for (i = 0; i < n_rp; i++) + /* Omit finite ranges subsumed by a to-EOL range. */ + if (eol_range_start && n_rp) { - /* Ignore any range that is subsumed by the to-EOL range. */ - if (eol_range_start && eol_range_start <= rp[i].lo) - continue; - - /* Record the range-start indices, i.e., record each start - index that is not part of any other (lo..hi] range. */ - size_t rsi_candidate = complement ? rp[i].hi + 1 : rp[i].lo; - if (output_delimiter_specified - && !is_printable_field (rsi_candidate)) - mark_range_start (rsi_candidate); - - for (size_t j = rp[i].lo; j <= rp[i].hi; j++) - mark_printable_field (j); + i = n_rp; + while (i && eol_range_start <= rp[i - 1].hi) + { + eol_range_start = MIN (rp[i - 1].lo, eol_range_start); + --n_rp; + --i; + } } - if (output_delimiter_specified - && !complement - && eol_range_start - && max_range_endpoint - && (max_range_endpoint < eol_range_start - || !is_printable_field (eol_range_start))) - mark_range_start (eol_range_start); + /* Merge finite range pairs (e.g. `2-5,3-4' becomes `2-5'). 
*/ + for (i = 0; i < n_rp; ++i) + { + for (size_t j = i + 1; j < n_rp; ++j) + { + if (rp[j].lo <= rp[i].hi) + { + rp[i].hi = MAX (rp[j].hi, rp[i].hi); + memmove (rp + j, rp + j + 1, + (n_rp - j - 1) * sizeof (struct range_pair)); + --n_rp; + } + else + break; + } + } - free (rp); + /* After merging, reallocate RP so we release memory to the system. + Also add a sentinel at the end of RP, to avoid out of bounds access. */ + ++n_rp; + rp = xrealloc (rp, n_rp * sizeof (struct range_pair)); + rp[n_rp - 1].lo = rp[n_rp - 1].hi = 0; return field_found; } @@ -552,7 +468,8 @@ cut_bytes (FILE *stream) byte_idx = 0; print_delimiter = false; - while (1) + current_rp = rp; + while (true) { int c; /* Each character from the file. */ @@ -563,6 +480,7 @@ cut_bytes (FILE *stream) putchar ('\n'); byte_idx = 0; print_delimiter = false; + current_rp = rp; } else if (c == EOF) { @@ -572,9 +490,10 @@ cut_bytes (FILE *stream) } else { + next_item (&byte_idx); bool range_start; bool *rs = output_delimiter_specified ? 
&range_start : NULL; - if (print_kth (++byte_idx, rs)) + if (print_kth (byte_idx, rs)) { if (rs && *rs && print_delimiter) { @@ -598,6 +517,8 @@ cut_fields (FILE *stream) bool found_any_selected_field = false; bool buffer_first_field; + current_rp = rp; + c = getc (stream); if (c == EOF) return; @@ -663,7 +584,7 @@ cut_fields (FILE *stream) fwrite (field_1_buffer, sizeof (char), n_bytes - 1, stdout); found_any_selected_field = true; } - ++field_idx; + next_item (&field_idx); } int prev_c = c; @@ -702,10 +623,11 @@ cut_fields (FILE *stream) if (c == EOF) break; field_idx = 1; + current_rp = rp; found_any_selected_field = false; } else if (c == delim) - field_idx++; + next_item (&field_idx); } } @@ -854,16 +776,6 @@ main (int argc, char **argv) FATAL_ERROR (_("suppressing non-delimited lines makes sense\n\ \tonly when operating on fields")); - if (output_delimiter_specified) - { - range_start_ht = hash_initialize (HT_RANGE_START_INDEX_INITIAL_CAPACITY, - NULL, hash_int, - hash_compare_ints, NULL); - if (range_start_ht == NULL) - xalloc_die (); - - } - if (! 
set_fields (spec_list_string)) { if (operating_mode == field_mode) @@ -890,8 +802,6 @@ main (int argc, char **argv) for (ok = true; optind < argc; optind++) ok &= cut_file (argv[optind]); - if (range_start_ht) - hash_free (range_start_ht); if (have_read_stdin && fclose (stdin) == EOF) { diff --git a/tests/local.mk b/tests/local.mk index f47da8d..fb5cc63 100644 --- a/tests/local.mk +++ b/tests/local.mk @@ -245,7 +245,7 @@ all_tests = \ tests/misc/pwd-option.sh \ tests/misc/chcon-fail.sh \ tests/misc/cut.pl \ - tests/misc/cut-huge-to-eol-range.sh \ + tests/misc/cut-huge-range.sh \ tests/misc/wc.pl \ tests/misc/wc-files0-from.pl \ tests/misc/wc-files0.sh \ diff --git a/tests/misc/cut-huge-to-eol-range.sh b/tests/misc/cut-huge-range.sh similarity index 84% rename from tests/misc/cut-huge-to-eol-range.sh rename to tests/misc/cut-huge-range.sh index e6abe6e..8783e96 100755 --- a/tests/misc/cut-huge-to-eol-range.sh +++ b/tests/misc/cut-huge-range.sh @@ -25,6 +25,10 @@ getlimits_ # a 256MiB bit vector. With a 20MB limit on VM, the following would fail. (ulimit -v 20000; : | cut -b$INT_MAX- > err 2>&1) || fail=1 +# Up to and including coreutils-8.21, cut would allocate possibly needed +# memory upfront. Subsequently memory is allocated as required. 
+(ulimit -v 20000; : | cut -b1-$INT_MAX > err 2>&1) || fail=1 + compare /dev/null err || fail=1 Exit $fail diff --git a/tests/misc/cut.pl b/tests/misc/cut.pl index 41e9e20..1543faf 100755 --- a/tests/misc/cut.pl +++ b/tests/misc/cut.pl @@ -210,6 +210,10 @@ my @Tests = {IN=>"123456\n"}, {OUT=>"23456\n"}], ['EOL-subsumed-3', '--complement -b3,4-4,5,2-', {IN=>"123456\n"}, {OUT=>"1\n"}], + + ['EOL-subsumed-4', '--output-d=: -b1-2,2-3,3-', + {IN=>"1234\n"}, {OUT=>"1234\n"}], + ); if ($mb_locale ne 'C') -- 1.7.7.6 >From f3c065f16c3fbd9595eb667d35e070b0950f084e Mon Sep 17 00:00:00 2001 From: Cojocaru Alexandru Date: Sun, 28 Apr 2013 03:03:45 +0100 Subject: [PATCH 2/3] cut: reduce CPU overhead in determining item to output print_kth() is the central function of cut used to determine if an item is to be output or not, so simplify it by moving some logic outside. Benchmark results for this change are: $ yes abcdfeg | head -n1MB > big-file $ for c in orig split; do src/cut-$c 2>/dev/null echo -ne "\n== $c ==" time src/cut-$c -b1,3 big-file > /dev/null done == orig == real 0m0.111s user 0m0.108s sys 0m0.002s == split == real 0m0.088s user 0m0.081s sys 0m0.007s * src/cut.c (print_kth): Refactor a branch to outside the function. Related to http://bugs.gnu.org/13127 --- src/cut.c | 54 ++++++++++++++++++++--------------------- tests/misc/cut-huge-range.sh | 2 +- 2 files changed, 27 insertions(+), 29 deletions(-) diff --git a/src/cut.c b/src/cut.c index 8738c46..37615a3 100644 --- a/src/cut.c +++ b/src/cut.c @@ -233,6 +233,20 @@ With no FILE, or when FILE is -, read standard input.\n\ exit (status); } +/* Return nonzero if the K'th field or byte is printable. */ + +static bool +print_kth (size_t k) +{ + bool k_selected = false; + if (0 < eol_range_start && eol_range_start <= k) + k_selected = true; + else if (current_rp->lo <= k && k <= current_rp->hi) + k_selected = true; + + return k_selected ^ complement; +} + /* Return nonzero if K'th byte is the beginning of a range. 
*/ static inline bool @@ -248,24 +262,6 @@ is_range_start_index (size_t k) return is_start; } -/* Return nonzero if the K'th field or byte is printable. */ - -static bool -print_kth (size_t k, bool *range_start) -{ - bool k_selected = false; - if (0 < eol_range_start && eol_range_start <= k) - k_selected = true; - else if (current_rp->lo <= k && k <= current_rp->hi) - k_selected = true; - - bool is_selected = k_selected ^ complement; - if (range_start && is_selected) - *range_start = is_range_start_index (k); - - return k_selected ^ complement; -} - /* Comparison function for qsort to order the list of struct range_pairs. */ static int @@ -491,16 +487,18 @@ cut_bytes (FILE *stream) else { next_item (&byte_idx); - bool range_start; - bool *rs = output_delimiter_specified ? &range_start : NULL; - if (print_kth (byte_idx, rs)) + if (print_kth (byte_idx)) { - if (rs && *rs && print_delimiter) + if (output_delimiter_specified) { - fwrite (output_delimiter_string, sizeof (char), - output_delimiter_length, stdout); + if (print_delimiter && is_range_start_index (byte_idx)) + { + fwrite (output_delimiter_string, sizeof (char), + output_delimiter_length, stdout); + } + print_delimiter = true; } - print_delimiter = true; + putchar (c); } } @@ -532,7 +530,7 @@ cut_fields (FILE *stream) and the first field has been selected, or if non-delimited lines must be suppressed and the first field has *not* been selected. That is because a non-delimited line has exactly one field. */ - buffer_first_field = (suppress_non_delimited ^ !print_kth (1, NULL)); + buffer_first_field = (suppress_non_delimited ^ !print_kth (1)); while (1) { @@ -578,7 +576,7 @@ cut_fields (FILE *stream) } continue; } - if (print_kth (1, NULL)) + if (print_kth (1)) { /* Print the field, but not the trailing delimiter. 
*/ fwrite (field_1_buffer, sizeof (char), n_bytes - 1, stdout); @@ -589,7 +587,7 @@ cut_fields (FILE *stream) int prev_c = c; - if (print_kth (field_idx, NULL)) + if (print_kth (field_idx)) { if (found_any_selected_field) { diff --git a/tests/misc/cut-huge-range.sh b/tests/misc/cut-huge-range.sh index 8783e96..887197a 100755 --- a/tests/misc/cut-huge-range.sh +++ b/tests/misc/cut-huge-range.sh @@ -1,5 +1,5 @@ #!/bin/sh -# Ensure that cut does not allocate mem for a range like -b9999999999999- +# Ensure that cut does not allocate mem for large ranges # Copyright (C) 2012-2013 Free Software Foundation, Inc. -- 1.7.7.6 >From 2343cee3cd73d9b1654a651baa11d249cde04560 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= Date: Sun, 28 Apr 2013 23:27:12 +0100 Subject: [PATCH 3/3] cut: reduce CPU usage for the common case Ensure appropriate functions are inlined. This was seen to be required with gcc 4.6.0 with -O2 on x86_64 at least. It was reported that gcc 4.8.0 did inline these functions though. Also reinstate the bit vector for the common case, to further improve performance. Benchmark results for both aspects of this change are: $ yes abcdfeg | head -n1MB > big-file $ for c in orig inline inline-array; do src/cut-$c 2>/dev/null echo -ne "\n== $c ==" time src/cut-$c -b1,3 big-file > /dev/null done == orig == real 0m0.088s user 0m0.081s sys 0m0.007s == inline == real 0m0.070s user 0m0.060s sys 0m0.009s == inline-array == real 0m0.049s user 0m0.044s sys 0m0.005s * src/cut.c (set_fields): Set up the printable_field bit vector for performance, but only when it's appropriate. I.e. not when either --output-delimiter or huge ranges are specified. (next_item): Ensure it's inlined and avoid unnecessary processing. (print_kth): Ensure it's inlined and add a branch for the fast path.
Related to http://bugs.gnu.org/13127 --- src/cut.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 74 insertions(+), 6 deletions(-) diff --git a/src/cut.c b/src/cut.c index 37615a3..b347b30 100644 --- a/src/cut.c +++ b/src/cut.c @@ -108,11 +108,31 @@ static char *field_1_buffer; /* The number of bytes allocated for FIELD_1_BUFFER. */ static size_t field_1_bufsize; +/* The largest field or byte index used as an endpoint of a closed + or degenerate range specification; this doesn't include the starting + index of right-open-ended ranges. For example, with either range spec + '2-5,9-', '2-3,5,9-' this variable would be set to 5. */ +static size_t max_range_endpoint; /* If nonzero, this is the index of the first field in a range that goes to end of line. */ static size_t eol_range_start; +/* This is a bit vector. + In byte mode, which bytes to output. + In field mode, which DELIM-separated fields to output. + Both bytes and fields are numbered starting with 1, + so the zeroth bit of this array is unused. + A field or byte K has been selected if + (K <= MAX_RANGE_ENDPOINT && is_printable_field (K)) + || (EOL_RANGE_START > 0 && K >= EOL_RANGE_START). */ +static unsigned char *printable_field; + +/* The maximum size the printable_field array to allocate. + For ranges requiring more than this, we revert to the slightly + slower mechanism of inspecting the current range pair limits. */ +enum { PRINTABLE_ARRAY_MAX = 65536 }; + enum operating_mode { undefined_mode, @@ -233,14 +253,35 @@ With no FILE, or when FILE is -, read standard input.\n\ exit (status); } -/* Return nonzero if the K'th field or byte is printable. 
*/ +static inline void +mark_printable_field (size_t i) +{ + size_t n = i / CHAR_BIT; + printable_field[n] |= (1 << (i % CHAR_BIT)); +} -static bool +static inline bool +is_printable_field (size_t i) +{ + size_t n = i / CHAR_BIT; + return (printable_field[n] >> (i % CHAR_BIT)) & 1; +} + +/* Return nonzero if the K'th field or byte is printable. + Note this is a "hot" function. Please profile when changing. */ + +static inline bool print_kth (size_t k) { bool k_selected = false; + if (0 < eol_range_start && eol_range_start <= k) k_selected = true; + else if (printable_field) /* faster path for smaller ranges. */ + { + if (k <= max_range_endpoint && is_printable_field (k)) + k_selected = true; + } else if (current_rp->lo <= k && k <= current_rp->hi) k_selected = true; @@ -275,12 +316,14 @@ compare_ranges (const void *a, const void *b) /* Increment *ITEM_IDX (i.e. a field or byte index), and if required CURRENT_RP. */ -static void +static inline void next_item (size_t *item_idx) { (*item_idx)++; - if ((*item_idx > current_rp->hi) && (current_rp < rp + n_rp - 1)) - current_rp++; + /* avoid extra processing associated with current_rp unless needed. */ + if (!printable_field) + if ((*item_idx > current_rp->hi) && (current_rp < rp + n_rp - 1)) + current_rp++; } /* Given the list of field or byte range specifications FIELDSTR, @@ -412,6 +455,24 @@ set_fields (const char *fieldstr) FATAL_ERROR (_("invalid byte, character or field list")); } + max_range_endpoint = 0; + for (i = 0; i < n_rp; i++) + { + if (rp[i].hi > max_range_endpoint) + max_range_endpoint = rp[i].hi; + } + + /* For performance, allocate an array large enough so that it may be + indexed by the field numbers corresponding to all finite ranges + (i.e. '2-6' or '-4', but not '5-') in FIELDSTR. + Note this enhancement is not possible with very large ranges, + or when --output-delimiter is specified. 
*/ + + if (!output_delimiter_specified + && max_range_endpoint + && max_range_endpoint / CHAR_BIT < PRINTABLE_ARRAY_MAX) + printable_field = xzalloc (max_range_endpoint / CHAR_BIT + 1); + qsort (rp, n_rp, sizeof (rp[0]), compare_ranges); /* Omit finite ranges subsumed by a to-EOL range. */ @@ -426,7 +487,8 @@ set_fields (const char *fieldstr) } } - /* Merge finite range pairs (e.g. `2-5,3-4' becomes `2-5'). */ + /* Merge finite range pairs (e.g. `2-5,3-4' becomes `2-5'). + Also for small enough ranges, mark items as printable. */ for (i = 0; i < n_rp; ++i) { for (size_t j = i + 1; j < n_rp; ++j) @@ -441,6 +503,12 @@ set_fields (const char *fieldstr) else break; } + + if (printable_field) + { + for (size_t k = rp[i].lo; k <= rp[i].hi; k++) + mark_printable_field (k); + } } /* After merging, reallocate RP so we release memory to the system. -- 1.7.7.6 --------------080609000706090507050402-- From debbugs-submit-bounces@debbugs.gnu.org Mon May 06 14:55:09 2013 Received: (at 13127) by debbugs.gnu.org; 6 May 2013 18:55:09 +0000 Received: from localhost ([127.0.0.1]:60148 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZQZ3-00055s-Dt for submit@debbugs.gnu.org; Mon, 06 May 2013 14:55:09 -0400 Received: from mout.gmx.net ([212.227.15.19]:59102) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZQZ1-00055e-1w for 13127@debbugs.gnu.org; Mon, 06 May 2013 14:55:08 -0400 Received: from COMPUTER-1 ([151.65.151.251]) by mail.gmx.com (mrgmx103) with ESMTPSA (Nemesis) id 0M6zvN-1UKcvI0eoP-00woq6; Mon, 06 May 2013 20:54:06 +0200 Date: Mon, 6 May 2013 20:54:01 +0200 From: Cojocaru Alexandru To: =?ISO-8859-1?Q?P=E1draig?= Brady Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre Message-Id: <20130506205401.dbffe145eef3268152b9a4a5@gmx.com> In-Reply-To: <517DB7D8.8080900@draigBrady.com> References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> 
<20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> <517D2C6F.8040006@draigBrady.com> <20130428201433.1720fe6d54581458391fc6fe@gmx.com> <517D825E.3040308@draigBrady.com> <517DB7D8.8080900@draigBrady.com> X-Mailer: Sylpheed 3.3.0 (GTK+ 2.10.14; i686-pc-mingw32) Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="Multipart=_Mon__6_May_2013_20_54_01_+0200_61H6/SFr=vfjuiZQ" X-Provags-ID: V03:K0:yBBU6F93msU/qNlFSfQHyKTWTQf555ah6h+jss6pqaxiNMcmgO9 w1vl/uDnRvHFCSqZMmRU8kmF/2BcEB7TYbkrRPwk7f5q7VwpgMUDdwBDL0eFkY/JmR/ZJEV hNcx9EynqdE187FY3/QMCEEHO3vQoWILQNZO9YJ+ENI3dHZknsH1xIinKbsfVLBUAOidEov GgK6WIgI+R1VCtlZxi/rw== X-Spam-Score: 0.8 (/) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) This is a multi-part message in MIME format. --Multipart=_Mon__6_May_2013_20_54_01_+0200_61H6/SFr=vfjuiZQ Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable

On Mon, 29 Apr 2013 00:59:20 +0100
Pádraig Brady wrote:

> So I reinstated the bit vector which was a little tricky
> to do while maintaining performance, but it works very well.
I think it works because we are avoiding a memory access
inside `next_item' this way.

With this patch I try to keep the CPU benefits for `--output-d'
and when large ranges are specified, even without the bitarray.

Because of the sentinel, the maximum supported line length is now
`(size_t)-1 - 1' rather than `(size_t)-1'. Is this an issue?

PS: This patch also fixes a small bug inside `set_fields'.
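
To illustrate the sentinel idea, here is a minimal standalone sketch. It is
not the attached patch: `select_bytes' and its parameters are hypothetical
names used only for this example, and the range list is assumed to be
already sorted, merged, and terminated by a sentinel whose `lo' and `hi'
are both `(size_t)-1'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct range_pair { size_t lo, hi; };

/* Copy the selected bytes of LINE into OUT, which must hold at least
   strlen (LINE) + 1 bytes.  RP must be sorted, merged, and end with a
   sentinel whose lo and hi are both SIZE_MAX (an assumption of this
   sketch).  */
static void
select_bytes (const struct range_pair *rp, const char *line, char *out)
{
  assert (rp && line && out);
  const struct range_pair *current = rp;
  size_t idx = 0;   /* 1-based index of the byte being examined.  */

  for (const char *p = line; *p; p++)
    {
      /* Advance the index, and step to the next range once the current
         one is behind us.  No bounds check is needed here: idx can
         never exceed the sentinel's hi, so we never walk past it.  */
      idx++;
      if (idx > current->hi)
        current++;

      /* Every earlier range has already been passed, so a single
         comparison against lo decides whether the byte is selected.  */
      if (current->lo <= idx)
        *out++ = *p;
    }
  *out = '\0';
}
```

For example, with the ranges 2-3 and 6-7 followed by the sentinel, the input
"abcdefgh" yields "bcfg". The cost of the sentinel is that index
`(size_t)-1' itself can no longer be told apart from a selected item, which
is why the usable maximum drops by one.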
Best regards, Cojocaru Alexandru --Multipart=_Mon__6_May_2013_20_54_01_+0200_61H6/SFr=vfjuiZQ Content-Type: application/octet-stream; name="cut.patch" Content-Disposition: attachment; filename="cut.patch" Content-Transfer-Encoding: base64 RnJvbSA2ZGRlZTlkMWVmMjdlNDM1NDZiMjgyNDdiMjVmOTAxN2U2ZWRlNjk1IE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBDb2pvY2FydSBBbGV4YW5kcnUgPHhvam9jQGdteC5jb20+CkRh dGU6IFNhdCwgNCBNYXkgMjAxMyAyMTo1MTo1OSArMDAwMApTdWJqZWN0OiBbUEFUQ0hdIGN1dDog a2VlcCBiZW5lZml0cyBvZiBiaXRhcnJheSBldmVuIHdpdGggYC0tb3V0cHV0LWRlbGltaXRlcicK CmVvbF9yYW5nZV9zdGFydDogcmVtb3ZlZCwgYG4tJyBpcyBubyBtb3JlIHRyZWF0ZWQgc3BlY2lh bGx5CmNvbXBsZW1lbnRfcnA6IHVzZWQgdG8gY29tcGxlbWVudCBgcnAnIHdoZW4gYC0tY29tcGxl bWVudCcgaXMgc3BlY2lmaWVkCkFERF9SQU5HRV9QQUlSOiBtYWNybyByZW5hbWVkIHRvIGBhZGRf cmFuZ2VfcGFpcicKYWRkX3JhbmdlX3BhaXI6IGZ1bmN0aW9uIHJlbmFtZWQgZnJvbSBgQUREX1JB TkdFX1BBSVInCnNldF9maWVsZHM6IGZpeCBtZXJnaW5nLCBgaicgd2Fzbid0IGRlY3JlbWVudGVk IGFzIG5lZWRlZAotLS0KIHNyYy9jdXQuYyB8IDI0MiArKysrKysrKysrKysrKysrKysrKysrLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQogMSBmaWxlIGNoYW5nZWQsIDg1 IGluc2VydGlvbnMoKyksIDE1NyBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zcmMvY3V0LmMg Yi9zcmMvY3V0LmMKaW5kZXggYjM0N2IzMC4uY2U1Yjc0MiAxMDA2NDQKLS0tIGEvc3JjL2N1dC5j CisrKyBiL3NyYy9jdXQuYwpAQCAtODAsMjEgKzgwLDE1IEBAIHN0YXRpYyBzaXplX3Qgbl9ycF9h bGxvY2F0ZWQ7CiAgICBzcGFjZSBpZiBuZWNlc3NhcnkuICBVcGRhdGUgZ2xvYmFsIHZhcmlhYmxl IE5fUlAuICBXaGVuIGFsbG9jYXRpbmcsCiAgICB1cGRhdGUgZ2xvYmFsIHZhcmlhYmxlIE5fUlBf QUxMT0NBVEVELiAgKi8KIAotI2RlZmluZSBBRERfUkFOR0VfUEFJUihycCwgbG93LCBoaWdoKQkJ CVwKLSAgZG8JCQkJCQkJXAotICAgIHsJCQkJCQkJXAotICAgICAgaWYgKGxvdyA9PSAwIHx8IGhp Z2ggPT0gMCkJCQlcCi0gICAgICAgIEZBVEFMX0VSUk9SIChfKCJmaWVsZHMgYW5kIHBvc2l0aW9u cyBhcmUgbnVtYmVyZWQgZnJvbSAxIikpOyBcCi0gICAgICBpZiAobl9ycCA+PSBuX3JwX2FsbG9j YXRlZCkJCQlcCi0gICAgICAgIHsJCQkJCQlcCi0gICAgICAgICAgKHJwKSA9IFgyTlJFQUxMT0Mg KHJwLCAmbl9ycF9hbGxvY2F0ZWQpOwlcCi0gICAgICAgIH0JCQkJCQlcCi0gICAgICBycFtuX3Jw 
XS5sbyA9IChsb3cpOwkJCQlcCi0gICAgICBycFtuX3JwXS5oaSA9IChoaWdoKTsJCQkJXAotICAg ICAgKytuX3JwOwkJCQkJCVwKLSAgICB9CQkJCQkJCVwKLSAgd2hpbGUgKDApCi0KK3N0YXRpYyB2 b2lkCithZGRfcmFuZ2VfcGFpciAoc2l6ZV90IGxvLCBzaXplX3QgaGkpCit7CisgIGlmIChuX3Jw ID09IG5fcnBfYWxsb2NhdGVkKQorICAgIHJwID0gWDJOUkVBTExPQyAocnAsICZuX3JwX2FsbG9j YXRlZCk7CisgIHJwW25fcnBdLmxvID0gbG87CisgIHJwW25fcnBdLmhpID0gaGk7CisgICsrbl9y cDsKK30KIAogLyogVGhpcyBidWZmZXIgaXMgdXNlZCB0byBzdXBwb3J0IHRoZSBzZW1hbnRpY3Mg b2YgdGhlIC1zIG9wdGlvbgogICAgKG9yIGxhY2sgb2Ygc2FtZSkgd2hlbiB0aGUgc3BlY2lmaWVk IGZpZWxkIGxpc3QgaW5jbHVkZXMgKGRvZXMKQEAgLTEwOCwzMSArMTAyLDYgQEAgc3RhdGljIGNo YXIgKmZpZWxkXzFfYnVmZmVyOwogLyogVGhlIG51bWJlciBvZiBieXRlcyBhbGxvY2F0ZWQgZm9y IEZJRUxEXzFfQlVGRkVSLiAgKi8KIHN0YXRpYyBzaXplX3QgZmllbGRfMV9idWZzaXplOwogCi0v KiBUaGUgbGFyZ2VzdCBmaWVsZCBvciBieXRlIGluZGV4IHVzZWQgYXMgYW4gZW5kcG9pbnQgb2Yg YSBjbG9zZWQKLSAgIG9yIGRlZ2VuZXJhdGUgcmFuZ2Ugc3BlY2lmaWNhdGlvbjsgIHRoaXMgZG9l c24ndCBpbmNsdWRlIHRoZSBzdGFydGluZwotICAgaW5kZXggb2YgcmlnaHQtb3Blbi1lbmRlZCBy YW5nZXMuICBGb3IgZXhhbXBsZSwgd2l0aCBlaXRoZXIgcmFuZ2Ugc3BlYwotICAgJzItNSw5LScs ICcyLTMsNSw5LScgdGhpcyB2YXJpYWJsZSB3b3VsZCBiZSBzZXQgdG8gNS4gICovCi1zdGF0aWMg c2l6ZV90IG1heF9yYW5nZV9lbmRwb2ludDsKLQotLyogSWYgbm9uemVybywgdGhpcyBpcyB0aGUg aW5kZXggb2YgdGhlIGZpcnN0IGZpZWxkIGluIGEgcmFuZ2UgdGhhdCBnb2VzCi0gICB0byBlbmQg b2YgbGluZS4gKi8KLXN0YXRpYyBzaXplX3QgZW9sX3JhbmdlX3N0YXJ0OwotCi0vKiBUaGlzIGlz IGEgYml0IHZlY3Rvci4KLSAgIEluIGJ5dGUgbW9kZSwgd2hpY2ggYnl0ZXMgdG8gb3V0cHV0Lgot ICAgSW4gZmllbGQgbW9kZSwgd2hpY2ggREVMSU0tc2VwYXJhdGVkIGZpZWxkcyB0byBvdXRwdXQu Ci0gICBCb3RoIGJ5dGVzIGFuZCBmaWVsZHMgYXJlIG51bWJlcmVkIHN0YXJ0aW5nIHdpdGggMSwK LSAgIHNvIHRoZSB6ZXJvdGggYml0IG9mIHRoaXMgYXJyYXkgaXMgdW51c2VkLgotICAgQSBmaWVs ZCBvciBieXRlIEsgaGFzIGJlZW4gc2VsZWN0ZWQgaWYKLSAgIChLIDw9IE1BWF9SQU5HRV9FTkRQ T0lOVCAmJiBpc19wcmludGFibGVfZmllbGQgKEspKQotICAgIHx8IChFT0xfUkFOR0VfU1RBUlQg PiAwICYmIEsgPj0gRU9MX1JBTkdFX1NUQVJUKS4gICovCi1zdGF0aWMgdW5zaWduZWQgY2hhciAq 
cHJpbnRhYmxlX2ZpZWxkOwotCi0vKiBUaGUgbWF4aW11bSBzaXplIHRoZSBwcmludGFibGVfZmll bGQgYXJyYXkgdG8gYWxsb2NhdGUuCi0gICBGb3IgcmFuZ2VzIHJlcXVpcmluZyBtb3JlIHRoYW4g dGhpcywgd2UgcmV2ZXJ0IHRvIHRoZSBzbGlnaHRseQotICAgc2xvd2VyIG1lY2hhbmlzbSBvZiBp bnNwZWN0aW5nIHRoZSBjdXJyZW50IHJhbmdlIHBhaXIgbGltaXRzLiAgKi8KLWVudW0geyBQUklO VEFCTEVfQVJSQVlfTUFYID0gNjU1MzYgfTsKLQogZW51bSBvcGVyYXRpbmdfbW9kZQogICB7CiAg ICAgdW5kZWZpbmVkX21vZGUsCkBAIC0xNTEsNyArMTIwLDcgQEAgc3RhdGljIGVudW0gb3BlcmF0 aW5nX21vZGUgb3BlcmF0aW5nX21vZGU7CiAgICB3aXRoIGZpZWxkIG1vZGUuICAqLwogc3RhdGlj IGJvb2wgc3VwcHJlc3Nfbm9uX2RlbGltaXRlZDsKIAotLyogSWYgbm9uemVybywgcHJpbnQgYWxs IGJ5dGVzLCBjaGFyYWN0ZXJzLCBvciBmaWVsZHMgX2V4Y2VwdF8KKy8qIElmIHRydWUsIHByaW50 IGFsbCBieXRlcywgY2hhcmFjdGVycywgb3IgZmllbGRzIF9leGNlcHRfCiAgICB0aG9zZSB0aGF0 IHdlcmUgc3BlY2lmaWVkLiAgKi8KIHN0YXRpYyBib29sIGNvbXBsZW1lbnQ7CiAKQEAgLTI1Myw1 NiArMjIyLDYgQEAgV2l0aCBubyBGSUxFLCBvciB3aGVuIEZJTEUgaXMgLSwgcmVhZCBzdGFuZGFy ZCBpbnB1dC5cblwKICAgZXhpdCAoc3RhdHVzKTsKIH0KIAotc3RhdGljIGlubGluZSB2b2lkCi1t YXJrX3ByaW50YWJsZV9maWVsZCAoc2l6ZV90IGkpCi17Ci0gIHNpemVfdCBuID0gaSAvIENIQVJf QklUOwotICBwcmludGFibGVfZmllbGRbbl0gfD0gKDEgPDwgKGkgJSBDSEFSX0JJVCkpOwotfQot Ci1zdGF0aWMgaW5saW5lIGJvb2wKLWlzX3ByaW50YWJsZV9maWVsZCAoc2l6ZV90IGkpCi17Ci0g IHNpemVfdCBuID0gaSAvIENIQVJfQklUOwotICByZXR1cm4gKHByaW50YWJsZV9maWVsZFtuXSA+ PiAoaSAlIENIQVJfQklUKSkgJiAxOwotfQotCi0vKiBSZXR1cm4gbm9uemVybyBpZiB0aGUgSyd0 aCBmaWVsZCBvciBieXRlIGlzIHByaW50YWJsZS4KLSAgIE5vdGUgdGhpcyBpcyBhICJob3QiIGZ1 bmN0aW9uLiAgUGxlYXNlIHByb2ZpbGUgd2hlbiBjaGFuZ2luZy4gICovCi0KLXN0YXRpYyBpbmxp bmUgYm9vbAotcHJpbnRfa3RoIChzaXplX3QgaykKLXsKLSAgYm9vbCBrX3NlbGVjdGVkID0gZmFs c2U7Ci0KLSAgaWYgKDAgPCBlb2xfcmFuZ2Vfc3RhcnQgJiYgZW9sX3JhbmdlX3N0YXJ0IDw9IGsp Ci0gICAga19zZWxlY3RlZCA9IHRydWU7Ci0gIGVsc2UgaWYgKHByaW50YWJsZV9maWVsZCkgLyog ZmFzdGVyIHBhdGggZm9yIHNtYWxsZXIgcmFuZ2VzLiAgKi8KLSAgICB7Ci0gICAgICBpZiAoayA8 PSBtYXhfcmFuZ2VfZW5kcG9pbnQgJiYgaXNfcHJpbnRhYmxlX2ZpZWxkIChrKSkKLSAgICAgICAg 
a19zZWxlY3RlZCA9IHRydWU7Ci0gICAgfQotICBlbHNlIGlmIChjdXJyZW50X3JwLT5sbyA8PSBr ICYmIGsgPD0gY3VycmVudF9ycC0+aGkpCi0gICAga19zZWxlY3RlZCA9IHRydWU7Ci0KLSAgcmV0 dXJuIGtfc2VsZWN0ZWQgXiBjb21wbGVtZW50OwotfQotCi0vKiBSZXR1cm4gbm9uemVybyBpZiBL J3RoIGJ5dGUgaXMgdGhlIGJlZ2lubmluZyBvZiBhIHJhbmdlLiAqLwotCi1zdGF0aWMgaW5saW5l IGJvb2wKLWlzX3JhbmdlX3N0YXJ0X2luZGV4IChzaXplX3QgaykKLXsKLSAgYm9vbCBpc19zdGFy dCA9IGZhbHNlOwotCi0gIGlmICghY29tcGxlbWVudCkKLSAgICBpc19zdGFydCA9IChrID09IGVv bF9yYW5nZV9zdGFydCB8fCBrID09IGN1cnJlbnRfcnAtPmxvKTsKLSAgZWxzZQotICAgIGlzX3N0 YXJ0ID0gKGsgPT0gKGN1cnJlbnRfcnAgLSAxKS0+aGkgKyAxKTsKLQotICByZXR1cm4gaXNfc3Rh cnQ7Ci19Ci0KIC8qIENvbXBhcmlzb24gZnVuY3Rpb24gZm9yIHFzb3J0IHRvIG9yZGVyIHRoZSBs aXN0IG9mCiAgICBzdHJ1Y3QgcmFuZ2VfcGFpcnMuICAqLwogc3RhdGljIGludApAQCAtMzEzLDIy ICsyMzIsMzcgQEAgY29tcGFyZV9yYW5nZXMgKGNvbnN0IHZvaWQgKmEsIGNvbnN0IHZvaWQgKmIp CiAgIHJldHVybiBhX3N0YXJ0IDwgYl9zdGFydCA/IC0xIDogYV9zdGFydCA+IGJfc3RhcnQ7CiB9 CiAKLS8qIEluY3JlbWVudCAqSVRFTV9JRFggKGkuZS4gYSBmaWVsZCBvciBieXRlIGluZGV4KSwK LSAgIGFuZCBpZiByZXF1aXJlZCBDVVJSRU5UX1JQLiAgKi8KLQotc3RhdGljIGlubGluZSB2b2lk Ci1uZXh0X2l0ZW0gKHNpemVfdCAqaXRlbV9pZHgpCitzdGF0aWMgdm9pZAorY29tcGxlbWVudF9y cCAodm9pZCkKIHsKLSAgKCppdGVtX2lkeCkrKzsKLSAgLyogYXZvaWQgZXh0cmEgcHJvY2Vzc2lu ZyBhc3NvY2lhdGVkIHdpdGggY3VycmVudF9ycCB1bmxlc3MgbmVlZGVkLiAgKi8KLSAgaWYgKCFw cmludGFibGVfZmllbGQpCi0gICAgaWYgKCgqaXRlbV9pZHggPiBjdXJyZW50X3JwLT5oaSkgJiYg KGN1cnJlbnRfcnAgPCBycCArIG5fcnAgLSAxKSkKLSAgICAgIGN1cnJlbnRfcnArKzsKLX0KKyAg aWYgKGNvbXBsZW1lbnQpCisgICAgeworICAgICAgc3RydWN0IHJhbmdlX3BhaXIgKmMgPSBycDsK KyAgICAgIHNpemVfdCBuID0gbl9ycDsKKyAgICAgIHNpemVfdCBpOworCisgICAgICBycCA9IE5V TEw7CisgICAgICBuX3JwID0gMDsKKyAgICAgIG5fcnBfYWxsb2NhdGVkID0gMDsKKworICAgICAg aWYgKGNbMF0ubG8gPiAxKQorCWFkZF9yYW5nZV9wYWlyICgxLCBjWzBdLmxvIC0gMSk7CisgICAg ICBmb3IgKGkgPSAxOyBpIDwgbjsgKytpKQorCXsKKwkgIGlmIChjW2ktMV0uaGkgKyAxID09IGNb aV0ubG8pCisJICAgIGNvbnRpbnVlOwogCisJICBhZGRfcmFuZ2VfcGFpciAoY1tpLTFdLmhpICsg 
MSwgY1tpXS5sbyAtIDEpOworCX0KKworICAgICAgaWYgKGNbbi0xXS5oaSA8IChzaXplX3QpLTEp CisJYWRkX3JhbmdlX3BhaXIgKGNbbi0xXS5oaSArIDEsIChzaXplX3QpLTEpOworCisgICAgICBm cmVlIChjKTsKKyAgICB9Cit9CiAvKiBHaXZlbiB0aGUgbGlzdCBvZiBmaWVsZCBvciBieXRlIHJh bmdlIHNwZWNpZmljYXRpb25zIEZJRUxEU1RSLAotICAgYWxsb2NhdGUgYW5kIGluaXRpYWxpemUg dGhlIFJQIGFycmF5LiAgSWYgdGhlcmUgaXMgYSByaWdodC1vcGVuLWVuZGVkCi0gICByYW5nZSwg c2V0IEVPTF9SQU5HRV9TVEFSVCB0byBpdHMgc3RhcnRpbmcgaW5kZXguIEZJRUxEU1RSIHNob3Vs ZAorICAgYWxsb2NhdGUgYW5kIGluaXRpYWxpemUgdGhlIFJQIGFycmF5LiBGSUVMRFNUUiBzaG91 bGQKICAgIGJlIGNvbXBvc2VkIG9mIG9uZSBvciBtb3JlIG51bWJlcnMgb3IgcmFuZ2VzIG9mIG51 bWJlcnMsIHNlcGFyYXRlZAogICAgYnkgYmxhbmtzIG9yIGNvbW1hcy4gIEluY29tcGxldGUgcmFu Z2VzIG1heSBiZSBnaXZlbjogJy1tJyBtZWFucyAnMS1tJzsKICAgICduLScgbWVhbnMgJ24nIHRo cm91Z2ggZW5kIG9mIGxpbmUuCkBAIC0zNDIsMTUgKzI3NiwxNCBAQCBzZXRfZmllbGRzIChjb25z dCBjaGFyICpmaWVsZHN0cikKICAgc2l6ZV90IHZhbHVlID0gMDsJCS8qIElmIG5vbnplcm8sIGEg bnVtYmVyIGJlaW5nIGFjY3VtdWxhdGVkLiAgKi8KICAgYm9vbCBsaHNfc3BlY2lmaWVkID0gZmFs c2U7CiAgIGJvb2wgcmhzX3NwZWNpZmllZCA9IGZhbHNlOwotICBib29sIGRhc2hfZm91bmQgPSBm YWxzZTsJLyogVHJ1ZSBpZiBhICctJyBpcyBmb3VuZCBpbiB0aGlzIGZpZWxkLiAgKi8KKyAgYm9v bCBkYXNoX2ZvdW5kID0gZmFsc2U7CS8qIFRydWUgaWYgYSBgLScgaXMgZm91bmQgaW4gdGhpcyBm aWVsZC4gICovCiAgIGJvb2wgZmllbGRfZm91bmQgPSBmYWxzZTsJLyogVHJ1ZSBpZiBhdCBsZWFz dCBvbmUgZmllbGQgc3BlYwogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBoYXMg YmVlbiBwcm9jZXNzZWQuICAqLwogCiAgIHNpemVfdCBpOwogICBib29sIGluX2RpZ2l0cyA9IGZh bHNlOwogCi0gIC8qIENvbGxlY3QgYW5kIHN0b3JlIGluIFJQIHRoZSByYW5nZSBlbmQgcG9pbnRz LgotICAgICBJdCBhbHNvIHNldHMgRU9MX1JBTkdFX1NUQVJUIGlmIGFwcHJvcHJpYXRlLiAgKi8K KyAgLyogQ29sbGVjdCBhbmQgc3RvcmUgaW4gUlAgdGhlIHJhbmdlIGVuZCBwb2ludHMuICovCiAK ICAgd2hpbGUgKHRydWUpCiAgICAgewpAQCAtMzgyLDIyICszMTUsMjAgQEAgc2V0X2ZpZWxkcyAo Y29uc3QgY2hhciAqZmllbGRzdHIpCiAgICAgICAgICAgICAgICAgRkFUQUxfRVJST1IgKF8oImlu dmFsaWQgcmFuZ2Ugd2l0aCBubyBlbmRwb2ludDogLSIpKTsKIAogICAgICAgICAgICAgICAvKiBB 
IHJhbmdlLiAgUG9zc2liaWxpdGllczogLW4sIG0tbiwgbi0uCi0gICAgICAgICAgICAgICAgIElu IGFueSBjYXNlLCAnaW5pdGlhbCcgY29udGFpbnMgdGhlIHN0YXJ0IG9mIHRoZSByYW5nZS4gKi8K KyAgICAgICAgICAgICAgICAgSW4gYW55IGNhc2UsIGBpbml0aWFsJyBjb250YWlucyB0aGUgc3Rh cnQgb2YgdGhlIHJhbmdlLiAqLwogICAgICAgICAgICAgICBpZiAoIXJoc19zcGVjaWZpZWQpCiAg ICAgICAgICAgICAgICAgewotICAgICAgICAgICAgICAgICAgLyogJ24tJy4gIEZyb20gJ2luaXRp YWwnIHRvIGVuZCBvZiBsaW5lLiAgSWYgd2UndmUgYWxyZWFkeQotICAgICAgICAgICAgICAgICAg ICAgc2VlbiBhbiBNLSByYW5nZSwgaWdub3JlIHN1YnNlcXVlbnQgTi0gdW5sZXNzIE4gPCBNLiAg Ki8KLSAgICAgICAgICAgICAgICAgIGlmIChlb2xfcmFuZ2Vfc3RhcnQgPT0gMCB8fCBpbml0aWFs IDwgZW9sX3JhbmdlX3N0YXJ0KQotICAgICAgICAgICAgICAgICAgICBlb2xfcmFuZ2Vfc3RhcnQg PSBpbml0aWFsOworICAgICAgICAgICAgICAgICAgLyogYG4tJy4gIEZyb20gYGluaXRpYWwnIHRv IGVuZCBvZiBsaW5lLiAqLworCQkgIGFkZF9yYW5nZV9wYWlyIChpbml0aWFsLCAoc2l6ZV90KS0x KTsKICAgICAgICAgICAgICAgICAgIGZpZWxkX2ZvdW5kID0gdHJ1ZTsKICAgICAgICAgICAgICAg ICB9CiAgICAgICAgICAgICAgIGVsc2UKICAgICAgICAgICAgICAgICB7Ci0gICAgICAgICAgICAg ICAgICAvKiAnbS1uJyBvciAnLW4nICgxLW4pLiAqLworICAgICAgICAgICAgICAgICAgLyogYG0t bicgb3IgYC1uJyAoMS1uKS4gKi8KICAgICAgICAgICAgICAgICAgIGlmICh2YWx1ZSA8IGluaXRp YWwpCiAgICAgICAgICAgICAgICAgICAgIEZBVEFMX0VSUk9SIChfKCJpbnZhbGlkIGRlY3JlYXNp bmcgcmFuZ2UiKSk7CiAKLSAgICAgICAgICAgICAgICAgIEFERF9SQU5HRV9QQUlSIChycCwgaW5p dGlhbCwgdmFsdWUpOworICAgICAgICAgICAgICAgICAgYWRkX3JhbmdlX3BhaXIgKGluaXRpYWws IHZhbHVlKTsKICAgICAgICAgICAgICAgICAgIGZpZWxkX2ZvdW5kID0gdHJ1ZTsKICAgICAgICAg ICAgICAgICB9CiAgICAgICAgICAgICAgIHZhbHVlID0gMDsKQEAgLTQwNSw3ICszMzYsOSBAQCBz ZXRfZmllbGRzIChjb25zdCBjaGFyICpmaWVsZHN0cikKICAgICAgICAgICBlbHNlCiAgICAgICAg ICAgICB7CiAgICAgICAgICAgICAgIC8qIEEgc2ltcGxlIGZpZWxkIG51bWJlciwgbm90IGEgcmFu Z2UuICovCi0gICAgICAgICAgICAgIEFERF9SQU5HRV9QQUlSIChycCwgdmFsdWUsIHZhbHVlKTsK KwkgICAgICBpZiAodmFsdWUgPT0gMCkKKwkJRkFUQUxfRVJST1IgKF8oImZpZWxkcyBhbmQgcG9z aXRpb25zIGFyZSBudW1iZXJlZCBmcm9tIDEiKSk7CisgICAgICAgICAgICAgIGFkZF9yYW5nZV9w 
YWlyICh2YWx1ZSwgdmFsdWUpOwogICAgICAgICAgICAgICB2YWx1ZSA9IDA7CiAgICAgICAgICAg ICAgIGZpZWxkX2ZvdW5kID0gdHJ1ZTsKICAgICAgICAgICAgIH0KQEAgLTQ1NSw3MSArMzg4LDY2 IEBAIHNldF9maWVsZHMgKGNvbnN0IGNoYXIgKmZpZWxkc3RyKQogICAgICAgICBGQVRBTF9FUlJP UiAoXygiaW52YWxpZCBieXRlLCBjaGFyYWN0ZXIgb3IgZmllbGQgbGlzdCIpKTsKICAgICB9CiAK LSAgbWF4X3JhbmdlX2VuZHBvaW50ID0gMDsKLSAgZm9yIChpID0gMDsgaSA8IG5fcnA7IGkrKykK LSAgICB7Ci0gICAgICBpZiAocnBbaV0uaGkgPiBtYXhfcmFuZ2VfZW5kcG9pbnQpCi0gICAgICAg IG1heF9yYW5nZV9lbmRwb2ludCA9IHJwW2ldLmhpOwotICAgIH0KLQotICAvKiBGb3IgcGVyZm9y bWFuY2UsIGFsbG9jYXRlIGFuIGFycmF5IGxhcmdlIGVub3VnaCBzbyB0aGF0IGl0IG1heSBiZQot ICAgICBpbmRleGVkIGJ5IHRoZSBmaWVsZCBudW1iZXJzIGNvcnJlc3BvbmRpbmcgdG8gYWxsIGZp bml0ZSByYW5nZXMKLSAgICAgKGkuZS4gJzItNicgb3IgJy00JywgYnV0IG5vdCAnNS0nKSBpbiBG SUVMRFNUUi4KLSAgICAgTm90ZSB0aGlzIGVuaGFuY2VtZW50IGlzIG5vdCBwb3NzaWJsZSB3aXRo IHZlcnkgbGFyZ2UgcmFuZ2VzLAotICAgICBvciB3aGVuIC0tb3V0cHV0LWRlbGltaXRlciBpcyBz cGVjaWZpZWQuICAqLwotCi0gIGlmICghb3V0cHV0X2RlbGltaXRlcl9zcGVjaWZpZWQKLSAgICAg ICYmIG1heF9yYW5nZV9lbmRwb2ludAotICAgICAgJiYgbWF4X3JhbmdlX2VuZHBvaW50IC8gQ0hB Ul9CSVQgPCBQUklOVEFCTEVfQVJSQVlfTUFYKQotICAgIHByaW50YWJsZV9maWVsZCA9IHh6YWxs b2MgKG1heF9yYW5nZV9lbmRwb2ludCAvIENIQVJfQklUICsgMSk7Ci0KICAgcXNvcnQgKHJwLCBu X3JwLCBzaXplb2YgKHJwWzBdKSwgY29tcGFyZV9yYW5nZXMpOwogCi0gIC8qIE9taXQgZmluaXRl IHJhbmdlcyBzdWJzdW1lZCBieSBhIHRvLUVPTCByYW5nZS4gKi8KLSAgaWYgKGVvbF9yYW5nZV9z dGFydCAmJiBuX3JwKQotICAgIHsKLSAgICAgIGkgPSBuX3JwOwotICAgICAgd2hpbGUgKGkgJiYg ZW9sX3JhbmdlX3N0YXJ0IDw9IHJwW2kgLSAxXS5oaSkKLSAgICAgICAgewotICAgICAgICAgIGVv bF9yYW5nZV9zdGFydCA9IE1JTiAocnBbaSAtIDFdLmxvLCBlb2xfcmFuZ2Vfc3RhcnQpOwotICAg ICAgICAgIC0tbl9ycDsKLSAgICAgICAgICAtLWk7Ci0gICAgICAgIH0KLSAgICB9Ci0KLSAgLyog TWVyZ2UgZmluaXRlIHJhbmdlIHBhaXJzIChlLmcuIGAyLTUsMy00JyBiZWNvbWVzIGAyLTUnKS4K LSAgICAgQWxzbyBmb3Igc21hbGwgZW5vdWdoIHJhbmdlcywgbWFyayBpdGVtcyBhcyBwcmludGFi bGUuICAqLworICAvKiBNZXJnZSByYW5nZSBwYWlycyAoZS5nLiBgMi01LDMtNCcgYmVjb21lcyBg 
Mi01JykuICovCiAgIGZvciAoaSA9IDA7IGkgPCBuX3JwOyArK2kpCiAgICAgewotICAgICAgZm9y IChzaXplX3QgaiA9IGkgKyAxOyBqIDwgbl9ycDsgKytqKQorICAgICAgc2l6ZV90IGo7CisgICAg ICBmb3IgKGogPSBpICsgMTsgaiA8IG5fcnA7ICsraikKICAgICAgICAgewogICAgICAgICAgIGlm IChycFtqXS5sbyA8PSBycFtpXS5oaSkKICAgICAgICAgICAgIHsKICAgICAgICAgICAgICAgcnBb aV0uaGkgPSBNQVggKHJwW2pdLmhpLCBycFtpXS5oaSk7CiAgICAgICAgICAgICAgIG1lbW1vdmUg KHJwICsgaiwgcnAgKyBqICsgMSwKLSAgICAgICAgICAgICAgICAgICAgICAgKG5fcnAgLSBqIC0g MSkgKiBzaXplb2YgKHN0cnVjdCByYW5nZV9wYWlyKSk7CisgICAgICAgICAgICAgICAgICAgICAg IChuX3JwIC0gaiAtIDEpICogc2l6ZW9mIChycFswXSkpOwogICAgICAgICAgICAgICAtLW5fcnA7 CisJICAgICAgLS1qOwogICAgICAgICAgICAgfQogICAgICAgICAgIGVsc2UKICAgICAgICAgICAg IGJyZWFrOwogICAgICAgICB9Ci0KLSAgICAgIGlmIChwcmludGFibGVfZmllbGQpCi0gICAgICAg IHsKLSAgICAgICAgICBmb3IgKHNpemVfdCBrID0gcnBbaV0ubG87IGsgPD0gcnBbaV0uaGk7IGsr KykKLSAgICAgICAgICAgIG1hcmtfcHJpbnRhYmxlX2ZpZWxkIChrKTsKLSAgICAgICAgfQogICAg IH0KIAorICBjb21wbGVtZW50X3JwICgpOworCiAgIC8qIEFmdGVyIG1lcmdpbmcsIHJlYWxsb2Nh dGUgUlAgc28gd2UgcmVsZWFzZSBtZW1vcnkgdG8gdGhlIHN5c3RlbS4KLSAgICAgQWxzbyBhZGQg YSBzZW50aW5lbCBhdCB0aGUgZW5kIG9mIFJQLCB0byBhdm9pZCBvdXQgb2YgYm91bmRzIGFjY2Vz cy4gICovCisgICAgIEFsc28gYWRkIGEgc2VudGluZWwgYXQgdGhlIGVuZCBvZiBSUCwgdG8gYXZv aWQgb3V0IG9mIGJvdW5kcyBhY2Nlc3MKKyAgICAgYW5kIGZvciBwZXJmb21hbmNlIHJlYXNvbnMu ICAqLwogICArK25fcnA7CiAgIHJwID0geHJlYWxsb2MgKHJwLCBuX3JwICogc2l6ZW9mIChzdHJ1 Y3QgcmFuZ2VfcGFpcikpOwotICBycFtuX3JwIC0gMV0ubG8gPSBycFtuX3JwIC0gMV0uaGkgPSAw OworICBycFtuX3JwIC0gMV0ubG8gPSBycFtuX3JwIC0gMV0uaGkgPSAoc2l6ZV90KS0xOwogCiAg IHJldHVybiBmaWVsZF9mb3VuZDsKIH0KIAorLyogSW5jcmVtZW50ICpJVEVNX0lEWCAoaS5lLiBh IGZpZWxkIG9yIGJ5dGUgaW5kZXgpLAorICAgYW5kIGlmIHJlcXVpcmVkIENVUlJFTlRfUlAuICAq LworCitzdGF0aWMgaW5saW5lIHZvaWQKK25leHRfaXRlbSAoc2l6ZV90ICppdGVtX2lkeCkKK3sK KyAgKCppdGVtX2lkeCkrKzsKKyAgaWYgKCgqaXRlbV9pZHgpID4gY3VycmVudF9ycC0+aGkpCisg ICAgY3VycmVudF9ycCsrOworfQorCisvKiBSZXR1cm4gbm9uemVybyBpZiB0aGUgSyd0aCBmaWVs 
ZCBvciBieXRlIGlzIHByaW50YWJsZS4gKi8KKworc3RhdGljIGlubGluZSBib29sCitwcmludF9r dGggKHNpemVfdCBrKQoreworICByZXR1cm4gY3VycmVudF9ycC0+bG8gPD0gazsKK30KKworLyog UmV0dXJuIG5vbnplcm8gaWYgSyd0aCBieXRlIGlzIHRoZSBiZWdpbm5pbmcgb2YgYSByYW5nZS4g Ki8KKworc3RhdGljIGlubGluZSBib29sCitpc19yYW5nZV9zdGFydF9pbmRleCAoc2l6ZV90IGsp Cit7CisgIHJldHVybiBrID09IGN1cnJlbnRfcnAtPmxvOworfQorCiAvKiBSZWFkIGZyb20gc3Ry ZWFtIFNUUkVBTSwgcHJpbnRpbmcgdG8gc3RhbmRhcmQgb3V0cHV0IGFueSBzZWxlY3RlZCBieXRl cy4gICovCiAKIHN0YXRpYyB2b2lkCi0tIAoxLjguMi4yCgo= --Multipart=_Mon__6_May_2013_20_54_01_+0200_61H6/SFr=vfjuiZQ-- From debbugs-submit-bounces@debbugs.gnu.org Tue May 07 08:11:21 2013 Received: (at 13127) by debbugs.gnu.org; 7 May 2013 12:11:22 +0000 Received: from localhost ([127.0.0.1]:60769 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZgjp-0003Aj-4o for submit@debbugs.gnu.org; Tue, 07 May 2013 08:11:21 -0400 Received: from mail3.vodafone.ie ([213.233.128.45]:28296) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZgjh-0003AT-ID for 13127@debbugs.gnu.org; Tue, 07 May 2013 08:11:20 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvYDAFfuiFFtTutr/2dsb2JhbAANQ4M+iRq1awMBgR6DEwEBAQR5EAsNAQMDAQIBCRYPCQMCAQIBPQgGDQEFAgEBh3YDrzODM45QjFuBd04RBwmDSgOPdo1yjgg Received: from unknown (HELO [192.168.1.79]) ([109.78.235.107]) by mail3.vodafone.ie with ESMTP; 07 May 2013 13:10:09 +0100 Message-ID: <5188EF20.3080603@draigBrady.com> Date: Tue, 07 May 2013 13:10:08 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Cojocaru Alexandru Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> 
<20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> <517D2C6F.8040006@draigBrady.com> <20130428201433.1720fe6d54581458391fc6fe@gmx.com> <517D825E.3040308@draigBrady.com> <517DB7D8.8080900@draigBrady.com> <20130506205401.dbffe145eef3268152b9a4a5@gmx.com> In-Reply-To: <20130506205401.dbffe145eef3268152b9a4a5@gmx.com> X-Enigmail-Version: 1.5.1 Content-Type: multipart/mixed; boundary="------------090702050708030003020102" X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) This is a multi-part message in MIME format. --------------090702050708030003020102 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit

On 05/06/2013 07:54 PM, Cojocaru Alexandru wrote:
> On Mon, 29 Apr 2013 00:59:20 +0100
> Pádraig Brady wrote:
>
>> So I reinstated the bit vector which was a little tricky
>> to do while maintaining performance, but it works very well.
> I think it works because we are avoiding a memory access
> inside `next_item' this way.
>
> With this patch I try to keep the CPU benefits for `--output-d'
> and when large ranges are specified, even without the bitarray.
>
> Because of the sentinel, the maximum supported line length is now
> `(size_t)-1 - 1' rather than `(size_t)-1'. Is this an issue?
>
> PS: This patch also fixes a small bug inside `set_fields'.

It's always best to have separate changes.
I've split the fix out (attached) with an associated test.

thanks,
Pádraig.
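
For readers following the thread, the merging issue can be sketched as
below. This is a simplified standalone version, not the code in src/cut.c:
`merge_ranges' is a hypothetical name, and the array is assumed to be
sorted by `lo' already. The point it shows is that after an entry is removed
with memmove, the next candidate slides into slot j, so j has to be stepped
back before the loop increments it again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct range_pair { size_t lo, hi; };

/* Merge overlapping ranges in RP, which must already be sorted by lo.
   Return the number of ranges that remain.  */
static size_t
merge_ranges (struct range_pair *rp, size_t n)
{
  assert (rp || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      for (size_t j = i + 1; j < n; j++)
        {
          /* Sorted input: the first non-overlapping range ends the scan.  */
          if (rp[j].lo > rp[i].hi)
            break;

          /* Absorb rp[j] into rp[i], then close the gap it leaves.  */
          if (rp[j].hi > rp[i].hi)
            rp[i].hi = rp[j].hi;
          memmove (rp + j, rp + j + 1, (n - j - 1) * sizeof *rp);
          n--;

          /* The element that followed rp[j] now sits at index j, so
             revisit this slot.  j >= 1 here, so this cannot wrap.  */
          j--;
        }
    }
  return n;
}
```

With the ranges from the new test (1-1000000, 2-3, 4-5, 1000001) this leaves
two ranges, 1-1000000 and 1000001-1000001. Without the `j--' the pair 4-5
would be skipped during the scan and left in the list after 1-1000000.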
--------------090702050708030003020102 Content-Type: text/x-patch; name="cut-merge-fix.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="cut-merge-fix.patch" >From b54b47f954c9b97bdb2dbbf51ead908ccb3a4f13 Mon Sep 17 00:00:00 2001 From: Cojocaru Alexandru Date: Tue, 7 May 2013 13:01:46 +0100 Subject: [PATCH] cut: fix handling of overlapping ranges This issue was introduced in commit v8.21-43-g3e466ad * src/cut.c (set_fields): Process all range pairs when merging. * tests/misc/cut-huge-range.sh: Add a test for this edge case. Also fix an issue where we could miss reported errors due to truncation of the 'err' file. --- src/cut.c | 6 +++--- tests/misc/cut-huge-range.sh | 8 +++++++- 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/src/cut.c b/src/cut.c index b347b30..9501b3a 100644 --- a/src/cut.c +++ b/src/cut.c @@ -496,9 +496,9 @@ set_fields (const char *fieldstr) if (rp[j].lo <= rp[i].hi) { rp[i].hi = MAX (rp[j].hi, rp[i].hi); - memmove (rp + j, rp + j + 1, - (n_rp - j - 1) * sizeof (struct range_pair)); - --n_rp; + memmove (rp + j, rp + j + 1, (n_rp - j - 1) * sizeof *rp); + n_rp--; + j--; } else break; diff --git a/tests/misc/cut-huge-range.sh b/tests/misc/cut-huge-range.sh index 887197a..9905cd7 100755 --- a/tests/misc/cut-huge-range.sh +++ b/tests/misc/cut-huge-range.sh @@ -27,7 +27,13 @@ getlimits_ # Up to and including coreutils-8.21, cut would allocate possibly needed # memory upfront. Subsequently memory is allocated as required. 
-(ulimit -v 20000; : | cut -b1-$INT_MAX > err 2>&1) || fail=1 +(ulimit -v 20000; : | cut -b1-$INT_MAX >> err 2>&1) || fail=1 + +# Ensure ranges are merged correctly when large range logic is in effect +echo 1 > exp +(dd bs=1MB if=/dev/zero count=1; echo '1') | +cut -b1-1000000,2-3,4-5,1000001 2>>err | tail -c2 > out || fail=1 +compare exp out || fail=1 compare /dev/null err || fail=1 -- 1.7.7.6 --------------090702050708030003020102-- From debbugs-submit-bounces@debbugs.gnu.org Tue May 07 09:53:10 2013 Received: (at 13127) by debbugs.gnu.org; 7 May 2013 13:53:10 +0000 Received: from localhost ([127.0.0.1]:60839 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZiKI-0000S0-9J for submit@debbugs.gnu.org; Tue, 07 May 2013 09:53:10 -0400 Received: from mail3.vodafone.ie ([213.233.128.45]:65451) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZiK9-0000RK-KT for 13127@debbugs.gnu.org; Tue, 07 May 2013 09:53:04 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AvYDAMoFiVFtTutr/2dsb2JhbAANQ4M+iRq1awMBgRyDEwEBAQQnUhALDQEDAwECAQkWDwkDAgECAT0IBg0BBQIBAYd2A682gzOOWYxbgR9YThEHCYNKA492g2mKCY4IgWk Received: from unknown (HELO [192.168.1.79]) ([109.78.235.107]) by mail3.vodafone.ie with ESMTP; 07 May 2013 14:51:52 +0100 Message-ID: <518906F8.4050906@draigBrady.com> Date: Tue, 07 May 2013 14:51:52 +0100 From: =?ISO-8859-1?Q?P=E1draig_Brady?= User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Cojocaru Alexandru Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> <517D2C6F.8040006@draigBrady.com> <20130428201433.1720fe6d54581458391fc6fe@gmx.com> <517D825E.3040308@draigBrady.com> 
<517DB7D8.8080900@draigBrady.com> <20130506205401.dbffe145eef3268152b9a4a5@gmx.com> <5188EF20.3080603@draigBrady.com> In-Reply-To: <5188EF20.3080603@draigBrady.com> X-Enigmail-Version: 1.5.1 Content-Type: multipart/mixed; boundary="------------080405060802010200050005" X-Spam-Score: -1.9 (-) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) This is a multi-part message in MIME format. --------------080405060802010200050005 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit On 05/07/2013 01:10 PM, Pádraig Brady wrote: > On 05/06/2013 07:54 PM, Cojocaru Alexandru wrote: >> On Mon, 29 Apr 2013 00:59:20 +0100 >> Pádraig Brady wrote: >> >>> So I reinstated the bit vector which was a little tricky >>> to do while maintaining performance, but it works very well. >> I think it works because we are avoiding a memory access >> inside `next_item' this way. >> >> With this patch I try to keep the CPU benefits for `--output-d' >> and when large ranges are specified, even without the bitarray. >> >> Because of the sentinel now the max line len supported will be >> `(size_t)-1 - 1' and no more `(size_t)-1'. Is this an issue? Not a practical one. We could bump the types/limits in the range pairs up to uintmax_t since we're now not allocating a lot of corresponding memory. Note I added a specific check to make it explicit that -b$SIZE_MAX is not supported if specified. I'll do that in a subsequent patch, but it's not a practical issue for now, as we still allocate mem for the whole line. The new patch performs well! I'll apply the attached in a little while. thanks! Pádraig. 
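To make the sentinel idea concrete, here is a small standalone sketch of how the per-byte test can reduce to a single comparison. It is an illustration under stated assumptions, not the code in the attached patch: select_bytes is a hypothetical function that folds the roles of next_item and print_kth into one loop, and the RP array is assumed to be sorted, merged, numbered from 1, and terminated by a {SIZE_MAX, SIZE_MAX} entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the range pair struct used in src/cut.c.  */
struct range_pair
{
  size_t lo;
  size_t hi;
};

/* Hypothetical helper: copy the selected bytes of LINE (LEN bytes,
   with LEN < SIZE_MAX) to OUT and return how many were copied.
   RP must end with a {SIZE_MAX, SIZE_MAX} sentinel.  */
size_t
select_bytes (const struct range_pair *rp, const char *line,
              size_t len, char *out)
{
  const struct range_pair *current_rp = rp;
  size_t n_out = 0;

  assert (rp != NULL);

  for (size_t k = 1; k <= len; k++)
    {
      /* Role of next_item: once K has passed the current pair, move
         on.  K never exceeds the sentinel's 'hi', so the pointer
         cannot run off the end of the array.  */
      if (k > current_rp->hi)
        current_rp++;

      /* Role of print_kth: K <= hi already holds here, so testing
         'lo' alone decides whether byte K is selected.  */
      if (current_rp->lo <= k)
        out[n_out++] = line[k - 1];
    }
  return n_out;
}
```

For -b1,3 on the line "abcdfeg" this selects "ac". A right-open range such as '2-' is just {2, SIZE_MAX} followed by the sentinel, which is why no separate end-of-line case is needed and why an index of SIZE_MAX itself cannot be supported.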
--------------080405060802010200050005 Content-Type: text/x-patch; name="cut-sentinel.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="cut-sentinel.patch" >From 1a6fbf21d1a70e85555a9b107f2f91188e2d3a4b Mon Sep 17 00:00:00 2001 From: Cojocaru Alexandru Date: Tue, 7 May 2013 13:47:15 +0100 Subject: [PATCH] cut: improve performance, especially with --output-delimiter Use a sentinel value that's checked implicitly, rather than a bit array, to determine if an item should be output. Benchmark results for this change are: $ yes abcdfeg | head -n1MB > big-file $ for c in orig sentinel; do src/cut-$c 2>/dev/null echo -ne "\n== $c ==" time src/cut-$c -b1,3 big-file > /dev/null done == orig == real 0m0.049s user 0m0.044s sys 0m0.005s == sentinel == real 0m0.035s user 0m0.032s sys 0m0.002s ## Again with --output-delimiter ## $ for c in orig sentinel; do src/cut-$c 2>/dev/null echo -ne "\n== $c ==" time src/cut-$c -b1,3 --output-delimiter=: big-file > /dev/null done == orig == real 0m0.106s user 0m0.103s sys 0m0.002s == sentinel == real 0m0.055s user 0m0.052s sys 0m0.003s eol_range_start: Removed. 'n-' is no longer treated specially, and instead SIZE_MAX is set for the 'hi' limit, and tested implicitly. complement_rp: Used to complement 'rp' when '--complement' is specified. ADD_RANGE_PAIR: Macro renamed to 'add_range_pair' function. * tests/misc/cut-huge-range.sh: Adjust to the SENTINEL value. --- src/cut.c | 234 +++++++++++++++--------------------------- tests/misc/cut-huge-range.sh | 15 ++- 2 files changed, 95 insertions(+), 154 deletions(-) diff --git a/src/cut.c b/src/cut.c index 9501b3a..19ef1d9 100644 --- a/src/cut.c +++ b/src/cut.c @@ -80,21 +80,15 @@ static size_t n_rp_allocated; space if necessary. Update global variable N_RP. When allocating, update global variable N_RP_ALLOCATED. 
*/ -#define ADD_RANGE_PAIR(rp, low, high) \ - do \ - { \ - if (low == 0 || high == 0) \ - FATAL_ERROR (_("fields and positions are numbered from 1")); \ - if (n_rp >= n_rp_allocated) \ - { \ - (rp) = X2NREALLOC (rp, &n_rp_allocated); \ - } \ - rp[n_rp].lo = (low); \ - rp[n_rp].hi = (high); \ - ++n_rp; \ - } \ - while (0) - +static void +add_range_pair (size_t lo, size_t hi) +{ + if (n_rp == n_rp_allocated) + rp = X2NREALLOC (rp, &n_rp_allocated); + rp[n_rp].lo = lo; + rp[n_rp].hi = hi; + ++n_rp; +} /* This buffer is used to support the semantics of the -s option (or lack of same) when the specified field list includes (does @@ -108,31 +102,6 @@ static char *field_1_buffer; /* The number of bytes allocated for FIELD_1_BUFFER. */ static size_t field_1_bufsize; -/* The largest field or byte index used as an endpoint of a closed - or degenerate range specification; this doesn't include the starting - index of right-open-ended ranges. For example, with either range spec - '2-5,9-', '2-3,5,9-' this variable would be set to 5. */ -static size_t max_range_endpoint; - -/* If nonzero, this is the index of the first field in a range that goes - to end of line. */ -static size_t eol_range_start; - -/* This is a bit vector. - In byte mode, which bytes to output. - In field mode, which DELIM-separated fields to output. - Both bytes and fields are numbered starting with 1, - so the zeroth bit of this array is unused. - A field or byte K has been selected if - (K <= MAX_RANGE_ENDPOINT && is_printable_field (K)) - || (EOL_RANGE_START > 0 && K >= EOL_RANGE_START). */ -static unsigned char *printable_field; - -/* The maximum size the printable_field array to allocate. - For ranges requiring more than this, we revert to the slightly - slower mechanism of inspecting the current range pair limits. */ -enum { PRINTABLE_ARRAY_MAX = 65536 }; - enum operating_mode { undefined_mode, @@ -151,7 +120,7 @@ static enum operating_mode operating_mode; with field mode. 
*/ static bool suppress_non_delimited; -/* If nonzero, print all bytes, characters, or fields _except_ +/* If true, print all bytes, characters, or fields _except_ those that were specified. */ static bool complement; @@ -253,56 +222,6 @@ With no FILE, or when FILE is -, read standard input.\n\ exit (status); } -static inline void -mark_printable_field (size_t i) -{ - size_t n = i / CHAR_BIT; - printable_field[n] |= (1 << (i % CHAR_BIT)); -} - -static inline bool -is_printable_field (size_t i) -{ - size_t n = i / CHAR_BIT; - return (printable_field[n] >> (i % CHAR_BIT)) & 1; -} - -/* Return nonzero if the K'th field or byte is printable. - Note this is a "hot" function. Please profile when changing. */ - -static inline bool -print_kth (size_t k) -{ - bool k_selected = false; - - if (0 < eol_range_start && eol_range_start <= k) - k_selected = true; - else if (printable_field) /* faster path for smaller ranges. */ - { - if (k <= max_range_endpoint && is_printable_field (k)) - k_selected = true; - } - else if (current_rp->lo <= k && k <= current_rp->hi) - k_selected = true; - - return k_selected ^ complement; -} - -/* Return nonzero if K'th byte is the beginning of a range. */ - -static inline bool -is_range_start_index (size_t k) -{ - bool is_start = false; - - if (!complement) - is_start = (k == eol_range_start || k == current_rp->lo); - else - is_start = (k == (current_rp - 1)->hi + 1); - - return is_start; -} - /* Comparison function for qsort to order the list of struct range_pairs. */ static int @@ -313,22 +232,42 @@ compare_ranges (const void *a, const void *b) return a_start < b_start ? -1 : a_start > b_start; } -/* Increment *ITEM_IDX (i.e. a field or byte index), - and if required CURRENT_RP. */ +/* Reallocate Range Pair entries, with corresponding + entries outside the range of each specified entry. 
*/ -static inline void -next_item (size_t *item_idx) +static void +complement_rp (void) { - (*item_idx)++; - /* avoid extra processing associated with current_rp unless needed. */ - if (!printable_field) - if ((*item_idx > current_rp->hi) && (current_rp < rp + n_rp - 1)) - current_rp++; + if (complement) + { + struct range_pair *c = rp; + size_t n = n_rp; + size_t i; + + rp = NULL; + n_rp = 0; + n_rp_allocated = 0; + + if (c[0].lo > 1) + add_range_pair (1, c[0].lo - 1); + + for (i = 1; i < n; ++i) + { + if (c[i-1].hi + 1 == c[i].lo) + continue; + + add_range_pair (c[i-1].hi + 1, c[i].lo - 1); + } + + if (c[n-1].hi < SIZE_MAX) + add_range_pair (c[n-1].hi + 1, SIZE_MAX); + + free (c); + } } /* Given the list of field or byte range specifications FIELDSTR, - allocate and initialize the RP array. If there is a right-open-ended - range, set EOL_RANGE_START to its starting index. FIELDSTR should + allocate and initialize the RP array. FIELDSTR should be composed of one or more numbers or ranges of numbers, separated by blanks or commas. Incomplete ranges may be given: '-m' means '1-m'; 'n-' means 'n' through end of line. @@ -349,8 +288,7 @@ set_fields (const char *fieldstr) size_t i; bool in_digits = false; - /* Collect and store in RP the range end points. - It also sets EOL_RANGE_START if appropriate. */ + /* Collect and store in RP the range end points. */ while (true) { @@ -385,10 +323,8 @@ set_fields (const char *fieldstr) In any case, 'initial' contains the start of the range. */ if (!rhs_specified) { - /* 'n-'. From 'initial' to end of line. If we've already - seen an M- range, ignore subsequent N- unless N < M. */ - if (eol_range_start == 0 || initial < eol_range_start) - eol_range_start = initial; + /* 'n-'. From 'initial' to end of line. 
*/ + add_range_pair (initial, SIZE_MAX); field_found = true; } else @@ -397,7 +333,7 @@ set_fields (const char *fieldstr) if (value < initial) FATAL_ERROR (_("invalid decreasing range")); - ADD_RANGE_PAIR (rp, initial, value); + add_range_pair (initial, value); field_found = true; } value = 0; @@ -405,7 +341,9 @@ set_fields (const char *fieldstr) else { /* A simple field number, not a range. */ - ADD_RANGE_PAIR (rp, value, value); + if (value == 0) + FATAL_ERROR (_("fields and positions are numbered from 1")); + add_range_pair (value, value); value = 0; field_found = true; } @@ -432,7 +370,8 @@ set_fields (const char *fieldstr) lhs_specified = 1; /* Detect overflow. */ - if (!DECIMAL_DIGIT_ACCUMULATE (value, *fieldstr - '0', size_t)) + if (!DECIMAL_DIGIT_ACCUMULATE (value, *fieldstr - '0', size_t) + || value == SIZE_MAX) { /* In case the user specified -c$(echo 2^64|bc),22, complain only about the first number. */ @@ -455,40 +394,9 @@ set_fields (const char *fieldstr) FATAL_ERROR (_("invalid byte, character or field list")); } - max_range_endpoint = 0; - for (i = 0; i < n_rp; i++) - { - if (rp[i].hi > max_range_endpoint) - max_range_endpoint = rp[i].hi; - } - - /* For performance, allocate an array large enough so that it may be - indexed by the field numbers corresponding to all finite ranges - (i.e. '2-6' or '-4', but not '5-') in FIELDSTR. - Note this enhancement is not possible with very large ranges, - or when --output-delimiter is specified. */ - - if (!output_delimiter_specified - && max_range_endpoint - && max_range_endpoint / CHAR_BIT < PRINTABLE_ARRAY_MAX) - printable_field = xzalloc (max_range_endpoint / CHAR_BIT + 1); - qsort (rp, n_rp, sizeof (rp[0]), compare_ranges); - /* Omit finite ranges subsumed by a to-EOL range. */ - if (eol_range_start && n_rp) - { - i = n_rp; - while (i && eol_range_start <= rp[i - 1].hi) - { - eol_range_start = MIN (rp[i - 1].lo, eol_range_start); - --n_rp; - --i; - } - } - - /* Merge finite range pairs (e.g. 
`2-5,3-4' becomes `2-5'). - Also for small enough ranges, mark items as printable. */ + /* Merge range pairs (e.g. `2-5,3-4' becomes `2-5'). */ for (i = 0; i < n_rp; ++i) { for (size_t j = i + 1; j < n_rp; ++j) @@ -503,23 +411,47 @@ set_fields (const char *fieldstr) else break; } - - if (printable_field) - { - for (size_t k = rp[i].lo; k <= rp[i].hi; k++) - mark_printable_field (k); - } } + complement_rp (); + /* After merging, reallocate RP so we release memory to the system. - Also add a sentinel at the end of RP, to avoid out of bounds access. */ + Also add a sentinel at the end of RP, to avoid out of bounds access + and for perfomance reasons. */ ++n_rp; rp = xrealloc (rp, n_rp * sizeof (struct range_pair)); - rp[n_rp - 1].lo = rp[n_rp - 1].hi = 0; + rp[n_rp - 1].lo = rp[n_rp - 1].hi = SIZE_MAX; return field_found; } +/* Increment *ITEM_IDX (i.e. a field or byte index), + and if required CURRENT_RP. */ + +static inline void +next_item (size_t *item_idx) +{ + (*item_idx)++; + if ((*item_idx) > current_rp->hi) + current_rp++; +} + +/* Return nonzero if the K'th field or byte is printable. */ + +static inline bool +print_kth (size_t k) +{ + return current_rp->lo <= k; +} + +/* Return nonzero if K'th byte is the beginning of a range. */ + +static inline bool +is_range_start_index (size_t k) +{ + return k == current_rp->lo; +} + /* Read from stream STREAM, printing to standard output any selected bytes. */ static void diff --git a/tests/misc/cut-huge-range.sh b/tests/misc/cut-huge-range.sh index 9905cd7..579190f 100755 --- a/tests/misc/cut-huge-range.sh +++ b/tests/misc/cut-huge-range.sh @@ -21,13 +21,22 @@ print_ver_ cut require_ulimit_v_ getlimits_ +# Ensure we can cut up to our sentinel value. +# This is currently SIZE_MAX, but could be raised to UINTMAX_MAX +# if we didn't allocate memory for each line as a unit. +CUT_MAX=$(expr $SIZE_MAX - 1) + # From coreutils-8.10 through 8.20, this would make cut try to allocate # a 256MiB bit vector. 
With a 20MB limit on VM, the following would fail. -(ulimit -v 20000; : | cut -b$INT_MAX- > err 2>&1) || fail=1 +(ulimit -v 20000; : | cut -b$CUT_MAX- > err 2>&1) || fail=1 # Up to and including coreutils-8.21, cut would allocate possibly needed -# memory upfront. Subsequently memory is allocated as required. -(ulimit -v 20000; : | cut -b1-$INT_MAX >> err 2>&1) || fail=1 +# memory upfront. Subsequently extra memory is no longer needed. +(ulimit -v 20000; : | cut -b1-$CUT_MAX >> err 2>&1) || fail=1 + +# Explicitly disallow values above CUT_MAX +(ulimit -v 20000; : | cut -b$SIZE_MAX 2>/dev/null) && fail=1 +(ulimit -v 20000; : | cut -b$SIZE_OFLOW 2>/dev/null) && fail=1 # Ensure ranges are merged correctly when large range logic is in effect echo 1 > exp -- 1.7.7.6 --------------080405060802010200050005-- From debbugs-submit-bounces@debbugs.gnu.org Wed May 08 02:55:39 2013 Received: (at 13127) by debbugs.gnu.org; 8 May 2013 06:55:39 +0000 Received: from localhost ([127.0.0.1]:33472 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZyHr-0007jy-7Z for submit@debbugs.gnu.org; Wed, 08 May 2013 02:55:39 -0400 Received: from moutng.kundenserver.de ([212.227.17.9]:61055) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZyHn-0007jn-Vz for 13127@debbugs.gnu.org; Wed, 08 May 2013 02:55:37 -0400 Received: from oxbaltgw05.schlund.de (oxbaltgw05.schlund.de [172.19.246.11]) by mrelayeu.kundenserver.de (node=mrbap2) with ESMTP (Nemesis) id 0Lu38g-1USZee1dDi-011iz2; Wed, 08 May 2013 08:54:26 +0200 Date: Wed, 8 May 2013 08:54:26 +0200 (CEST) From: Bernhard Voelker To: =?UTF-8?Q?=22P=C3=A1draig_Brady=22?= , Cojocaru Alexandru Message-ID: <1976940659.356456.1367996066681.open-xchange@email.1und1.de> In-Reply-To: <5188EF20.3080603@draigBrady.com> References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> 
<517C8091.7090106@draigBrady.com> <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> <517D2C6F.8040006@draigBrady.com> <20130428201433.1720fe6d54581458391fc6fe@gmx.com> <517D825E.3040308@draigBrady.com> <517DB7D8.8080900@draigBrady.com> <20130506205401.dbffe145eef3268152b9a4a5@gmx.com> <5188EF20.3080603@draigBrady.com> Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Priority: 3 Importance: Medium X-Mailer: Open-Xchange Mailer v- X-Provags-ID: V02:K0:BwbcorBFIQNIUjNffYZxsRXj3U/lWmnNlYUBZtBWRW+ /vNxWim6haJzgAeWaqL0vKQHaBUq40KhmIVkVOTL+LMRr+yRy2 TflWEh9sd5RptSl9ztunQzJ3jAaWCSiPZIFUPSOfsHx5Z6CShU GYn5kWBvBLNWJDkq/Rcn80SsSLYVRbowFB2O3/3pdGhe6EXNmB RmafzeIXt7VL0Cw5FcM6SyTC9Ovp+89McZTarb61YBVMTsaoLW JN+bsK7089J8FyeISfuax2QpMPvTdYijd9L+FmDrAJKLiVQeHJ yx5LSWeI17XOR4y75tTp0s4s9jPS1e5Y5zJhXIGdBuIDWobXNQ KgyNfa0EueLdGu2UMztUfXXYyNEWPwW4a9K5Fb7VqyFADiadCh 8VTXYAvIPleLNmVaHKb164ElzmfxynLWDQ= X-Spam-Score: 0.8 (/) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Bernhard Voelker List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) > On May 7, 2013 at 2:10 PM P=C3=A1draig Brady wrote: > It's always best to have separate changes. > I've split the fix out (attached) with an associated test. The patch looks fine, thanks. 
Have a nice day, Berny From debbugs-submit-bounces@debbugs.gnu.org Wed May 08 02:56:06 2013 Received: (at 13127) by debbugs.gnu.org; 8 May 2013 06:56:06 +0000 Received: from localhost ([127.0.0.1]:33476 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZyIG-0007lf-U4 for submit@debbugs.gnu.org; Wed, 08 May 2013 02:56:05 -0400 Received: from moutng.kundenserver.de ([212.227.126.187]:54349) by debbugs.gnu.org with esmtp (Exim 4.72) (envelope-from ) id 1UZyIE-0007l7-Bq for 13127@debbugs.gnu.org; Wed, 08 May 2013 02:56:03 -0400 Received: from oxbaltgw05.schlund.de (oxbaltgw05.schlund.de [172.19.246.11]) by mrelayeu.kundenserver.de (node=mrbap2) with ESMTP (Nemesis) id 0M35gF-1UJXFi1n4J-00scDr; Wed, 08 May 2013 08:54:53 +0200 Date: Wed, 8 May 2013 08:54:53 +0200 (CEST) From: Bernhard Voelker To: =?UTF-8?Q?=22P=C3=A1draig_Brady=22?= , Cojocaru Alexandru Message-ID: <1473946386.356472.1367996093727.open-xchange@email.1und1.de> In-Reply-To: <518906F8.4050906@draigBrady.com> References: <20121209112805.6a62bbcf3fdd374d3960d2fa@gmx.com> <87fw3fosz4.fsf@rho.meyering.net> <20121211152436.ada365e55fa617e0a41255e8@gmx.com> <517AA745.3070506@draigBrady.com> <517C8091.7090106@draigBrady.com> <20130428134409.8abec330a4ecba67b78f4bc0@gmx.com> <517D2C6F.8040006@draigBrady.com> <20130428201433.1720fe6d54581458391fc6fe@gmx.com> <517D825E.3040308@draigBrady.com> <517DB7D8.8080900@draigBrady.com> <20130506205401.dbffe145eef3268152b9a4a5@gmx.com> <5188EF20.3080603@draigBrady.com> <518906F8.4050906@draigBrady.com> Subject: Re: bug#13127: [PATCH] cut: use only one data strucutre MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Priority: 3 Importance: Medium X-Mailer: Open-Xchange Mailer v- X-Provags-ID: V02:K0:+kEwhNE/0j9A39Yjjw9GZZjIXJZNaii+ksfamqKdMOM gk4TSkO1iFhHSr6qm+i/YJaCyXh3F7KfrBv/0W+NoaXkhgh0W+ wlGUX7z/ISryW/ND+cDbKXJYtQY1Tj/7kAyAcxBnSXmsZIkCll 
UDoTzjhi/f3CHZtIOaG1AaPve8jlri4Ot70/eci4Z4Yk9jBaXE 8NaWXpVi1nYbH/hrLWYdH2al+yS+pa25YSVXg0IKNoBuwl30bf qzKW0Gx/8dNIvzCd3w3QkVnyz5sdfGNDWjU5IomPVyIilP/LGg 8xwqJx7oBfBop7SHlmAA8qCki0AivrbqiTEOaI3oEDRqnDm8+o OiJ/9Lm+Byc9CLj33XT4WfsLvfwyLBRQ7HahLK1bGPAe6MtJe1 3yeJW9QPJrNpn53NRn5TbLmfkN0Y9yhYfo= X-Spam-Score: -0.0 (/) X-Debbugs-Envelope-To: 13127 Cc: 13127@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.13 Precedence: list Reply-To: Bernhard Voelker List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: debbugs-submit-bounces@debbugs.gnu.org Errors-To: debbugs-submit-bounces@debbugs.gnu.org X-Spam-Score: -1.9 (-) > On May 7, 2013 at 3:51 PM P=C3=A1draig Brady wrote: > I'll apply the attached in a little while. +1 Have a nice day, Berny From unknown Sun Jun 22 00:29:45 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Wed, 05 Jun 2013 11:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator