From unknown Sun Jun 15 08:44:24 2025 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-Mailer: MIME-tools 5.509 (Entity 5.509) Content-Type: text/plain; charset=utf-8 From: bug#17328 <17328@debbugs.gnu.org> To: bug#17328 <17328@debbugs.gnu.org> Subject: Status: dfa.c will fail if used on more than one DFA Reply-To: bug#17328 <17328@debbugs.gnu.org> Date: Sun, 15 Jun 2025 15:44:24 +0000 retitle 17328 dfa.c will fail if used on more than one DFA reassign 17328 grep submitter 17328 Aharon Robbins severity 17328 normal thanks From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 23 14:36:46 2014 Received: (at submit) by debbugs.gnu.org; 23 Apr 2014 18:36:47 +0000 Received: from localhost ([127.0.0.1]:56437 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd22I-00061Y-Dm for submit@debbugs.gnu.org; Wed, 23 Apr 2014 14:36:46 -0400 Received: from eggs.gnu.org ([208.118.235.92]:46009) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd22G-00061R-Ju for submit@debbugs.gnu.org; Wed, 23 Apr 2014 14:36:45 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Wd226-00053Y-Fx for submit@debbugs.gnu.org; Wed, 23 Apr 2014 14:36:44 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_MANY_HDRS_LCASE autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:45739) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wd226-00053S-Da for submit@debbugs.gnu.org; Wed, 23 Apr 2014 14:36:34 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:47688) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wd21y-0007o0-JK for bug-grep@gnu.org; Wed, 23 Apr 2014 14:36:34 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Wd21r-0004dP-3B for bug-grep@gnu.org; Wed, 23 Apr 
2014 14:36:26 -0400 Received: from mxout1.netvision.net.il ([194.90.9.20]:61771) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Wd21q-0004aD-Qt for bug-grep@gnu.org; Wed, 23 Apr 2014 14:36:19 -0400 MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from skeeve.com ([89.139.11.172]) by mxout1.netvision.net.il (Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTPS id <0N4H00ABTYCCFHJ0@mxout1.netvision.net.il> for bug-grep@gnu.org; Wed, 23 Apr 2014 21:36:14 +0300 (IDT) Received: from skeeve.com (skeeve.com [127.0.0.1]) by skeeve.com (8.14.4/8.14.4/Debian-2ubuntu2.1) with ESMTP id s3NIaBgh015503 for ; Wed, 23 Apr 2014 21:36:11 +0300 Received: (from arnold@localhost) by skeeve.com (8.14.4/8.14.4/Submit) id s3NIaBx7015502 for bug-grep@gnu.org; Wed, 23 Apr 2014 21:36:11 +0300 From: Aharon Robbins Message-id: <201404231836.s3NIaBx7015502@skeeve.com> Date: Wed, 23 Apr 2014 21:36:11 +0300 To: bug-grep@gnu.org Subject: dfa.c will fail if used on more than one DFA User-Agent: Heirloom mailx 12.5 6/20/10 X-detected-operating-system: by eggs.gnu.org: Solaris 10 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----) Hello. There is a built-in assumption that the routines in dfa.c will only be used on one struct dfa. This is true for grep but not true for gawk. I found this when trying to update gawk's dfa with all the changes that have been coming through. The patch is below. 
Thanks,

Arnold
-----------------------
diff --git a/src/dfa.c b/src/dfa.c
index 65fc03d..85ab9bd 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3244,15 +3244,10 @@ dfaexec (struct dfa *d, char const *begin, char *end,

   if (d->mb_cur_max > 1)
     {
-      static bool mb_alloc = false;
       memset (&mbs, 0, sizeof (mbstate_t));
-      if (!mb_alloc)
-        {
-          d->mb_match_lens = xnmalloc (d->nleaves, sizeof *d->mb_match_lens);
-          d->mb_follows = xmalloc (sizeof *d->mb_follows);
-          alloc_position_set (d->mb_follows, d->nleaves);
-          mb_alloc = true;
-        }
+      d->mb_match_lens = xnmalloc (d->nleaves, sizeof *d->mb_match_lens);
+      d->mb_follows = xmalloc (sizeof *d->mb_follows);
+      alloc_position_set (d->mb_follows, d->nleaves);
     }

   for (;;)
@@ -3434,8 +3429,13 @@ free_mbdata (struct dfa *d)
     {
       free (d->mb_follows->elems);
       free (d->mb_follows);
+      d->mb_follows = NULL;
+    }
+  if (d->mb_match_lens)
+    {
+      free (d->mb_match_lens);
+      d->mb_match_lens = NULL;
     }
-  free (d->mb_match_lens);
 }

 /* Initialize the components of a dfa that the other routines don't

From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 23 15:49:41 2014 Received: (at 17328-done) by debbugs.gnu.org; 23 Apr 2014 19:49:41 +0000 Received: from localhost ([127.0.0.1]:56457 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd3Aq-0008F1-Da for submit@debbugs.gnu.org; Wed, 23 Apr 2014 15:49:41 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:59036) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd3Am-0008Ek-8j for 17328-done@debbugs.gnu.org; Wed, 23 Apr 2014 15:49:37 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id DB6A039E8014; Wed, 23 Apr 2014 12:49:34 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id W1ZNojpNonBa; Wed, 23 Apr 2014 12:49:30 -0700 (PDT) Received: from penguin.cs.ucla.edu
(Penguin.CS.UCLA.EDU [131.179.64.200]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 16ED9A60001; Wed, 23 Apr 2014 12:49:30 -0700 (PDT) Message-ID: <53581949.9050008@cs.ucla.edu> Date: Wed, 23 Apr 2014 12:49:29 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Aharon Robbins , 17328-done@debbugs.gnu.org Subject: Re: bug#17328: dfa.c will fail if used on more than one DFA References: <201404231836.s3NIaBx7015502@skeeve.com> In-Reply-To: <201404231836.s3NIaBx7015502@skeeve.com> Content-Type: multipart/mixed; boundary="------------040304070609080308010703" X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 17328-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) This is a multi-part message in MIME format. --------------040304070609080308010703 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Thanks for reporting that. That static variable was the result of a recent optimization. I guess we'll need to optimize in a different way. I noticed another static variable that dfaexec also uses, namely 'mbs'; this needs to be moved into the struct dfa, for the benefit of any applications that really need stateful encodings. A bonus is that I expect this'll make the dfa code run a bit faster. I installed the attached patch, which I hope addresses the issues you raised along with the mbs issue. The code still needs more work in this area. There shouldn't be any static variables at all, even when parsing, though this would change the API. And the code isn't consistent about referring to dfa->mb_cur_max versus MB_CUR_MAX; not sure why that is. 
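The failure mode discussed above can be shown in isolation. The following is a minimal sketch, not code from dfa.c: `struct toy_dfa`, `prepare_with_static_flag` and `prepare_per_instance` are hypothetical names, and plain `calloc` stands in for the xmalloc-family allocators. It contrasts a function-local static flag, which is shared by every object in the process, with state kept inside the object itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dfa; only the fields needed here.  */
struct toy_dfa
{
  int *match_lens;   /* Per-instance scratch buffer, NULL until allocated.  */
  size_t nleaves;    /* Number of elements the buffer must hold.  */
};

/* Problematic pattern: the flag is shared by every toy_dfa in the
   process, so only the first instance ever gets its buffer.  */
void
prepare_with_static_flag (struct toy_dfa *d)
{
  static bool allocated = false;
  assert (d != NULL);
  if (!allocated)
    {
      d->match_lens = calloc (d->nleaves, sizeof *d->match_lens);
      allocated = true;
    }
}

/* Per-instance pattern: the "already allocated" information lives in
   the object, so each instance is set up independently.  */
void
prepare_per_instance (struct toy_dfa *d)
{
  assert (d != NULL);
  if (!d->match_lens)
    d->match_lens = calloc (d->nleaves, sizeof *d->match_lens);
}
```

With two `toy_dfa` objects, the first function leaves the second object's buffer unallocated, which corresponds to the situation a program such as gawk would hit when it keeps several compiled patterns alive at once.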
--------------040304070609080308010703 Content-Type: text/x-patch; name="0001-dfa-omit-static-variables-that-limited-dfaexec-to-on.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-dfa-omit-static-variables-that-limited-dfaexec-to-on.pa"; filename*1="tch" >From 29a3b9008abc8581c80e6ac6f26403ee0a5f9b06 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 23 Apr 2014 12:43:52 -0700 Subject: [PATCH] dfa: omit static variables that limited dfaexec to one struct dfa Problem reported by Aharon Robbins in: http://bugs.gnu.org/17328 * src/dfa.c (struct dfa): New member mbs. mb_follows is now a position_set, not a pointer to one; this simplifies memory allocation. All uses changed. (mbs_to_wchar): Put DFA arg at the end, in place of the mbstate_t *arg, since the DFA now contains an mbstate_t. All uses changed. (mbs): Remove static variable. (dfaexec): Remove static bool that attempted to optimize memory allocation, as this wasn't correct for Gawk. Perhaps we can think of a better way to optimize memory. --- src/dfa.c | 76 ++++++++++++++++++++++++++------------------------------------- 1 file changed, 31 insertions(+), 45 deletions(-) diff --git a/src/dfa.c b/src/dfa.c index 65fc03d..42a9736 100644 --- a/src/dfa.c +++ b/src/dfa.c @@ -339,8 +339,9 @@ struct dfa with dfaparse. */ unsigned int mb_cur_max; /* Cached value of MB_CUR_MAX. */ token utf8_anychar_classes[5]; /* To lower ANYCHAR in UTF-8 locales. */ + mbstate_t mbs; /* Multibyte conversion state. */ - /* The following are used only if MB_CUR_MAX > 1. */ + /* The following are valid only if mb_cur_max > 1. */ /* The value of multibyte_prop[i] is defined by following rule. if tokens[i] < NOTCHAR @@ -422,7 +423,7 @@ struct dfa struct dfamust *musts; /* List of strings, at least one of which is known to appear in any r.e. matching the dfa. 
*/ - position_set *mb_follows; /* Follow set added by ANYCHAR and/or MBCSET + position_set mb_follows; /* Follow set added by ANYCHAR and/or MBCSET on demand. */ int *mb_match_lens; /* Array of length reduced by ANYCHAR and/or MBCSET. */ @@ -462,33 +463,34 @@ dfambcache (struct dfa *d) } } -/* Given the dfa D, store into *PWC the result of converting the - leading bytes of the multibyte buffer S of length N bytes, updating - the conversion state in *MBS. On conversion error, convert just a - single byte as-is. Return the number of bytes converted. +/* Store into *PWC the result of converting the leading bytes of the + multibyte buffer S of length N bytes, using the mbrtowc_cache in *D + and updating the conversion state in *D. On conversion error, + convert just a single byte as-is. Return the number of bytes + converted. - This differs from mbrtowc (PWC, S, N, MBS) as follows: + This differs from mbrtowc (PWC, S, N, &D->mbs) as follows: - * Extra arg D, containing an mbrtowc_cache for speed. + * The last arg is a dfa *D instead of merely a multibyte conversion + state D->mbs. D also contains an mbrtowc_cache for speed. * N must be at least 1. * S[N - 1] must be a sentinel byte. * Shift encodings are not supported. * The return value is always in the range 1..N. - * *MBS is always valid afterwards. + * D->mbs is always valid afterwards. * *PWC is always set to something. */ static size_t -mbs_to_wchar (struct dfa *d, wchar_t *pwc, char const *s, size_t n, - mbstate_t *mbs) +mbs_to_wchar (wchar_t *pwc, char const *s, size_t n, struct dfa *d) { unsigned char uc = s[0]; wint_t wc = d->mbrtowc_cache[uc]; if (wc == WEOF) { - size_t nbytes = mbrtowc (pwc, s, n, mbs); + size_t nbytes = mbrtowc (pwc, s, n, &d->mbs); if (0 < nbytes && nbytes < (size_t) -2) return nbytes; - memset (mbs, 0, sizeof *mbs); + memset (&d->mbs, 0, sizeof d->mbs); wc = uc; } @@ -838,7 +840,6 @@ static int minrep, maxrep; /* Repeat counts for {m,n}. 
*/ static int cur_mb_len = 1; /* Length of the multibyte representation of wctok. */ /* These variables are used only if (MB_CUR_MAX > 1). */ -static mbstate_t mbs; /* mbstate for mbrtowc. */ static wchar_t wctok; /* Wide character representation of the current multibyte character. */ @@ -856,7 +857,7 @@ static wchar_t wctok; /* Wide character representation of the current else \ { \ wchar_t _wc; \ - size_t nbytes = mbs_to_wchar (dfa, &_wc, lexptr, lexleft, &mbs); \ + size_t nbytes = mbs_to_wchar (&_wc, lexptr, lexleft, dfa); \ cur_mb_len = nbytes; \ (wc) = _wc; \ (c) = nbytes == 1 ? to_uchar (*lexptr) : EOF; \ @@ -1932,7 +1933,7 @@ dfaparse (char const *s, size_t len, struct dfa *d) if (MB_CUR_MAX > 1) { cur_mb_len = 0; - memset (&mbs, 0, sizeof mbs); + memset (&d->mbs, 0, sizeof d->mbs); } if (!syntax_bits_set) @@ -3112,8 +3113,7 @@ transit_state_consume_1char (struct dfa *d, state_num s, s2 = s1; rs = transit_state_singlebyte (d, s2, (*pp)++, &s1); } - /* Copy the positions contained by 's1' to the set 'd->mb_follows'. */ - copy (&(d->states[s1].elems), d->mb_follows); + copy (&d->states[s1].elems, &d->mb_follows); /* Add all of the positions which can be reached from 's' by consuming a single character. */ @@ -3123,7 +3123,7 @@ transit_state_consume_1char (struct dfa *d, state_num s, for (j = 0; j < d->follows[d->states[s].mbps.elems[i].index].nelem; j++) insert (d->follows[d->states[s].mbps.elems[i].index].elems[j], - d->mb_follows); + &d->mb_follows); } /* FIXME: this return value is always ignored. */ @@ -3151,7 +3151,7 @@ transit_state (struct dfa *d, state_num s, unsigned char const **pp, We check whether each of them can match or not. */ { /* Note: caller must free the return value of this function. 
*/ - mbclen = mbs_to_wchar (d, &wc, (char const *) *pp, end - *pp, &mbs); + mbclen = mbs_to_wchar (&wc, (char const *) *pp, end - *pp, d); match_lens = check_matching_with_multibyte_ops (d, s, (char const *) *pp, wc, mbclen); @@ -3179,7 +3179,7 @@ transit_state (struct dfa *d, state_num s, unsigned char const **pp, } /* This state has some operators which can match a multibyte character. */ - d->mb_follows->nelem = 0; + d->mb_follows.nelem = 0; /* 'maxlen' may be longer than the length of a character, because it may not be a character but a (multi character) collating element. @@ -3187,12 +3187,12 @@ transit_state (struct dfa *d, state_num s, unsigned char const **pp, 'maxlen' bytes. */ transit_state_consume_1char (d, s, pp, wc, mbclen, match_lens); - s1 = state_index (d, d->mb_follows, wchar_context (wc)); + s1 = state_index (d, &d->mb_follows, wchar_context (wc)); realloc_trans_if_necessary (d, s1); while (*pp - p1 < maxlen) { - mbclen = mbs_to_wchar (d, &wc, (char const *) *pp, end - *pp, &mbs); + mbclen = mbs_to_wchar (&wc, (char const *) *pp, end - *pp, d); transit_state_consume_1char (d, s1, pp, wc, mbclen, NULL); for (i = 0; i < nelem; i++) @@ -3201,10 +3201,10 @@ transit_state (struct dfa *d, state_num s, unsigned char const **pp, for (j = 0; j < d->follows[d->states[s1].mbps.elems[i].index].nelem; j++) insert (d->follows[d->states[s1].mbps.elems[i].index].elems[j], - d->mb_follows); + &d->mb_follows); } - s1 = state_index (d, d->mb_follows, wchar_context (wc)); + s1 = state_index (d, &d->mb_follows, wchar_context (wc)); realloc_trans_if_necessary (d, s1); } return s1; @@ -3244,15 +3244,9 @@ dfaexec (struct dfa *d, char const *begin, char *end, if (d->mb_cur_max > 1) { - static bool mb_alloc = false; - memset (&mbs, 0, sizeof (mbstate_t)); - if (!mb_alloc) - { - d->mb_match_lens = xnmalloc (d->nleaves, sizeof *d->mb_match_lens); - d->mb_follows = xmalloc (sizeof *d->mb_follows); - alloc_position_set (d->mb_follows, d->nleaves); - mb_alloc = true; - } + 
memset (&d->mbs, 0, sizeof d->mbs); + d->mb_match_lens = xnmalloc (d->nleaves, sizeof *d->mb_match_lens); + alloc_position_set (&d->mb_follows, d->nleaves); } for (;;) @@ -3277,8 +3271,8 @@ dfaexec (struct dfa *d, char const *begin, char *end, character. */ wchar_t wc; while (mbp < p) - mbp += mbs_to_wchar (d, &wc, (char const *) mbp, - end - (char const *) mbp, &mbs); + mbp += mbs_to_wchar (&wc, (char const *) mbp, + end - (char const *) mbp, d); p = mbp; if ((char *) p >= end) @@ -3407,7 +3401,6 @@ free_mbdata (struct dfa *d) size_t i; free (d->multibyte_prop); - d->multibyte_prop = NULL; for (i = 0; i < d->nmbcsets; ++i) { @@ -3427,14 +3420,7 @@ free_mbdata (struct dfa *d) } free (d->mbcsets); - d->mbcsets = NULL; - d->nmbcsets = 0; - - if (d->mb_follows) - { - free (d->mb_follows->elems); - free (d->mb_follows); - } + free (d->mb_follows.elems); free (d->mb_match_lens); } -- 1.9.0 --------------040304070609080308010703-- From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 23 18:47:19 2014 Received: (at 17328-done) by debbugs.gnu.org; 23 Apr 2014 22:47:19 +0000 Received: from localhost ([127.0.0.1]:56592 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd5wl-0004gh-4m for submit@debbugs.gnu.org; Wed, 23 Apr 2014 18:47:19 -0400 Received: from ipmail04.adl6.internode.on.net ([150.101.137.141]:37194) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd5wi-0004gW-1A for 17328-done@debbugs.gnu.org; Wed, 23 Apr 2014 18:47:17 -0400 Received: from ppp14-2-47-72.lns21.adl2.internode.on.net (HELO [192.168.1.1]) ([14.2.47.72]) by ipmail04.adl6.internode.on.net with ESMTP; 24 Apr 2014 08:17:13 +0930 Message-ID: <535842F0.6060506@grouse.com.au> Date: Thu, 24 Apr 2014 08:17:12 +0930 From: behoffski User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: 17328-done@debbugs.gnu.org Subject: Re: bug#17328: dfa.c will fail if used on more than one DFA References: 
<201404231836.s3NIaBx7015502@skeeve.com> <53581949.9050008@cs.ucla.edu> In-Reply-To: <53581949.9050008@cs.ucla.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 17328-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.0 (/) On 04/24/14 05:19, Paul Eggert wrote: > Thanks for reporting that. That static variable was the result of a recent optimization. I guess we'll need to optimize in a different way. I noticed another static variable that dfaexec also uses, namely 'mbs'; this needs to be moved into the struct dfa, for the benefit of any applications that really need stateful encodings. A bonus is that I expect this'll make the dfa code run a bit faster. I installed the attached patch, which I hope addresses the issues you raised along with the mbs issue. > > The code still needs more work in this area. There shouldn't be any static variables at all, even when parsing, though this would change the API. And the code isn't consistent about referring to dfa->mb_cur_max versus MB_CUR_MAX; not sure why that is. The modules created by my "untangle" script are working towards this goal: Almost all code in fsalex, fsaparse, fsatoken and fsamusts have *no* static variables (I found a couple of exceptions, e.g. fsalex has an initialise-once-only static variable in function using_simple_locale, but this is easily fixed). The context is explicit in the API: An opaque pointer to outsiders, but expanding to a struct pointer internally. Charclass has one hidden static pointer (a list of pools of classes), but is explicitly set up to allow multiple lexers/parsers etc to exist in parallel. 
This list is not protected by mutexes, and so is not thread-safe, but otherwise is capable of handling multiple clients in parallel (including reusing charclasses across clients).

Fsalex goes even further, and tries to set itself up to be correct even if the locale is changed later: It explicitly mines (part of) the current locale for information when fsalex_syntax () is called, and uses that thereafter. Clients dealing with token streams from fsalex, such as fsaparse, need to depend on the locale snapshot of the lexer in order to make correct decisions: An explicit API that I omitted to mention earlier, "proto-lexparse.h", lays out the protocol for information exchange between any lexer and any parser in a module-agnostic fashion. The interface is incomplete, but currently includes the following opcodes and a generic function to exchange information:

    typedef enum proto_lexparse_opcode_enum
    {
      PROTO_LEXPARSE_OP_GET_LOCALE,
      PROTO_LEXPARSE_OP_GET_IS_MULTIBYTE_ENV,
      PROTO_LEXPARSE_OP_GET_REPMN_MIN,
      PROTO_LEXPARSE_OP_GET_REPMN_MAX,
      PROTO_LEXPARSE_OP_GET_WIDE_CHAR,
      PROTO_LEXPARSE_OP_GET_DOTCLASS,
    } proto_lexparse_opcode_t;

    typedef int proto_lexparse_exchange_fn_t (void *lexer_context,
                                              proto_lexparse_opcode_t opcode,
                                              void *parameter);

My long-term hope is that the need for this "exchange" function would dwindle when the token stream delivered by the lexer is reworked to deliver information more directly than it does at present, but this is a non-trivial job, and can wait until later.

At present, the untangle code focuses on the tokens/class/lex/parse/musts code, and does not try to break out the remaining high-level dfa code. Having a "dfa->" context reference for the remaining code should be easy, as this code is already set up for multiple instances in the original.
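As an illustration of how a lexer and a parser might use the exchange function described above, here is a small sketch. Only the two typedefs come from the message; `struct toy_lexer`, `toy_lexer_exchange`, `struct toy_parser` and `toy_parser_repeat_max` are hypothetical, and the convention that scalar answers are returned directly (with -1 for an unsupported opcode) is an assumption, since the message does not say how results are passed back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The opcodes and function type as quoted in the message.  */
typedef enum proto_lexparse_opcode_enum
{
  PROTO_LEXPARSE_OP_GET_LOCALE,
  PROTO_LEXPARSE_OP_GET_IS_MULTIBYTE_ENV,
  PROTO_LEXPARSE_OP_GET_REPMN_MIN,
  PROTO_LEXPARSE_OP_GET_REPMN_MAX,
  PROTO_LEXPARSE_OP_GET_WIDE_CHAR,
  PROTO_LEXPARSE_OP_GET_DOTCLASS,
} proto_lexparse_opcode_t;

typedef int proto_lexparse_exchange_fn_t (void *lexer_context,
                                          proto_lexparse_opcode_t opcode,
                                          void *parameter);

/* Hypothetical lexer context: all state is per-instance, none static.  */
struct toy_lexer
{
  bool multibyte;   /* Snapshot of "is this a multibyte locale".  */
  int repmn_min;    /* Lower bound of the last {m,n} seen.  */
  int repmn_max;    /* Upper bound of the last {m,n} seen.  */
};

/* Assumed convention: scalar answers are the return value, and -1
   means the opcode is not handled by this toy lexer.  */
int
toy_lexer_exchange (void *lexer_context, proto_lexparse_opcode_t opcode,
                    void *parameter)
{
  struct toy_lexer *lex = lexer_context;
  (void) parameter;   /* Unused by the opcodes handled here.  */
  assert (lex != NULL);
  switch (opcode)
    {
    case PROTO_LEXPARSE_OP_GET_IS_MULTIBYTE_ENV:
      return lex->multibyte;
    case PROTO_LEXPARSE_OP_GET_REPMN_MIN:
      return lex->repmn_min;
    case PROTO_LEXPARSE_OP_GET_REPMN_MAX:
      return lex->repmn_max;
    default:
      return -1;
    }
}

/* Hypothetical parser side: it sees only an opaque context pointer
   and the exchange function, never the lexer's struct layout.  */
struct toy_parser
{
  void *lexer_context;
  proto_lexparse_exchange_fn_t *exchange;
};

int
toy_parser_repeat_max (struct toy_parser const *p)
{
  return p->exchange (p->lexer_context, PROTO_LEXPARSE_OP_GET_REPMN_MAX,
                      NULL);
}
```

Because the parser holds only `void *` and a function pointer, several lexer/parser pairs can coexist, each with its own context, which is the property the thread is after.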
-------- I will rebase the untangle script sometime in the next week; I'm waiting for things to settle down a bit before undertaking this work, as rebasing the script, while automated to a fair extent, is still a considerable effort. As the gap between my reworking of the code and the current dfa.c code grows, the effort needed to analyse and possibly reinterpret modified code in a different form to fit into the "untangle"d layout grows. Finally, of course, I've been laying low, trying to let people sort out the next release without having the untangle code as an immediate distraction. I've broken my silence here, as the comments regarding both performance and the desire to eliminate static variables align very closely with the effort I've already expended writing "untangle", cheers, behoffski (Brenton Hoff) Programmer, Grouse Software From debbugs-submit-bounces@debbugs.gnu.org Wed Apr 23 20:16:23 2014 Received: (at 17328) by debbugs.gnu.org; 24 Apr 2014 00:16:23 +0000 Received: from localhost ([127.0.0.1]:56628 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd7Kw-0006yH-Oo for submit@debbugs.gnu.org; Wed, 23 Apr 2014 20:16:23 -0400 Received: from mailgw01.kcn.ne.jp ([61.86.7.208]:50508) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1Wd7Kt-0006y4-JC for 17328@debbugs.gnu.org; Wed, 23 Apr 2014 20:16:20 -0400 Received: from imp01 (mailgw5.kcn.ne.jp [61.86.15.231]) by mailgw01.kcn.ne.jp (Postfix) with ESMTP id 1820F8029D for <17328@debbugs.gnu.org>; Thu, 24 Apr 2014 09:16:18 +0900 (JST) Received: from mail04.kcn.ne.jp ([61.86.6.183]) by imp01 with bizsmtp id tcGJ1n0013wvxAM01cGJDv; Thu, 24 Apr 2014 09:16:18 +0900 X-OrgRCPT: 17328@debbugs.gnu.org Received: from [10.120.1.25] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail04.kcn.ne.jp (Postfix) with ESMTPA id C651A1290022; Thu, 24 Apr 2014 09:16:17 +0900 (JST) Date: Thu, 24 Apr 2014 09:16:19 +0900 From: Norihiro Tanaka To: 17328@debbugs.gnu.org, 
eggert@cs.ucla.edu, arnold@skeeve.com Subject: bug#17328: dfa.c will fail if used on more than one DFA In-Reply-To: <53581949.9050008@cs.ucla.edu> References: <201404231836.s3NIaBx7015502@skeeve.com> <53581949.9050008@cs.ucla.edu> Message-Id: <20140424091618.6B4A.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5358566B000000006BB8_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.65.07 [ja] X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 17328 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) --------_5358566B000000006BB8_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit

I confirmed a memory leak after the omit-static-variables patch. The following commands reproduce it:

$ yes abcdabc | head -50000000 >k
$ env LC_ALL=ja_JP.eucJP src/grep -v abcd.bc k

I am sending a patch to fix it.
--------_5358566B000000006BB8_MULTIPART_MIXED_ Content-Type: text/plain; charset="UTF-8"; name="patch.txt" Content-Disposition: attachment; filename="patch.txt" Content-Transfer-Encoding: base64 RnJvbSA0NTQ2N2JlYTFmMjZhOTE2NjA4Mzg3MjI1YTMzOWQ4ZGQwMjE2ZWNlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBUaHUsIDI0IEFwciAyMDE0IDA4OjI0OjM0ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBmaXggbWVtb3J5IGxlYWsgYWZ0ZXIgb21pdC1zdGF0aWMtdmFyaWFibGVzCgpzcmMvZGZhLmMg KGRmYWV4ZWMpOiBGaXggbWVtb3J5IGxlYWsgYWZ0ZXIgb21pdC1zdGF0aWMtdmFyaWFibGVzLgot LS0KIHNyYy9kZmEuYyB8IDIwICsrKysrKysrKysrKysrKystLS0tCiAxIGZpbGUgY2hhbmdlZCwg MTYgaW5zZXJ0aW9ucygrKSwgNCBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zcmMvZGZhLmMg Yi9zcmMvZGZhLmMKaW5kZXggNDJhOTczNi4uNTM5YzI3NiAxMDA2NDQKLS0tIGEvc3JjL2RmYS5j CisrKyBiL3NyYy9kZmEuYwpAQCAtNDI3LDYgKzQyNyw4IEBAIHN0cnVjdCBkZmEKICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgICAgICAgb24gZGVtYW5kLiAgKi8KICAgaW50ICptYl9tYXRj aF9sZW5zOyAgICAgICAgICAgLyogQXJyYXkgb2YgbGVuZ3RoIHJlZHVjZWQgYnkgQU5ZQ0hBUiBh bmQvb3IKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgTUJDU0VULiAgKi8KKyAg Ym9vbCBtYl9hbGxvYzsgICAgICAgICAgICAgICAgLyogVHJ1ZSBpZiBtYl9tYXRjaF9sZW5zIGFu ZCBlbGVtZW50cyBvZgorICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBtYl9mb2xs b3dzIGlzIGFsbG9jYXRlZC4gICovCiB9OwogCiAvKiBTb21lIG1hY3JvcyBmb3IgdXNlciBhY2Nl c3MgdG8gZGZhIGludGVybmFscy4gICovCkBAIC0zMjQ1LDggKzMyNDcsMTMgQEAgZGZhZXhlYyAo c3RydWN0IGRmYSAqZCwgY2hhciBjb25zdCAqYmVnaW4sIGNoYXIgKmVuZCwKICAgaWYgKGQtPm1i X2N1cl9tYXggPiAxKQogICAgIHsKICAgICAgIG1lbXNldCAoJmQtPm1icywgMCwgc2l6ZW9mIGQt Pm1icyk7Ci0gICAgICBkLT5tYl9tYXRjaF9sZW5zID0geG5tYWxsb2MgKGQtPm5sZWF2ZXMsIHNp emVvZiAqZC0+bWJfbWF0Y2hfbGVucyk7Ci0gICAgICBhbGxvY19wb3NpdGlvbl9zZXQgKCZkLT5t Yl9mb2xsb3dzLCBkLT5ubGVhdmVzKTsKKworICAgICAgaWYgKCFkLT5tYl9hbGxvYykKKyAgICAg ICAgeworICAgICAgICAgIGQtPm1iX21hdGNoX2xlbnMgPSB4bm1hbGxvYyAoZC0+bmxlYXZlcywg c2l6ZW9mICpkLT5tYl9tYXRjaF9sZW5zKTsKKyAgICAgICAgICBhbGxvY19wb3NpdGlvbl9zZXQg 
KCZkLT5tYl9mb2xsb3dzLCBkLT5ubGVhdmVzKTsKKyAgICAgICAgICBkLT5tYl9hbGxvYyA9IHRy dWU7CisgICAgICAgIH0KICAgICB9CiAKICAgZm9yICg7OykKQEAgLTM0MjAsOCArMzQyNywxMyBA QCBmcmVlX21iZGF0YSAoc3RydWN0IGRmYSAqZCkKICAgICB9CiAKICAgZnJlZSAoZC0+bWJjc2V0 cyk7Ci0gIGZyZWUgKGQtPm1iX2ZvbGxvd3MuZWxlbXMpOwotICBmcmVlIChkLT5tYl9tYXRjaF9s ZW5zKTsKKworICBpZiAoZC0+bWJfYWxsb2MpCisgICAgeworICAgICAgZnJlZSAoZC0+bWJfZm9s bG93cy5lbGVtcyk7CisgICAgICBmcmVlIChkLT5tYl9tYXRjaF9sZW5zKTsKKyAgICAgIGQtPm1i X2FsbG9jID0gZmFsc2U7CisgICAgfQogfQogCiAvKiBJbml0aWFsaXplIHRoZSBjb21wb25lbnRz IG9mIGEgZGZhIHRoYXQgdGhlIG90aGVyIHJvdXRpbmVzIGRvbid0Ci0tIAoxLjkuMgoK --------_5358566B000000006BB8_MULTIPART_MIXED_-- From debbugs-submit-bounces@debbugs.gnu.org Thu Apr 24 02:23:28 2014 Received: (at 17328) by debbugs.gnu.org; 24 Apr 2014 06:23:28 +0000 Received: from localhost ([127.0.0.1]:56797 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WdD4C-0002Oy-10 for submit@debbugs.gnu.org; Thu, 24 Apr 2014 02:23:28 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:56817) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WdD49-0002Ok-1H for 17328@debbugs.gnu.org; Thu, 24 Apr 2014 02:23:25 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 80D34A60001; Wed, 23 Apr 2014 23:23:24 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vU67JfYvLDtX; Wed, 23 Apr 2014 23:23:20 -0700 (PDT) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 5162339E8012; Wed, 23 Apr 2014 23:23:20 -0700 (PDT) Message-ID: <5358ADD8.9020805@cs.ucla.edu> Date: Wed, 23 Apr 2014 23:23:20 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 
MIME-Version: 1.0 To: Norihiro Tanaka , 17328@debbugs.gnu.org, arnold@skeeve.com Subject: Re: bug#17328: dfa.c will fail if used on more than one DFA References: <201404231836.s3NIaBx7015502@skeeve.com> <53581949.9050008@cs.ucla.edu> <20140424091618.6B4A.27F6AC2D@kcn.ne.jp> In-Reply-To: <20140424091618.6B4A.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------050907030100040805090404" X-Spam-Score: -3.0 (---) X-Debbugs-Envelope-To: 17328 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.0 (---) This is a multi-part message in MIME format. --------------050907030100040805090404 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Thanks. We can improve on that a bit, by using one of the pointers to indicate whether the storage is allocated, instead of requiring an extra boolean for this information. I pushed the attached patch. 
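The idea of letting one of the pointers double as the "already allocated" marker can be sketched as follows. This is an illustration under stated assumptions rather than the patch that was pushed: `struct toy_dfa`, `toy_exec_prepare` and `toy_free` are hypothetical names, `follows_elems` stands in for the element array of the follow set, and plain `malloc` with an error return replaces the xmalloc-family allocators.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of struct dfa.  */
struct toy_dfa
{
  size_t nleaves;
  int *mb_match_lens;   /* NULL means "scratch storage not allocated".  */
  int *follows_elems;   /* Allocated and freed together with the above.  */
};

/* Called at the start of every search.  Allocates only when the
   sentinel pointer is null, so repeated searches neither leak nor
   reallocate.  Returns 0 on success, -1 on allocation failure.  */
int
toy_exec_prepare (struct toy_dfa *d)
{
  assert (d != NULL);
  if (!d->mb_match_lens)
    {
      int *lens = malloc (d->nleaves * sizeof *lens);
      int *elems = malloc (d->nleaves * sizeof *elems);
      if (!lens || !elems)
        {
          free (lens);
          free (elems);
          return -1;
        }
      d->mb_match_lens = lens;
      d->follows_elems = elems;
    }
  return 0;
}

/* Release the scratch storage and reset the sentinel, so that a later
   toy_exec_prepare call on the same object allocates again.  */
void
toy_free (struct toy_dfa *d)
{
  assert (d != NULL);
  free (d->follows_elems);
  free (d->mb_match_lens);
  d->follows_elems = NULL;
  d->mb_match_lens = NULL;
}
```

Compared with a separate boolean member, this keeps the struct one field smaller and makes it impossible for the flag and the pointer to disagree, at the cost of requiring that the pointer start out null in a freshly initialized object.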
--------------050907030100040805090404 Content-Type: text/plain; charset=UTF-8; name="0001-dfa-fix-memory-leak-reintroduced-by-previous-patch.patch" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename*0="0001-dfa-fix-memory-leak-reintroduced-by-previous-patch.patc"; filename*1="h" RnJvbSBjNWE4M2RiYjk1OTZjOThhMjYwYWY5YjU0NWZjNjRhNTdiNTliZTYwIE1vbiBTZXAg MTcgMDA6MDA6MDAgMjAwMQpGcm9tOiBQYXVsIEVnZ2VydCA8ZWdnZXJ0QGNzLnVjbGEuZWR1 PgpEYXRlOiBXZWQsIDIzIEFwciAyMDE0IDIzOjIwOjM1IC0wNzAwClN1YmplY3Q6IFtQQVRD SF0gZGZhOiBmaXggbWVtb3J5IGxlYWsgcmVpbnRyb2R1Y2VkIGJ5IHByZXZpb3VzIHBhdGNo CgpSZXBvcnRlZCBieSBOb3JpaGlybyBUYW5ha2EgaW4gPGh0dHA6Ly9idWdzLmdudS5vcmcv MTczMjgjMTY+LgoqIHNyYy9kZmEuYyAoZGZhZXhlYyk6IEFsbG9jYXRlIG1iX21hdGNoX2xl bnMgYW5kIG1iX2ZvbGxvd3Mgb25seQppZiBub3QgYWxyZWFkeSBhbGxvY2F0ZWQuCihmcmVl X21iZGF0YSk6IE51bGwgb3V0IG1iX21hdGNoX2xlbnMgdG8gbWFyayBpdCBhcyBiZWluZyBm cmVlZC4KLS0tCiBzcmMvZGZhLmMgfCAxMSArKysrKysrKy0tLQogMSBmaWxlIGNoYW5nZWQs IDggaW5zZXJ0aW9ucygrKSwgMyBkZWxldGlvbnMoLSkKCmRpZmYgLS1naXQgYS9zcmMvZGZh LmMgYi9zcmMvZGZhLmMKaW5kZXggNDJhOTczNi4uNWRjMGYwOSAxMDA2NDQKLS0tIGEvc3Jj L2RmYS5jCisrKyBiL3NyYy9kZmEuYwpAQCAtNDI2LDcgKzQyNiw4IEBAIHN0cnVjdCBkZmEK ICAgcG9zaXRpb25fc2V0IG1iX2ZvbGxvd3M7CS8qIEZvbGxvdyBzZXQgYWRkZWQgYnkgQU5Z Q0hBUiBhbmQvb3IgTUJDU0VUCiAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAg IG9uIGRlbWFuZC4gICovCiAgIGludCAqbWJfbWF0Y2hfbGVuczsgICAgICAgICAgIC8qIEFy cmF5IG9mIGxlbmd0aCByZWR1Y2VkIGJ5IEFOWUNIQVIgYW5kL29yCi0gICAgICAgICAgICAg ICAgICAgICAgICAgICAgICAgICAgIE1CQ1NFVC4gICovCisgICAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgIE1CQ1NFVC4gIE51bGwgaWYgbWJfZm9sbG93cy5lbGVtcyBoYXMg bm90CisgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgIGJlZW4gYWxsb2NhdGVk LiAgKi8KIH07CiAKIC8qIFNvbWUgbWFjcm9zIGZvciB1c2VyIGFjY2VzcyB0byBkZmEgaW50 ZXJuYWxzLiAgKi8KQEAgLTMyNDUsOCArMzI0NiwxMSBAQCBkZmFleGVjIChzdHJ1Y3QgZGZh ICpkLCBjaGFyIGNvbnN0ICpiZWdpbiwgY2hhciAqZW5kLAogICBpZiAoZC0+bWJfY3VyX21h eCA+IDEpCiAgICAgewogICAgICAgbWVtc2V0ICgmZC0+bWJzLCAwLCBzaXplb2YgZC0+bWJz 
KTsKLSAgICAgIGQtPm1iX21hdGNoX2xlbnMgPSB4bm1hbGxvYyAoZC0+bmxlYXZlcywgc2l6 ZW9mICpkLT5tYl9tYXRjaF9sZW5zKTsKLSAgICAgIGFsbG9jX3Bvc2l0aW9uX3NldCAoJmQt Pm1iX2ZvbGxvd3MsIGQtPm5sZWF2ZXMpOworICAgICAgaWYgKCEgZC0+bWJfbWF0Y2hfbGVu cykKKyAgICAgICAgeworICAgICAgICAgIGQtPm1iX21hdGNoX2xlbnMgPSB4bm1hbGxvYyAo ZC0+bmxlYXZlcywgc2l6ZW9mICpkLT5tYl9tYXRjaF9sZW5zKTsKKyAgICAgICAgICBhbGxv Y19wb3NpdGlvbl9zZXQgKCZkLT5tYl9mb2xsb3dzLCBkLT5ubGVhdmVzKTsKKyAgICAgICAg fQogICAgIH0KIAogICBmb3IgKDs7KQpAQCAtMzQyMiw2ICszNDI2LDcgQEAgZnJlZV9tYmRh dGEgKHN0cnVjdCBkZmEgKmQpCiAgIGZyZWUgKGQtPm1iY3NldHMpOwogICBmcmVlIChkLT5t Yl9mb2xsb3dzLmVsZW1zKTsKICAgZnJlZSAoZC0+bWJfbWF0Y2hfbGVucyk7CisgIGQtPm1i X21hdGNoX2xlbnMgPSBOVUxMOwogfQogCiAvKiBJbml0aWFsaXplIHRoZSBjb21wb25lbnRz IG9mIGEgZGZhIHRoYXQgdGhlIG90aGVyIHJvdXRpbmVzIGRvbid0Ci0tIAoxLjkuMAoK --------------050907030100040805090404-- From unknown Sun Jun 15 08:44:24 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 22 May 2014 11:24:03 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator