From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 24 02:06:58 2014 Received: (at submit) by debbugs.gnu.org; 24 Mar 2014 06:06:58 +0000 Received: from localhost ([127.0.0.1]:46630 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WRy2D-0003Wa-Hd for submit@debbugs.gnu.org; Mon, 24 Mar 2014 02:06:57 -0400 Received: from eggs.gnu.org ([208.118.235.92]:50773) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WRy2A-0003WH-5J for submit@debbugs.gnu.org; Mon, 24 Mar 2014 02:06:55 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WRy1w-0007fc-HA for submit@debbugs.gnu.org; Mon, 24 Mar 2014 02:06:48 -0400 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=-0.0 required=5.0 tests=BAYES_40 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:38865) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WRy1w-0007fY-E0 for submit@debbugs.gnu.org; Mon, 24 Mar 2014 02:06:40 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:52468) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WRy1q-0003fQ-CS for bug-grep@gnu.org; Mon, 24 Mar 2014 02:06:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WRy1k-0007d0-AL for bug-grep@gnu.org; Mon, 24 Mar 2014 02:06:34 -0400 Received: from kiwi.cs.ucla.edu ([131.179.128.19]:53522) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WRy1k-0007cm-1r for bug-grep@gnu.org; Mon, 24 Mar 2014 02:06:28 -0400 Received: from kiwi.cs.ucla.edu (localhost.cs.ucla.edu [127.0.0.1]) by kiwi.cs.ucla.edu (8.14.5+Sun/8.14.5/UCLACS-6.0) with ESMTP id s2O66Qwc004593 for ; Sun, 23 Mar 2014 23:06:26 -0700 (PDT) Received: (from eggert@localhost) by kiwi.cs.ucla.edu (8.14.5+Sun/8.14.5/Submit) id s2O66Qcs004592 for bug-grep@gnu.org; Sun, 23 Mar 2014 23:06:26 -0700 (PDT) Message-Id: <201403240606.s2O66Qcs004592@kiwi.cs.ucla.edu> From: Paul Eggert Date: Sun, 23 Mar 2014 23:04:26 -0700 Subject: [PATCH] dfa: avoid undefined behavior To: bug-grep@gnu.org X-detected-operating-system: by eggs.gnu.org: Solaris 10 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----) * src/dfa.c (FETCH_WC, addtok_wc): Don't rely on undefined behavior when converting an out-of-range value to 'int'. (FETCH_WC, prepare_wc_buf): Don't rely on conversion state after mbrtowc returns a special value, as it's undefined for (size_t) -1. (prepare_wc_buf): Simplify test for valid character. --- src/dfa.c | 42 ++++++++++++++++++++++++------------------ 1 file changed, 24 insertions(+), 18 deletions(-) diff --git a/src/dfa.c b/src/dfa.c index 92ac1b9..0a2b8b8 100644 --- a/src/dfa.c +++ b/src/dfa.c @@ -807,7 +807,7 @@ static int minrep, maxrep; /* Repeat counts for {m,n}. */ static int cur_mb_len = 1; /* Length of the multibyte representation of wctok. */ /* These variables are used only if (MB_CUR_MAX > 1). */ -static mbstate_t mbs; /* Mbstate for mbrlen. */ +static mbstate_t mbs; /* mbstate for mbrtowc. */ static wchar_t wctok; /* Wide character representation of the current multibyte character. */ static unsigned char *mblen_buf;/* Correspond to the input buffer in dfaexec. @@ -844,15 +844,18 @@ static unsigned char const *buf_end; /* reference to end in dfaexec. */ else \ { \ wchar_t _wc; \ - cur_mb_len = mbrtowc (&_wc, lexptr, lexleft, &mbs); \ - if (cur_mb_len <= 0) \ + size_t nbytes = mbrtowc (&_wc, lexptr, lexleft, &mbs); \ + bool valid_char = 1 <= nbytes && nbytes < (size_t) -2; \ + if (! valid_char) \ { \ + memset (&mbs, 0, sizeof mbs); \ cur_mb_len = 1; \ --lexleft; \ (wc) = (c) = to_uchar (*lexptr++); \ } \ else \ { \ + cur_mb_len = nbytes; \ lexptr += cur_mb_len; \ lexleft -= cur_mb_len; \ (wc) = _wc; \ @@ -1685,16 +1688,19 @@ static void addtok_wc (wint_t wc) { unsigned char buf[MB_LEN_MAX]; - mbstate_t s; + mbstate_t s = { 0 }; int i; - memset (&s, 0, sizeof s); - cur_mb_len = wcrtomb ((char *) buf, wc, &s); + size_t stored_bytes = wcrtomb ((char *) buf, wc, &s); - /* This is merely stop-gap. When cur_mb_len is 0 or negative, - buf[0] is undefined, yet skipping the addtok_mb call altogether - can result in heap corruption. */ - if (cur_mb_len <= 0) - buf[0] = 0; + if (stored_bytes != (size_t) -1) + cur_mb_len = stored_bytes; + else + { + /* This is merely stop-gap. buf[0] is undefined, yet skipping + the addtok_mb call altogether can corrupt the heap. */ + cur_mb_len = 1; + buf[0] = 0; + } addtok_mb (buf[0], cur_mb_len == 1 ? 3 : 1); for (i = 1; i < cur_mb_len; i++) @@ -3328,13 +3334,13 @@ prepare_wc_buf (const char *begin, const char *end) { if (remain_bytes == 0) { - remain_bytes + size_t nbytes = mbrtowc (inputwcs + i, begin + i, end - begin - i + 1, &mbs); - if (remain_bytes < 1 - || remain_bytes == (size_t) -1 - || remain_bytes == (size_t) -2 - || (remain_bytes == 1 && inputwcs[i] == (wchar_t) begin[i])) + if (! (1 <= nbytes && nbytes < (size_t) -2) + || (nbytes == 1 && inputwcs[i] == (wchar_t) begin[i])) { + if ((size_t) -2 <= nbytes) + memset (&mbs, 0, sizeof mbs); remain_bytes = 0; inputwcs[i] = (wchar_t) begin[i]; mblen_buf[i] = 0; @@ -3343,8 +3349,8 @@ prepare_wc_buf (const char *begin, const char *end) } else { - mblen_buf[i] = remain_bytes; - remain_bytes--; + mblen_buf[i] = nbytes; + remain_bytes = nbytes - 1; } } else -- 1.8.5.3 From debbugs-submit-bounces@debbugs.gnu.org Mon Mar 24 02:08:30 2014 Received: (at control) by debbugs.gnu.org; 24 Mar 2014 06:08:30 +0000 Received: from localhost ([127.0.0.1]:46635 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WRy3i-0003ZZ-Ak for submit@debbugs.gnu.org; Mon, 24 Mar 2014 02:08:30 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:42608) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WRy3f-0003ZQ-SD for control@debbugs.gnu.org; Mon, 24 Mar 2014 02:08:28 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 657C539E8011 for ; Sun, 23 Mar 2014 23:08:27 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id QQ492NxEl4r9 for ; Sun, 23 Mar 2014 23:08:27 -0700 (PDT) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id 0010C39E8008 for ; Sun, 23 Mar 2014 23:08:26 -0700 (PDT) Message-ID: <532FCBDA.7000408@cs.ucla.edu> Date: Sun, 23 Mar 2014 23:08:26 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: control@debbugs.gnu.org Subject: 17081 is done Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.8 (--) X-Debbugs-Envelope-To: control X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.8 (--) tags 17081 + patch close 17081 thanks From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 01 04:27:21 2014 Received: (at 17081) by debbugs.gnu.org; 1 Apr 2014 08:27:21 +0000 Received: from localhost ([127.0.0.1]:58688 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WUu2S-0003mi-I8 for submit@debbugs.gnu.org; Tue, 01 Apr 2014 04:27:20 -0400 Received: from mail-we0-f171.google.com ([74.125.82.171]:53048) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WUu2Q-0003mQ-8a for 17081@debbugs.gnu.org; Tue, 01 Apr 2014 04:27:19 -0400 Received: by mail-we0-f171.google.com with SMTP id t61so5935757wes.16 for <17081@debbugs.gnu.org>; Tue, 01 Apr 2014 01:27:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=3uP2cmadbhxGjR8VsSoq8PBxxxSIVPvEIUEKlPlDb1Y=; b=rfHnBbQNaoivmXgv7fiJg+lNc8aowELjkm5TJ3sLSJJkijsgvyrbzeSMDjzlNWsB6E RwJkg21JeP7GXLUUwUWS2rjvG2dHqr7W35PGY1zzyzMNi2+T1/nxv7XcrfTLXIawzHO6 UBM7w9VChqjEDL6J+Gsml+OUvhWg7w0AAEM77OLjdPe+WeCQr/0CkpacDzjF6y1Hw8Zy Uua56pmc8j2V1+39gIPYB+6Jjc0zTeR+dOimcbOcRcpnw4RKOpU3FU/FQAweIh+Iyg5f cARCFml4bCNhmZqTyKw8G6xUyyfYilWXS08yZmBYxQYhb75iUDcIwlA1aQFdz9k5M9Qy RHVA== X-Received: by 10.181.8.66 with SMTP id di2mr18213767wid.43.1396340837274; Tue, 01 Apr 2014 01:27:17 -0700 (PDT) Received: from yakj.usersys.redhat.com (net-37-117-156-129.cust.vodafonedsl.it. [37.117.156.129]) by mx.google.com with ESMTPSA id m42sm38893549eex.21.2014.04.01.01.27.15 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 01 Apr 2014 01:27:16 -0700 (PDT) Message-ID: <533A7861.1090603@gnu.org> Date: Tue, 01 Apr 2014 10:27:13 +0200 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Paul Eggert , 17081@debbugs.gnu.org Subject: Re: bug#17081: [PATCH] dfa: avoid undefined behavior References: <201403240606.s2O66Qcs004592@kiwi.cs.ucla.edu> In-Reply-To: <201403240606.s2O66Qcs004592@kiwi.cs.ucla.edu> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 17081 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Il 24/03/2014 07:04, Paul Eggert ha scritto: > * src/dfa.c (FETCH_WC, addtok_wc): Don't rely on undefined behavior > when converting an out-of-range value to 'int'. > (FETCH_WC, prepare_wc_buf): Don't rely on conversion state after > mbrtowc returns a special value, as it's undefined for (size_t) -1. > (prepare_wc_buf): Simplify test for valid character. > --- > src/dfa.c | 42 ++++++++++++++++++++++++------------------ > 1 file changed, 24 insertions(+), 18 deletions(-) > > diff --git a/src/dfa.c b/src/dfa.c > index 92ac1b9..0a2b8b8 100644 > --- a/src/dfa.c > +++ b/src/dfa.c > @@ -807,7 +807,7 @@ static int minrep, maxrep; /* Repeat counts for {m,n}. */ > static int cur_mb_len = 1; /* Length of the multibyte representation of > wctok. */ > /* These variables are used only if (MB_CUR_MAX > 1). */ > -static mbstate_t mbs; /* Mbstate for mbrlen. */ > +static mbstate_t mbs; /* mbstate for mbrtowc. */ > static wchar_t wctok; /* Wide character representation of the current > multibyte character. */ > static unsigned char *mblen_buf;/* Correspond to the input buffer in dfaexec. > @@ -844,15 +844,18 @@ static unsigned char const *buf_end; /* reference to end in dfaexec. */ > else \ > { \ > wchar_t _wc; \ > - cur_mb_len = mbrtowc (&_wc, lexptr, lexleft, &mbs); \ > - if (cur_mb_len <= 0) \ > + size_t nbytes = mbrtowc (&_wc, lexptr, lexleft, &mbs); \ > + bool valid_char = 1 <= nbytes && nbytes < (size_t) -2; \ I find these conditionals complicated to follow. In addition, a return value of nbytes == 0 is valid, so I believe you should have simply bool valid_char = nbytes < (size_t) -2; or better: > + if (! valid_char) \ if (nbytes >= (size_t) -2) > { \ > + memset (&mbs, 0, sizeof mbs); \ > cur_mb_len = 1; \ > --lexleft; \ > (wc) = (c) = to_uchar (*lexptr++); \ > } \ > else \ > { \ > + cur_mb_len = nbytes; \ > lexptr += cur_mb_len; \ > lexleft -= cur_mb_len; \ > (wc) = _wc; \ > @@ -1685,16 +1688,19 @@ static void > addtok_wc (wint_t wc) > { > unsigned char buf[MB_LEN_MAX]; > - mbstate_t s; > + mbstate_t s = { 0 }; > int i; > - memset (&s, 0, sizeof s); > - cur_mb_len = wcrtomb ((char *) buf, wc, &s); > + size_t stored_bytes = wcrtomb ((char *) buf, wc, &s); > > - /* This is merely stop-gap. When cur_mb_len is 0 or negative, > - buf[0] is undefined, yet skipping the addtok_mb call altogether > - can result in heap corruption. */ > - if (cur_mb_len <= 0) > - buf[0] = 0; > + if (stored_bytes != (size_t) -1) > + cur_mb_len = stored_bytes; > + else > + { > + /* This is merely stop-gap. buf[0] is undefined, yet skipping > + the addtok_mb call altogether can corrupt the heap. */ > + cur_mb_len = 1; > + buf[0] = 0; > + } > > addtok_mb (buf[0], cur_mb_len == 1 ? 3 : 1); > for (i = 1; i < cur_mb_len; i++) > @@ -3328,13 +3334,13 @@ prepare_wc_buf (const char *begin, const char *end) > { > if (remain_bytes == 0) > { > - remain_bytes > + size_t nbytes > = mbrtowc (inputwcs + i, begin + i, end - begin - i + 1, &mbs); > - if (remain_bytes < 1 > - || remain_bytes == (size_t) -1 > - || remain_bytes == (size_t) -2 > - || (remain_bytes == 1 && inputwcs[i] == (wchar_t) begin[i])) > + if (! (1 <= nbytes && nbytes < (size_t) -2) if (nbytes >= (size_t) -2) > + || (nbytes == 1 && inputwcs[i] == (wchar_t) begin[i])) > { > + if ((size_t) -2 <= nbytes) if (nbytes >= (size_t) -2) > + memset (&mbs, 0, sizeof mbs); > remain_bytes = 0; > inputwcs[i] = (wchar_t) begin[i]; > mblen_buf[i] = 0; > @@ -3343,8 +3349,8 @@ prepare_wc_buf (const char *begin, const char *end) > } > else > { > - mblen_buf[i] = remain_bytes; > - remain_bytes--; > + mblen_buf[i] = nbytes; > + remain_bytes = nbytes - 1; > } > } > else > From debbugs-submit-bounces@debbugs.gnu.org Tue Apr 01 04:42:51 2014 Received: (at 17081) by debbugs.gnu.org; 1 Apr 2014 08:42:51 +0000 Received: from localhost ([127.0.0.1]:58706 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WUuHS-0004Ee-2A for submit@debbugs.gnu.org; Tue, 01 Apr 2014 04:42:50 -0400 Received: from mail-we0-f175.google.com ([74.125.82.175]:63762) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WUuHN-0004ER-HR for 17081@debbugs.gnu.org; Tue, 01 Apr 2014 04:42:46 -0400 Received: by mail-we0-f175.google.com with SMTP id q58so5783393wes.6 for <17081@debbugs.gnu.org>; Tue, 01 Apr 2014 01:42:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:message-id:date:from:user-agent:mime-version:to:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=UyFYVxbJohOSqpXggwB/lKySK0DsUYzU9LCH/IiABVE=; b=wkBW57kH9Cu3bnfJcpfHM56p0IpUWkxxQcfXkIei4xBumtCCjRrmwP5tBPN/YCc3RU /ke/mXpPBHZQIf2tY9rsslTNQvKtu4XqTAyxm8Es3shVvAu5pzumsKH1/YyHMAVrevyP DKLldnidtxRchh5kuoAK/+YE70Je8v9tt76DyKJst4bm6UI0EgNfnB92/oqRBzRbsC8F wNTCKvaVVytvFXsaV57nrBNpjFdMw1nDfqsJP+bBRG9q4Hiqmyu9AVJZgl05pV8m2ozL YNqEBNUcoTvPl7brjFatln78VHbEsImd1Ehq5xHb3rBTqLdq9PkDJBHUXfUGYJjq9jFn OHBw== X-Received: by 10.180.98.165 with SMTP id ej5mr18152722wib.33.1396341764509; Tue, 01 Apr 2014 01:42:44 -0700 (PDT) Received: from yakj.usersys.redhat.com (net-37-117-156-129.cust.vodafonedsl.it. [37.117.156.129]) by mx.google.com with ESMTPSA id x45sm38990249eef.15.2014.04.01.01.42.42 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 01 Apr 2014 01:42:43 -0700 (PDT) Message-ID: <533A7C01.40208@gnu.org> Date: Tue, 01 Apr 2014 10:42:41 +0200 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Paul Eggert , 17081@debbugs.gnu.org Subject: Re: bug#17081: [PATCH] dfa: avoid undefined behavior References: <201403240606.s2O66Qcs004592@kiwi.cs.ucla.edu> <533A7861.1090603@gnu.org> In-Reply-To: <533A7861.1090603@gnu.org> X-Enigmail-Version: 1.6 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -0.7 (/) X-Debbugs-Envelope-To: 17081 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.7 (/) Il 01/04/2014 10:27, Paolo Bonzini ha scritto: > Il 24/03/2014 07:04, Paul Eggert ha scritto: >> * src/dfa.c (FETCH_WC, addtok_wc): Don't rely on undefined behavior >> when converting an out-of-range value to 'int'. >> (FETCH_WC, prepare_wc_buf): Don't rely on conversion state after >> mbrtowc returns a special value, as it's undefined for (size_t) -1. >> (prepare_wc_buf): Simplify test for valid character. >> --- >> src/dfa.c | 42 ++++++++++++++++++++++++------------------ >> 1 file changed, 24 insertions(+), 18 deletions(-) >> >> diff --git a/src/dfa.c b/src/dfa.c >> index 92ac1b9..0a2b8b8 100644 >> --- a/src/dfa.c >> +++ b/src/dfa.c >> @@ -807,7 +807,7 @@ static int minrep, maxrep; /* Repeat counts >> for {m,n}. */ >> static int cur_mb_len = 1; /* Length of the multibyte >> representation of >> wctok. */ >> /* These variables are used only if (MB_CUR_MAX > 1). */ >> -static mbstate_t mbs; /* Mbstate for mbrlen. */ >> +static mbstate_t mbs; /* mbstate for mbrtowc. */ >> static wchar_t wctok; /* Wide character representation of >> the current >> multibyte character. */ >> static unsigned char *mblen_buf;/* Correspond to the input buffer in >> dfaexec. >> @@ -844,15 +844,18 @@ static unsigned char const *buf_end; /* >> reference to end in dfaexec. */ >> else \ >> { \ >> wchar_t _wc; \ >> - cur_mb_len = mbrtowc (&_wc, lexptr, lexleft, &mbs); \ >> - if (cur_mb_len <= 0) \ >> + size_t nbytes = mbrtowc (&_wc, lexptr, lexleft, &mbs); \ >> + bool valid_char = 1 <= nbytes && nbytes < (size_t) -2; \ > > I find these conditionals complicated to follow. In addition, a return > value of nbytes == 0 is valid, so I believe you should have simply > > bool valid_char = nbytes < (size_t) -2; > > or better: > >> + if (! valid_char) \ > > if (nbytes >= (size_t) -2) I see this patch has been committed already. Can you please submit a followup? Paolo From debbugs-submit-bounces@debbugs.gnu.org Sat Apr 05 18:10:57 2014 Received: (at 17081) by debbugs.gnu.org; 5 Apr 2014 22:10:57 +0000 Received: from localhost ([127.0.0.1]:37144 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WWYng-0001JM-P9 for submit@debbugs.gnu.org; Sat, 05 Apr 2014 18:10:57 -0400 Received: from smtp.cs.ucla.edu ([131.179.128.62]:35381) by debbugs.gnu.org with esmtp (Exim 4.80) (envelope-from ) id 1WWYne-0001JC-BW for 17081@debbugs.gnu.org; Sat, 05 Apr 2014 18:10:55 -0400 Received: from localhost (localhost.localdomain [127.0.0.1]) by smtp.cs.ucla.edu (Postfix) with ESMTP id 914D039E801A; Sat, 5 Apr 2014 15:10:53 -0700 (PDT) X-Virus-Scanned: amavisd-new at smtp.cs.ucla.edu Received: from smtp.cs.ucla.edu ([127.0.0.1]) by localhost (smtp.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id r-vLuN3BmAIn; Sat, 5 Apr 2014 15:10:45 -0700 (PDT) Received: from [192.168.1.9] (pool-108-0-233-62.lsanca.fios.verizon.net [108.0.233.62]) by smtp.cs.ucla.edu (Postfix) with ESMTPSA id F2FB439E8015; Sat, 5 Apr 2014 15:10:44 -0700 (PDT) Message-ID: <53407F64.60001@cs.ucla.edu> Date: Sat, 05 Apr 2014 15:10:44 -0700 From: Paul Eggert Organization: UCLA Computer Science Department User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0 MIME-Version: 1.0 To: Paolo Bonzini , 17081@debbugs.gnu.org Subject: Re: bug#17081: [PATCH] dfa: avoid undefined behavior References: <201403240606.s2O66Qcs004592@kiwi.cs.ucla.edu> <533A7861.1090603@gnu.org> In-Reply-To: <533A7861.1090603@gnu.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 17081 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--) Paolo Bonzini wrote: \ >> + size_t nbytes = mbrtowc (&_wc, lexptr, lexleft, &mbs); \ >> + bool valid_char = 1 <= nbytes && nbytes < (size_t) -2; \ > > I find these conditionals complicated to follow. Yes, that identifier 'valid_char' was a confusing choice; as you noted, the character is valid even when nbytes is zero. > I believe you should have simply > > bool valid_char = nbytes < (size_t) -2; > > or better: > >> + if (! valid_char) \ > > if (nbytes >= (size_t) -2) That wouldn't do, because when mbrtowc returns 0 the caller still needs to advance the pointer by 1 to get past the null byte, just as it needs to advance by 1 if mbrtowc returns (size_t) -2 or (size_t) -1. > I see this patch has been committed already. Can you please submit a followup? There was a followup patch, in commit 2b9c57c, and the code's changed so that it no longer has a 'valid_char' local. Perhaps it's clear enough now. From unknown Mon Aug 18 02:37:05 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Sun, 04 May 2014 11:24:04 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator