From unknown Fri Jun 20 07:09:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales Resent-From: Stephane Chazelas Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sun, 20 Nov 2016 21:15:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 24973 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 24973@debbugs.gnu.org X-Debbugs-Original-To: bug-grep@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.147967648926090 (code B ref -1); Sun, 20 Nov 2016 21:15:01 +0000 Received: (at submit) by debbugs.gnu.org; 20 Nov 2016 21:14:49 +0000 Received: from localhost ([127.0.0.1]:36439 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8ZRp-0006mk-4O for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:14:49 -0500 Received: from eggs.gnu.org ([208.118.235.92]:36431) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8ZRm-0006mW-Np for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:14:46 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c8ZRg-0004jP-Pp for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:14:41 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:50845) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1c8ZRg-0004jL-Mr for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:14:40 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55632) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c8ZRf-0000Kt-OS for bug-grep@gnu.org; Sun, 20 Nov 2016 16:14:40 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c8ZRb-0004iU-IC for 
bug-grep@gnu.org; Sun, 20 Nov 2016 16:14:39 -0500 Received: from mail-wm0-x242.google.com ([2a00:1450:400c:c09::242]:36456) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1c8ZRb-0004iB-B1 for bug-grep@gnu.org; Sun, 20 Nov 2016 16:14:35 -0500 Received: by mail-wm0-x242.google.com with SMTP id m203so21848892wma.3 for ; Sun, 20 Nov 2016 13:14:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mime-version:content-disposition :content-transfer-encoding:user-agent; bh=/SESE52eKcLqwVZfybDjynahOPpzdrRc0Cocf+aNSKE=; b=o8Muhnevu0vHNBURRbnsq4XxC9iNejf8IqXgu6V+WbUCiW+Nj9pnbLFw1wgW/2pTjK 8mERjW8K84VRGxJWXp8S3h0Ur55Mw8Zewb6wjNOc6QPH+yneHeNrn1ebipm+XydDD24Y +IE1by/xSFCQZv2DBAbIqXHlIKgXL0bjlc2h5MY1dJPk9FZZT9HN76tZ6gMpRxIecbiK ksYPZ5ls1ylO2/IKcDtpqDAJVNn6B+4gqbMqMC/9ELRfy3BmnRjTYflYB8n257BQF/Z+ VJNiNFDn0Wv7TfxXzQoBUbPPIQlXuzAfOtLlWYh0dz3pJSMrzKcqhm10oRF95z2RVK5Z loPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:subject:message-id:mime-version :content-disposition:content-transfer-encoding:user-agent; bh=/SESE52eKcLqwVZfybDjynahOPpzdrRc0Cocf+aNSKE=; b=OebGkQWJX+BrbqrnAt9FG3pe2K3Qwp0N79gk1GGF5D/eq0UeGySX/VTNYwAJ2Bl9A3 VgJN+VX6T7WgcqrP4a9Vy+ZBtw31Zo2Tsts5Mv8dz01QdKRV6qUrK0oW79Zw3colYgHF 3reXoRz8qcBcLkNX7jNj8FewDPpAyQtSR6i7TBNigirhRyAD59AAcscbG5zmIiQYmnjU C8GtgCBbRGn1aFS6lH2fPcf5M+/SG6c97J1gEci5lwZ9CNi70Q6lcfGdQ7ZebBJebPTu McnBNV/5tGm5apGsb8X6n9hIoD4mLvrqEjgbEALJcUhapes5ERb9OWFWVxVPeryTqEXA uqIw== X-Gm-Message-State: AKaTC01RTWuSTY3PNSWee+o/0xXuIR17+xFdE2gYLVBFnJ9kWeZynUScDhsHzgfSKWJebQ== X-Received: by 10.194.201.103 with SMTP id jz7mr7589880wjc.70.1479676473416; Sun, 20 Nov 2016 13:14:33 -0800 (PST) Received: from chaz.gmail.com ([90.201.137.34]) by smtp.gmail.com with ESMTPSA id kq7sm21171767wjb.30.2016.11.20.13.14.32 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Sun, 20 Nov 
2016 13:14:32 -0800 (PST) Date: Sun, 20 Nov 2016 21:14:31 +0000 From: Stephane Chazelas Message-ID: <20161120211431.GC4814@chaz.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit User-Agent: Mutt/1.5.21 (2010-09-15) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----)

Hello,

In grep 2.26,

echo é | grep '[d-f]'

no longer matches in locales like fr_FR.iso885915@euro or
en_GB.iso88591 where the character set is single-byte like
ISO-8859-1. It still works OK with UTF-8.

2.25 was OK. git bisect points to commit
2769d5331a38d623b67b1860ac46b39ff7e54aca

Reproduce with:

printf '\351\n' | LC_ALL=en_US.iso88591 ./src/grep '[d-f]' || echo fail

(assuming that locale is available on the system).

Tested on Ubuntu 16.04 amd64.
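
The reproducer above only means something if the en_US.iso88591 locale is installed. A minimal sketch of a guarded variant follows. It rests on some assumptions: have_locale is a hypothetical helper (not part of grep or its test suite), it expects "locale -a" to list the name in the same spelling that is passed in, and it runs whichever grep is first in PATH instead of ./src/grep.

```sh
#!/bin/sh
# have_locale NAME: hypothetical helper; succeeds if `locale -a` lists NAME.
# The comparison is case-insensitive and on the whole line.  Spelling
# variants (iso88591 vs ISO8859-1) are NOT normalized.
have_locale() {
  locale -a 2>/dev/null | grep -F -i -x -q -- "$1"
}

loc=en_US.iso88591
if have_locale "$loc"; then
  # Same check as in the report: byte 0xe9 is e-acute in ISO-8859-1.
  # The matched line is discarded so no raw 8-bit byte reaches the terminal.
  printf '\351\n' | LC_ALL=$loc grep '[d-f]' >/dev/null || echo fail
else
  echo "skipped: locale $loc not available"
fi
```

On a system without the locale this prints a "skipped" line rather than a misleading "fail".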
-- Stephane From unknown Fri Jun 20 07:09:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales Resent-From: Stephane Chazelas Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sun, 20 Nov 2016 21:24:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24973 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 24973@debbugs.gnu.org X-Debbugs-Original-To: bug-grep@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.147967701127034 (code B ref -1); Sun, 20 Nov 2016 21:24:02 +0000 Received: (at submit) by debbugs.gnu.org; 20 Nov 2016 21:23:31 +0000 Received: from localhost ([127.0.0.1]:36446 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8ZaF-00071y-11 for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:23:31 -0500 Received: from eggs.gnu.org ([208.118.235.92]:37996) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8ZaE-00071l-FM for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:23:30 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c8Za8-0007JB-I3 for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:23:25 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_20,FREEMAIL_FROM, T_DKIM_INVALID autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:54703) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1c8Za8-0007J7-Ee for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:23:24 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57200) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c8Za7-0002YW-FL for bug-grep@gnu.org; Sun, 20 Nov 2016 16:23:24 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 
1c8Za3-0007IX-FN for bug-grep@gnu.org; Sun, 20 Nov 2016 16:23:23 -0500 Received: from mail-wj0-x22a.google.com ([2a00:1450:400c:c01::22a]:33299) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1c8Za3-0007IR-8b for bug-grep@gnu.org; Sun, 20 Nov 2016 16:23:19 -0500 Received: by mail-wj0-x22a.google.com with SMTP id xy5so19091995wjc.0 for ; Sun, 20 Nov 2016 13:23:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=b3G2A2BIh0DmX2TmfDokfrDyJrWYU1o5oriGLqeDKVo=; b=NOXQfm1O380Uk5rIMyuPPWhottPEm2xga9zZVqqrOyhYpO15s2VqMNF+LtsBjyS6W5 LwPDCQB64bmOLUa/t3uN1CFAdzoC7y2Rg6oiu562NU1vnZ2ePwzVswnEPjB/IgK9JYRH G6K86sbsye/gQszq7wsHyOOHXTO+sGNuiJw46bwgvs+BbgP5TtZc3ktP0w4aNxfjZwdH Bg59MLq5b+ej7vSOiS1gIf2l2nnCjM8MWiaMKBXx4B5enzLzV172XEXLezIPtQBE70gr cHJO9pDiTDbFst3Xjmp0mEeEKYftgtX/o0vCbG8GeQrYGSlAIUzNxk5dFPpcdx/8yZn3 i1TQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=b3G2A2BIh0DmX2TmfDokfrDyJrWYU1o5oriGLqeDKVo=; b=Pr0gG8nk5G9auAz/qLvq2Dd+RE3R6hnYrUdVjQuoZ67qMrmK/q2xqdmL/mOTLBE5na +bs8qhbffs2gOfFK349h8k0IcPe5IGAeVM8JNFRm2LT1MCtgGy/+sHWXQKVnV/ectnhA 1JHrhemCLXkx6rEiL2PoZGGbGBDUVMnnAMWIOLYfjK6gdp08sq4TwLGT+Q3s7TY2Iyro 6RdsBZBS6S4ATGydLfgKzk6PB4fv5hj9B8Spp0q9HnB5fRw5NlodaAIL7b/tjJ4x4f88 /pEWbd4rvPf8VUWCqmxts8ZGyCISfsS9gbQJKo43ivmFwQ6yvFAZuC2td+HgHLMGMFk5 I2iQ== X-Gm-Message-State: AKaTC02YHnRVcdCZSwozFPooIZTCBWtey44m23KlXDPA2FKGN0M8WnNwmu25BKtXa6z6mw== X-Received: by 10.194.26.133 with SMTP id l5mr6776678wjg.4.1479676998042; Sun, 20 Nov 2016 13:23:18 -0800 (PST) Received: from chaz.gmail.com ([90.201.137.34]) by smtp.gmail.com with ESMTPSA id q7sm21269461wjh.9.2016.11.20.13.23.17 for 
(version=TLS1_2 cipher=AES128-SHA bits=128/128); Sun, 20 Nov 2016 13:23:17 -0800 (PST) Date: Sun, 20 Nov 2016 21:23:16 +0000 From: Stephane Chazelas Message-ID: <20161120212316.GA25881@chaz.gmail.com> References: <20161120211431.GC4814@chaz.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20161120211431.GC4814@chaz.gmail.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -4.0 (----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -4.0 (----)

2016-11-20 21:14:31 +0000, Stephane Chazelas:
[...]
> echo é | grep '[d-f]'
>
> no longer matches in locales like fr_FR.iso885915@euro or
> en_GB.iso88591 where the character set is single-byte like
> ISO-8859-1. It still works OK with UTF-8.
[...]

It also seems to still be OK with other multi-byte locales like
zh_HK.big5hkscs:

$ locale charmap
BIG5-HKSCS
$ printf '\ue9' | ./src/grep '[d-f]' | hd
00000000  88 6d 0a                                          |.m.|
00000003

Though:

$ printf '\ue9' | ./src/grep '.*m' | hd
00000000  88 6d 0a                                          |.m.|

However, that seems to be a separate issue as it also failed in
earlier versions. I'll raise that separately.
-- Stephane From unknown Fri Jun 20 07:09:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales Resent-From: Dennis Clarke Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sun, 20 Nov 2016 21:39:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24973 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: 24973@debbugs.gnu.org, stephane.chazelas@gmail.com X-Debbugs-Original-To: bug-grep@gnu.org, stephane.chazelas@gmail.com Received: via spool by submit@debbugs.gnu.org id=B.147967792628473 (code B ref -1); Sun, 20 Nov 2016 21:39:01 +0000 Received: (at submit) by debbugs.gnu.org; 20 Nov 2016 21:38:46 +0000 Received: from localhost ([127.0.0.1]:36465 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8Zoz-0007PA-V4 for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:38:46 -0500 Received: from eggs.gnu.org ([208.118.235.92]:40419) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8Zoy-0007Ow-7s for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:38:44 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c8Zos-00036K-DV for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:38:39 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50 autolearn=disabled version=3.3.2 Received: from lists.gnu.org ([2001:4830:134:3::11]:32887) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1c8Zos-00036G-A9 for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:38:38 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59623) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c8Zor-0005ma-Aa for bug-grep@gnu.org; Sun, 20 Nov 2016 16:38:38 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) 
(envelope-from ) id 1c8Zoo-00035m-6i for bug-grep@gnu.org; Sun, 20 Nov 2016 16:38:37 -0500 Received: from atl4mhob12.myregisteredsite.com ([209.17.115.50]:59930) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1c8Zoo-00035h-0m for bug-grep@gnu.org; Sun, 20 Nov 2016 16:38:34 -0500 Received: from mailpod.hostingplatform.com ([10.30.77.35]) by atl4mhob12.myregisteredsite.com (8.14.4/8.14.4) with ESMTP id uAKLcVhW030180 for ; Sun, 20 Nov 2016 16:38:31 -0500 Received: (qmail 8341 invoked by uid 0); 20 Nov 2016 21:38:31 -0000 X-TCPREMOTEIP: 99.253.103.29 X-Authenticated-UID: dclarke@blastwave.org Received: from unknown (HELO ?172.16.35.41?) (dclarke@blastwave.org@99.253.103.29) by 0 with ESMTPA; 20 Nov 2016 21:38:30 -0000 References: <20161120211431.GC4814@chaz.gmail.com> From: Dennis Clarke Message-ID: <7175213a-c7a9-d98b-32d1-8160dc4fb6cd@blastwave.org> Date: Sun, 20 Nov 2016 16:38:29 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161120211431.GC4814@chaz.gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable X-MIME-Autoconverted: from 8bit to quoted-printable by atl4mhob12.myregisteredsite.com id uAKLcVhW030180 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x X-Received-From: 2001:4830:134:3::11 X-Spam-Score: -5.0 (-----) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -5.0 (-----)

On 11/20/2016 04:14 PM, Stephane Chazelas wrote:
> printf '\351\n' | LC_ALL=en_US.iso88591

On a Solaris 10 system the locales are named a bit different :

dasoyva_$ locale -a
C
POSIX
en_CA
en_CA.ISO8859-1
en_CA.UTF-8
en_US
en_US.ISO8859-1
en_US.ISO8859-15
en_US.ISO8859-15@euro
en_US.UTF-8
es
es_MX
es_MX.ISO8859-1
es_MX.UTF-8
fr
fr_CA
fr_CA.ISO8859-1
fr_CA.UTF-8

dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
0000000 e9 0a
0000002

I am not sure if the single byte 0xe9h is correct at all for this test.

dasoyva_$ LC_ALL=en_US.UTF-8 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
0000000 e9 0a
0000002

dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n'
�

Wonder how I would test this on a strict POSIX system here. Any thoughts?

Dennis
From unknown Fri Jun 20 07:09:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales Resent-From: Stephane CHAZELAS Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sun, 20 Nov 2016 22:07:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24973 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Dennis Clarke Cc: 24973@debbugs.gnu.org Received: via spool by 24973-submit@debbugs.gnu.org id=B24973.147967959931470 (code B ref 24973); Sun, 20 Nov 2016 22:07:01 +0000 Received: (at 24973) by debbugs.gnu.org; 20 Nov 2016 22:06:39 +0000 Received: from localhost ([127.0.0.1]:36482 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8aFy-0008BW-NM for submit@debbugs.gnu.org; Sun, 20 Nov 2016 17:06:38 -0500 Received: from mail-wm0-f67.google.com ([74.125.82.67]:35040) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8aFx-0008BH-3L for 24973@debbugs.gnu.org; Sun, 20 Nov 2016 17:06:37 -0500 Received: by mail-wm0-f67.google.com with SMTP id a20so22164073wme.2 for <24973@debbugs.gnu.org>; Sun, 20 Nov 2016 14:06:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; 
bh=PqPHylVfSq1Z3DH3eo9OR2ln7HFCA1vS9rfTLSpbsu8=; b=nDWS2pvTsD9oUNv+ht4mYOK0mn3AQam74kmZ3SolDD1OchnrD+nkje4h3SgC8dOwTt zzajy0ZCUc0H/f+GVgnWKENDfm/TxCfB3/LpM6kQao/NEaG9r1CNXGKlt9KIb9EgqbKu O38d22pYs97Cb9BgVZC0u5z0WN/ndr++BTk0xEHbx08HK+nWG36WvV8qWG1IDzjUC4dn wAgMra3yghPnWt2iKTus/7yNDo8xA2+mCnJD+agRj4WLFsHMrQydV2zZEC4iPoD3ycbF EKu9Lne/nkntkqmbzvBOTxIEIwcfSuDdIzGbKcmEx4EA6kYailO+vEfZPnNthONQ3ttA MuLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=PqPHylVfSq1Z3DH3eo9OR2ln7HFCA1vS9rfTLSpbsu8=; b=ApAshPNyXrIE27mtRNL8irLlcWgsrTFntUoHnFBlI15wPkTaD4NFaEnJUkIQOHhuVN CUKlS7TQEx4EyV3GgiAEUDEx13R0rkJZ5IgqCfHjStLOS1ypbeluOIOHOBWhQyWEp5M1 s8vK5Q2GG3qiWklXuMQa48gXd2Y30v5LYWNdDp526NGgxANYB1fwgDBhc0ZwNvxZKEPC sVjpezxwQ8mCNZ5Bc+RDvRvUUGceJkOg+yJfdUY3rS2khEkFow7DjB5VEYrHWA5n+D72 LW3DQILhu5CyOf+X0yHKrV0ZLac1XTrl8imOsn3lKOskdaUGN2wmHrbzioLnFm1sfPcR rOvg== X-Gm-Message-State: AKaTC01olJilLwEfmG3EK366NCbMaEDLgZ+9iDG171nout7ZpIaUZIWVQVsakd438DZQwg== X-Received: by 10.28.226.139 with SMTP id z133mr11140373wmg.139.1479679591455; Sun, 20 Nov 2016 14:06:31 -0800 (PST) Received: from chaz.gmail.com ([90.201.137.34]) by smtp.gmail.com with ESMTPSA id b15sm15985586wma.5.2016.11.20.14.06.30 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Sun, 20 Nov 2016 14:06:30 -0800 (PST) Date: Sun, 20 Nov 2016 22:06:29 +0000 From: Stephane CHAZELAS Message-ID: <20161120220629.GD4814@chaz.gmail.com> References: <20161120211431.GC4814@chaz.gmail.com> <7175213a-c7a9-d98b-32d1-8160dc4fb6cd@blastwave.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <7175213a-c7a9-d98b-32d1-8160dc4fb6cd@blastwave.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Score: 0.5 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 
Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: 0.5 (/)

2016-11-20 16:38:29 -0500, Dennis Clarke:
[...]
> On a Solaris 10 system the locales are named a bit different :
> dasoyva_$ locale -a
[...]
> en_US.ISO8859-1
> en_US.ISO8859-15
[...]
> dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
> 0000000 e9 0a
> 0000002
>
> I am not sure if the single byte 0xe9h is correct at all for this test.
[...]

Note that

printf '\351'

Will print the byte 0xe9 regardless of the locale. 0xe9 happens
to be the code point for é in ISO8859-1 and ISO8859-15.

> dasoyva_$ LC_ALL=en_US.UTF-8 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
> 0000000 e9 0a
> 0000002
>
> dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n'
> �
>
> Wonder how I would test this on a strict POSIX system here. Any thoughts?
[...]

POSIX leaves all that unspecified. It doesn't specify any locale
other than C/POSIX. It leaves '[d-f]' unspecified in locales
other than C/POSIX.

Here, the problem is a change of behaviour between GNU grep 2.25
and 2.26. (and 2.26 behaviour makes it inconsistent with other
GNU utilities). Both behaviours are POSIX compliant, since [d-f]
is unspecified anyway.

On your Solaris machine, you can check:

printf '\351\n' | LC_ALL=en_US.ISO8859-1 gnu-grep '[d-f]' | od -An -vtx1

And check if it's consistent with /usr/xpg4/bin/grep.
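
The remark that printf '\351' emits the same byte whatever the locale can be checked with a short sketch. bytes_in_locale is a hypothetical helper name, and only the C and POSIX locales are used here because they are the only ones guaranteed to exist; any other installed locale name could be passed instead.

```sh
#!/bin/sh
# bytes_in_locale LOCALE: hypothetical helper that prints, as hex with no
# separators, the bytes that printf '\351\n' produces under LC_ALL=LOCALE.
bytes_in_locale() {
  LC_ALL=$1 printf '\351\n' | od -An -tx1 | tr -d ' \t\n'
}

# Octal 351 is 0xe9; the trailing newline is 0x0a.
for loc in C POSIX; do
  echo "$loc: $(bytes_in_locale "$loc")"
done
```

Both lines should end in e90a, since the octal escape is a byte value and is not passed through any character-set conversion.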
-- Stephane From unknown Fri Jun 20 07:09:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales Resent-From: Stephane CHAZELAS Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Sun, 20 Nov 2016 22:23:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24973 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: Dennis Clarke Cc: 24973@debbugs.gnu.org Received: via spool by 24973-submit@debbugs.gnu.org id=B24973.1479680565608 (code B ref 24973); Sun, 20 Nov 2016 22:23:01 +0000 Received: (at 24973) by debbugs.gnu.org; 20 Nov 2016 22:22:45 +0000 Received: from localhost ([127.0.0.1]:36488 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8aVZ-00009k-4L for submit@debbugs.gnu.org; Sun, 20 Nov 2016 17:22:45 -0500 Received: from mail-wj0-f193.google.com ([209.85.210.193]:32818) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8aVY-00009V-4C for 24973@debbugs.gnu.org; Sun, 20 Nov 2016 17:22:44 -0500 Received: by mail-wj0-f193.google.com with SMTP id kp2so2646593wjc.0 for <24973@debbugs.gnu.org>; Sun, 20 Nov 2016 14:22:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=FGqAMA9E8+bNJ0+ob5JtdXzrNHOVe8F6+LIu0AHYcNs=; b=nmYWox9KNHtnTvtgCSjBfHF9QGYP7UmHNECp9eLtkBS1/n9/5RQbMtLVRsPzAa3ckU rSxkMvFr0rZMvLsZuvSWssoRv0tgvDgnOWEbW5g6rcr+Oob/5NBYd0oUgIYP/NFg6PDw MqTKUoZ86vwW7JUfbd4QAo9aEw/Qb1YJ5/734sIqCy71OVOCq6LNWm5m0J8Xg3cpCiWh w1bucW83xAqrbJG8+VtTAAVoobg5N7c5GYrddmXBC3Xlm7/cDUdNCLYTBuAf2ZXyYMXr Xnuy3YrUsPMiwIkobDWnHD2/Ebg4A0HRy3tBDknzh0/qT6TN65LJo0vxXf9JKId3LZ2L vI4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; 
h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to:user-agent; bh=FGqAMA9E8+bNJ0+ob5JtdXzrNHOVe8F6+LIu0AHYcNs=; b=dLExGE7nWWdI7U89p1PsWO8zBcelbv1j+nI1wK05JInlK6n2hXYYIEmw1xHYf0awBU XNkthK++5fSJjHUJWbm1jS+66bXU+OFEaJ2fTrjxOVNBd8GikxNcTDSpKgiOgxOTAk90 zdu4j8mwUZy7DBT1bBBMr000nPLh8omlU5mx2ismhFuHT33rXnW7krdzUaFaYSpkM8DJ QXjDORN7S8Esx2BjS9n9t/G46hwK9NkVCWkLHGOzuGa/0vx4pq7FRP33SCJnvG8+1I0r 04mq5Ia0CqCnz21c6lv62eK/4jxzcDc6nYmgWK6+PNRAXsWshqH2d0dY+UqpuFUM9/0l ClXQ== X-Gm-Message-State: AKaTC03tFwVAcce8abWDKogLAOE9Iro7okEqV8nyJecZcOqpZhhbPg+JytKmbexxVt3eAw== X-Received: by 10.194.52.74 with SMTP id r10mr6812296wjo.113.1479680558450; Sun, 20 Nov 2016 14:22:38 -0800 (PST) Received: from chaz.gmail.com ([90.201.137.34]) by smtp.gmail.com with ESMTPSA id 138sm15983931wms.20.2016.11.20.14.22.37 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Sun, 20 Nov 2016 14:22:37 -0800 (PST) Date: Sun, 20 Nov 2016 22:22:36 +0000 From: Stephane CHAZELAS Message-ID: <20161120222236.GE4814@chaz.gmail.com> References: <20161120211431.GC4814@chaz.gmail.com> <7175213a-c7a9-d98b-32d1-8160dc4fb6cd@blastwave.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <7175213a-c7a9-d98b-32d1-8160dc4fb6cd@blastwave.org> User-Agent: Mutt/1.5.21 (2010-09-15) X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.0 (/) 2016-11-20 16:38:29 -0500, Dennis Clarke: > On 11/20/2016 04:14 PM, Stephane Chazelas wrote: > >printf '\351\n' | LC_ALL=en_US.iso88591 > > On a Solaris 10 system [...] 
FWIW, on Solaris 11, it looks as if (speculated from very few tests) GNU grep's ranges ([x-y]) are only based on code point, both in 2.25 and 2.26 so [d-f] doesn't match é in any locale. Seems to behave like /bin/grep in that instance, not /usr/xpg4/bin/grep -- Stephane From unknown Fri Jun 20 07:09:58 2025 X-Loop: help-debbugs@gnu.org Subject: bug#24973: [PATCH] dfa: fix logic typo References: <20161120211431.GC4814@chaz.gmail.com> In-Reply-To: <20161120211431.GC4814@chaz.gmail.com> Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Mon, 21 Nov 2016 04:20:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 24973 X-GNU-PR-Package: grep X-GNU-PR-Keywords: To: bug-gnulib@gnu.org, 24973@debbugs.gnu.org, stephane.chazelas@gmail.com Cc: Paul Eggert Received: via spool by 24973-submit@debbugs.gnu.org id=B24973.14797019449785 (code B ref 24973); Mon, 21 Nov 2016 04:20:01 +0000 Received: (at 24973) by debbugs.gnu.org; 21 Nov 2016 04:19:04 +0000 Received: from localhost ([127.0.0.1]:36570 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8g4O-0002Xk-HL for submit@debbugs.gnu.org; Sun, 20 Nov 2016 23:19:04 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:48822) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8g4M-0002XE-U7 for 24973@debbugs.gnu.org; Sun, 20 Nov 2016 23:19:03 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id C31A41600A3; Sun, 20 Nov 2016 20:18:55 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id uTwZ7FSEppXM; Sun, 20 Nov 2016 20:18:55 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 09A721600A2; Sun, 20 Nov 2016 20:18:55 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from 
zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id uMTn_HMXB5Tb; Sun, 20 Nov 2016 20:18:54 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id DD4641600A0; Sun, 20 Nov 2016 20:18:54 -0800 (PST) From: Paul Eggert Date: Sun, 20 Nov 2016 20:18:38 -0800 Message-Id: <1479701918-7149-1-git-send-email-eggert@cs.ucla.edu> X-Mailer: git-send-email 2.7.4 X-Spam-Score: -2.9 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--)

Problem reported by Stephane Chazelas (Bug#24973).
* lib/dfa.c (using_simple_locale): Fix typo that caused some
non-simple locales like fr_FR to be treated as simple.
---
 ChangeLog | 7 +++++++
 lib/dfa.c | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 88139c3..fbdecf0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2016-11-20  Paul Eggert
+
+	dfa: fix logic typo
+	Problem reported by Stephane Chazelas (Bug#24973).
+	* lib/dfa.c (using_simple_locale): Fix typo that caused some
+	non-simple locales like fr_FR to be treated as simple.
+
 2016-11-20  Jim Meyering
 
 	fix test driver leaks: exclude, malloc, realloc
diff --git a/lib/dfa.c b/lib/dfa.c
index 744a9f1..7b80a1a 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -815,8 +815,8 @@ using_simple_locale (bool multibyte)
          && '}' == 125 && '~' == 126)
   };
 
-  if (native_c_charset && !multibyte)
-    return true;
+  if (!native_c_charset || multibyte)
+    return false;
   else
     {
       /* Treat C and POSIX locales as being compatible.  Also, treat
-- 
2.7.4

From unknown Fri Jun 20 07:09:58 2025 MIME-Version: 1.0 X-Mailer: MIME-tools 5.505 (Entity 5.505) X-Loop: help-debbugs@gnu.org From: help-debbugs@gnu.org (GNU bug Tracking System) To: Stephane Chazelas Subject: bug#24973: closed (Re: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales) Message-ID: References: <48683d86-36dc-7a02-4024-56870014b294@cs.ucla.edu> <20161120211431.GC4814@chaz.gmail.com> X-Gnu-PR-Message: they-closed 24973 X-Gnu-PR-Package: grep Reply-To: 24973@debbugs.gnu.org Date: Mon, 21 Nov 2016 04:35:02 +0000 Content-Type: multipart/mixed; boundary="----------=_1479702902-17668-1"

This is a multi-part message in MIME format...

------------=_1479702902-17668-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Your bug report

#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 24973@debbugs.gnu.org.

-- 
24973: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=24973
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1479702902-17668-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at 24973-done) by debbugs.gnu.org; 21 Nov 2016 04:34:45 +0000 Received: from localhost ([127.0.0.1]:36585 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8gJZ-0004aR-DG for submit@debbugs.gnu.org; Sun, 20 Nov 2016 23:34:45 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:49862) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8gJX-0004aB-FW for 24973-done@debbugs.gnu.org; Sun, 20 Nov 2016 23:34:44 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id E4E2B16009D; Sun, 20 Nov 2016 20:34:37 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id iqElSgNaub9g; Sun, 20 Nov 2016 20:34:36 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 882301600A0; Sun, 20 Nov 2016 20:34:36 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id Cj6uH4lHKrxr; Sun, 20 Nov 2016 20:34:36 -0800 (PST) Received: from [192.168.1.9] (unknown [47.153.178.162]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 65C0E16009D; Sun, 20 Nov 2016 20:34:36 -0800 (PST) Subject: Re: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales To: Stephane Chazelas , 24973-done@debbugs.gnu.org References: <20161120211431.GC4814@chaz.gmail.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <48683d86-36dc-7a02-4024-56870014b294@cs.ucla.edu> Date: Sun, 20 Nov 2016 20:34:35 -0800 User-Agent: 
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161120211431.GC4814@chaz.gmail.com> Content-Type: multipart/mixed; boundary="------------874ADBC0536C7473AF4CB26F" X-Spam-Score: -2.9 (--) X-Debbugs-Envelope-To: 24973-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.9 (--)

This is a multi-part message in MIME format.
--------------874ADBC0536C7473AF4CB26F
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Stephane Chazelas wrote:
> 2.25 was OK. git bisect points to commit
> 2769d5331a38d623b67b1860ac46b39ff7e54aca

Thanks for pinpointing the bug. It was my logic error in that commit. Fixed by
altering Gnulib as follows:

http://lists.gnu.org/archive/html/bug-gnulib/2016-11/msg00086.html

and by installing the attached patches into grep.
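
The logic error referred to here is the one corrected by the lib/dfa.c change posted earlier in this thread. It can be modelled roughly in shell. This is only a sketch under stated assumptions: the function names are hypothetical, the three arguments stand in for dfa.c's native_c_charset flag, its multibyte flag and the current locale name, and the else branch is assumed to accept only the C and POSIX locales, going by the "Treat C and POSIX locales as being compatible" comment visible in the patch context.

```sh
#!/bin/sh
# Hypothetical shell model of using_simple_locale before and after the patch.
# Arguments: $1 = native_c_charset (1 or 0), $2 = multibyte (1 or 0),
#            $3 = locale name.  Exit status 0 means "simple locale".

# Assumed content of the else branch: only C and POSIX count as simple.
locale_is_c_or_posix() {
  case $1 in
    C|POSIX) return 0 ;;
    *) return 1 ;;
  esac
}

# Before the patch: any unibyte locale on a native C charset returned
# true early, without looking at the locale name at all.
simple_before() {
  if [ "$1" = 1 ] && [ "$2" = 0 ]; then
    return 0
  fi
  locale_is_c_or_posix "$3"
}

# After the patch: a non-native charset or a multibyte locale is never
# simple; everything else must still pass the locale-name check.
simple_after() {
  if [ "$1" != 1 ] || [ "$2" != 0 ]; then
    return 1
  fi
  locale_is_c_or_posix "$3"
}

for loc in C fr_FR.iso88591; do
  simple_before 1 0 "$loc" && b=simple || b=not-simple
  simple_after 1 0 "$loc" && a=simple || a=not-simple
  echo "$loc: before=$b after=$a"
done
```

In this model the unibyte fr_FR.iso88591 case is "simple" before the patch and "not-simple" after it, which matches the commit message's description of non-simple locales like fr_FR being treated as simple.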
--------------874ADBC0536C7473AF4CB26F
Content-Type: text/x-diff; name="0001-build-update-gnulib-submodule-to-latest.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="0001-build-update-gnulib-submodule-to-latest.patch"

>From 00a6d71259ba8432db7eaa2729d215858c4c0cb3 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sun, 20 Nov 2016 20:21:06 -0800
Subject: [PATCH 1/2] build: update gnulib submodule to latest

---
 gnulib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnulib b/gnulib
index 3c72272..60e8ffc 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 3c72272268021349cbc9a442fe033e7ba13a0c17
+Subproject commit 60e8ffca02dd4eac3a87b744f4f9ef68f3dffa35
-- 
2.7.4

--------------874ADBC0536C7473AF4CB26F
Content-Type: text/x-diff; name="0002-tests-check-for-unibyte-French-range-bug.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="0002-tests-check-for-unibyte-French-range-bug.patch"

>From ed6228198180fedc728a4e2981939fa0c902bbf3 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sun, 20 Nov 2016 20:31:01 -0800
Subject: [PATCH 2/2] tests: check for unibyte French range bug

Problem reported by Stephane Chazelas (Bug#24973).
This bug was fixed in Gnulib.
* NEWS: Document the fix.
* tests/init.cfg (require_ru_RU_koi8_r): Remove.
* tests/unibyte-bracket-expr: Add a test for the bug.
Call get-mb-cur-max directly instead of bothering with
require_ru_RU_koi8_r.
---
 NEWS                       |  3 +++
 tests/init.cfg             |  9 -------
 tests/unibyte-bracket-expr | 58 ++++++++++++++++++++++++++++------------------
 3 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/NEWS b/NEWS
index 6138b48..bd1a201 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,9 @@ GNU grep NEWS -*- outline -*-
   >/dev/null" where PROGRAM dies when writing into a broken pipe.
   [bug introduced in grep-2.26]
 
+  grep no longer mishandles ranges in nontrivial unibyte locales.
+  [bug introduced in grep-2.26]
+
   grep -P no longer attempts multiline matches.  This works more
   intuitively with unusual patterns, and means that grep -Pz no longer
   rejects patterns containing ^ and $ and works when combined with -x.
diff --git a/tests/init.cfg b/tests/init.cfg
index 1677ec5..6c7abd2 100644
--- a/tests/init.cfg
+++ b/tests/init.cfg
@@ -74,15 +74,6 @@ require_tr_utf8_locale_()
   esac
 }
 
-require_ru_RU_koi8_r()
-{
-  path_prepend_ .
-  case $(get-mb-cur-max ru_RU.KOI8-R) in
-    1) ;;
-    *) skip_ 'ru_RU.KOI8-R locale not found' ;;
-  esac
-}
-
 require_compiled_in_MB_support()
 {
   require_en_utf8_locale_
diff --git a/tests/unibyte-bracket-expr b/tests/unibyte-bracket-expr
index 68c475c..85aff1c 100755
--- a/tests/unibyte-bracket-expr
+++ b/tests/unibyte-bracket-expr
@@ -1,9 +1,4 @@
 #!/bin/sh
-# Exercise a DFA range bug that arises only with a unibyte encoding
-# for which the wide-char-to-single-byte mapping is nontrivial.
-# E.g., the regexp, [C] would fail to match C in a unibyte locale like
-# ru_RU.KOI8-R for any C whose wide-char representation differed from
-# its single-byte equivalent.
 
 # Copyright (C) 2011-2016 Free Software Foundation, Inc.
 
@@ -21,23 +16,42 @@
 # along with this program. If not, see .
 
 . "${srcdir=.}/init.sh"; path_prepend_ ../src
-require_ru_RU_koi8_r
-LC_ALL=ru_RU.KOI8-R
-export LC_ALL
-
-fail=0
-
-i=128
-while :; do
-  in=in-$i
-  octal=$(printf '%03o' $i)
-  b=$(printf "\\$octal")
-  echo "$b" > $in || framework_failure_
-  grep "[$b]" $in > out || fail=1
-  compare out $in || fail=1
-
-  test $i = 255 && break
-  i=$(expr $i + 1)
+
+# Add "." to PATH for the use of get-mb-cur-max.
+path_prepend_ .
+
+# Exercise a DFA range bug that arises only with a unibyte encoding
+# for which the wide-char-to-single-byte mapping is nontrivial.
+# E.g., the regexp, [C] would fail to match C in a unibyte locale like
+# ru_RU.KOI8-R for any C whose wide-char representation differed from
+# its single-byte equivalent.
+
+case $(get-mb-cur-max ru_RU.KOI8-R) in
+  1)
+    fail=0
+
+    i=128
+    while :; do
+      in=in-$i
+      octal=$(printf '%03o' $i)
+      b=$(printf "\\$octal")
+      echo "$b" > $in || framework_failure_
+      LC_ALL=ru_RU.KOI8-R grep "[$b]" $in > out || fail=1
+      compare out $in || fail=1
+
+      test $i = 255 && break
+      i=$(expr $i + 1)
+    done;;
+esac
+
+# Exercise a DFA range bug where '[d-f]' did not match accented 'e' in a
+# unibyte French locale.
+
+for locale in fr_FR.iso88591 fr_FR.iso885915@euro fr_FR.ISO8859-1; do
+  case $(get-mb-cur-max $locale) in
+    1)
+      printf '\351\n' | LC_ALL=$locale grep '[d-f]' || fail=1;;
+  esac
 done
 
 Exit $fail
-- 
2.7.4

--------------874ADBC0536C7473AF4CB26F--

------------=_1479702902-17668-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Received: (at submit) by debbugs.gnu.org; 20 Nov 2016 21:14:49 +0000
Received: from localhost ([127.0.0.1]:36439 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8ZRp-0006mk-4O for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:14:49 -0500
Received: from eggs.gnu.org ([208.118.235.92]:36431) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1c8ZRm-0006mW-Np for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:14:46 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c8ZRg-0004jP-Pp for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:14:41 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level:
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM, T_DKIM_INVALID autolearn=disabled version=3.3.2
Received: from lists.gnu.org ([2001:4830:134:3::11]:50845) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1c8ZRg-0004jL-Mr for submit@debbugs.gnu.org; Sun, 20 Nov 2016 16:14:40 -0500
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55632) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id
 1c8ZRf-0000Kt-OS for bug-grep@gnu.org; Sun, 20 Nov 2016 16:14:40 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1c8ZRb-0004iU-IC for bug-grep@gnu.org; Sun, 20 Nov 2016 16:14:39 -0500
Received: from mail-wm0-x242.google.com ([2a00:1450:400c:c09::242]:36456) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1c8ZRb-0004iB-B1 for bug-grep@gnu.org; Sun, 20 Nov 2016 16:14:35 -0500
Received: by mail-wm0-x242.google.com with SMTP id m203so21848892wma.3 for ; Sun, 20 Nov 2016 13:14:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mime-version:content-disposition :content-transfer-encoding:user-agent; bh=/SESE52eKcLqwVZfybDjynahOPpzdrRc0Cocf+aNSKE=; b=o8Muhnevu0vHNBURRbnsq4XxC9iNejf8IqXgu6V+WbUCiW+Nj9pnbLFw1wgW/2pTjK 8mERjW8K84VRGxJWXp8S3h0Ur55Mw8Zewb6wjNOc6QPH+yneHeNrn1ebipm+XydDD24Y +IE1by/xSFCQZv2DBAbIqXHlIKgXL0bjlc2h5MY1dJPk9FZZT9HN76tZ6gMpRxIecbiK ksYPZ5ls1ylO2/IKcDtpqDAJVNn6B+4gqbMqMC/9ELRfy3BmnRjTYflYB8n257BQF/Z+ VJNiNFDn0Wv7TfxXzQoBUbPPIQlXuzAfOtLlWYh0dz3pJSMrzKcqhm10oRF95z2RVK5Z loPg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:subject:message-id:mime-version :content-disposition:content-transfer-encoding:user-agent; bh=/SESE52eKcLqwVZfybDjynahOPpzdrRc0Cocf+aNSKE=; b=OebGkQWJX+BrbqrnAt9FG3pe2K3Qwp0N79gk1GGF5D/eq0UeGySX/VTNYwAJ2Bl9A3 VgJN+VX6T7WgcqrP4a9Vy+ZBtw31Zo2Tsts5Mv8dz01QdKRV6qUrK0oW79Zw3colYgHF 3reXoRz8qcBcLkNX7jNj8FewDPpAyQtSR6i7TBNigirhRyAD59AAcscbG5zmIiQYmnjU C8GtgCBbRGn1aFS6lH2fPcf5M+/SG6c97J1gEci5lwZ9CNi70Q6lcfGdQ7ZebBJebPTu McnBNV/5tGm5apGsb8X6n9hIoD4mLvrqEjgbEALJcUhapes5ERb9OWFWVxVPeryTqEXA uqIw==
X-Gm-Message-State: AKaTC01RTWuSTY3PNSWee+o/0xXuIR17+xFdE2gYLVBFnJ9kWeZynUScDhsHzgfSKWJebQ==
X-Received: by 10.194.201.103 with SMTP id jz7mr7589880wjc.70.1479676473416; Sun, 20 Nov 2016 13:14:33 -0800 (PST)
Received: from chaz.gmail.com ([90.201.137.34]) by smtp.gmail.com with ESMTPSA id kq7sm21171767wjb.30.2016.11.20.13.14.32 for (version=TLS1_2 cipher=AES128-SHA bits=128/128); Sun, 20 Nov 2016 13:14:32 -0800 (PST)
Date: Sun, 20 Nov 2016 21:14:31 +0000
From: Stephane Chazelas
To: bug-grep@gnu.org
Subject: [regression] [d-f] no longer includes e with acute accent in single-byte locales
Message-ID: <20161120211431.GC4814@chaz.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.21 (2010-09-15)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.6.x
X-Received-From: 2001:4830:134:3::11
X-Spam-Score: -4.0 (----)
X-Debbugs-Envelope-To: submit
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: debbugs-submit-bounces@debbugs.gnu.org
Sender: "Debbugs-submit"
X-Spam-Score: -4.0 (----)

Hello,

In grep 2.26,

echo é | grep '[d-f]'

no longer matches in locales like fr_FR.iso885915@euro or
en_GB.iso88591 where the character set is single-byte like
ISO-8859-1. It still works OK with UTF-8.

2.25 was OK. git bisect points to commit
2769d5331a38d623b67b1860ac46b39ff7e54aca

Reproduce with:

printf '\351\n' | LC_ALL=en_US.iso88591 ./src/grep '[d-f]' || echo fail

(assuming that locale is available on the system).

Tested on Ubuntu 16.04 amd64.

-- 
Stephane

------------=_1479702902-17668-1--
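
The reproduction recipe in the report can be generalized to several
locales at once. What follows is only a sketch, not part of the
original thread or of grep's test suite. The helper name check_range
and the GREP variable are hypothetical; the locale names are the ones
mentioned in the report and in the patch. Installed locales are
detected with "locale -a", whose spelling of locale names may differ
between systems, so a locale can be reported as skipped even though an
equivalent one exists under another name.

```shell
#!/bin/sh
# Sketch: check whether '[d-f]' matches byte 0xE9 (e with acute accent
# in ISO-8859-1 and ISO-8859-15) in a few single-byte locales.
# GREP and check_range are illustrative names, not part of grep.

GREP=${GREP:-grep}

# check_range NAME
# Prints "ok" if the range matches, "fail" if it does not, and "skip"
# if "locale -a" does not list NAME exactly as given.
check_range() {
  name=$1
  if ! locale -a 2>/dev/null | grep -qxF "$name"; then
    echo skip
    return 0
  fi
  if printf '\351\n' | LC_ALL=$name "$GREP" '[d-f]' >/dev/null 2>&1; then
    echo ok
  else
    echo fail
  fi
}

for loc in fr_FR.iso88591 fr_FR.iso885915@euro en_GB.iso88591 en_US.iso88591
do
  printf '%s: %s\n' "$loc" "$(check_range "$loc")"
done
```

Setting GREP=./src/grep runs the check against a freshly built binary,
as in the report. A locale that is not installed is reported as "skip"
rather than "fail", mirroring how the patched test only runs when
get-mb-cur-max reports 1 for the locale.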