From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 20 16:14:49 2016
Date: Sun, 20 Nov 2016 21:14:31 +0000
From: Stephane Chazelas
To: bug-grep@gnu.org
Subject: [regression] [d-f] no longer includes e with acute accent in single-byte locales
Message-ID: <20161120211431.GC4814@chaz.gmail.com>

Hello,

In grep 2.26,

echo é | grep '[d-f]'

no longer matches in locales like fr_FR.iso885915@euro or
en_GB.iso88591 where the character set is single-byte like
ISO-8859-1. It still works OK with UTF-8.

2.25 was OK. git bisect points to commit
2769d5331a38d623b67b1860ac46b39ff7e54aca

Reproduce with:

printf '\351\n' | LC_ALL=en_US.iso88591 ./src/grep '[d-f]' || echo fail

(assuming that locale is available on the system).

Tested on Ubuntu 16.04 amd64.

-- 
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 20 16:23:31 2016
Date: Sun, 20 Nov 2016 21:23:16 +0000
From: Stephane Chazelas
To: bug-grep@gnu.org
Subject: Re: [regression] [d-f] no longer includes e with acute accent in single-byte locales
Message-ID: <20161120212316.GA25881@chaz.gmail.com>
In-Reply-To: <20161120211431.GC4814@chaz.gmail.com>

2016-11-20 21:14:31 +0000, Stephane Chazelas:
[...]
> echo é | grep '[d-f]'
>
> no longer matches in locales like fr_FR.iso885915@euro or
> en_GB.iso88591 where the character set is single-byte like
> ISO-8859-1. It still works OK with UTF-8.
[...]

It also seems to still be OK with other multi-byte locales like
zh_HK.big5hkscs:

$ locale charmap
BIG5-HKSCS
$ printf '\ue9' | ./src/grep '[d-f]' | hd
00000000 88 6d 0a |.m.|
00000003

Though:

$ printf '\ue9' | ./src/grep '.*m' | hd
00000000 88 6d 0a |.m.|

However, that seems to be a separate issue as it also failed in
earlier versions.
I'll raise that separately.

-- 
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 20 16:38:46 2016
Subject: Re: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales
To: bug-grep@gnu.org, stephane.chazelas@gmail.com
From: Dennis Clarke
Message-ID: <7175213a-c7a9-d98b-32d1-8160dc4fb6cd@blastwave.org>
Date: Sun, 20 Nov 2016 16:38:29 -0500
In-Reply-To: <20161120211431.GC4814@chaz.gmail.com>

On 11/20/2016 04:14 PM, Stephane Chazelas wrote:
> printf '\351\n' | LC_ALL=en_US.iso88591

On a Solaris 10 system the locales are named a bit different :

dasoyva_$ locale -a
C
POSIX
en_CA
en_CA.ISO8859-1
en_CA.UTF-8
en_US
en_US.ISO8859-1
en_US.ISO8859-15
en_US.ISO8859-15@euro
en_US.UTF-8
es
es_MX
es_MX.ISO8859-1
es_MX.UTF-8
fr
fr_CA
fr_CA.ISO8859-1
fr_CA.UTF-8
dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
0000000 e9 0a
0000002

I am not sure if the single byte 0xe9h is correct at all for this test.

dasoyva_$ LC_ALL=en_US.UTF-8 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
0000000 e9 0a
0000002

dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n'
�

Wonder how I would test this on a strict POSIX system here. Any thoughts?

Dennis

From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 20 17:06:38 2016
Date: Sun, 20 Nov 2016 22:06:29 +0000
From: Stephane CHAZELAS
To: Dennis Clarke
Cc: 24973@debbugs.gnu.org
Subject: Re: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales
Message-ID: <20161120220629.GD4814@chaz.gmail.com>
In-Reply-To: <7175213a-c7a9-d98b-32d1-8160dc4fb6cd@blastwave.org>

2016-11-20 16:38:29 -0500, Dennis Clarke:
[...]
> On a Solaris 10 system the locales are named a bit different :
> dasoyva_$ locale -a
[...]
> en_US.ISO8859-1
> en_US.ISO8859-15
[...]
> dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
> 0000000 e9 0a
> 0000002
>
> I am not sure if the single byte 0xe9h is correct at all for this test.
[...]

Note that

printf '\351'

will print the byte 0xe9 regardless of the locale.
0xe9 happens to be the code point for é in ISO8859-1 and ISO8859-15.

> dasoyva_$ LC_ALL=en_US.UTF-8 /usr/bin/printf '\351\n' | od -Ax -t x1 -v
> 0000000 e9 0a
> 0000002
>
> dasoyva_$ LC_ALL=en_US.ISO8859-1 /usr/bin/printf '\351\n'
> �
>
> Wonder how I would test this on a strict POSIX system here. Any thoughts?
[...]

POSIX leaves all that unspecified. It doesn't specify any locale
other than C/POSIX. It leaves '[d-f]' unspecified in locales other
than C/POSIX.

Here, the problem is a change of behaviour between GNU grep 2.25
and 2.26 (and the 2.26 behaviour makes it inconsistent with other
GNU utilities). Both behaviours are POSIX compliant, since [d-f] is
unspecified anyway.

On your Solaris machine, you can check:

printf '\351\n' | LC_ALL=en_US.ISO8859-1 gnu-grep '[d-f]' | od -An -vtx1

And check if it's consistent with /usr/xpg4/bin/grep.

-- 
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 20 17:22:45 2016
Date: Sun, 20 Nov 2016 22:22:36 +0000
From: Stephane CHAZELAS
To: Dennis Clarke
Cc: 24973@debbugs.gnu.org
Subject: Re: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales
Message-ID: <20161120222236.GE4814@chaz.gmail.com>
In-Reply-To: <7175213a-c7a9-d98b-32d1-8160dc4fb6cd@blastwave.org>

2016-11-20 16:38:29 -0500, Dennis Clarke:
> On 11/20/2016 04:14 PM, Stephane Chazelas wrote:
> >printf '\351\n' | LC_ALL=en_US.iso88591
>
> On a Solaris 10 system
[...]

FWIW, on Solaris 11, it looks as if (speculated from very few
tests) GNU grep's ranges ([x-y]) are only based on code point,
both in 2.25 and 2.26, so [d-f] doesn't match é in any locale.

Seems to behave like /bin/grep in that instance, not
/usr/xpg4/bin/grep

-- 
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 20 23:19:04 2016
From: Paul Eggert
To: bug-gnulib@gnu.org, 24973@debbugs.gnu.org, stephane.chazelas@gmail.com
Cc: Paul Eggert
Subject: [PATCH] dfa: fix logic typo
Date: Sun, 20 Nov 2016 20:18:38 -0800
Message-Id: <1479701918-7149-1-git-send-email-eggert@cs.ucla.edu>
X-Mailer: git-send-email 2.7.4

Problem reported by Stephane Chazelas (Bug#24973).
* lib/dfa.c (using_simple_locale): Fix typo that caused some
non-simple locales like fr_FR to be treated as simple.
---
 ChangeLog | 7 +++++++
 lib/dfa.c | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 88139c3..fbdecf0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2016-11-20  Paul Eggert
+
+	dfa: fix logic typo
+	Problem reported by Stephane Chazelas (Bug#24973).
+	* lib/dfa.c (using_simple_locale): Fix typo that caused some
+	non-simple locales like fr_FR to be treated as simple.
+
 2016-11-20  Jim Meyering
 
 	fix test driver leaks: exclude, malloc, realloc
diff --git a/lib/dfa.c b/lib/dfa.c
index 744a9f1..7b80a1a 100644
--- a/lib/dfa.c
+++ b/lib/dfa.c
@@ -815,8 +815,8 @@ using_simple_locale (bool multibyte)
             && '}' == 125 && '~' == 126)
   };
 
-  if (native_c_charset && !multibyte)
-    return true;
+  if (!native_c_charset || multibyte)
+    return false;
   else
     {
       /* Treat C and POSIX locales as being compatible.  Also, treat
-- 
2.7.4

From debbugs-submit-bounces@debbugs.gnu.org Sun Nov 20 23:34:45 2016
Subject: Re: bug#24973: [regression] [d-f] no longer includes e with acute accent in single-byte locales
To: Stephane Chazelas, 24973-done@debbugs.gnu.org
From: Paul Eggert
Organization: UCLA Computer Science Department
Message-ID: <48683d86-36dc-7a02-4024-56870014b294@cs.ucla.edu>
Date: Sun, 20 Nov 2016 20:34:35 -0800
In-Reply-To: <20161120211431.GC4814@chaz.gmail.com>
Content-Type: multipart/mixed; boundary="------------874ADBC0536C7473AF4CB26F"

This is a multi-part message in MIME format.
--------------874ADBC0536C7473AF4CB26F
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

Stephane Chazelas wrote:
> 2.25 was OK. git bisect points to commit
> 2769d5331a38d623b67b1860ac46b39ff7e54aca

Thanks for pinpointing the bug. It was my logic error in that commit.
Fixed by altering Gnulib as follows:

http://lists.gnu.org/archive/html/bug-gnulib/2016-11/msg00086.html

and by installing the attached patches into grep.

--------------874ADBC0536C7473AF4CB26F
Content-Type: text/x-diff; name="0001-build-update-gnulib-submodule-to-latest.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="0001-build-update-gnulib-submodule-to-latest.patch"

>From 00a6d71259ba8432db7eaa2729d215858c4c0cb3 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sun, 20 Nov 2016 20:21:06 -0800
Subject: [PATCH 1/2] build: update gnulib submodule to latest

---
 gnulib | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnulib b/gnulib
index 3c72272..60e8ffc 160000
--- a/gnulib
+++ b/gnulib
@@ -1 +1 @@
-Subproject commit 3c72272268021349cbc9a442fe033e7ba13a0c17
+Subproject commit 60e8ffca02dd4eac3a87b744f4f9ef68f3dffa35
-- 
2.7.4

--------------874ADBC0536C7473AF4CB26F
Content-Type: text/x-diff; name="0002-tests-check-for-unibyte-French-range-bug.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="0002-tests-check-for-unibyte-French-range-bug.patch"

>From ed6228198180fedc728a4e2981939fa0c902bbf3 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Sun, 20 Nov 2016 20:31:01 -0800
Subject: [PATCH 2/2] tests: check for unibyte French range bug

Problem reported by Stephane Chazelas (Bug#24973).
This bug was fixed in Gnulib.
* NEWS: Document the fix.
* tests/init.cfg (require_ru_RU_koi8_r): Remove.
* tests/unibyte-bracket-expr: Add a test for the bug.
Call get-mb-cur-max directly instead of bothering with
require_ru_RU_koi8_r.
---
 NEWS                       |  3 +++
 tests/init.cfg             |  9 -------
 tests/unibyte-bracket-expr | 58 ++++++++++++++++++++++++++++------------------
 3 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/NEWS b/NEWS
index 6138b48..bd1a201 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,9 @@ GNU grep NEWS -*- outline -*-
   >/dev/null" where PROGRAM dies when writing into a broken pipe.
   [bug introduced in grep-2.26]
 
+  grep no longer mishandles ranges in nontrivial unibyte locales.
+  [bug introduced in grep-2.26]
+
   grep -P no longer attempts multiline matches.  This works more
   intuitively with unusual patterns, and means that grep -Pz no longer
   rejects patterns containing ^ and $ and works when combined with -x.
diff --git a/tests/init.cfg b/tests/init.cfg
index 1677ec5..6c7abd2 100644
--- a/tests/init.cfg
+++ b/tests/init.cfg
@@ -74,15 +74,6 @@ require_tr_utf8_locale_()
   esac
 }
 
-require_ru_RU_koi8_r()
-{
-  path_prepend_ .
-  case $(get-mb-cur-max ru_RU.KOI8-R) in
-    1) ;;
-    *) skip_ 'ru_RU.KOI8-R locale not found' ;;
-  esac
-}
-
 require_compiled_in_MB_support()
 {
   require_en_utf8_locale_
diff --git a/tests/unibyte-bracket-expr b/tests/unibyte-bracket-expr
index 68c475c..85aff1c 100755
--- a/tests/unibyte-bracket-expr
+++ b/tests/unibyte-bracket-expr
@@ -1,9 +1,4 @@
 #!/bin/sh
-# Exercise a DFA range bug that arises only with a unibyte encoding
-# for which the wide-char-to-single-byte mapping is nontrivial.
-# E.g., the regexp, [C] would fail to match C in a unibyte locale like
-# ru_RU.KOI8-R for any C whose wide-char representation differed from
-# its single-byte equivalent.
 
 # Copyright (C) 2011-2016 Free Software Foundation, Inc.
 
@@ -21,23 +16,42 @@
 # along with this program.  If not, see .
 
 . "${srcdir=.}/init.sh"; path_prepend_ ../src
-require_ru_RU_koi8_r
-LC_ALL=ru_RU.KOI8-R
-export LC_ALL
-
-fail=0
-
-i=128
-while :; do
-  in=in-$i
-  octal=$(printf '%03o' $i)
-  b=$(printf "\\$octal")
-  echo "$b" > $in || framework_failure_
-  grep "[$b]" $in > out || fail=1
-  compare out $in || fail=1
-
-  test $i = 255 && break
-  i=$(expr $i + 1)
+
+# Add "." to PATH for the use of get-mb-cur-max.
+path_prepend_ .
+
+# Exercise a DFA range bug that arises only with a unibyte encoding
+# for which the wide-char-to-single-byte mapping is nontrivial.
+# E.g., the regexp, [C] would fail to match C in a unibyte locale like
+# ru_RU.KOI8-R for any C whose wide-char representation differed from
+# its single-byte equivalent.
+
+case $(get-mb-cur-max ru_RU.KOI8-R) in
+  1)
+    fail=0
+
+    i=128
+    while :; do
+      in=in-$i
+      octal=$(printf '%03o' $i)
+      b=$(printf "\\$octal")
+      echo "$b" > $in || framework_failure_
+      LC_ALL=ru_RU.KOI8-R grep "[$b]" $in > out || fail=1
+      compare out $in || fail=1
+
+      test $i = 255 && break
+      i=$(expr $i + 1)
+    done;;
+esac
+
+# Exercise a DFA range bug where '[d-f]' did not match accented 'e' in a
+# unibyte French locale.
+
+for locale in fr_FR.iso88591 fr_FR.iso885915@euro fr_FR.ISO8859-1; do
+  case $(get-mb-cur-max $locale) in
+    1)
+      printf '\351\n' | LC_ALL=$locale grep '[d-f]' || fail=1;;
+  esac
 done
 
 Exit $fail
-- 
2.7.4

--------------874ADBC0536C7473AF4CB26F--

From unknown Sat Jun 21 02:53:31 2025
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Mon, 19 Dec 2016 12:24:04 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator