From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 23 09:23:28 2020 Received: (at submit) by debbugs.gnu.org; 23 Sep 2020 13:23:28 +0000 Received: from localhost ([127.0.0.1]:34829 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kL4k4-0002yd-6w for submit@debbugs.gnu.org; Wed, 23 Sep 2020 09:23:28 -0400 Received: from lists.gnu.org ([209.51.188.17]:54900) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kL4k0-0002yQ-HA for submit@debbugs.gnu.org; Wed, 23 Sep 2020 09:23:26 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:32998) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kL4k0-0007C8-At for bug-grep@gnu.org; Wed, 23 Sep 2020 09:23:24 -0400 Received: from mailgw05.kcn.ne.jp ([61.86.7.212]:36096) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kL4jx-0003uc-Qj for bug-grep@gnu.org; Wed, 23 Sep 2020 09:23:23 -0400 Received: from mxs01-s (mailgw1.kcn.ne.jp [61.86.15.233]) by mailgw05.kcn.ne.jp (Postfix) with ESMTP id 0AB4C2079395 for ; Wed, 23 Sep 2020 22:23:13 +0900 (JST) X-matriXscan-loop-detect: 232c64a20622663dc07b16036fce9c7e91c056e2 Received: from mail13.kcn.ne.jp ([61.86.6.131]) by mxs01-s with ESMTP; Wed, 23 Sep 2020 22:23:11 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail13.kcn.ne.jp (Postfix) with ESMTPA id E702F40E242A for ; Wed, 23 Sep 2020 22:23:10 +0900 (JST) Date: Wed, 23 Sep 2020 22:23:09 +0900 From: Norihiro Tanaka To: Subject: wrong result for grep -io in turkish locale Message-Id: <20200923222309.5BA2.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 
2.75.01 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized Received-SPF: pass client-ip=61.86.7.212; envelope-from=noritnk@kcn.ne.jp; helo=mailgw05.kcn.ne.jp X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/23 09:23:13 X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy] X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: submit X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--)

In the Turkish locale, upper and lower case letters are mapped as follows.

U0049 <-> U0131
U0069 <-> U0130

Both of the following test cases are expected to output U0130, but the
latter outputs nothing.

$ printf '\304\260\n' >I # U0130
$ env LC_ALL=tr_TR.utf8 grep -i i I
? # U0130
$ env LC_ALL=tr_TR.utf8 grep -oi i I
$

By the way, both of the following test cases work correctly.

$ printf '\304\261\n' >i # U0131
$ env LC_ALL=tr_TR.utf8 grep -i I i
? # U0131
$ env LC_ALL=tr_TR.utf8 grep -oi I i
? # U0131
$

From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 23 10:31:19 2020 Received: (at 43577) by debbugs.gnu.org; 23 Sep 2020 14:31:19 +0000 Received: from localhost ([127.0.0.1]:37417 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kL5ni-00065E-Ou for submit@debbugs.gnu.org; Wed, 23 Sep 2020 10:31:19 -0400 Received: from mail-wm1-f54.google.com ([209.85.128.54]:39689) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kL5nf-00064v-St for 43577@debbugs.gnu.org; Wed, 23 Sep 2020 10:31:16 -0400 Received: by mail-wm1-f54.google.com with SMTP id b79so270689wmb.4 for <43577@debbugs.gnu.org>; Wed, 23 Sep 2020 07:31:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=HsTVlKBrG+bNxhPSVe2KZZexI77cmiN8VYWqXQEDf14=; b=jGYjlGqMwpfBoHoNSmj3nOoPvWvq1P4vN+iVyBlju0+mYOmIMBTNsF9xUGB2Iy3jLL wHrYK631vZOAHKyEd0v1DUbzrkjlsoFet2fusRWSJAoeehraQ4CUnPybGePGD4V2zLcX IuNuQedhnRPoK1V0JYKaNPLcF8lZMJSXedQAyLCdSODSjppTGxkE1fSozpy8M6FNqClU q3YTCUTosKTGzQUHbJtw5fZGdgcJx+0HnxEpr2jB49QM9t2embcz2v8/3y6wOuM1zOxm ylFES0nQBy/XYGuA1jMMJDJE0kGJObPNnRJl3Avxj3LQNL9zjVB2IHW0pH4q6AN3EKZn 77eg== X-Gm-Message-State: AOAM530uoa9ju0ShoePSruxrDKBXmcTUMVkcvrXLG/aEubaSXzvXyeJo 151opjH2B9twvNAlHx5sylvq0OfQL9Rr3NHb+ss= X-Google-Smtp-Source: ABdhPJzQMdA1bhzc+Mgehh70S166Cf8knK7qv6uMBXvkPL7/XRD3uBIuuH9GVxuJS80B0yUhM3820JCduFG7MeTMnRU= X-Received: by 2002:a1c:5f46:: with SMTP id t67mr6710434wmb.71.1600871469974; Wed, 23 Sep 2020 07:31:09 -0700 (PDT) MIME-Version: 1.0 References: <20200923222309.5BA2.27F6AC2D@kcn.ne.jp> In-Reply-To: <20200923222309.5BA2.27F6AC2D@kcn.ne.jp> From: Jim Meyering Date: Wed, 23 Sep 2020 07:30:58 -0700 Message-ID: Subject: Re: bug#43577: wrong result for grep -io in turkish locale To: Norihiro Tanaka Content-Type: text/plain; charset="UTF-8" X-Spam-Score: 0.5 (/) X-Debbugs-Envelope-To: 43577
Cc: 43577@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -0.5 (/)

On Wed, Sep 23, 2020 at 6:24 AM Norihiro Tanaka wrote:
>
> In turkish locale, upper and lower case are mapped as following.
>
> U0049 <-> U0131
> U0069 <-> U0130
>
> It's expected that both following test cases returns U0130, but later
> returns nothing.
>
> $ printf '\304\260\n' >I # U0130
> $ env LC_ALL=tr_TR.utf8 grep -i i I
> ? # U0130

Oh! We must have different code or systems.
When I run anything using -i and that locale on Fedora 32, it aborts:

$ LC_ALL=tr_TR.utf8 src/grep -i a
zsh: abort (core dumped) LC_ALL=tr_TR.utf8 src/grep -i a

From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 23 14:57:36 2020 Received: (at 43577) by debbugs.gnu.org; 23 Sep 2020 18:57:36 +0000 Received: from localhost ([127.0.0.1]:37871 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kL9xQ-0006up-II for submit@debbugs.gnu.org; Wed, 23 Sep 2020 14:57:36 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:37304) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kL9xN-0006uR-Cg for 43577@debbugs.gnu.org; Wed, 23 Sep 2020 14:57:35 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 39F74160104; Wed, 23 Sep 2020 11:57:27 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 4msFBFlcnrLt; Wed, 23 Sep 2020 11:57:26 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 82735160108; Wed, 23 Sep 2020 11:57:26 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu
[127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 94qU8qNTbC_Z; Wed, 23 Sep 2020 11:57:26 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-243-218-95.socal.res.rr.com [23.243.218.95]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 170C51600F3; Wed, 23 Sep 2020 11:57:26 -0700 (PDT) Subject: Re: bug#43577: wrong result for grep -io in turkish locale To: Jim Meyering , Norihiro Tanaka References: <20200923222309.5BA2.27F6AC2D@kcn.ne.jp> From: Paul Eggert Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= LS0tLS1CRUdJTiBQR1AgUFVCTElDIEtFWSBCTE9DSy0tLS0tCgptUUlOQkV5QWNtUUJFQURB QXlIMnhvVHU3cHBHNUQzYThGTVpFb243NGRDdmM0K3ExWEEySjJ0QnkycHdhVHFmCmhweHhk R0E5Smo1MFVKM1BENGJTVUVnTjh0TFowc2FuNDdsNVhUQUZMaTI0NTZjaVNsNW04c0thSGxH ZHQ5WG0KQUF0bVhxZVpWSVlYL1VGUzk2ZkR6ZjR4aEVtbS95N0xiWUVQUWRVZHh1NDd4QTVL aFRZcDVibHRGM1dZRHoxWQpnZDdneDA3QXV3cDdpdzdlTnZub0RUQWxLQWw4S1lEWnpiRE5D UUdFYnBZM2VmWkl2UGRlSStGV1FONFcra2doCnkrUDZhdTZQcklJaFlyYWV1YTdYRGRiMkxT MWVuM1NzbUUzUWpxZlJxSS9BMnVlOEpNd3N2WGUvV0szOEV6czYKeDc0aVRhcUkzQUZINmls QWhEcXBNbmQvbXNTRVNORnQ3NkRpTzFaS1FNcjlhbVZQa25qZlBtSklTcWRoZ0IxRApsRWR3 MzRzUk9mNlY4bVp3MHhmcVQ2UEtFNDZMY0ZlZnpzMGtiZzRHT1JmOHZqRzJTZjF0azVlVThN Qml5Ti9iClowM2JLTmpOWU1wT0REUVF3dVA4NGtZTGtYMndCeHhNQWhCeHdiRFZadWR6eERa SjFDMlZYdWpDT0pWeHEya2wKakJNOUVUWXVVR3FkNzVBVzJMWHJMdzYrTXVJc0hGQVlBZ1Jy NytLY3dEZ0JBZndoUEJZWDM0blNTaUhsbUxDKwpLYUhMZUNMRjVaSTJ2S20zSEVlQ1R0bE9n N3haRU9OZ3d6TCtmZEtvK0Q2U29DOFJSeEpLczhhM3NWZkk0dDZDCm5yUXp2SmJCbjZneGRn Q3U1aTI5SjFRQ1lyQ1l2cWwyVXlGUEFLK2RvOTkvMWpPWFQ0bTI4MzZqMXdBUkFRQUIKdENC UVlYVnNJRVZuWjJWeWRDQThaV2RuWlhKMFFHTnpMblZqYkdFdVpXUjFQb2tDVlFRVEFRZ0FQ d0liQXdZTApDUWdIQXdJR0ZRZ0NDUW9MQkJZQ0F3RUNIZ0VDRjRBV0lRUitONUtwMkt6MzFq TzhGWWp0bCtrT1lxcCtOQVVDClh5Vzlsd1VKRks0THN3QUtDUkR0bCtrT1lxcCtOS05WRC85 SE1zSTE2MDZuMFV1VFhId0lUc3lPakFJOVNET1QKK0MzRFV2NnFsTTVCSDJuV0FNVGlJaXlB NXVnbHNKdjkzb2kydk50RmYvUS9tLzFjblpXZ25WbkV4a3lMSTRFTgpTZDF1QnZyMC9sQ1Nk UGxQME1nNkdXU3BYTXUreDB2ZFQwQWFaTk9URTBGblB1b2xkYzNYRDc2QzJxZzhzWC9pCmF4 
WFRLSHk5UCtCbEFxL0NzNy9weERRMEV6U24wVVNaMkMwbDV2djRQTXBBL3BpY25TNks2MDlK dkRHYU9SbXcKWmVYSVpxUU5aVitaUXMrVVl0Vm9ndURUcWJ5M0lVWTFJOEJsWEhScHRhajlB TW40VW9oL0NxcFFsVm9qb3lXbApIcWFGbm5KQktlRjBodko5U0F5YWx3dXpBakc3dlFXMDdN WW5jYU9GbTB3b2lLYmc1SkxPOEY0U0JUSWt1TzBECkNmOW5MQWF5NlZzQjRyendkRWZSd2pQ TFlBbjdNUjNmdkhDRXpmcmtsZFRyYWlCTzFUMGllREs4MEk3c0xmNnAKTWVDWUkxOXBVbHgw L05STUdDZGRpRklRZGZ0aEtXWEdSUzVMQXM4andCZjhINkc1UFdpblByRUlhb21JUDIxaQp2 dWhRRDA3YllxOUlpSWRlbGpqVWRIY0dJMGkvQjRNNTZaYWE4RmYzOGluaU9sckRZQ21ZV1I0 ZENXWml1UWVaCjNPZ3FlUXM5YTZqVHZnZERHVm1SVnFZK2p6azhQbGFIZmNvazhST2hGY0hL a2NmaHVCaEwyNWhsUklzaFJET0UKc2tYcUt3bnpyYnFnYTNHWFpYZnNYQW9GYnpOaExkTHY5 QStMSkFZU2tYUDYvNXFkVHBFTFZHb3N5SDg4NFZkYgpCcGtHSTA0b1lWcXVsYmtDRFFSTWdI SmtBUkFBcG9YcnZ4UDNESWZqQ05PdFhVL1Bkd01TaEtkWC9SbFNzNVBmCnVuVjF3YktQOGhl clhIcnZRZEZWcUVDYVRTeG1saHpiazhYMFBrWTlnY1ZhVTJPNDlUM3FzT2QxY0hlRjUyWUYK R0V0MExoc0JlTWpnTlg1dVoxVjc2cjhneWVWbEZwV1diMFNJd0pVQkhyRFhleEY2N3VwZVJi MnZkSEJqWUROZQp5U24rMEI3Z0ZFcXZWbVp1K0xhZHVkRHA2a1FMamF0RnZIUUhVU0dOc2hC bmtrY2FUYmlJOVBzdDBHQ2MyYWl6Cm5CaVBQQTJXUXhBUGxQUmgzT0dUc241VEhBRG1ianFZ NkZFTUxhc1ZYOERTQ2JsTXZMd05lTy84U3h6aUJpZGgKcUxwSkNxZFFSV0hrdTVYeGdJa0dl S096NU9MRHZYSFdKeWFmckVZamprUzZBazZCNXo2c3ZLbGlDbFduakhRYwpqbFB6eW9GRmdL VEVmY3FEeENqNFJZMEQwRGd0RkQwTmZ5ZU9pZHJTQi9TelRlMmh3cnlRRTNycFNpcW8rMGNH CmR6aDR5QUhLWUorVXJYWjRwOTNaaGpHZktEMXhsck5ZRGxXeVc5UEdtYnZxRnVEbWlJQVFm OVdEL3d6RWZJQ2MKK0YrdURESSt1WWtSeFVGcDkyeWttZGhERUZnMXlqWXNVOGlHVTY5YUh5 dmhxMzZ6NHpjdHZicWhSTnpPV0IxYgpWSi9kSU1EdnNFeEdjWFFWRElUN3NETlh2MHdFM2pL U0twcDdOREcxb1hVWEwrMitTRjk5S2p5NzUzQWJRU0FtCkg2MTdmeUJOd2hKV3ZRWWcrbVV2 UHBpR090c2VzOUVYVUkzbFM0djBNRWFQRzQzZmxFczFVUisxcnBGUVdWSG8KMXkxT08rc0FF UUVBQVlrQ1BBUVlBUWdBSmdJYkRCWWhCSDQza3FuWXJQZldNN3dWaU8yWDZRNWlxbjQwQlFK ZgpKYjJ6QlFrVXJndlBBQW9KRU8yWDZRNWlxbjQwY25NUC8xN0NnVWtYVDlhSUpyaVBNOHdi Y2VZcmNsNytiZFlFCmY3OVNsd1NiYkhON1I0Q29JSkZPbE45Uy8zNHR5cEdWWXZwZ21DSkRZ RlRCeHlQTzkyaU1YRGdBNCtjV0h6dDUKVDFhWU85aHNLaGg3dkR0Sys2UHJvWkdjKzA4Z1VU 
WEhoYjk3aE1NUWhrbkpsbmZqcFNFQzllbTkwNkZVK0k5MwpUMWZUR3VwbkJhM2FXY0s4ak0w SmFCR2J5MmhHMVMzb2xhRExTVHRCSU5OQlltdnVXUjlNS09oaHFEcmxrNWN3CkZESkxoNU5y WHRlRVkwOFdBemNMekczcGtyWFBIa0ZlTVF0ZnFrMGpMZEdHdkdDM05DSWtxWXJkTGhpUnZH cHIKdTM4QzI2UkVuNWY0STB2R0UzVmZJWEhlOFRNQ05tUXV0MU50TXVVbXBESXkxYUx4R3p1 cHRVaG5PSk4vL3IrVgpqRFBvaTNMT3lTTllwaHFlL2RNdWJzZlVyNm9oUDQxbUtGODFGdXdJ NGFtcUp0cnFJTDJ5cWF4M2EwcWxmd0N4ClhmdGllcUpjdWVrWCtlQ1BEQ0tyWU1YUjBGWWd3 cEcySVRaVUd0ckVqRVNsRTZEc2N4NzM0SEtkcjVPUklvY0wKVVVLRU9HZWlVNkRHaEdGZGI1 VHd1MFNuK3UxbVVQRE4wTSsrQ2RNdkNsSUU4a2xvNEc5MUVPSW11MVVwYjh4YwpPUFF3eGgx andxU3JVNVF3b05tU1llZ1FTSExwSVV1ckZ6MWlRVWgxdnBQWHpLaW5rV0VxdjRJcUExY2lM K0x5CnlTdUxrcDdNc0pwVlJNYldKQ05XT09TYmFING9EQko1ZEhNR2MzNXg1bW9zQ2s5MFBY a251RkREc1lIZkRvNXMKbWY5bG82WVh4N045Cj0zTGFJCi0tLS0tRU5EIFBHUCBQVUJMSUMg S0VZIEJMT0NLLS0tLS0K Organization: UCLA Computer Science Department Message-ID: <79342811-0ea9-512f-45d3-1c2eea3248f3@cs.ucla.edu> Date: Wed, 23 Sep 2020 11:57:25 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 43577 Cc: 43577@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) On 9/23/20 7:30 AM, Jim Meyering wrote: > $ LC_ALL=tr_TR.utf8 src/grep -i a > zsh: abort (core dumped) LC_ALL=tr_TR.utf8 src/grep -i a I can reproduce this bug. There seems to be a performance regression too. I'll look into it. 
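
For reference, the byte lengths behind this report can be checked without a Turkish locale installed. The following is a minimal sketch, assuming a POSIX shell and UTF-8 as the encoding; nbytes is a helper invented for this illustration and is not part of grep. It shows that ASCII 'i' and 'I' are one byte each, while their Turkish case counterparts U+0130 and U+0131 are two bytes each, so ignoring case can pair a single-byte character with a multibyte one.

```shell
#!/bin/sh
# Sketch only: count the UTF-8 bytes of the letters discussed in this bug.
# "nbytes" is a helper invented for this example; it is not part of grep.

# Print the number of bytes that printf emits for the format string $1.
# Octal escapes such as \304\260 are expanded by printf itself.
nbytes () {
  printf "$1" | wc -c | tr -d '[:space:]'
}

echo "i (U+0069): $(nbytes 'i') byte(s)"
echo "I (U+0049): $(nbytes 'I') byte(s)"
echo "U+0130 (capital I with dot above): $(nbytes '\304\260') byte(s)"
echo "U+0131 (small dotless i): $(nbytes '\304\261') byte(s)"
```

The octal escapes are the same ones used with printf in the original report, so the script itself stays pure ASCII and does not depend on the terminal's encoding.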
From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 23 21:47:45 2020 Received: (at 43577) by debbugs.gnu.org; 24 Sep 2020 01:47:45 +0000 Received: from localhost ([127.0.0.1]:38212 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kLGML-0001j0-Gd for submit@debbugs.gnu.org; Wed, 23 Sep 2020 21:47:45 -0400 Received: from mailgw07.kcn.ne.jp ([61.86.7.214]:39606) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kLGMH-0001ij-8S for 43577@debbugs.gnu.org; Wed, 23 Sep 2020 21:47:44 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw07.kcn.ne.jp (Postfix) with ESMTP id AD0334100B for <43577@debbugs.gnu.org>; Thu, 24 Sep 2020 10:47:33 +0900 (JST) X-matriXscan-loop-detect: 862ea557a998b1a892a8011d958b67f873cf8b0e Received: from mail10.kcn.ne.jp ([61.86.6.128]) by mxs02-s with ESMTP; Thu, 24 Sep 2020 10:47:32 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail10.kcn.ne.jp (Postfix) with ESMTPA id B82E14125BB6 for <43577@debbugs.gnu.org>; Thu, 24 Sep 2020 10:47:32 +0900 (JST) Date: Thu, 24 Sep 2020 10:47:31 +0900 From: Norihiro Tanaka To: 43577@debbugs.gnu.org Subject: Re: bug#43577: wrong result for grep -io in turkish locale In-Reply-To: <20200923222309.5BA2.27F6AC2D@kcn.ne.jp> References: <20200923222309.5BA2.27F6AC2D@kcn.ne.jp> Message-Id: <20200924104724.1F2A.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5F6BFA2D000000001F21_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 
2.75.01 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-Spam-Score: 0.0 (/) X-Debbugs-Envelope-To: 43577 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -1.0 (-) --------_5F6BFA2D000000001F21_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit I attach the fix for the bug. Regex is fixed in Paul, thank you. --------_5F6BFA2D000000001F21_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-grep-fix-ignore-case-Turkish-bug.patch" Content-Disposition: attachment; filename="0001-grep-fix-ignore-case-Turkish-bug.patch" Content-Transfer-Encoding: base64 RnJvbSA4ODRjNDZhYWRiZTZhMmY3MjAzZjg0ZDQxNzNhNTE1Y2E0Y2NmOGRlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBUaHUsIDI0IFNlcCAyMDIwIDEwOjM5OjQ2ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZ3Jl cDogZml4IGlnbm9yZS1jYXNlIFR1cmtpc2ggYnVnCgoqIHNyYy9ncmVwLmMgKGZncmVwX2ljYXNl X2NoYXJsZW4pOiBEbyBub3QgYXNzdW1lIHRoYXQgY29udmVydGluZyBzaW5nbGUtYnl0ZQpjaGFy YWN0ZXIgdG8gdXBwZXIgeWllbGRzIGEgc2luZ2xlLWJ5dGUgY2hhcmFjdGVyLgoqIHRlc3RzL3R1 cmtpc2gtZXllczogQWRkIG5ldyB0ZXN0IGNhc2VzIGZvciB0aGlzIGNoYW5nZXMuCi0tLQogc3Jj L2dyZXAuYyAgICAgICAgIHwgICAyOSArKysrKysrKysrKysrKystLS0tLS0tLS0tLS0tLQogdGVz dHMvdHVya2lzaC1leWVzIHwgICAxNiArKysrKysrKysrKysrKysrCiAyIGZpbGVzIGNoYW5nZWQs IDMxIGluc2VydGlvbnMoKyksIDE0IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL3NyYy9ncmVw LmMgYi9zcmMvZ3JlcC5jCmluZGV4IDE0NTNiMTQuLjFlZmFmM2IgMTAwNjQ0Ci0tLSBhL3NyYy9n cmVwLmMKKysrIGIvc3JjL2dyZXAuYwpAQCAtMjMxMCwyMyArMjMxMCwyNCBAQCBjb250YWluc19l bmNvZGluZ19lcnJvciAoY2hhciBjb25zdCAqcGF0LCBzaXplX3QgcGF0bGVuKQogc3RhdGljIGlu dAogZmdyZXBfaWNhc2VfY2hhcmxlbiAoY2hhciBjb25zdCAqcGF0LCBzaXplX3QgcGF0bGVuLCBt 
YnN0YXRlX3QgKm1icykKIHsKLSAgaW50IG4gPSBsb2NhbGVpbmZvLnNiY2xlblt0b191Y2hhciAo KnBhdCldOwotICBpZiAobiA8IDApCisgIGlmICghbG9jYWxlaW5mby5tdWx0aWJ5dGUpCisgICAg cmV0dXJuIGxvY2FsZWluZm8uc2JjbGVuW3RvX3VjaGFyICgqcGF0KV07CisKKyAgd2NoYXJfdCB3 YzsKKyAgc2l6ZV90IHduID0gbWJydG93YyAoJndjLCBwYXQsIHBhdGxlbiwgbWJzKTsKKyAgd2No YXJfdCBmb2xkZWRbQ0FTRV9GT0xERURfQlVGU0laRV07CisKKyAgaWYgKE1CX0xFTl9NQVggPCB3 biB8fCBjYXNlX2ZvbGRlZF9jb3VudGVycGFydHMgKHdjLCBmb2xkZWQpKQorICAgIHJldHVybiAt MTsKKworICBmb3IgKGludCBpID0gd247IDAgPCAtLWk7ICkKICAgICB7Ci0gICAgICB3Y2hhcl90 IHdjOwotICAgICAgd2NoYXJfdCBmb2xkZWRbQ0FTRV9GT0xERURfQlVGU0laRV07Ci0gICAgICBz aXplX3Qgd24gPSBtYnJ0b3djICgmd2MsIHBhdCwgcGF0bGVuLCBtYnMpOwotICAgICAgaWYgKE1C X0xFTl9NQVggPCB3biB8fCBjYXNlX2ZvbGRlZF9jb3VudGVycGFydHMgKHdjLCBmb2xkZWQpKQor ICAgICAgdW5zaWduZWQgY2hhciBjID0gcGF0W2ldOworICAgICAgaWYgKHRvdXBwZXIgKGMpICE9 IGMpCiAgICAgICAgIHJldHVybiAtMTsKLSAgICAgIGZvciAoaW50IGkgPSB3bjsgMCA8IC0taTsg KQotICAgICAgICB7Ci0gICAgICAgICAgdW5zaWduZWQgY2hhciBjID0gcGF0W2ldOwotICAgICAg ICAgIGlmICh0b3VwcGVyIChjKSAhPSBjKQotICAgICAgICAgICAgcmV0dXJuIC0xOwotICAgICAg ICB9Ci0gICAgICBuID0gd247CiAgICAgfQotICByZXR1cm4gbjsKKworICByZXR1cm4gd247CiB9 CiAKIC8qIFJldHVybiB0cnVlIGlmIHRoZSAtRiBwYXR0ZXJucyBQQVQsIG9mIHNpemUgUEFUTEVO LCBjb250YWluIG9ubHkKZGlmZiAtLWdpdCBhL3Rlc3RzL3R1cmtpc2gtZXllcyBiL3Rlc3RzL3R1 cmtpc2gtZXllcwppbmRleCBiYTFlYTMzLi5kMWU3OTc4IDEwMDc1NQotLS0gYS90ZXN0cy90dXJr aXNoLWV5ZXMKKysrIGIvdGVzdHMvdHVya2lzaC1leWVzCkBAIC00Myw0ICs0MywyMCBAQCBmb3Ig b3B0IGluIC1FIC1GIC1HOyBkbwogICBjb21wYXJlIG91dCBpbiB8fCBmYWlsPTEKIGRvbmUKIAor cHJpbnRmICIkSVxuIiA+IGluIHx8IGZyYW1ld29ya19mYWlsdXJlXworc2VhcmNoX3N0cj1pCisK K2ZvciBvcHQgaW4gLUUgLUYgLUc7IGRvCisgIExDX0FMTD0kTCBncmVwICRvcHQgLWlvICIkc2Vh cmNoX3N0ciIgaW4gPiBvdXQgfHwgZmFpbD0xCisgIGNvbXBhcmUgb3V0IGluIHx8IGZhaWw9MQor ZG9uZQorCitwcmludGYgIiRpXG4iID4gaW4gfHwgZnJhbWV3b3JrX2ZhaWx1cmVfCitzZWFyY2hf c3RyPUkKKworZm9yIG9wdCBpbiAtRSAtRiAtRzsgZG8KKyAgTENfQUxMPSRMIGdyZXAgJG9wdCAt 
aW8gIiRzZWFyY2hfc3RyIiBpbiA+IG91dCB8fCBmYWlsPTEKKyAgY29tcGFyZSBvdXQgaW4gfHwg ZmFpbD0xCitkb25lCisKIEV4aXQgJGZhaWwKLS0gCjEuNy4xCgo=
--------_5F6BFA2D000000001F21_MULTIPART_MIXED_--

From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 23 22:57:49 2020 Received: (at 43577-done) by debbugs.gnu.org; 24 Sep 2020 02:57:49 +0000 Received: from localhost ([127.0.0.1]:38294 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kLHS8-0003MD-Jq for submit@debbugs.gnu.org; Wed, 23 Sep 2020 22:57:49 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:55954) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kLHS5-0003Ly-47 for 43577-done@debbugs.gnu.org; Wed, 23 Sep 2020 22:57:48 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id DFCF1160089; Wed, 23 Sep 2020 19:57:38 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id LfqVLYOZsGVl; Wed, 23 Sep 2020 19:57:37 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id EA4BC160104; Wed, 23 Sep 2020 19:57:36 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id KHFGgav-Ja-y; Wed, 23 Sep 2020 19:57:36 -0700 (PDT) Received: from [192.168.1.9] (cpe-23-243-218-95.socal.res.rr.com [23.243.218.95]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id AD74A160089; Wed, 23 Sep 2020 19:57:36 -0700 (PDT) Subject: Re: bug#43577: wrong result for grep -io in turkish locale To: Norihiro Tanaka References: <20200923222309.5BA2.27F6AC2D@kcn.ne.jp> <20200924104724.1F2A.27F6AC2D@kcn.ne.jp> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <566c67b3-062e-d648-2dff-15f8c4b08e36@cs.ucla.edu> Date: Wed, 23 Sep 2020 19:57:36 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <20200924104724.1F2A.27F6AC2D@kcn.ne.jp> Content-Type: multipart/mixed; boundary="------------9329C13DFCA7463C2ABADCB5" Content-Language: en-US X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 43577-done Cc: 43577-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

This is a multi-part message in MIME format.
--------------9329C13DFCA7463C2ABADCB5 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit

On 9/23/20 6:47 PM, Norihiro Tanaka wrote:
> I attach the fix for the bug. Regex is fixed in Paul, thank you.
>

Thanks, I had written a similar patch, and your patch helped me find a bug in
what I wrote.

The patch I wrote uses an auxiliary ok_fold table that lets
fgrep_icase_charlen avoid calling mbrtowc for single-byte characters in the
pattern; this may help performance for long patterns. More important,
fgrep_icase_charlen does not return -1 for a character like 'a' in an
en_US.UTF-8 locale merely because 'a' has a case folded counterpart 'A'; the
idea is that we should be OK if the case folded counterparts are single-byte.

I had added more-extensive tests than were in your patch, and some of them
found a crash in kwsinit that indicated a similar change is needed there.
I assume this was because the patch I wrote had a more-generous
fgrep_icase_charlen. As this simplifies kwsinit, this patch does that too.

While looking into this I found a performance glitch I recently introduced (I
double-counted some regular expressions, messing up later heuristics). Plus I
checked on this on our old Solaris 10 box and fixed a couple of porting
glitches.

I installed the attached patches, into the master branch, to help make it
easier for you to compare your changes to mine. Patch 0003 is the enhanced
version of the patch that you wrote.

Thanks again for working on this.

--------------9329C13DFCA7463C2ABADCB5 Content-Type: text/x-patch; charset=UTF-8; name="0001-grep-fix-recently-introduced-performance-glitch.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename*0="0001-grep-fix-recently-introduced-performance-glitch.patch" >From 545bd506efcd6cab4f28c07a438868f14b7dc1d2 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 23 Sep 2020 10:52:12 -0700 Subject: [PATCH 1/5] grep: fix recently-introduced performance glitch * src/grep.c (main): Do not double-increment update_patterns. update_patterns increments n_patterns now; do not increment it again, as the incorrect count would hurt performance heuristics later.
--- src/grep.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/grep.c b/src/grep.c index 1453b14..11856d8 100644 --- a/src/grep.c +++ b/src/grep.c @@ -2881,7 +2881,6 @@ main (int argc, char **argv) ptrdiff_t patlen = strlen (keys); keys[patlen] = '\n'; keycc = update_patterns (keys, 0, patlen + 1, ""); - n_patterns++; } else usage (EXIT_TROUBLE); -- 2.17.1 --------------9329C13DFCA7463C2ABADCB5 Content-Type: text/x-patch; charset=UTF-8; name="0002-build-update-gnulib-submodule-to-latest.patch" Content-Disposition: attachment; filename="0002-build-update-gnulib-submodule-to-latest.patch" Content-Transfer-Encoding: quoted-printable >From 4af448a142b1f78be4920d2bd2aedd2b748a1289 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 23 Sep 2020 17:07:36 -0700 Subject: [PATCH 2/5] build: update gnulib submodule to latest * NEWS: Mention Bug#43577, which this fixes. --- NEWS | 6 ++++++ gnulib | 2 +- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index 442d4d2..36e423d 100644 --- a/NEWS +++ b/NEWS @@ -32,6 +32,12 @@ GNU grep NEWS -*- o= utline -*- constituent just before what would otherwise be a word match. [Bug#43225 introduced in grep 2.28] =20 + grep -i no longer mishandles ASCII characters that match multibyte + characters. For example, 'LC_ALL=3Dtr_TR.utf8 grep -i i' no longer + dumps core merely because 'i' matches '=C4=B0' (U+0130 LATIN CAPITAL + LETTER I WITH DOT ABOVE) in Turkish when ignoring case. + [Bug#43577 introduced in grep 3.4] + A performance regression with -E and many patterns has been mostly fix= ed. "Mostly" as there is a performance tradeoff between Bug#22357 and Bug#= 40634. 
[Bug#40634 introduced in grep 2.28] diff --git a/gnulib b/gnulib index 4a3aec7..0c487ff 160000 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit 4a3aec702f994f3a16e4bc6c51f2c0ae3dd76a02 +Subproject commit 0c487ff1286660c4d572c3277e73ac6618ba832d --=20 2.17.1 --------------9329C13DFCA7463C2ABADCB5 Content-Type: text/x-patch; charset=UTF-8; name="0003-grep-fix-more-Turkish-eyes-bugs.patch" Content-Disposition: attachment; filename="0003-grep-fix-more-Turkish-eyes-bugs.patch" Content-Transfer-Encoding: quoted-printable >From 678f829c869059cd9cb0fe38b87880ef0a78d210 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 23 Sep 2020 18:57:57 -0700 Subject: [PATCH 3/5] grep: fix more Turkish-eyes bugs Fix more bugs recently uncovered by Norihiro Tanaka (Bug#43577). * NEWS: Mention new bug report. * src/grep.c (ok_fold): New static var. (setup_ok_fold): New function. (fgrep_icase_charlen): Reject single-byte characters if they match some multibyte characters when ignoring case. This part of the patch is partly derived from , which means it is: Co-authored-by: Norihiro Tanaka (main): Call setup_ok_fold if ok_fold might be needed. * src/searchutils.c (kwsinit): With the grep.c changes, this code can now revert to classic 7th Edition Unix style; aborting would be wrong. * tests/turkish-eyes: Add tests for these bugs. --- NEWS | 2 +- src/grep.c | 116 +++++++++++++++++++++++++++++++-------------- src/searchutils.c | 23 ++------- tests/turkish-eyes | 18 +++++-- 4 files changed, 102 insertions(+), 57 deletions(-) diff --git a/NEWS b/NEWS index 36e423d..ab00ff2 100644 --- a/NEWS +++ b/NEWS @@ -36,7 +36,7 @@ GNU grep NEWS -*- ou= tline -*- characters. For example, 'LC_ALL=3Dtr_TR.utf8 grep -i i' no longer dumps core merely because 'i' matches '=C4=B0' (U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE) in Turkish when ignoring case. 
- [Bug#43577 introduced in grep 3.4] + [Bug#43577 introduced partly in grep 2.28 and partly in grep 3.4] =20 A performance regression with -E and many patterns has been mostly fix= ed. "Mostly" as there is a performance tradeoff between Bug#22357 and Bug#= 40634. diff --git a/src/grep.c b/src/grep.c index 11856d8..1a52c89 100644 --- a/src/grep.c +++ b/src/grep.c @@ -2300,37 +2300,75 @@ contains_encoding_error (char const *pat, size_t = patlen) return false; } =20 +/* When ignoring case and (-E or -F or -G), then for each single-byte + character I, ok_fold[I] is 1 if every case folded counterpart of I + is also single-byte, and is -1 otherwise. */ +static signed char ok_fold[NCHAR]; +static void +setup_ok_fold (void) +{ + for (int i =3D 0; i < NCHAR; i++) + { + wint_t wi =3D localeinfo.sbctowc[i]; + if (wi =3D=3D WEOF) + continue; + + int ok =3D 1; + wchar_t folded[CASE_FOLDED_BUFSIZE]; + for (int n =3D case_folded_counterparts (wi, folded); 0 <=3D --n; = ) + { + char buf[MB_LEN_MAX]; + mbstate_t s =3D { 0 }; + if (wcrtomb (buf, folded[n], &s) !=3D 1) + { + ok =3D -1; + break; + } + } + ok_fold[i] =3D ok; + } +} + /* Return the number of bytes in the initial character of PAT, of size PATLEN, if Fcompile can handle that character. Return -1 if Fcompile cannot handle it. MBS is the multibyte conversion state. - - Fcompile can handle a character C if C is single-byte, or if C has no - case folded counterparts and toupper translates none of its bytes. *= / + PATLEN must be nonzero. */ =20 static int fgrep_icase_charlen (char const *pat, size_t patlen, mbstate_t *mbs) { - int n =3D localeinfo.sbclen[to_uchar (*pat)]; - if (n < 0) + unsigned char pat0 =3D pat[0]; + + /* If PAT starts with a single-byte character, Fcompile works if + every case folded counterpart is also single-byte. 
*/ + if (localeinfo.sbctowc[pat0] != WEOF) + return ok_fold[pat0]; + + wchar_t wc; + size_t wn = mbrtowc (&wc, pat, patlen, mbs); + + /* If PAT starts with an encoding error, Fcompile does not work. */ + if (MB_LEN_MAX < wn) + return -1; + + /* PAT starts with a multibyte character. Fcompile works if the + character has no case folded counterparts and toupper translates + none of its encoding's bytes. */ + wchar_t folded[CASE_FOLDED_BUFSIZE]; + if (case_folded_counterparts (wc, folded)) + return -1; + for (int i = wn; 0 < --i; ) { - wchar_t wc; - wchar_t folded[CASE_FOLDED_BUFSIZE]; - size_t wn = mbrtowc (&wc, pat, patlen, mbs); - if (MB_LEN_MAX < wn || case_folded_counterparts (wc, folded)) + unsigned char c = pat[i]; + if (toupper (c) != c) return -1; - for (int i = wn; 0 < --i; ) - { - unsigned char c = pat[i]; - if (toupper (c) != c) - return -1; - } - n = wn; } - return n; + return wn; } /* Return true if the -F patterns PAT, of size PATLEN, contain only - single-byte characters or characters not subject to case folding, + single-byte characters that case-fold only to single-byte + characters, or multibyte characters not subject to case folding, and so can be processed by Fcompile. */ static bool @@ -2950,26 +2988,34 @@ main (int argc, char **argv) if (matcher < 0) matcher = G_MATCHER_INDEX; - /* In a single-byte locale, switch from -F to -G if it is a single - pattern that matches words, where -G is typically faster. In a - multi-byte locale, switch if the patterns have an encoding error - (where -F does not work) or if -i and the patterns will not work - for -iF. */ if (matcher == F_MATCHER_INDEX - && (! localeinfo.multibyte - ?
n_patterns == 1 && match_words - : (contains_encoding_error (keys, keycc) - || (match_icase && !fgrep_icase_available (keys, keycc))))) + || matcher == E_MATCHER_INDEX || matcher == G_MATCHER_INDEX) { - fgrep_to_grep_pattern (&pattern_array, &keycc); - keys = pattern_array; - matcher = G_MATCHER_INDEX; + if (match_icase) + setup_ok_fold (); + + /* In a single-byte locale, switch from -F to -G if it is a single + pattern that matches words, where -G is typically faster. In a + multibyte locale, switch if the patterns have an encoding error + (where -F does not work) or if -i and the patterns will not work + for -iF. */ + if (matcher == F_MATCHER_INDEX) + { + if (! localeinfo.multibyte + ? n_patterns == 1 && match_words + : (contains_encoding_error (keys, keycc) + || (match_icase && !fgrep_icase_available (keys, keycc)))) + { + fgrep_to_grep_pattern (&pattern_array, &keycc); + keys = pattern_array; + matcher = G_MATCHER_INDEX; + } + } + /* With two or more patterns, if -F works then switch from either -E + or -G, as -F is probably faster then. */ + else if (1 < n_patterns) + matcher = try_fgrep_pattern (matcher, keys, &keycc); } - /* With two or more patterns, if -F works then switch from either -E - or -G, as -F is probably faster then.
*/ - else if ((matcher == G_MATCHER_INDEX || matcher == E_MATCHER_INDEX) - && 1 < n_patterns) - matcher = try_fgrep_pattern (matcher, keys, &keycc); execute = matchers[matcher].execute; compiled_pattern = diff --git a/src/searchutils.c b/src/searchutils.c index c4bb802..aa11063 100644 --- a/src/searchutils.c +++ b/src/searchutils.c @@ -48,24 +48,11 @@ kwsinit (bool mb_trans) if (match_icase && (MB_CUR_MAX == 1 || mb_trans)) { trans = xmalloc (NCHAR); - if (MB_CUR_MAX == 1) - for (int i = 0; i < NCHAR; i++) - trans[i] = toupper (i); - else - for (int i = 0; i < NCHAR; i++) - { - wint_t wc = localeinfo.sbctowc[i]; - wint_t uwc = towupper (wc); - if (uwc != wc) - { - mbstate_t mbs = { 0 }; - size_t len = wcrtomb (&trans[i], uwc, &mbs); - if (len != 1) - abort (); - } - else - trans[i] = i; - } + /* If I is a single-byte character that becomes a different + single-byte character when uppercased, set trans[I] + to that character. Otherwise, set trans[I] to I.
*/ + for (int i = 0; i < NCHAR; i++) + trans[i] = toupper (i); } return kwsalloc (trans); diff --git a/tests/turkish-eyes b/tests/turkish-eyes index ba1ea33..879b59d 100755 --- a/tests/turkish-eyes +++ b/tests/turkish-eyes @@ -36,11 +36,23 @@ i=$(printf '\304\261') # lowercase dotless i data="I:$I $i:i" search_str="$i:i I:$I" -printf "$data\n" > in || framework_failure_ +printf "$data\\n" > in || framework_failure_ for opt in -E -F -G; do - LC_ALL=$L grep $opt -i "$search_str" in > out || fail=1 - compare out in || fail=1 + for pat in i I "$i" "$I" " " : "$search_str"; do + LC_ALL=$L grep $opt -i "$pat" in > out || fail=1 + compare in out || fail=1 + + case $pat in + i|"$I") printf "$I\\ni\\n";; + I|"$i") printf "I\\n$i\\n";; + :) printf ":\\n:\\n";; + ' ') printf " \\n";; + *) cat in;; + esac >exp || framework_failure_ + LC_ALL=$L grep -o $opt -i "$pat" in > out || fail=1 + compare exp out || fail=1 + done done Exit $fail -- 2.17.1 --------------9329C13DFCA7463C2ABADCB5 Content-Type: text/x-patch; charset=UTF-8; name="0004-grep-pacify-Sun-C-5.15.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0004-grep-pacify-Sun-C-5.15.patch" >From ee6b62007dcbf860f204fbc6921a4d0af74845c3 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 23 Sep 2020 19:04:01 -0700 Subject: [PATCH 4/5] grep: pacify Sun C 5.15 This suppresses a false alarm '"grep.c", line 720: warning: initializer will be sign-extended: -1'. * src/grep.c (uword_max): New static constant. (initialize_unibyte_mask): Use it. --- src/grep.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/grep.c b/src/grep.c index 1a52c89..de7616a 100644 --- a/src/grep.c +++ b/src/grep.c @@ -684,6 +684,7 @@ clean_up_stdout (void) /* An unsigned type suitable for fast matching.
*/ typedef uintmax_t uword; +static uword const uword_max = UINTMAX_MAX; struct localeinfo localeinfo; @@ -717,7 +718,6 @@ initialize_unibyte_mask (void) /* Now MASK will detect any encoding-error byte, although it may cry wolf and it may not be optimal. Build a uword-length mask by repeating MASK. */ - uword uword_max = -1; unibyte_mask = uword_max / UCHAR_MAX * mask; } -- 2.17.1 --------------9329C13DFCA7463C2ABADCB5 Content-Type: text/x-patch; charset=UTF-8; name="0005-grep-don-t-assume-PCRE-in-tests.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="0005-grep-don-t-assume-PCRE-in-tests.patch" >From 63e1b8a4356957d24bdb6e2235e79ce55990d7f3 Mon Sep 17 00:00:00 2001 From: Paul Eggert Date: Wed, 23 Sep 2020 19:15:15 -0700 Subject: [PATCH 5/5] grep: don't assume PCRE in tests * tests/filename-lineno.pl: Remove invalid-re-P-paren and invalid-re-P-star-paren as they assume PCRE support, which causes a false alarm "grep: Perl matching not supported in a --disable-perl-regexp build" on platforms without PCRE. --- tests/filename-lineno.pl | 6 ------ 1 file changed, 6 deletions(-) diff --git a/tests/filename-lineno.pl b/tests/filename-lineno.pl index ebd8d1e..be927ef 100755 --- a/tests/filename-lineno.pl +++ b/tests/filename-lineno.pl @@ -97,12 +97,6 @@ my @Tests = ['invalid-re-G-star-paren', '-G "a.*\\)"', {EXIT=>2}, {ERR => "$prog: Unmatched ) or \\)\n"}, ], - ['invalid-re-P-paren', '-P ")"', {EXIT=>2}, - {ERR => "$prog: unmatched parentheses\n"}, - ], - ['invalid-re-P-star-paren', '-P "a.*)"', {EXIT=>2}, - {ERR => "$prog: unmatched parentheses\n"}, - ], ); my $save_temps = $ENV{DEBUG}; -- 2.17.1 --------------9329C13DFCA7463C2ABADCB5-- From unknown Sun Jun 15 08:46:41 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Thu, 22 Oct 2020 11:24:05 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. 
# # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator
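Note on patch 4/5: the expression uword_max / UCHAR_MAX * mask in initialize_unibyte_mask repeats a one-byte mask in every byte of a uword. The following is a minimal standalone sketch of that arithmetic, not grep's own code. Only the uword typedef, the uword_max constant and the expression itself come from the patch; repeat_byte and word_has_masked_byte are hypothetical helper names made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* As in the patch: an unsigned type suitable for fast matching,
   and its maximum value written without the '-1' initializer.  */
typedef uintmax_t uword;
static uword const uword_max = UINTMAX_MAX;

/* Hypothetical helper: repeat the one-byte MASK in every byte of a
   uword.  uword_max / UCHAR_MAX has the value 1 in each byte
   (0x0101...01 when bytes are 8 bits wide), so multiplying it by
   MASK copies MASK into each byte.  */
static uword
repeat_byte (unsigned char mask)
{
  return uword_max / UCHAR_MAX * mask;
}

/* Hypothetical helper: return nonzero if any byte of the
   uword-sized buffer at P has at least one bit of MASK set.  */
static int
word_has_masked_byte (void const *p, unsigned char mask)
{
  assert (p != NULL);
  uword w;
  memcpy (&w, p, sizeof w);   /* avoids alignment assumptions */
  return (w & repeat_byte (mask)) != 0;
}
```

With a mask of 0x80, for instance, word_has_masked_byte reports whether a word-sized chunk contains any byte with the high bit set, so a caller can skip pure-ASCII chunks a word at a time. The mask grep actually computes depends on the locale and is not reproduced here.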