From debbugs-submit-bounces@debbugs.gnu.org Sat Sep 05 12:05:36 2020 Received: (at submit) by debbugs.gnu.org; 5 Sep 2020 16:05:36 +0000 Received: from localhost ([127.0.0.1]:43939 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kEah6-0005SD-A9 for submit@debbugs.gnu.org; Sat, 05 Sep 2020 12:05:36 -0400 Received: from lists.gnu.org ([209.51.188.17]:58536) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kEZPC-0007TR-35 for submit@debbugs.gnu.org; Sat, 05 Sep 2020 10:43:02 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:36932) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kEZPB-0002p5-SU for bug-grep@gnu.org; Sat, 05 Sep 2020 10:43:01 -0400 Received: from mail-oln040092009037.outbound.protection.outlook.com ([40.92.9.37]:22344 helo=NAM04-BN3-obe.outbound.protection.outlook.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kEZP9-00064C-To for bug-grep@gnu.org; Sat, 05 Sep 2020 10:43:01 -0400 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Ipgpcq0k/mI/OGrREvX3F4vyx8BgRGgPWO2cK0Mw9GrBuEtIUpcH9g36sCKo++nXSlaKEwtqnW1/5kGUC6uRruLh63G+KOBrDlUZboGvRNAACLjW5m8Bmz+LfNCxJwiqxqjR9qg+fr2KaiQ3Y97L0XPs6Ns7oqNLRO7ZJRF1nmfp5IvegpiMaMuOCMtN9QLhmPv7Z8X2JSKhrNd3CiqrqsVg5ZoEPfFMkbIcdRpIvyE3wfahXJZH1nAnXmjrOAiHTCWPvhfEt5ySv9fZExVLTvIm/V0WjBsgGenUc6O/PPRHKScrpUJhcFTxdI8wZySuYi30JetUAnwI93hlwZHmUg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=enTt4hJE513eEaxMLaYMDBQTixq5GJsHHezfHCdtIfs=; 
b=eenIru1Yz+PCa7I6PPgfj63gWTtF1vL+Cw1t1i+vuJD00/K2o1C8ukpCnq6OF+IuvFeOiTBpmoQhiI5Pd5ePB7qVjr9/lasFdpQgh7tcqse6dGRe1020pA1Lz9RZ9sf7v9/dKU2hfD/vsvyIjdZwnMtGlkl7Ccs56v8nBQChIWUzsDRTIlrUqt+nSjWuqOBSYuVBAsnkbyvSDBjg2eNu2HPhp/UYLZr89ScwlnJ30I4TOfpXdxkEC6jWYWPV4ApfGd63lRJp5CB2CZzB4nga4ovlx4rmByhSqy+VTCbVSLX0VQEWNBCKI0eALZKbP+W+qDn0xcPgkQrgv/33SvmQAQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=none; dmarc=none; dkim=none; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=outlook.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=enTt4hJE513eEaxMLaYMDBQTixq5GJsHHezfHCdtIfs=; b=oZ11Gt+FqQgfduXHcPNhs7t6oPYcuZp5tcX6l9oWy5TvwI9JUnyzq1tv2U1lpu5U1nUNMk8PBh39t2pXilVSv5WJCBnkbBaYlQm/ca//8wyWo2MzK3Ldf3YWsyCIjxyC2aW+VCizc3vFtQbjmPMfyZ+eo2rDyBBIIQ609HUof9+mWgSPSnuwgoYF5+e1fY4tGTw/u2b6Z3CLBY6llbAYOExt5fbgRQVAWrkp2nrRJHNXEq9JTHO33oxRsKKowDhNXTa36Zl+v5V4heF5NCMz7Ijobek5rBm8hKn1/kZ2t89rDdYmqTVzlJZxCd67jtLDA9xqZXIU3E28D34oKRJaqQ== Received: from BN8NAM04FT057.eop-NAM04.prod.protection.outlook.com (2a01:111:e400:7e85::4e) by BN8NAM04HT165.eop-NAM04.prod.protection.outlook.com (2a01:111:e400:7e85::118) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3348.16; Sat, 5 Sep 2020 14:27:56 +0000 Received: from BN7PR15MB2196.namprd15.prod.outlook.com (2a01:111:e400:7e85::41) by BN8NAM04FT057.mail.protection.outlook.com (2a01:111:e400:7e85::323) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3348.16 via Frontend Transport; Sat, 5 Sep 2020 14:27:56 +0000 Received: from BN7PR15MB2196.namprd15.prod.outlook.com ([fe80::9c72:793b:bebb:7884]) by BN7PR15MB2196.namprd15.prod.outlook.com ([fe80::9c72:793b:bebb:7884%6]) with mapi id 15.20.3348.018; Sat, 5 Sep 2020 14:27:56 +0000 From: Mayo Fark To: "bug-grep@gnu.org" Subject: Grep treats extended Latin characters like whitespace Thread-Topic: Grep treats extended Latin 
characters like whitespace Thread-Index: AQHWg49uoB82QO3x50m2ASdVIkhTqg== Date: Sat, 5 Sep 2020 14:27:56 +0000 Message-ID: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-incomingtopheadermarker: OriginalChecksum:3F782AD6FFD205F6D813276B8A06EEC92CAB42D7371923749619D143B9B900BF; UpperCasedChecksum:4F2B0010EC3F1F6C579E484A660E2C292E73383AC9E2EEE5E2F1743E424940B4; SizeAsReceived:6645; Count:41 x-tmn: [HtsXxBJ8MJ7qxIesQEstaJMonycVdHzp] x-ms-publictraffictype: Email x-incomingheadercount: 41 x-eopattributedmessage: 0 x-ms-office365-filtering-correlation-id: 041264e8-24fb-430e-dcac-08d851a7e1e4 x-ms-traffictypediagnostic: BN8NAM04HT165: x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: TCSMkolGyrWjuVP4dpsI8ZWQAMYdNDf3TwvVnSq1qTkd2FMsYpnCye0fBWFL3KkbfX1Eaisonm0UdiCeDyxfHV/6lPPEsBumtICSidBnEy3Slf/GLtOM171CfeJtAmDTIAh3BY/taBg4uTpVYXRFF4p77qDeCMyezV0mj0EJhYqKbsdyvuVIggzjfQRUDkh/n2z5BcIk95VdbSWwDCJDKA== x-ms-exchange-antispam-messagedata: Nu7rUzhYLdVJNMhWBL3WsBS1DtZFPdjSOsgyJxB2LN6wgKWlEXOFv5kocI2ZNo2NbRL4xvTqk2eQ5d16s/84tGJmYiR1Fh2X0mYG/+9d0pm8Wx4DGG2GNlydRzfeBWX1Xf6k7mGOrFh2qNnN7YVikA== x-ms-exchange-transport-forked: True Content-Type: multipart/alternative; boundary="_000_BN7PR15MB2196A96CE0ECA17A58C64B0ED82A0BN7PR15MB2196namp_" MIME-Version: 1.0 X-OriginatorOrg: outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-AuthSource: BN8NAM04FT057.eop-NAM04.prod.protection.outlook.com X-MS-Exchange-CrossTenant-RMS-PersistedConsumerOrg: 00000000-0000-0000-0000-000000000000 X-MS-Exchange-CrossTenant-Network-Message-Id: 041264e8-24fb-430e-dcac-08d851a7e1e4 X-MS-Exchange-CrossTenant-rms-persistedconsumerorg: 00000000-0000-0000-0000-000000000000 X-MS-Exchange-CrossTenant-originalarrivaltime: 05 Sep 2020 14:27:56.2886 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Internet X-MS-Exchange-CrossTenant-id: 84df9e7f-e9f6-40af-b435-aaaaaaaaaaaa X-MS-Exchange-Transport-CrossTenantHeadersStamped: 
BN8NAM04HT165 Received-SPF: pass client-ip=40.92.9.37; envelope-from=mayofark@outlook.com; helo=NAM04-BN3-obe.outbound.protection.outlook.com X-detected-operating-system: by eggs.gnu.org: First seen = 2020/09/05 10:42:59 X-ACL-Warn: Detected OS = Windows NT kernel [generic] [fuzzy] X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-Spam-Score: -1.3 (-) X-Debbugs-Envelope-To: submit X-Mailman-Approved-At: Sat, 05 Sep 2020 12:05:34 -0400 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--)

--_000_BN7PR15MB2196A96CE0ECA17A58C64B0ED82A0BN7PR15MB2196namp_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

What I did:
```
grep -Riw cone *
```

Expected result: lines with the word "cone" surrounded by whitespace, ignoring case.

What I got instead:
```
data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"
```

Why this is a bug: the word ícone is not the same as cone and should not have been returned in the result set. It appears that grep treats the í character in ícone as whitespace, which affects other extended-Latin characters as well.

--_000_BN7PR15MB2196A96CE0ECA17A58C64B0ED82A0BN7PR15MB2196namp_
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

What I did:
```
grep -Riw cone *
```

Expected result: lines with the word "cone" surrounded by whitespace, ignoring case.

What I got instead:
```
data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"
```

Why this is a bug: the word ícone is not the same as cone and should not have been returned in the result set. It appears that grep treats the í character in ícone as whitespace, which affects other extended-Latin characters as well.
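
A byte-level look at the reported line may help narrow this down. What follows is only a rough sketch under stated assumptions, not a diagnosis: it assumes the .po file is UTF-8 encoded (the report does not say), in which case the í in ícone is the two bytes C3 AD and the byte immediately before "cone" is 0xAD. The `sample` function is a hypothetical stand-in for the reported line, and LC_ALL=C is used only to make grep look at one byte at a time. Note that GNU grep documents -w as requiring a non-word-constituent character (anything other than a letter, digit, or underscore) on each side of the match, not specifically whitespace.

```shell
#!/bin/sh
# Sketch only: assumes the reported .po line is UTF-8 encoded.

# "o ícone de" with í written as its two UTF-8 bytes, C3 AD (octal 303 255).
# `sample` is a hypothetical helper standing in for the reported line.
sample() { printf 'o \303\255cone de\n'; }

# Hex dump of the sample: the byte right before "cone" (63 6f 6e 65) is ad.
sample | od -An -tx1

# Force byte-at-a-time matching with LC_ALL=C. In that locale 0xAD is not a
# letter, digit, or underscore, so -w accepts "cone" at this position.
if sample | LC_ALL=C grep -qw cone; then
  echo "byte-wise -w: match"
else
  echo "byte-wise -w: no match"
fi
```

In the C locale a match is the expected answer, because no byte above 0x7F counts as a letter there. The surprising part of the report is getting the same answer in a locale where í presumably is a letter.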


--_000_BN7PR15MB2196A96CE0ECA17A58C64B0ED82A0BN7PR15MB2196namp_-- From debbugs-submit-bounces@debbugs.gnu.org Wed Sep 09 15:45:24 2020 Received: (at 43225-done) by debbugs.gnu.org; 9 Sep 2020 19:45:24 +0000 Received: from localhost ([127.0.0.1]:34792 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kG61z-0006Oh-MK for submit@debbugs.gnu.org; Wed, 09 Sep 2020 15:45:23 -0400 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46666) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1kG61v-0006OM-NP for 43225-done@debbugs.gnu.org; Wed, 09 Sep 2020 15:45:22 -0400 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 43AD0160052; Wed, 9 Sep 2020 12:45:13 -0700 (PDT) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id gwBSRQOEOL7X; Wed, 9 Sep 2020 12:45:12 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 61CA9160104; Wed, 9 Sep 2020 12:45:12 -0700 (PDT) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id Jl5TBHoMgu5H; Wed, 9 Sep 2020 12:45:12 -0700 (PDT) Received: from [192.168.1.9] (cpe-75-82-69-226.socal.res.rr.com [75.82.69.226]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 34679160052; Wed, 9 Sep 2020 12:45:12 -0700 (PDT) Subject: Re: bug#43225: Grep treats extended Latin characters like whitespace To: Mayo Fark References: From: Paul Eggert Autocrypt: addr=eggert@cs.ucla.edu; prefer-encrypt=mutual; keydata= LS0tLS1CRUdJTiBQR1AgUFVCTElDIEtFWSBCTE9DSy0tLS0tCgptUUlOQkV5QWNtUUJFQURB QXlIMnhvVHU3cHBHNUQzYThGTVpFb243NGRDdmM0K3ExWEEySjJ0QnkycHdhVHFmCmhweHhk R0E5Smo1MFVKM1BENGJTVUVnTjh0TFowc2FuNDdsNVhUQUZMaTI0NTZjaVNsNW04c0thSGxH ZHQ5WG0KQUF0bVhxZVpWSVlYL1VGUzk2ZkR6ZjR4aEVtbS95N0xiWUVQUWRVZHh1NDd4QTVL 
aFRZcDVibHRGM1dZRHoxWQpnZDdneDA3QXV3cDdpdzdlTnZub0RUQWxLQWw4S1lEWnpiRE5D UUdFYnBZM2VmWkl2UGRlSStGV1FONFcra2doCnkrUDZhdTZQcklJaFlyYWV1YTdYRGRiMkxT MWVuM1NzbUUzUWpxZlJxSS9BMnVlOEpNd3N2WGUvV0szOEV6czYKeDc0aVRhcUkzQUZINmls QWhEcXBNbmQvbXNTRVNORnQ3NkRpTzFaS1FNcjlhbVZQa25qZlBtSklTcWRoZ0IxRApsRWR3 MzRzUk9mNlY4bVp3MHhmcVQ2UEtFNDZMY0ZlZnpzMGtiZzRHT1JmOHZqRzJTZjF0azVlVThN Qml5Ti9iClowM2JLTmpOWU1wT0REUVF3dVA4NGtZTGtYMndCeHhNQWhCeHdiRFZadWR6eERa SjFDMlZYdWpDT0pWeHEya2wKakJNOUVUWXVVR3FkNzVBVzJMWHJMdzYrTXVJc0hGQVlBZ1Jy NytLY3dEZ0JBZndoUEJZWDM0blNTaUhsbUxDKwpLYUhMZUNMRjVaSTJ2S20zSEVlQ1R0bE9n N3haRU9OZ3d6TCtmZEtvK0Q2U29DOFJSeEpLczhhM3NWZkk0dDZDCm5yUXp2SmJCbjZneGRn Q3U1aTI5SjFRQ1lyQ1l2cWwyVXlGUEFLK2RvOTkvMWpPWFQ0bTI4MzZqMXdBUkFRQUIKdENC UVlYVnNJRVZuWjJWeWRDQThaV2RuWlhKMFFHTnpMblZqYkdFdVpXUjFQb2tDVlFRVEFRZ0FQ d0liQXdZTApDUWdIQXdJR0ZRZ0NDUW9MQkJZQ0F3RUNIZ0VDRjRBV0lRUitONUtwMkt6MzFq TzhGWWp0bCtrT1lxcCtOQVVDClh5Vzlsd1VKRks0THN3QUtDUkR0bCtrT1lxcCtOS05WRC85 SE1zSTE2MDZuMFV1VFhId0lUc3lPakFJOVNET1QKK0MzRFV2NnFsTTVCSDJuV0FNVGlJaXlB NXVnbHNKdjkzb2kydk50RmYvUS9tLzFjblpXZ25WbkV4a3lMSTRFTgpTZDF1QnZyMC9sQ1Nk UGxQME1nNkdXU3BYTXUreDB2ZFQwQWFaTk9URTBGblB1b2xkYzNYRDc2QzJxZzhzWC9pCmF4 WFRLSHk5UCtCbEFxL0NzNy9weERRMEV6U24wVVNaMkMwbDV2djRQTXBBL3BpY25TNks2MDlK dkRHYU9SbXcKWmVYSVpxUU5aVitaUXMrVVl0Vm9ndURUcWJ5M0lVWTFJOEJsWEhScHRhajlB TW40VW9oL0NxcFFsVm9qb3lXbApIcWFGbm5KQktlRjBodko5U0F5YWx3dXpBakc3dlFXMDdN WW5jYU9GbTB3b2lLYmc1SkxPOEY0U0JUSWt1TzBECkNmOW5MQWF5NlZzQjRyendkRWZSd2pQ TFlBbjdNUjNmdkhDRXpmcmtsZFRyYWlCTzFUMGllREs4MEk3c0xmNnAKTWVDWUkxOXBVbHgw L05STUdDZGRpRklRZGZ0aEtXWEdSUzVMQXM4andCZjhINkc1UFdpblByRUlhb21JUDIxaQp2 dWhRRDA3YllxOUlpSWRlbGpqVWRIY0dJMGkvQjRNNTZaYWE4RmYzOGluaU9sckRZQ21ZV1I0 ZENXWml1UWVaCjNPZ3FlUXM5YTZqVHZnZERHVm1SVnFZK2p6azhQbGFIZmNvazhST2hGY0hL a2NmaHVCaEwyNWhsUklzaFJET0UKc2tYcUt3bnpyYnFnYTNHWFpYZnNYQW9GYnpOaExkTHY5 QStMSkFZU2tYUDYvNXFkVHBFTFZHb3N5SDg4NFZkYgpCcGtHSTA0b1lWcXVsYmtDRFFSTWdI SmtBUkFBcG9YcnZ4UDNESWZqQ05PdFhVL1Bkd01TaEtkWC9SbFNzNVBmCnVuVjF3YktQOGhl 
clhIcnZRZEZWcUVDYVRTeG1saHpiazhYMFBrWTlnY1ZhVTJPNDlUM3FzT2QxY0hlRjUyWUYK R0V0MExoc0JlTWpnTlg1dVoxVjc2cjhneWVWbEZwV1diMFNJd0pVQkhyRFhleEY2N3VwZVJi MnZkSEJqWUROZQp5U24rMEI3Z0ZFcXZWbVp1K0xhZHVkRHA2a1FMamF0RnZIUUhVU0dOc2hC bmtrY2FUYmlJOVBzdDBHQ2MyYWl6Cm5CaVBQQTJXUXhBUGxQUmgzT0dUc241VEhBRG1ianFZ NkZFTUxhc1ZYOERTQ2JsTXZMd05lTy84U3h6aUJpZGgKcUxwSkNxZFFSV0hrdTVYeGdJa0dl S096NU9MRHZYSFdKeWFmckVZamprUzZBazZCNXo2c3ZLbGlDbFduakhRYwpqbFB6eW9GRmdL VEVmY3FEeENqNFJZMEQwRGd0RkQwTmZ5ZU9pZHJTQi9TelRlMmh3cnlRRTNycFNpcW8rMGNH CmR6aDR5QUhLWUorVXJYWjRwOTNaaGpHZktEMXhsck5ZRGxXeVc5UEdtYnZxRnVEbWlJQVFm OVdEL3d6RWZJQ2MKK0YrdURESSt1WWtSeFVGcDkyeWttZGhERUZnMXlqWXNVOGlHVTY5YUh5 dmhxMzZ6NHpjdHZicWhSTnpPV0IxYgpWSi9kSU1EdnNFeEdjWFFWRElUN3NETlh2MHdFM2pL U0twcDdOREcxb1hVWEwrMitTRjk5S2p5NzUzQWJRU0FtCkg2MTdmeUJOd2hKV3ZRWWcrbVV2 UHBpR090c2VzOUVYVUkzbFM0djBNRWFQRzQzZmxFczFVUisxcnBGUVdWSG8KMXkxT08rc0FF UUVBQVlrQ1BBUVlBUWdBSmdJYkRCWWhCSDQza3FuWXJQZldNN3dWaU8yWDZRNWlxbjQwQlFK ZgpKYjJ6QlFrVXJndlBBQW9KRU8yWDZRNWlxbjQwY25NUC8xN0NnVWtYVDlhSUpyaVBNOHdi Y2VZcmNsNytiZFlFCmY3OVNsd1NiYkhON1I0Q29JSkZPbE45Uy8zNHR5cEdWWXZwZ21DSkRZ RlRCeHlQTzkyaU1YRGdBNCtjV0h6dDUKVDFhWU85aHNLaGg3dkR0Sys2UHJvWkdjKzA4Z1VU WEhoYjk3aE1NUWhrbkpsbmZqcFNFQzllbTkwNkZVK0k5MwpUMWZUR3VwbkJhM2FXY0s4ak0w SmFCR2J5MmhHMVMzb2xhRExTVHRCSU5OQlltdnVXUjlNS09oaHFEcmxrNWN3CkZESkxoNU5y WHRlRVkwOFdBemNMekczcGtyWFBIa0ZlTVF0ZnFrMGpMZEdHdkdDM05DSWtxWXJkTGhpUnZH cHIKdTM4QzI2UkVuNWY0STB2R0UzVmZJWEhlOFRNQ05tUXV0MU50TXVVbXBESXkxYUx4R3p1 cHRVaG5PSk4vL3IrVgpqRFBvaTNMT3lTTllwaHFlL2RNdWJzZlVyNm9oUDQxbUtGODFGdXdJ NGFtcUp0cnFJTDJ5cWF4M2EwcWxmd0N4ClhmdGllcUpjdWVrWCtlQ1BEQ0tyWU1YUjBGWWd3 cEcySVRaVUd0ckVqRVNsRTZEc2N4NzM0SEtkcjVPUklvY0wKVVVLRU9HZWlVNkRHaEdGZGI1 VHd1MFNuK3UxbVVQRE4wTSsrQ2RNdkNsSUU4a2xvNEc5MUVPSW11MVVwYjh4YwpPUFF3eGgx andxU3JVNVF3b05tU1llZ1FTSExwSVV1ckZ6MWlRVWgxdnBQWHpLaW5rV0VxdjRJcUExY2lM K0x5CnlTdUxrcDdNc0pwVlJNYldKQ05XT09TYmFING9EQko1ZEhNR2MzNXg1bW9zQ2s5MFBY a251RkREc1lIZkRvNXMKbWY5bG82WVh4N045Cj0zTGFJCi0tLS0tRU5EIFBHUCBQVUJMSUMg S0VZIEJMT0NLLS0tLS0K 
Organization: UCLA Computer Science Department Message-ID: <87d378cf-2c5b-c0aa-a9c4-1557ecb7c40e@cs.ucla.edu> Date: Wed, 9 Sep 2020 12:45:11 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/mixed; boundary="------------86EEA2D1E70452EC4EEDF107" Content-Language: en-US X-Spam-Score: -5.9 (-----) X-Debbugs-Envelope-To: 43225-done Cc: 43225-done@debbugs.gnu.org X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -6.9 (------)

This is a multi-part message in MIME format.
--------------86EEA2D1E70452EC4EEDF107
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: quoted-printable

On 9/5/20 7:27 AM, Mayo Fark wrote:
> grep -Riw cone *
> ...
> data/po/pt_BR.po:msgstr "Pressione o ícone de pódio para iniciar o tutorial"

Thanks for the bug report. This bug is due to an overenthusiastic optimization
that I installed in late 2016. I installed the attached patch to fix the bug.

--------------86EEA2D1E70452EC4EEDF107
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-grep-fix-w-bug-in-UTF-8-locales.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0001-grep-fix-w-bug-in-UTF-8-locales.patch"

>From 8952431b790b409f4ef2ffdcb564475160548c50 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Wed, 9 Sep 2020 12:43:11 -0700
Subject: [PATCH] grep: fix -w bug in UTF-8 locales

Problem reported by Mayo Fark (Bug#43225).
* src/searchutils.c (wordchar_prev): In a UTF-8 locale, do not
assume that an encoding-error byte cannot be part of a word
constituent, as this assumption is incorrect for the last byte of
a multibyte word constituent.
* tests/word-delim-multibyte: Add a test for the bug.
---
 NEWS                       | 4 ++++
 src/searchutils.c          | 2 +-
 tests/word-delim-multibyte | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index acd95dd..28c7835 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,10 @@ GNU grep NEWS -*- outline -*-
 
 ** Bug fixes
 
+  In UTF-8 locales, grep -w no longer ignores a multibyte word
+  constituent just before what would otherwise be a word match.
+  [Bug#43225 introduced in grep 2.28]
+
   A performance regression with many duplicate patterns has been fixed.
   [Bug#43040 introduced in grep 3.4]
 
diff --git a/src/searchutils.c b/src/searchutils.c
index 84c319c..c4bb802 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -195,7 +195,7 @@ wordchar_prev (char const *buf, char const *cur, char const *end)
     return 0;
   unsigned char b = *--cur;
   if (! localeinfo.multibyte
-      || (localeinfo.using_utf8 && localeinfo.sbclen[b] != -2))
+      || (localeinfo.using_utf8 && localeinfo.sbclen[b] == 1))
     return sbwordchar[b];
   char const *p = buf;
   cur -= mb_goback (&p, NULL, cur, end);
diff --git a/tests/word-delim-multibyte b/tests/word-delim-multibyte
index 7d2c433..31190ad 100755
--- a/tests/word-delim-multibyte
+++ b/tests/word-delim-multibyte
@@ -34,4 +34,12 @@ for locale in C en_US.UTF-8; do
   compare /dev/null err || fail=1
 done
 
+# Bug#43255
+printf 'a \303\255cone b\n' >in
+for flag in '' -i; do
+  returns_ 1 env LC_ALL=en_US.UTF-8 grep -w $flag cone in >out 2>err || fail=1
+  compare /dev/null out || fail=1
+  compare /dev/null err || fail=1
+done
+
 Exit $fail
-- 
2.17.1

--------------86EEA2D1E70452EC4EEDF107--

From unknown Sat Aug 16 20:03:01 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Thu, 08 Oct 2020 11:24:05 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
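
The test added to tests/word-delim-multibyte can be approximated outside the grep test suite to see whether an installed grep shows the behavior from Bug#43225. The following is a minimal sketch and not part of the patch: the function names `find_utf8_locale` and `check_grep_w`, the list of candidate locale names, and the ok/affected/skipped/error labels are all assumptions made for illustration. It relies only on `locale charmap` and `grep -q -w`.

```shell
#!/bin/sh
# Sketch of a standalone check modeled on the patch's test. All names and
# labels here are hypothetical, chosen for this example.

# Print the first candidate locale whose charmap is UTF-8; fail if none is.
find_utf8_locale() {
  for loc in en_US.UTF-8 C.UTF-8 en_US.utf8 C.utf8; do
    if [ "$(LC_ALL="$loc" locale charmap 2>/dev/null)" = UTF-8 ]; then
      printf '%s\n' "$loc"
      return 0
    fi
  done
  return 1
}

# Print one word describing how the grep on PATH handles the test input.
check_grep_w() {
  loc=$(find_utf8_locale) || { echo skipped; return 0; }
  # 'a ícone b' with í written as the UTF-8 bytes C3 AD (octal 303 255),
  # the same input the patch's test uses.
  printf 'a \303\255cone b\n' | LC_ALL="$loc" grep -qw cone
  case $? in
    0) echo affected ;;  # "cone" was accepted as a word inside "ícone"
    1) echo ok ;;        # no match, which is what the patch's test expects
    *) echo error ;;     # grep itself reported a failure
  esac
}

check_grep_w
```

According to the NEWS entry in the patch, the bug was introduced in grep 2.28, so "ok" does not by itself mean the patch is present; older or non-GNU greps may print it as well. "skipped" only means that none of the guessed locale names reported a UTF-8 charmap on this system.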