From debbugs-submit-bounces@debbugs.gnu.org Fri May 01 13:06:39 2020
From: Radisson97@web.de
To: bug-grep@gnu.org
Subject: Documentation:enhancement - search for hexvalue
Date: Fri, 1 May 2020 19:05:28 +0200

Hi,
I had the problem of searching for a non-printable character in a long
list of strings. I found nothing in the documentation, and the several
discussions of how to do that were either complicated or did not fit my
case. Maybe I was unlucky; nonetheless, I found a simple solution that
should be mentioned in the documentation.

problem: grep for a character where only the hex code is known.

solution: use $'\xNN'
the shell then expands this to the required code

example: printf "A\nB\nC\n" | grep $'\x41'

note: the example uses only printable characters, but it also works with
anything else except \0 (I guess).

I found that solution nice: it did not require any flags etc., and for my
problem it worked like a charm.
(I am not a member of the list; please reply directly to this address.)

Hope that helps,
radisson

From debbugs-submit-bounces@debbugs.gnu.org Sun May 03 15:25:25 2020
From: Jim Meyering
Date: Sun, 3 May 2020 12:25:04 -0700
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
To: Radisson97@web.de
Cc: 41004@debbugs.gnu.org

On Fri, May 1, 2020 at 10:07 AM Radisson97@web.de wrote:
> Hi,
> I had the problem of searching for a non-printable character in a long
> list of strings. I found nothing in the documentation, and the several
> discussions of how to do that were either complicated or did not fit my
> case. Maybe I was unlucky; nonetheless, I found a simple solution that
> should be mentioned in the documentation.
>
> problem: grep for a character where only the hex code is known.
>
> solution: use $'\xNN'
> the shell then expands this to the required code
>
> example: printf "A\nB\nC\n" | grep $'\x41'
>
> note: the example uses only printable characters, but it also works with
> anything else except \0 (I guess).
>
> I found that solution nice: it did not require any flags etc., and for my
> problem it worked like a charm.
> (I am not a member of the list; please reply directly to this address.)

Thank you for the suggestion.
Another approach is to use grep's -P option:

  $ printf '%s\n' A B C| grep -P '\x41'
  A

If you'd like to add an example to the documentation, please send a
patch, but I'm not sure how much of PCRE syntax we want to document in
grep's own manual.

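The two approaches mentioned so far differ in who interprets the escape:
the shell in the first case, grep's PCRE matcher in the second. The
following is only a sketch of that difference, assuming Bash (for the
$'...' quoting) and a GNU grep built with PCRE support; the variable
names shell_side and grep_side are made up for the illustration.

```shell
#!/usr/bin/env bash
# Illustration only: assumes Bash and GNU grep; -P additionally needs PCRE support.

# Shell-side: Bash expands $'\x41' to the byte 0x41 ("A") before grep runs,
# so grep only ever sees the one-character pattern "A".
shell_side=$(printf '%s\n' A B C | grep $'\x41')

# grep-side: single quotes pass the four characters \x41 through unchanged,
# and the PCRE matcher interprets the escape itself.
if printf 'A\n' | grep -qP '\x41' 2>/dev/null; then
    grep_side=$(printf '%s\n' A B C | grep -P '\x41')
else
    # Fallback marker (hypothetical wording) for greps built without PCRE.
    grep_side='(this grep lacks -P)'
fi

printf 'shell-side: %s\n' "$shell_side"
printf 'grep-side:  %s\n' "$grep_side"
```

Both variables should end up holding the same matching line when -P is
available; the difference only becomes visible once the expanded
character is itself special in a regular expression.
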
From debbugs-submit-bounces@debbugs.gnu.org Sun May 10 12:59:51 2020
Date: Sun, 10 May 2020 17:46:44 +0100
From: Stephane Chazelas
To: Radisson97@web.de
Cc: 41004@debbugs.gnu.org
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
Message-ID: <20200510164644.n2cpqpwdxdarowv7@chazelas.org>

2020-05-01 19:05:28 +0200, Radisson97@web.de:
[...]
> problem: grep for a character where only the hex code is known.
>
> solution: use $'\xNN'
> the shell then expands this to the required code
>
> example: printf "A\nB\nC\n" | grep $'\x41'
[...]

The $'\x41' ksh93 quoting operator expands to *byte* values.

To get a character based on the Unicode codepoint value, you'd
need the $'\u41' zsh operator (or $'\U10000' for code points
above 0xffff).

But in any case, that is done by the shell; it has nothing to
do with grep, and the syntax of those shell operators varies
between shells.

In the fish shell you'd use:

grep \u41

or

grep \x41

instead.

Also, since it's done by the shell, things like:

grep $'\u2e'

where U+002E is "FULL STOP", would not only match on "."
characters but on any character. All grep sees is a "."
character. That would be different from grep -P '\x2e' which
matches "." (U+002E) only.

Note that:

grep -P '\xE9'

matches on the byte 0xE9 in single-byte locales (regardless of
what character that byte represents in the locale's charset) and
on character U+00E9 in UTF-8 locales (so the 0xc3 0xa9 sequence
of bytes, not byte 0xe9).

-- 
Stephane

From debbugs-submit-bounces@debbugs.gnu.org Tue May 12 23:19:25 2020
From: Jim Meyering
Date: Tue, 12 May 2020 20:19:05 -0700
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
To: Stephane Chazelas
Cc: 41004@debbugs.gnu.org, Radisson97@web.de
References: <20200510164644.n2cpqpwdxdarowv7@chazelas.org>
In-Reply-To: <20200510164644.n2cpqpwdxdarowv7@chazelas.org>

On Sun, May 10, 2020 at 10:00 AM Stephane Chazelas wrote:
>
> 2020-05-01 19:05:28 +0200, Radisson97@web.de:
> [...]
> > problem: grep for a character where only the hex code is known.
> >
> > solution: use $'\xNN'
> > the shell then expands this to the required code
> >
> > example: printf "A\nB\nC\n" | grep $'\x41'
> [...]
>
> The $'\x41' ksh93 quoting operator expands to *byte* values.
>
> To get a character based on the Unicode codepoint value, you'd
> need the $'\u41' zsh operator (or $'\U10000' for code points
> above 0xffff).
>
> But in any case, that is done by the shell; it has nothing to
> do with grep, and the syntax of those shell operators varies
> between shells.
>
> In the fish shell you'd use:
>
> grep \u41
>
> or
>
> grep \x41
>
> instead.
>
> Also, since it's done by the shell, things like:
>
> grep $'\u2e'
>
> where U+002E is "FULL STOP", would not only match on "."
> characters but on any character. All grep sees is a "."
> character. That would be different from grep -P '\x2e' which
> matches "." (U+002E) only.
>
> Note that:
>
> grep -P '\xE9'
>
> matches on the byte 0xE9 in single-byte locales (regardless of
> what character that byte represents in the locale's charset) and
> on character U+00E9 in UTF-8 locales (so the 0xc3 0xa9 sequence
> of bytes, not byte 0xe9).

Thank you for the thorough reply, Stephane!
Bearing that in mind, Radisson, please consider submitting a revised patch.
I suggest recommending something like this:

  $ printf '%s\n' A B C| LC_ALL=C grep -P '\x41'
  A

so that the example is independent of both the current locale and the shell.

From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 21 15:34:37 2020
To: control@debbugs.gnu.org
From: Paul Eggert
Subject: 41004 is wishlist
Organization: UCLA Computer Science Department
Message-ID: <0d9cbf72-04f3-304b-2020-d79d6cee6a89@cs.ucla.edu>
Date: Mon, 21 Sep 2020 12:34:28 -0700

severity 41004 wishlist

From debbugs-submit-bounces@debbugs.gnu.org Mon Sep 21 23:25:26 2020
Subject: Re: bug#41004: Documentation:enhancement - search for hexvalue
To: Jim Meyering, Stephane Chazelas
Cc: 41004-done@debbugs.gnu.org, Radisson97@web.de
References: <20200510164644.n2cpqpwdxdarowv7@chazelas.org>
From: Paul Eggert
Organization: UCLA Computer Science Department
Message-ID: <8a8edbb9-af9f-b8b3-43c3-1e22b0a54141@cs.ucla.edu>
Date: Mon, 21 Sep 2020 20:25:15 -0700
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------C42AC9D60BCD235D4695C281"

This is a multi-part message in MIME format.
--------------C42AC9D60BCD235D4695C281
Content-Type: text/plain; charset=utf-8; format=flowed

I installed the attached doc patch, which I hope addresses the issues
mentioned in this bug report, and am boldly closing the bug report.

--------------C42AC9D60BCD235D4695C281
Content-Type: text/x-patch; charset=UTF-8; name="0001-doc-say-how-to-match-chars-by-code.patch"
Content-Disposition: attachment; filename="0001-doc-say-how-to-match-chars-by-code.patch"

>From 1444b4979dc5935b7fe1d13e76539dddbaabd242 Mon Sep 17 00:00:00 2001
From: Paul Eggert
Date: Mon, 21 Sep 2020 20:22:02 -0700
Subject: [PATCH] doc: say how to match chars by code

>From a suggestion in Bug#41004.
* doc/grep.texi (Character Encoding, Matching Non-ASCII):
New sections. Move some material from Environment Variables
into these sections.
---
 doc/grep.texi | 84 +++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 68 insertions(+), 16 deletions(-)

diff --git a/doc/grep.texi b/doc/grep.texi
index a680d39..15185f3 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1044,22 +1044,8 @@ interpreted.
 These variables specify the locale for the @env{LC_CTYPE} category,
 which determines the type of characters,
 e.g., which characters are whitespace.
-This category also determines the character encoding, that is, whether
-text is encoded in UTF-8, ASCII, or some other encoding. In the
-@samp{C} or @samp{POSIX} locale, all characters are encoded as a
-single byte and every byte is a valid character.
-In more-complex encodings such as UTF-8, a sequence of multiple bytes
-may be needed to represent a character, and some bytes may be encoding
-errors that do not contribute to the representation of any character.
-POSIX does not specify the behavior of @command{grep} when patterns or
-input data contain encoding errors or null characters, so portable
-scripts should avoid such usage. As an extension to POSIX, GNU
-@command{grep} treats null characters like any other character.
-However, unless the @option{-a} (@option{--binary-files=text}) option
-is used, the presence of null characters in input or of encoding
-errors in output causes GNU @command{grep} to treat the file as binary
-and suppress details about matches. @xref{File and Directory
-Selection}.
+This category also determines the character encoding.
+@xref{Character Encoding}.
 
 @item LANGUAGE
 @itemx LC_ALL
@@ -1208,6 +1194,8 @@ pages, but work only if PCRE is available in the system.
 * Anchoring::
 * Back-references and Subexpressions::
 * Basic vs Extended::
+* Character Encoding::
+* Matching Non-ASCII::
 @end menu
 
 @node Fundamental Structure
@@ -1559,6 +1547,70 @@ instead of reporting a syntax error in the regular expression.
 POSIX allows this behavior as an extension, but portable scripts
 should avoid it.
 
+@node Character Encoding
+@section Character Encoding
+@cindex character encoding
+
+The @env{LC_CTYPE} locale specifies the encoding of characters in
+patterns and data, that is, whether text is encoded in UTF-8, ASCII,
+or some other encoding. @xref{Environment Variables}.
+
+In the @samp{C} or @samp{POSIX} locale, every character is encoded as
+a single byte and every byte is a valid character. In more-complex
+encodings such as UTF-8, a sequence of multiple bytes may be needed to
+represent a character, and some bytes may be encoding errors that do
+not contribute to the representation of any character. POSIX does not
+specify the behavior of @command{grep} when patterns or input data
+contain encoding errors or null characters, so portable scripts should
+avoid such usage. As an extension to POSIX, GNU @command{grep} treats
+null characters like any other character. However, unless the
+@option{-a} (@option{--binary-files=text}) option is used, the
+presence of null characters in input or of encoding errors in output
+causes GNU @command{grep} to treat the file as binary and suppress
+details about matches. @xref{File and Directory Selection}.
+
+Regardless of locale, the 103 characters in the POSIX Portable
+Character Set (a subset of ASCII) are always encoded as a single byte,
+and the 128 ASCII characters have their usual single-byte encodings on
+all but oddball platforms.
+
+@node Matching Non-ASCII
+@section Matching Non-ASCII and Non-printable Characters
+@cindex non-ASCII matching
+@cindex non-printable matching
+
+In a regular expression, non-ASCII and non-printable characters other
+than newline are not special, and represent themselves. For example,
+in a locale using UTF-8 the command @samp{grep 'Λ@tie{}ω'} (where the
+white space between @samp{Λ} and the @samp{ω} is a tab character)
+searches for @samp{Λ} (Unicode character U+039B GREEK CAPITAL LETTER
+LAMBDA), followed by a tab (U+0009 TAB), followed by @samp{ω} (U+03C9
+GREEK SMALL LETTER OMEGA).
+
+Suppose you want to limit your pattern to only printable characters
+(or even only printable ASCII characters) to keep your script readable
+or portable, but you also want to match specific non-ASCII or non-null
+non-printable characters. If you are using the @option{-P}
+(@option{--perl-regexp}) option, PCREs give you several ways to do
+this. Otherwise, if you are using Bash, the GNU project's shell, you
+can represent these characters via ANSI-C quoting. For example, the
+Bash commands @samp{grep $'Λ\tω'} and @samp{grep $'\u039B\t\u03C9'}
+both search for the same three-character string @samp{Λ@tie{}ω}
+mentioned earlier. However, because Bash translates ANSI-C quoting
+before @command{grep} sees the pattern, this technique should not be
+used to match printable ASCII characters; for example, @samp{grep
+$'\u005E'} is equivalent to @samp{grep '^'} and matches any line, not
+just lines containing the character @samp{^} (U+005E CIRCUMFLEX
+ACCENT).
+
+Since PCREs and ANSI-C quoting are GNU extensions to POSIX, portable
+shell scripts written in ASCII should use other methods to match
+specific non-ASCII characters. For example, in a UTF-8 locale the
+command @samp{grep "$(printf '\316\233\t\317\211\n')"} is a portable
+albeit hard-to-read alternative to Bash's @samp{grep $'Λ\tω'}.
+However, none of these techniques will let you put a null character
+directly into a command-line pattern; null characters can appear only
+in a pattern specified via the @option{-f} (@option{--file}) option.
 
 @node Usage
 @chapter Usage
-- 
2.17.1

--------------C42AC9D60BCD235D4695C281--

From unknown Sat Jun 21 12:11:33 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 20 Oct 2020 11:24:04 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator