From unknown Sat Jun 21 03:02:02 2025
From: bug#78276 <78276@debbugs.gnu.org>
To: bug#78276 <78276@debbugs.gnu.org>
Subject: Status: grep on file with 0xF3 byte in utf-8 locale
Date: Sat, 21 Jun 2025 10:02:02 +0000

retitle 78276 grep on file with 0xF3 byte in utf-8 locale
reassign 78276 grep
submitter 78276 Arkadiusz Miśkiewicz
severity 78276 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Tue May 06 03:38:03 2025
Date: Tue, 6 May 2025 09:37:36 +0200
To: bug-grep@gnu.org
From: Arkadiusz Miśkiewicz
Subject: grep on file with 0xF3 byte in utf-8 locale

Hi.

I was trying to grep logs for some mail log entries, and a spammer used
a 0xF3 byte to try to hide / trick things. For grep it looks like this:

$ printf 'a\xF3bcdefgh' > x2
$ LC_ALL=C.UTF-8 grep 'a.*h' x2
$
$ LC_ALL=C grep 'a.*h' x2
abcdefgh
$ LC_ALL=C.UTF-8 grep -a 'a.*h' x2
$
[arekm@ixion ~]$ LC_ALL=C grep -a 'a.*h' x2
abcdefgh

Is that expected behavior, no binary file warning and no matching with
utf-8 locale, even with -a? AFAIK that's not a correct utf-8 sequence.

$ grep --version x2
grep (GNU grep) 3.12
Copyright (C) 2025 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later .
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others; see .

grep -P uses PCRE2 10.45 2025-02-05

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

From debbugs-submit-bounces@debbugs.gnu.org Tue May 06 05:12:40 2025
Date: Tue, 6 May 2025 02:12:25 -0700
Subject: Re: bug#78276: grep on file with 0xF3 byte in utf-8 locale
To: Arkadiusz Miśkiewicz
Cc: 78276-done@debbugs.gnu.org
From: Paul Eggert
Organization: UCLA Computer Science Department

On 2025-05-06 00:37, Arkadiusz Miśkiewicz via Bug reports for GNU grep
wrote:
> Is that expected behavior, no binary file warning and no matching with
> utf-8 locale, even with -a?

It's allowed behavior, as '.' need not match encoding errors.[1] Also,
'grep' need not diagnose encoding errors that don't harm the output.[2]

As you mentioned in your email, using LC_ALL=C should let '.' match any
byte, so that should let you do what you want.

[1]:
https://www.gnu.org/software/grep/manual/html_node/Fundamental-Structure.html
[2]:
https://www.gnu.org/software/grep/manual/html_node/File-and-Directory-Selection.html

From unknown Sat Jun 21 03:02:02 2025
From: Debbugs Internal Request
To: internal_control@debbugs.gnu.org
Subject: Internal Control
Message-Id: bug archived.
Date: Tue, 03 Jun 2025 11:24:44 +0000

# This is a fake control message.
#
# The action:
# bug archived.
thanks
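Paul Eggert's reply rests on two points: the byte 0xF3 followed by 'b' is an encoding error in UTF-8, and in a UTF-8 locale '.' need not match an encoding error, whereas with LC_ALL=C '.' matches any byte. The Python sketch below illustrates both under stated assumptions; it is an emulation, not how grep is implemented. The helper names (first_encoding_error, match_bytewise, match_utf8, UTF8_DOT) are hypothetical. The UTF-8 side assumes Python's "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate in U+DC80..U+DCFF, and a hand-written character class standing in for '.' that refuses those surrogates.

```python
import re

# Bytes written by the report's command: printf 'a\xF3bcdefgh' > x2
DATA = b"a\xf3bcdefgh"


def first_encoding_error(data: bytes):
    """Return (offset, reason) for the first invalid UTF-8 byte, or None.

    0xF3 is the lead byte of a four-byte UTF-8 sequence, so the next byte
    must be a continuation byte (0x80-0xBF); 'b' (0x62) is not, so the
    sequence is an encoding error at offset 1.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None


def match_bytewise(pattern: bytes, data: bytes):
    """Analogue of LC_ALL=C: in a bytes pattern '.' matches any byte but newline."""
    m = re.search(pattern, data)
    return m.group() if m else None


# Hypothetical stand-in for '.' in a UTF-8 locale: any character except newline
# and the lone surrogates U+DC80..U+DCFF that surrogateescape uses for bad bytes.
UTF8_DOT = "[^\n\udc80-\udcff]"


def match_utf8(pattern: str, data: bytes):
    """Rough analogue of a UTF-8 locale, where '.' does not match encoding errors.

    Naive: every '.' in the pattern is rewritten, even an escaped one.
    """
    text = data.decode("utf-8", errors="surrogateescape")
    m = re.search(pattern.replace(".", UTF8_DOT), text)
    return m.group() if m else None


if __name__ == "__main__":
    print(first_encoding_error(DATA))      # (1, 'invalid continuation byte')
    print(match_bytewise(rb"a.*h", DATA))  # b'a\xf3bcdefgh'
    print(match_utf8("a.*h", DATA))        # None
    print(match_utf8("b.*h", DATA))        # bcdefgh
```

With 'a.*h' the UTF-8 emulation finds nothing, because the only 'a' is immediately followed by the encoding error that the stand-in '.' will not cross; the bytes version matches the whole line, as LC_ALL=C did in the report. The last call shows that text after the bad byte still matches normally.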