From unknown Tue Aug 19 23:11:25 2025
Subject: bug#38503: Locale can cause incorrect number parsing in binary files
From: jan h
Date: Thu, 5 Dec 2019 18:30:58 +0000
To: 38503@debbugs.gnu.org
X-GNU-PR-Message: report 38503
X-GNU-PR-Package: grep
Content-Type: text/plain; charset="UTF-8"

grep 3.3

I get a few weird symbols (seems valid utf-8), along with normal
numbers with the following simple snippet (.UTF-8 and .utf8 result in
the same, even .UtF---8 is the same):
LC_ALL=en_US.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"
wc -c counts 1047 and 1033 and 1036 etc, so they're multi-byte characters
meanwhile, with LC_ALL being C.UTF-8 this is not the case,
LC_ALL=C.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"|wc -c
consistently results in 1024 characters/bytes, as it's supposed to be...
it's not just en_US, it seems ANY utf-8 locale, other than C results
in this bug, whereas non-utf8 versions are fine, bare en_US doesn't
show this bug, nor does en_US.iso88591...

worthy of note is that [[:digit:]] works correctly, while [0-9] does
not (and 1-9 is same bug as 0-9, if you were wondering), setting -E
doesn't change anything either...
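
A deterministic variant of the pipeline above may make the comparison easier to repeat. This is only a sketch under stated assumptions: the random input is swapped for a fixed byte string (the helper names `sample`, `ascii_digits`, `digit_class` and `ascii_digit_bytes` are hypothetical), and the one multi-byte character in it, U+0663 encoded as the UTF-8 bytes 0xD9 0xA3, is just an arbitrary non-ASCII digit chosen for illustration. It runs grep under LC_ALL=C only, where [0-9] is limited to the ten ASCII digits.

```shell
#!/bin/sh
# Fixed sample instead of /dev/urandom (assumption, for reproducibility):
# "1", U+0663 as UTF-8 bytes D9 A3 (octal 331 243), a space, then "2".
sample() { printf '1\331\243 2\n'; }

# In the C locale, the range [0-9] covers only the ten ASCII digits.
ascii_digits() { sample | LC_ALL=C grep -a -o '[0-9]' | tr -d '\n'; }

# The [[:digit:]] class, for comparison with the range expression.
digit_class() { sample | LC_ALL=C grep -a -o '[[:digit:]]' | tr -d '\n'; }

# Byte count of the matches, mirroring the `wc -c` check in the report.
ascii_digit_bytes() { ascii_digits | wc -c | tr -d ' '; }

ascii_digits; echo
digit_class; echo
ascii_digit_bytes
```

Replacing LC_ALL=C with the locale under test (for example en_US.UTF-8) shows whether that system's grep and libc let the range match anything beyond the ASCII digits; the result there depends on the installed versions, so no output is claimed for it here.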
From debbugs-submit-bounces@debbugs.gnu.org Thu Dec 05 15:29:34 2019
Subject: Re: bug#38503: Locale can cause incorrect number parsing in binary files
To: jan h, 38503-done@debbugs.gnu.org
From: Eric Blake
Organization: Red Hat, Inc.
Message-ID: <756269ef-ec82-f723-1bc8-b784bfbabad9@redhat.com>
Date: Thu, 5 Dec 2019 14:29:19 -0600
Content-Type: text/plain; charset=utf-8; format=flowed

tag 38503 notabug
thanks

On 12/5/19 12:30 PM, jan h wrote:
> grep 3.3
>
> I get a few weird symbols (seems valid utf-8), along with normal
> numbers with the following simple snippet (.UTF-8 and .utf8 result in
> the same, even .UtF---8 is the same):
> LC_ALL=en_US.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"
> wc -c counts 1047 and 1033 and 1036 etc, so they're multi-byte characters

It's important to note that POSIX says that the regex [0-9] has
locale-dependent effects. Outside of the C/POSIX locale, it matches
whatever the locale definition says it should. For example, some
locales allow [A-Z] to match non-ASCII letters like Á. Similarly, as
you have found, on your system, the en_US.UTF-8 locale is defined to
match non-ASCII Unicode digits when a range expression for [0-9] is in
force.

Note that the Rational Range Interpretation of ranges claims that [0-9]
should have the expansion [0123456789] in ALL locales; and more and more
versions of GNU utilities are starting to move to RRI (even newer glibc
is trying to move towards RRI for more regex operations). If this
example is run where RRI is in force, then it should not match non-ASCII
Unicode digits. But you didn't mention which version of grep you are
using, let alone which version of libc is providing your locale
definitions, to make that determination; and POSIX does not require RRI.

> meanwhile, with LC_ALL being C.UTF-8 this is not the case,
> LC_ALL=C.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"|wc -c
> consistently results in 1024 characters/bytes, as it's supposed to be...

Well, in the POSIX locale (C.UTF-8 is not the POSIX locale, but follows
enough of the same rules), [0-9] _is_ required to match the same as
[0123456789]. That's the only locale where you get RRI for free,
rather than having to worry if your choice of program version and locale
definition provide it.

> it's not just en_US, it seems ANY utf-8 locale, other than C results
> in this bug, whereas non-utf8 versions are fine, bare en_US doesn't
> show this bug, nor does en_US.iso88591...

en_US.iso88591 does not have the problem because in that encoding, there
aren't any non-ASCII digits. So [0-9] will never match any non-ASCII
Unicode digits because the charset in use doesn't have such characters.

>
> worthy of note is that [[:digit:]] works correctly, while [0-9] does
> not (and 1-9 is same bug as 0-9, if you were wondering), setting -E
> doesn't change anything either...

POSIX requires [[:digit:]] to expand to the same 10 characters in ALL
locales, regardless of what the implementation does with [0-9], and
regardless of whether an implementation uses RRI. (This is true for
[[:digit:]], but not for other character classes; for example,
[[:alpha:]] is still locale-dependent and may expand to more than the
52 ASCII letters).

Since the problem you reported is due to your locale, I'm closing this
as a non-bug. We may reopen it if additional details show that your
version of grep was supposed to be using RRI but failed to do so. And
feel free to continue the conversation, even if we don't reopen the bug.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org

From unknown Tue Aug 19 23:11:25 2025
MIME-Version: 1.0
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: jan h
Subject: bug#38503: closed (Re: bug#38503: Locale can cause incorrect number parsing in binary files)
References: <756269ef-ec82-f723-1bc8-b784bfbabad9@redhat.com>
X-Gnu-PR-Message: they-closed 38503
X-Gnu-PR-Package: grep
X-Gnu-PR-Keywords: notabug
Reply-To: 38503@debbugs.gnu.org
Date: Thu, 05 Dec 2019 20:30:05 +0000
Content-Type: multipart/mixed; boundary="----------=_1575577805-13457-1"

This is a multi-part message in MIME format...

------------=_1575577805-13457-1
Content-Disposition: inline
Content-Type: text/plain; charset="utf-8"

Your bug report

#38503: Locale can cause incorrect number parsing in binary files

which was filed against the grep package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 38503@debbugs.gnu.org.

-- 
38503: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=38503
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1575577805-13457-1--

From unknown Tue Aug 19 23:11:25 2025
Subject: bug#38503: Locale can cause incorrect number parsing in binary files
From: Eric Blake
Organization: Red Hat, Inc.
To: jan h, 38503-done@debbugs.gnu.org
References: <756269ef-ec82-f723-1bc8-b784bfbabad9@redhat.com>
In-Reply-To: <756269ef-ec82-f723-1bc8-b784bfbabad9@redhat.com>
Date: Thu, 5 Dec 2019 14:40:42 -0600
X-GNU-PR-Message: followup 38503
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: notabug
Content-Type: text/plain; charset=utf-8; format=flowed

On 12/5/19 2:29 PM, Eric Blake wrote:
> tag 38503 notabug
> thanks
>
> On 12/5/19 12:30 PM, jan h wrote:
>> grep 3.3
>>
>
> Note that the Rational Range Interpretation of ranges claims that [0-9]
> should have the expansion [0123456789] in ALL locales; and more and more
> versions of GNU utilities are starting to move to RRI (even newer glibc
> is trying to move towards RRI for more regex operations). If this
> example is run where RRI is in force, then it should not match non-ASCII
> Unicode digits. But you didn't mention which version of grep you are
> using, let alone which version of libc is providing your locale
> definitions, to make that determination; and POSIX does not require RRI.

Sorry, I missed that you did mention grep 3.3. And the NEWS for grep
does not mention 'RRI' or 'Rational Range Interpretation' (compare that
to bash 4.3 introducing globasciiranges, or gawk introducing RRI in
4.0.1). So I'm not sure of the current state of whether grep tries to
use RRI on all systems or only on systems where it relies on gnulib's
regcomp instead of libc. So we may still need to reopen this if we
decide grep needs more RRI fixes.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org

From unknown Tue Aug 19 23:11:25 2025
Subject: bug#38503: Locale can cause incorrect number parsing in binary files
From: jan h
Date: Thu, 5 Dec 2019 18:40:21 +0000
To: 38503@debbugs.gnu.org
X-GNU-PR-Message: followup 38503
X-GNU-PR-Package: grep
X-GNU-PR-Keywords: notabug
Content-Type: text/plain; charset="UTF-8"

On another machine with grep 3.1 this does not appear to be the case,
so, regression?

On Thu, 5 December 2019 at 18:30, jan h wrote:
>
> grep 3.3
>
> I get a few weird symbols (seems valid utf-8), along with normal
> numbers with the following simple snippet (.UTF-8 and .utf8 result in
> the same, even .UtF---8 is the same):
> LC_ALL=en_US.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"
> wc -c counts 1047 and 1033 and 1036 etc, so they're multi-byte characters
> meanwhile, with LC_ALL being C.UTF-8 this is not the case,
> LC_ALL=C.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"|wc -c
> consistently results in 1024 characters/bytes, as it's supposed to be...
> it's not just en_US, it seems ANY utf-8 locale, other than C results
> in this bug, whereas non-utf8 versions are fine, bare en_US doesn't
> show this bug, nor does en_US.iso88591...
>
> worthy of note is that [[:digit:]] works correctly, while [0-9] does
> not (and 1-9 is same bug as 0-9, if you were wondering), setting -E
> doesn't change anything either...
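
Since the behaviour differs between the two machines, it may help to record on each one the details asked for earlier in the thread (the grep version, the libc version, and the locale settings in effect). A minimal sketch, assuming a system where `grep --version` is supported; `report_versions` is a hypothetical helper name, and `getconf GNU_LIBC_VERSION` is only expected to work with glibc, so a fallback line is printed otherwise.

```shell
#!/bin/sh
# Collect the facts needed to compare two machines: grep version,
# libc version, and the locale variables that affect bracket expressions.
report_versions() {
  # First line of the version banner, e.g. "grep (GNU grep) X.Y" for GNU grep.
  grep --version 2>/dev/null | head -n 1
  # GNU_LIBC_VERSION is glibc-specific; print a placeholder if it is unavailable.
  getconf GNU_LIBC_VERSION 2>/dev/null || echo "libc: unknown (getconf GNU_LIBC_VERSION unavailable)"
  # LC_ALL overrides LC_COLLATE and LC_CTYPE, which in turn override LANG.
  locale 2>/dev/null | grep -E '^(LANG|LC_ALL|LC_COLLATE|LC_CTYPE)=' || true
}

report_versions
```

Running this on both the grep 3.1 and the grep 3.3 machine and diffing the output would show whether the difference is in grep itself or in the libc that supplies the locale definitions.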
From unknown Tue Aug 19 23:11:25 2025 X-Loop: help-debbugs@gnu.org Subject: bug#38503: Locale can cause incorrect number parsing in binary files Resent-From: jan h Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 05 Dec 2019 20:44:03 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 38503 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: 38503@debbugs.gnu.org X-Debbugs-Original-To: bug-grep@gnu.org Received: via spool by submit@debbugs.gnu.org id=B.157557864022398 (code B ref -1); Thu, 05 Dec 2019 20:44:03 +0000 Received: (at submit) by debbugs.gnu.org; 5 Dec 2019 20:44:00 +0000 Received: from localhost ([127.0.0.1]:45332 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1icxyh-0005p0-NI for submit@debbugs.gnu.org; Thu, 05 Dec 2019 15:44:00 -0500 Received: from lists.gnu.org ([209.51.188.17]:42900) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1icwHb-0000ZE-4W for submit@debbugs.gnu.org; Thu, 05 Dec 2019 13:55:23 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:34649) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1icwHZ-0004KX-Nc for bug-grep@gnu.org; Thu, 05 Dec 2019 13:55:22 -0500 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_40,FREEMAIL_FROM autolearn=disabled version=3.3.2 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1icwHX-0003kb-Eo for bug-grep@gnu.org; Thu, 05 Dec 2019 13:55:21 -0500 Received: from mail-qv1-xf31.google.com ([2607:f8b0:4864:20::f31]:42464) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1icwHV-0003j5-J0 for bug-grep@gnu.org; Thu, 05 Dec 2019 13:55:19 -0500 Received: by mail-qv1-xf31.google.com with SMTP id q19so1690804qvy.9 for ; Thu, 05 Dec 2019 10:55:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=1rPwBJbJAJofMxUoBx7pvlJ4YPpAORGtqRFS7b6zz3I=; b=Q4b+MO1fuTAteSs00aBtfF2AN0nhgzONADoGf9W4SB6beU+cjZ372LnLeBAAFl2cmR gp1cT2v9Bic/uhljuWdnF8jeWcUMirhAwC6QfqII3liEDaYVX8mpG5K1YrrO3cGTb6jp hQpNXPTDB0Oa6Dlc4bdh/glxZ3AQIWRPnZyB79Guq/28uASD65bPEaiJa+p3OuXe6vK0 lnb6k/rpGbiMo4v4rqkORY4/Z6uy7Dl1TNGlne3pcX+v6q0jU2WjSMw4MdaeW9wtqhqN sVEUo6C61qg+AAKDz+LGLcARlhpbPMFSC5iEqOKQCwPRAoWDJiN5EB65gv5U9szKfHUr 0HoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=1rPwBJbJAJofMxUoBx7pvlJ4YPpAORGtqRFS7b6zz3I=; b=WIqvB95i7zz/gAOahs6brToZOLT+7FPaowwrJE4cgHrO8cSyeGt10uKbigS8aoT4Gp yPHvFDAecsoxu13bmndOkIa1t48JiKkRg4pQXC9cf6OYUaNXhTyK+4KhOxX98/FER97f cs2TPr+ZBDsTvYG7rtLvZe3HwuMb5bHmaV6FNY7PFz3jEviKT6Uvx/Gouwwj6T/KtS+s k90FXpwikPndkp+0sCAN8p2q1/+mrRJJrHjb9Y0RRwOWEHLVCLzAt2SV37Ic40H82zsT cF6BZvA2f+mBVANEeoX3xWT84v8W1WcPBNSmyeWlegLP8yjXAtH+ktuEapfqdAljDIdu L+YA== X-Gm-Message-State: APjAAAVRdWimKOzGOAL35sCY6/590CDrUQPxoTuLja9WgWvZk9Wk01P+ ZDKfxsCf9E/7cR4/vMfGSFew8zlcZDqpt8EIUvknJNoBLcU= X-Google-Smtp-Source: APXvYqwBTaItam1vJXxDivEiU3cp3+io78x7so0x4mj8oBGpBSJD31g1BUg8Qu6RkihHnvZIoIkhT3DUy36WsvvqdFM= X-Received: by 2002:ad4:4e6a:: with SMTP id ec10mr9047535qvb.160.1575572116027; Thu, 05 Dec 2019 10:55:16 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: jan h Date: Thu, 5 Dec 2019 18:55:01 +0000 Message-ID: Content-Type: text/plain; charset="UTF-8" X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 2607:f8b0:4864:20::f31 X-Spam-Score: -1.3 (-) X-Mailman-Approved-At: Thu, 05 Dec 2019 15:43:58 -0500 X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -2.3 (--)

compiling from scratch resulted in a normal, working version
apparently Arch's package was somehow badly made?

From unknown Tue Aug 19 23:11:25 2025 X-Loop: help-debbugs@gnu.org Subject: bug#38503: Locale can cause incorrect number parsing in binary files Resent-From: Eric Blake Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 05 Dec 2019 20:51:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 38503 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: jan h , 38503@debbugs.gnu.org Received: via spool by 38503-submit@debbugs.gnu.org id=B38503.157557901930550 (code B ref 38503); Thu, 05 Dec 2019 20:51:01 +0000 Received: (at 38503) by debbugs.gnu.org; 5 Dec 2019 20:50:19 +0000 Received: from localhost ([127.0.0.1]:45356 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1icy4p-0007wf-Dg for submit@debbugs.gnu.org; Thu, 05 Dec 2019 15:50:19 -0500 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:46468 helo=us-smtp-1.mimecast.com) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1icy4m-0007wX-LB for 38503@debbugs.gnu.org; Thu, 05 Dec 2019 15:50:17 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1575579016; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=g4G5cmCXo7mD0aTSU97AnX5cIzXCyyL6oDml5EQZIwI=;
b=DWjUs9I+7tYInoBbfjSk0CuWrhFk1vxGi7m20CxFV52crKgDYnVVe9O9BSt7WbzUiRqbuN 732Vgw41GS9jLMPk4P6pMI2QHVoNCIgtcTZiKwJPYiOnFk7mTpdsr+3zjzAFHpJeJO0Cyg pFqwQ8FbYQtCpgb5dv4hZwLF0Gr/FdQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-170-b8SnVRNaM7-1DvQqKN8apQ-1; Thu, 05 Dec 2019 15:50:14 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B5B7518557C8; Thu, 5 Dec 2019 20:50:13 +0000 (UTC) Received: from [10.3.116.171] (ovpn-116-171.phx2.redhat.com [10.3.116.171]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 8234A694A3; Thu, 5 Dec 2019 20:50:13 +0000 (UTC) References: From: Eric Blake Organization: Red Hat, Inc. Message-ID: <8206172d-1dfc-4509-5f21-e6a24d01830b@redhat.com> Date: Thu, 5 Dec 2019 14:50:12 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Language: en-US X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-MC-Unique: b8SnVRNaM7-1DvQqKN8apQ-1 X-Mimecast-Spam-Score: 0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

On 12/5/19 12:55 PM, jan h wrote:
> compiling from scratch resulted in a normal, working version
> apparently Arch's package was somehow badly made?

You also need to check whether your builds were using gnulib's regcomp replacement, or sticking with the one from glibc; and in turn which version of glibc is in use (as it was glibc 2.28 that tried to use RRI (Rational Range Interpretation) in more locales, although work is still not complete there - and the presence or absence of particular historical glibc regcomp bugs determines whether configure decides to use gnulib's version instead).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org

From unknown Tue Aug 19 23:11:25 2025 X-Loop: help-debbugs@gnu.org Subject: bug#38503: Locale can cause incorrect number parsing in binary files Resent-From: Paul Eggert Original-Sender: "Debbugs-submit" Resent-CC: bug-grep@gnu.org Resent-Date: Thu, 05 Dec 2019 20:57:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 38503 X-GNU-PR-Package: grep X-GNU-PR-Keywords: notabug To: Eric Blake , jan h , 38503-done@debbugs.gnu.org Received: via spool by 38503-done@debbugs.gnu.org id=D38503.157557937331181 (code D ref 38503); Thu, 05 Dec 2019 20:57:02 +0000 Received: (at 38503-done) by debbugs.gnu.org; 5 Dec 2019 20:56:13 +0000 Received: from localhost ([127.0.0.1]:45366 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1icyAX-00086q-Br for submit@debbugs.gnu.org; Thu, 05 Dec 2019 15:56:13 -0500 Received: from zimbra.cs.ucla.edu ([131.179.128.68]:34000) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1icyAV-00086c-Bc for 38503-done@debbugs.gnu.org; Thu, 05 Dec 2019 15:56:12 -0500 Received: from localhost (localhost [127.0.0.1]) by zimbra.cs.ucla.edu (Postfix) with ESMTP id 720841601B4; Thu, 5 Dec 2019 12:56:04 -0800 (PST) Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id lXrvs-KblNc8; Thu, 5 Dec 2019 12:56:03 -0800 (PST) Received: from localhost (localhost [127.0.0.1]) by
zimbra.cs.ucla.edu (Postfix) with ESMTP id C3CAF16023B; Thu, 5 Dec 2019 12:56:03 -0800 (PST) X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu Received: from zimbra.cs.ucla.edu ([127.0.0.1]) by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 9l1WJRmNFLGl; Thu, 5 Dec 2019 12:56:03 -0800 (PST) Received: from Penguin.CS.UCLA.EDU (Penguin.CS.UCLA.EDU [131.179.64.200]) by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id A88E91601B4; Thu, 5 Dec 2019 12:56:03 -0800 (PST) References: <756269ef-ec82-f723-1bc8-b784bfbabad9@redhat.com> From: Paul Eggert Organization: UCLA Computer Science Department Message-ID: <248b9c64-5cdf-f2f2-a902-187f68f99a4e@cs.ucla.edu> Date: Thu, 5 Dec 2019 12:56:03 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---)

On 12/5/19 12:40 PM, Eric Blake wrote:
> I'm not sure of the current state of whether grep tries to use RRI on
> all systems or only on systems where it relies on gnulib's regcomp
> instead of libc.

As I recall, grep doesn't make any special effort to use RRI. That is, if the underlying library uses RRI, then grep does so as well; otherwise it doesn't.
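
Since the range behavior comes from the underlying library, it can be probed from the shell on a given machine. A minimal sketch, assuming a POSIX shell; the helper name `lowercase_range_matches_B` is hypothetical. Under RRI a range such as `[a-z]` is read by code point, so it does not match "B". Where ranges follow a locale's collation order instead, "B" may fall inside `[a-z]`. The same reasoning applies to `[0-9]` and non-ASCII digits.

```shell
# Hypothetical helper: print 1 if '[a-z]' matches the line "B" under the
# given locale ($1), 0 otherwise. `|| true` keeps grep's "no match" exit
# status from aborting scripts that run under `set -e`.
lowercase_range_matches_B() {
    printf 'B\n' | LC_ALL="$1" grep -c '[a-z]' || true
}

for loc in C en_US.UTF-8; do
    printf '%s: %s\n' "$loc" "$(lowercase_range_matches_B "$loc")"
done
```

The C locale always reports 0. What other locales report depends on the grep build and the C library in use, and a locale that is not installed typically behaves like C, so this is only a quick indicator and not a definitive test of which regcomp was linked.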