From unknown Sat Aug 09 14:05:02 2025
From: bug#48114 <48114@debbugs.gnu.org>
To: bug#48114 <48114@debbugs.gnu.org>
Subject: Status: Disarchive occasionally fails tests
Reply-To: bug#48114 <48114@debbugs.gnu.org>
Date: Sat, 09 Aug 2025 21:05:02 +0000

retitle 48114 Disarchive occasionally fails tests
reassign 48114 guix
submitter 48114 Ludovic Courtès
severity 48114 normal
thanks

From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 30 06:01:28 2021
From: Ludovic Courtès
Subject: Disarchive occasionally fails tests
X-Debbugs-Cc: Timothy Sample
Date: Fri, 30 Apr 2021 12:00:36 +0200
Message-ID: <87v984gkhn.fsf@inria.fr>

Hi Timothy,

Disarchive 0.2.0 occasionally fails two tests:

  FAIL: tests/kinds/octal.scm - [prop] Writing is reversible
  FAIL: tests/kinds/octal.scm - [prop] Serializing is reversible

(Thanks, Quickcheck! :-))

I added ‘pk’ calls like so:

--8<---------------cut here---------------start------------->8---
(test-assert "[prop] Writing is reversible"
  (quickcheck
   (property ((octal $octal))
     (test-when (valid-octal? octal)
       (begin
         (equal? (pk 'oct octal) (pk 'decode (decode-octal (encode-octal octal)))))))))

(test-assert "[prop] Serializing is reversible"
  (quickcheck
   (property ((octal $octal))
     (test-when (valid-octal? octal)
       (equal? (pk 'OCT octal) (pk 'DECODE (serdeser -octal- octal)))))))
--8<---------------cut here---------------end--------------->8---

and got this output:

--8<---------------cut here---------------start------------->8---
;;; (oct #< value: 0 source: #< value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)

;;; (decode #< value: 0 source: #< value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
actual-value: #f
actual-error:
+ (out-of-range
+   #f
+   "Value out of range ~S to ~S: ~S"
+   (8 9 10)
+   (10))
result: FAIL

[…]

;;; (OCT #< value: 0 source: #< value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)

;;; (DECODE #< value: 0 source: #< value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
actual-value: #f
actual-error:
+ (out-of-range
+   #f
+   "Value out of range ~S to ~S: ~S"
+   (8 9 10)
+   (10))
result: FAIL
--8<---------------cut here---------------end--------------->8---

I’m not sure where the exception comes from though.

Thoughts?

Thanks,
Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Fri Apr 30 15:50:02 2021
From: Timothy Sample
To: Ludovic Courtès
Subject: Re: bug#48114: Disarchive occasionally fails tests
References: <87v984gkhn.fsf@inria.fr>
Date: Fri, 30 Apr 2021 15:49:52 -0400
In-Reply-To: <87v984gkhn.fsf@inria.fr> ("Ludovic Courtès"'s message of "Fri, 30 Apr 2021 12:00:36 +0200")
Message-ID: <87pmybeen3.fsf@ngyro.com>
Cc: 48114@debbugs.gnu.org

Hey,

Ludovic Courtès writes:

> Disarchive 0.2.0 occasionally fails two tests:
>
>   FAIL: tests/kinds/octal.scm - [prop] Writing is reversible
>   FAIL: tests/kinds/octal.scm - [prop] Serializing is reversible

These two tests have a bit of a problem. They occasionally fail by
“giving up”, which is when too many test cases are discarded rather than
used. (This happens because you might write a generator for a superset
of the values you’re interested in, and then filter out some values with
“test-when”.) I don’t think this is happening here, though. You would
see something like “Gave up!
Passed only 0 ests [sic].”

> I added ‘pk’ calls like so:
>
> (test-assert "[prop] Writing is reversible"
>   (quickcheck
>    (property ((octal $octal))
>      (test-when (valid-octal? octal)
>        (begin
>          (equal? (pk 'oct octal) (pk 'decode (decode-octal (encode-octal octal)))))))))
>
> (test-assert "[prop] Serializing is reversible"
>   (quickcheck
>    (property ((octal $octal))
>      (test-when (valid-octal? octal)
>        (equal? (pk 'OCT octal) (pk 'DECODE (serdeser -octal- octal)))))))
>
>
> and got this output:
>
> ;;; (oct #< value: 0 source: #< value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
>
> ;;; (decode #< value: 0 source: #< value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
> actual-value: #f
> actual-error:
> + (out-of-range
> +   #f
> +   "Value out of range ~S to ~S: ~S"
> +   (8 9 10)
> +   (10))
> result: FAIL
>
> […]
>
> ;;; (OCT #< value: 0 source: #< value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
>
> ;;; (DECODE #< value: 0 source: #< value: "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b" trailer: #vu8(172 156 23 48 25 29 159 226 210)>>)
> actual-value: #f
> actual-error:
> + (out-of-range
> +   #f
> +   "Value out of range ~S to ~S: ~S"
> +   (8 9 10)
> +   (10))
> result: FAIL
>
> I’m not sure where the exception comes from though.

I can’t seem to reproduce this. I’ve run the test suite many, many
times, but I also tried:

    ,use (disarchive kinds octal)
    ,use (disarchive kinds zero-string)
    ,use (disarchive serialization)
    (define the-zero-string
      (make-zero-string
       "\U0f94a4\u0912\U025627\U10e96a\u9576\u2077\u048f\U0f2f60\U0f744b"
       #vu8(172 156 23 48 25 29 159 226 210)))
    (define the-octal (make-unstructured-octal 0 the-zero-string))
    (equal? the-octal (decode-octal (encode-octal the-octal)))
    (equal? the-octal (serdeser -octal- the-octal))

Which works fine. (Does it work for you?)

However, isn’t it possible that these values aren’t the culprits? With
the “pk” calls you added, isn’t it printing the last OK value without
telling us the value causing the issue? What if you run it with the
following?

    (test-assert "[prop] Writing is reversible"
      (quickcheck
       (property ((octal $octal))
         (test-when (valid-octal? octal)
           (false-if-exception      ; <-- changed!
            (equal? octal (decode-octal (encode-octal octal))))))))

This way, Guile-QuickCheck should print the offending value and the seed
used for the tests, which could be helpful for reproducing. (The fact
that it doesn’t handle exceptions well is a known bug!)


-- Tim

From debbugs-submit-bounces@debbugs.gnu.org Sun May 02 15:57:57 2021
From: Ludovic Courtès
To: Timothy Sample
Subject: Re: bug#48114: Disarchive occasionally fails tests
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com>
Date: Sun, 02 May 2021 21:57:43 +0200
In-Reply-To: <87pmybeen3.fsf@ngyro.com> (Timothy Sample's message of "Fri, 30 Apr 2021 15:49:52 -0400")
Message-ID: <874kfk6h8o.fsf@gnu.org>
Cc: 48114@debbugs.gnu.org

Hello!

Timothy Sample writes:

> I can’t seem to reproduce this. I’ve run the test suite many, many
> times, but I also tried:

I can reproduce it quickly with:

  while make check TESTS=tests/kinds/octal.scm -j5 ; do : ; done

… in C locale (LC_ALL & co. all unset).

> However, isn’t it possible that these values aren’t the culprits? With
> the “pk” calls you added, isn’t it printing the last OK value without
> telling us the value causing the issue?

You’re right, the values printed are not the culprit. The problem comes
from the generator (I had to raise the (quickcheck …) form out of
‘test-assert’ so I could get a backtrace):

--8<---------------cut here---------------start------------->8---
Backtrace:
          13 (primitive-load "/data/src/disarchive/./build-aux/test-driver.scm")
In ice-9/eval.scm:
   619:8 12 (_ #(#(# ((() "./tests/kinds/octal.scm") (# . "no") (# . #) ?)) #))
   619:8 11 (_ #(#(#(#(#(#(#(#(# ("./tests/kinds/octal?") ?)) ?) ?) ?) ?) ?) ?))
In ice-9/boot-9.scm:
   142:2 10 (dynamic-wind _ _ #)
In unknown file:
           9 (primitive-load "./tests/kinds/octal.scm")
In quickcheck.scm:
   118:6  8 (check #< seed: 321557891 stop?: # ?)
   98:12  7 (check-results _ #< names: (octal) gen/arbs: (#< gen: #< proc: #)
In quickcheck/generator.scm:
    65:2  6 (_ 7 #< start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(29?>)
    65:2  5 (_ 7 #< start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(29?>)
   78:17  4 (_ 7 #< start: #(1907167801 2749187034 1190323419 1039883844 766725436 3567744198) s1: #(28?>)
  105:22  3 (_ _)
In tests/kinds.scm:
   84:22  2 (fix-unstructured-octal-value #< value: 7 source: #< value: "\U0f99aa?>)
   86:47  1 (_ _)
In unknown file:
           0 (substring "\U0f99aa?\U0ff7c1\U0fb97a\U0ff933?\U0fe7a1" 6 8)

ERROR: In procedure substring:
Value out of range 6 to 7: 8
--8<---------------cut here---------------end--------------->8---

Note that this is in C locale, which may mean that ‘regexp-exec’, which
passes strings to libc, gets offsets wrong somehow (see
‘fixup_multibyte_match’ in libguile), though I couldn’t reproduce it
with the string above.

Anyway, ‘guix build disarchive’ builds in en_US.utf8 locale, so the
thing above is probably a wrong lead.

If I switch to en_US.utf8, I occasionally get the following error
instead:

--8<---------------cut here---------------start------------->8---
test-name: [prop] Serializing is reversible
location: tests/kinds/octal.scm:154
source:
+ (test-assert
+   "[prop] Serializing is reversible"
+   (quickcheck
+     (property
+       ((octal $octal))
+       (test-when
+         (valid-octal? octal)
+         (equal?
+           (pk 'OCT octal)
+           (pk 'DECODE (serdeser -octal- octal)))))))

;;; (OCT #< value: 0 source: #< value: "" trailer: "">>)

;;; (DECODE #< value: 0 source: #< value: "" trailer: "">>)
Gave up! Passed only 1 est.
actual-value: #f
result: FAIL
--8<---------------cut here---------------end--------------->8---

This is more in line with what you described.

Any ideas on how to address that?

Thanks,
Ludo’.

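The error in frame 0 of the backtrace above is the ordinary ‘substring’
range check: the end index (8) exceeds the length of a 7-character
string. The following is only a minimal sketch of that failure mode with
a plain ASCII stand-in string, together with the ‘false-if-exception’
wrapping suggested earlier in the thread. The names ‘sample’ and
‘substring-or-false’ are hypothetical and do not come from Disarchive.

```scheme
;; Hypothetical stand-in: any 7-character string behaves the same way.
(define sample "abcdefg")

;; Hypothetical helper: return the substring, or #f if `substring'
;; raises (for example with the `out-of-range' key).
(define (substring-or-false str start end)
  (false-if-exception (substring str start end)))

(display (substring-or-false sample 6 7)) (newline)  ; prints g
(display (substring-or-false sample 6 8)) (newline)  ; prints #f
```

Turning the exception into #f makes the property fail as an ordinary
false result, which is what lets the test framework report the offending
input instead of aborting.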
From debbugs-submit-bounces@debbugs.gnu.org Sun May 02 22:24:16 2021
From: Timothy Sample
To: Ludovic Courtès
Subject: Re: bug#48114: Disarchive occasionally fails tests
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org>
Date: Sun, 02 May 2021 22:24:04 -0400
In-Reply-To: <874kfk6h8o.fsf@gnu.org> ("Ludovic Courtès"'s message of "Sun, 02 May 2021 21:57:43 +0200")
Message-ID: <87a6pceerf.fsf@ngyro.com>
Cc: 48114@debbugs.gnu.org

Hi,

Ludovic Courtès writes:

[...]

> ERROR: In procedure substring:
> Value out of range 6 to 7: 8
>
> Note that this is in C locale, which may mean that ‘regexp-exec’, which
> passes strings to libc, gets offsets wrong somehow (see
> ‘fixup_multibyte_match’ in libguile), though I couldn’t reproduce it
> with the string above.

I’m still looking into this, but I wanted to quickly post this
reproducer for the Guile bug:

    (use-modules (ice-9 regex))
    (define str
      "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
    (match:substring (string-match "[0-8]+" str))

This triggers the out-of-range error when run with “LC_ALL=C”.


-- Tim

From debbugs-submit-bounces@debbugs.gnu.org Mon May 03 00:02:21 2021
From: Timothy Sample
To: Ludovic Courtès
Subject: Re: bug#48114: Disarchive occasionally fails tests
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org> <87a6pceerf.fsf@ngyro.com>
Date: Mon, 03 May 2021 00:02:09 -0400
In-Reply-To: <87a6pceerf.fsf@ngyro.com> (Timothy Sample's message of "Sun, 02 May 2021 22:24:04 -0400")
Message-ID: <8735v4ea7y.fsf@ngyro.com>
Cc: 48114@debbugs.gnu.org

Timothy Sample writes:

> I’m still looking into this, but I wanted to quickly post this
> reproducer for the Guile bug:
>
>     (use-modules (ice-9 regex))
>     (define str
>       "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
>     (match:substring (string-match "[0-8]+" str))
>
> This triggers the out-of-range error when run with “LC_ALL=C”.

It turns out that all that’s needed is the last code point, which is
“Number Eleven Full Stop”, or ‘⒒’. When Guile converts this to an ASCII
C string using ‘u32_conv_from_encoding’, it becomes “11.”.
The regex (“[0-8]+”) matches the “11” part with start index 0 and end
index 2. The ‘fixup_multibyte_match’ function does nothing (it only
matters when the locale encoding is multibyte) [1]. Guile then builds
the match vector with the original string but keeps the ASCII offsets.
In other words, it thinks the match substring goes from 0 to 2 in a
single code point string:

    ,use (ice-9 regex)
    (string-match "11" "\u2492")
    => #("\u2492" (0 . 2))

I’m not sure there’s any way to solve this nicely in Guile. It would be
clearer if the match vector included the string as libc matched it, but
it’s still surprising that the match happens with a different string.

In Disarchive, I can rewrite the generator without regex. I’ll do that
and see what I can do about the “Gave up!” issue.

[1] It works on the converted-to-ASCII C string, which means that the
byte offsets and code point offsets are the same. Hence, it has nothing
to do.

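For illustration only, here is a minimal sketch of what a regex-free
search for a run of octal digits might look like. It is not the code
that ended up in Disarchive: the names ‘octal-digits’,
‘first-octal-run’, and ‘first-octal-substring’ are hypothetical, and it
assumes that “octal digit” means the ASCII characters 0 through 7. It
relies on ‘string-index’ and ‘string-skip’, which work on Guile’s own
code points, so the offsets they return refer to the same string that
‘substring’ later receives.

```scheme
(use-modules (srfi srfi-14))   ; character sets

;; Assumption: only the ASCII digits 0-7 count as octal digits.
(define octal-digits (string->char-set "01234567"))

;; Hypothetical helper: return (START . END) for the first run of
;; octal digits in STR, or #f when STR contains none.
(define (first-octal-run str)
  (let ((start (string-index str octal-digits)))
    (and start
         (cons start
               ;; `string-skip' returns #f when the run reaches the
               ;; end of the string.
               (or (string-skip str octal-digits start)
                   (string-length str))))))

;; Hypothetical helper: the matched text itself, or #f.
(define (first-octal-substring str)
  (let ((run (first-octal-run str)))
    (and run (substring str (car run) (cdr run)))))

(display (first-octal-substring "\u2492")) (newline)         ; prints #f
(display (first-octal-substring "x\u2492 0755")) (newline)   ; prints 0755
```

With this approach ‘⒒’ is simply not a member of the character set, so
it can never produce offsets that point past the end of the string.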

-- Tim

From debbugs-submit-bounces@debbugs.gnu.org Mon May 03 02:20:13 2021
Date: Mon, 3 May 2021 08:19:50 +0200
From: Bengt Richter
To: Timothy Sample
Subject: Re: bug#48114: Disarchive occasionally fails tests
Message-ID: <20210503061950.GA26660@LionPure>
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org> <87a6pceerf.fsf@ngyro.com> <8735v4ea7y.fsf@ngyro.com>
In-Reply-To: <8735v4ea7y.fsf@ngyro.com>
Cc: 48114@debbugs.gnu.org, Ludovic Courtès
Reply-To: Bengt Richter

Hi Timothy, Ludo,

On +2021-05-03 00:02:09 -0400, Timothy Sample wrote:
> Timothy Sample writes:
>
> > I’m still looking into this, but I wanted to quickly post this
> > reproducer for the Guile bug:
> >
> >     (use-modules (ice-9 regex))
> >     (define str
> >       "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
> >     (match:substring (string-match "[0-8]+" str))
> >
> > This triggers the out-of-range error when run with “LC_ALL=C”.
>
> It turns out that all that’s needed is the last code point, which is
> “Number Eleven Full Stop”, or ‘⒒’. When Guile converts this to an ASCII
> C string using ‘u32_conv_from_encoding’, it becomes “11.”. The regex
> (“[0-8]+”) matches the “11” part with start index 0 and end index 2.
> The ‘fixup_multibyte_match’ function does nothing (it only matters when
> the locale encoding is multibyte) [1]. Guile then builds the match
> vector with the original string but keeps the ASCII offsets. In other
> words, it thinks the match substring goes from 0 to 2 in a single code
> point string:
>
>     ,use (ice-9 regex)
>     (string-match "11" "\u2492")
>     => #("\u2492" (0 . 2))
>
> I’m not sure there’s any way to solve this nicely in Guile. It would be
> clearer if the match vector included the string as libc matched it, but
> it’s still surprising that the match happens with a different string.
>
> In Disarchive, I can rewrite the generator without regex. I’ll do that
> and see what I can do about the “Gave up!” issue.
>
> [1] It works on the converted-to-ASCII C string, which means that the
> byte offsets and code point offsets are the same. Hence, it has nothing
> to do.
>
>
> -- Tim
>
>
>

What happens with these?
(code points in decimal)

  8554 _Ⅺ_ "ROMAN NUMERAL ELEVEN"
  8570 _ⅺ_ "SMALL ROMAN NUMERAL ELEVEN"
  9322 _⑪_ "CIRCLED NUMBER ELEVEN"
  9342 _⑾_ "PARENTHESIZED NUMBER ELEVEN"
  9362 _⒒_ "NUMBER ELEVEN FULL STOP"
  9451 _⓫_ "NEGATIVE CIRCLED NUMBER ELEVEN"
  13155 _㍣_ "IDEOGRAPHIC TELEGRAPH SYMBOL FOR HOUR ELEVEN"
  13290 _㏪_ "IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY ELEVEN"

I would argue that none of these should be "decoded" into ascii
polyglyphs since they are atomic character glyphs.

IMO it is over-eager transformation to make them into ascii polyglyphs.

/Super/sub/-script placement metadata is another thing to consider --
"decode" to ascii art?? ;-)

Unicode characters representing mathematical values in other languages
are different. Those are subject to natural language translation with
locale-dependent semantics.
These might be candidates for that?: (code points in decimal) 8544 _Ⅰ_ "ROMAN NUMERAL ONE" 8545 _Ⅱ_ "ROMAN NUMERAL TWO" 8546 _Ⅲ_ "ROMAN NUMERAL THREE" 8547 _Ⅳ_ "ROMAN NUMERAL FOUR" 8548 _Ⅴ_ "ROMAN NUMERAL FIVE" 8549 _Ⅵ_ "ROMAN NUMERAL SIX" 8550 _Ⅶ_ "ROMAN NUMERAL SEVEN" 8551 _Ⅷ_ "ROMAN NUMERAL EIGHT" 8552 _Ⅸ_ "ROMAN NUMERAL NINE" 8553 _Ⅹ_ "ROMAN NUMERAL TEN" 8554 _Ⅺ_ "ROMAN NUMERAL ELEVEN" 8555 _Ⅻ_ "ROMAN NUMERAL TWELVE" 8556 _Ⅼ_ "ROMAN NUMERAL FIFTY" 8557 _Ⅽ_ "ROMAN NUMERAL ONE HUNDRED" 8558 _Ⅾ_ "ROMAN NUMERAL FIVE HUNDRED" 8559 _Ⅿ_ "ROMAN NUMERAL ONE THOUSAND" 8560 _ⅰ_ "SMALL ROMAN NUMERAL ONE" 8561 _ⅱ_ "SMALL ROMAN NUMERAL TWO" 8562 _ⅲ_ "SMALL ROMAN NUMERAL THREE" 8563 _ⅳ_ "SMALL ROMAN NUMERAL FOUR" 8564 _ⅴ_ "SMALL ROMAN NUMERAL FIVE" 8565 _ⅵ_ "SMALL ROMAN NUMERAL SIX" 8566 _ⅶ_ "SMALL ROMAN NUMERAL SEVEN" 8567 _ⅷ_ "SMALL ROMAN NUMERAL EIGHT" 8568 _ⅸ_ "SMALL ROMAN NUMERAL NINE" 8569 _ⅹ_ "SMALL ROMAN NUMERAL TEN" 8570 _ⅺ_ "SMALL ROMAN NUMERAL ELEVEN" 8571 _ⅻ_ "SMALL ROMAN NUMERAL TWELVE" 8572 _ⅼ_ "SMALL ROMAN NUMERAL FIFTY" 8573 _ⅽ_ "SMALL ROMAN NUMERAL ONE HUNDRED" 8574 _ⅾ_ "SMALL ROMAN NUMERAL FIVE HUNDRED" 8575 _ⅿ_ "SMALL ROMAN NUMERAL ONE THOUSAND" 8576 _ↀ_ "ROMAN NUMERAL ONE THOUSAND C D" 8577 _ↁ_ "ROMAN NUMERAL FIVE THOUSAND" 8578 _ↂ_ "ROMAN NUMERAL TEN THOUSAND" 8579 _Ↄ_ "ROMAN NUMERAL REVERSED ONE HUNDRED" 8581 _ↅ_ "ROMAN NUMERAL SIX LATE FORM" 8582 _ↆ_ "ROMAN NUMERAL FIFTY EARLY FORM" 8583 _ↇ_ "ROMAN NUMERAL FIFTY THOUSAND" 8584 _ↈ_ "ROMAN NUMERAL ONE HUNDRED THOUSAND" 12321 _〡_ "HANGZHOU NUMERAL ONE" 12322 _〢_ "HANGZHOU NUMERAL TWO" 12323 _〣_ "HANGZHOU NUMERAL THREE" 12324 _〤_ "HANGZHOU NUMERAL FOUR" 12325 _〥_ "HANGZHOU NUMERAL FIVE" 12326 _〦_ "HANGZHOU NUMERAL SIX" 12327 _〧_ "HANGZHOU NUMERAL SEVEN" 12328 _〨_ "HANGZHOU NUMERAL EIGHT" 12329 _〩_ "HANGZHOU NUMERAL NINE" 12344 _〸_ "HANGZHOU NUMERAL TEN" 12345 _〹_ "HANGZHOU NUMERAL TWENTY" 12346 _〺_ "HANGZHOU NUMERAL THIRTY" Just my intuitive reaction, no academic creds to back it up ;) -- Regards, Bengt 
Richter

From debbugs-submit-bounces@debbugs.gnu.org Mon May 03 16:04:08 2021
From: Ludovic Courtès
To: Timothy Sample
Subject: Re: bug#48114: Disarchive occasionally fails tests
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org> <87a6pceerf.fsf@ngyro.com> <8735v4ea7y.fsf@ngyro.com>
Date: Mon, 03 May 2021 22:03:59 +0200
In-Reply-To: <8735v4ea7y.fsf@ngyro.com> (Timothy Sample's message of "Mon, 03 May 2021 00:02:09 -0400")
Message-ID: <874kfjwpn4.fsf@gnu.org>
Cc: 48114@debbugs.gnu.org

Hi!

Timothy Sample wrote:

> Timothy Sample writes:
>
>> I’m still looking into this, but I wanted to quickly post this
>> reproducer for the Guile bug:
>>
>>     (use-modules (ice-9 regex))
>>     (define str
>>       "\U101514\U103ab0\U0f6e6e\U02e278\U01d9eb\U10b996\U1089b5\uea15\U0fa074\U101e41\U02e330\u0177\u2492")
>>     (match:substring (string-match "[0-8]+" str))
>>
>> This triggers the out-of-range error when run with “LC_ALL=C”.
>
> It turns out that all that’s needed is the last code point, which is
> “Number Eleven Full Stop”, or ‘⒒’.

Whaaat?  “Number Eleven Full Stop”, I wonder how the Unicode folks came
up with that one.

  ㊷ = ㉚ + ⒓

> When Guile converts this to an ASCII C string using
> ‘u32_conv_from_encoding’, it becomes “11.”.  The regex (“[0-8]+”)
> matches the “11” part with start index 0 and end index 2.  The
> ‘fixup_multibyte_match’ function does nothing (it only matters when
> the locale encoding is multibyte) [1].  Guile then builds the match
> vector with the original string but keeps the ASCII offsets.  In other
> words, it thinks the match substring goes from 0 to 2 in a single code
> point string:
>
>     ,use (ice-9 regex)
>     (string-match "11" "\u2492")
>     => #("\u2492" (0 . 2))
>
> I’m not sure there’s any way to solve this nicely in Guile.  It would be
> clearer if the match vector included the string as libc matched it, but
> it’s still surprising that the match happens with a different string.

Yeah, I don’t think there’s much we can do.  It’s a lot of fun anyway.
Thanks for investigating!

Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Thu May 13 17:04:41 2021
From: Ludovic Courtès
To: Timothy Sample
Subject: Re: bug#48114: Disarchive occasionally fails tests
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org> <87a6pceerf.fsf@ngyro.com> <8735v4ea7y.fsf@ngyro.com>
Date: Thu, 13 May 2021 23:04:26 +0200
In-Reply-To: <8735v4ea7y.fsf@ngyro.com> (Timothy Sample's message of "Mon, 03 May 2021 00:02:09 -0400")
Message-ID: <87lf8iqrad.fsf@gnu.org>
Cc: 48114@debbugs.gnu.org

Hi!

Timothy Sample wrote:

> In Disarchive, I can rewrite the generator without regex.  I’ll do that
> and see what I can do about the “Gave up!” issue.

Did you have a chance to look into it?

I’d like to make ‘guix’ and ‘guix-daemon’ depend on Disarchive, but not
before we can be sure its test suite passes.

Thanks,
Ludo’.

From debbugs-submit-bounces@debbugs.gnu.org Thu May 13 23:06:17 2021
From: Timothy Sample
To: Ludovic Courtès
Subject: Re: bug#48114: Disarchive occasionally fails tests
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org> <87a6pceerf.fsf@ngyro.com> <8735v4ea7y.fsf@ngyro.com> <87lf8iqrad.fsf@gnu.org>
Date: Thu, 13 May 2021 23:06:07 -0400
In-Reply-To: <87lf8iqrad.fsf@gnu.org> ("Ludovic Courtès"'s message of "Thu, 13 May 2021 23:04:26 +0200")
Message-ID: <87sg2qko9s.fsf@ngyro.com>
Cc: 48114-done@debbugs.gnu.org

Heyo,

Ludovic Courtès writes:

> Timothy Sample wrote:
>
>> In Disarchive, I can rewrite the generator without regex.  I’ll do that
>> and see what I can do about the “Gave up!” issue.
>
> Did you have a chance to look into it?

I just pushed b9f0e78238e6186d28d738c7c5355a56557ce84f, which updates
Disarchive to 0.2.1, which has fixes for the test suite.  The giving up
problem has not been solved outright, but it should be practically
impossible to trigger.  (In fact, it probably *is* impossible to trigger
given how few PRNG states there are....)

> I’d like to make ‘guix’ and ‘guix-daemon’ depend on Disarchive, but not
> before we can be sure its test suite passes.

Exciting!


-- Tim

From debbugs-submit-bounces@debbugs.gnu.org Fri May 14 09:51:23 2021
From: Ludovic Courtès
To: Timothy Sample
Subject: Re: bug#48114: Disarchive occasionally fails tests
References: <87v984gkhn.fsf@inria.fr> <87pmybeen3.fsf@ngyro.com> <874kfk6h8o.fsf@gnu.org> <87a6pceerf.fsf@ngyro.com> <8735v4ea7y.fsf@ngyro.com> <87lf8iqrad.fsf@gnu.org> <87sg2qko9s.fsf@ngyro.com>
Date: Fri, 14 May 2021 15:51:10 +0200
In-Reply-To: <87sg2qko9s.fsf@ngyro.com> (Timothy Sample's message of "Thu, 13 May 2021 23:06:07 -0400")
Message-ID: <87o8ddmnjl.fsf@gnu.org>
Cc: 48114-done@debbugs.gnu.org

Hi Timothy,

Timothy Sample wrote:

> Ludovic Courtès writes:
>
>> Timothy Sample wrote:
>>
>>> In Disarchive, I can rewrite the generator without regex.  I’ll do that
>>> and see what I can do about the “Gave up!” issue.
>>
>> Did you have a chance to look into it?
>
> I just pushed b9f0e78238e6186d28d738c7c5355a56557ce84f, which updates
> Disarchive to 0.2.1, which has fixes for the test suite.  The giving up
> problem has not been solved outright, but it should be practically
> impossible to trigger.  (In fact, it probably *is* impossible to trigger
> given how few PRNG states there are....)

Yay!  Thanks for the quick reply!

I’ll have ‘guix’ depend on Disarchive and report back.

Ludo’.

From unknown Sat Aug 09 14:05:02 2025
Received: (at fakecontrol) by fakecontrolmessage;
To: internal_control@debbugs.gnu.org
From: Debbugs Internal Request
Subject: Internal Control
Message-Id: bug archived.
Date: Sat, 12 Jun 2021 11:24:03 +0000
User-Agent: Fakemail v42.6.9

# This is a fake control message.
#
# The action:
# bug archived.
thanks
# This fakemail brought to you by your local debbugs
# administrator
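
The offset mismatch discussed in this thread can be illustrated outside Guile. The following is a minimal Python sketch, not code from Guile or Disarchive. It assumes Unicode NFKC normalization as a stand-in for the conversion Guile performs under “LC_ALL=C”; the two mechanisms differ, although both turn U+2492 into “11.”. The names `original`, `converted`, and `offsets_fit` are hypothetical.

```python
import re
import unicodedata

# U+2492 NUMBER ELEVEN FULL STOP: a string of exactly one code point.
original = "\u2492"

# Assumption: NFKC normalization stands in for the conversion to an
# ASCII C string described in the thread. It expands the single code
# point into the three ASCII characters "11.".
converted = unicodedata.normalize("NFKC", original)

# The regex runs against the converted string, so the offsets it
# reports are offsets into `converted`.
match = re.match(r"[0-8]+", converted)
start, end = match.span()

# Reusing those offsets with the original string is the mismatch:
# the end offset may exceed the original string's length.
offsets_fit = end <= len(original)

print(repr(converted), (start, end), len(original), offsets_fit)
```

In this sketch the match ends at offset 2 while the original string has length 1, which corresponds to the out-of-range condition reported above. Matching against the original string, or avoiding the regex as Tim proposes for the Disarchive generator, sidesteps the problem.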