From: Leo Famulari
To: bug-guix@gnu.org
Subject: Crawler bots are downloading substitutes
Date: Mon, 6 Dec 2021 16:20:55 -0500

I noticed that some bots are downloading substitutes from
ci.guix.gnu.org.

We should add a robots.txt file to reduce this waste.

Specifically, I see bots from Bing and Semrush:

https://www.bing.com/bingbot.htm
https://www.semrush.com/bot.html

From: Leo Famulari
To: 52338@debbugs.gnu.org
Subject: [maintenance] hydra: berlin: Create robots.txt.
Date: Mon, 6 Dec 2021 17:18:10 -0500

I tested that `guix system build` does succeed with this change, but I
would like a review on whether the resulting Nginx configuration is
correct, and if this is the correct path to disallow.

It generates an Nginx location block like this:

------
      location /robots.txt {
        add_header Content-Type text/plain;
        return 200 "User-agent: *
Disallow: /nar/
";
      }
------

* hydra/nginx/berlin.scm (berlin-locations): Add a robots.txt Nginx
location.
---
 hydra/nginx/berlin.scm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 1f4b0be..3bb2129 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -174,7 +174,14 @@ PUBLISH-URL."
     (nginx-location-configuration
      (uri "/berlin.guixsd.org-export.pub")
      (body
-      (list "root /var/www/guix;"))))))
+      (list "root /var/www/guix;")))
+
+    (nginx-location-configuration
+      (uri "/robots.txt")
+      (body
+        (list
+          "add_header Content-Type text/plain;"
+          "return 200 \"User-agent: *\nDisallow: /nar/\n\";"))))))
 
 (define guix.gnu.org-redirect-locations
   (list
-- 
2.34.0

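The rule in the patch above can be checked offline before it is
deployed. The following is a minimal sketch in Python and is not part
of the patch: it feeds the same two-line robots.txt body to the
standard library's urllib.robotparser and asks whether a crawler may
fetch a given path. The helper name `allowed`, the default agent
string, and the sample paths are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Same body that the proposed Nginx "return 200" directive would send.
ROBOTS_TXT = "User-agent: *\nDisallow: /nar/\n"

# Host taken from the bug report; only the path matters to the parser.
BASE_URL = "https://ci.guix.gnu.org"


def allowed(path: str, agent: str = "bingbot") -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines rather than one string.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, BASE_URL + path)


if __name__ == "__main__":
    for sample in ("/nar/gzip/example", "/jobset/guix"):
        print(sample, allowed(sample))
```

This only shows how one well-behaved parser reads the file. It says
nothing about crawlers that ignore robots.txt or interpret it
differently.
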
From: Mathieu Othacehe
To: Leo Famulari
Subject: Re: bug#52338: Crawler bots are downloading substitutes
Date: Thu, 09 Dec 2021 14:27:38 +0100

Hello Leo,

> +    (nginx-location-configuration
> +      (uri "/robots.txt")
> +      (body
> +        (list
> +          "add_header Content-Type text/plain;"
> +          "return 200 \"User-agent: *\nDisallow: /nar/\n\";"))))))

Nice, the bots are also accessing the Cuirass web interface, do you
think it would be possible to extend this snippet to prevent it?

Thanks,

Mathieu

From: Tobias Geerinckx-Rice
To: Mathieu Othacehe
Subject: Re: bug#52338: Crawler bots are downloading substitutes
Date: Thu, 09 Dec 2021 16:42:24 +0100

Mathieu Othacehe writes:
> Hello Leo,
>
>> +    (nginx-location-configuration
>> +      (uri "/robots.txt")

It's a micro-optimisation, but it can't hurt to generate
‘location = /robots.txt’ instead of ‘location /robots.txt’ here.

>> +      (body
>> +        (list
>> +          "add_header Content-Type text/plain;"
>> +          "return 200 \"User-agent: *\nDisallow: /nar/\n\";"))))))

Use \r\n instead of \n, even if \n happens to work.

There are many ‘buggy’ crawlers out there.  It's in their own
interest to be fussy whilst claiming to respect robots.txt.  The
less you deviate from the most basic norm imaginable, the better.

I tested whether embedding raw \r\n bytes in nginx.conf strings
like this works, and it seems to, even though a human would
probably not do so.

> Nice, the bots are also accessing the Cuirass web interface, do
> you think it would be possible to extend this snippet to prevent it?

You can replace ‘/nar/’ with ‘/’ to disallow everything:

  Disallow: /

If we want crawlers to index only the front page (so people can
search for ‘Guix CI’, I guess), that's possible:

  Disallow: /
  Allow: /$

Don't confuse ‘$’ with ‘supports regexps’.  Buggy bots might fall
back to ‘Disallow: /’.

This is where it gets ugly: nginx doesn't support escaping ‘$’ in
strings.  At all.  It's insane.

  geo $dollar { default "$"; } # stackoverflow.com/questions/57466554

  server {
    location = /robots.txt {
      return 200
      "User-agent: *\r\nDisallow: /\r\nAllow: /$dollar\r\n";
    }
  }

*Obviously.*

An alternative to that is to serve a real on-disc robots.txt.

Kind regards,

T G-R

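To illustrate how a crawler that understands the ‘$’ end anchor would
read ‘Disallow: /’ together with ‘Allow: /$’, here is a small Python
sketch. It assumes the longest-match rule described in RFC 9309 (the
most specific matching rule wins, and Allow wins a tie). The function
names and sample paths are hypothetical, and real crawlers may behave
differently, which is the point of the warning about buggy bots above.

```python
import re

# Body suggested in the message above, with CRLF line endings.
ROBOTS_TXT = "User-agent: *\r\nDisallow: /\r\nAllow: /$\r\n"

# The same rules as (allow, pattern) pairs, for the matcher below.
RULES = [(False, "/"), (True, "/$")]


def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Only two special characters are handled: '*' matches any run of
    characters, and a trailing '$' anchors the pattern at the end of
    the path.  Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Apply the longest-match rule; with no matching rule, allow."""
    best = None  # (pattern length, allow flag) of the best match so far
    for allow, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing in this sketch
        if rule_to_regex(pattern).match(path):
            candidate = (len(pattern), allow)
            # Tuple comparison: longer pattern first, then Allow (True)
            # beats Disallow (False) when the lengths are equal.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for sample in ("/", "/nar/gzip/example", "/jobset/guix"):
        print(sample, is_allowed(RULES, sample))
```

Under these assumptions only the front page stays fetchable. A crawler
that does not implement ‘$’ would see a literal path that never
matches and fall back to ‘Disallow: /’.
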
From: Leo Famulari
To: Tobias Geerinckx-Rice
Subject: Re: bug#52338: Crawler bots are downloading substitutes
Date: Fri, 10 Dec 2021 11:22:15 -0500

On Thu, Dec 09, 2021 at 04:42:24PM +0100, Tobias Geerinckx-Rice wrote:
[...]
> An alternative to that is to serve a real on-disc robots.txt.

Alright, I leave it up to you. I just want to prevent bots from
downloading substitutes. I don't really have opinions about any of the
details.

From: Tobias Geerinckx-Rice
To: Leo Famulari
Subject: Re: bug#52338: Crawler bots are downloading substitutes
Date: Fri, 10 Dec 2021 17:47:09 +0100

Leo Famulari writes:
> Alright, I leave it up to you.

Dammit.

Kind regards,

T G-R

From: Mark H Weaver
To: Leo Famulari, 52338@debbugs.gnu.org
Subject: Re: bug#52338: Crawler bots are downloading substitutes
Date: Fri, 10 Dec 2021 16:21:11 -0500

Hi Leo,

Leo Famulari writes:

> I noticed that some bots are downloading substitutes from
> ci.guix.gnu.org.
>
> We should add a robots.txt file to reduce this waste.
>
> Specifically, I see bots from Bing and Semrush:
>
> https://www.bing.com/bingbot.htm
> https://www.semrush.com/bot.html

For what it's worth: during the years that I administered Hydra, I found
that many bots disregarded the robots.txt file that was in place there.
In practice, I found that I needed to periodically scan the access logs
for bots and forcefully block their requests in order to keep Hydra from
becoming overloaded with expensive queries from bots.

     Regards,
       Mark

From: Tobias Geerinckx-Rice
To: Mark H Weaver
Subject: Re: bug#52338: Crawler bots are downloading substitutes
Date: Fri, 10 Dec 2021 23:52:51 +0100

All,

Mark H Weaver writes:
> For what it's worth: during the years that I administered Hydra,
> I found that many bots disregarded the robots.txt file that was in
> place there.  In practice, I found that I needed to periodically
> scan the access logs for bots and forcefully block their requests
> in order to keep Hydra from becoming overloaded with expensive
> queries from bots.

Very good point.

IME (which is a few years old at this point) at least the
highlighted BingBot & SemrushThing always respected my robots.txt,
but it's definitely a concern.  I'll leave this bug open to remind
us of that in a few weeks or so…

If it does become a problem, we (I) might add some basic
User-Agent sniffing to either slow down or outright block
non-Guile downloaders.  Whitelisting any legitimate ones, of
course.  I think that's less hassle than dealing with dynamic IP
blocks whilst being equally effective here.

Thanks (again) for taking care of Hydra, Mark, and thank you Leo
for keeping an eye on Cuirass :-)

T G-R

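The log scanning Mark describes and the User-Agent check floated in
the reply could start from something like the Python sketch below. It
assumes access logs in the common "combined" format and treats any
User-Agent beginning with "GNU Guile" as a legitimate substitute
client. Both are assumptions made for illustration, not facts about
the berlin setup, and the names used are hypothetical.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of a
# "combined" format log line (an assumed format, adjust as needed).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical allowlist: agents assumed to be real substitute clients.
ALLOWED_AGENT_PREFIXES = ("GNU Guile",)


def suspicious_agents(lines, prefix="/nar/"):
    """Count requests under `prefix` made by agents not on the allowlist."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a GET/HEAD line in the expected format
        if not match["path"].startswith(prefix):
            continue  # only substitute downloads are of interest here
        agent = match["agent"]
        if agent.startswith(ALLOWED_AGENT_PREFIXES):
            continue
        counts[agent] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Dec/2021:16:21:11 -0500] '
        '"GET /nar/gzip/example HTTP/1.1" 200 1234 "-" '
        '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
        '198.51.100.2 - - [10/Dec/2021:16:21:12 -0500] '
        '"GET /nar/gzip/other HTTP/1.1" 200 99 "-" "GNU Guile"',
    ]
    for agent, hits in suspicious_agents(sample).most_common():
        print(hits, agent)
```

A tally like this only identifies candidates. Whether to slow them
down or block them, and how, is left open in the thread.
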
q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=i/0plOltce3Giaa/yhbMPtE1jH5LRPqIuQtTw4/Jf6I=; b=aOHe+na0waB5QMz5DQMQ Lnt/ohEidcAr4vFTrOi8Pq9qCW4b1H05xLROVWgtKsMHUJET3EM4KYP5WMYjaFhmrskXGPKAEDxMU mQRnxU1j1GWevX9Bamc/rNp2sG+J+9LyG4wmrfMo456y0HZar5fnhMGsqF5sAFL7NMLoTzcIhplpL l7nU4QX5xkIQR7PlTJv6cfcOvCCxXUA73WudmHlkYGrnYvDhYM5xWkjnnVouDwoWl3vZp/UOQe37w E5FGzRzwMoCoHNpmGNkdPuCsXgkaJgYKRsi20adaLl2E4qCFS3AwLlerx6JKW+pRyFwd5TIlKACcD thWUySV3tCGMHw==; Received: from [2a01:e0a:19b:d9a0:2ddb:d3d2:32e8:d31a] (port=33142 helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvyxj-0001P9-2Q; Sat, 11 Dec 2021 04:46:39 -0500 From: Mathieu Othacehe To: Tobias Geerinckx-Rice Subject: Re: bug#52338: Crawler bots are downloading substitutes References: <2f52f6b48db55f8a79b07dbb242b297ab49d6083.1638828946.git.leo@famulari.name> <87tufh6h85.fsf_-_@gnu.org> <87sfv1ivl2.fsf@nckx> <87ilvw4db2.fsf@nckx> Date: Sat, 11 Dec 2021 10:46:37 +0100 In-Reply-To: <87ilvw4db2.fsf@nckx> (Tobias Geerinckx-Rice's message of "Fri, 10 Dec 2021 17:47:09 +0100") Message-ID: <87sfuzwk1u.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: submit Cc: 52338@debbugs.gnu.org, bug-guix@gnu.org, Leo Famulari X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) Hey, The Cuirass web interface logs were quite silent this morning and I suspected an issue somewhere. I then realized that you did update the Nginx conf and the bots were no longer knocking at our door, which is great! 
Thanks to both of you, Mathieu From debbugs-submit-bounces@debbugs.gnu.org Sun Dec 19 11:53:36 2021 Received: (at 52338-done) by debbugs.gnu.org; 19 Dec 2021 16:53:36 +0000 Received: from localhost ([127.0.0.1]:47858 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1myzRI-0004CB-D3 for submit@debbugs.gnu.org; Sun, 19 Dec 2021 11:53:36 -0500 Received: from eggs.gnu.org ([209.51.188.92]:59788) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1myzRG-0004By-Jn for 52338-done@debbugs.gnu.org; Sun, 19 Dec 2021 11:53:35 -0500 Received: from [2001:470:142:3::e] (port=38938 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myzRB-0003fJ-Ae for 52338-done@debbugs.gnu.org; Sun, 19 Dec 2021 11:53:29 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=y8fOk48Lrh5p4f3NyL0ePJt+pdOb8IEzTwYV1VIxatY=; b=Xh+j+PmPHPFS6QNFsSuA AoRWDHlOlRYMvNJQUEl2YCHX7Gry5U3gFUcQ03jfs3Fb6znX/1V9CHlJ/4xVfySMSFNk7WVRfDVIJ V6bWXKf5fy+KyJuqJQKosSc+Ed/OljDzXXf0BbT+sd9lL24hx+WtSwOzDGFFQOw5kLgwtaYyCciJb 3HosttAEhDXMoyxWAoISYoNLI/1F/vQqJEi74KK9sj2mmhP0EyUbj5k3RyrFXPVmTIVNoVDdOWtAX 7ZyECahDxtO0wrejPvu5YHl+8O7IFPM/02kr9p2fBTCGn7R2UJ0rSu23LfdpVA4ih+ldR6l+Czcj+ lUffGriiPvkJkg==; Received: from [2a01:e0a:19b:d9a0:45b5:a14a:5c75:5737] (port=54170 helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1myzRB-00030F-7O for 52338-done@debbugs.gnu.org; Sun, 19 Dec 2021 11:53:29 -0500 From: Mathieu Othacehe To: 52338-done@debbugs.gnu.org Subject: Re: bug#52338: Crawler bots are downloading substitutes References: <2f52f6b48db55f8a79b07dbb242b297ab49d6083.1638828946.git.leo@famulari.name> <87tufh6h85.fsf_-_@gnu.org> <87sfv1ivl2.fsf@nckx> <87ilvw4db2.fsf@nckx> <87sfuzwk1u.fsf@gnu.org> Date: Sun, 19 Dec 2021 17:53:27 
+0100 In-Reply-To: <87sfuzwk1u.fsf@gnu.org> (Mathieu Othacehe's message of "Sat, 11 Dec 2021 10:46:37 +0100") Message-ID: <87wnk0pmd4.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-Debbugs-Envelope-To: 52338-done X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) > Thanks to both of you, And closing! Mathieu From unknown Fri Aug 15 20:55:04 2025 Received: (at fakecontrol) by fakecontrolmessage; To: internal_control@debbugs.gnu.org From: Debbugs Internal Request Subject: Internal Control Message-Id: bug archived. Date: Mon, 17 Jan 2022 12:24:07 +0000 User-Agent: Fakemail v42.6.9 # This is a fake control message. # # The action: # bug archived. thanks # This fakemail brought to you by your local debbugs # administrator