From: Leo Famulari
Date: Mon, 6 Dec 2021 16:20:55 -0500
Subject: bug#52338: Crawler bots are downloading substitutes

I noticed that some bots are downloading substitutes from
ci.guix.gnu.org.

We should add a robots.txt file to reduce this waste.

Specifically, I see bots from Bing and Semrush:

https://www.bing.com/bingbot.htm
https://www.semrush.com/bot.html

From: Leo Famulari
Date: Mon, 6 Dec 2021 17:18:10 -0500
Subject: bug#52338: [maintenance] hydra: berlin: Create robots.txt.

I tested that `guix system build` does succeed with this change, but I
would like a review on whether the resulting Nginx configuration is
correct, and whether this is the correct path to disallow.

It generates an Nginx location block like this:

------
location /robots.txt {
    add_header Content-Type text/plain;
    return 200 "User-agent: *
Disallow: /nar/
";
}
------

* hydra/nginx/berlin.scm (berlin-locations): Add a robots.txt Nginx
location.
---
 hydra/nginx/berlin.scm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hydra/nginx/berlin.scm b/hydra/nginx/berlin.scm
index 1f4b0be..3bb2129 100644
--- a/hydra/nginx/berlin.scm
+++ b/hydra/nginx/berlin.scm
@@ -174,7 +174,14 @@ PUBLISH-URL."
     (nginx-location-configuration
      (uri "/berlin.guixsd.org-export.pub")
      (body
-      (list "root /var/www/guix;"))))))
+      (list "root /var/www/guix;")))
+
+    (nginx-location-configuration
+     (uri "/robots.txt")
+     (body
+      (list
+       "add_header Content-Type text/plain;"
+       "return 200 \"User-agent: *\nDisallow: /nar/\n\";"))))))
 
 (define guix.gnu.org-redirect-locations
   (list
-- 
2.34.0

From: Mathieu Othacehe
Date: Thu, 09 Dec 2021 14:27:38 +0100
Subject: bug#52338: Crawler bots are downloading substitutes

Hello Leo,

> +    (nginx-location-configuration
> +     (uri "/robots.txt")
> +     (body
> +      (list
> +       "add_header Content-Type text/plain;"
> +       "return 200 \"User-agent: *\nDisallow: /nar/\n\";"))))))

Nice, the bots are also accessing the Cuirass web interface, do you
think it would be possible to extend this snippet to prevent it?

Thanks,

Mathieu

From: Tobias Geerinckx-Rice
Date: Thu, 09 Dec 2021 16:42:24 +0100
Subject: bug#52338: Crawler bots are downloading substitutes

Mathieu Othacehe wrote:

> Hello Leo,
>
>> +    (nginx-location-configuration
>> +     (uri "/robots.txt")

It's a micro-optimisation, but it can't hurt to generate
‘location = /robots.txt’ instead of ‘location /robots.txt’ here.

>> +     (body
>> +      (list
>> +       "add_header Content-Type text/plain;"
>> +       "return 200 \"User-agent: *\nDisallow: /nar/\n\";"))))))

Use \r\n instead of \n, even if \n happens to work.

There are many ‘buggy’ crawlers out there. It's in their own
interest to be fussy whilst claiming to respect robots.txt. The
less you deviate from the most basic norm imaginable, the better.

I tested whether embedding raw \r\n bytes in nginx.conf strings
like this works, and it seems to, even though a human would
probably not do so.

> Nice, the bots are also accessing the Cuirass web interface, do you
> think it would be possible to extend this snippet to prevent it?

You can replace ‘/nar/’ with ‘/’ to disallow everything:

  Disallow: /

If we want crawlers to index only the front page (so people can
search for ‘Guix CI’, I guess), that's possible:

  Disallow: /
  Allow: /$

Don't confuse ‘$’ with ‘supports regexps’. Buggy bots might fall
back to ‘Disallow: /’.

This is where it gets ugly: nginx doesn't support escaping ‘$’ in
strings. At all. It's insane.

geo $dollar {
    default "$";
}  # stackoverflow.com/questions/57466554

server {
    location = /robots.txt {
        return 200 "User-agent: *\r\nDisallow: /\r\nAllow: /$dollar\r\n";
    }
}

*Obviously.*

An alternative to that is to serve a real on-disc robots.txt.

Kind regards,

T G-R
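
For reference, here is roughly how Tobias's first two suggestions (an exact-match location and CRLF line endings) would look when applied to the location block from the patch, written as raw nginx configuration. This is a sketch, not the committed change. Two details are assumptions: it sets the type with `default_type text/plain` instead of `add_header Content-Type`, on the understanding that `add_header` would add a second Content-Type header next to the one nginx already sends, and the root directory in the commented-out on-disk alternative is hypothetical.

```nginx
# Sketch only. The "=" modifier makes this an exact-match location:
# when the request URI matches it exactly, nginx stops searching.
location = /robots.txt {
    # Assumption: set the response type via default_type rather than
    # add_header, to avoid sending two Content-Type headers.
    default_type text/plain;
    # nginx expands \r and \n inside quoted strings, so the body is
    # served with CRLF line endings.
    return 200 "User-agent: *\r\nDisallow: /nar/\r\n";
}

# Alternative: serve a real robots.txt file from disk instead.
# The root path is hypothetical. Enable only one of the two blocks,
# since nginx refuses to start with duplicate locations.
#location = /robots.txt {
#    root /srv/www/ci.guix.gnu.org;
#}
```

Replacing `/nar/` with `/` in the returned text would also keep well-behaved crawlers out of the Cuirass web interface, as Mathieu asked.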

From: Leo Famulari
Date: Fri, 10 Dec 2021 11:22:15 -0500
Subject: bug#52338: Crawler bots are downloading substitutes

On Thu, Dec 09, 2021 at 04:42:24PM +0100, Tobias Geerinckx-Rice wrote:
[...]
> An alternative to that is to serve a real on-disc robots.txt.

Alright, I leave it up to you. I just want to prevent bots from
downloading substitutes. I don't really have opinions about any of the
details.

From: Tobias Geerinckx-Rice
Date: Fri, 10 Dec 2021 17:47:09 +0100
Subject: bug#52338: Crawler bots are downloading substitutes

Leo Famulari wrote:

> Alright, I leave it up to you.

Dammit.

Kind regards,

T G-R

From: Mark H Weaver
Date: Fri, 10 Dec 2021 16:21:11 -0500
Subject: bug#52338: Crawler bots are downloading substitutes

Hi Leo,

Leo Famulari writes:

> I noticed that some bots are downloading substitutes from
> ci.guix.gnu.org.
>
> We should add a robots.txt file to reduce this waste.
>
> Specifically, I see bots from Bing and Semrush:
>
> https://www.bing.com/bingbot.htm
> https://www.semrush.com/bot.html

For what it's worth: during the years that I administered Hydra, I found
that many bots disregarded the robots.txt file that was in place there.
In practice, I found that I needed to periodically scan the access logs
for bots and forcefully block their requests in order to keep Hydra from
becoming overloaded with expensive queries from bots.

Regards,
  Mark
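
As a rough illustration of the forceful blocking Mark describes (and of the User-Agent sniffing Tobias proposes in reply), nginx's `map` directive can flag requests by User-Agent so they are refused before reaching the substitute server. This is a hypothetical sketch: the `$blocked_crawler` variable, the two patterns (taken from the bots named in the original report) and the 403 response are illustrative choices, and the existing `/nar/` handling on ci.guix.gnu.org is not shown. Like robots.txt, it only stops clients that send an honest User-Agent.

```nginx
# http {} context: map each request's User-Agent to a flag.
# "~*" marks a case-insensitive regular expression; the list is
# illustrative only.
map $http_user_agent $blocked_crawler {
    default        0;
    ~*bingbot      1;
    ~*semrushbot   1;
}

server {
    # ... existing server configuration ...

    location /nar/ {
        # "if" treats an empty string or "0" as false, so only the
        # listed crawlers are refused; other clients pass through.
        if ($blocked_crawler) {
            return 403;
        }
        # ... existing /nar/ handling (e.g. proxy_pass) goes here ...
    }
}
```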
DTIK+2NrtlTtnKG61bDECPRgIRatwjJfCrIia5qaC2MAtYS2el+WJnbmFaY3jqrH2YaGEy 9g9Y8VfEJbhbzqbkfoFfPLJx/+dC/4mHjiGu+mrH32tbH+mRyB9uDYEUiobMgnWhkiMGOI ZYUpItvi1/t9LnCOqXZUkztA== Received: by submission.tobias.gr (OpenSMTPD) with ESMTPSA id 7e0f5e2d (TLSv1.3:AEAD-AES256-GCM-SHA384:256:NO); Fri, 10 Dec 2021 23:03:19 +0000 (UTC) References: <87r1ak2m1p.fsf@netris.org> From: Tobias Geerinckx-Rice Date: Fri, 10 Dec 2021 23:52:51 +0100 In-reply-to: <87r1ak2m1p.fsf@netris.org> BIMI-Selector: v=BIMI1; s=default; Message-ID: <875yrw3vvk.fsf@nckx> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" X-Host-Lookup-Failed: Reverse DNS lookup failed for 2a02:c205:2020:6054::1 (failed) Received-SPF: pass client-ip=2a02:c205:2020:6054::1; envelope-from=me@tobias.gr; helo=tobias.gr X-Spam_score_int: -12 X-Spam_score: -1.3 X-Spam_bar: - X-Spam_report: (-1.3 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RDNS_NONE=0.793, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: debbugs-submit-bounces@debbugs.gnu.org Sender: "Debbugs-submit" X-Spam-Score: -3.3 (---) --=-=-= Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable All, Mark H Weaver =E5=86=99=E9=81=93=EF=BC=9A > For what it's worth: during the years that I administered Hydra,=20 > I found > that many bots disregarded the robots.txt file that was in place=20 > there. > In practice, I found that I needed to periodically scan the=20 > access logs > for bots and forcefully block their requests in order to keep=20 > Hydra from > becoming overloaded with expensive queries from bots. Very good point. 
IME (which is a few years old at this point) at least the=20 highlighted BingBot & SemrushThing always respected my robots.txt,=20 but it's definitely a concern. I'll leave this bug open to remind=20 us of that in a few weeks or so=E2=80=A6 If it does become a problem, we (I) might add some basic=20 User-Agent sniffing to either slow down or outright block=20 non-Guile downloaders. Whitelisting any legitimate ones, of=20 course. I think that's less hassle than dealing with dynamic IP=20 blocks whilst being equally effective here. Thanks (again) for taking care of Hydra, Mark, and thank you Leo=20 for keeping an eye on Cuirass :-) T G-R --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iIMEARYKACsWIQT12iAyS4c9C3o4dnINsP+IT1VteQUCYbPc3w0cbWVAdG9iaWFz LmdyAAoJEA2w/4hPVW15Ky0BANhwI9BhRdXrGDsJPEblJvGMpSEWysyED3p7TZVU cF87AQDpw2NAebc3S4G2nEoAhKIoYZWLyyjW6G6HXQVib5WtAA== =bLY/ -----END PGP SIGNATURE----- --=-=-=-- From unknown Fri Aug 15 20:53:39 2025 X-Loop: help-debbugs@gnu.org Subject: bug#52338: Crawler bots are downloading substitutes Resent-From: Mathieu Othacehe Original-Sender: "Debbugs-submit" Resent-CC: bug-guix@gnu.org Resent-Date: Sat, 11 Dec 2021 09:47:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 52338 X-GNU-PR-Package: guix X-GNU-PR-Keywords: To: Tobias Geerinckx-Rice Cc: 52338@debbugs.gnu.org, leo@famulari.name X-Debbugs-Original-Cc: 52338@debbugs.gnu.org, bug-guix@gnu.org, Leo Famulari Received: via spool by submit@debbugs.gnu.org id=B.163921600122389 (code B ref -1); Sat, 11 Dec 2021 09:47:01 +0000 Received: (at submit) by debbugs.gnu.org; 11 Dec 2021 09:46:41 +0000 Received: from localhost ([127.0.0.1]:48557 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1mvyxl-0005p3-Fh for submit@debbugs.gnu.org; Sat, 11 Dec 2021 04:46:41 -0500 Received: from lists.gnu.org ([209.51.188.17]:47264) by debbugs.gnu.org with esmtp (Exim 
4.84_2) (envelope-from ) id 1mvyxj-0005ov-L9 for submit@debbugs.gnu.org; Sat, 11 Dec 2021 04:46:40 -0500 Received: from eggs.gnu.org ([209.51.188.92]:57154) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvyxj-0002NQ-Dx for bug-guix@gnu.org; Sat, 11 Dec 2021 04:46:39 -0500 Received: from [2001:470:142:3::e] (port=54808 helo=fencepost.gnu.org) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvyxj-0000sC-4t; Sat, 11 Dec 2021 04:46:39 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gnu.org; s=fencepost-gnu-org; h=MIME-Version:In-Reply-To:Date:References:Subject:To: From; bh=i/0plOltce3Giaa/yhbMPtE1jH5LRPqIuQtTw4/Jf6I=; b=aOHe+na0waB5QMz5DQMQ Lnt/ohEidcAr4vFTrOi8Pq9qCW4b1H05xLROVWgtKsMHUJET3EM4KYP5WMYjaFhmrskXGPKAEDxMU mQRnxU1j1GWevX9Bamc/rNp2sG+J+9LyG4wmrfMo456y0HZar5fnhMGsqF5sAFL7NMLoTzcIhplpL l7nU4QX5xkIQR7PlTJv6cfcOvCCxXUA73WudmHlkYGrnYvDhYM5xWkjnnVouDwoWl3vZp/UOQe37w E5FGzRzwMoCoHNpmGNkdPuCsXgkaJgYKRsi20adaLl2E4qCFS3AwLlerx6JKW+pRyFwd5TIlKACcD thWUySV3tCGMHw==; Received: from [2a01:e0a:19b:d9a0:2ddb:d3d2:32e8:d31a] (port=33142 helo=meije) by fencepost.gnu.org with esmtpsa (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvyxj-0001P9-2Q; Sat, 11 Dec 2021 04:46:39 -0500 From: Mathieu Othacehe References: <2f52f6b48db55f8a79b07dbb242b297ab49d6083.1638828946.git.leo@famulari.name> <87tufh6h85.fsf_-_@gnu.org> <87sfv1ivl2.fsf@nckx> <87ilvw4db2.fsf@nckx> Date: Sat, 11 Dec 2021 10:46:37 +0100 In-Reply-To: <87ilvw4db2.fsf@nckx> (Tobias Geerinckx-Rice's message of "Fri, 10 Dec 2021 17:47:09 +0100") Message-ID: <87sfuzwk1u.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: -2.3 (--) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: 

Hey,

The Cuirass web interface logs were quite silent this morning and I
suspected an issue somewhere. I then realized that you had updated the
Nginx conf and the bots were no longer knocking at our door, which is
great!

Thanks to both of you,

Mathieu

From unknown Fri Aug 15 20:53:39 2025
MIME-Version: 1.0
From: help-debbugs@gnu.org (GNU bug Tracking System)
To: Leo Famulari
Subject: bug#52338: closed (Re: bug#52338: Crawler bots are downloading substitutes)
References: <87wnk0pmd4.fsf@gnu.org>
Reply-To: 52338@debbugs.gnu.org
Date: Sun, 19 Dec 2021 16:54:02 +0000
Content-Type: multipart/mixed; boundary="----------=_1639932842-16171-1"

This is a multi-part message in MIME format...

------------=_1639932842-16171-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Your bug report

#52338: Crawler bots are downloading substitutes

which was filed against the guix package, has been closed.

The explanation is attached below, along with your original report.
If you require more details, please reply to 52338@debbugs.gnu.org.
-- 
52338: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=52338
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems

------------=_1639932842-16171-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

From: Mathieu Othacehe
To: 52338-done@debbugs.gnu.org
Subject: Re: bug#52338: Crawler bots are downloading substitutes
References:
<2f52f6b48db55f8a79b07dbb242b297ab49d6083.1638828946.git.leo@famulari.name> <87tufh6h85.fsf_-_@gnu.org> <87sfv1ivl2.fsf@nckx> <87ilvw4db2.fsf@nckx> <87sfuzwk1u.fsf@gnu.org>
Date: Sun, 19 Dec 2021 17:53:27 +0100
In-Reply-To: <87sfuzwk1u.fsf@gnu.org> (Mathieu Othacehe's message of "Sat, 11 Dec 2021 10:46:37 +0100")
Message-ID: <87wnk0pmd4.fsf@gnu.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

> Thanks to both of you,

And closing!

Mathieu

------------=_1639932842-16171-1
Content-Type: message/rfc822
Content-Disposition: inline
Content-Transfer-Encoding: 7bit

Date: Mon, 6 Dec 2021 16:20:55 -0500
From: Leo Famulari
To: bug-guix@gnu.org
Subject: Crawler bots are downloading substitutes
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

I noticed that some bots are downloading substitutes from
ci.guix.gnu.org.

We should add a robots.txt file to reduce this waste.

Specifically, I see bots from Bing and Semrush:

https://www.bing.com/bingbot.htm
https://www.semrush.com/bot.html

------------=_1639932842-16171-1--
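The robots.txt proposal that opened this report can be sanity-checked offline with Python's standard `urllib.robotparser` module. The rules below are a hypothetical example rather than the file deployed on ci.guix.gnu.org, and `/nar/` is assumed here to be the prefix under which substitute archives are served. As Mark noted, crawlers are free to ignore these rules; the check only shows how a compliant crawler would interpret them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a substitute server: ask every crawler to skip
# the /nar/ prefix (assumed to hold substitute archives) but allow the rest.
ROBOTS_TXT = """\
User-agent: *
Disallow: /nar/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent in ("bingbot", "SemrushBot"):
        for url in ("https://ci.guix.gnu.org/",
                    "https://ci.guix.gnu.org/nar/lzip/example"):
            # can_fetch() reports whether a compliant crawler may request url.
            print(agent, url, rules.can_fetch(agent, url))
```

`urllib.robotparser` matches rules as plain path prefixes and does not implement the `*` and `$` wildcards that some crawlers honour, so this sketch deliberately sticks to a single prefix rule.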