GNU bug report logs -
#52338
Crawler bots are downloading substitutes
Previous Next
Reported by: Leo Famulari <leo <at> famulari.name>
Date: Mon, 6 Dec 2021 21:22:01 UTC
Severity: normal
Done: Mathieu Othacehe <othacehe <at> gnu.org>
Bug is archived. No further changes may be made.
Full log
View this message in rfc822 format
Hi Leo,
Leo Famulari <leo <at> famulari.name> writes:
> I noticed that some bots are downloading substitutes from
> ci.guix.gnu.org.
>
> We should add a robots.txt file to reduce this waste.
>
> Specifically, I see bots from Bing and Semrush:
>
> https://www.bing.com/bingbot.htm
> https://www.semrush.com/bot.html
For what it's worth: during the years that I administered Hydra, I found
that many bots disregarded the robots.txt file that was in place there.
In practice, I found that I needed to periodically scan the access logs
for bots and forcefully block their requests in order to keep Hydra from
becoming overloaded with expensive queries from bots.
Regards,
Mark
This bug report was last modified 3 years and 210 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.