GNU bug report logs - #43073
Trim/hide full email headers on debbugs


Package: debbugs.gnu.org

Reported by: Ruben Rodriguez <ruben <at> fsf.org>

Date: Thu, 27 Aug 2020 18:04:02 UTC

Severity: normal


From: Bob Proulx <bob <at> proulx.com>
To: 43073 <at> debbugs.gnu.org
Subject: bug#43073: Trim/hide full email headers on debbugs
Date: Fri, 4 Dec 2020 19:49:49 -0700

Glenn Morris wrote:
> Note that the "db view tree" is the part that gets indexed by search
> engines. Search engines are (obviously) denied from the cgi bug pages,
> for reasons of system load. So if you get rid of the db pages, it will
> be impossible to search debbugs reports using standard web search
> engines.

I know I asked this in the mailing list but I am going to repeat it
here so it is in the ticket and then add some more.

Where is the seed that the search engines start with to crawl the db
tree?  I couldn't find it.

Meanwhile...  I find this difference between the systems.

    https://debbugs.gnu.org/robots.txt
      User-agent: *
      Disallow: /cgi-bin/
      Disallow: /cgi/

As you say, debbugs blocks the robots that comply from the cgi pages.

    https://bugs.debian.org/robots.txt
      User-Agent: Googlebot
      User-Agent: bingbot
      User-Agent: yandexbot
      User-Agent: baiduspider
      User-Agent: ia_archiver
      Allow: /cgi-bin/bugreport.cgi?bug=
      Allow: /cgi-bin/pkgreport.cgi?pkg=*;dist=unstable$
      Disallow: /*/
      User-agent: *
      Disallow: /

But upstream allows the listed robots to crawl the main cgi bug ticket
display pages.  Maybe they have better resources.  Was this allowed on
debbugs previously and then blocked due to load problems?
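
The difference between the two rule sets can be checked offline.  What
follows is only a rough sketch using Python's standard
urllib.robotparser, with the rules copied from the two robots.txt
listings above.  The helper name allowed(), the bot name
"SomeOtherBot", and the Debian bug number are made up for
illustration.  Note that this parser does plain prefix matching and
does not interpret the "*" and "$" wildcards, so it only approximates
how a real crawler would read the Debian file.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from https://debbugs.gnu.org/robots.txt as quoted above.
GNU_RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/
"""

# Rules copied from https://bugs.debian.org/robots.txt as quoted above.
DEBIAN_RULES = """\
User-Agent: Googlebot
User-Agent: bingbot
User-Agent: yandexbot
User-Agent: baiduspider
User-Agent: ia_archiver
Allow: /cgi-bin/bugreport.cgi?bug=
Allow: /cgi-bin/pkgreport.cgi?pkg=*;dist=unstable$
Disallow: /*/
User-agent: *
Disallow: /
"""


def allowed(rules: str, agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots.txt text and test one URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The Debian bug number is arbitrary, chosen only for the example.
    checks = [
        (GNU_RULES, "Googlebot",
         "https://debbugs.gnu.org/cgi/bugreport.cgi?bug=43073"),
        (DEBIAN_RULES, "Googlebot",
         "https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=12345"),
        (DEBIAN_RULES, "SomeOtherBot",
         "https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=12345"),
    ]
    for rules, agent, url in checks:
        print(agent, url, allowed(rules, agent, url))
```

Under these assumptions the GNU rules refuse the cgi bug page to every
compliant robot, while the Debian rules let the named crawlers fetch
bugreport.cgi pages and refuse everyone else.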

I am wondering if we should allow it again as a test and then see how
things go under current conditions.  Then the main pages would be
indexed, which would also avoid the problem.  WDYT?

Bob

Me who keeps making crazy brainstorm suggestions and hoping that maybe
eventually one of them might work out beneficially. :-)




This bug report was last modified 4 years and 175 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.