GNU bug report logs -
#43073
Trim/hide full email headers on debbugs
Glenn Morris wrote:
> Note that the "db view tree" is the part that gets indexed by search
> engines. Search engines are (obviously) denied from the cgi bug pages,
> for reasons of system load. So if you get rid of the db pages, it will
> be impossible to search debbugs reports using standard web search
> engines.
I know I asked this on the mailing list, but I am going to repeat it
here so that it is in the ticket, and then add some more.
Where is the seed that the search engines start with to crawl the db
tree? I couldn't find it.
Meanwhile, I found this difference between the two systems.
https://debbugs.gnu.org/robots.txt
User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/
As you say, debbugs.gnu.org blocks compliant robots from the cgi pages.
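To make the effect of those rules concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are copied from the robots.txt above and parsed offline. The two example URLs are illustrative assumptions about the site layout (in particular the `/db/43/43073.html` path), not something confirmed in this report.

```python
from urllib import robotparser

# Rules copied from https://debbugs.gnu.org/robots.txt as quoted above.
GNU_ROBOTS = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi/
"""

parser = robotparser.RobotFileParser()
parser.parse(GNU_ROBOTS.splitlines())

# Example URLs; the exact paths are assumptions made for illustration.
cgi_url = "https://debbugs.gnu.org/cgi/bugreport.cgi?bug=43073"
db_url = "https://debbugs.gnu.org/db/43/43073.html"

# Any compliant crawler falls under the "User-agent: *" group here.
cgi_ok = parser.can_fetch("Googlebot", cgi_url)
db_ok = parser.can_fetch("Googlebot", db_url)

print("cgi page crawlable:", cgi_ok)  # False
print("db page crawlable:", db_ok)    # True
```

Only simple prefix rules are involved, so the standard parser is enough for this case. The cgi pages are off limits and everything else, including the db tree, remains crawlable.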
https://bugs.debian.org/robots.txt
User-Agent: Googlebot
User-Agent: bingbot
User-Agent: yandexbot
User-Agent: baiduspider
User-Agent: ia_archiver
Allow: /cgi-bin/bugreport.cgi?bug=
Allow: /cgi-bin/pkgreport.cgi?pkg=*;dist=unstable$
Disallow: /*/
User-agent: *
Disallow: /
But upstream (Debian) allows the named robots to crawl the main cgi bug
ticket display pages. Maybe they have better resources. Was this allowed
on debbugs.gnu.org previously and then blocked due to load problems?
I am wondering if we should allow it again as a test and see how things
hold up under current conditions. The main pages would then be indexed,
which would also avoid the problem. WDYT?
Bob
Me who keeps making crazy brainstorm suggestions and hoping that maybe
eventually one of them might work out beneficially. :-)
This bug report was last modified 4 years and 175 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.