Fix the “robots.txt timeout” problem with Google
I started this blog/website in December of 2007. For a month and a half it was crawled regularly by Google, and I slowly climbed the rankings for searches like “wesg,” “mac cool apps,” and “MacBook 5.1 surround sound.” Typically I would be on the first page of results, and if not, I’d be at the top of the second.
I thought this was great, because it brought in good traffic, but in mid-January I noticed I had slipped in the rankings. To investigate, I logged into Google Webmaster Tools, and what I found turned out to be a much larger issue than I had expected.
Under the Tools tab, I found that Google could not download the robots.txt file for my website. This is the file that tells search engines like Google what they can and can’t crawl or index. The catch is that Googlebot fetches it before crawling anything else, and if the request fails (for example, it times out), it stops crawling and moves on. If that happens repeatedly, it can keep Google from crawling your site indefinitely.
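For anyone unfamiliar with the file itself, here is a minimal sketch of how a crawler reads robots.txt rules, using Python’s standard-library urllib.robotparser. The paths and bot names here are made-up examples, not my site’s actual rules:

```python
from urllib import robotparser

# A minimal robots.txt: everyone is kept out of /private/,
# but Googlebot gets its own (empty) Disallow, i.e. full access.
rules = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/private/page.html"))     # True: Googlebot's own group applies
print(rp.can_fetch("SomeOtherBot", "/private/page.html"))  # False: falls back to the * group
```

The point is that a crawler picks the most specific User-agent group that matches it and obeys only that group’s rules, which is why the file has to be fetchable before anything else happens.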
This was happening to me, and if I couldn’t figure out what was wrong, I would drop out of the search results and gain very little traffic. I started Googling around for similar problems, and found many people who had similar issues, and who eventually found the problem. I followed some of the advice I found, and did this:
- Checked the robots.txt and sitemap.xml files to see if they were accessible.
- Checked the files again with an HTTP header checker to confirm the status codes they returned.
- Checked the .htaccess file to see if my server was blocking Googlebot’s known IP range, which starts with 66.249.
- Posted to several forums to see if others could help me.
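The reason the header check matters is that the status code of the robots.txt request decides what Googlebot does next. This little function is my own summary of that behavior, pieced together while debugging (the function name and return labels are made up, not anything from Google):

```python
def robots_fetch_outcome(status: int) -> str:
    """My rough summary of how Googlebot reacts to the robots.txt fetch.

    2xx: parse the file and obey it.
    404: treat the site as having no restrictions and crawl freely.
    5xx or a timeout: back off and stop crawling for now -- and repeated
    failures are exactly what stalls crawling indefinitely.
    """
    if 200 <= status < 300:
        return "obey"
    if status == 404:
        return "crawl-all"
    if status >= 500:
        return "back-off"
    return "unclear"

print(robots_fetch_outcome(503))  # back-off
```

In my case the firewall made the request time out, which lands in the “back-off” bucket: not an error on the page, not a missing file, just a site Google quietly stops visiting.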
After all of this resolved nothing, I came to the conclusion that my web host, webserve.ca, was blocking Googlebot. I opened many support tickets, but they disagreed and denied that anything on their end was preventing Google from reaching my site. I kept at it, though, and they finally admitted that yes, some Google IPs were sitting in a blocked section of their firewall. Finally! So this morning when I checked Google Webmaster Tools, I found that the robots.txt file had been successfully downloaded and the sitemap had been accessed. Now we wait for the rankings to return.
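My host’s block lived in their firewall, but the same symptom can come from your own .htaccess. This is a hypothetical Apache 2.2-style fragment, not my actual config, showing the kind of rule worth grepping for when only Google seems unable to reach you:

```apacheconf
# Hypothetical .htaccess rules that would silently block Googlebot:
# everything in the 66.249 range is denied, so robots.txt requests
# from Google never succeed while the site looks fine in a browser.
Order Allow,Deny
Allow from all
Deny from 66.249
```

If you find something like this, the fix is simply removing the Deny line; if you don’t, the block is probably upstream at the host, as it was for me.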
[tags]Google, SEO, blog, robots.txt, timeout, web hosting[/tags]