Google currently indexes over 8 billion web pages. Before any of those pages were placed in the index, however, each one was crawled by a special spider known as Googlebot.
Unfortunately, many webmasters do not know much about the inner workings of this virtual robot.
In fact, Google uses a number of spiders to crawl the Web. You can spot these spiders by examining your server log files.
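As a rough illustration of the log check mentioned above, the Python sketch below counts hits from the Google user-agent tokens covered in this article. It assumes an access log whose lines include the User-Agent string (as in the common "combined" format). The function name, sample lines, IP addresses, and paths are all made up for the example, and a matching string alone does not prove that a visitor is really Google, since user agents can be faked.

```python
from collections import Counter

# User-agent tokens named in this article. More specific tokens come first
# so that "Googlebot-Image" is not counted as plain "Googlebot".
GOOGLE_AGENTS = [
    "Googlebot-Image",
    "Mediapartners-Google",
    "AdsBot-Google",
    "Googlebot",
]


def count_google_hits(log_lines):
    """Count log lines per Google crawler, matching tokens case-insensitively.

    For simplicity the whole line is searched, not just the User-Agent field.
    """
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for agent in GOOGLE_AGENTS:
            if agent.lower() in lowered:
                counts[agent] += 1
                break  # attribute each line to one crawler only
    return counts


# Made-up sample lines in combined log format (hypothetical IPs and paths).
sample_log = [
    '192.0.2.10 - - [01/Jan/2006:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [01/Jan/2006:00:00:02 +0000] "GET /logo.gif HTTP/1.1" 200 900 "-" '
    '"Googlebot-Image/1.0"',
    '192.0.2.12 - - [01/Jan/2006:00:00:03 +0000] "GET /ads.html HTTP/1.1" 200 700 "-" '
    '"Mediapartners-Google/2.1"',
    '192.0.2.13 - - [01/Jan/2006:00:00:04 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0)"',
]

for agent, hits in count_google_hits(sample_log).items():
    print(agent, hits)
```

To run it against a real file, pass an open file object instead of the sample list, for example count_google_hits(open("access.log")), where access.log is a placeholder path.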
Google Bot
Googlebot, as you probably know, is the search bot Google uses to scour the web for new pages. Googlebot has two versions: Deepbot and Freshbot. Deepbot is a deep crawler
that tries to follow every link on the web and download as many pages
as it can for the Google index. It also examines the internal structure
of a site, giving a complete picture for the index.
Freshbot,
on the other hand, is a newer bot that crawls the web looking for fresh
content. It was introduced to take some of the pressure off Deepbot.
Freshbot revisits pages already in the index and checks them for new,
modified, or updated content. In this way, Google is better equipped to
keep up with the ever-changing Web.
This means that the more you update
your web site with new, quality content, the more often Googlebot will
come by to check you out.
If you’d like to see the Googlebot crawling around your web property
more often, you need to obtain quality inbound links. However, there is
also one more step that you should take. If you haven't already done
so, you should create a Google Sitemap for your site.
The next Google bot in our lineup is known as the Google MediaBot.
MediaBot - used to analyze AdSense pages
user agent: Mediapartners-Google
MediaBot is the Google crawler for AdSense publishers.
MediaBot is used to determine which ads Google should display on
AdSense pages. Google recommends that webmasters add a rule to their
robots.txt file that explicitly grants MediaBot access to their
entire site. To do this, simply enter the following lines into your
robots.txt file:
User-agent: Mediapartners-Google*
Disallow:
This will ensure that the Google MediaBot is able to place relevant Adsense ads on your site.
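If you want to sanity-check a rule like this before uploading it, Python's standard urllib.robotparser module can evaluate robots.txt text locally. The sketch below is only an approximation of how a crawler might read the file: the default "block everything" section, the helper name is_allowed, and the sample path are assumptions added for the example, and the agent name is written without the trailing asterisk because Python's parser compares agent names as plain text and does not treat the asterisk as a wildcard. It reflects Python's reading of the rules, not necessarily Google's.

```python
from urllib import robotparser

# Hypothetical robots.txt: block all crawlers by default, but give the
# AdSense crawler full access with an empty Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""


def is_allowed(robots_txt, user_agent, path):
    """Return True if robots_txt lets user_agent fetch path."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The AdSense crawler should be allowed; any other bot falls under "*".
print(is_allowed(ROBOTS_TXT, "Mediapartners-Google", "/articles/page.html"))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "/articles/page.html"))
```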
Keep in mind that ads can still be shown on a page if the MediaBot has not
yet visited. If that is the case, the ads chosen will be based on the
overall theme of the other pages on the site. If no ads can be chosen,
the dreaded public service announcements are displayed instead.
There is an ongoing debate over whether the MediaBot gives websites running AdSense an advantage in the search engines. Even Matt Cutts has confirmed that the AdSense MediaBot has indexed webpages for Google's main index.
He states, "Pages with AdSense will not be indexed more frequently. It's literally just a crawl cache, so if e.g. our news crawl fetched a page and then Googlebot wanted the same page, we'd retrieve the page from the crawl cache. But there's no boost at all in rankings if you're in AdSense or Google News. You don't get any more pages crawled either."
Matt Cutts claims that your website does not get any advantage by using AdSense. However, in my mind, simply getting your site updated in the index is an advantage in and of itself.
This is very similar to Google Analytics: those who run it on their site can also expect a slightly higher degree of spider activity.
However,
you certainly shouldn't depend on any of these tools for getting your
site indexed. The key to frequent spidering is having quality inbound
links, quality content, and frequent updates.
Have images on your site? If so, you have likely been visited by our
next Google spider, the ImageBot.
ImageBot - used to crawl for Google Image Search
user agent: Googlebot-Image
The Imagebot prowls the Web for images to place in Google's image search. Images are ranked based upon their filename, surrounding text, alt text, and page title.
If you have a website that is primarily image based, then you would definitely want to optimize your images to receive some extra Google traffic.
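One of the factors listed above, alt text, is easy to audit automatically. The Python sketch below uses the standard html.parser module to list images that have no alt text. The class name, helper name, and sample page are invented for the example, and it only checks whether alt text is present, not whether it is any good. Note that it also flags empty alt attributes, which may be intentional for purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag with no alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # Treat an absent, empty, or whitespace-only alt attribute as missing.
        if not (attributes.get("alt") or "").strip():
            self.missing.append(attributes.get("src"))


def find_images_without_alt(html):
    """Return the src values of all images lacking alt text in html."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


# Made-up page with one well-described image and two without alt text.
page = """
<html><head><title>Red widgets</title></head><body>
<img src="red-widget.jpg" alt="Red widget, front view">
<img src="img001.jpg">
<img src="spacer.gif" alt="" />
</body></html>
"""

print(find_images_without_alt(page))
```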
On the other hand, some web sites may not benefit from Google image search.
In most cases, the traffic from the image search engine is very low
quality and rarely converts into buyers. Many people are just
looking for images that they can swipe. So, if you want to save some
bandwidth, use your robots.txt file to block ImageBot from accessing
your image directory.
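A minimal sketch of such a rule is shown below as a robots.txt string inside a short Python script, which uses the standard urllib.robotparser module to confirm what the rule blocks. The /images/ directory and the sample paths are placeholders; substitute whatever directory actually holds your images. As with any local check, this shows how Python's parser reads the rule, which may differ in edge cases from how Google's crawler reads it.

```python
from urllib import robotparser

# Hypothetical robots.txt: keep Google's image crawler out of /images/
# while leaving the rest of the site open to it.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /images/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Report whether the image crawler may fetch each sample path.
for path in ("/images/photo.jpg", "/index.html"):
    verdict = "allowed" if parser.can_fetch("Googlebot-Image", path) else "blocked"
    print(path, verdict)
```

Because the rule names only Googlebot-Image, the regular Googlebot is not affected by it and can still crawl pages elsewhere on the site.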
One of the few exceptions might be if you have a site dedicated to downloadable images.
Our final bot, the Google AdsBot, is dedicated entirely to the Google AdWords program.
Google AdsBot - checks AdWords landing pages for quality
user agent: AdsBot-Google
AdsBot is one of Google's newest spiders.
This new crawler is used to analyze the content of advertising landing
pages, which helps determine the Quality Score that Google assigns to
your ads.
Google
uses this Quality score in combination with the amount you bid to