If your web site is not getting fully indexed by search engines, there may be a problem with its linking structure. A general guideline is to place every page within three clicks of the home page, but sometimes a site is simply too large for that to be feasible. When a site is not getting indexed properly, a sitemap can help search engine spiders find pages deep within it.
You can easily check which pages have been indexed by the major search engines by using the site: query operator, for example:
site:www.yourdomain.com
Just substitute your actual domain name in the example above and enter it in the search box on a Google, Yahoo or MSN search page. The results will show which pages have been indexed. It is not unusual to find different page counts in each search engine’s index; some search engines do a better job than others at indexing a site.
When a sitemap will help
A sitemap will not benefit a site that is already fully indexed. A sitemap’s only function is to help spiders find all of the pages in a site. If you have a 120-page site and you see 120 pages in each search engine index, then a sitemap will probably not benefit the site. Likewise, if you have a 20- or 30-page site with a good linking structure, a sitemap will be of no benefit. But if you have a 120-page site and you see only 80, 90 or 100 pages (or even fewer), then you most likely have a problem with your linking structure that is keeping spiders from finding all of your web pages within a few clicks of your home page. In this situation, a sitemap will likely be beneficial.
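Comparing the two page counts is simple arithmetic, but it is more useful to know exactly which pages are missing. Here is a minimal Python sketch, assuming you have collected your site’s page list and the URLs shown by a site: query by hand; the function names are hypothetical, and treating "/about/" and "/about" as the same page is an assumption you may want to change.

```python
def normalize(url):
    """Treat a trailing slash as insignificant when comparing URLs (an assumption)."""
    return url.rstrip("/")

def find_unindexed(site_pages, indexed_pages):
    """Return the site's pages that did not show up in the search engine's index."""
    indexed = {normalize(u) for u in indexed_pages}
    return sorted(u for u in site_pages if normalize(u) not in indexed)

def coverage(site_pages, indexed_pages):
    """Fraction of the site's pages that appear in the index (1.0 = fully indexed)."""
    if not site_pages:
        return 1.0
    missing = find_unindexed(site_pages, indexed_pages)
    return 1 - len(missing) / len(site_pages)
```

For the 120-page site with 90 pages indexed, coverage is 0.75, and find_unindexed lists the 30 pages a sitemap would need to expose.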
What a sitemap does
The purpose of a sitemap is to help search engine spiders find pages deep within a web site. Period. A sitemap will not directly influence your current search engine rankings. An argument can be made, however, that because a sitemap helps search engines find more of a site’s content pages, it increases the chances that the site’s rankings will improve: a newly found page may contain content that is more relevant to a particular search phrase. That may be true, but do not expect rank positions to improve for pages already in a search engine’s index.
The whole idea is to provide a page from which a search engine spider can easily find every page within a site. While it is not always possible to structure links so that each web page is within three clicks of the home page, a sitemap places all pages within one click of the sitemap.
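The three-click guideline and the sitemap-as-hub idea can both be checked with a breadth-first search over your site’s link graph. Below is a minimal Python sketch, assuming you already have the links as a dictionary mapping each page to the pages it links to; the function names and page names are hypothetical. It reports pages that are more than three clicks from the home page (or unreachable), and shows how a site map page linked from the home page pulls every page within two clicks.

```python
from collections import deque

def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from home to each reachable page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest path in BFS
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

def too_deep(links, home, max_clicks=3):
    """Pages that are unreachable from home or more than max_clicks away."""
    depth = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    # Unreachable pages get a depth beyond the limit so they are reported too.
    return sorted(p for p in all_pages if depth.get(p, max_clicks + 1) > max_clicks)

# A chain of pages: home links to a, a to b, b to c, c to d.
links = {"home": ["a"], "a": ["b"], "b": ["c"], "c": ["d"], "d": []}

# The same site with a site map page linked from home that links to every page.
with_map = dict(links)
with_map["home"] = links["home"] + ["sitemap"]
with_map["sitemap"] = ["a", "b", "c", "d"]
```

In the chain, page d sits four clicks deep and is reported; once the site map page exists, too_deep(with_map, "home") returns an empty list because every page is one click from the site map and two from home.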
Two primary types of sitemaps
While several different types of sitemaps have been used in the past, there are only two in use today: an HTML site map and an XML sitemap. Notice that an HTML site map is two words, while an XML sitemap is spelled as one word. I don’t know why, but those are the standard spellings.
HTML site maps were in widespread use for many years and will benefit small sites. An HTML site map is nothing more than a web page in a site that contains links to all of the other pages within the site. A site map link is typically placed in the menu or the footer of every page within a site. Each link points to the HTML site map, which in essence becomes the central hub of a web site. A spider indexing any page within the site can easily find the link to the site map. From the HTML site map the spider can find every page in the site. What a concept!
The general rule for HTML site maps is that they should have no more than 100 hyperlinks on a page. That includes any menu links, footer links and any other links on the site map page, so a large site usually requires multiple site maps. It is also a good idea to add a short text description to each link so that visitors can use the site map to find pages within your site. Without some text, a page with too many links may look like a “link farm page” to a spider, which could result in a penalty. If you are going to use an HTML site map, add a description to each link.
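The 100-link budget is easy to apply mechanically. Here is a minimal Python sketch, assuming each page is described by a (url, title, description) tuple and that you know how many menu and footer links every page already carries; the function names are hypothetical, and any links between the site map pages themselves should be counted in nav_links as well.

```python
from html import escape

MAX_LINKS_PER_PAGE = 100  # guideline from the text: all links on the page count

def split_site_map(entries, nav_links, max_links=MAX_LINKS_PER_PAGE):
    """Split entries into site map pages so links per page (content + nav) stay within max_links."""
    per_page = max_links - nav_links
    if per_page < 1:
        raise ValueError("menu and footer links already use the whole link budget")
    return [entries[i:i + per_page] for i in range(0, len(entries), per_page)]

def render_site_map(entries):
    """Render one site map page as an HTML list, with a short description after each link."""
    items = [
        f'<li><a href="{escape(url)}">{escape(title)}</a> - {escape(desc)}</li>'
        for url, title, desc in entries
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

With 250 pages and 20 menu and footer links, each site map page has room for 80 content links, so the site needs four site map pages.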
Although HTML site maps are still in use, they have largely been replaced by XML sitemaps. An XML sitemap is a plain-text file in XML format that lists the URLs within a site, in a structure that search engine spiders can easily read. The best news about XML sitemaps is that as of November 2006, all three major search engines recognize a common standard for them, so you only need to set up a single XML file.
This announcement greatly simplified the use of sitemaps.
It gets better. In April of 2007, the major search engines agreed on a standard for telling spiders where to find a sitemap.
You do not need to link to an XML sitemap from your web pages. To tell a spider where to find your sitemap, you merely have to add the following directive to the robots.txt file in your root directory. This is the format (substitute your domain name):
Sitemap: http://www.yourdomain.com/sitemap.xml
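The standard naming convention for a sitemap is simply sitemap.xml, and the sitemap should ideally be placed in the root directory. Under the sitemap protocol, a sitemap can only list URLs in the directory where it is stored and below, so the root location lets it cover the whole site. The robots.txt directive is there to notify spiders that your sitemap exists and where to find it.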
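If you want to confirm that a robots.txt file declares its sitemap with a line of the form Sitemap: http://www.yourdomain.com/sitemap.xml, Python’s standard urllib.robotparser module can read the directive for you. This is a minimal sketch that needs Python 3.8 or later (for site_maps()); the helper name is hypothetical and www.yourdomain.com is a placeholder.

```python
from urllib.robotparser import RobotFileParser

def sitemaps_from_robots(robots_txt):
    """Return the Sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []

robots = """User-agent: *
Disallow:

Sitemap: http://www.yourdomain.com/sitemap.xml
"""
```

To check a live site instead, call set_url() with your robots.txt URL and then read() on a RobotFileParser before calling site_maps().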
How to build a sitemap using free online tools
The easiest way to build a sitemap is to use one of the many free online tools. I’ve been using the freebie tool found at xml-sitemaps.com. To use the tool, simply enter your site’s home page URL and let it run. It generates a file, already in the proper XML format, that lists the URLs it finds by crawling your site.
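If you already have a list of your page URLs, the file format is simple enough to produce yourself. This minimal Python sketch uses the standard xml.etree.ElementTree module and the sitemaps.org 0.9 namespace; the function name is hypothetical, it writes only the required loc tag for each page, and it needs Python 3.8 or later for the xml_declaration argument.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML as UTF-8 bytes, with one <url><loc> entry per absolute URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes special characters such as & in the text automatically.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
```

Write the returned bytes to a file named sitemap.xml opened in binary mode, then upload it to your root directory.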
Editing the XML sitemap file
Before copying the sitemap to your site’s root directory, it may be a good idea to edit some of the entries. The XML sitemap protocol includes a priority tag, which lets you assign a level of importance to each page relative to the other pages on your site. The scale runs from 0.0 (lowest priority) to 1.0 (highest priority), and a page without a priority tag defaults to 0.5. The home page should be set at 1.0, while most other pages should be set from 0.5 to 0.8. Form pages, contact pages and other pages that do not contain real content can be set at 0.0 or 0.1. How search engines use priority tags is not entirely clear, but in theory they should help spiders identify the pages to focus on while disregarding insignificant ones.
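The priority edits can also be scripted rather than made by hand. Below is a minimal Python sketch using the standard xml.etree.ElementTree module; the rules (1.0 for the home page, 0.1 for a list of low-value pages you supply, a default for everything else) follow the guidance above, while the function name and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SITEMAP_NS}

def set_priorities(xml_bytes, home, low_value_pages=(), default=0.5):
    """Set <priority> on every <url>: 1.0 for home, 0.1 for low-value pages, default otherwise."""
    # Write the sitemap namespace back out as the default xmlns instead of an "ns0:" prefix.
    ET.register_namespace("", SITEMAP_NS)
    low = {u.rstrip("/") for u in low_value_pages}
    root = ET.fromstring(xml_bytes)
    for entry in root.findall("sm:url", NS):
        loc = entry.find("sm:loc", NS).text.strip().rstrip("/")
        if loc == home.rstrip("/"):
            value = 1.0
        elif loc in low:
            value = 0.1
        else:
            value = default
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"priority {value} for {loc} is outside 0.0-1.0")
        priority = entry.find("sm:priority", NS)
        if priority is None:
            # Add a <priority> tag if the entry does not have one yet.
            priority = ET.SubElement(entry, f"{{{SITEMAP_NS}}}priority")
        priority.text = f"{value:.1f}"
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

Pages that should sit between 0.5 and 0.8 can be handled by passing a different default or by adding another rule in the same style.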
The XML file can be edited with Notepad or any other plain text editor. Do not edit it with Word or any other word processor, which may embed invisible formatting codes that make the file difficult for a spider to read. It must be saved as a plain text file, and the sitemap protocol requires UTF-8 encoding.
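After editing, a quick check can catch the kind of damage a word processor tends to cause, such as a binary file format or curly quotes around attribute values. This minimal Python sketch (hypothetical function name) only confirms that the file is UTF-8 text, parses as well-formed XML, and has a sitemaps.org urlset root; it is not a full validation against the protocol’s schema.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def check_sitemap_bytes(data):
    """Return a list of problems found in raw sitemap bytes; an empty list means none were found."""
    problems = []
    # The sitemap protocol requires UTF-8; binary word-processor formats fail here.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8 text"]
    # Stray formatting characters, such as curly quotes, make the XML unparseable.
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append(f"unexpected root element {root.tag!r}; expected a sitemaps.org urlset")
    return problems
```

Read the file in binary mode and pass its bytes to the function before uploading it.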
Once again, an XML sitemap will not benefit a site that is already getting indexed properly and completely. But if you have a large or complex web site and the site needs some help, an XML sitemap may be the best solution.