Top Rank Solutions

What is a search engine spider?


by Craig Mazur - Copyright 2003-2006 - All rights reserved

December 17, 2003            Updated: February 8, 2006

Crawlers, robots, algorithms, Web bots and spiders. They feed upon the content of Web pages. They skitter through the endless maze of electronic paths of Cyberspace. Seeking, following, comparing, ranking, and ultimately passing judgement upon each Web page they find, assigning it a position relative to all others. Given the fact that there are billions of Web pages and documents on the Internet (some estimate 20+ billion), and millions being added each month, their endless task is formidable.

These sneaky little spiders are smart little critters. But just what is a search engine spider, and what does it look like? Where did spiders come from, and how do they travel from site to site? Are they like viruses passing from one computer to another? Just what type of creatures are these busy little guys?

No matter what you want to call them, it is usually considered to be a good thing when a search engine spider visits your Web site. These mysterious little entities can come calling at any time of the day or night. While there, they seemingly scurry through your site and curiously follow links from page to page. They inspect and dissect the code in your Web pages and form an opinion about the quality of your content. Finally, they determine which keywords best represent your pages and rank each relative to all the thousands or sometimes millions of Web pages with similar content.

Their monikers may vary, but each evokes an image of some type of tireless, intelligent, biological or mechanical creature unleashed and roaming the Internet. The reality is much simpler, but doesn't tease the imagination with the same visual appeal. Search engine spiders are just computer algorithms--cold and lifeless computer code.

A little history

Way back in the early 1990s--eons ago in Cyberspace time--the Internet was primarily made up of a network of servers containing text documents. No images or sound. No Flash, videos or other multimedia. Just text documents. In those days a user had to know a specific address for a site or a document in order to find it. Directories were created where Webmasters could post information about their Web sites so users could find them. As the amount of content on the Internet grew rapidly, the need arose for a more methodical way to gather and index information. Out of this need arose a range of computer programs, which programmers and denizens of the Internet whimsically called robots or spiders in order to evoke images of tireless, methodical beings with free range over the Internet. Meta tags were added to documents to provide the spiders with a description of a page and the keywords to be used to find it. As the World Wide Web came into existence and evolved, those methods became outdated and easy to abuse.

With billions of Web pages and documents on today's World Wide Web, valuable content is easily buried, making it difficult to find the exact information you are looking for. Search engines have developed into systems that not only search for and index as many Web pages and documents as they can find, but also automatically analyze and categorize the content they find. The ultimate goal of every modern search engine is to provide the most meaningful search results for its users. After all, if users feel that their keyword searches are not producing the results they expect, they will switch to a search engine that satisfies their needs.

Current search engine algorithms have evolved into very sophisticated computer programs that request a Web page in much the same way a browser does. A request is sent out from the search engine's computer to retrieve a copy of the Web page code for a specific URL. The code is transferred to the search engine's computer. But instead of displaying the Web page, the algorithm parses, dissects and analyzes the code. The algorithm is made up of logical rules that allow it to automatically make hundreds of judgements about a Web page and its content. Based upon this analysis, points and demerits are assigned and are used to rank the page. The algorithm also parses out any other URLs it finds and adds them to its database. Requests are sent out for those URLs and the cycle continues over and over.
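The cycle described above (request a page, parse its code, collect the URLs it contains, then request those) can be sketched in a few lines of Python. This is only an illustration, not how any real search engine works, since those details are not public. The names crawl, LinkExtractor, fetch and PAGES are made up for this example, and the example.com addresses are placeholders. To keep the sketch runnable without a network connection, the "Web" here is a small in-memory dictionary, and the ranking step is left out entirely.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl starting at seed_url.

    `fetch` is a hypothetical callable: fetch(url) returns the page's
    HTML as a string, or None if the page cannot be retrieved.
    Returns a dict mapping each retrieved URL to its HTML.
    """
    frontier = deque([seed_url])  # URLs waiting to be requested
    seen = {seed_url}             # URLs already queued, to avoid repeats
    index = {}                    # the spider's "database" of pages

    while frontier and len(index) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        index[url] = html

        # Parse the code instead of displaying it, and pull out the links.
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return index


# Stand-in for the network: a tiny in-memory "Web" (assumed sample data).
PAGES = {
    "http://example.com/": '<a href="/about.html">About</a> <a href="/news.html">News</a>',
    "http://example.com/about.html": '<a href="/">Home</a>',
    "http://example.com/news.html": '<a href="/about.html#team">Team</a>',
}

crawled = crawl("http://example.com/", PAGES.get)
print(sorted(crawled))
```

Starting from a single address, the loop finds the other two pages purely by following links, and the seen set keeps it from requesting the same page twice. A real spider would replace PAGES.get with an actual HTTP request and would also need to respect each site's robots.txt rules and pace its requests.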

All search engine algorithms are proprietary and the rules each one applies to Web pages are closely guarded secrets. They remain secret in order to prevent people from circumventing or defeating them. Many search engines publish guidelines that suggest the right and wrong ways to do things. Other techniques have evolved through trial and error and, when applied to Web pages, have been shown to produce positive results. These are called "search engine friendly" techniques, and they are the basis for the Top Rank Solutions philosophy.

Algorithms are in a constant state of evolution, and search engines frequently change their ranking criteria without warning. Changes in algorithms become apparent only after large numbers of Web pages change positions in a search engine's results. Sometimes a Web page moves to a higher rank position, and sometimes to a lower position. An elevation to a higher position can mean that a page exhibits traits that are deemed to be better or more significant than those of similar pages. Sometimes even a page that previously ranked well is demoted.

For several years the intelligence of search engine algorithms lagged behind the methods being developed to artificially elevate a page's rank. These methods, referred to as "spam," were designed to fool a search engine algorithm or feed it erroneous information. In recent times, most algorithms have become sophisticated enough to detect these methods and will now penalize a Web page if they find something suspicious in its design or code. Some violations are deemed to be so egregious that an entire site may be temporarily or permanently removed from a search engine's index. The best long-term plan for dealing with these issues is to adopt the search engine friendly philosophy and apply it to your Web development techniques.

Sorry to disillusion those of you who may believe otherwise, but there are no friendly, energetic little arachnids or other mythical or mechanical beings cruising the Internet or rummaging through your Web site. And any intelligence that search engine spiders exhibit is not the enlightened work of metaphysical creatures from Cyberspace. A spider is just a metaphor for an algorithm, and an algorithm is just the work of mere mortal programmers.

Top Rank Solutions is located near Phoenix in Mesa, Arizona, and offers services for customers throughout the United States.