Crawlers, Spiders, Robots

An instructional website on Internet literacy for teachers

So there are over 4 million sites on the Web now, and new ones are added every single day. Great, wow. The problem is, how do you find exactly what you are looking for? It sounds like the proverbial needle-in-a-haystack dilemma. The Web, being a collection of webpages that reside on millions of computers all over the world, is not organized in the orderly fashion we might hope for. There are no catalogs listing titles, authors, and topics in alphabetical, chronological, or numerical order. This is the main reason search engines were developed.
A search engine does not exactly go forth and search these millions of computers for the information you asked for. Search engines are programs that search through databases of HTML documents indexed by keywords. They rely on software programs called robots to build these databases. Web robots are often referred to as crawlers, spiders, wanderers, worms, ants, or bots for short. Don't be misled by the names, because robots don't literally move from one site to another. Rather, the software retrieves a page from a site, scans it for links to other sites, and then retrieves those in turn. Robots of major search sites can visit a million or more sites a day. They build databases by indexing the contents of websites. Depending on how they were programmed, indexing robots may parse a page's title, its description, its first few paragraphs, and its meta tags, or even the entire body of the document. So if I used the words "Internet" and "robots" 50 times on this page, there is a good chance a search engine would pull it up in response to a request for "Internet robots." But then this page would obviously not make sense to anybody. This is where meta tags come in handy.
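The visit, scan, and index cycle described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the three pages in PAGES, their example addresses, and the names PageScanner and crawl are all made up for this sketch, and the "web" is a small in-memory dictionary rather than real sites fetched over the network. It is not how any particular search engine's robot is written.

```python
from collections import deque
from html.parser import HTMLParser
import re

# Hypothetical in-memory "web": address -> HTML. A real robot would
# download each page over HTTP instead of looking it up in a dictionary.
PAGES = {
    "http://a.example/": '<title>Robots</title><p>Internet robots index pages.</p>'
                         '<a href="http://b.example/">B</a>',
    "http://b.example/": '<title>Spiders</title><p>Spiders follow links.</p>'
                         '<a href="http://a.example/">A</a>'
                         '<a href="http://c.example/">C</a>',
    "http://c.example/": '<title>Worms</title><p>Another name for robots.</p>',
}


class PageScanner(HTMLParser):
    """Collects the links and the words found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Every <a href="..."> is a link the robot may follow later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Split the text between tags into lowercase words.
        self.words.extend(re.findall(r"[a-z]+", data.lower()))


def crawl(start_url, pages):
    """Visit pages reachable from start_url and build a word -> addresses index."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            continue  # the link points to a page we cannot retrieve
        scanner = PageScanner()
        scanner.feed(html)
        scanner.close()
        for word in scanner.words:
            index.setdefault(word, set()).add(url)
        for link in scanner.links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return index


index = crawl("http://a.example/", PAGES)
print(sorted(index["robots"]))
```

Looking up "robots" in the finished index returns the two made-up pages that contain that word, which is the kind of lookup a search engine performs against its database instead of searching the Web itself.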
Tags are codes that tell browsers how to display text, images, and other files in a web page. For example, in <I>this</I>, the bracketed letters are the tags that instruct your browser to display the word "this" in italics. Meta tags are different because they provide information that is not displayed on the web page itself. This includes the author, content, and description of the page. Robots and search engines use the keywords and descriptions in meta tags to index HTML documents.
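To show how a robot might read meta tags, here is a minimal Python sketch. The sample page and its meta tag contents are invented for illustration (they are not the actual tags of this page), and MetaTagReader is a hypothetical name. The sketch uses the HTMLParser class from Python's standard library.

```python
from html.parser import HTMLParser

# Hypothetical sample page; the meta contents are made up for this sketch.
SAMPLE_PAGE = """<html><head>
<title>Crawlers, Spiders, Robots</title>
<meta name="description" content="How search engine robots index the Web">
<meta name="keywords" content="robots, spiders, crawlers, search engines">
</head><body><p>Visible text goes here.</p></body></html>"""


class MetaTagReader(HTMLParser):
    """Collects name/content pairs from the <meta> tags of a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Keep only meta tags that carry both a name and a content value.
        if name and content is not None:
            self.meta[name.lower()] = content


reader = MetaTagReader()
reader.feed(SAMPLE_PAGE)
reader.close()

# The keywords arrive as one comma-separated string; split them apart.
keywords = [word.strip() for word in reader.meta["keywords"].split(",")]
print(reader.meta["description"])
print(keywords)
```

None of the values collected here would appear in the browser window, yet a robot that reads them has a ready-made description and keyword list to store in its index.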
Robots serve many purposes other than indexing. There are robots that do nothing but check or validate links and web pages, robots that monitor new sites, and robots that verify mirror sites, which are websites replicated on other networks or servers.
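As a rough idea of what a link-checking robot does, the Python sketch below looks for links that point to pages that do not exist. Everything in it is an assumption made for illustration: the two-page SITE, its file names, and the names LinkCollector and find_broken_links are hypothetical, and the check runs against an in-memory dictionary rather than a live web server.

```python
from html.parser import HTMLParser

# Hypothetical two-page site: path -> HTML. "/missing.html" does not exist.
SITE = {
    "/index.html": '<a href="/about.html">About</a> '
                   '<a href="/missing.html">Old page</a>',
    "/about.html": '<a href="/index.html">Home</a>',
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_broken_links(site):
    """Return (page, link) pairs where the link target is not in the site."""
    broken = []
    for page, html in site.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for link in collector.links:
            if link not in site:
                broken.append((page, link))
    return broken


print(find_broken_links(SITE))
```

A real link checker would request each address over the network and look at the server's response, but the bookkeeping is the same: collect the links, test each one, and report the ones that fail.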

Exercise
Press the Ctrl and U keys at the same time to view the page source of this document, or click the View menu in the toolbar above and select Page Source. This command opens another window that shows the HTML tags of this page. The first lines that start with "meta name" are the meta tags.


Antoinette.Go@usm.edu
Copyright © 2000, All Rights Reserved
http://www.tonettego.net


