Search Robot Test '97:
Excite is the Best!

Prof. Dr. Martijn Hoogeveen (results are published in Computable; see also the 1998 and 1996 tests).


Search robot        Excite       Alta Vista   Web Crawler  Lycos        Infoseek     Open Text
Ranking test 1996   2            -            3            1            4            5
Ranking test 1997   1            2            3            4            5            (excluded)


Functional comparison

Feature                     Lycos               Excite              Webcrawler          Infoseek            Alta Vista
Domain                      Web                 Web, usenet         Web, gopher         Web, usenet, email  Web, usenet
Help                        Sufficient          Good                Sufficient          Sufficient          Good
Redundancy check            Yes                 No                  Yes                 No                  Yes
Boolean logic               Limited             Limited             Limited             Limited             Yes
Concept-based search        No                  Yes                 No                  No                  No
Ranking                     Yes                 Yes                 Yes                 Yes                 Yes
Proximity search            No                  No                  Yes                 Yes                 Yes
Advanced search screen      Yes, hard to find   No                  No                  No                  Yes
Multilingual thesaurus      No                  No                  No                  No                  No
Natural language interface  No                  No                  No                  No                  No

Background

The biggest problem on the Web is not finding information, because you can easily drown in it, but finding answers to your (burning) questions. In general, you have two options for finding those answers: a directory or a search engine (search robot). Good directories are available in all shapes and sizes; Yahoo, the Internet equivalent of a yellow pages guide, is a well-known example. Directories and other limited databases are especially useful when your question falls within the specific domain of the directory or database. In many cases, however, it is not clear from the start whether a directory or database covering the relevant domain exists. In these cases a search robot that covers the whole Web can prove its value, provided it is a good one!

It is this question in particular, "which search robots on the Web are really good?", that occupies many Web minds. Frequent complaints about fruitless searches and uncertainty about which domains a search robot covers give Web surfers plenty to talk about. So the time is ripe to put the best-known search robots through the mill.

The characteristics of the search robots that were analyzed are explained below.

Some search terms explained

  • Precision. In general: the percentage of hits that are relevant.
  • Recall. In general: the percentage of all existing relevant documents that are retrieved. Because the dynamic contents of the gigantic Web databases cannot be surveyed comprehensively, in this test we have used instead the percentage of searches that yield at least one relevant hit.
  • Redundancy check. Are duplicate hits filtered out of the search result?
  • Concept-based search. Is the meaning of the keywords taken into account? Are, for example, synonyms or homonyms used?
  • Ranking. Are hits ordered in a ranking list according to how well they match the search terms?
  • Proximity search. Can you indicate whether search terms should be adjacent to each other, in the same sentence, in the same paragraph, or anywhere in the same text?
  • Response time. Response-time measurements are not very reliable, so this criterion carries only a small weight in the end score. Measuring over several searches, each at the same time of day, makes the results at least indicative.
  • End score or effectiveness. The end score is the product of precision and recall.
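
A minimal sketch of how precision, the test's working definition of recall, and the end score combine. It assumes a hypothetical `SearchResult` record holding, for one search, the number of hits inspected and the number judged relevant. Precision is pooled over all inspected hits (averaging it per search would be an equally plausible reading of the definition), and the scores are fractions; multiply by 100 to get percentages.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SearchResult:
    """Relevance judgements for one search (hypothetical record, for illustration)."""
    relevant_hits: int  # hits judged relevant by the assessor
    total_hits: int     # hits inspected for this search


def precision(results: Sequence[SearchResult]) -> float:
    """Fraction of all inspected hits that were judged relevant (pooled over searches)."""
    total = sum(r.total_hits for r in results)
    if total == 0:
        return 0.0
    return sum(r.relevant_hits for r in results) / total


def recall_proxy(results: Sequence[SearchResult]) -> float:
    """The test's stand-in for recall: fraction of searches with at least one relevant hit."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.relevant_hits > 0) / len(results)


def end_score(results: Sequence[SearchResult]) -> float:
    """Effectiveness: the product of precision and (proxy) recall."""
    return precision(results) * recall_proxy(results)


if __name__ == "__main__":
    # Three hypothetical searches: 3 of 10, 0 of 10 and 5 of 10 hits relevant.
    searches = [SearchResult(3, 10), SearchResult(0, 10), SearchResult(5, 10)]
    print(round(precision(searches), 3))     # 0.267  (8 relevant out of 30 hits)
    print(round(recall_proxy(searches), 3))  # 0.667  (2 of 3 searches found something)
    print(round(end_score(searches), 3))     # 0.178
```

Because the end score is a product, a robot cannot compensate entirely for a weakness on one measure: a search robot that returns relevant hits for only a few searches scores low even if those few hit lists are very precise.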

For further explanation:

Hoogeveen, M. J., & Van der Meer, K. (1994). Integration of Information Retrieval and Database Management in Support of Multimedia Police Work. Journal of Information Science, 20(2), 79-87.

Ph.D. thesis: Multimedia Retrieval

© 1995-2002 Martijn Hoogeveen