
Benchmark Search Robot of the Year 1996:
Lycos Is the Best by a Small Margin

by Prof. Dr. Martijn Hoogeveen (martijn@cyber-ventures.com)
(published in Informatie, January 1996)


Conclusion
Introduction
Five Search Robots Compared
Search terms explained
Results
References


Conclusion: Lycos and Excite are the best in this test. Lycos has the least trouble finding needles in haystacks (highest recall), and Excite delivers the fewest irrelevant hits (highest precision). By a hair, Lycos is "the best search robot of the Web". Webcrawler is reasonable and has the best response time (only 6 seconds on average!). Infoseek and Open Text are mediocre. Apart from its access to ftp sites, Open Text offers no added value over the others.


Introduction

The biggest problem on the Web is not finding information, because you can easily drown in it, but finding answers to your (burning) questions. In general, you have two options when you want to find those answers: a directory or a search engine (a search robot). Good directories are available in all shapes and sizes. Yahoo, the Internet equivalent of a yellow pages guide, is a well-known example. Directories and other limited databases are especially useful if your questions fall within the specific domain of the directory or database. In many cases, however, it is not clear from the start whether a directory or database covering the relevant domain is available. In these cases a search robot that covers the whole Web may show its value. But it should be a good one...!

It is this question in particular, "which search robots on the Web are really good", that exercises many Web minds. Complaints about fruitless searches, uncertainty about the domains a search robot covers, and so on give web surfers plenty to talk about. So the time is ripe to put the best-known search robots through their paces.

Five search robots were selected via the Netscape NetSearch directory and other overviews: Webcrawler, Lycos, Infoseek, Open Text, and Excite. First the functionality of these search robots is analyzed, and then a test of ten searches is performed on each of the five. The results are given in table 2. The analyzed characteristics of the search robots are explained below. The 10 questions were selected to provide a varied mix that brings the strong and weak points of the search robots to light. Some should be a piece of cake for any search robot, however bad (Clinton AND White House, Brasil, Beatles). Some questions ask for specific sites (NEC), and some for a needle in a haystack (Cyber Ventures). Some difficult questions are added: one about a historic topic (Flower Power) and one about hotels in Rome for making a reservation. Finally, some syntactically complex questions with many key words are included (the last three), whose Boolean logic could only roughly be translated into the syntax of the search robots.

Five Search Robots Compared

In general, the search functionality of the search robots is very basic. That fits the needs of most web surfers. More advanced searchers would appreciate a standardized Boolean syntax to get a better grip on the search results. A short discussion of each search robot follows:

  • Webcrawler. The fastest search robot of these five. America Online (AOL) appears to hold the copyright to Webcrawler. Webcrawler is a simple search robot for gopher and the web. Webcrawler gives reasonable search results; end score: 0.67. (http://webcrawler.com/)
  • Lycos. This well-known search robot is the "best search robot of the Web" according to this test (end score: 0.80), beating Excite by a hair. The recall is very good: Lycos claims to have indexed 91% of existing web pages; in our test Lycos scores a recall of 90%. The response time, however, is disappointing (32.7 seconds on average). (http://www.lycos.com/)
  • Infoseek. A mediocre search robot for the Web; end score: 0.63. Infoseek is particularly useful if you are looking for papers about certain topics. A positive point is its good help function. (http://www2.infoseek.com/)
  • Open Text. A mediocre search robot (end score: 0.59) that provides a confusing hotchpotch of references to ftp, gopher and web sites. The precision is not high. Searches result in references to libraries with only very few relevant hits. The ranking mechanism is very unintelligent. On the basis of this test, it is without any doubt the "worst search robot". (http://www.opentext.com:8080/omw/fomw.html)
  • Excite. An honourable number 2 according to this test; end score: 0.79. A useful search robot for the web and usenet. Excite has the highest precision because of its implementation of concept based retrieval and an intelligent ranking mechanism. (http://www.excite.com/)

Search terms explained

  • Precision. In general: the percentage of hits that are relevant. In this test only the relevance of the first 10 hits is taken into account.
  • Recall. In general: a percentage that indicates the degree to which all existing, relevant documents are retrieved. As nobody has a comprehensive view of the dynamic contents of the gigantic web databases, this test instead uses the percentage of searches that lead to at least one relevant hit.
  • Redundancy check. Are duplicate hits filtered out of the search results?
  • Concept based search. Is the meaning of key words considered? Are, for example, synonyms or homonyms used?
  • Ranking. Are hits ordered in a ranking list by the degree to which they match the search terms?
  • Proximity search. Is it possible to indicate whether search terms should be adjacent to each other, in the same sentence, in the same paragraph, or somewhere in the same text?
  • Response time. Measured response times are not very reliable. This criterion therefore carries only a small weight in the end score. Because the measurements cover several searches made at the same time of day, the results are at least indicative.
  • End score. About 90% of the end score depends on the precision and recall scores, that is, on the effectiveness of the search robot. Other characteristics only add or subtract 1 percentage point to or from the score.
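The definitions of precision and 'recall' used in this test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relevance judgments below are invented and are not the data from this test, the function names are hypothetical, and averaging the precision over the searches is an assumption, since the article does not say how the per-search figures were combined.

```python
from typing import Dict, List


def precision_at_k(judgments: List[bool], k: int = 10) -> float:
    """Share of the first k hits that were judged relevant."""
    top = judgments[:k]
    if not top:  # assumption: a search without hits counts as precision 0
        return 0.0
    return sum(top) / len(top)


def mean_precision(searches: Dict[str, List[bool]], k: int = 10) -> float:
    """Assumed aggregation: plain average of the per-search precision."""
    return sum(precision_at_k(j, k) for j in searches.values()) / len(searches)


def recall_proxy(searches: Dict[str, List[bool]]) -> float:
    """The test's 'recall': share of searches with at least one relevant hit."""
    successful = sum(1 for j in searches.values() if any(j))
    return successful / len(searches)


# Invented judgments for three hypothetical searches (True = relevant hit).
example = {
    "query A": [True, True, False, True, False, False, True, False, False, False],
    "query B": [False] * 10,
    "query C": [True] * 5 + [False] * 5,
}

print(f"precision: {mean_precision(example):.0%}")
print(f"'recall':  {recall_proxy(example):.0%}")
```

Note that this 'recall' rewards a search robot for finding anything relevant at all, which is why it can be high even when precision is low.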

For further explanation of terms see: Hoogeveen & Van der Meer (1994).

Results

 

                               Webcrawler       Lycos             Infoseek         Open Text         Excite
Clear what to expect?          No               Yes               No, expl.        Yes               Yes
Domain                         Web, gopher      Web               Web              Web, ftp, gopher  Web, usenet
Precision                      56%              72%               62%              46%               78%
'Recall'                       80%              90%               70%              70%               80%
Redundancy check               No               Yes, incomplete   No               Yes, incomplete   Yes
Boolean operators              No, rudimentary  No, rudimentary   No, rudimentary  No                No
Concept based search           No               No                No               No                Yes
Ranking                        Yes              Yes               No               Yes               Yes
Proximity search               No               No                Yes              No                No
Help                           No               Yes, too general  Yes, good        Yes, too simple   Yes, reasonable text book
Mean response time in seconds  6                32.7              13.1             30.9              26.8
Setting no. of hits            Yes              Yes               No               No                No
No. of hits                    10,25,100        10-40 per page    Multitude 10     Multitude 10      Multitude 10
END SCORE                      0.67             0.80              0.63             0.59              0.79
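
The article does not give the exact formula behind the end score. One reading that fits the published figures is the unweighted mean of precision and 'recall', adjusted by a few points for the other characteristics. The sketch below checks that reading against the numbers in the results; the mean-based formula is an assumption, not the author's stated method, and the function name is hypothetical.

```python
# Precision, 'recall' and end score per search robot, taken from the results
# and expressed in whole percentage points (end score 0.67 -> 67).
RESULTS = {
    "Webcrawler": (56, 80, 67),
    "Lycos": (72, 90, 80),
    "Infoseek": (62, 70, 63),
    "Open Text": (46, 70, 59),
    "Excite": (78, 80, 79),
}


def adjustments(results):
    """End score minus the assumed base score (mean of precision and 'recall')."""
    return {
        name: end - (precision + recall) / 2
        for name, (precision, recall, end) in results.items()
    }


# Ranking by published end score, best first.
ranking = sorted(RESULTS, key=lambda name: RESULTS[name][2], reverse=True)

for name in ranking:
    print(f"{name:<11} end score {RESULTS[name][2]}  adjustment {adjustments(RESULTS)[name]:+.0f}")
```

Under this assumption every end score lies within 3 percentage points of the mean of precision and 'recall', which is in line with the statement that the other characteristics carry only a small weight.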

References

Hoogeveen, M. J., & Van der Meer, K. (1994). Integration of Information Retrieval and Database Management in Support of Multimedia Police Work. Journal of Information Science, 20(2), 79-87.

© 1995-2002 Martijn Hoogeveen