



Search Robot Test 1998:
Infoseek Performs Best in Dutch Test

 Cyber Ventures & Prof. Dr. Martijn Hoogeveen (the results are published in Dutch in Computable; see also the 1997 test and the 1996 test).


 

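| | Infoseek | HotBot | Excite | AltaVista | Ilse | Zoek.nl | Lycos | Webcrawler |
|---|---|---|---|---|---|---|---|---|
| Ranking Test 98 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Ranking Test 97 | 6 | - | 1 | 2 | 5 | - | 4 | 3 |
| Ranking Test 96 | 4 | - | 2 | - | - | - | 1 | 3 |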

Table 1. Infoseek makes its comeback. Even in their own domain, the Dutch search engines are merely mediocre.

 
Figure 1. End score (effectiveness): Infoseek is the most successful in the Dutch-language domain. Excite falls back to an honourable third place behind HotBot, which was tested for the first time. Even in their own Dutch-language domain, Ilse and Zoek.nl only reach the sub-top.
 

Figure 2. Infoseek, Lycos and HotBot rank first, second and third on precision, i.e. the degree to which the search result sets are free of irrelevant or outdated links.

Figure 3. AltaVista produces the best recall, i.e. it is the best at finding the needle in the WWW haystack.


Functional comparison

 

  

 

| | Lycos | Excite | Webcrawler | Infoseek | AltaVista | HotBot | Ilse | Zoek |
|---|---|---|---|---|---|---|---|---|
| Domain | Web (Benelux), sound, images | Web (world/Europe/Netherlands), usenet | Web, gopher | Web, usenet, email | Web, usenet | Web, usenet | Netherlands web & email | Web |
| Help function | Sufficient | Good | Sufficient | Sufficient | Good | Sufficient | Sufficient | Insufficient |
| Redundancy check | Insufficient | Insufficient | Yes | Yes | Yes | Yes | Insufficient | Insufficient |
| Boolean logic | Limited (AND or OR) | Limited | Limited | Limited | Yes | Yes | Limited | No |
| Concept based search | No | Yes | No | No | No | No | No | No |
| Ranking | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Proximity search | No | No | Yes | Yes | Yes | No | No | Yes |
| Advanced search interface | Yes | No | No | No | Yes | No | Limited | No |
| Multilingual thesaurus | No | No | No | No | No | No | No | No |
| Natural language interface | No | No | Yes | No | No | No | No | No |
| Time limit | No | No | No | No | No | Yes | No | No |

 
Table 2. Functional comparison: from year to year there is little progress in the visible functionality of the search engines.
 

Background

The biggest problem on the Web is not finding information, because you can easily drown in it, but finding answers to your (burning) questions. In general, there are two ways to find those answers: a directory or a search engine, also called a search robot. Good directories are available in all shapes and sizes. Yahoo, the Internet equivalent of a yellow pages guide, is a well-known example. Directories and other limited databases are especially useful if your question falls within the specific domain of the directory or database. In many cases, however, it is not clear from the start whether a directory or database covering the relevant domain is available. In these cases a search robot that covers the whole Web may show its value. But it should be a good one...!

It is in particular this question, "which search robots on the Web are really good?", that exercises many Web minds. Complaints about fruitless searches, uncertainty about the domains a search robot covers, and so on give web surfers plenty to talk about. So the time is ripe to put the best-known search robots through the mill.

The characteristics of the search robots that were analyzed are explained below.

Some search terms explained

  • Precision. In general: the percentage of hits that are relevant.
  • Recall. In general: a percentage that indicates the degree to which all existing, relevant documents are retrieved. As there is no comprehensive insight into the dynamic contents of the gigantic web databases, this test instead uses the percentage of searches that lead to at least one relevant hit.
  • Redundancy check. Are duplicate hits filtered out of the search results?
  • Concept based search. Is the meaning of key words considered? Are, for example, synonyms or homonyms used?
  • Ranking. Are hits listed in order of the degree to which they match the search terms?
  • Proximity search. Is it possible to indicate whether search terms should be adjacent to each other, in the same sentence, in the same paragraph, or somewhere in the same text?
  • Response time. Measuring response times does not produce very reliable figures. Therefore this criterion has only a small weight in the end score. By measuring over several searches and at the same time of day, the results are at least indicative.
  • End score or effectiveness. The end score is the product of precision and recall.
  • Time limit. Can the search results be restricted to a period of time?
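
The precision, recall and end score definitions above can be illustrated with a small sketch. It is not the scoring procedure actually used in the test: the `Search` class, the function names and the sample judgements are hypothetical, precision is assumed to be pooled over all hits of all searches, and the small response-time weight is left out because its value is not stated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Search:
    """One test query with a relevance judgement per returned hit (hypothetical structure)."""
    query: str
    hit_is_relevant: List[bool]  # True where a hit was judged relevant


def precision(searches: List[Search]) -> float:
    """Share of all returned hits that were judged relevant (pooled over searches, an assumption)."""
    total_hits = sum(len(s.hit_is_relevant) for s in searches)
    if total_hits == 0:
        return 0.0
    relevant_hits = sum(sum(s.hit_is_relevant) for s in searches)
    return relevant_hits / total_hits


def recall_proxy(searches: List[Search]) -> float:
    """Share of searches that produced at least one relevant hit."""
    if not searches:
        return 0.0
    successful = sum(1 for s in searches if any(s.hit_is_relevant))
    return successful / len(searches)


def end_score(searches: List[Search]) -> float:
    """End score (effectiveness): the product of precision and recall."""
    return precision(searches) * recall_proxy(searches)


# Invented sample judgements, for illustration only.
searches = [
    Search("fietsroutes", [True, False, True, False]),
    Search("belastingdienst", [False, False]),
    Search("museum", [True]),
]

print(f"precision = {precision(searches):.3f}")
print(f"recall    = {recall_proxy(searches):.3f}")
print(f"end score = {end_score(searches):.3f}")
```

Because the end score is a product, an engine needs both clean result sets and a high share of successful searches to score well; a high value on one measure cannot compensate for a very low value on the other.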

For further explanation:

Hoogeveen, M. J., & Van der Meer, K. (1994). Integration of Information Retrieval and Database Management in Support of Multimedia Police Work. Journal of Information Science, 20(2), 79-87.

Ph.D. thesis: Multimedia Retrieval

© 1995-2002 Martijn Hoogeveen