2010/09/09

The Labs.Com Web Lab WebSherpa
Last update 1998/11/27

The Labs - Design & Functionality For The Net

Search Engine & Analysis Tool

  1. Introduction
  2. Download
  3. Usage
  4. Limitations
  5. Examples
  6. Other Search Engines
WebSherpa
1. Introduction

WebSherpa was programmed within a few hours, as I wanted one compact tool to analyze my site and, as a side effect, to use as a search engine as well.

Since the initial version (0.001) it has been extended with the following search-engine and analysis features, some of them still in progress:

  • Indexes a web site by URL: only pages that are reachable through links on your site are indexed, not hidden files you may have forgotten to remove from your server.
  • If a search returns no result, lexically close words are listed (in case a misspelling happened). This feature will be developed further in the next versions.
  • Finds missing links or pictures within your site (not yet implemented).
  • Displays the logical structure of the site as a GIF picture (under development).
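
The "lexically close words" fallback can be illustrated with a short sketch. The page does not say how WebSherpa measures closeness, so this example assumes a simple similarity ratio from Python's standard difflib module; the function name suggest_close_words and the 0.7 cutoff are hypothetical choices for illustration only.

```python
import difflib

def suggest_close_words(query, indexed_words, limit=5):
    """Return indexed words spelled similarly to the query, best match first."""
    # cutoff=0.7 is an arbitrary illustrative threshold, not a WebSherpa setting.
    return difflib.get_close_matches(query.lower(), indexed_words, n=limit, cutoff=0.7)

# Hypothetical word list standing in for the words found while indexing.
words = ["programming", "program", "perl", "search", "engine"]
print(suggest_close_words("programing", words))
```

A search front end would call such a function only when the exact lookup returns nothing, and list the suggestions instead of an empty result page.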

2. Download

$MyVersion: 0.009 - Fri Nov 27 11:04:07 EST 1998 - kiwi$

websherpa (perl-source)

It (still) requires lynx.

NOTE: This is still an alpha version of the program; it will be improved almost weekly during the next weeks.

3. Usage

Index Site

 

 % ./websherpa -c http://mysite.com/fullpath/ 

The index is written to sherpa.index, unless you pass -i alt-file to use another index file.
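
To show what URL-based indexing means in practice, here is a minimal sketch of a link-following crawl that stays below the starting URL. It is not WebSherpa's code (WebSherpa is a Perl script that relies on lynx); the names LinkParser and crawl, and the fetch callable that maps a URL to HTML text, are assumptions made for this illustration. A small in-memory site stands in for a real server.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

class LinkParser(HTMLParser):
    """Collect the href targets of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Return {url: html} for every page reachable by links from start_url.

    `fetch` is a hypothetical callable: URL -> HTML text, or None if missing.
    Only URLs that begin with start_url are followed.
    """
    seen, queue, pages = {start_url}, [start_url], {}
    while queue:
        url = queue.pop(0)
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, href)).url
            if target.startswith(start_url) and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages

# In-memory stand-in for a web server (all URLs are made up).
site = {
    "http://mysite.com/fullpath/": '<a href="a.html">A</a> <a href="http://other.com/">X</a>',
    "http://mysite.com/fullpath/a.html": '<a href="./">home</a>',
    "http://mysite.com/fullpath/forgotten.html": "never linked from anywhere",
}
pages = crawl("http://mysite.com/fullpath/", site.get)
print(sorted(pages))
```

The page forgotten.html exists on the stand-in server but is never linked, so it does not end up in the result; that is the behaviour the introduction describes for hidden files.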

Search Site

 

 % ./websherpa -s 'programming' 

With -i you can force the search to use another index file.

CGI

 

 <form action="websherpa.cgi"> 
 <input name="search"> 
 </form> 

Then create a search.html in the same directory containing the line <!-- insert-result --> at the place where you want the search results to appear. To use a different filename, edit the WebSherpa source code.
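
The placeholder mechanism amounts to a single text substitution in the template. The following sketch shows the idea under stated assumptions: the function name render_results and the list markup are hypothetical, and WebSherpa's real output format is not documented on this page.

```python
import html

PLACEHOLDER = "<!-- insert-result -->"

def render_results(template, hits):
    """Replace the placeholder comment with an HTML list of (url, title) hits."""
    if hits:
        items = "\n".join(
            '<li><a href="{}">{}</a></li>'.format(
                html.escape(url, quote=True), html.escape(title)
            )
            for url, title in hits
        )
        block = "<ul>\n" + items + "\n</ul>"
    else:
        block = "<p>No results.</p>"
    # Only the first placeholder is replaced; the rest of the page is untouched.
    return template.replace(PLACEHOLDER, block, 1)

template = "<html><body>\n<h1>Search</h1>\n<!-- insert-result -->\n</body></html>"
print(render_results(template, [("http://mysite.com/a.html", "A & B")]))
```

Keeping the page layout in search.html and only filling in the result block means the design can be changed without touching the script.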


4. Limitations

Since WebSherpa is a really small package, actually one single Perl script, there are some limitations you should know of:
  • Index sites with fewer than 5'000 pages; if your server has enough memory (>64MB), 10'000 should work as well.
  • All URLs and their titles are loaded when a search is started. The search itself is hash-based and doesn't take much memory, as the word list isn't loaded into memory.
  • Indexing a large site also takes a long time, since word extraction is done in Perl as well.

Most web sites have fewer than 5'000 pages, so WebSherpa could be your engine.
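
The memory trade-off described in the list can be sketched as follows: the table of URLs and titles lives in memory, while the word list stays in an on-disk hash that is consulted one key at a time. This is only an illustration; it assumes Python's standard dbm module as the on-disk hash, and the names build_index and search as well as the comma-separated posting format are hypothetical, not the actual sherpa.index format.

```python
import dbm
import os
import tempfile

def build_index(path, pages):
    """pages: list of (url, title, text). Store word -> page ids in an on-disk hash."""
    postings = {}
    for page_id, (_url, _title, text) in enumerate(pages):
        for word in set(text.lower().split()):
            postings.setdefault(word, []).append(str(page_id))
    with dbm.open(path, "n") as db:
        for word, ids in postings.items():
            db[word] = ",".join(ids)

def search(path, pages, word):
    """Look up a single word on disk; only the page table is held in memory."""
    with dbm.open(path, "r") as db:
        raw = db.get(word.lower().encode())
    if raw is None:
        return []
    return [pages[int(i)][:2] for i in raw.decode().split(",")]

# Made-up pages for demonstration.
pages = [
    ("http://mysite.com/", "Home", "Perl programming tools"),
    ("http://mysite.com/a.html", "About", "about the site"),
]
with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "demo.index")
    build_index(index_path, pages)
    print(search(index_path, pages, "programming"))
```

With this layout the memory used at search time grows with the number of pages rather than with the number of distinct words, which matches the page-count limits given above.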

5. Examples

We used WebSherpa for dedicated search-engines at TheLabs:

  • Perl Search: Perl manual search
  • CPAN Search: CPAN modules search
  • PerlTK Search: PerlTK manual search
  • XLib Search: XLib manual search

6. Other Search Engines

  • HTDIG: very good engine
  • SWISH: an old-timer, but still good
  • Enhanced SWISH: new improvements
  • GLIMPSE: two-pass search engine
  • SearchTools.Com: good resource center

                                                                                                                                   

All Rights Reserved - (C) 1997 - 2009 by The Labs.Com
