WAIT 1.800
WAIT 1.800 Ranking & Summary
File size:
0.096 MB
Platform:
Any Platform
License:
Perl Artistic License
Price:
Downloads:
891
Date added:
2007-05-16
Publisher:
Ulrich Pfeifer
WAIT 1.800 description
The WAIT Perl module is a rewrite of the freeWAIS-sf engine in Perl and XS.
The central idea of the system is to provide a framework and the building blocks for any indexing and search system users might want to build. Obviously, the framework limits the class of systems that can be built.
           +------+     +-----+     +------+
      ==>  |Access| ==> |Parse| ==> |      |
           +------+     +-----+     |      |
                          ||        |      |     +-----+
                          ||        |Filter| ==> |Index|
                          \/        |      |     +-----+
          +-------+     +-----+     |      |
      <=  |Display| <== |Query| <-> |      |
          +-------+     +-----+     +------+
A collection (aka table) is defined by the instances of the access and parse modules together with the filter definitions. At query time, a query module and a display module must be chosen in addition.
Access
The access module defines which documents are members of a database. Usually an access module is a tied hash whose keys are the IDs of the documents (did = document id) and whose values are the documents themselves. The indexing process loops over the keys using FIRSTKEY and NEXTKEY. Documents are retrieved with FETCH.
By convention access modules should be members of the WAIT::Document hierarchy. Have a look at the WAIT::Document::Split module to get the idea.
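The tied-hash convention described above can be sketched as follows. This is a minimal illustration using only Perl's standard tie interface; the package name My::Document::Memory and its in-memory storage are assumptions made for the example and are not part of WAIT.

```perl
use strict;
use warnings;

# Hypothetical access module: documents held in memory, keyed by did.
# The package name and constructor arguments are illustrative only.
package My::Document::Memory;

sub TIEHASH {
    my ($class, %docs) = @_;
    return bless { docs => {%docs} }, $class;
}

# FETCH returns the document text for a given document id.
sub FETCH {
    my ($self, $did) = @_;
    return $self->{docs}{$did};
}

# FIRSTKEY resets the internal iterator and returns the first did.
sub FIRSTKEY {
    my ($self) = @_;
    my $count = keys %{ $self->{docs} };    # calling keys resets each()
    return each %{ $self->{docs} };
}

# NEXTKEY continues the iteration; undef signals the end.
sub NEXTKEY {
    my ($self) = @_;
    return each %{ $self->{docs} };
}

package main;

tie my %collection, 'My::Document::Memory',
    1 => "AU: Pfeifer, U.\nPY: 1995\n",
    2 => "AU: Fuhr, N.\nPY: 1995\n";

# An indexer would walk the keys and fetch each document in turn.
my @seen;
for my $did (sort keys %collection) {
    push @seen, $did;
    print "did $did: ", length($collection{$did}), " characters\n";
}
```

Here keys %collection drives FIRSTKEY and NEXTKEY, and each $collection{$did} lookup goes through FETCH, which is the access pattern the indexing process relies on.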
Parse
The task of the parse module is to split the documents into logical parts via the split method. For example, WAIT::Parse::Nroff splits manuals piped through nroff(1) into the sections name, synopsis, options, description, author, example, bugs, text, see, and environment. Here is the implementation of WAIT::Parse::Base, which handles documents with a pretty simple tagged format:
AU: Pfeifer, U.; Fuhr, N.; Huynh, T.
TI: Searching Structured Documents with the Enhanced Retrieval
    Functionality of freeWAIS-sf and SFgate
ER: D. Kroemker
BT: Computer Networks and ISDN Systems; Proceedings of the third
    International World-Wide Web Conference
PN: Elsevier
PA: Amsterdam - Lausanne - New York - Oxford - Shannon - Tokyo
PP: 1027-1036
PY: 1995
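sub split {                     # called as method
  my %result;
  my $fld;                      # name of the field currently being read
  for (split /\n/, $_[1]) {     # $_[1] is the document text
    if (s/^(\S+):\s*//) {       # a leading "TAG:" starts a new field
      $fld = lc $1;
    }
    $result{$fld} .= $_ if defined $fld;
  }
  return %result;
}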
Since the original document cannot be reconstructed from its attributes, we need a second method (tag) which marks the regions of the document with tags for the different attributes. The display module uses this tagged form to highlight search terms in the documents. Besides the tags for the attributes, the method may assign the special tags _b and _i to indicate bold and italic regions.
sub tag {
  my @result;                   # flat list of (tag hash, text) pairs
  my $tag;
  for (split /\n/, $_[1]) {
    next if /^\w\w:\s*$/;       # skip fields that have no content
    if (s/^(\S+)://) {
      push @result, {_b => 1}, "$1:";   # show the field label in bold
      $tag = lc $1;
    }
    if (defined $tag) {
      push @result, {$tag => 1}, "$_\n";
    } else {
      push @result, {}, "$_\n";
    }
  }
  return @result;               # we don't go for speed
}
Obviously one could implement split via tag. The reason for having two functions is speed. split is called for every document when a collection is indexed, so its speed is essential. tag, on the other hand, is called only to display a single document and may be a little slower; it can also take care of tagging bold and italic regions. See WAIT::Parse::Nroff for an example of how this can decrease performance.
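To illustrate the remark that split could be implemented via tag, here is a minimal sketch. The package name My::Parse::Tagged, the constructor, and the method split_via_tag are assumptions made for this example and are not part of WAIT; the tag method repeats the logic shown above with the regular-expression escapes written out.

```perl
use strict;
use warnings;

package My::Parse::Tagged;    # hypothetical name, not part of WAIT

sub new { return bless {}, shift }

# Same logic as the tag method of the description.
sub tag {
    my @result;
    my $tag;
    for (split /\n/, $_[1]) {
        next if /^\w\w:\s*$/;
        if (s/^(\S+)://) {
            push @result, {_b => 1}, "$1:";
            $tag = lc $1;
        }
        if (defined $tag) {
            push @result, {$tag => 1}, "$_\n";
        } else {
            push @result, {}, "$_\n";
        }
    }
    return @result;
}

# Fold the (tag hash, text) pairs back into an attribute hash,
# ignoring markup tags such as _b and _i.
sub split_via_tag {
    my ($self, $doc) = @_;
    my %result;
    my @tagged = $self->tag($doc);
    while (my ($tags, $text) = splice @tagged, 0, 2) {
        for my $attr (grep { !/^_/ } keys %$tags) {
            $result{$attr} .= $text;
        }
    }
    return %result;
}

package main;

my $doc = "AU: Pfeifer, U.\nTI: Searching Structured Documents\nPY: 1995\n";
my %fields = My::Parse::Tagged->new->split_via_tag($doc);
print "$_ =>$fields{$_}" for sort keys %fields;
```

Unlike the dedicated split method, the values produced this way keep the blank after the colon and a trailing newline, and every document pays for building the intermediate tagged list. That extra work is the cost the text refers to when it keeps the two methods separate.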
Filter definition
From the information retrieval perspective, the hardest part of the system is the filter module. For each attribute, the database administrator defines how the contents should be processed before they are stored in the index. The processing usually includes steps to restrict the character set, transform case, split the text into words, and reduce the words to their stems. In WAIT these steps are defined as a pipeline of processing steps. The pipelines are built from functions in the package WAIT::Filter, which is pre-populated with the most common functions and can be extended at any time.
The equivalent for a typical freeWAIS-sf processing would be this pipeline:
[ isotr, isolc, split2, stop, Stem]
The function isotr replaces unknown characters with blanks, and isolc converts the text to lower case. split2 splits the text into words and removes words shorter than two characters. stop removes the freeWAIS-sf stopwords, and Stem applies the Porter algorithm to compute the stem of each word.
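The pipeline idea can be sketched in plain Perl as a list of functions applied left to right. The functions below are simplified stand-ins written for this example; they are not the WAIT::Filter implementations, the stopword list is made up, and stemming is left out entirely.

```perl
use strict;
use warnings;

# Stand-in filter steps (assumptions for illustration, not WAIT::Filter code).
# Each step takes a list of strings and returns a list of strings.

# Replace anything other than ASCII letters, digits and whitespace with a blank.
sub tr_chars {
    return map { my $s = $_; $s =~ s/[^A-Za-z0-9\s]/ /g; $s } @_;
}

# Convert every item to lower case.
sub lowercase {
    return map { lc $_ } @_;
}

# Split into words and drop words shorter than two characters.
sub split2 {
    return grep { length($_) >= 2 } map { split ' ', $_ } @_;
}

# Remove words found in a tiny, made-up stopword list.
sub stop {
    my %stopword = map { $_ => 1 } qw(the of and with);
    return grep { !$stopword{$_} } @_;
}

# Run the values through the given steps in order.
sub run_pipeline {
    my ($steps, @values) = @_;
    for my $step (@$steps) {
        @values = $step->(@values);
    }
    return @values;
}

my @terms = run_pipeline(
    [ \&tr_chars, \&lowercase, \&split2, \&stop ],
    'Searching Structured Documents with the Enhanced Retrieval',
);
print join(' ', @terms), "\n";
```

Because every step has the same list-in, list-out shape, steps can be reordered, dropped, or extended per attribute without touching the others, which is the property the pipeline notation relies on.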
The filter definition for a collection defines a set of pipelines for the attributes and modifies the pipelines which should be used for prefix and interval searches.
Several complete working examples come with WAIT in the script directory. It is recommended to follow the pattern of the scripts smakewhatis and sman.