Text::Bloom 1.07
Text::Bloom 1.07 Ranking & Summary
File size: 0.013 MB
Platform: Any Platform
License: Perl Artistic License
Downloads: 802
Date added: 2007-08-14
Publisher: Andrea Spinelli and Walter Vannini
Text::Bloom 1.07 description
Text::Bloom computes the Bloom signature of a set of terms.
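SYNOPSIS
use Text::Bloom;
# Compute a signature for a set of terms
my $b = Text::Bloom->new();
$b->Compute( qw( foo bar baz ) );
# Serialize the signature to a string and to a file
my $sig = $b->WriteToString();
$b->WriteToFile( 'afile.sig' );
# Read the signature back from the file
my $b2 = Text::Bloom::NewFromFile( 'afile.sig' );
my $b3 = Text::Bloom->new();
$b3->Compute( qw( foo bar barbaz ) );
# Compare two signatures
my $sim = $b->Similarity( $b2 );
# Rebuild a signature from its string form
my $b4 = Text::Bloom::NewFromString( $sig );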
Text::Bloom applies the Bloom filtering technique to the statistical analysis of documents.
The terms in the document are quantized using a base-36 radix representation; each term thus corresponds to an integer in the range 0..p-1, where p is a prime, currently set to the greatest prime less than 2^32.
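The quantization step can be sketched as follows. This is a minimal illustration, not the module's own code: the helper name quantize_term, the digit mapping ('0'..'9' to 0..9, 'a'..'z' to 10..35) and the choice to reduce modulo p after every digit are assumptions; only the base-36 radix and the value of p (4294967291, the greatest prime below 2^32) come from the description.

```perl
use strict;
use warnings;

# Greatest prime below 2**32, as stated in the description.
my $P = 4294967291;

# Hypothetical helper (not part of Text::Bloom): read a term as a base-36
# number, reducing modulo $P after every digit so intermediate values stay small.
sub quantize_term {
    my ($term) = @_;
    die "term must contain only 0-9 and a-z\n" unless $term =~ /\A[0-9a-z]+\z/;
    my $value = 0;
    for my $ch ( split //, $term ) {
        # Assumed digit mapping: '0'..'9' => 0..9, 'a'..'z' => 10..35.
        my $digit = $ch =~ /[0-9]/
            ? ord($ch) - ord('0')
            : ord($ch) - ord('a') + 10;
        $value = ( $value * 36 + $digit ) % $P;
    }
    return $value;
}

printf "%s => %d\n", $_, quantize_term($_) for qw( foo bar baz );
```

Reducing after each digit keeps every intermediate product below 36 * p, which fits comfortably in a native integer on a 64-bit Perl.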
A family of hash functions, computed by the HashV function, maps each quantized value to d integers in the range 0..size-1, where size is an integer less than p (currently 2^17).
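The description does not say which hash family HashV uses, so the sketch below substitutes a simple linear family, h_i(x) = ((a_i * x + b_i) mod p) mod size. The function name hash_indices, the coefficient pairs and the choice d = 4 are all illustrative assumptions; only p and size come from the text.

```perl
use strict;
use warnings;

my $P    = 4294967291;    # prime modulus from the description
my $SIZE = 1 << 17;       # bit-vector length from the description

# Hypothetical (multiplier, offset) pairs, one per hash function, so d = 4 here.
# These constants are arbitrary and are NOT the ones used by HashV.
my @COEFF = ( [ 40503, 12345 ], [ 69069, 1 ], [ 214013, 2531011 ], [ 16807, 7 ] );

# Map one quantized value (0..$P-1) to d indices in 0..$SIZE-1.
sub hash_indices {
    my ($x) = @_;
    return map {
        my ( $mult, $offset ) = @$_;
        ( ( $mult * $x + $offset ) % $P ) % $SIZE;
    } @COEFF;
}

my @idx = hash_indices(20328);    # an arbitrary quantized value
print "indices: @idx\n";
```

The multipliers are kept small so that every product stays below 2^53 and the arithmetic remains exact on a 64-bit Perl.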
Each hashed value is used as an index into a large bit vector. Bits corresponding to terms present in the document are set to 1; all other bits are set to 0.
Collisions may cause different terms to set the same bit. It follows that a document containing n distinct terms sets at most n * d bits to 1 in the resulting bit vector.
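The effect of a collision on the n * d bound can be seen with Perl's built-in vec and unpack. In this sketch the helper names set_bits and count_bits and the index values are hypothetical; the indices stand in for the output of the hash functions, with d = 2 chosen only to keep the example short.

```perl
use strict;
use warnings;

my $SIZE = 1 << 17;    # bit-vector length from the description

# Set the given bit positions in the vector referenced by $bits_ref.
sub set_bits {
    my ( $bits_ref, @indices ) = @_;
    vec( $$bits_ref, $_, 1 ) = 1 for @indices;
    return;
}

# Count the bits set to 1 (the '%32b*' template sums the bits of the string).
sub count_bits {
    my ($bits) = @_;
    return unpack '%32b*', $bits;
}

my $bits = "\0" x ( $SIZE / 8 );    # all bits start at 0
set_bits( \$bits, 3,  10 );         # hypothetical indices for a first term
set_bits( \$bits, 10, 42 );         # a second term colliding on bit 10
printf "%d bits set, bound n * d = %d\n", count_bits($bits), 2 * 2;
```

With n = 2 terms and d = 2 indices per term, the shared bit 10 leaves three bits set, one fewer than the bound of four.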
The resulting bit string is a very compact representation of the presence/absence of terms in the document, and is therefore characterised as a signature. Moreover, it does not depend on a pre-set dictionary of terms.
The signature may be used for:
testing whether a given set of terms is present in the document,
computing what fraction of terms two documents have in common.
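Both uses can be sketched directly on bit vectors. The formula behind the module's Similarity method is not given here, so the sketch assumes a Jaccard-style ratio of shared bits to all set bits; the helper names and the index values are hypothetical. As with any Bloom filter, a positive membership answer can be a false positive caused by collisions, while a negative answer is definite.

```perl
use strict;
use warnings;

# Build a signature of $nbytes bytes with the given bit positions set.
sub make_signature {
    my ( $nbytes, @indices ) = @_;
    my $bits = "\0" x $nbytes;
    vec( $bits, $_, 1 ) = 1 for @indices;
    return $bits;
}

# A term may be present only if every one of its bit positions is set.
sub has_all_bits {
    my ( $bits, @indices ) = @_;
    for my $i (@indices) {
        return 0 unless vec( $bits, $i, 1 );
    }
    return 1;
}

# Assumed similarity measure: bits set in both / bits set in either.
sub bit_similarity {
    my ( $x, $y ) = @_;
    my $either = unpack( '%32b*', $x | $y );
    return 0 unless $either;
    return unpack( '%32b*', $x & $y ) / $either;
}

my $doc_x = make_signature( 16, 1, 2, 3, 4 );
my $doc_y = make_signature( 16, 3, 4, 5, 6 );
printf "term with bits (1,2) in x: %d\n", has_all_bits( $doc_x, 1, 2 );
printf "similarity of x and y: %.3f\n", bit_similarity( $doc_x, $doc_y );
```

The string forms of & and | operate bytewise on the two vectors, so both signatures should have the same length.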
The bit representation may be written to and read from a file. Text::Bloom prepends a header to the bit stream proper; moreover, whenever the package Compress::Zlib is available, the bit vector is compressed, so that disk space requirements are drastically reduced, especially for small documents.
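The optional-compression behaviour can be sketched as below. The header layout written by Text::Bloom is not described here, so the one-line "SKETCH zlib" / "SKETCH raw" header and the helper names are invented for illustration; compress and uncompress are the functions provided by Compress::Zlib.

```perl
use strict;
use warnings;

# Compress::Zlib is optional: fall back to raw storage when it cannot be loaded.
my $HAVE_ZLIB = eval { require Compress::Zlib; 1 } ? 1 : 0;

# Hypothetical format: a one-line header naming the encoding, then the payload.
sub signature_to_string {
    my ($bits) = @_;
    return "SKETCH zlib\n" . Compress::Zlib::compress($bits) if $HAVE_ZLIB;
    return "SKETCH raw\n" . $bits;
}

sub signature_from_string {
    my ($data) = @_;
    # Split on the first newline only; the payload may contain newlines itself.
    my ( $header, $payload ) = split /\n/, $data, 2;
    return $payload if $header eq 'SKETCH raw';
    die "unknown header '$header'\n" unless $header eq 'SKETCH zlib';
    die "Compress::Zlib is needed to read this signature\n" unless $HAVE_ZLIB;
    return Compress::Zlib::uncompress($payload);
}

# A sparse signature, as a small document would produce.
my $bits = "\0" x ( ( 1 << 17 ) / 8 );
vec( $bits, $_, 1 ) = 1 for 3, 10, 42;

my $stored = signature_to_string($bits);
printf "raw: %d bytes, stored: %d bytes\n", length($bits), length($stored);
```

A signature for a small document is mostly zero bits, which is why compression pays off most in that case.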
The hash function is a crucial component of the filter. The reference implementation uses a radix representation of strings, so each term must match the regular expression /[0-9a-z]+/, that is, consist of digits and lowercase letters.
There are quite a few viable alternatives, which can be pursued by subclassing and redefining the method QuantizeV.
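The subclassing pattern can be illustrated without relying on the exact calling convention of QuantizeV, which is not given here. The sketch uses a stand-in base class, Sketch::Base36, instead of Text::Bloom itself, and assumes that QuantizeV receives a single term and returns one integer; the class names, the quantize_terms method and the byte-based alternative are all hypothetical.

```perl
use strict;
use warnings;

my $P = 4294967291;    # greatest prime below 2**32

package Sketch::Base36;    # hypothetical stand-in for the base class

sub new { my ($class) = @_; return bless {}, $class; }

# Base-36 quantizer in the spirit of the reference implementation.
sub QuantizeV {
    my ( $self, $term ) = @_;
    die "term must contain only 0-9 and a-z\n" unless $term =~ /\A[0-9a-z]+\z/;
    my $value = 0;
    for my $ch ( split //, $term ) {
        my $digit = index( '0123456789abcdefghijklmnopqrstuvwxyz', $ch );
        $value = ( $value * 36 + $digit ) % $P;
    }
    return $value;
}

# Inherited driver: calls whichever QuantizeV the object's class provides.
sub quantize_terms {
    my ( $self, @terms ) = @_;
    return map { $self->QuantizeV($_) } @terms;
}

package Sketch::Bytes;     # subclass accepting arbitrary byte strings
our @ISA = ('Sketch::Base36');

# Alternative quantizer: treat the term's bytes as base-256 digits.
sub QuantizeV {
    my ( $self, $term ) = @_;
    my $value = 0;
    $value = ( $value * 256 + $_ ) % $P for unpack 'C*', $term;
    return $value;
}

package main;

my @values = Sketch::Bytes->new->quantize_terms( 'Foo', 'bar-baz' );
print "quantized: @values\n";
```

Only the quantizer changes in the subclass; the rest of the pipeline is inherited, which is the point of redefining a single method.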