Text::Bloom 1.07

Text::Bloom 1.07 Ranking & Summary

File size: 0.013 MB
Platform: Any Platform
License: Perl Artistic License
Downloads: 802
Date added: 2007-08-14

Text::Bloom 1.07 description

Text::Bloom computes the Bloom signature of a set of terms.

SYNOPSIS

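use Text::Bloom;

# Compute the signature of a set of terms
my $b = Text::Bloom->new();
$b->Compute( qw( foo bar baz ) );
# Serialise the signature to a string and to a file
my $sig = $b->WriteToString();
$b->WriteToFile( 'afile.sig' );
# Read the signature back from the file
my $b2 = Text::Bloom::NewFromFile( 'afile.sig' );
# Signature of a second, overlapping set of terms
my $b3 = Text::Bloom->new();
$b3->Compute( qw( foo bar barbaz ) );
# Compare two signatures
my $sim = $b->Similarity( $b2 );
# Rebuild a signature from its string form
my $b4 = Text::Bloom::NewFromString( $sig );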

Text::Bloom applies the Bloom filtering technique to the statistical analysis of documents.

The terms in the document are quantized using a base-36 radix representation; each term thus corresponds to an integer in the range 0..p-1, where p is a prime, currently set to the greatest prime less than 2^32.
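The quantization step can be sketched as follows. This is only an illustration under stated assumptions, not the module's own code: the helper name quantize is hypothetical, the digit mapping (0-9 to 0..9, a-z to 10..35) is assumed, and the modulus is taken to be 2^32 - 5 = 4294967291, the greatest prime below 2^32. A Perl built with 64-bit integers is assumed as well.

```perl
use strict;
use warnings;

# Assumed modulus: 2**32 - 5, the greatest prime below 2**32.
my $P = 4294967291;

# Hypothetical helper (not part of Text::Bloom): read a term as a
# base-36 number, digit by digit, reducing modulo $P at every step.
sub quantize {
    my ($term) = @_;
    die "term must consist of [0-9a-z] only\n"
        unless $term =~ /\A[0-9a-z]+\z/;
    my $value = 0;
    for my $ch ( split //, $term ) {
        # '0'..'9' map to 0..9, 'a'..'z' map to 10..35 (assumed mapping)
        my $digit = $ch =~ /[0-9]/ ? $ch : ord($ch) - ord('a') + 10;
        $value = ( $value * 36 + $digit ) % $P;
    }
    return $value;
}

printf "%s -> %d\n", $_, quantize($_) for qw( foo bar baz );
```

Reducing at every step keeps the intermediate value below 36 * p, so arbitrarily long terms never overflow.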

Each quantized value is mapped to d integers in the range 0..size-1, where size is an integer less than p, currently 2^17, using a family of hash functions, computed by the HashV function.
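How one quantized value might be spread over d positions can be sketched with a simple linear hash family. The function name hash_indices and its constants are assumptions made for illustration; the actual HashV function may work quite differently.

```perl
use strict;
use warnings;

my $P    = 4294967291;   # assumed modulus, greatest prime below 2**32
my $SIZE = 1 << 17;      # bit-vector length stated in the text

# Hypothetical hash family: h_i(x) = ((a_i * x + b_i) mod P) mod SIZE.
# The multipliers and offsets are arbitrary illustrative constants,
# kept small so that a_i * x stays well inside a 64-bit integer.
sub hash_indices {
    my ( $x, $d ) = @_;
    return map {
        my $a_i = 1_000_003 + 2 * $_;
        my $b_i = 7919 * $_;
        ( ( $a_i * $x + $b_i ) % $P ) % $SIZE;
    } 0 .. $d - 1;
}

my @idx = hash_indices( 20328, 3 );   # 20328 is 'foo' read in base 36
print "@idx\n";
```

Each call returns d indices in the range 0..size-1, and the same input always yields the same indices, which is what lets two signatures be compared bit by bit.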

Each hashed value is used as the index in a large bit vector. Bits corresponding to terms present in the document are set to 1; all other bits are set to 0.

Of course, collisions may cause the same bit to be set twice, by different terms. It follows that, if the document contains n distinct terms, in the resulting bit vector at most n * d bits are set to 1.
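The effect of collisions on the n * d bound can be seen in a small sketch built on Perl's vec. The vector length of 128 bits and the index values are made up for the example; only vec and the unpack checksum idiom are standard Perl.

```perl
use strict;
use warnings;

my $SIZE = 128;   # deliberately tiny; the text uses 2**17

# Set one bit per index in a zero-filled bit string.
sub set_bits {
    my (@indices) = @_;
    my $vec = "\0" x ( $SIZE / 8 );
    vec( $vec, $_, 1 ) = 1 for @indices;
    return $vec;
}

# Count the bits set to 1 (the checksum idiom from perldoc -f unpack).
sub count_bits { return unpack( '%32b*', $_[0] ) }

# Made-up indices for n = 2 terms with d = 3 hash functions each;
# indices 17 and 40 collide between the two terms.
my @term1 = ( 3, 17, 40 );
my @term2 = ( 17, 99, 40 );
my $sig   = set_bits( @term1, @term2 );
print count_bits($sig), " bits set, upper bound ", 2 * 3, "\n";
```

Here only four distinct bits end up set (3, 17, 40 and 99), below the bound of n * d = 6, because two of the indices are shared.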

The resulting bit string is a very compact representation of the presence/absence of terms in the document, and is therefore characterised as a signature. Moreover, it does not depend on a pre-set dictionary of terms.

The signature may be used for:

testing whether a given set of terms is present in the document,
computing what fraction of terms is common to two documents.
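Both uses can be sketched on raw bit strings. The helpers below are hypothetical, and the overlap measure (bits set in both signatures divided by bits set in either) is an assumption chosen for illustration; it is not claimed to be the formula behind the module's Similarity method.

```perl
use strict;
use warnings;

my $SIZE = 64;   # small illustrative vector length

# Build a signature from a list of bit indices.
sub make_sig {
    my (@indices) = @_;
    my $sig = "\0" x ( $SIZE / 8 );
    vec( $sig, $_, 1 ) = 1 for @indices;
    return $sig;
}

# Membership test: every index of the queried terms must be set.
# A true result may be a false positive; a false result is definite.
sub contains_all {
    my ( $sig, @indices ) = @_;
    for my $i (@indices) {
        return 0 unless vec( $sig, $i, 1 );
    }
    return 1;
}

# Assumed overlap measure: bits set in both / bits set in either.
sub bit_overlap {
    my ( $s1, $s2 ) = @_;
    my $both   = unpack( '%32b*', $s1 & $s2 );
    my $either = unpack( '%32b*', $s1 | $s2 );
    return $either ? $both / $either : 0;
}

my $doc1 = make_sig( 1, 5, 9,  20 );
my $doc2 = make_sig( 5, 9, 33, 40 );
printf "overlap: %.3f\n", bit_overlap( $doc1, $doc2 );
```

The string forms of & and | operate byte by byte on the two signatures, so no unpacking into arrays is needed.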

The bit representation may be written to and read from a file. Text::Bloom prepends a header to the bit stream proper; moreover, whenever the package Compress::Zlib is available, the bit vector is compressed, so that disk space requirements are drastically reduced, especially for small documents.
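A header followed by an optionally compressed payload could look like the sketch below. The header layout (a four-byte tag, a compression flag and a length) is invented for the example and is not Text::Bloom's file format; compress and uncompress are the standard Compress::Zlib functions, loaded only if the package is installed.

```perl
use strict;
use warnings;

# Use Compress::Zlib only when it can be loaded.
my $HAVE_ZLIB = eval { require Compress::Zlib; 1 } ? 1 : 0;

# Hypothetical layout: 4-byte tag, 1-byte flag, 4-byte length, payload.
sub pack_signature {
    my ($bits) = @_;
    my $payload = $HAVE_ZLIB ? Compress::Zlib::compress($bits) : $bits;
    return pack( 'a4 C N', 'BLM0', $HAVE_ZLIB, length $payload ) . $payload;
}

sub unpack_signature {
    my ($blob) = @_;
    my ( $tag, $compressed, $len ) = unpack( 'a4 C N', $blob );
    die "unexpected tag\n" unless $tag eq 'BLM0';
    die "Compress::Zlib needed to read this signature\n"
        if $compressed && !$HAVE_ZLIB;
    my $payload = substr( $blob, 9, $len );   # header is 4 + 1 + 4 bytes
    return $compressed ? Compress::Zlib::uncompress($payload) : $payload;
}

# A mostly empty 2**17-bit vector, as a small document would produce.
my $bits = "\0" x ( ( 1 << 17 ) / 8 );
vec( $bits, 12345, 1 ) = 1;
my $blob = pack_signature($bits);
printf "%d bytes raw, %d bytes stored\n", length $bits, length $blob;
```

Recording the flag in the header lets a reader tell compressed from uncompressed signatures regardless of which machine wrote them.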

The hash function is obviously a crucial component of the filter; the reference implementation uses a radix representation of strings. Each term must therefore consist solely of digits and lowercase letters, i.e. match the regular expression /^[0-9a-z]+$/.
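Free text has to be normalised before its terms satisfy that constraint. One possible pre-processing step, shown here as an assumption rather than something the module provides, lower-cases the text and splits it on every character outside 0-9 and a-z; the helper name terms_from_text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pre-processing step: lower-case the text and split it on
# anything that is not a digit or an ASCII lowercase letter.
sub terms_from_text {
    my ($text) = @_;
    return grep { length } split /[^0-9a-z]+/, lc $text;
}

my @terms = terms_from_text("Bloom filters: compact, dictionary-free signatures!");
print "@terms\n";
```

Note that this simple rule drops accented and other non-ASCII letters, which may or may not be acceptable for a given corpus.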

There are quite a few viable alternatives, which can be pursued by subclassing and redefining the method QuantizeV.
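The subclassing pattern can be illustrated with stand-in classes. Nothing below is Text::Bloom itself: Sketch::Base36 and Sketch::Bytes are hypothetical, and the assumption that QuantizeV receives a single term and returns one integer is made only for the sake of the example; the module's source is the authority on the real signature.

```perl
use strict;
use warnings;

my $P = 4294967291;   # assumed modulus, greatest prime below 2**32

{
    # Stand-in base class, NOT Text::Bloom: quantizes base-36 terms.
    package Sketch::Base36;
    sub new { my ($class) = @_; return bless {}, $class }

    sub QuantizeV {
        my ( $self, $term ) = @_;
        my $value = 0;
        for my $ch ( split //, $term ) {
            my $digit = $ch =~ /[0-9]/ ? $ch : ord($ch) - ord('a') + 10;
            $value = ( $value * 36 + $digit ) % $P;
        }
        return $value;
    }

    # Quantize a list of terms through whichever QuantizeV applies.
    sub QuantizeAll {
        my ( $self, @terms ) = @_;
        return map { $self->QuantizeV($_) } @terms;
    }
}

{
    # Subclass that accepts arbitrary byte strings (radix 257).
    package Sketch::Bytes;
    our @ISA = ('Sketch::Base36');

    sub QuantizeV {
        my ( $self, $term ) = @_;
        my $value = 0;
        $value = ( $value * 257 + ord($_) ) % $P for split //, $term;
        return $value;
    }
}

print join( ' ', Sketch::Bytes->new->QuantizeAll( 'Hello', 'wide-open' ) ), "\n";
```

Because QuantizeAll calls QuantizeV through the object, the subclass changes the quantization without touching the rest of the pipeline.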
