Text::Ngrams 1.9
Text::Ngrams 1.9 Ranking & Summary
File size: 0.036 MB
Platform: Any Platform
License: Perl Artistic License
Price:
Downloads: 827
Date added: 2007-08-22
Publisher: Simon Cozens
Text::Ngrams 1.9 description
Text::Ngrams is a Perl module for flexible n-gram analysis (for characters, words, and more).
SYNOPSIS
For default character n-gram analysis of a string:
use Text::Ngrams;
my $ng3 = Text::Ngrams->new;
$ng3->process_text('abcdefg1235678hijklmnop');
print $ng3->to_string;
my @ngramsarray = $ng3->get_ngrams;
One can also feed tokens manually:
use Text::Ngrams;
my $ng3 = Text::Ngrams->new;
$ng3->feed_tokens('a');
$ng3->feed_tokens('b');
$ng3->feed_tokens('c');
$ng3->feed_tokens('d');
$ng3->feed_tokens('e');
$ng3->feed_tokens('f');
$ng3->feed_tokens('g');
$ng3->feed_tokens('h');
We can choose n-grams of various sizes, e.g.:
my $ng = Text::Ngrams->new( windowsize => 6 );
or different types of n-grams, e.g.:
my $ng = Text::Ngrams->new( type => 'byte' );
my $ng = Text::Ngrams->new( type => 'word' );
my $ng = Text::Ngrams->new( type => 'utf8' );
To process a list of files:
$ng->process_files('somefile.txt', 'otherfile.txt');
This module implements text n-gram analysis, supporting several types of analysis, including character and word n-grams.
The module Text::Ngrams is very flexible. For example, it lets a user feed any sequence of tokens manually. It handles several token types (character, word) and allows considerable flexibility in how tokens are automatically recognized and fed, and in how they are combined into n-grams. It counts the frequencies of all n-grams up to the maximal specified length. The output format is meant to be human-readable while also being loadable by the module.
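To make "counts all n-gram frequencies up to the maximal specified length" concrete, here is a minimal pure-Perl sketch of the idea. It is not the module's implementation and does not call Text::Ngrams: the function name count_char_ngrams and its return structure are assumptions made for illustration, and it skips any token normalization the module may perform.

```perl
use strict;
use warnings;

# Count every character n-gram of length 1..$max in $text.
# Returns a reference to a hash of hashes: $counts->{$n}{$ngram} = frequency.
# (Hypothetical helper for illustration; not part of Text::Ngrams.)
sub count_char_ngrams {
    my ($text, $max) = @_;
    my %counts;
    for my $n (1 .. $max) {
        # Slide a window of width $n across the text, one character at a time.
        for my $i (0 .. length($text) - $n) {
            $counts{$n}{ substr($text, $i, $n) }++;
        }
    }
    return \%counts;
}

my $counts = count_char_ngrams('abcabc', 3);
for my $n (sort { $a <=> $b } keys %$counts) {
    for my $gram (sort keys %{ $counts->{$n} }) {
        print "$n-gram '$gram': $counts->{$n}{$gram}\n";
    }
}
```

For the input 'abcabc' with a maximum length of 3, the sketch counts unigrams, bigrams, and trigrams in one pass per size, which mirrors the behaviour described above for a window size of 3.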
The module can be used from the command line through the script ngrams.pl provided with the package.
Version restrictions:
- If a user customizes a type, a resulting n-gram may be ambiguous, so that two different n-grams are counted as one. With the predefined n-gram types, this should not happen. For example, if a user allows a token to contain a space and also uses a space as the n-gram separator, then a trigram written as "x x x x" is ambiguous.
- The method process_files does not handle multi-line tokens by default. This could be fixed, but it does not seem to be worth the added code complexity. There are ways around this if one really needs such tokens: one is to preprocess the input; another is to read as much text as necessary at a time and then use process_text, which does handle multi-line tokens.
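The ambiguity described in the first restriction above can be reproduced in a few lines of plain Perl. This is only a sketch of the problem, assuming that an n-gram is stored as its tokens joined by the separator; the variable names are hypothetical and the module itself is not used.

```perl
use strict;
use warnings;

# Hypothetical custom setup: tokens may contain a space, and a space is also
# the separator used when an n-gram is written out as a single string.
my $separator = ' ';

# Two different trigrams (three tokens each).
my @trigram_a = ('x x', 'x', 'x');
my @trigram_b = ('x', 'x', 'x x');

my $key_a = join $separator, @trigram_a;
my $key_b = join $separator, @trigram_b;

# Both collapse to the same string, so their counts would be merged.
print "a: '$key_a'\n";
print "b: '$key_b'\n";
print $key_a eq $key_b ? "ambiguous: same key\n" : "distinct keys\n";
```

Choosing a separator that can never occur inside a token avoids the collision.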