Text::Scraper 0.02
Text::Scraper 0.02 Ranking & Summary
File size: 0.045 MB
Platform: Any Platform
License: Perl Artistic License
Price:
Downloads: 798
Date added: 2007-08-22
Publisher: Chris McEwan
Text::Scraper 0.02 description
Text::Scraper extracts structured data from (un)structured text.
SYNOPSIS
use Text::Scraper;
use LWP::Simple;
use Data::Dumper;
#
# 1. Get our template and source text
#
my $tmpl = Text::Scraper->slurp(\*DATA);
my $src = get('http://search.cpan.org/recent') || die $!;
#
# 2. Extract data from source
#
my $obj = Text::Scraper->new(tmpl => $tmpl);
my $data = $obj->scrape($src);
#
# 3. Do something really neat...(left as exercise)
#
print "Newest Submission: ", $data->[0]{submissions}[0]{name}, "\n\n";
print "Scraper model:\n", Dumper($obj), "\n\n";
print "Parsed model:\n", Dumper($data), "\n\n";
__DATA__
<div class=path><center><table><tr>
<?tmpl stuff pre_nav ?>
<td class=datecell><span><big><b> <?tmpl var date_string ?> </b></big></span></td>
<?tmpl stuff post_nav ?>
</tr></table></center></div>
<ul>
<?tmpl loop submissions ?>
<li><a href="<?tmpl var link ?>"><?tmpl var name ?></a>
<?tmpl if has_description ?>
<small> -- <?tmpl var description ?></small>
<?tmpl end has_description ?>
</li>
<?tmpl end submissions ?>
</ul>
ABSTRACT
Text::Scraper provides a fully functional base class for quickly developing screen scrapers and other text extraction tools. Programmatically generated text, such as dynamic web pages, is trivially reverse engineered.
Using templates, the programmer is freed from staring at fragile, heavily escaped regular expressions, mapping capture groups to named variables, or wrestling with the DOM and badly formed HTML. In addition, extracted data can be hierarchical, which is beyond the capabilities of vanilla regular expressions.
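For contrast, here is a rough sketch of the kind of hand-written extraction that templates are meant to replace, in plain Perl without Text::Scraper. The sample HTML and the sub name extract_submissions are made up for illustration; only the shape of the list follows the SYNOPSIS template.

```perl
use strict;
use warnings;

# Hypothetical sample input, shaped like the <ul> list in the SYNOPSIS template.
my $html = <<'HTML';
<ul>
<li><a href="/dist/Foo-1.0">Foo-1.0</a>
<small> -- A foo module</small>
</li>
<li><a href="/dist/Bar-0.2">Bar-0.2</a>
</li>
</ul>
HTML

# Hand-rolled extraction: one regex for the outer list, a second for each
# item, a third for the optional part, and manual mapping of capture
# groups to named fields.
sub extract_submissions {
    my ($text) = @_;
    my @rows;
    my ($list) = $text =~ m{<ul>(.*?)</ul>}s or return [];
    while ($list =~ m{<li><a href="([^"]*)">([^<]*)</a>(.*?)</li>}gs) {
        my ($link, $name, $rest) = ($1, $2, $3);
        my %row = (link => $link, name => $name);
        # The optional description needs its own nested match.
        if ($rest =~ m{<small>\s*--\s*(.*?)</small>}s) {
            $row{description} = $1;
        }
        push @rows, \%row;
    }
    return \@rows;
}

my $rows = extract_submissions($html);
printf "%s (%s)\n", $_->{name}, $_->{description} // 'no description'
    for @$rows;
```

Each level of nesting needs another regular expression and another round of copying capture groups into a hash, which is the bookkeeping the loop and if constructs in the template express declaratively.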
Text::Scraper's functionality overlaps that of some existing CPAN modules: Template::Extract and WWW::Scraper.
Text::Scraper is much more lightweight than either and has a more general application domain than the latter. It has no dependencies on other frameworks, modules or design decisions. On average, Text::Scraper benchmarks around 250% faster than Template::Extract and uses significantly less memory.
Unlike both existing modules, Text::Scraper generalizes its functionality to allow the programmer to refine template capture groups beyond (.*?), fully redefine the template syntax, and introduce new template constructs bound to custom classes.
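To make the idea of a capture group behind each template variable concrete, the following plain-Perl sketch compiles a flat template containing var tags into a regular expression, using (.*?) by default and accepting a stricter pattern per variable. It is not Text::Scraper's implementation or API: compile_template, scrape_flat and the override hash are hypothetical names invented for this illustration, and loops and conditionals are not handled.

```perl
use strict;
use warnings;

# Compile a flat template into a regex plus the ordered list of variable
# names. $patterns optionally maps a variable name to a stricter capture
# pattern (assumed to contain no capture groups of its own).
sub compile_template {
    my ($tmpl, $patterns) = @_;
    $patterns ||= {};
    my @names;
    my $re = '';
    # Splitting with a capturing group keeps the variable names in the list:
    # literal, name, literal, name, ...
    my @parts = split /<\?tmpl\s+var\s+(\w+)\s*\?>/, $tmpl;
    while (@parts) {
        $re .= quotemeta shift @parts;      # literal text, escaped
        last unless @parts;
        my $name = shift @parts;
        push @names, $name;
        $re .= '(' . ($patterns->{$name} // '.*?') . ')';
    }
    return (qr/$re/s, \@names);
}

# Match the compiled template against the source and return a hash of
# variable name => captured text, or nothing if the source does not match.
sub scrape_flat {
    my ($tmpl, $src, $patterns) = @_;
    my ($re, $names) = compile_template($tmpl, $patterns);
    my @values = $src =~ $re or return;
    my %data;
    @data{@$names} = @values;
    return \%data;
}

my $tmpl = '<a href="<?tmpl var link ?>"><?tmpl var name ?></a>';
my $src  = '<p><a href="/dist/Foo-1.0">Foo-1.0</a></p>';
my $data = scrape_flat($tmpl, $src, { link => '[^"]*' });
print "$data->{name} => $data->{link}\n";
```

Here the link variable is restricted to characters other than a double quote, while name falls back to the default non-greedy (.*?).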
Related Software
screen-scraper is a tool for extracting data from Web sites. It works much like a database that allows you to mine the data of the World Wide Web. It provides a graphical interface allowing you to designate URLs, data elements to be extracted, and scripting logic to traverse pages and work with mined data. Once these items have been created, screen-scraper can be invoked from external languages such as .NET, Java, PHP, and Active Server Pages.
Text::ScriptTemplate is a standalone ASP/JSP/PHP-style template processor.
PerlIO is a Perl module for loading PerlIO layers on demand; it is also the root of the PerlIO::* namespace.
XcplayC is a text-GUI for XMMS based on xcplay.
StealIt is a service menu for taking ownership of a selected file or directory.
WWW::Scraper::Dice is a Perl module that scrapes Dice: (skills, locations) => (title, location, residue).
Convert::Transcribe is a Perl extension for transcribing natural languages.
WWW::Scraper::BAJobs is a Perl module that scrapes BAJobs.com.