WWW::Spyder 0.19
WWW::Spyder 0.19 Ranking & Summary
File size: 0.017 MB
Platform: Any Platform
License: Perl Artistic License
Price:
Downloads: 815
Date added: 2007-08-02
Publisher: Ashley Pond V.
WWW::Spyder 0.19 description
WWW::Spyder is a Perl module that acts like a web spider.
It returns plain text, HTML, and other information for each page crawled, and it decides which pages to fetch and parse by comparing supplied terms against the text in links as well as page content.
METHODS
$spyder->new()
Construct a new spyder object. Without at least the seed() set, or go_to_seed() turned on, the spyder isn't ready to crawl.
$spyder = WWW::Spyder->new( shift || die "Gimme a URL!\n" );
# ...or...
$spyder = WWW::Spyder->new( %options );
Options include: sleep_base (in seconds), exit_on (hash of methods and settings). Examples below.
$spyder->seed($url)
Adds a URL (or URLs) to the top of the queues for crawling. If the spyder is constructed with a single scalar argument, that is considered the seed_url.
$spyder->bell([bool])
This will print a bell ("\a") to STDERR on every successfully crawled page. It might seem annoying, but it is an excellent way to know your spyder is behaving and working. A true value turns it on. Right now it can't be turned off.
$spyder->spyder_time([bool])
Returns raw seconds since Spyder was created if given a boolean value, otherwise returns "D day(s) HH::MM:SS."
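The formatted form of spyder_time() can be pictured with a few lines of plain Perl. This is only a sketch of the arithmetic, not the module's own code: format_elapsed() is a hypothetical helper, and the single-colon "HH:MM:SS" layout it prints is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Spyder): turn raw seconds into
# a "D day(s) HH:MM:SS" style string.
sub format_elapsed {
    my ($seconds) = @_;
    my $days  = int( $seconds / 86_400 );      # whole days
    my $rest  = $seconds % 86_400;             # seconds left in the last day
    my $hours = int( $rest / 3_600 );
    my $mins  = int( ( $rest % 3_600 ) / 60 );
    my $secs  = $rest % 60;
    return sprintf '%d day%s %02d:%02d:%02d',
        $days, ( $days == 1 ? '' : 's' ), $hours, $mins, $secs;
}

print format_elapsed(93_784), "\n";
```

Raw seconds are easier to do arithmetic with; the formatted string is meant for log lines and progress reports.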
$spyder->terms([list of terms to match])
The more terms, the more the spyder is going to grasp at. If you give a straight list of strings, they will be turned into very open regexes. E.g.: "king" would match "sulking" and "kinglet" but not "King." It is case sensitive right now. If you want more specific matching or different behavior, pass your own regexes instead of strings.
$spyder->terms( qr/\bkings?\b/i, qr/\bqueens?\b/i );
terms() is only settable once right now; after that it's a done deal.
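The difference between plain-string terms and hand-written regexes can be checked without crawling anything. The sketch below assumes that a plain string becomes an unanchored, case-sensitive pattern, as described above; to_patterns() and matches_any() are hypothetical helpers, not part of WWW::Spyder.

```perl
use strict;
use warnings;

# Hypothetical helper: plain strings become open, case-sensitive regexes;
# precompiled regexes (qr//) are passed through untouched.
sub to_patterns {
    return map { ref($_) eq 'Regexp' ? $_ : qr/\Q$_\E/ } @_;
}

# Hypothetical helper: true if any pattern matches the text.
sub matches_any {
    my ( $text, @patterns ) = @_;
    for my $re (@patterns) {
        return 1 if $text =~ $re;
    }
    return 0;
}

my @open   = to_patterns('king');
my @strict = to_patterns(qr/\bkings?\b/i);

for my $word (qw(sulking kinglet King kings)) {
    printf "%-8s open=%d strict=%d\n",
        $word, matches_any( $word, @open ), matches_any( $word, @strict );
}
```

The open pattern accepts "sulking" and "kinglet" but misses "King", while the word-bounded, case-insensitive pattern does the opposite.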
$spyder->spyder_data()
A comma-formatted number of kilobytes retrieved so far. Don't give it an argument; it's a set/get routine.
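For readers who want the same kind of output in their own reports, comma formatting is a short exercise in plain Perl. The sketch below is not the module's implementation: commify() and kilobytes() are hypothetical helpers, and dividing by 1024 is an assumption about what "kilobytes" means here.

```perl
use strict;
use warnings;

# Hypothetical helper: insert commas every three digits of an integer.
sub commify {
    my ($n) = @_;
    my $text = scalar reverse( sprintf '%d', $n );    # work right to left
    $text =~ s/(\d{3})(?=\d)/$1,/g;                   # comma after each full group
    return scalar reverse($text);
}

# Hypothetical helper: whole kilobytes from a byte count (1 KB = 1024 bytes assumed).
sub kilobytes {
    my ($bytes) = @_;
    return int( $bytes / 1024 );
}

print commify( kilobytes(2_500_000) ), " KB\n";
```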
$spyder->slept()
Returns the total number of seconds the spyder has slept while running. Useful for getting accurate page/time counts (spyder performance) discounting the added courtesy naps.
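The performance figure mentioned above is simple arithmetic: subtract the time slept from the total elapsed time before dividing. A minimal sketch, where pages_per_second() is a hypothetical helper and the three numbers are made-up inputs standing in for a page count, raw elapsed seconds, and the value of slept():

```perl
use strict;
use warnings;

# Hypothetical helper: crawl rate with courtesy naps discounted.
sub pages_per_second {
    my ( $pages, $elapsed, $slept ) = @_;
    my $active = $elapsed - $slept;    # seconds actually spent working
    return 0 if $active <= 0;          # avoid dividing by zero
    return $pages / $active;
}

# Illustrative values: 120 pages in 600 seconds, 360 of them asleep.
printf "%.2f pages/s\n", pages_per_second( 120, 600, 360 );
```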
$spyder->UA->...
The LWP::UserAgent. You can reset them, I do believe, by calling methods on the UA. Here are the initialized values you might want to tweak (see LWP::UserAgent for more information):
$spyder->UA->timeout(30);
$spyder->UA->max_size(250_000);
$spyder->UA->agent('Mozilla/5.0');
Changing the agent name can hurt your spyder because some servers won't return content unless it's requested by a "browser" they recognize.
You should probably add your email with from() as well.
$spyder->UA->from('bluefintuna@fish.net');
$spyder->cookie_file([local_file])
Cookies live in $ENV{HOME}/spyderCookie by default, but you can set your own file if you prefer, or if you want to keep different cookie files for different spyders.