www scraper google 3.05
Results 1 - 15 of about 1527
WWW::Scraper::Google 3.05
WWW::Scraper::Google scrapes www.Google.com.
Caveat Kleptor
Please note that using the Google Scraper module (may) be a violation of Google's "Terms of Service", of which your humble author has been repeatedly reminded. The TOS is not as easy to locate as some of these correspondents have suggested (without a smile), but you can find the TOS at http://www.google.com/terms_of_service.html
Briefly, the relevant part is the "No Automated Querying" section. It's a kind of "do as I say, not as I do" dictum. Your author has tried to divine exactly what it means. On the surface it's pretty clear, but if you follow the thread you will realize that it doesn't lead to a place any of us want to be. However, Google Inc.'s desire is clear enough. They do not want to be *abused* for the exclusive benefit of someone else.
Scraper is not a tool well suited for this kind of abuse. It is designed to be generally configurable and, as such, it is not particularly efficient. It obeys the "robots.txt" rules published by the web server. It would require some effort on a user's part to circumvent this feature. Google.pm does not do a "meta-search" on Google. Even if your humble author removed Google.pm from the Scraper suite, it would be trivially easy for someone to build a Google module for Scraper (their format is very simple compared to others).
I believe that Google Inc. understands a little interloping (in moderation) is beneficial to all. I should note that Google Inc. has not notified your author of any concern on their part. This has been done by third parties who, for whatever reasons of their own, feel it necessary to interject themselves in others' disputes, even when no such dispute exists.
Keep in mind that this is Google's livelihood. Should your use of Scraper be your hobby, or even part of your livelihood, remember it never helps to hit someone where they live. They will defend themselves to the death (even if that death is yours).
Scraper is a handy little tool for getting to stuff you can't get to otherwise. Let's keep it that way!
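The robots.txt behaviour mentioned in the caveat can be illustrated with a small, self-contained sketch. This is not Scraper's own code: the helper names disallowed_prefixes and path_allowed are hypothetical, the sample rules are made up, and the parser handles only "User-agent: *" groups with simple Disallow prefixes. A real client would more likely rely on a dedicated parser such as WWW::RobotRules.

```perl
use strict;
use warnings;

# Collect the Disallow prefixes that apply to every agent ("User-agent: *").
# Simplified on purpose: no Allow lines, no wildcards, no per-agent groups.
sub disallowed_prefixes {
    my ($robots_txt) = @_;
    my (@prefixes, $applies);
    for my $line (split /\n/, $robots_txt) {
        $line =~ s/#.*//;    # strip comments
        if ($line =~ /^\s*User-agent:\s*(\S+)/i) {
            $applies = ($1 eq '*');
        }
        elsif ($applies && $line =~ /^\s*Disallow:\s*(\S+)/i) {
            push @prefixes, $1;
        }
    }
    return @prefixes;
}

# A path is allowed when no Disallow prefix matches its beginning.
sub path_allowed {
    my ($path, @prefixes) = @_;
    for my $prefix (@prefixes) {
        return 0 if index($path, $prefix) == 0;
    }
    return 1;
}

# Made-up rules for illustration only.
my $robots_txt = <<'END';
User-agent: *
Disallow: /search
Disallow: /private/
END

my @prefixes = disallowed_prefixes($robots_txt);
for my $path ('/search?q=perl', '/about.html') {
    printf "%-16s %s\n", $path,
        path_allowed($path, @prefixes) ? 'allowed' : 'blocked';
}
```

A scraper that checks each path this way before fetching it simply skips anything the site has asked robots to stay away from.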
Download (0.10MB)
Added: 2006-11-23 License: Perl Artistic License Price:
1075 downloads
ScraperPOD 3.05
ScraperPOD is a framework for scraping results from search engines.
SYNOPSIS
use WWW::Scraper;
# Name your Scraper module / search engine as the first parameter,
use WWW::Scraper('eBay');
# or in the new() method
$scraper = new WWW::Scraper('eBay');
Classic WWW::Search mode
# Use a Scraper engine just as you would a WWW::Search engine.
$scraper = new WWW::Scraper('carsforsale', 'Honda', { lbxModel => 'Accord', lbxVehicleYearFrom => 1998 });
while ( $response = $scraper->next_result() ) {
    # harvest results via hash-table reference.
    print $scraper->{sellerPhoneNumber};
}
Canonical Request/Response mode (not yet implemented)
$scraper = new WWW::Scraper('carsforsale', Request => 'Autos', Response => 'Autos');
# or, since carsforsale.pm defaults to the Request and Response classes of Autos
$scraper = new WWW::Scraper('carsforsale');
#
# Set field values via field-named canonical access methods.
$scraper->scraperRequest->make('Honda');
$scraper->scraperRequest->model('Accord');
$scraper->scraperRequest->minYear(1998);
#
# Note: this is *not* next_result().
while ( $response = $scraper->next_response() ) {
    #
    # harvest results via field-named access methods.
    print $response->sellerPhoneNumber();
}
Variant Requests to a single search engine
$scraper = new WWW::Scraper('carsforsale');
$scraper->scraperRequest->make('Honda');
$scraper->scraperRequest->minYear(1998);
#
for my $model ( 'Accord', 'Civic' ) {
    $scraper->scraperRequest->model($model);
    while ( $response = $scraper->next_response() ) {
        # all response fields are returned as a reference to the value.
        print ${$response->sellerPhoneNumber()};
    }
}
Single Request to variant search engines
# Set the request parameters in a Request object (sub-class Autos).
$request = new WWW::Scraper::Request('Autos');
$request->make('Honda');
$request->model('Accord');
$request->minYear(1998);
#
for my $searchEngine ( 'carsforsale', '1001cars' ) {
    $scraper = new WWW::Scraper($searchEngine, Request => $request);
    while ( $response = $scraper->next_response() ) {
        # all response fields are returned as a reference to the value.
        print ${$response->sellerPhoneNumber()};
    }
}
Download (0.10MB)
Added: 2006-06-15 License: GPL (GNU General Public License) Price:
1227 downloads
WWW::Search::Scraper::Google 2.27
WWW::Search::Scraper::Google is a Perl module that scrapes www.Google.com.
SYNOPSIS
require WWW::Search::Scraper;
$search = new WWW::Search::Scraper('Google');
This class is a Google specialization of WWW::Search. It handles making and interpreting Google searches at http://www.Google.com.
Download (0.13MB)
Added: 2006-11-24 License: Perl Artistic License Price:
1066 downloads
WWW::Scraper::CraigsList 3.05
WWW::Scraper::CraigsList is a Perl module that scrapes CraigsList.
SYNOPSIS
require WWW::Scraper;
$search = new WWW::Scraper('CraigsList');
This class is a CraigsList specialization of WWW::Search. It handles making and interpreting CraigsList searches at http://www.CraigsList.com.
This class exports no public interface; all interaction should be done through WWW::Search objects.
OPTIONS
None at this time (2001.04.25)
search_url=URL
Specifies who to query with the CraigsList protocol. The default is at http://www.CraigsList.com/cgi-bin/job-search.
search_debug, search_parse_debug, search_ref Specified at WWW::Search.
Internet/Web Engineering Category options:
- ALL JOBS
art - web design jobs
bus - business jobs
mar - marketing jobs
eng - internet engineering jobs
etc - etcetera jobs
wri - writing jobs
sof - software jobs
acc - finance jobs
ofc - office jobs
med - media jobs
hea - health science jobs
ret - retail jobs
npo - nonprofit jobs
lgl - legal jobs
egr - engineering jobs
sls - sales jobs
sad - sys admin jobs
tel - network jobs
tfr - tv video radio jobs
hum - human resource jobs
tch - tech support jobs
edu - education jobs
trd - skilled trades jobs
Checkboxes - additive to search(?)
addOne value=telecommuting - telecommute
addTwo value=contract - contract
addThree value=internship - internships
addFour value=part-time - part-time
addFive value=non-profit - non-profit
Enhancements:
- Perl
Download (0.10MB)
Added: 2007-02-22 License: Perl Artistic License Price:
591 downloads
WWW::Cache::Google 0.04
WWW::Cache::Google is a Perl URI class for the Google cache.
SYNOPSIS
use WWW::Cache::Google;
$cache = WWW::Cache::Google->new('http://www.yahoo.com/');
$url = $cache->as_string; # cache URL
$html = $cache->fetch; # fetches via LWP::Simple
Oops, 404 Not Found. But wait ... there might be a google cache!
WWW::Cache::Google provides an easy way to convert a URL into a Google cache URL.
If all you want is the cached content, consider using the Google Web APIs at http://www.google.com/apis/index.html
$html = SOAP::Lite
    ->uri('urn:GoogleSearch')
    ->proxy('http://api.google.com/search/beta2') # may change
    ->doGetCachedPage($GoogleKey, 'http://cpan.org/')
->result;
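As a rough illustration of the URL-to-cache-URL conversion described for WWW::Cache::Google, the sketch below builds such a URL by hand. It is not the module's implementation: the cache_url helper is hypothetical, and both the "search?q=cache:" base and the dropping of the "http://" scheme are assumptions about the historical Google cache URL form.

```perl
use strict;
use warnings;

# Assumed base for cache lookups; the module's real value may differ.
my $CACHE_BASE = 'http://www.google.com/search?q=cache:';

# Hypothetical helper: turn a page URL into a cache lookup URL.
sub cache_url {
    my ($url) = @_;
    (my $target = $url) =~ s{^http://}{};    # assumption: scheme is dropped
    # Percent-encode anything outside a conservative safe set.
    $target =~ s{([^A-Za-z0-9\-._~/:])}{sprintf('%%%02X', ord $1)}ge;
    return $CACHE_BASE . $target;
}

print cache_url('http://www.yahoo.com/'), "\n";
```

Keeping the base in one variable makes it easy to adjust if the assumed URL form turns out to be wrong for a given service.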
Download (0.003MB)
Added: 2006-11-21 License: GPL (GNU General Public License) Price:
1071 downloads
WWW::Scraper::Beaucoup 3.05
WWW::Scraper::Beaucoup scrapes Beaucoup's Super Search.
SYNOPSIS
use WWW::Scraper;
use WWW::Scraper::Response::Job;
$search = new WWW::Scraper('Beaucoup');
$search->setup_query($query, \%options); # %options holds the search options
while ( my $response = $search->next_response() ) {
# $response is a WWW::Scraper::Response::Job.
}
Beaucoup extends WWW::Scraper.
It handles making and interpreting Beaucoup searches of http://www.Beaucoup.com.
Download (0.10MB)
Added: 2006-08-25 License: Perl Artistic License Price:
1155 downloads
WWW::Scraper::NorthernLight 3.05
WWW::Scraper::NorthernLight scrapes NorthernLight.com.
SYNOPSIS
require WWW::Scraper;
$search = new WWW::Scraper('NorthernLight');
This class is a NorthernLight specialization of WWW::Search. It handles making and interpreting NorthernLight searches at http://www.NorthernLight.com.
This class exports no public interface; all interaction should be done through WWW::Search objects.
Download (0.10MB)
Added: 2006-08-26 License: Perl Artistic License Price:
1154 downloads
WWW::Scraper::FlipDog 0.01
WWW::Scraper::FlipDog scrapes www.FlipDog.com.
SYNOPSIS
use WWW::Scraper;
use WWW::Scraper::Response::Job;
$search = new WWW::Scraper('FlipDog');
$search->setup_query($query, \%options); # %options holds the search options
while ( my $response = $search->next_response() ) {
# $response is a WWW::Scraper::Response::Job.
}
FlipDog extends WWW::Scraper.
It handles making and interpreting FlipDog searches of http://www.FlipDog.com.
Download (0.037MB)
Added: 2006-08-26 License: Perl Artistic License Price:
1154 downloads
WWW::Scraper::Monster 0.01
WWW::Scraper::Monster is a Perl module that scrapes Monster.com.
SYNOPSIS
use WWW::Search;
my $oSearch = new WWW::Search('Monster');
my $sQuery = WWW::Search::escape_query("unix and (c++ or java)");
$oSearch->native_query($sQuery,
    { st => 'CA',
      tm => '14d' });
while (my $res = $oSearch->next_result()) {
    print $res->company . "\t" . $res->title . "\t" . $res->change_date
        . "\t" . $res->location . "\t" . $res->url . "\n";
}
This class is a Monster specialization of WWW::Search. It handles making and interpreting Monster searches at http://www.monster.com. Monster supports Boolean logic with "and" and "or". See http://jobsearch.monster.com/jobsearch_tips.asp for a full description of the query language.
The returned WWW::Scraper::Response objects contain url, title, company, location and change_date fields.
Download (0.038MB)
Added: 2007-06-14 License: Perl Artistic License Price:
862 downloads
WWW::Scraper::Dice 0.01
WWW::Scraper::Dice is a Perl module that scrapes Dice: (skills, locations) => (title, location, residue).
SYNOPSIS
use WWW::Scraper;
my $oSearch = new WWW::Scraper('Dice');
my $sQuery = WWW::Scraper::escape_query("unix and (c++ or java)");
$oSearch->native_query($sQuery,
    { method => 'bool',
      state => 'CA',
      daysback => 14 });
while (my $res = $oSearch->next_result()) {
    # isHitGood() is a filter supplied by the caller.
    if (isHitGood($res->url)) {
        my ($company, $title, $date, $location) =
            ($res->company, $res->title, $res->date, $res->location);
        print "$company $title $date $location " . $res->url . "\n";
    }
}
Download (0.037MB)
Added: 2007-06-14 License: Perl Artistic License Price:
862 downloads
WWW::Scraper::BAJobs 0.01
WWW::Scraper::BAJobs scrapes BAJobs.com.
SYNOPSIS
require WWW::Scraper;
$search = new WWW::Scraper('BAJobs');
This class is a BAJobs specialization of WWW::Search. It handles making and interpreting BAJobs searches at http://www.BAJobs.com.
This class exports no public interface; all interaction should be done through WWW::Search objects.
Download (0.037MB)
Added: 2006-08-26 License: Perl Artistic License Price:
1154 downloads
WWW::Scraper::Brainpower 0.01
WWW::Scraper::Brainpower scrapes Brainpower.com.
SYNOPSIS
use WWW::Scraper;
use WWW::Scraper::Response::Job;
$search = new WWW::Scraper('Brainpower');
$search->setup_query($query, \%options); # %options holds the search options
while ( my $response = $search->next_response() ) {
# $response is a WWW::Scraper::Response::Job.
}
Brainpower extends WWW::Scraper.
It handles making and interpreting Brainpower searches of http://www.Brainpower.com.
Download (0.037MB)
Added: 2006-08-26 License: Perl Artistic License Price:
1154 downloads
WWW::Search::Google 0.22
WWW::Search::Google is a Perl module to search Google via SOAP.
SYNOPSIS
use WWW::Search;
my $search = WWW::Search->new('Google', key => $key);
$search->native_query("leon brocard");
while (my $result = $search->next_result()) {
    print $result->title, "\n";
    print $result->url, "\n";
    print $result->description, "\n";
    print "\n";
}
This class is a Google specialization of WWW::Search. It handles searching Google (http://www.google.com/) using its new SOAP API (http://www.google.com/apis/).
All interaction should be done through WWW::Search objects.
Note that you must register for a Google Web API account and have a valid Google API license key before using this module.
This module reports errors via croak().
This module uses Net::Google to do all the dirty work.
Download (0.003MB)
Added: 2006-11-21 License: Perl Artistic License Price:
1067 downloads
WWW::PDAScraper 0.1
WWW::PDAScraper is a Perl class for scraping PDA-friendly content from websites.
Synopsis
use WWW::PDAScraper;
my $scraper = WWW::PDAScraper->new( qw( NewScientist Yahoo::Entertainment ) );
$scraper->scrape();
or
use WWW::PDAScraper;
my $scraper = WWW::PDAScraper->new;
$scraper->scrape( qw( NewScientist Yahoo::Entertainment ) );
or
perl -MWWW::PDAScraper -e "scrape qw( NewScientist Yahoo::Entertainment )"
Having written various kludgey scripts to download PDA-friendly content from various websites, I decided to try and write a generalised solution which would
* parse out the section of a news page which contains the links we want
* munge those links into the URL for the print-friendly version, if possible
* download those pages and make an index page for them
The moving of the pages to your PDA is not part of the scope of the module: the open-source browser and "distiller", Plucker, from http://plkr.org/ is recommended. Just get it to read the index.html file with a depth of 1 from disk, using a URL like file:///path/to/index.html
The Sub-modules
WWW::PDAScraper takes the rules for scraping a particular website from a second module. For example, WWW::PDAScraper::Yahoo::Entertainment::TV contains the rules for scraping the Yahoo TV News website:
package WWW::PDAScraper::Yahoo::Entertainment::TV;
# WWW::PDAScraper.pm rules for scraping the
# Yahoo TV website
sub config {
return {
name => 'Yahoo TV',
start_from => 'http://news.yahoo.com/i/763',
chunk_spec => [ "_tag", "div", "id", "indexstories" ],
url_regex => [ '$', '&printer=1' ]
};
}
1;
A more or less random selection of modules is included, as well as a full set for Yahoo, to demonstrate a logical set of modules in categories.
Creating a new sub-module ought to be relatively simple: see the template provided, WWW::PDAScraper::Template.pm. You need name, start_from, then either chunk_spec or url_spec, then optionally a url_regex for transformation into the print-friendly URL.
Then either move your new module to the same location as the other ones on your system, or make sure they're available to your script with a line like use lib '/path/to/local/modules/PDAScraper/';
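To make the rule keys described above more concrete, here is a minimal sketch of how a rule set might be checked and how a url_regex pair could turn a link into its print-friendly form. The helpers validate_config and printer_friendly are hypothetical and are not part of WWW::PDAScraper; treating url_regex as a (pattern, replacement) pair applied with a single substitution is an assumption, and the story URL in the example is made up.

```perl
use strict;
use warnings;

# Rule set shaped like the config() hash of a PDAScraper sub-module.
my $config = {
    name       => 'Yahoo TV',
    start_from => 'http://news.yahoo.com/i/763',
    chunk_spec => [ '_tag', 'div', 'id', 'indexstories' ],
    url_regex  => [ '$', '&printer=1' ],
};

# Hypothetical check for the keys the description lists as required.
# Returns an error string, or undef when the config looks complete.
sub validate_config {
    my ($cfg) = @_;
    for my $key (qw(name start_from)) {
        return "missing $key" unless defined $cfg->{$key};
    }
    return 'need chunk_spec or url_spec'
        unless $cfg->{chunk_spec} || $cfg->{url_spec};
    return;
}

# Hypothetical helper: apply the optional (pattern, replacement) pair.
sub printer_friendly {
    my ($url, $cfg) = @_;
    my $pair = $cfg->{url_regex} or return $url;
    my ($pattern, $replacement) = @$pair;
    (my $out = $url) =~ s/$pattern/$replacement/;
    return $out;
}

my $error = validate_config($config);
die "bad config: $error\n" if defined $error;
print printer_friendly('http://news.yahoo.com/s/example-story', $config), "\n";
```

With the pattern '$' the substitution matches at the end of the link, so the replacement is simply appended to it.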
Download (0.069MB)
Added: 2006-12-14 License: Perl Artistic License Price:
1044 downloads
WWW::Cache::Google::Imode 0.04
WWW::Cache::Google::Imode is a URI class for Google proxy on i-mode.
SYNOPSIS
use WWW::Cache::Google::Imode;
$cache = WWW::Cache::Google::Imode->new('http://www.yahoo.com/');
$url = $cache->as_string; # cache URL
$html = $cache->fetch; # fetches via LWP::Simple
Easy conversion from HTML to CHTML. That's Google on i-mode!
WWW::Cache::Google::Imode provides an easy way to convert a URL into a Google i-mode proxy/cache URL.
Download (0.003MB)
Added: 2006-11-22 License: Perl Artistic License Price:
1067 downloads