
Free extract data software for linux

extract data

Results 1 - 15 of about 5301
PDFMiner 20090721


PDFMiner 20090721 is a suite of programs that help extract and analyze text data from PDF documents. Unlike other PDF-related tools, it can report the exact location of text on a page, as well as extra information such as font details or ruled lines.

It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It also has an extensible PDF parser that can be used for purposes other than text analysis.

Major Features:

  1. Written entirely in Python. (for version 2.4 or newer)
  2. PDF-1.7 specification support. (well, almost)
  3. Non-ASCII languages and vertical writing scripts support.
  4. Various font types (Type1, TrueType, Type3, and CID) support.
  5. Basic encryption (RC4) support.
  6. PDF to HTML conversion (with a sample converter web app).
  7. Outline (TOC) extraction.
  8. Tagged contents extraction.
  9. Inference of text flow using a clustering technique.
Requirements:
  • Python
Added: 2009-07-22 License: MIT/X Consortium Lic... Price: FREE
13 downloads
Bottle 0.4.4


Bottle 0.4.4 is a fast, simple and useful one-file WSGI framework with a built-in template engine. It is not a full-stack framework with a ton of features, but a useful micro-framework for small web applications that stays out of your way.

Bottle only depends on the Python Standard Library. If you want to use an HTTP server other than wsgiref.simple_server you may need cherrypy, flup or paste (your choice).

Major Features:

  1. Request dispatching: Map requests to handler callables using URL routes.
  2. URL parameters: Use regular expressions such as /object/(?P<id>[0-9]+) or the simplified syntax /object/:id to extract data out of URLs.
  3. WSGI abstraction: Don't worry about CGI and WSGI internals.
  4. Input: request.GET['parameter'] or request.POST['form-field'].
  5. HTTP header: response.header['Content-Type'] = 'text/html'.
  6. Cookie management: response.COOKIES['session'] = 'new_key'.
  7. Static files: send_file('movie.flv', '/downloads/') with automatic MIME-type guessing.
  8. Errors: Throw HTTP errors using abort(404, 'Not here') or subclass HTTPError and use custom error handlers.
  9. Templates: Integrated template language.
  10. Plain simple: Execute Python code with %... or use the inline syntax {{...}} for one-line expressions.
  11. No IndentationErrors: Blocks are closed by %end. Indentation is optional.
  12. Extremely fast: Parses and renders templates 5 to 10 times faster than Mako.
  13. Support for Mako templates (requires mako).
  14. HTTP server: Built-in WSGI/HTTP gateway server (for development and production mode).
  15. Currently supports wsgiref.simple_server (default), cherrypy, flup, paste and fapws3.
  16. Speed optimisations:
  • Sendfile: Support for platform-specific high-performance file-transmission facilities, such as the Unix sendfile(). Depends on wsgi.file_wrapper provided by your WSGI server implementation.
  • Self-optimising routes: Frequently used routes are tested first (optional).
  • Fast static routes (single dict lookup).
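
The URL-parameter feature can be illustrated with a small standard-library sketch. This is not Bottle's own code: the helper names compile_route and extract_params are hypothetical, and the rule that a ":name" segment matches anything up to the next slash is an assumption made for the example.

```python
import re


def compile_route(route):
    """Compile a route into a regex, turning each ':name' into a named group.

    Assumption: a simplified ':name' parameter matches one path segment.
    Routes already written with (?P<name>...) groups pass through unchanged.
    """
    pattern = re.sub(r":([a-zA-Z_]\w*)", r"(?P<\1>[^/]+)", route)
    return re.compile("^" + pattern + "$")


def extract_params(route, path):
    """Return the URL parameters as a dict, or None if the path does not match."""
    match = compile_route(route).match(path)
    return match.groupdict() if match else None


simple = extract_params("/object/:id", "/object/42")
explicit = extract_params("/object/(?P<id>[0-9]+)", "/object/7")
rejected = extract_params("/object/(?P<id>[0-9]+)", "/object/abc")
```

Both spellings of the route yield the same kind of result: a dictionary keyed by parameter name, with the captured text as a string.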

Requirements:

  • Python
Added: 2009-07-08 License: MIT/X Consortium Lic... Price: FREE
14 downloads
LogMiner 1.23

LogMiner 1.23 is a user-friendly log analysis package for Apache (or other web servers using the combined log format). LogMiner can extract and present several reports about visits, hits, traffic, requests, navigation paths, browsers and operating systems used by visitors, and so on. Data is stored in a PostgreSQL database, using a schema optimized to keep redundancy to a minimum.

Major Features:

  1. Data is stored in a DBMS backend and reports are generated on the fly, whereas Webalizer generates plain HTML files. A DBMS lets you extract and aggregate data in many ways, whenever you need. A drawback is that you won't have the processing speed of Webalizer when parsing log files.
  2. LogMiner makes it easy to navigate to previous months.
  3. Webalizer reports are hardcoded in the program. LogMiner implements reports in a more extensible way: each report is a simple PHP class, usually supported by a PL/pgSQL function (although you're free to insert your SQL queries in the PHP code if you like).
  4. LogMiner offers more reports than Webalizer, for instance the OS charts and the navigation graphs.
  5. Depending on your needs, you might prefer LogMiner over Webalizer, especially if you like having a central SQL repository for your data, which lets you extract the data you need at any time, or add a kind of report which wasn't planned from the start and apply it to older data.
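
The idea of loading log lines into a SQL store and aggregating them on demand can be sketched as follows. This is only an illustration under stated assumptions, not LogMiner's schema or code: it uses Python with an in-memory SQLite database in place of PHP and PostgreSQL, and the table name hits, the column names and the sample log lines are invented for the example.

```python
import re
import sqlite3

# One regex for the Apache combined log format; only the fields we store are named.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one combined-format line into a dict, or return None if it is malformed."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


def load(conn, lines):
    """Create the (hypothetical) hits table and insert every parseable line."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits (host TEXT, time TEXT, method TEXT, "
        "path TEXT, status INTEGER, size INTEGER, referer TEXT, agent TEXT)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    conn.executemany(
        "INSERT INTO hits VALUES "
        "(:host, :time, :method, :path, :status, :size, :referer, :agent)",
        rows,
    )


SAMPLE = [
    '192.0.2.1 - - [10/Oct/2009:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (X11; Linux)"',
    '192.0.2.2 - - [10/Oct/2009:13:56:01 +0200] "GET /index.html HTTP/1.1" 200 512 "http://example.org/" "Opera/9.80"',
    '192.0.2.1 - - [10/Oct/2009:13:57:12 +0200] "GET /about.html HTTP/1.1" 304 - "http://example.org/index.html" "Mozilla/5.0 (X11; Linux)"',
]

conn = sqlite3.connect(":memory:")
load(conn, SAMPLE)
# A report is just a query, so new reports can be run against old data at any time.
report = conn.execute(
    "SELECT path, COUNT(*), SUM(size) FROM hits "
    "GROUP BY path ORDER BY COUNT(*) DESC, path"
).fetchall()
```

Because the raw hits stay in the table, a report that was not planned at import time only needs a new query, which is the trade-off against Webalizer described above.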
Added: 2009-06-30 License: GPL Price: FREE
11 downloads
 
Other version of LogMiner
LogMiner 1.20
License: GPL (GNU General Public License)
Download (0.20MB)
810 downloads
Added: 2007-08-10
AbiWord for linux 2.4.6

AbiWord is a full-featured word processor originally developed by the SourceGear Corporation, and is now maintained by an open group of volunteers.
Today AbiWord compiles as a native application on a wide collection of computers and can handle an equally impressive number of file formats. In addition, AbiWord's feature set includes almost everything one would expect in a modern word processor, plus numerous ground-breaking and advanced features that allow it to compete successfully with many proprietary word processors. A short list of features includes:
- A familiar interface
- Outstanding file import and export, with support for MS Word, WordPerfect, and more
- Unlimited undo and redo capacity
- Solid (X)HTML export, with CSS styles support
- Images
- Spelling support, with optional underlining
- Bullets and Lists
- Styles
- Table of Contents generation and customization through the Stylist
- Complete, intuitive revisions-tracking support
- Nested tables support, nearly unmatched in the field
- Mail merge
- Bidirectional text support
- Command-line and server use modes for document processing capabilities
One of the most lasting differences between AbiWord and most word processors is the default file format. Unlike documents saved normally in many competing word processors, a document saved with AbiWord is written in plainly readable text with XML markup, making it possible to use any text editor to view AbiWord documents. With this style of data storage, you can feel assured that your precious data is safe and readable, even without the original AbiWord program that created it. Users are even free to create their own program to parse the AbiWord markup and extract data from it. No matter how AbiWord is used, users can be sure that their important data is well kept.
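
As a rough illustration of parsing the markup yourself, the following Python sketch pulls paragraph text out of a small, hand-written document. The sample is a simplified assumption of what an AbiWord file looks like (an abiword root element with section and p elements); real files carry more metadata, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample; real .abw files contain more elements.
SAMPLE_ABW = """<abiword xmlns="http://www.abisource.com/awml.dtd" version="2.4.6">
<section>
<p style="Heading 1">Quarterly report</p>
<p style="Normal">Sales rose <c props="font-weight:bold">12%</c> this quarter.</p>
</section>
</abiword>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def paragraphs(abw_text):
    """Return a list of (style, text) pairs, one per paragraph element."""
    root = ET.fromstring(abw_text)
    result = []
    for elem in root.iter():
        if local_name(elem.tag) == "p":
            # itertext() also collects text inside nested formatting elements.
            text = "".join(elem.itertext()).strip()
            result.append((elem.get("style"), text))
    return result


extracted = paragraphs(SAMPLE_ABW)
```

Since the format is plain XML, any XML parser should do; nothing here depends on AbiWord being installed.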
Download (3.51MB)
Added: 2009-04-02 License: Freeware Price: Free
204 downloads
Text::Scraper 0.02

Text::Scraper extracts structured data from (un)structured text.

SYNOPSIS

use Text::Scraper;

use LWP::Simple;
use Data::Dumper;

#
# 1. Get our template and source text
#
my $tmpl = Text::Scraper->slurp(\*DATA);
my $src  = get('http://search.cpan.org/recent') || die $!;

#
# 2. Extract data from source
#
my $obj  = Text::Scraper->new(tmpl => $tmpl);
my $data = $obj->scrape($src);

#
# 3. Do something really neat... (left as exercise)
#
print "Newest Submission: ", $data->[0]{submissions}[0]{name}, "\n\n";
print "Scraper model:\n", Dumper($obj), "\n\n";
print "Parsed model:\n", Dumper($data), "\n\n";

__DATA__

<div class=path><center><table><tr>
<?tmpl stuff pre_nav ?>
<td class=datecell><span><big><b> <?tmpl var date_string ?> </b></big></span></td>
<?tmpl stuff post_nav ?>
</tr></table></center></div>

<ul>
<?tmpl loop submissions ?>
<li><a href="<?tmpl var link ?>"><?tmpl var name ?></a>
<?tmpl if has_description ?>
<small> -- <?tmpl var description ?></small>
<?tmpl end has_description ?>
</li>
<?tmpl end submissions ?>
</ul>

ABSTRACT

Text::Scraper provides a fully functional base class to quickly develop screen scrapers and other text extraction tools. Programmatically generated text, such as dynamic web pages, is trivially reverse-engineered.

Using templates, the programmer is freed from staring at fragile, heavily escaped regular expressions, mapping capture groups to named variables or wrestling with the DOM and badly formed HTML. In addition, extracted data can be hierarchical, which is beyond the capabilities of vanilla regular expressions.

Text::Scraper's functionality overlaps with some existing CPAN modules: Template::Extract and WWW::Scraper.
Text::Scraper is much more lightweight than either and has a more general application domain than the latter. It has no dependencies on other frameworks, modules or design decisions. On average, Text::Scraper benchmarks around 250% faster than Template::Extract and uses significantly less memory.

Unlike both existing modules, Text::Scraper generalizes its functionality to allow the programmer to refine template capture groups beyond (.*?), fully redefine the template syntax and introduce new template constructs bound to custom classes.
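
The general idea of running a template in reverse can be sketched in a few lines of Python. This is not how Text::Scraper is implemented; it is a minimal illustration that handles only "var" tags, turns each into a lazy (.*?) capture group, and ignores loops, conditionals and "stuff" tags. The function names are hypothetical.

```python
import re

# Matches template tags of the form <?tmpl var NAME ?>
TAG_RE = re.compile(r"<\?tmpl var (\w+) \?>")


def template_to_regex(template):
    """Escape the literal parts of the template and capture each variable lazily."""
    parts = []
    pos = 0
    for tag in TAG_RE.finditer(template):
        parts.append(re.escape(template[pos:tag.start()]))
        parts.append("(?P<%s>.*?)" % tag.group(1))
        pos = tag.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts), re.DOTALL)


def scrape(template, source):
    """Return one dict of captured variables per place the template matches."""
    pattern = template_to_regex(template)
    return [match.groupdict() for match in pattern.finditer(source)]


template = '<li><a href="<?tmpl var link ?>"><?tmpl var name ?></a></li>'
source = (
    '<ul><li><a href="/dist/Foo">Foo</a></li>'
    '<li><a href="/dist/Bar">Bar</a></li></ul>'
)
rows = scrape(template, source)
```

A flat regex like this cannot express nested loops, which is where the hierarchical extraction described above goes beyond plain capture groups.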

Download (0.045MB)
Added: 2007-08-22 License: Perl Artistic License Price:
796 downloads
Local Data Manager 6.6.5

Local Data Manager (LDM) is a collection of cooperating programs that select, capture, manage, and distribute arbitrary data products.
The system is designed for event-driven data distribution, and is currently used in the Unidata Internet Data Distribution (IDD) project. The LDM system includes network client and server programs and their shared protocols.
An important characteristic of the LDM is its support for flexible, site-specific configuration.
Enhancements:
- Fixes for timestamp bugs.
Download (0.61MB)
Added: 2007-08-09 License: BSD License Price:
809 downloads
Data Crow 2.12 / 3.0 Alpha 2

Always wanted to manage all your collections in one product? Want a product you can customize to your needs? Your search ends here! Data Crow lets you create a huge database containing all your collected items. A lot of work? No! Data Crow retrieves information from the web for you, including front covers, screenshots and links to the online information. Follow the easy installation of this free product and see for yourself.
Main features:
- Skinnable UI
- Internal help system (activated by the F1 key)
- Nice-looking and easy-to-use interface
- Highly customizable!
- Keeping track of who borrowed what
- Software registration
- Audio CD registration
- Music files registration
- Movie registration
- Book registration
- Reporting Tool (Html, Pdf, Text)
- Amazon.com support (http://www.amazon.com)
- Imdb support (http://www.imdb.com)
- Freedb support (http://www.freedb.org)
- Imports information from CD or your hard disk
- Extracts information from music files (OGG, FLAC, APE and MP3 files)
- Supports parsing for DivX, Xvid, ASF, MKV, OGM, RIFF, MOV, IFO, VOB and Mpeg video
- Add your own, rename, disable and order fields
- Backup and Restore of the database
- SQL query tool, for expert users
- Platform-independent
- Internal HSQL database
What's New in 2.12 Stable Release:
- Some changes and fixes were made and the overall quality of the product was improved.
What's New in 3.0 Alpha 2 Development Release:
- General fixes were made and missing functionality was added.
Download (16.4MB)
Added: 2007-08-08 License: GPL (GNU General Public License) Price:
887 downloads
Google Data Objective-C Client 1.1.0

Google Data Objective-C Client provides a framework and source code that make it easy to access data through Google Data APIs.
The Google data APIs provide a simple protocol for reading and writing data on the web. Many Google services provide a Google data API.
Each of the following Google services provides a Google data API:
- Base
- Blogger
- Calendar
- Spreadsheets
- Picasa Web Albums
- Notebook
Additional services with Google data APIs that are not yet supported by the Objective-C Client Library:
- Code Search
- Google Apps Provisioning
Download (0.60MB)
Added: 2007-08-08 License: The Apache License 2.0 Price:
810 downloads
themonospot 0.5.1

themonospot is a simple application that can be used to scan an avi file and extract some information about audio and video data flow:
- Video codec used
- Frame size
- Average video bitrate
- File size
- Total time
- Frame rate
- Total frames
- Info data
- User data (in MOVI chunk)
- Audio codec used
- Average audio bitrate
- Audio channels
With themonospot it is also possible to modify FourCC information (the FourCC code in the video chunk and the FourCC description in the stream header).
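
To show where such FourCC values live, here is a hedged Python sketch that walks the RIFF chunk structure of an AVI file and reads the stream type and handler FourCC from each stream header ("strh") chunk. It is not themonospot's code; the function names are hypothetical, and the sample bytes are a synthetic, minimal container built only for the example rather than a playable file.

```python
import struct


def iter_chunks(data, start=0, end=None):
    """Yield (chunk_id, payload) for each RIFF chunk, descending into RIFF/LIST."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        body = pos + 8
        if chunk_id in (b"RIFF", b"LIST"):
            # Containers begin with a 4-byte type (e.g. b"AVI "), then nested chunks.
            yield from iter_chunks(data, body + 4, min(body + size, end))
        else:
            yield chunk_id, data[body:body + size]
        pos = body + size + (size & 1)  # chunks are padded to an even length


def stream_fourccs(data):
    """Return (stream type, handler FourCC) for every stream header found."""
    result = []
    for chunk_id, payload in iter_chunks(data):
        if chunk_id == b"strh" and len(payload) >= 8:
            fcc_type, fcc_handler = struct.unpack_from("<4s4s", payload, 0)
            result.append((fcc_type.decode("ascii"), fcc_handler.decode("ascii")))
    return result


def make_chunk(chunk_id, payload):
    """Build one chunk: id, little-endian size, payload, optional pad byte."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


# Synthetic sample: RIFF 'AVI ' > LIST 'hdrl' > LIST 'strl' > 'strh'
strh = make_chunk(b"strh", b"vids" + b"XVID" + b"\x00" * 48)
strl = make_chunk(b"LIST", b"strl" + strh)
hdrl = make_chunk(b"LIST", b"hdrl" + strl)
sample_avi = make_chunk(b"RIFF", b"AVI " + hdrl)

fourccs = stream_fourccs(sample_avi)
```

For a real file, the same function could be applied to the bytes read from disk; rewriting the FourCC would mean patching those four bytes in place.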
Download (0.093MB)
Added: 2007-08-01 License: GPL (GNU General Public License) Price:
819 downloads
WWW::Myspace::Data 0.13

WWW::Myspace::Data handles database interaction for WWW::Myspace.

SYNOPSIS

This module is the database interface for the WWW::Myspace modules. It imports methods into the caller's namespace which allow the caller to bypass the loader object by calling the methods directly. This module is intended to be used as a back end for the Myspace modules, but it can also be called directly from a script if you need direct database access.

my %db = (
    dsn      => 'dbi:mysql:database_name',
    user     => 'username',
    password => 'password',
);

# create a new object
my $data = WWW::Myspace::Data->new( $myspace, { db => \%db } );

# set up a database connection
my $loader = $data->loader();

# initialize the database with Myspace login info
my $account_id = $data->set_account( $username, $password );

# now do something useful...
my $update = $data->update_friend( $friend_id );

Download (0.016MB)
Added: 2007-07-26 License: Perl Artistic License Price:
824 downloads
CPAN::Mini::Extract 1.16

CPAN::Mini::Extract is a Perl module that can create CPAN::Mini mirrors with the archives extracted.

SYNOPSIS

use CPAN::Mini::Extract;

# Create a CPAN extractor
my $cpan = CPAN::Mini::Extract->new(
    remote         => 'http://mirrors.kernel.org/cpan/',
    local          => '/home/adam/.minicpan',
    trace          => 1,
    extract        => '/home/adam/.cpanextracted',
    # keep only .pm files that are not under inc/ or t/
    extract_filter => sub { /\.pm$/ and ! /\b(inc|t)\b/ },
    extract_check  => 1,
);

# Run the minicpan process
my $changes = $cpan->run;

CPAN::Mini::Extract provides a base for implementing systems that download "all" of CPAN, extract the dists and then process the files within.
It provides the same synchronisation functionality as CPAN::Mini except that it also maintains a parallel directory tree that contains a directory located at an identical path to each archive file, with a controllable subset of the files in the archive extracted below.

How does it work

CPAN::Mini::Extract starts with a CPAN::Mini local mirror, which it will optionally update before each run. Once the CPAN::Mini directory is current, it will scan both directory trees, extracting any new archives and removing any extracted archives no longer in the minicpan mirror.
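
The scan-extract-prune cycle described above can be sketched in Python. This is an illustration of the idea, not the module's implementation: the function names are hypothetical, only .tar.gz archives are considered, and the default filter (keep files ending in .pm) is an assumption chosen to mirror the synopsis.

```python
import os
import shutil
import tarfile


def extract_subset(archive, target, keep):
    """Extract only the regular files accepted by keep() into target."""
    os.makedirs(target)
    base = os.path.realpath(target)
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and keep(member.name)):
                continue
            dest = os.path.realpath(os.path.join(base, member.name))
            if not dest.startswith(base + os.sep):
                continue  # skip members that would land outside the target
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with tar.extractfile(member) as src, open(dest, "wb") as out:
                shutil.copyfileobj(src, out)


def sync_extracted(mirror_dir, extract_dir,
                   keep=lambda name: name.endswith(".pm")):
    """Keep extract_dir in step with mirror_dir: one directory per archive."""
    wanted = set()
    # Pass 1: extract archives that have no directory in the parallel tree yet.
    for root, _dirs, files in os.walk(mirror_dir):
        for fname in files:
            if not fname.endswith(".tar.gz"):
                continue
            archive = os.path.join(root, fname)
            rel = os.path.relpath(archive, mirror_dir)
            wanted.add(rel)
            target = os.path.join(extract_dir, rel)
            if not os.path.isdir(target):
                extract_subset(archive, target, keep)
    # Pass 2: remove extracted directories whose archive left the mirror.
    for root, dirs, _files in os.walk(extract_dir):
        for name in list(dirs):
            if name.endswith(".tar.gz"):
                dirs.remove(name)  # never descend into an extracted archive
                path = os.path.join(root, name)
                if os.path.relpath(path, extract_dir) not in wanted:
                    shutil.rmtree(path)
    return wanted
```

Calling sync_extracted again after the mirror changes only touches new or vanished archives, which is what makes repeated runs cheap.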

Download (0.026MB)
Added: 2007-07-25 License: Perl Artistic License Price:
821 downloads
Data::Phrasebook::Loader::XML 0.12

Data::Phrasebook::Loader::XML Perl module can abstract your phrases with XML.

SYNOPSIS

use Data::Phrasebook;

my $q = Data::Phrasebook->new(
    class  => 'Fnerk',
    loader => 'XML',
    file   => 'phrases.xml',
    dict   => 'Dictionary',    # optional
);

OR

my $q = Data::Phrasebook->new(
    class  => 'Fnerk',
    loader => 'XML',
    file   => {
        file              => 'phrases.xml',
        ignore_whitespace => 1,
    }
);

# simple keyword to phrase mapping
my $phrase = $q->fetch($keyword);

# keyword to phrase mapping with parameters
$q->delimiters( qr{ \[% \s* (\w+) \s* %\] }x );
my $phrase = $q->fetch($keyword, {this => 'that'});
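
The keyword-to-phrase mapping with parameters can be mimicked in Python as a rough sketch. This is not the module's code: the XML layout (phrasebook, dictionary and phrase elements with name attributes), the sample phrases and the function names are assumptions made for the example. The delimiter pattern is the one from the synopsis above.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical phrasebook file contents.
PHRASES_XML = """
<phrasebook>
  <dictionary name="EN">
    <phrase name="greeting">Hello, [% name %]!</phrase>
    <phrase name="farewell">Goodbye.</phrase>
  </dictionary>
</phrasebook>
"""

# Same delimiters as in the synopsis: [% placeholder %]
DELIMITERS = re.compile(r"\[% \s* (\w+) \s* %\]", re.VERBOSE)


def load_phrases(xml_text, dictionary):
    """Return {keyword: phrase} for the named dictionary."""
    root = ET.fromstring(xml_text.strip())
    for node in root.iter("dictionary"):
        if node.get("name") == dictionary:
            return {p.get("name"): (p.text or "") for p in node.iter("phrase")}
    raise KeyError(dictionary)


def fetch(phrases, keyword, params=None):
    """Look up a phrase and substitute each [% key %] placeholder from params."""
    params = params or {}
    return DELIMITERS.sub(lambda m: str(params[m.group(1)]), phrases[keyword])


phrases = load_phrases(PHRASES_XML, "EN")
greeting = fetch(phrases, "greeting", {"name": "World"})
farewell = fetch(phrases, "farewell")
```

Keeping the phrases in a separate file means wording or language can change without touching the code that calls fetch.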

Download (0.017MB)
Added: 2007-07-24 License: Perl Artistic License Price:
822 downloads
Data::Diff 0.01

Data::Diff is a data structure comparison module.

SYNOPSIS

use Data::Diff qw(diff);

# simple procedural interface to raw difference output
$out = diff( $a, $b );

# OO usage
$diff = Data::Diff->new( $a, $b );

$new = $diff->apply();
$changes = $diff->diff_a();

Data::Diff computes the differences between two arbitrarily complex data structures.

METHODS

Creation

new Data::Diff( $a, $b, $options )

Creates and returns a new Data::Diff object with the differences between $a and $b.

Access

apply( $options )

Returns the result of applying one side over the other.

raw()

Returns the internal data structure that describes the differences at all levels within.

Functions

Diff( $a, $b, $options )

Compares the two arguments $a and $b and returns the raw comparison between the two.
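
What a structural comparison of nested data might look like can be sketched in Python. This is a minimal illustration, not Data::Diff's algorithm, and the result layout (the "added", "removed", "old" and "new" keys) is invented for the example rather than taken from the module's raw output.

```python
def diff(a, b):
    """Describe how b differs from a; return None when they are equal."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for key in sorted(set(a) | set(b), key=str):
            if key not in b:
                out[key] = {"removed": a[key]}
            elif key not in a:
                out[key] = {"added": b[key]}
            else:
                sub = diff(a[key], b[key])
                if sub is not None:
                    out[key] = sub
        return out or None
    if isinstance(a, list) and isinstance(b, list):
        out = {}
        for i in range(max(len(a), len(b))):
            if i >= len(b):
                out[i] = {"removed": a[i]}
            elif i >= len(a):
                out[i] = {"added": b[i]}
            else:
                sub = diff(a[i], b[i])
                if sub is not None:
                    out[i] = sub
        return out or None
    # Scalars, or values of different container types.
    return None if a == b else {"old": a, "new": b}


old = {"name": "x", "tags": ["a", "b"], "size": 1}
new = {"name": "x", "tags": ["a", "c", "d"], "color": "red"}
changes = diff(old, new)
```

The recursion mirrors the shape of the input, so the differences are reported at every level rather than as a single flat list.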

EXPORT

Nothing by default but you can choose to export the non-OO function Diff().

Download (0.006MB)
Added: 2007-07-13 License: Perl Artistic License Price:
833 downloads
Data::Serializer 0.41

Data::Serializer package contains modules that serialize data structures.

SYNOPSIS

use Data::Serializer;

$obj = Data::Serializer->new();

$obj = Data::Serializer->new(
    serializer => 'Storable',
    digester   => 'MD5',
    cipher     => 'DES',
    secret     => 'my secret',
    compress   => 1,
);

$serialized   = $obj->serialize({a => [1,2,3], b => 5});
$deserialized = $obj->deserialize($serialized);
print "$deserialized->{b}\n";

Provides a unified interface to the various serializing modules currently available. Adds the functionality of both compression and encryption.

EXAMPLES

Please see Data::Serializer::Cookbook(3)

METHODS

new - constructor
$obj = Data::Serializer->new();


$obj = Data::Serializer->new(
    serializer       => 'Data::Dumper',
    digester         => 'SHA-256',
    cipher           => 'Blowfish',
    secret           => undef,
    portable         => 1,
    compress         => 0,
    serializer_token => 1,
    options          => {},
);

new is the constructor for Data::Serializer objects.

The default serializer is Data::Dumper
The default digester is SHA-256
The default cipher is Blowfish
The default secret is undef
The default portable is 1
The default encoding is hex
The default compress is 0
The default compressor is Compress::Zlib
The default serializer_token is 1
The default options is {} (pass nothing on to serializer)
serialize - serialize reference

$serialized = $obj->serialize({a => [1,2,3],b => 5});

Serializes the reference specified.
Will compress if compress is a true value.
Will encrypt if secret is defined.
deserialize - deserialize reference

$deserialized = $obj->deserialize($serialized);

Reverses the process of serialization and returns a copy of the original serialized reference.

freeze - synonym for serialize
$serialized = $obj->freeze({a => [1,2,3],b => 5});

thaw - synonym for deserialize
$deserialized = $obj->thaw($serialized);

raw_serialize - serialize reference in raw form
$serialized = $obj->raw_serialize({a => [1,2,3],b => 5});

This is a straight pass through to the underlying serializer, nothing else is done. (no encoding, encryption, compression, etc)

raw_deserialize - deserialize reference in raw form
$deserialized = $obj->raw_deserialize($serialized);

This is a straight pass through to the underlying serializer, nothing else is done. (no encoding, encryption, compression, etc)

secret - specify secret for use with encryption
$obj->secret('mysecret');

Changes the setting of secret for the Data::Serializer object. Can also be set in the constructor. If specified, then the object will utilize encryption.

portable - encodes/decodes serialized data

Uses encoding method to ascii armor serialized data

Aids in the portability of serialized data.

compress - compression of data

Compresses serialized data. Default is not to use it. Will compress if set to a true value $obj->compress(1);

serializer - change the serializer

Currently have 8 supported serializers: Storable, FreezeThaw, Data::Denter, Config::General, YAML, PHP::Serialization, XML::Dumper, and Data::Dumper.
Default is to use Data::Dumper.

Each serializer has its own caveats about usage especially when dealing with cyclical data structures or CODE references. Please see the appropriate documentation in those modules for further information.

cipher - change the cipher method

Utilizes Crypt::CBC and can support any cipher method that it supports.

digester - change digesting method

Uses Digest so can support any digesting method that it supports. Digesting function is used internally by the encryption routine as part of data verification.

compressor - changes compressing module

This method is included for the possible future inclusion of alternate compression methods. Currently Compress::Zlib is the only supported compressor.

encoding - change encoding method

Encodes data structure in ascii friendly manner. Currently the only valid options are hex, or b64.

The b64 option uses Base64 encoding provided by MIME::Base64, but strips out newlines.

serializer_token - add usage hint to data

Data::Serializer prepends a token that identifies what was used to process its data. This is used internally to allow runtime determination of how to extract Serialized data. Disabling this feature is not recommended.
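
How a prepended token lets the reader decide at runtime how to unpack the data can be sketched in Python. This is not Data::Serializer's wire format: the token layout "^serializer|compressor|encoding^", the use of JSON as the serializer and the function names are all assumptions for the example, and encryption is left out because it would need a third-party module.

```python
import json
import zlib


def serialize(obj, compress=False):
    """Serialize to JSON, optionally compress, hex-encode, and prepend a token."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    if compress:
        payload = zlib.compress(payload)
    # Hypothetical token layout: ^serializer|compressor|encoding^
    token = "^json|%s|hex^" % ("zlib" if compress else "")
    return token + payload.hex()


def deserialize(text):
    """Read the token first, then undo each step it names, in reverse order."""
    if not text.startswith("^"):
        raise ValueError("missing token")
    header, sep, body = text[1:].partition("^")
    if not sep:
        raise ValueError("unterminated token")
    serializer, compressor, encoding = header.split("|")
    if serializer != "json" or encoding != "hex":
        raise ValueError("unsupported token: %r" % header)
    payload = bytes.fromhex(body)
    if compressor == "zlib":
        payload = zlib.decompress(payload)
    return json.loads(payload.decode("utf-8"))


data = {"a": [1, 2, 3], "b": 5}
plain = serialize(data)
packed = serialize(data, compress=True)
```

Because the token travels with the data, the same deserialize call handles both the compressed and the uncompressed string without being told which one it received.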

options - pass options through to underlying serializer

Currently is only supported by Config::General, and XML::Dumper.

my $obj = Data::Serializer->new(serializer => 'Config::General',
    options => {
        -LowerCaseNames       => 1,
        -UseApacheInclude     => 1,
        -MergeDuplicateBlocks => 1,
        -AutoTrue             => 1,
        -InterPolateVars      => 1
    },
) or die "$!\n";

or

my $obj = Data::Serializer->new(serializer => 'XML::Dumper',
    options => { dtd => 1, }
) or die "$!\n";

store - serialize data and write it to a file (or file handle)
$obj->store({a => [1,2,3],b => 5},$file, [$mode, $perm]);

or

$obj->store({a => [1,2,3],b => 5},$fh);

Serializes the reference specified using the serialize method and writes it out to the specified file or filehandle.

If a file path is specified you may specify an optional mode and permission as the next two arguments. See IO::File for examples.

Trips an exception if it is unable to write to the specified file.

retrieve - read data from file (or file handle) and return it after deserialization

my $ref = $obj->retrieve($file);

or

my $ref = $obj->retrieve($fh);

Reads first line of supplied file or filehandle and returns it deserialized.

Download (0.025MB)
Added: 2007-07-12 License: Perl Artistic License Price:
834 downloads
Data::TreeDumper 0.33

Data::TreeDumper is an improved replacement for Data::Dumper. Powerful filtering capability.

SYNOPSIS

use Data::TreeDumper ;

my $sub = sub {} ;

my $s =
    {
    A =>
        {
        a =>
            {
            }
        , bbbbbb => $sub
        , c123   => $sub
        , d      => \$sub
        }

    , C =>
        {
        b =>
            {
            a =>
                {
                a =>
                    {
                    }

                , b => sub
                    {
                    }
                , c => 42
                }

            }
        }
    , ARRAY => [qw(elment_1 element_2 element_3)]
    } ;


#-------------------------------------------------------------------
# package setup data
#-------------------------------------------------------------------

$Data::TreeDumper::Useascii = 0 ;
$Data::TreeDumper::Maxdepth = 2 ;

print DumpTree($s, 'title') ;
print DumpTree($s, 'title', MAX_DEPTH => 1) ;
print DumpTrees
    (
      [$s, "title", MAX_DEPTH => 1]
    , [$s2, "other_title", DISPLAY_ADDRESS => 0]
    , USE_ASCII => 1
    , MAX_DEPTH => 5
    ) ;

Output:

title:
|- A [H1]
|  |- a [H2]
|  |- bbbbbb = CODE(0x8139fa0) [C3]
|  |- c123 [C4 -> C3]
|  `- d [R5]
|     `- REF(0x8139fb8) [R5 -> C3]
|- ARRAY [A6]
|  |- 0 [S7] = elment_1
|  |- 1 [S8] = element_2
|  `- 2 [S9] = element_3
`- C [H10]
   `- b [H11]
      `- a [H12]
         |- a [H13]
         |- b = CODE(0x81ab130) [C14]
         `- c [S15] = 42
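
The tree layout shown in the output can be reproduced for plain nested data with a short Python sketch. It is an illustration of the drawing technique only, not a port of Data::TreeDumper: it prints no type tags or addresses, the function names are hypothetical, and max_depth is a simplified stand-in for the MAX_DEPTH option.

```python
def dump_tree(data, title="", max_depth=None):
    """Render nested dicts and lists as an ASCII tree and return it as a string."""
    lines = [title + ":"] if title else []
    _walk(data, "", 0, max_depth, lines)
    return "\n".join(lines)


def _walk(node, prefix, depth, max_depth, lines):
    if isinstance(node, dict):
        items = [(str(k), v) for k, v in sorted(node.items(), key=lambda kv: str(kv[0]))]
    elif isinstance(node, (list, tuple)):
        items = [(str(i), v) for i, v in enumerate(node)]
    else:
        return  # scalars are printed by their parent
    if max_depth is not None and depth >= max_depth:
        return
    for index, (key, value) in enumerate(items):
        last = index == len(items) - 1
        branch = "`- " if last else "|- "
        if isinstance(value, (dict, list, tuple)):
            lines.append(prefix + branch + key)
        else:
            lines.append(prefix + branch + key + " = " + repr(value))
        # Children of the last entry get blank padding instead of a vertical bar.
        _walk(value, prefix + ("   " if last else "|  "), depth + 1, max_depth, lines)


sample = {"A": {"a": {}, "c": 42}, "ARRAY": ["x", "y"]}
full = dump_tree(sample, "title")
shallow = dump_tree(sample, "title", max_depth=1)
```

The only state carried down the recursion is the prefix string, which is what keeps the vertical bars aligned under their parents.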

Download (0.026MB)
Added: 2007-07-06 License: Perl Artistic License Price:
840 downloads