
Free "extract data from" software for Linux

Results 1 - 15 of about 5312
PDFMiner 20090721

PDFMiner is a suite of programs that help extract and analyze text data from PDF documents. Unlike other PDF-related tools, it can obtain the exact location of text on a page, along with extra information such as fonts and ruled lines.

It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It also has an extensible PDF parser that can be used for purposes other than text analysis.

Major Features:

  1. Written entirely in Python (version 2.4 or newer).
  2. PDF-1.7 specification support. (well, almost)
  3. Non-ASCII languages and vertical writing scripts support.
  4. Various font types (Type1, TrueType, Type3, and CID) support.
  5. Basic encryption (RC4) support.
  6. PDF to HTML conversion (with a sample converter web app).
  7. Outline (TOC) extraction.
  8. Tagged contents extraction.
  9. Infers text flow using a clustering technique.
Requirements:
  • Python
Added: 2009-07-22 License: MIT/X Consortium Lic... Price: FREE
13 downloads
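Feature 9 above mentions inferring text flow with a clustering technique. PDFMiner's real layout analysis is more elaborate; the following is only a minimal Python sketch of the general idea, assuming each text fragment is a hypothetical (x, y, text) tuple with y measured upward from the bottom of the page, and that fragments whose baselines differ by at most a fixed tolerance belong to the same line.

```python
from typing import List, Tuple

# A text fragment: (x, y, text). Coordinates follow the PDF convention,
# where y grows upward, so lines higher on the page have larger y values.
Fragment = Tuple[float, float, str]


def group_into_lines(fragments: List[Fragment], y_tolerance: float = 2.0) -> List[str]:
    """Cluster fragments into lines by baseline proximity, then read each line left to right."""
    # Visit fragments from the top of the page down.
    ordered = sorted(fragments, key=lambda f: -f[1])
    lines: List[List[Fragment]] = []
    for frag in ordered:
        # Compare against the first fragment of the current line cluster.
        if lines and abs(lines[-1][0][1] - frag[1]) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])  # start a new line
    # Within each line, order fragments by x and join them with spaces.
    return [" ".join(f[2] for f in sorted(line, key=lambda f: f[0])) for line in lines]


page = [
    (150.0, 700.5, "world"),
    (72.0, 700.0, "Hello"),
    (72.0, 686.0, "Second"),
    (140.0, 685.2, "line"),
]
print(group_into_lines(page))  # ['Hello world', 'Second line']
```

A fixed tolerance is the simplest possible clustering rule; real documents with mixed font sizes usually need a tolerance relative to character height.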
LogMiner 1.23

LogMiner is a log analysis package for Apache (or any other web server that uses the combined log format). LogMiner can extract and present several reports about visits, hits, traffic, requests, navigation paths, the browsers and operating systems used by visitors, and so on. Data is stored in a PostgreSQL database, using a schema optimized to keep redundancy to a minimum.

Major Features:

  1. Data is stored in a DBMS backend and reports are generated on the fly, whereas Webalizer generates plain HTML files. A DBMS lets you extract and aggregate data in many ways, whenever you need to. The drawback is that log files are not parsed as fast as Webalizer parses them.
  2. LogMiner makes it easy to navigate to previous months.
  3. Webalizer reports are hardcoded in the program; LogMiner implements reports in a more extensible way. Each report is in fact a simple PHP class, usually backed by a PL/pgSQL function (although you're free to put your SQL queries in the PHP code if you like).
  4. LogMiner offers more reports than Webalizer: for instance, the OS charts and the navigation graphs.
  5. Depending on your needs, you might prefer LogMiner over Webalizer, especially if you like having a central SQL repository for your data, which lets you extract the data you need at any time, or add a kind of report that wasn't planned from the start and apply it to older data.
Added: 2009-06-30 License: GPL Price: FREE
11 downloads
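LogMiner stores parsed log data in PostgreSQL and builds its reports with PL/pgSQL and PHP. As a rough illustration of the first step only, here is a Python sketch that parses lines in Apache's combined log format and aggregates hits per path, distinct client hosts and bytes served, all in memory. The regular expression and the summary fields are assumptions for illustration, not LogMiner's own parser or schema.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize_log(lines: Iterable[str]) -> Dict[str, object]:
    """Count hits per path, distinct hosts and bytes sent; skip lines that don't parse."""
    hits: Counter = Counter()
    hosts = set()
    total_bytes = 0
    skipped = 0
    for line in lines:
        m = COMBINED_RE.match(line)
        if m is None:
            skipped += 1
            continue
        hits[m.group("path")] += 1
        hosts.add(m.group("host"))
        if m.group("size") != "-":  # "-" means no response body was sent
            total_bytes += int(m.group("size"))
    return {"hits": hits, "hosts": len(hosts), "bytes": total_bytes, "skipped": skipped}


sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"',
]
print(summarize_log(sample))
```

Keeping parsing separate from aggregation mirrors the trade-off described above: once the parsed rows sit in a database, new reports can be computed later without re-reading the log files.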
 
Other version of LogMiner:
LogMiner 1.20
License:GPL (GNU General Public License)
Download (0.20MB)
810 downloads
Added: 2007-08-10
cb2Bib 1.3.1

cb2Bib is a tool for rapidly extracting unformatted bibliographic references from email alerts.

cb2Bib reads the text contents of the clipboard and processes them against a set of predefined patterns. If this automatic detection succeeds, cb2Bib formats the clipboard data according to the structured BibTeX reference standard.

Otherwise, if no predefined format pattern is found or if detection proves difficult, cb2Bib greatly simplifies manual data extraction. In most cases, manual extraction also yields a new, personalized pattern that can be added to the predefined pattern set for future automatic extractions.

Once the bibliographic reference is correctly extracted, it is added to a specified BibTeX database file. Optionally, article PDF files, if available, are renamed to their citeID and moved to a chosen directory that serves as a personal article library.

Major Features:

  1. Select the reference to import from the email or web browser: On Unix machines, cb2Bib automatically detects mouse selections and clipboard changes. On Windows machines, copy or Ctrl-C is necessary to activate cb2Bib automatic processing.
  2. cb2Bib automatic processing: Once text is selected, cb2Bib initiates automatic reference extraction, using the predefined patterns in the file regexp.txt. See the Configuring Files section for setting the user's predefined pattern-matching expression file. After a successful detection, the bibliographic fields appear in the cb2Bib item line edits. Manual editing is possible at this stage.
  3. cb2Bib manual processing: If no predefined format pattern is found, or if detection proves difficult, manual data extraction must be performed. Select individual reference items from the cb2Bib clipboard area. A popup menu will appear after the selection is made; choose the corresponding bibliographic field. See BibTeX Entry Types available as cb2Bib fields. The selection is post-processed and added to the cb2Bib item line edit, and cb2Bib field tags will show in the cb2Bib clipboard area. Once manual processing is done, the cb2Bib clipboard area will contain the matching pattern. The pattern can be further edited and stored in the regexp.txt file using Insert Regular Expression, Alt+I. See the Extracting Data from the Clipboard and The Regular Expression Editor sections.
  4. Download reference to cb2Bib: cb2Bib has built-in functionality to interact with publishers' "Download reference to Citation Manager" service. Choose BibTeX format, or any other format that you can translate using the External Clipboard Preparsing Command (see Additional Keyboard Functionality, Alt C). Click "Download" in your browser. When asked "Open with...", select cb2Bib. cb2Bib will be launched if no running instance is found. If it is already running, it will place the downloaded reference on the clipboard and start processing it. Make sure your running instance is aware of clipboard changes (see Buttons Functionality). For convenience, the shell script c2bimport and the desktop config file c2bimport.desktop are also provided.
  5. Adding documents: PDF and other documents can be added to the BibTeX reference by dragging the file icon and dropping it onto cb2Bib's panel. Optionally, document files are renamed to their citeID and moved to a chosen directory that serves as a personal article library (see the Configuring Documents section). Documents linked to a reference are recorded in its BibTeX file tag, so most reference manager software can retrieve and display them. Downloading, copying and/or moving is scheduled and performed once the reference is accepted, i.e., once it is saved by pressing the Save Reference button.
  6. Multiple retrieving from PDF files: Multiple PDF files, or other files convertible to text, can be processed sequentially by dragging them into cb2Bib's PDFImport dialog. Once processing is started, the files are converted to text one at a time and sent to the cb2Bib clipboard panel for reference extraction. See PDF Reference Import for details.
  7. Journal-Volume-Page Queries: Takes the Journal, Volume, and first page from the corresponding line edits and attempts to complete the reference. Queries additionally consider Title, DOI, and an excerpt, which is a simplified version of the clipboard panel contents. See the Configuring Network section, the distribution file netqinf.txt, and Release Note cb2Bib 0.3.5 for customization and details.
  8. BibTeX Editor: cb2Bib includes a practical text editor suitable for corrections and additions. cb2Bib's capabilities are readily available within the editor. For example, a reference is first sent to cb2Bib by selecting it, and later retrieved from cb2Bib into the editor using 'right click' + 'Paste Current BibTeX'. Conversion between Unicode and LaTeX, conversion between long and abbreviated journal names, and adding/renaming PDF files are easily available. The BibTeX Editor is also accessible from the shell command line.
  9. Advanced features, and processing and extraction details are described in the following sections:
    • Extracting Data from the Clipboard
    • Processing of author's names
    • Processing of journal names
    • Field Recognition Rules
    • The Regular Expression Editor
  10. Configuration information is described in the following sections:
    • Configuration
    • Predefined cite ID placeholders
  11. Utilities and modules are described in the following sections:
    • Search BibTeX files for references
    • Embedded File Editor
    • PDF Reference Import
    • The cb2Bib Command Line
    • Reading and writing bibliographic metadata
    • The cb2Bib Annote
    • The cb2Bib Citer

Enhancements:

  • Added Check Repeated functionality for current reference
  • Fixed parser not processing last field in inverted comma style BibTeX
  • Set netqinf.txt to use internal XML parser for PubMed
  • Fixed packaging, double copying scripts and initial external tool setting
  • Fixed c2bciter script not passing all arguments (Thanks to F. Rusconi)

Requirements:

To compile cb2Bib, the following libraries must be present and accessible:

  • Qt 4.4.0 or higher from Trolltech. On a Linux platform with Qt preinstalled, make sure that the devel packages and Qt tools are also present.
  • WebKit library (optional) to compile the cb2Bib Annote viewer. It is already included in the Qt > 4.4.0 library; no special action or flag is needed during compilation.
  • X11 header files if compiling on Unix platforms. Specifically, the headers X11/Xlib.h and X11/Xatom.h are needed.
  • The header files fcntl.h and unistd.h from the glibc-devel package are also required. Otherwise, compilation will fail with reference list.cpp:227: `close' undeclared.
Added: 2009-06-30 License: GPL Price: FREE
14 downloads
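The automatic processing described above boils down to matching clipboard text against a list of patterns and, on success, emitting a BibTeX entry. The Python sketch below shows that idea with one hypothetical pattern whose named groups are BibTeX field names; cb2Bib's real regexp.txt format and its field post-processing (author and journal name handling) are different and much richer. A return value of None corresponds to the case where manual extraction is needed.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern set: (BibTeX entry type, regex). Each named group is a
# BibTeX field name. cb2Bib's regexp.txt uses its own, richer format.
PATTERNS: List[Tuple[str, re.Pattern]] = [
    ("article", re.compile(
        r"(?P<author>[^.]+)\. (?P<title>[^.]+)\. (?P<journal>[^,]+), "
        r"(?P<volume>\d+)\((?P<year>\d{4})\) (?P<pages>\d+-\d+)\.")),
]


def to_bibtex(text: str, cite_id: str) -> Optional[str]:
    """Try each pattern in order; format the first match as a BibTeX entry."""
    for entry_type, pattern in PATTERNS:
        m = pattern.search(text.strip())
        if m:
            fields = ",\n".join(f"  {name} = {{{value.strip()}}}"
                                for name, value in m.groupdict().items())
            return f"@{entry_type}{{{cite_id},\n{fields}\n}}"
    return None  # no pattern matched: fall back to manual extraction


print(to_bibtex("Smith J and Jones A. A study of things. Journal of Things, 12(2001) 100-110.",
                "Smith2001"))
```

Trying patterns in order and stopping at the first match keeps the behavior predictable as the user's pattern set grows.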
AbiWord for linux 2.4.6

AbiWord is a free, full-featured word processor originally developed by the SourceGear Corporation and now maintained by an open group of volunteers.
Today AbiWord compiles as a native application on a wide range of computer platforms and can handle an equally impressive number of file formats. AbiWord's feature set includes almost everything one would expect in a modern word processor, plus numerous advanced features that allow it to compete successfully with many proprietary word processors. A short list of features includes:
- A familiar interface
- Outstanding file import and export, with support for MS Word, WordPerfect, and more
- Unlimited undo and redo capacity
- Solid (X)HTML export, with CSS styles support
- Images
- Spelling support, with optional underlining
- Bullets and Lists
- Styles
- Table of Contents generation and customization through the Stylist
- Complete, intuitive revisions-tracking support
- Nested tables support, nearly unmatched in the field
- Mail merge
- Bidirectional text support
- Command-line and server modes for document processing
One of the most lasting differences between AbiWord and most word processors is the default file format. Unlike documents saved by many competing word processors, AbiWord documents are written as plain, readable text with XML markup, so any text editor can be used to view them. With this style of data storage, your data stays safe and readable even without the AbiWord program that created it. Users are even free to write their own programs to parse the AbiWord markup and extract data from it.
Download (3.51MB)
Added: 2009-04-02 License: Freeware Price: Free
204 downloads
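Because AbiWord saves documents as XML, a general-purpose XML parser is enough to pull the text out. The sketch below uses Python's xml.etree.ElementTree on a simplified, hypothetical AbiWord-style document; it assumes paragraphs are p elements with character runs nested inside them, and it strips any XML namespace before comparing tag names. Real .abw files contain much more structure and metadata than this sample.

```python
import xml.etree.ElementTree as ET
from typing import List

# Simplified, hypothetical AbiWord-style document (real .abw files carry
# styles, metadata and more).
SAMPLE = """<abiword xmlns="http://www.abisource.com/awml.dtd">
<section>
<p>Hello, <c props="font-weight:bold">AbiWord</c>!</p>
<p>Second paragraph.</p>
</section>
</abiword>"""


def local_name(tag: str) -> str:
    """Drop an ElementTree namespace prefix: '{uri}p' -> 'p'."""
    return tag.rsplit("}", 1)[-1]


def paragraph_texts(xml_text: str) -> List[str]:
    """Return the plain text of every <p> element, including nested runs."""
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()) for el in root.iter() if local_name(el.tag) == "p"]


print(paragraph_texts(SAMPLE))  # ['Hello, AbiWord!', 'Second paragraph.']
```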
Text::Scraper 0.02

Text::Scraper extracts structured data from (un)structured text.

SYNOPSIS

use strict;
use warnings;

use Text::Scraper;
use LWP::Simple;
use Data::Dumper;

#
# 1. Get our template and source text
#
my $tmpl = Text::Scraper->slurp(*DATA);
my $src = get('http://search.cpan.org/recent') || die "Could not fetch the page\n";

#
# 2. Extract data from source
#
my $obj = Text::Scraper->new(tmpl => $tmpl);
my $data = $obj->scrape($src);

#
# 3. Do something really neat... (left as an exercise)
#
print "Newest Submission: ", $data->[0]{submissions}[0]{name}, "\n\n";
print "Scraper model:\n", Dumper($obj), "\n\n";
print "Parsed model:\n", Dumper($data), "\n\n";

__DATA__

<div class=path><center><table><tr>
<?tmpl stuff pre_nav ?>
<td class=datecell><span><big><b> <?tmpl var date_string ?> </b></big></span></td>
<?tmpl stuff post_nav ?>
</tr></table></center></div>

<ul>
<?tmpl loop submissions ?>
<li><a href="<?tmpl var link ?>"><?tmpl var name ?></a>
<?tmpl if has_description ?>
<small> -- <?tmpl var description ?></small>
<?tmpl end has_description ?>
</li>
<?tmpl end submissions ?>
</ul>

ABSTRACT

Text::Scraper provides a fully functional base class for quickly developing screen scrapers and other text extraction tools. Programmatically generated text, such as dynamic web pages, can be trivially reverse-engineered.

Using templates, the programmer is freed from staring at fragile, heavily escaped regular expressions, mapping capture groups to named variables or wrestling with the DOM and badly formed HTML. In addition, extracted data can be hierarchical, which is beyond the capabilities of vanilla regular expressions.

Text::Scraper's functionality overlaps with some existing CPAN modules: Template::Extract and WWW::Scraper.
Text::Scraper is much more lightweight than either and has a more general application domain than the latter. It has no dependencies on other frameworks, modules or design-decisions. On average, Text::Scraper benchmarks around 250% faster than Template::Extract - and uses significantly less memory.

Unlike both existing modules, Text::Scraper generalizes its functionality to allow the programmer to refine template capture groups beyond (.*?), fully redefine the template syntax and introduce new template constructs bound to custom classes.

Download (0.045MB)
Added: 2007-08-22 License: Perl Artistic License Price:
796 downloads
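The core idea of template-driven extraction, turning a literal template with named holes into a pattern whose holes become non-greedy (.*?) capture groups, can be sketched in a few lines of Python. This is not Text::Scraper's API or template syntax: the {name} placeholder syntax is an assumption, and the sketch handles only flat, repeated records, not the loops and conditionals that make Text::Scraper's output hierarchical.

```python
import re
from typing import Dict, List

# Hypothetical placeholder syntax: {name}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def compile_template(template: str) -> re.Pattern:
    """Escape the literal parts of the template and turn each {name} into a lazy named group."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))
        parts.append(f"(?P<{m.group(1)}>.*?)")
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    # DOTALL lets a field span line breaks. A template should end with literal
    # text; a trailing lazy group would always match the empty string.
    return re.compile("".join(parts), re.DOTALL)


def scrape(template: str, source: str) -> List[Dict[str, str]]:
    """Return one dict per non-overlapping occurrence of the template in source."""
    return [m.groupdict() for m in compile_template(template).finditer(source)]


html = '<li><a href="/a">Alpha</a></li>\n<li><a href="/b">Beta</a></li>'
print(scrape('<li><a href="{link}">{name}</a></li>', html))
# [{'link': '/a', 'name': 'Alpha'}, {'link': '/b', 'name': 'Beta'}]
```

Escaping the literal text is what frees the user from writing regular expressions by hand: the template is just a copy of the page markup with holes punched in it.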
Local Data Manager 6.6.5

Local Data Manager (LDM) is a collection of cooperating programs that select, capture, manage, and distribute arbitrary data products.
The system is designed for event-driven data distribution, and is currently used in the Unidata Internet Data Distribution (IDD) project. The LDM system includes network client and server programs and their shared protocols.
An important characteristic of the LDM is its support for flexible, site-specific configuration.
Enhancements:
- Fixes for timestamp bugs.
Download (0.61MB)
Added: 2007-08-09 License: BSD License Price:
809 downloads
Data Crow 2.12 / 3.0 Alpha 2

Always wanted to manage all your collections in one product? Want a product you can customize to your needs? Your search ends here! Data Crow lets you create a huge database containing all your collected items. A lot of work? No! Data Crow retrieves information from the web for you, including front covers, screenshots and links to online information. Follow the easy installation of this free product and see for yourself.
Main features:
- Skinnable UI
- Internal help system (activated by the F1 key)
- Nice-looking and easy-to-use interface
- Highly customizable!
- Keeping track of who borrowed what
- Software registration
- Audio CD registration
- Music files registration
- Movie registration
- Book registration
- Reporting Tool (HTML, PDF, Text)
- Amazon.com support (http://www.amazon.com)
- IMDb support (http://www.imdb.com)
- Freedb support (http://www.freedb.org)
- Imports information from CD or your hard disk
- Extracts information from music files (OGG, FLAC, APE and MP3 files)
- Supports parsing for DivX, Xvid, ASF, MKV, OGM, RIFF, MOV, IFO, VOB and MPEG video
- Add your own, rename, disable and order fields
- Backup and Restore of the database
- SQL query tool, for expert users
- Platform-independent
- Internal HSQL database
What's New in 2.12 Stable Release:
- Some changes and fixes were made and the overall quality of the product was improved.
What's New in 3.0 Alpha 2 Development Release:
- General fixes were made and missing functionality was added.
Download (16.4MB)
Added: 2007-08-08 License: GPL (GNU General Public License) Price:
887 downloads
Google Data Objective-C Client 1.1.0

Google Data Objective-C Client provides a framework and source code that make it easy to access data through Google Data APIs.
The Google data APIs provide a simple protocol for reading and writing data on the web. Many Google services provide a Google data API.
Each of the following Google services provides a Google data API:
- Base
- Blogger
- Calendar
- Spreadsheets
- Picasa Web Albums
- Notebook
Additional services with Google data APIs that are not yet supported by the Objective-C Client Library:
- Code Search
- Google Apps Provisioning
Download (0.60MB)
Added: 2007-08-08 License: The Apache License 2.0 Price:
810 downloads
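The Google Data APIs are built on the Atom syndication format, so a generic Atom parser can read the feeds they return. As a hedged illustration that is independent of the Objective-C client library, this Python sketch lists entry titles and alternate links from a small hand-written Atom feed; the sample feed is hypothetical and omits the extra elements that real Google Data feeds carry.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed, trimmed to the elements the sketch reads.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example calendar</title>
  <entry>
    <title>Team meeting</title>
    <link rel="alternate" href="http://example.com/events/1"/>
  </entry>
  <entry>
    <title>Lunch</title>
  </entry>
</feed>"""


def list_entries(feed_xml: str) -> List[Tuple[str, str]]:
    """Return (title, alternate link) for each <entry>; missing parts become ''."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        link = entry.find("atom:link[@rel='alternate']", ATOM)
        entries.append((title, link.get("href", "") if link is not None else ""))
    return entries


print(list_entries(FEED))
# [('Team meeting', 'http://example.com/events/1'), ('Lunch', '')]
```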
themonospot 0.5.1

themonospot is a simple application that scans an AVI file and extracts information about its audio and video data streams:
- Video codec used
- Frame size
- Average video bitrate
- File size
- Total time
- Frame rate
- Total frames
- Info data
- User data (in MOVI chunk)
- Audio codec used
- Average audio bitrate
- Audio channels
themonospot can also modify FourCC information (the FourCC code in the video chunk and the FourCC description in the stream header).
Download (0.093MB)
Added: 2007-08-01 License: GPL (GNU General Public License) Price:
819 downloads
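Several of the values listed above come from the AVI file's RIFF headers. A minimal Python sketch, assuming a well-formed file already read into memory, decodes the main AVI header (the 'avih' chunk) to derive frame rate, frame count, frame size and duration. It locates the chunk with a simple byte search instead of walking the full RIFF chunk tree, and it does not read the per-stream headers needed for codecs, FourCC codes or bitrates, so it is a starting point rather than themonospot's implementation.

```python
import struct
from typing import Dict


def read_avi_main_header(data: bytes) -> Dict[str, float]:
    """Decode the start of the MainAVIHeader ('avih' chunk) of an in-memory AVI file."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"AVI ":
        raise ValueError("not a RIFF AVI file")
    # Shortcut: search for the chunk ID instead of walking the RIFF tree.
    pos = data.find(b"avih", 12)
    if pos < 0 or pos + 8 + 40 > len(data):
        raise ValueError("avih chunk not found or truncated")
    # Skip the 4-byte chunk ID and 4-byte size, then read ten little-endian DWORDs.
    (usec_per_frame, _max_bytes_per_sec, _padding, _flags, total_frames,
     _initial_frames, streams, _suggested_buffer, width, height) = struct.unpack_from(
        "<10I", data, pos + 8)
    fps = 1_000_000 / usec_per_frame if usec_per_frame else 0.0
    return {
        "frame_rate": fps,
        "total_frames": total_frames,
        "streams": streams,
        "width": width,
        "height": height,
        "duration_s": total_frames / fps if fps else 0.0,
    }


# Build a tiny synthetic file: 25 fps, 250 frames, 640x480, 2 streams.
avih = struct.pack("<14I", 40000, 0, 0, 0, 250, 0, 2, 0, 640, 480, 0, 0, 0, 0)
chunk = b"avih" + struct.pack("<I", len(avih)) + avih
hdrl = b"LIST" + struct.pack("<I", 4 + len(chunk)) + b"hdrl" + chunk
riff = b"RIFF" + struct.pack("<I", 4 + len(hdrl)) + b"AVI " + hdrl
print(read_avi_main_header(riff))
```

For a real file, reading the first few kilobytes (for example with open(path, "rb").read(65536)) is normally enough, because the header list comes near the start.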
Exif Viewer 1.27 for Firefox

Exif Viewer is a Firefox extension that displays Exif data.

Extracts and displays the Exchangeable Image File (Exif) data in local and remote JPEG images, as stored by many (but not all) digital still cameras.

Download (0.053MB)
Added: 2007-07-26 License: MPL (Mozilla Public License) Price:
835 downloads
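In a JPEG file, Exif data normally lives in an APP1 segment whose payload begins with "Exif" followed by two NUL bytes and then a TIFF-structured block. The Python sketch below walks the JPEG marker segments until it finds that segment and returns the TIFF block; decoding the individual tags (the IFDs) is left out, and unusual files, for example with fill bytes between segments, are not handled. It is an illustration of the file layout, not Exif Viewer's code.

```python
import struct
from typing import Optional


def find_exif_block(jpeg: bytes) -> Optional[bytes]:
    """Return the TIFF-structured Exif block from a JPEG's APP1 segment, or None."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image: no more metadata
            break
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
        pos += 2 + length
    return None


def byte_order(tiff: bytes) -> str:
    """The TIFF header starts with b'II' (little-endian) or b'MM' (big-endian)."""
    return {b"II": "little", b"MM": "big"}[tiff[:2]]


# Synthetic JPEG: SOI, a JFIF APP0 segment, an Exif APP1 segment, SOS, EOI.
tiff = b"II*\x00" + struct.pack("<I", 8)
exif_payload = b"Exif\x00\x00" + tiff
app0 = b"\xff\xe0" + struct.pack(">H", 2 + 5) + b"JFIF\x00"
app1 = b"\xff\xe1" + struct.pack(">H", 2 + len(exif_payload)) + exif_payload
jpeg = b"\xff\xd8" + app0 + app1 + b"\xff\xda" + b"\x00" * 4 + b"\xff\xd9"
print(byte_order(find_exif_block(jpeg)))  # little
```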
WWW::Myspace::Data 0.13

WWW::Myspace::Data handles database interaction for WWW::Myspace.

SYNOPSIS

This module is the database interface for the WWW::Myspace modules. It imports methods into the caller's namespace, which allows the caller to bypass the loader object by calling the methods directly. This module is intended to be used as a back end for the Myspace modules, but it can also be called directly from a script if you need direct database access.

use WWW::Myspace::Data;

my %db = (
    dsn      => 'dbi:mysql:database_name',
    user     => 'username',
    password => 'password',
);

# create a new object ($myspace is assumed to be an existing WWW::Myspace object)
my $data = WWW::Myspace::Data->new( $myspace, { db => \%db } );

# set up a database connection
my $loader = $data->loader();

# initialize the database with Myspace login info
# ($username, $password and $friend_id are assumed to be defined elsewhere)
my $account_id = $data->set_account( $username, $password );

# now do something useful...
my $update = $data->update_friend( $friend_id );

Download (0.016MB)
Added: 2007-07-26 License: Perl Artistic License Price:
824 downloads
CPAN::Mini::Extract 1.16

CPAN::Mini::Extract is a Perl module that can create CPAN::Mini mirrors with the archives extracted.

SYNOPSIS

use CPAN::Mini::Extract;

# Create a CPAN extractor
my $cpan = CPAN::Mini::Extract->new(
    remote         => 'http://mirrors.kernel.org/cpan/',
    local          => '/home/adam/.minicpan',
    trace          => 1,
    extract        => '/home/adam/.cpanextracted',
    extract_filter => sub { /\.pm$/ and ! /\b(inc|t)\b/ },
    extract_check  => 1,
);

# Run the minicpan process
my $changes = $cpan->run;

CPAN::Mini::Extract provides a base for implementing systems that download "all" of CPAN, extract the dists and then process the files within.
It provides the same synchronisation functionality as CPAN::Mini, except that it also maintains a parallel directory tree containing, for each archive file, a directory at the identical path, below which a controllable subset of the archive's files is extracted.

How does it work

CPAN::Mini::Extract starts with a CPAN::Mini local mirror, which it will optionally update before each run. Once the CPAN::Mini directory is current, it will scan both directory trees, extracting any new archives and removing any extracted archives no longer in the minicpan mirror.

Download (0.026MB)
Added: 2007-07-25 License: Perl Artistic License Price:
821 downloads
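Because each extracted directory sits at the same relative path as its archive, deciding what to extract and what to remove reduces to two set differences. The following Python sketch shows that bookkeeping, plus a Python counterpart of the extract_filter in the synopsis (keep .pm files whose paths contain no standalone inc or t word). The archive suffix list is an assumption, and the actual downloading and unpacking are omitted.

```python
import re
from typing import Iterable, Set, Tuple

# Assumed archive types; CPAN::Mini::Extract's own list may differ.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".zip")


def plan_sync(mirror_files: Iterable[str], extracted_dirs: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_extract, to_remove), both as paths relative to the two tree roots."""
    archives = {p for p in mirror_files if p.endswith(ARCHIVE_SUFFIXES)}
    extracted = set(extracted_dirs)
    # Each extracted directory has the same relative path as its archive,
    # so plain set differences find the new and the stale entries.
    return archives - extracted, extracted - archives


def wanted(member: str) -> bool:
    """Counterpart of the synopsis's extract_filter: .pm files without a standalone 'inc' or 't' word."""
    return member.endswith(".pm") and not re.search(r"\b(inc|t)\b", member)


new, stale = plan_sync(
    ["authors/id/E/EX/EXAMPLE/Foo-1.0.tar.gz", "modules/02packages.details.txt.gz"],
    ["authors/id/E/EX/EXAMPLE/Old-0.1.tar.gz"],
)
print(sorted(new), sorted(stale))
```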
Data::Phrasebook::Loader::XML 0.12

Data::Phrasebook::Loader::XML is a Perl module that lets you abstract your phrases with XML.

SYNOPSIS

use Data::Phrasebook;

my $q = Data::Phrasebook->new(
    class  => 'Fnerk',
    loader => 'XML',
    file   => 'phrases.xml',
    dict   => 'Dictionary',    # optional
);

# OR, passing loader options:

my $q = Data::Phrasebook->new(
    class  => 'Fnerk',
    loader => 'XML',
    file   => {
        file              => 'phrases.xml',
        ignore_whitespace => 1,
    },
);

# simple keyword to phrase mapping
my $phrase = $q->fetch($keyword);

# keyword to phrase mapping with parameters
$q->delimiters( qr{ \[% \s* (\w+) \s* %\] }x );
$phrase = $q->fetch($keyword, { this => 'that' });

Download (0.017MB)
Added: 2007-07-24 License: Perl Artistic License Price:
822 downloads
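Parameter substitution in fetch can be pictured as a regular-expression replacement over the stored phrase, using the same [% name %] delimiters that the synopsis sets with delimiters(). Here is a minimal Python sketch with an in-memory phrase dictionary standing in for phrases.xml; the Phrasebook class and its methods are hypothetical, and leaving unknown placeholders untouched is a design choice of the sketch, not documented module behavior.

```python
import re
from typing import Dict, Mapping, Optional

# Same placeholder syntax as the delimiters set in the synopsis: [% name %]
DELIMITERS = re.compile(r"\[%\s*(\w+)\s*%\]")


class Phrasebook:
    """Hypothetical in-memory stand-in for a phrasebook loaded from XML."""

    def __init__(self, phrases: Mapping[str, str]) -> None:
        self._phrases: Dict[str, str] = dict(phrases)

    def fetch(self, keyword: str, params: Optional[Mapping[str, str]] = None) -> str:
        phrase = self._phrases[keyword]  # raises KeyError for an unknown keyword
        if not params:
            return phrase
        # Substitute known placeholders; leave unknown ones as they are.
        return DELIMITERS.sub(lambda m: str(params.get(m.group(1), m.group(0))), phrase)


book = Phrasebook({"greet": "Hello, [% name %]!"})
print(book.fetch("greet", {"name": "World"}))  # Hello, World!
```

Keeping the phrases outside the code is the point of a phrasebook: the calling code only ever refers to keywords, so wording can change without touching the program.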
Hardware::Simulator 0000_0005

Hardware::Simulator is a Perl extension for Perl Hardware Descriptor Language.

SYNOPSIS

use Hardware::Simulator;

# NewSignal( perl_variable [, initial_value]);
# create a signal called $in_clk, give it an initial value of 1
NewSignal(my $in_clk,1);

# Repeater ( time_units , code_ref)
# every time_units, call the code reference, starting at the current time
Repeater ( 5, sub{if ( $in_clk==0) { $in_clk=1;} else { $in_clk=0;}});

# Responder ( [signal_name ... signal_name], code_ref );
# respond to any changes to signals by calling code reference.
# any time out_clk changes, print value of clock and simulation time.
# ($out_clk is assumed to be another signal created with NewSignal)
Responder ( $out_clk, sub
{
my $time = SimTime();
print "out_clk = $out_clk. time=$time\n";
});

# start processing of events and event scheduling.
EventLoop();

Hardware::Simulator ==> a Perl Hardware Descriptor Language

Hardware::Simulator is a lightweight counterpart to VHDL or Verilog HDL. All of these languages were developed as a means of describing hardware.

Hardware::Simulator was created as a means to quickly prototype a basic hardware design and simulate it. VHDL and Verilog are both restrictive in their own ways. Hardware::Simulator was created to quickly put something together as a "proof of concept", to show whether a design concept would work, after which the design could be translated to VHDL or Verilog.

The problem that started all of this was designing a FIFO for a video scaling ASIC. The chip used a buffer to store incoming video data, and the ASIC read the buffer to generate the outgoing video image. We estimated how large we thought the buffer needed to be, but we wanted to confirm that our numbers were right by running simulations.

The problem was that we needed to run hundreds of different simulations, given the permutations of input image formats, output image formats, and input/output clock frequencies. We also had text files containing valid formats and frequencies. Text files as input called for Perl to manipulate, split, format, and extract the data properly.

This data then had to be translated into an HDL simulation. The problem was that there was no easy way to write a Perl script that would simulate hardware, so the only solution was to have Perl drive a Verilog simulator and pass all these parameters as command-line parameters. Verilog files had to be created and the simulator had to be driven, and the end result was a lot of work to simulate a simple FIFO.

Time constraints did not allow me to develop an HDL package for Perl to solve the original problem, but I took it on in my spare time, and eventually Hardware::Simulator was born.

Download (0.010MB)
Added: 2007-07-20 License: Perl Artistic License Price:
840 downloads
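The synopsis relies on an event loop that runs scheduled callbacks (Repeater) and notifies watchers when signals change (Responder). A simplified Python sketch of that discrete-event model uses a time-ordered priority queue; the Simulator and Signal classes are hypothetical, and unlike a real HDL simulator there are no delta cycles: responders fire immediately when a value changes.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class Simulator:
    """Minimal discrete-event scheduler: callbacks run in time order."""

    def __init__(self) -> None:
        self.now = 0
        self._queue: List[Tuple[int, int, Callable[[], None]]] = []
        self._seq = itertools.count()  # tie-breaker: equal times run in scheduling order

    def schedule(self, delay: int, action: Callable[[], None]) -> None:
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), action))

    def repeater(self, period: int, action: Callable[[], None]) -> None:
        """Call action every `period` time units, starting at the current time."""
        def tick() -> None:
            action()
            self.schedule(period, tick)
        self.schedule(0, tick)

    def run(self, until: int) -> None:
        """Process events whose time is <= until."""
        while self._queue and self._queue[0][0] <= until:
            self.now, _, action = heapq.heappop(self._queue)
            action()


class Signal:
    """A value that notifies its responders whenever it changes."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._responders: List[Callable[["Signal"], None]] = []

    def on_change(self, callback: Callable[["Signal"], None]) -> None:
        self._responders.append(callback)

    def set(self, value: int) -> None:
        if value != self.value:
            self.value = value
            for callback in self._responders:
                callback(self)


sim = Simulator()
clk = Signal(1)
clk.on_change(lambda s: print(f"clk = {s.value}, time = {sim.now}"))
sim.repeater(5, lambda: clk.set(0 if clk.value else 1))  # toggle every 5 time units
sim.run(until=20)
```

The sequence counter in each queue entry keeps heapq from ever comparing two callbacks, which Python cannot order.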
Data::Diff 0.01

Data::Diff is a data structure comparison module.

SYNOPSIS

use Data::Diff qw(diff);

# simple procedural interface to raw difference output
$out = diff( $a, $b );

# OO usage
$diff = Data::Diff->new( $a, $b );

$new = $diff->apply();
$changes = $diff->diff_a();

Data::Diff computes the differences between two arbitrary complex data structures.

METHODS

Creation

new Data::Diff( $a, $b, $options )

Creates and returns a new Data::Diff object with the differences between $a and $b.

Access

apply( $options )

Returns the result of applying one side over the other.

raw()

Returns the internal data structure that describes the differences at all levels within.

Functions

Diff( $a, $b, $options )

Compares the two arguments $a and $b and returns the raw comparison between the two.

EXPORT

Nothing by default but you can choose to export the non-OO function Diff().

Download (0.006MB)
Added: 2007-07-13 License: Perl Artistic License Price:
833 downloads
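A recursive walk is one straightforward way to compare two nested data structures. This Python sketch returns a flat list of (path, old, new) tuples, using a MISSING sentinel for keys or list positions present on only one side; the output format is an assumption and differs from the internal structure that Data::Diff's raw() returns. Lists are compared position by position, so an insertion shifts every later element, where a more elaborate diff might align them instead.

```python
from typing import Any, List, Tuple

MISSING = object()  # marks a key or index present on only one side
Change = Tuple[str, Any, Any]


def diff(a: Any, b: Any, path: str = "") -> List[Change]:
    """Recursively compare two nested dict/list structures; return (path, old, new) tuples."""
    if isinstance(a, dict) and isinstance(b, dict):
        changes: List[Change] = []
        # Sorting by repr gives a stable order even for mixed key types.
        for key in sorted(set(a) | set(b), key=repr):
            changes += diff(a.get(key, MISSING), b.get(key, MISSING), f"{path}/{key}")
        return changes
    if isinstance(a, list) and isinstance(b, list):
        changes = []
        for i in range(max(len(a), len(b))):
            left = a[i] if i < len(a) else MISSING
            right = b[i] if i < len(b) else MISSING
            changes += diff(left, right, f"{path}[{i}]")
        return changes
    if a == b:
        return []
    return [(path or "/", a, b)]


old = {"name": "widget", "tags": ["a", "b"], "size": 3}
new = {"name": "widget", "tags": ["a", "c", "d"], "color": "red"}
for path, before, after in diff(old, new):
    print(path, "missing" if before is MISSING else before,
          "->", "missing" if after is MISSING else after)
# /color missing -> red
# /size 3 -> missing
# /tags[1] b -> c
# /tags[2] missing -> d
```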