
Free extractor software for Linux

Results 1 - 15 of about 32
ccextractor 0.30

ccextractor is a fast closed captions extractor for MPEG files.
ccextractor is mostly a mildly optimized C port of McPoodle's excellent but painfully slow Perl script SCC_RIP. It lets you rip the raw closed captions (read: subtitles) data from a number of sources, such as DVD or ReplayTV.
As an added bonus compared to the original SCC_RIP, ccextractor can extract subtitles from the HDTV transport streams that are becoming more common.
At this point ccextractor extracts the line 21 captions (which must legally be present for a number of years until the transition to digital is complete). Note that in most .ts files you can find, there will be subtitle data for both analog (EIA-608) and digital (EIA-708) decoders. AFAIK there are no freely available EIA-708 rippers.
Anyway, since line 21 captions will be available for some time, we have time to build a decent 708 ripper.
Basic Usage
For details on CC, please go to McPoodle's page:
http://www.geocities.com/mcpoodle43/SCC_TOOLS/DOCS/SCC_TOOLS.HTML
You will need his tools to use ccextractor's output.
The basic idea is that you get the raw closed caption dump from ccextractor.
Then you need other tools (which vary depending on what you want to do) to continue processing.
To get a transcript from a .ts file in .srt (I assume this will be the most common use) do this:
ccextractor -12 input_file
-12 means "extract both subtitle tracks" (the technical name is fields, but tracks is easier to understand). 1 is almost always English. 2 is Spanish on HBO (at least in the few samples I've seen) but could be anything. Just extract both of them and check.
Example: ccextractor -12 house315.ts
ccextractor will create two files, called house315_1.bin and house315_2.bin.
Then use McPoodle's RAW2SCC to create a temporary SCC file (SCC refers to Scenarist, the program for which this was originally the native format; it's not important here).
raw2scc house315_1.bin
This creates house315_1.scc
From this .scc file, you can get the final .srt by using McPoodle's CCASDI:
ccasdi -s house315_1.srt
The resulting .srt looks like this (just three random entries shown):
514
00:24:07,400 --> 00:24:09,300
They've got another trial
going on at Duke.

515
00:24:09,367 --> 00:24:12,567
15% extend their lives
beyond five years.

516
00:24:12,634 --> 00:24:13,701
If you're positive
for protein PHF--
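
To check the final transcript programmatically, the .srt entries can be read back with a few lines of Python. This is only a minimal sketch and not part of ccextractor or McPoodle's tools: the names parse_srt and to_delta are made up for illustration, and the parser assumes entries are separated by blank lines, as .srt files normally are.

```python
import re
from datetime import timedelta

# Three entries in the style of the sample transcript.
SAMPLE = """\
514
00:24:07,400 --> 00:24:09,300
They've got another trial
going on at Duke.

515
00:24:09,367 --> 00:24:12,567
15% extend their lives
beyond five years.

516
00:24:12,634 --> 00:24:13,701
If you're positive
for protein PHF--
"""

# hh:mm:ss,mmm --> hh:mm:ss,mmm
TIMING = re.compile(r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)")


def to_delta(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to a timedelta."""
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), milliseconds=int(millis))


def parse_srt(text):
    """Return a list of (index, start, end, caption) tuples (hypothetical helper)."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        match = TIMING.match(lines[1])
        if match is None:
            raise ValueError("bad timing line: %r" % lines[1])
        fields = match.groups()
        cues.append((int(lines[0]),
                     to_delta(*fields[:4]),
                     to_delta(*fields[4:]),
                     " ".join(lines[2:])))
    return cues


if __name__ == "__main__":
    for index, start, end, caption in parse_srt(SAMPLE):
        print(index, end - start, caption)
```

Each tuple carries the entry number, start and end times as timedelta objects, and the caption text joined onto one line, which is enough to spot timing gaps or overlaps.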
Enhancements:
- This release adds support for DVR-MS files.
- It improves the CC decoder.
- There are several bugfixes, a major speed boost (20%-40%), improved timing for non-TS files, improved format autodetection, and other minor improvements.
Download (0.033MB)
Added: 2007-05-24 License: GPL (GNU General Public License) Price:
893 downloads
XML Extractor 0.3.0

XML Extractor is a set of tools for transforming XML-like markup into entities or well-formed XML files.

The source code XML metadata extraction tools are intended for extracting XML-like markup embedded in source code comments and transforming it into syntactically correct external entities or well-formed XML files.

This can be used for JavaDoc-like code annotation, providing structured comments, or even embedding metadata used by the build process or configuration management tools.
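
The general idea of pulling XML-like markup out of comments can be sketched as follows. This is not XML Extractor's code, and its real markup conventions are not described here. The sketch assumes, purely for illustration, that markup lives in `#` comment lines whose text starts with `<`, and that wrapping the fragments in a hypothetical `metadata` root element is acceptable.

```python
import xml.etree.ElementTree as ET

# Made-up source file with XML-like markup embedded in comments.
SOURCE = '''\
# <module name="billing">
#   <author>jdoe</author>
#   <depends on="ledger"/>
# </module>
def charge(amount):
    # ordinary comment, ignored
    return amount * 1.2
'''


def extract_markup(source, prefix="#"):
    """Collect comment lines whose text starts with '<' (assumed convention)."""
    fragments = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            body = stripped[len(prefix):].strip()
            if body.startswith("<"):
                fragments.append(body)
    return "\n".join(fragments)


def to_well_formed(fragment, root_tag="metadata"):
    """Wrap the fragment in one root element and verify that it parses."""
    document = "<%s>\n%s\n</%s>" % (root_tag, fragment, root_tag)
    ET.fromstring(document)  # raises xml.etree.ElementTree.ParseError if malformed
    return document


if __name__ == "__main__":
    print(to_well_formed(extract_markup(SOURCE)))
```

Parsing the wrapped result is what turns "XML-like" into "well-formed": a fragment with an unclosed tag is rejected instead of being written out.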

INSTALLATION

For info and options about installing this tool, type:
# python setup.py --help

USAGE

To see usage info for this tool, type:
# python xlf_to_wfx_cli.py --help
Download (0.020MB)
Added: 2006-10-04 License: LGPL (GNU Lesser General Public License) Price:
1116 downloads
libextractor 0.5.18a

libextractor is a library that is used to extract meta-data from files of arbitrary type. It is designed to use helper-libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. libextractor is part of the GNU project. Our official GNU website can be found at http://www.gnu.org/software/libextractor/. libextractor can be downloaded from this site or the GNU mirrors.
The goal is to provide developers of file-sharing networks or WWW-indexing bots with a universal library to obtain simple keywords to match against queries. libextractor contains a shell command "extract" that, similar to the well-known "file" command, can extract meta-data from a file and print the results to stdout.
Currently, libextractor supports the following formats: HTML, PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, MP3 (ID3v1 and ID3v2), OGG, WAV, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, REAL, RIFF (AVI), MPEG, QT and ASF.
Also, various additional MIME types are detected.
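
MIME type detection of this kind is usually done by comparing the first bytes of a file against known signatures. The sketch below illustrates that technique only. It is not libextractor's API or its actual signature table: the function name guess_mime and the MIME labels chosen here are assumptions.

```python
# Leading byte signatures for a few of the formats listed above.
# The MIME labels are illustrative choices, not libextractor's output.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x7fELF", "application/x-executable"),
    (b"OggS", "application/ogg"),
]


def guess_mime(data):
    """Return a MIME type for the leading bytes of a file, or None if unknown."""
    for signature, mime in MAGIC:
        if data.startswith(signature):
            return mime
    return None


def guess_mime_of_file(path):
    """Read just enough of the file to compare against every signature."""
    longest = max(len(signature) for signature, _ in MAGIC)
    with open(path, "rb") as handle:
        return guess_mime(handle.read(longest))


if __name__ == "__main__":
    print(guess_mime(b"%PDF-1.4\n"))
    print(guess_mime(b"plain text"))
```

Only the first few bytes are read, so the check stays cheap even on very large files.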
libextractor is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
Enhancements:
- This release fixes various build problems and a crash with recent versions of libgsf.
- An incomplete manual was added.
Download (7.5MB)
Added: 2007-07-05 License: (FDL) GNU Free Documentation License Price:
842 downloads
Obscure-Extractor-GTK 0.2

Obscure-Extractor-GTK can extract data from simple and unusual archives as used by games, e.g. Neverwinter Nights, Homeworld 2, BloodRayne.

It is mostly a framework to which I can easily add new modules when I want to look at the inner workings of a game. The Delphi version has some more advanced features, such as support for old InstallShield archives, that still need to be ported.

Download (0.012MB)
Added: 2006-07-24 License: GPL (GNU General Public License) Price:
1202 downloads
Flat File Extractor 0.2.2

Flat File Extractor (ffe) can be used for reading different flat file structures and printing them in different formats. ffe is a command-line tool developed in a GNU/Linux environment, and it is distributed under the GNU General Public License version 2 or later.
Main areas of use are:
- Extracting particular fields or records from a flat file
- Converting data from one format to another, e.g. from CSV to fixed length
- Verifying a flat file structure
- Testing tool for flat file development
- Displaying flat file content in human readable form
Main features:
- Command-line tool
- Reads standard input and writes to standard output as default
- One input file can contain several types of records (lines)
- Fields in a flat file can be fixed length or separated
- Input file structure and output definitions are independent, meaning one output format can be used with several input files
- Input file structure and output format are freely configurable, they are not predefined
- Output can be formatted e.g. as: fixed length, separated, tokenized, XML, SQL,...
- ffe tries to guess the input format, so the user need not give it as a parameter
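
As an illustration of the "CSV to fixed length" conversion listed among the main uses, here is a minimal Python sketch. It does not use ffe or its configuration syntax. The function name csv_to_fixed and the column widths are assumptions chosen for the example.

```python
import csv
import io


def csv_to_fixed(text, widths):
    """Convert CSV text to fixed-length records.

    Each field is left-aligned, padded with spaces to its column width,
    and truncated if it is longer than that width.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(widths):
            raise ValueError("expected %d fields, got %d" % (len(widths), len(row)))
        records.append("".join(field[:width].ljust(width)
                               for field, width in zip(row, widths)))
    return records


if __name__ == "__main__":
    data = "john,ripper,23\nscott,tiger,45\n"
    for record in csv_to_fixed(data, [8, 8, 3]):
        print(repr(record))
```

Keeping the widths separate from the input mirrors the point made above that input structure and output definition are independent of each other.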
Enhancements:
- The configuration keyword const has been added.
Download (0.23MB)
Added: 2007-05-30 License: GPL (GNU General Public License) Price:
882 downloads
Unix configuration extractor 4

The Unix configuration extractor is a script that runs on the server to extract the necessary security configurations. The script doesn't make any changes to the server other than creating the dump files.
Download (19KB)
Added: 2009-03-31 License: Freeware Price: Free
206 downloads
WWW::Yahoo::KeywordExtractor 0.04

WWW::Yahoo::KeywordExtractor is a Perl module to get keywords from summary text via the Yahoo API.

SYNOPSIS

This module will submit content to the Yahoo keyword extractor API to return a list of relevant keywords.

use WWW::Yahoo::KeywordExtractor;
my $yke = WWW::Yahoo::KeywordExtractor->new();
my $keywords = $yke->extract(content => 'My wife and I love to cook together. Carolyn surprises me with new things to love about her everyday.');

print join q{}, 'Keyword 1: ', $keywords->[0], "\n";

SUBROUTINES/METHODS

new

The new subroutine creates and returns a WWW::Yahoo::KeywordExtractor object.

extract

This method will return a list of keywords based on sample data. It will die if there is no content arg given.

Download (0.004MB)
Added: 2006-12-07 License: Perl Artistic License Price:
1051 downloads
Archive::SelfExtract 1.3

Archive::SelfExtract is a Perl module to bundle compressed archives with Perl code.

SYNOPSIS

use Archive::SelfExtract;

# writes output script to STDOUT
Archive::SelfExtract::createExtractor( "perlcode.pl", "somefiles.zip" );

# with various options:
Archive::SelfExtract::createExtractor( "perlcode.pl", "somefiles.zip",
perlbin => "/opt/perl58/bin/perl",
output_fh => $someFileHandle,
);
See also the command line tool, mkselfextract.

Archive::SelfExtract allows you to create Perl programs out of compressed zip archives. Given a piece of code and an archive, it creates a single file which, when run, unpacks the archive and then runs the code.

This module provides a function for creating a self-extractor script, a function to unpack the archive, and utility functions for wrapped programs.
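
The self-extractor idea (one file that unpacks an embedded archive and then runs the wrapped code) can be sketched in Python. This is not how Archive::SelfExtract is implemented. The names create_extractor, make_zip and EXTRACT_DIR are hypothetical, and the archive is embedded as base64 text simply because that is easy to show.

```python
import base64
import io
import zipfile

# Prologue of the generated script: it unpacks the embedded archive into a
# fresh temporary directory and exposes that directory as EXTRACT_DIR.
STUB = '''\
import base64, io, tempfile, zipfile

EXTRACT_DIR = tempfile.mkdtemp(prefix="selfextract-")
with zipfile.ZipFile(io.BytesIO(base64.b64decode({payload!r}))) as archive:
    archive.extractall(EXTRACT_DIR)

# --- wrapped program starts here ---
'''


def make_zip(files):
    """Build a zip archive in memory from a {name: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def create_extractor(code, zip_bytes):
    """Return the source of a single script that unpacks zip_bytes, then runs code."""
    payload = base64.b64encode(zip_bytes).decode("ascii")
    return STUB.format(payload=payload) + code


if __name__ == "__main__":
    wrapped = (
        "import os\n"
        "print(open(os.path.join(EXTRACT_DIR, 'hello.txt')).read())\n"
    )
    print(create_extractor(wrapped, make_zip({"hello.txt": "hi"})))
```

Writing the returned string to a file gives one self-contained script. The wrapped code finds the unpacked files through EXTRACT_DIR.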

Download (0.006MB)
Added: 2007-06-21 License: Perl Artistic License Price:
859 downloads
CPAN::Mini::Extract 1.16

CPAN::Mini::Extract is a Perl module that can create CPAN::Mini mirrors with the archives extracted.

SYNOPSIS

# Create a CPAN extractor
my $cpan = CPAN::Mini::Extract->new(
    remote         => 'http://mirrors.kernel.org/cpan/',
    local          => '/home/adam/.minicpan',
    trace          => 1,
    extract        => '/home/adam/.cpanextracted',
    extract_filter => sub { /\.pm$/ and ! /\b(inc|t)\b/ },
    extract_check  => 1,
);

# Run the minicpan process
my $changes = $cpan->run;

CPAN::Mini::Extract provides a base for implementing systems that download "all" of CPAN, extract the dists and then process the files within.
It provides the same synchronisation functionality as CPAN::Mini, except that it also maintains a parallel directory tree. For each archive file, that tree contains a directory at an identical path, with a controllable subset of the archive's files extracted below it.

How does it work?

CPAN::Mini::Extract starts with a CPAN::Mini local mirror, which it will optionally update before each run. Once the CPAN::Mini directory is current, it will scan both directory trees, extracting any new archives and removing any extracted archives no longer in the minicpan mirror.
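
The reconciliation step described above comes down to two set differences. The Python sketch below shows that logic only. It is not the module's implementation: the function name plan_sync is hypothetical, and the example paths are made up in CPAN's style.

```python
def plan_sync(mirror_archives, extracted_dirs):
    """Compare the two trees by relative path.

    mirror_archives: relative paths of archives in the minicpan mirror.
    extracted_dirs:  relative paths of directories in the extracted tree
                     (each one sits at the same path as its archive).
    Returns (to_extract, to_remove), both sorted.
    """
    mirror = set(mirror_archives)
    extracted = set(extracted_dirs)
    to_extract = sorted(mirror - extracted)  # new archives, not yet unpacked
    to_remove = sorted(extracted - mirror)   # unpacked, but gone from the mirror
    return to_extract, to_remove


if __name__ == "__main__":
    mirror = [
        "authors/id/A/AD/ADAMK/Foo-1.01.tar.gz",
        "authors/id/A/AD/ADAMK/Bar-0.20.tar.gz",
    ]
    extracted = [
        "authors/id/A/AD/ADAMK/Foo-1.00.tar.gz",
        "authors/id/A/AD/ADAMK/Bar-0.20.tar.gz",
    ]
    to_extract, to_remove = plan_sync(mirror, extracted)
    print("extract:", to_extract)
    print("remove: ", to_remove)
```

Because both trees use identical relative paths, no lookup table is needed. An upgrade of a distribution shows up as one path to extract and one to remove.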

Download (0.026MB)
Added: 2007-07-25 License: Perl Artistic License Price:
821 downloads
deco 0.4

deco is a generic archive file extractor.
Download (0.016MB)
Added: 2007-08-09 License: GPL v3 Price:
807 downloads
netAI 0.1

The name netAI comes from Network Traffic based Application Identification. The tool has been developed for identifying the end host applications that are responsible for traffic flows in the network.
Unlike previous solutions that identify the application based on port numbers or packet payload (either through protocol decoding or signatures) netAI computes various payload independent features (e.g. packet length and packet inter-arrival time statistics) for a traffic flow and uses machine learning (ML) techniques.
ML is a discipline of the wider area of Artificial Intelligence (AI). Before netAI can be used to classify a particular application it must be trained on a representative set of traffic flows of that application. netAI can be used offline (reading packet data from tracefiles) and online (live capturing on network interfaces).
Main features:
- Reading packet data from live network interfaces or tracefiles (tcpdump or Endace format)
- Direct creation of WEKA data files (.arff files) from the packet data
- Interim flow information export (while flows are still active), TCP and time-based flow timeouts
- Flexible packet classification and filtering thanks to NetMate
- New features can be easily added and used
- Flexible selection of features to be used for classification
- A large number of machine learning algorithms can be used thanks to WEKA
- Feature extraction and ML based flow classification can be run on different machines - feature extractor supports data export via UDP or TCP
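
The payload-independent features mentioned above (packet length and inter-arrival time statistics) are simple to compute once a flow's packets are known. The sketch below shows one possible feature vector. It is not netAI's code: the function name flow_features, the feature names and the exact set of statistics are assumptions.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Compute per-flow statistics from (timestamp_seconds, length_bytes) pairs.

    Packets must be given in capture order. Only sizes and timing are used,
    never the payload.
    """
    if len(packets) < 2:
        raise ValueError("need at least two packets to compute inter-arrival times")
    times = [timestamp for timestamp, _ in packets]
    lengths = [length for _, length in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "pkt_len_min": min(lengths),
        "pkt_len_mean": mean(lengths),
        "pkt_len_max": max(lengths),
        "pkt_len_std": pstdev(lengths),
        "iat_min": min(gaps),
        "iat_mean": mean(gaps),
        "iat_max": max(gaps),
        "iat_std": pstdev(gaps),
    }


if __name__ == "__main__":
    flow = [(0.0, 60), (0.5, 1500), (1.5, 60)]
    for name, value in flow_features(flow).items():
        print(name, value)
```

A dictionary like this, one per flow, is the kind of row that could be written to a data file and handed to a machine learning toolkit for training or classification.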
Download (0.60MB)
Added: 2006-02-10 License: GPL (GNU General Public License) Price:
1351 downloads
Hachoir metadata 1.0

Hachoir metadata can extract metadata from archives (bzip2, gzip, zip, tar), audio (MPEG audio/MP3, WAV, Sun/NeXT audio, Ogg/Vorbis, MIDI, AIFF, AIFC, Real Audio), images (BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF), and video (ASF/WMV, AVI, Matroska, Quicktime, Ogg/Theora, Real Media).
It supports invalid or truncated files and Unicode text. It can remove duplicate values and can also filter metadata according to priority.
Main features:
- Supports invalid / truncated files
- Unicode compliant (charsets ISO-8859-XX, UTF-8, UTF-16); converts strings to your terminal charset
- Removes duplicate values (and if a string is a substring of another, keeps only the longest one)
- Sets a priority on each value, so it's possible to filter metadata (option --level)
- Only depends on hachoir-parser (and not on libmatroska, libmpeg2, libvorbis, etc.)
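
The duplicate-removal rule in the feature list (drop exact repeats, and when one string is contained in another keep only the longest) can be sketched like this. It illustrates the rule as stated and is not Hachoir's own implementation. The function name remove_duplicates is hypothetical.

```python
def remove_duplicates(values):
    """Drop exact duplicates and any string contained in a longer kept string.

    The surviving values are returned in their original order.
    """
    kept = []
    # Visit longest strings first so every shorter candidate can be
    # checked against the longer strings already accepted.
    for value in sorted(set(values), key=len, reverse=True):
        if not any(value in longer for longer in kept):
            kept.append(value)
    survivors = set(kept)
    return [value for value in dict.fromkeys(values) if value in survivors]


if __name__ == "__main__":
    titles = ["Queen", "Queen - Greatest Hits", "Queen", "1981"]
    print(remove_duplicates(titles))
```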
Enhancements:
- This release reads the number of channels, bit rate, and sample rate, and computes the compression rate of Real Audio.
- It reads user comments of JPEG pictures.
- It computes the frame rate of Windows ANI.
- It normalizes language for ID3 and MKV.
- OLE2 and FLV extractors are now fault tolerant.
Download (0.047MB)
Added: 2007-07-12 License: GPL (GNU General Public License) Price:
841 downloads
GRF Tool 1.2.0

GRF Tool is the world's first open source application for extracting GRF archives. It aims to be a GRF extractor that does not suck.
Main features:
- A user-friendly and usable interface!
- Very fast loading. A 650 MB GRF archive is loaded in less than 2 seconds.
- Very fast extraction. You can also abort the extraction process at any time.
- Supports previewing of text files and bitmap images.
- Properly supports Korean text encoding.
- The integrated search bar allows you to find the files you're looking for in no time.
- Works on both Windows and Linux. This is the first GRF extractor that supports Linux!
Version restrictions:
- Version 1.2 of GRF Tool aims to be a program that does one thing, and does it very well: viewing and extracting GRF archives. It cannot repack GRF archives. However, that's planned for future releases.
Enhancements:
- Sprite preview support in the Linux/GTK frontend.
- Some bug fixes in the Win32 frontend. This probably solves the error messages that some people get while extracting files.
- Improved GRF file adding support in libgrf.
- Added a pkg-config entry for Linux, so applications can easily use libgrf.
- A Software Development Kit (SDK) for Win32 is now available, making it easier for Win32 software developers to use libgrf.
Download (0.12MB)
Added: 2005-10-07 License: GPL (GNU General Public License) Price:
1622 downloads
Debug::FaultAutoBT 0.02

Debug::FaultAutoBT is a Perl module for automatic backtrace extraction on SIGSEGV, SIGBUS, etc.

SYNOPSIS

use Debug::FaultAutoBT;

use File::Spec::Functions;
my $tmp_dir = File::Spec::Functions::tmpdir;

my $trace = Debug::FaultAutoBT->new(
dir => "$tmp_dir",
#verbose => 1,
#exec_path => '/home/stas/perl/bin/perl',
#core_path_base => catfile($tmp_dir, "mycore"),
#command_path => catfile($tmp_dir, "my-gdb-command"),
#debugger => "gdb",
);

# enable the sighandler
$trace->ready();

# or simply:
Debug::FaultAutoBT->new(dir => "$tmp_dir")->ready;

When a signal that normally causes a core dump is delivered, this module attempts to automatically extract a backtrace rather than letting the core file be dumped. This has the following benefits:

- There is no need to set up the environment to allow core files to be dumped. Sometimes people just don't know how to set it up. Sometimes you aren't allowed to set it up (e.g., when the webserver environment is not under your control).

- If many Perl programs are run in a row and more than one program segfaults, it's possible to collect all the backtraces, rather than aborting the run on the first segfault or being left with only the last core file, which will have overwritten all the previous ones. For example, consider a live webserver or a test suite which may segfault many times for different reasons.

- For huge core files, this approach saves disk space. It can be a lifesaver when you have almost no disk space left for various reasons (past the quota?), but still have a few kilobytes free.

Currently the following signals are trapped:

SIGQUIT
SIGILL
SIGTRAP
SIGABRT
SIGEMT
SIGFPE
SIGBUS
SIGSEGV
SIGSYS

(If you know of other signals that should be trapped, let me know. Thanks.)

Download (0.015MB)
Added: 2007-05-01 License: Perl Artistic License Price:
906 downloads
rfc2mib

rfc2mib is a Tcl script that extracts MIB, PIB and ASN.1 modules from an RFC document.

Unlike most extractors, this script is smart enough to recognize ASN.1-style comments prior to or within the module header, use of the "TagDefaults" part of the module header (not used by MIB modules), module headers that are broken across multiple lines, and macros.
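
For comparison, a naive extractor of the kind rfc2mib improves on can be written as a single regular expression. The Python sketch below is not rfc2mib. It finds text between a "NAME DEFINITIONS ::= BEGIN" header and the next line starting with END, tolerates a header broken across lines, and deliberately does not handle TagDefaults, comments, macros or RFC page headers and footers. The sample text and module name are made up.

```python
import re

# NAME DEFINITIONS ::= BEGIN ... END, with the header possibly split over lines.
MODULE = re.compile(
    r"^[ \t]*([A-Za-z][A-Za-z0-9-]*)\s+DEFINITIONS\s*::=\s*BEGIN\b.*?^[ \t]*END\b",
    re.DOTALL | re.MULTILINE,
)

# Made-up RFC excerpt for illustration.
RFC_TEXT = """\
1. Introduction

   Some prose that is not part of the module.

   EXAMPLE-MIB DEFINITIONS ::= BEGIN

   IMPORTS
       MODULE-IDENTITY FROM SNMPv2-SMI;

   exampleMIB MODULE-IDENTITY
       ::= { mib-2 999 }

   END

2. Security Considerations
"""


def extract_modules(rfc_text):
    """Return {module_name: module_text} for every module found in the text."""
    return {match.group(1): match.group(0) for match in MODULE.finditer(rfc_text)}


if __name__ == "__main__":
    for name, body in extract_modules(RFC_TEXT).items():
        print("module:", name)
        print(body)
```

The cases this sketch skips are exactly the ones listed above as rfc2mib's strengths, which is why a dedicated tool is preferable for real RFCs.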
Download (0.003MB)
Added: 2005-04-13 License: BSD License Price:
1661 downloads