
PDFlib TET 2.2

PDFlib TET 2.2 Ranking & Summary

File size: MB
Platform: Any Platform
License: Other/Proprietary License with Free Trial
Price: $995
Downloads: 583
Date added: 2007-03-15
Publisher: PDFlib GmbH

PDFlib TET 2.2 description

PDFlib TET (Text Extraction Toolkit) is software for reliably extracting text information from any PDF file. It is available as a library/component and as a command-line tool. TET makes available the text contents of a PDF as Unicode strings or structured XML, plus detailed glyph and font information. With TET you can retrieve the corresponding Unicode values for text in a PDF document, as well as its position on the page.
In addition to low-level text retrieval, TET contains advanced content analysis algorithms for determining word boundaries and removing redundant duplicate text (such as shadows and artificial bold). Using the auxiliary pCOS interface you can retrieve arbitrary objects from the PDF, such as metadata and hypertext.
Fully functional evaluation versions of TET including documentation and samples are available from the TET download page for all supported platforms. Purchasing a license and applying the license key will fully enable the evaluation version for production deployment.
With PDFlib TET you can:
- extract text from PDF, e.g. to store it in a database
- implement a search engine for processing PDF
- convert the text content of PDF pages to XML for processing with other tools
- process PDFs based on their contents
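As a rough illustration of the XML use case, the following Python sketch serializes already-extracted words and their positions into a small XML fragment. It does not call TET; the function name `page_to_xml`, the `page` and `word` element names, and the `(text, x, y)` input shape are assumptions made for this example and are not TET's actual output format.

```python
import xml.etree.ElementTree as ET


def page_to_xml(page_number, words):
    """Serialize (text, x, y) tuples for one page into an XML string.

    The element and attribute names are hypothetical, chosen only
    for this illustration.
    """
    page = ET.Element("page", number=str(page_number))
    for text, x, y in words:
        # Store the position as attributes and the text as element content.
        word = ET.SubElement(page, "word", x=f"{x:g}", y=f"{y:g}")
        word.text = text
    return ET.tostring(page, encoding="unicode")


if __name__ == "__main__":
    print(page_to_xml(1, [("Hello", 72.0, 700.0), ("PDF", 110.5, 700.0)]))
```

Once the text is in XML form, it can be passed to ordinary XML tooling such as XSLT processors or indexers.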
Supported PDF Input
PDFlib TET supports all relevant flavors of PDF input:
- all PDF versions up to PDF 1.7 (Acrobat 8)
- all font and encoding types: base 14 fonts, TrueType, PostScript, OpenType, CID fonts
- encrypted PDF with 40- and 128-bit encryption (appropriate permission settings or password required)
Unicode
Although text in PDF is usually not encoded in Unicode, PDFlib TET will normalize the text from a PDF document to Unicode:
- TET converts all text contents to Unicode. In C the text will be returned in the UTF-8 or UTF-16 formats, and as native Unicode strings in all other language bindings.
- Ligatures and other multi-character glyphs will be decomposed into a sequence of their constituent Unicode characters.
- Vendor-specific Unicode assignments (Private Use Area, PUA) are identified, and mapped to characters in the common Unicode area if possible.
- Glyphs without appropriate Unicode mappings are identified as such, and are mapped to a configurable replacement character.
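The normalization steps listed above can be approximated in a few lines of Python using only the standard `unicodedata` module. This is a simplified sketch, not TET's algorithm: the function name `normalize_extracted`, the optional `pua_map` lookup table, and the choice of U+FFFD as the default replacement character are all assumptions for illustration.

```python
import unicodedata

REPLACEMENT = "\ufffd"  # assumed default; the source only says "configurable"


def normalize_extracted(text, pua_map=None, replacement=REPLACEMENT):
    """Decompose Latin ligatures and handle Private Use Area characters.

    pua_map is a hypothetical dict from PUA characters to ordinary
    Unicode strings; unmapped PUA characters become `replacement`.
    """
    pua_map = pua_map or {}
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Co":
            # "Co" is the Unicode general category for private-use characters.
            out.append(pua_map.get(ch, replacement))
        elif "\ufb00" <= ch <= "\ufb06":
            # Latin ligatures ff, fi, fl, ffi, ffl, long s-t, st:
            # NFKC expands each into its constituent letters.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(normalize_extracted("o\ufb03ce"))
```

The sketch restricts NFKC to the Latin ligature block on purpose, because applying it to the whole string would also fold other compatibility characters such as superscripts.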
Full CJK Support
TET includes full support for extracting Chinese, Japanese, and Korean text. All predefined CJK CMaps (encodings) are recognized; horizontal and vertical writing modes are supported.
Content Analysis and Word Identification
TET can be used to retrieve low-level glyph information, but also includes advanced algorithms for content analysis:
- Detect word boundaries to retrieve words instead of characters.
- Recombine the parts of hyphenated words.
- Remove duplicate instances of text, e.g. shadow and artificial bold text.
- Recombine paragraphs into reading order.
- Reorder text which is scattered over the page.
- Reconstruct lines of text.
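To make two of these steps concrete, here is a minimal Python sketch of duplicate removal and dehyphenation. It is a naive illustration under stated assumptions, not TET's Wordfinder: the `Glyph` record, the function names, and the `tolerance` value are hypothetical, and real content analysis would need font, size, and language information as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    text: str
    x: float
    y: float


def drop_shadow_duplicates(glyphs, tolerance=0.5):
    """Keep one copy of glyphs that repeat at (almost) the same position,
    as happens with shadow or artificial bold text."""
    kept = []
    for g in glyphs:
        is_duplicate = any(
            k.text == g.text
            and abs(k.x - g.x) <= tolerance
            and abs(k.y - g.y) <= tolerance
            for k in kept
        )
        if not is_duplicate:
            kept.append(g)
    return kept


def dehyphenate(lines):
    """Join a word split by a trailing hyphen with the first word of the
    next line and return the flat list of words."""
    words = []
    carry = ""
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if carry:
            parts[0] = carry + parts[0]
            carry = ""
        if parts[-1].endswith("-") and len(parts[-1]) > 1:
            carry = parts.pop()[:-1]
        words.extend(parts)
    if carry:
        # Nothing followed the hyphen, so keep the fragment as it was.
        words.append(carry + "-")
    return words
```

Note that this dehyphenation rule also merges genuine compounds that happen to break at a hyphen, which is one reason production tools rely on more elaborate heuristics.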
Geometry
TET provides precise metrics for the text, such as the position on the page, glyph widths, and text direction. Specific areas on the page can be excluded from or included in the text extraction, e.g. to ignore headers, footers, or margins.
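The idea of including or excluding page areas can be sketched as a simple bounding-box filter over positioned glyphs. The Python below is an illustration only and does not use TET's option syntax; the `Box` type, the `filter_by_area` function, and the `(text, x, y)` glyph tuples are assumptions, with coordinates in points measured from the lower left corner of the page.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    llx: float  # lower left x
    lly: float  # lower left y
    urx: float  # upper right x
    ury: float  # upper right y

    def contains(self, x, y):
        return self.llx <= x <= self.urx and self.lly <= y <= self.ury


def filter_by_area(glyphs, include: Optional[Box] = None,
                   exclude: Sequence[Box] = ()):
    """Keep glyphs inside `include` (if given) and outside every box
    in `exclude`."""
    kept = []
    for text, x, y in glyphs:
        if include is not None and not include.contains(x, y):
            continue
        if any(box.contains(x, y) for box in exclude):
            continue
        kept.append((text, x, y))
    return kept


if __name__ == "__main__":
    # Hypothetical 612 x 792 point page with header and footer bands.
    header = Box(0, 742, 612, 792)
    footer = Box(0, 0, 612, 50)
    page = [("Header", 72, 760), ("Body", 72, 400), ("1", 300, 30)]
    print(filter_by_area(page, exclude=[header, footer]))
```

Filtering on a single reference point per glyph keeps the example short; a fuller version would test the whole glyph box against each area.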
Version restrictions:
- Unlicensed versions support all features, but will only process PDF documents of up to 10 pages and 1 MB in size. Evaluation versions of TET may be used only for evaluating the product, not for production purposes. Using TET in production requires a valid TET license.
Enhancements:
- repair mode for damaged PDF: repairs damaged documents that were rejected by earlier versions of TET
- support for PDF 1.7, the file format of Acrobat 8
- support for AES-encrypted PDF (appropriate password required)
- TET command-line tool: extract the text based on article threads in the document
- updated pCOS interface (the same pCOS as in PDFlib 7)
- Perl language binding
- many new heuristics and workarounds
- Unicode mappings for more documents
- improvements in the Wordfinder
- various bug fixes
- TET Plugin for Acrobat as a free tool and TET technology demo

Related Software
PDFlib is a widely used programming library that allows the programmer to generate and manipulate PDF files.
PDFlib PLOP is a PDF Linearization, Optimization and Privacy tool.
CMap files let you process incoming text and map CJK encodings to Unicode.
PDFlib pCOS (PDF Information Retrieval Tool) lets you retrieve PDF metadata, hypertext, or any other information.
PDFTextStream is a PDF text and metadata extraction library available for Java, Python, and .NET.
PdfLicenseManager aims to be a simple tool to manage PDF licensing information.
PDF Toolkit is a simple service menu for PDF files.
PDFMiner is a suite of programs that help extract and analyze text data from PDF documents.