html parser
Jericho HTML Parser 2.4
Jericho HTML Parser is a simple but powerful Java library allowing analysis and manipulation of parts of an HTML document, including some common server-side tags, while reproducing verbatim any unrecognised or invalid HTML. It also provides high-level HTML form manipulation functions.
Jericho HTML Parser project is an open source library released under the GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.
Main features:
- No parse tree of the entire document is ever generated. The document source text is searched only for the markup relevant to the current operation. This allows the library to analyse and modify documents containing incorrect or badly formatted HTML, or any other server- or client-side code, script, macro or markup. Most other parsers can't handle content that they are not explicitly programmed to accept.
- The beginning and end positions in the source text of all parsed segments are accessible, allowing modification of only selected segments of the document without having to reconstruct the entire document from a parse tree. This feature, in combination with the one above, makes the toolkit extremely powerful in its simplicity.
- Provides a simple but comprehensive interface for the analysis and manipulation of HTML form controls, including the extraction and population of initial values, and conversion to read-only or data display modes. Analysis of the form controls also allows data received from the form to be stored and presented in an appropriate manner.
- ASP, JSP, PSP, PHP and Mason server tags can be registered for recognition by the parser, and are recognised as accurately as is possible without incorporating actual parsers for these languages into the library. The library then allows any of these segments to be ignored when parsing the rest of the document so that they do not interfere with the HTML syntax. (see Segment.ignoreWhenParsing())
- Custom tag types can be easily defined and registered for recognition by the parser.
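The two position-based features above can be illustrated without the library. The following is a minimal plain-Java sketch (Java 16+ for `record`), not Jericho's own API: the `Segment` record and the `findBoldTags`/`apply` helpers are hypothetical. It searches the source only for `<b>` and `</b>` tags, records their begin and end offsets, and rewrites just those spans, so everything else, including a PHP tag and a stray `<td>`, is copied through verbatim.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentEditSketch {
    // Hypothetical stand-in for a parsed segment: the span [begin, end) of the
    // source text, plus the text that should replace it.
    record Segment(int begin, int end, String replacement) {}

    // Search the source only for the markup relevant to this operation
    // (<b> and </b> tags); nothing else in the document is parsed.
    static List<Segment> findBoldTags(String source) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = Pattern.compile("(?i)<(/?)b>").matcher(source);
        while (m.find()) {
            segments.add(new Segment(m.start(), m.end(), "<" + m.group(1) + "strong>"));
        }
        return segments;
    }

    // Rebuild the output from the original text, replacing only the recorded
    // segments. Segments must be sorted by position and must not overlap.
    static String apply(String source, List<Segment> segments) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Segment s : segments) {
            out.append(source, pos, s.begin()); // untouched text, copied verbatim
            out.append(s.replacement());
            pos = s.end();
        }
        out.append(source, pos, source.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hi <b>there</b> <?php echo $x; ?> <td>stray</p>";
        System.out.println(apply(html, findBoldTags(html)));
        // prints: <p>Hi <strong>there</strong> <?php echo $x; ?> <td>stray</p>
    }
}
```

Because edits are expressed as offsets into the original text, no serializer ever has to reproduce the invalid parts of the document.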
Enhancements:
- This version has been released under a dual licence system, allowing a choice between the Eclipse Public License (EPL) and the LGPL.
- It includes important bugfixes and introduces the following major features: simple rendering of HTML markup into text, integrated logging with various logging frameworks, and easier parsing of HTML tags containing server tags.
Download (0.85MB)
Added: 2007-05-20 License: LGPL (GNU Lesser General Public License)
534 downloads
HTML Parser 1.6-20060610
HTML Parser is a Java library used to parse HTML in either a linear or nested fashion.
HTMLParser is a super-fast real-time parser for real-world HTML. What has attracted most developers to HTMLParser has been its simplicity in design, speed and ability to handle streaming real-world HTML.
The two fundamental use cases that are handled by the parser are extraction and transformation (the synthesis use case, where HTML pages are created from scratch, is better handled by other tools closer to the source of data). While prior versions concentrated on data extraction from web pages, Version 1.4 of the HTMLParser has substantial improvements in the area of transforming web pages, with simplified tag creation and editing, and verbatim toHtml() method output.
In order to use HTMLParser you will need to be able to write code in the Java programming language. Although some example programs are provided that may be useful as they stand, it's more than likely you will need (or want) to create your own programs or modify the ones provided to match your intended application.
To use the library, you will need to add either the htmllexer.jar or htmlparser.jar to your classpath when compiling and running. The htmllexer.jar provides low level access to generic string, remark and tag nodes on the page in a linear, flat, sequential manner. The htmlparser.jar, which includes the classes found in htmllexer.jar, provides access to a page as a sequence of nested differentiated tags containing string, remark and other tag nodes. So where the output from calls to the lexer nextNode() method might be:
<html>
<head>
<title>
"Welcome"
</title>
</head>
<body>
etc...
The output from the parser NodeIterator would nest the tags as children of the <html>, <head> and other nodes (here represented by indentation):
<html>
    <head>
        <title>
            "Welcome"
        </title>
    </head>
    <body>
        etc...
The parser attempts to balance opening tags with ending tags to present the structure of the page, while the lexer simply spits out nodes. If your application requires only modest structural knowledge of the page, and is primarily concerned with individual, isolated nodes, you should consider using the lightweight lexer. But if your application requires knowledge of the nested structure of the page, for example processing tables, you will probably want to use the full parser.
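To make the lexer/parser distinction concrete, here is a toy Java sketch (Java 11+ for `String.repeat`). It is not HTMLParser's API; `lex` and `nest` are hypothetical helpers. `lex` produces the flat node sequence, and `nest` balances start and end tags with a depth counter to produce the indented view shown above. Unlike a real parser, it assumes well-nested input with no void elements, comments or doctype.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LexerVsParserSketch {
    // Lexer-style pass: emit tag and text nodes as a flat sequence, in document order.
    static List<String> lex(String html) {
        List<String> nodes = new ArrayList<>();
        Matcher m = Pattern.compile("<[^>]*>|[^<]+").matcher(html);
        while (m.find()) {
            String node = m.group().trim();
            if (!node.isEmpty()) nodes.add(node);
        }
        return nodes;
    }

    // Parser-style pass: track nesting depth by balancing start and end tags,
    // then indent each node by its depth. This toy version assumes every start
    // tag is closed (no <br>, comments or doctype) and does no error recovery.
    static List<String> nest(List<String> nodes) {
        List<String> lines = new ArrayList<>();
        int depth = 0;
        for (String node : nodes) {
            boolean endTag = node.startsWith("</");
            boolean startTag = node.startsWith("<") && !endTag && !node.endsWith("/>");
            if (endTag) depth = Math.max(0, depth - 1);
            lines.add("    ".repeat(depth) + node);
            if (startTag) depth++;
        }
        return lines;
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Welcome</title></head><body>etc...</body></html>";
        List<String> flat = lex(page);
        flat.forEach(System.out::println);        // lexer view: one node per line
        nest(flat).forEach(System.out::println);  // parser view: nested by indentation
    }
}
```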
Extraction
Extraction encompasses all the information retrieval programs that are not meant to preserve the source page. This covers uses like:
- text extraction, for use as input for text search engine databases for example
- link extraction, for crawling through web pages or harvesting email addresses
- screen scraping, for programmatic data input from web pages
- resource extraction, collecting images or sound
- a browser front end, the preliminary stage of page display
- link checking, ensuring links are valid
- site monitoring, checking for page differences beyond simplistic diffs
There are several facilities in the HTMLParser codebase to help with extraction, including filters, visitors and JavaBeans.
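As a stand-in for HTMLParser's filters and visitors, the sketch below does link extraction with the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator`, which reports tags to an `HTMLEditorKit.ParserCallback`). That parser only understands HTML 3.2, so treat this as an illustration of callback-driven extraction rather than a recommendation; `extractLinks` is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractionSketch {
    // Collect the href attribute of every <a> start tag the parser reports.
    static List<String> extractLinks(String html) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) links.add(href.toString()); // skip <a name=...> anchors
                }
            }
        };
        // true = ignore any charset declared in the document (we already have a String)
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String html = "<p>See <a href=\"https://example.com/a\">A</a>, "
                + "<a name=\"top\">no link</a> and <a href=\"b.html\">B</a></p>";
        System.out.println(extractLinks(html));
    }
}
```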
Transformation
Transformation includes all processing where the input and the output are HTML pages. Some examples are:
- URL rewriting, modifying some or all links on a page
- site capture, moving content from the web to local disk
- censorship, removing offending words and phrases from pages
- HTML cleanup, correcting erroneous pages
- ad removal, excising URLs referencing advertising
- conversion to XML, moving existing web pages to XML
During or after reading in a page, operations on the nodes can accomplish many transformation tasks "in place", which can then be output with the toHtml() method. Depending on the purpose of your application, you will probably want to look into node decorators, visitors, or custom tags in conjunction with the PrototypicalNodeFactory.
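The in-place transformation idea can be sketched with a visitor over a node list. The classes below (`Node`, `Tag`, `Text`, `Visitor`, `UrlRewriter`) are hypothetical and only echo the roles of HTMLParser's nodes, visitors and `toHtml()`; they are not its API. For brevity, end tags are modelled as tags whose name starts with `/`. The visitor rewrites relative `href` values against an assumed mirror URL, and `render` serializes the modified nodes (Java 16+ for `record`).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RewriteVisitorSketch {
    interface Visitor { void visitTag(Tag tag); void visitText(Text text); }

    interface Node { void accept(Visitor v); String toHtml(); }

    // A tag keeps its attributes in document order; "/a" stands for </a>.
    static final class Tag implements Node {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        Tag(String name) { this.name = name; }
        Tag with(String key, String value) { attributes.put(key, value); return this; }
        public void accept(Visitor v) { v.visitTag(this); }
        public String toHtml() {
            StringBuilder sb = new StringBuilder("<").append(name);
            attributes.forEach((k, value) -> sb.append(' ').append(k).append("=\"").append(value).append('"'));
            return sb.append('>').toString();
        }
    }

    record Text(String text) implements Node {
        public void accept(Visitor v) { v.visitText(this); }
        public String toHtml() { return text; }
    }

    // URL rewriting: prefix every relative link with a new base URL, in place.
    static final class UrlRewriter implements Visitor {
        private final String base;
        UrlRewriter(String base) { this.base = base; }
        public void visitTag(Tag tag) {
            String href = tag.attributes.get("href");
            if (tag.name.equals("a") && href != null && !href.contains("://")) {
                tag.attributes.put("href", base + href);
            }
        }
        public void visitText(Text text) { } // text is left unchanged
    }

    static String render(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) sb.append(n.toHtml());
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Node> page = List.of(
                new Tag("a").with("href", "docs/index.html"), new Text("Docs"), new Tag("/a"),
                new Text(" "),
                new Tag("a").with("href", "https://example.org/"), new Text("Ext"), new Tag("/a"));
        UrlRewriter rewriter = new UrlRewriter("https://mirror.example.com/");
        page.forEach(n -> n.accept(rewriter));
        System.out.println(render(page));
    }
}
```

Keeping the traversal (the visitor) separate from the node model means other transformations, such as ad removal or censorship, can be added as new visitors without touching the serializer.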
The HTML Parser is an open source library released under the GNU Lesser General Public License, which basically says you are free to use the library "as is" in other (even proprietary) products, as long as due credit is given to the authors and the source code for the HTMLParser is included or available with the other product. For modified or embedded use, please consult the LGPL license.
Download (4.2MB)
Added: 2006-06-11 License: LGPL (GNU Lesser General Public License)
1234 downloads
CyberNeko HTML Parser 0.9.5
NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces.
The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags.
NekoHTML is written using the Xerces Native Interface (XNI) that is the foundation of the Xerces2 implementation. This enables you to use the NekoHTML parser with existing XNI tools without modification or rewriting code.
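The "fix up" behaviour described above can be approximated with a stack of open elements. This is a simplified Java sketch of tag balancing in general, not NekoHTML's actual rules or its XNI pipeline: the optional-end-tag and void-element sets are small assumed subsets, comments and doctypes are not handled, and missing parent elements such as `<html>` and `<body>` are not added.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagBalancerSketch {
    // Assumed subset: a start tag of one of these closes an open element of the same name.
    static final Set<String> OPTIONAL_END = Set.of("p", "li", "tr", "td", "option");
    // Assumed subset of void elements, which never get an end tag.
    static final Set<String> VOID = Set.of("br", "img", "hr", "input", "meta");
    static final Pattern TOKEN = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+");

    static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open elements
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(2) == null) { out.append(m.group()); continue; } // text
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (m.group(1).isEmpty()) {                                   // start tag
                if (OPTIONAL_END.contains(name) && name.equals(open.peek())) {
                    out.append("</").append(open.pop()).append('>');
                }
                out.append(m.group());
                if (!VOID.contains(name)) open.push(name);
            } else if (open.contains(name)) {                             // end tag
                // Close everything opened after the matching element, then the element.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // An end tag with no matching open element is dropped.
        }
        while (!open.isEmpty()) out.append("</").append(open.pop()).append('>'); // close at EOF
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<p>One<p>Two <b><i>bold</b> text</i>"));
        System.out.println(balance("<ul><li>a<li>b</ul>"));
    }
}
```

For example, `balance("<p>One<p>Two <b><i>bold</b> text</i>")` yields `<p>One</p><p>Two <b><i>bold</i></b> text</p>`: the second `<p>` closes the first, `</b>` also closes the still-open `<i>`, and the stray `</i>` is dropped.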
Version restrictions:
- There are HTML documents for which NekoHTML cannot properly generate a well-formed XML document event stream. For example, documents with multiple <html> tags are inherently ill-formed because XML documents may only have a single root element.
- Code added to the core DOM implementation in Xerces-J 2.0.1 introduced a bug in the HTML DOM implementation based on it.
The bug causes the element nodes in the resultant HTML document object to be of type org.apache.xerces.dom.ElementNSImpl instead of the appropriate HTML DOM element objects.
The problem affects NekoHTML users who use the parser with Xerces-J 2.0.1 and anyone using the HTML DOM implementation in Xerces-J 2.0.1.
- There are no other known major limitations with this release. However, additional work can always be done to improve performance, fix bugs, and add functionality.
Download (0.38MB)
Added: 2005-09-28 License: The Apache License
1486 downloads
Java Mozilla Html Parser 0.1.7
Java Mozilla Html Parser project is a Java package that enables you to parse HTML pages into a Java Document object. The parser is a wrapper around Mozilla's HTML parser, thus giving the user a browser-quality HTML parser.
Limitations and known issues
The most significant limitation is performance: the parser serializes requests. At the moment the parser runs on a separate thread, which receives requests in turn, parses them and returns the responses to the requester. This is a consequence of Mozilla's mechanism for keeping its objects thread-safe: in the process, Mozilla forces you to use proxied objects instead of the real objects. My hope is that the open source community will take up the project and address these issues.
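The serialization described above is the classic single-owner-thread pattern. A minimal Java sketch follows; `parseOnParserThread` is a hypothetical placeholder for the real, non-thread-safe native call, and nothing here uses the project's or Mozilla's API. All requests are funnelled through one `ExecutorService` thread, which is why throughput cannot exceed that of a single parser no matter how many threads submit work.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerializedParserSketch implements AutoCloseable {
    // Every parse request runs on this one thread; callers on any thread
    // block until their own result is ready.
    private final ExecutorService parserThread = Executors.newSingleThreadExecutor();

    // Hypothetical placeholder for the non-thread-safe native parse call.
    private String parseOnParserThread(String html) {
        return Thread.currentThread().getName() + " parsed " + html.length() + " chars";
    }

    public String parse(String html) throws InterruptedException, ExecutionException {
        Future<String> result = parserThread.submit(() -> parseOnParserThread(html));
        return result.get(); // requests queue up: throughput is that of one thread
    }

    @Override
    public void close() { parserThread.shutdown(); }

    public static void main(String[] args) throws Exception {
        try (SerializedParserSketch parser = new SerializedParserSketch()) {
            System.out.println(parser.parse("<p>first</p>"));
            System.out.println(parser.parse("<p>second</p>"));
        }
    }
}
```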
Main features:
- Real-world, browser-quality DOM parsing
- Compatibility with SAX parsers
- Sequential performance comparable to pure Java implementations of DOM parsers
- Win32, Linux and Mac OS X platforms are supported.
Enhancements:
- Missing Windows DLL files added to the package
- <title> tag extraction added
- Better handling of HTML entities
Download (1.5MB)
Added: 2007-07-30 License: LGPL (GNU Lesser General Public License)
817 downloads
HTML::Parser 3.54
HTML::Parser is an HTML parser class. Objects of the HTML::Parser class will recognize markup and separate it from plain text (alias data content) in HTML documents. As different kinds of markup and text are recognized, the corresponding event handlers are invoked.
HTML::Parser is not a generic SGML parser.
We have tried to make it able to deal with the HTML that is actually "out there", and it normally parses as closely as possible to the way the popular web browsers do it instead of strictly following one of the many HTML specifications from W3C. Where there is disagreement, there is often an option that you can enable to get the official behaviour.
The document to be parsed may be supplied in arbitrary chunks. This makes on-the-fly parsing as documents are received from the network possible.
If event driven parsing does not feel right for your application, you might want to use HTML::PullParser. This is an HTML::Parser subclass that allows a more conventional program structure.
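Chunked input works because a parser can hold back an incomplete tag until the rest of it arrives. The Java sketch below illustrates that idea only; it is not HTML::Parser's API, and the `start:`/`end:`/`text:` event strings are hypothetical. Text may be reported in several pieces across chunk boundaries (HTML::Parser behaves similarly unless its `unbroken_text` option is enabled).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedTokenizerSketch {
    private final StringBuilder pending = new StringBuilder(); // input not yet consumed
    private final Consumer<String> handler;                     // receives events

    ChunkedTokenizerSketch(Consumer<String> handler) { this.handler = handler; }

    // Feed an arbitrary chunk; only complete tokens are reported.
    void parse(String chunk) {
        pending.append(chunk);
        int pos = 0;
        while (pos < pending.length()) {
            int lt = pending.indexOf("<", pos);
            if (lt != pos) { // text up to the next '<' (or to the end of the buffer)
                int end = lt < 0 ? pending.length() : lt;
                handler.accept("text:" + pending.substring(pos, end));
                pos = end;
                continue;
            }
            int gt = pending.indexOf(">", lt);
            if (gt < 0) break; // incomplete tag: keep it buffered until more input arrives
            String inner = pending.substring(lt + 1, gt).trim();
            if (inner.startsWith("/")) handler.accept("end:" + inner.substring(1).trim());
            else handler.accept("start:" + inner.split("\\s+")[0]);
            pos = gt + 1;
        }
        pending.delete(0, pos);
    }

    // Signal end of document: flush whatever is left as text.
    void eof() {
        if (pending.length() > 0) handler.accept("text:" + pending);
        pending.setLength(0);
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        ChunkedTokenizerSketch p = new ChunkedTokenizerSketch(events::add);
        p.parse("<p cla");        // tag split across chunks: nothing reported yet
        p.parse("ss=\"x\">Hel");
        p.parse("lo</p>");
        p.eof();
        System.out.println(events);
    }
}
```

Feeding the tag `<p class="x">` split across two chunks still produces a single `start:p` event, because nothing after the unmatched `<` is consumed until its `>` arrives.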
SYNOPSIS:
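use HTML::Parser ();
# Handlers called as start and end tags are recognised
sub start { my ($tagname, $attr) = @_; print "start $tagname\n" }
sub end   { my ($tagname) = @_; print "end $tagname\n" }
# Create parser object
my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           marked_sections => 1,
                         );
# Parse document text chunk by chunk
my ($chunk1, $chunk2) = ("<p>Hello, ", "world</p>");
$p->parse($chunk1);
$p->parse($chunk2);
#...
$p->eof;                 # signal end of document
# Parse directly from file
$p->parse_file("foo.html");
# or pass an open filehandle
open(my $fh, "<", "foo.html") or die "Cannot open foo.html: $!";
$p->parse_file($fh);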
Download (0.082MB)
Added: 2006-05-05 License: Perl Artistic License
1269 downloads
HTML Purifier 2.1.1
HTML Purifier is the premier PHP solution for all your HTML filtering needs. Tired of forcing users to use BBCode or some other obscure custom markup language due to the current landscape of deficient or hole-ridden HTML filters? Look no further: HTML Purifier will not only remove all malicious code (the stuff of XSS), it will also make sure the HTML is standards-compliant.
There are a number of ad hoc HTML filtering solutions out there on the web (some examples including PEAR's HTML_Safe, kses and SafeHtmlChecker.class.php) that claim to filter HTML properly, preventing malicious JavaScript and layout-breaking HTML from getting through the parser. None of them, however, demonstrates a thorough knowledge of the DTD that defines HTML or the caveats of HTML that cannot be expressed by a DTD.
Configurable filters (such as kses or PHP's built-in strip_tags() function) have trouble validating the contents of attributes and can be subject to security attacks due to poor configuration. Other filters take the naive approach of blacklisting known threats and tags, failing to account for the introduction of new technologies, new tags, new attributes or quirky browser behavior.
However, HTML Purifier takes a different approach, one that doesn't use specification-ignorant regexes or narrow blacklists. HTML Purifier will decompose the whole document into tokens, and rigorously process the tokens by: removing non-whitelisted elements, transforming bad-practice tags like font into span, properly checking the nesting of tags and their children and validating all attributes according to their RFCs.
To my knowledge, there is nothing like this on the web yet. Not even MediaWiki, which allows an amazingly diverse mix of HTML and wikitext in its documents, gets all the nesting quirks right. Existing solutions hope that no JavaScript will slip through, but either do not attempt to ensure that the resulting output is valid XHTML or send the HTML through a draconian XML parser (and yet still get the nesting wrong: SafeHtmlChecker.class.php does not prevent <a> tags from being nested within each other).
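The whitelist approach can be sketched in a few lines of Java, though this is far from HTML Purifier's actual algorithm and is not safe for production use: it does not check nesting, only understands double-quoted attributes, and validates only one URL attribute. The element/attribute whitelist and the http(s)-only rule for `href` are assumptions chosen for the example. Disallowed tags are removed while their text is kept as inert, escaped text, `font` becomes `span`, and attributes outside the whitelist are dropped.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitelistFilterSketch {
    // Assumed whitelist: allowed element names mapped to their allowed attributes.
    static final Map<String, Set<String>> ALLOWED = Map.of(
            "p", Set.of(),
            "em", Set.of(),
            "strong", Set.of(),
            "span", Set.of(),
            "a", Set.of("href"));

    static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>");
    static final Pattern ATTR = Pattern.compile("([a-zA-Z-]+)\\s*=\\s*\"([^\"<>]*)\"");

    static String filter(String html) {
        StringBuilder out = new StringBuilder();
        Matcher tag = TAG.matcher(html);
        int pos = 0;
        while (tag.find()) {
            out.append(escapeText(html.substring(pos, tag.start())));
            pos = tag.end();
            String name = tag.group(2).toLowerCase(Locale.ROOT);
            if (name.equals("font")) name = "span";   // transform a bad-practice tag
            Set<String> allowedAttrs = ALLOWED.get(name);
            if (allowedAttrs == null) continue;        // drop non-whitelisted elements
            if (!tag.group(1).isEmpty()) {             // end tag: no attributes
                out.append("</").append(name).append('>');
                continue;
            }
            out.append('<').append(name);
            Matcher attr = ATTR.matcher(tag.group(3));
            while (attr.find()) {
                String attrName = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(2);
                if (allowedAttrs.contains(attrName) && isSafe(attrName, value)) {
                    out.append(' ').append(attrName).append("=\"").append(value).append('"');
                }
            }
            out.append('>');
        }
        out.append(escapeText(html.substring(pos)));
        return out.toString();
    }

    // Only absolute http(s) URLs pass; javascript: and other schemes are rejected.
    static boolean isSafe(String attrName, String value) {
        if (!attrName.equals("href")) return true;
        String v = value.trim().toLowerCase(Locale.ROOT);
        return v.startsWith("http://") || v.startsWith("https://");
    }

    // Any '<' or '>' left in text did not form a recognised tag, so neutralise it.
    static String escapeText(String text) {
        return text.replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String dirty = "<p onclick=\"steal()\">Hi <font color=\"red\">there</font>"
                + "<script>alert(1)</script> <a href=\"javascript:x()\">a</a>"
                + " <a href=\"https://example.com\">b</a></p>";
        System.out.println(filter(dirty));
        // prints: <p>Hi <span>there</span>alert(1) <a>a</a> <a href="https://example.com">b</a></p>
    }
}
```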
Enhancements:
- This version amends a few bugs in some of the newly introduced features for 2.1, namely running the standalone download version in PHP4 and %URI.MakeAbsolute.
Download (0.16MB)
Added: 2007-08-07 License: LGPL (GNU Lesser General Public License)
809 downloads
XML::Parser 2.34
XML::Parser is a Perl module for parsing XML documents.
SYNOPSIS
use XML::Parser;
$p1 = XML::Parser->new(Style => 'Debug');
$p1->parsefile('REC-xml-19980210.xml');
$p1->parse('<foo id="me">Hello World</foo>');
# Alternative
$p2 = XML::Parser->new(Handlers => {Start => \&handle_start,
                                    End   => \&handle_end,
                                    Char  => \&handle_char});
$p2->parse($socket);
# Another alternative
$p3 = XML::Parser->new(ErrorContext => 2);
$p3->setHandlers(Char    => \&text,
                 Default => \&other);
open(FOO, 'xmlgenerator |');
$p3->parse(*FOO, ProtocolEncoding => 'ISO-8859-1');
close(FOO);
$p3->parsefile('junk.xml', ErrorContext => 3);
This module provides ways to parse XML documents. It is built on top of XML::Parser::Expat, which is a lower-level interface to James Clark's expat library. Each call to one of the parsing methods creates a new instance of XML::Parser::Expat which is then used to parse the document.
Expat options may be provided when the XML::Parser object is created. These options are then passed on to the Expat object on each parse call. They can also be given as extra arguments to the parse methods, in which case they override options given at XML::Parser creation time.
The behavior of the parser is controlled either by "Style" and/or "Handlers" options, or by "setHandlers" method. These all provide mechanisms for XML::Parser to set the handlers needed by XML::Parser::Expat. If neither Style nor Handlers are specified, then parsing just checks the document for being well-formed.
When underlying handlers get called, they receive as their first parameter the Expat object, not the Parser object.
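XML::Parser's handler style maps closely onto SAX in Java, which is also event-driven. The sketch below uses only the JAXP classes in the JDK (not XML::Parser or Expat): a `DefaultHandler` plays the role of the Start, End and Char handlers, buffering consecutive character callbacks because, as with Expat, one text run may arrive in several pieces. `isWellFormed` mirrors the case where no handlers are given and parsing only checks well-formedness. The event-string format is made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandlersSketch {
    // Records one string per Start/End/Char event.
    static List<String> events(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();

            private void flushText() {
                if (text.length() > 0) { events.add("Char " + text); text.setLength(0); }
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                flushText();
                StringBuilder e = new StringBuilder("Start " + qName);
                for (int i = 0; i < attrs.getLength(); i++) {
                    e.append(' ').append(attrs.getQName(i)).append('=').append(attrs.getValue(i));
                }
                events.add(e.toString());
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                flushText();
                events.add("End " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be called several times per text run
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    // With a do-nothing handler, parsing only checks that the document is well-formed.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(events("<foo id=\"me\">Hello World</foo>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```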
Download (0.22MB)
Added: 2006-06-14 License: Perl Artistic License
1235 downloads
Nmap Parser 1.11
Nmap Parser is a Perl module to ease the pain of developing scripts or collecting network information from nmap scans.
Nmap Parser is a module that implements an interface to the information contained in an nmap scan. It is implemented by parsing the XML scan data that is generated by nmap.
This will enable anyone who utilizes nmap to quickly create fast and robust security scripts that utilize the powerful port scanning abilities of nmap.
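For readers working in Java rather than Perl, the same idea (walking the XML that nmap writes with `-oX`) can be sketched with the JDK's DOM parser. This is not the Nmap Parser module's interface; the element and attribute names (`host`, `address/@addr`, `port/@protocol`, `port/@portid`, `state/@state`, `service/@name`) reflect our understanding of nmap's XML output, and the sample report is invented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NmapXmlSketch {
    // Returns "address protocol/port service" for every open port in an nmap XML report.
    static List<String> openPorts(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> result = new ArrayList<>();
        NodeList hosts = doc.getElementsByTagName("host");
        for (int h = 0; h < hosts.getLength(); h++) {
            Element host = (Element) hosts.item(h);
            Element address = (Element) host.getElementsByTagName("address").item(0);
            String addr = address == null ? "?" : address.getAttribute("addr");
            NodeList ports = host.getElementsByTagName("port");
            for (int p = 0; p < ports.getLength(); p++) {
                Element port = (Element) ports.item(p);
                Element state = (Element) port.getElementsByTagName("state").item(0);
                if (state == null || !"open".equals(state.getAttribute("state"))) continue;
                Element service = (Element) port.getElementsByTagName("service").item(0);
                result.add(addr + " " + port.getAttribute("protocol") + "/"
                        + port.getAttribute("portid") + " "
                        + (service == null ? "unknown" : service.getAttribute("name")));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<nmaprun><host><address addr=\"192.0.2.10\" addrtype=\"ipv4\"/><ports>"
                + "<port protocol=\"tcp\" portid=\"22\"><state state=\"open\"/><service name=\"ssh\"/></port>"
                + "<port protocol=\"tcp\" portid=\"23\"><state state=\"closed\"/><service name=\"telnet\"/></port>"
                + "</ports></host></nmaprun>";
        System.out.println(openPorts(xml)); // [192.0.2.10 tcp/22 ssh]
    }
}
```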
Enhancements:
- Parsing of distance information was added. Ignoring of taskend, taskbegin, and taskprogress information was added.
- Tests for nmap 4.20 were added.
- The license was changed to an MIT-style license.
- The "always null" bug for the service->protocol call was fixed.
Download (0.035MB)
Added: 2007-06-15 License: GPL (GNU General Public License)
862 downloads
ShaniXmlParser 1.4.15
ShaniXmlParser is an XML/HTML DOM/SAX parser that can be validating. It can parse badly formed XML files.
ShaniXmlParser can parse files with inverted tags and badly escaped &, < and > characters. ShaniXmlParser expands all HTML entities. ShaniXmlParser is well suited to parse HTML files.
It is up to 3 times faster than the Xerces parser bundled with JDK 1.5 and as fast as the Crimson parser bundled with JDK 1.4, compliant with the JAXP/W3C DOM interfaces, and very small.
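One common way to accept badly escaped `&` and `<` is to repair them before handing the text to a strict XML parser. The sketch below illustrates that general technique under our own assumptions; it is not how ShaniXmlParser works internally. It escapes every `&` that does not begin a character or entity reference and every `<` that cannot begin markup, then parses the result with the JDK's JAXP `DocumentBuilder`. HTML named entities such as `&nbsp;` would still need a DTD or an explicit mapping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class AmpersandRepairSketch {
    // An '&' that does not start a reference such as "&amp;", "&#38;" or "&#x26;".
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!#[0-9]+;|#x[0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)");
    // A '<' that cannot start a tag, end tag, comment, doctype or processing instruction.
    private static final Pattern BARE_LT = Pattern.compile("<(?![A-Za-z_/!?])");

    // Escape '&' first, so the "&lt;" introduced afterwards is not escaped again.
    static String repair(String xml) {
        String s = BARE_AMP.matcher(xml).replaceAll("&amp;");
        return BARE_LT.matcher(s).replaceAll("&lt;");
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String fixed = repair("<p>Fish & chips < 5 &amp; peas</p>");
        System.out.println(fixed); // <p>Fish &amp; chips &lt; 5 &amp; peas</p>
        System.out.println(parse(fixed).getDocumentElement().getTextContent());
    }
}
```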
Enhancements:
- Support of DOM 2 HTML interfaces.
- 668 of 685 tests passed in the DOM Level 2 HTML test suite.
Download (2.0MB)
Added: 2007-04-25 License: GPL (GNU General Public License)
913 downloads
Shell::Parser 0.04
Shell::Parser is a simple shell script parser.
SYNOPSIS
use Shell::Parser;
my $parser = Shell::Parser->new(syntax => 'bash', handlers => {
});
$parser->parse(...);
$parser->eof;
This module implements a rudimentary shell script parser in Perl. It was primarily written as a backend for Syntax::Highlight::Shell, in order to simplify the creation of the latter.
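A highlighter backend mainly needs the script split into typed tokens. The Java sketch below shows that shape for a single line; it is not Shell::Parser's API or handler set, the token type names are hypothetical, the keyword list is a small subset of bash's, and quoting, escapes and here-documents are handled only crudely or not at all (Java 16+ for `record`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ShellTokenSketch {
    record Token(String type, String text) {}

    // A small subset of bash keywords, for illustration only.
    static final Set<String> KEYWORDS =
            Set.of("if", "then", "else", "fi", "for", "do", "done", "while", "case", "esac");
    static final String METACHARS = ";|&<>()";

    // Split one line into comment, string, variable, metachar, keyword and word tokens.
    static List<Token> tokenize(String line) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '#') {                       // comment runs to end of line
                tokens.add(new Token("comment", line.substring(i)));
                break;
            } else if (c == '"' || c == '\'') {          // quoted string, no escape handling
                int end = line.indexOf(c, i + 1);
                i = end < 0 ? line.length() : end + 1;
                tokens.add(new Token("string", line.substring(start, i)));
            } else if (c == '$') {                       // $NAME variable reference
                i++;
                while (i < line.length()
                        && (Character.isLetterOrDigit(line.charAt(i)) || line.charAt(i) == '_')) i++;
                tokens.add(new Token("variable", line.substring(start, i)));
            } else if (METACHARS.indexOf(c) >= 0) {
                i++;
                tokens.add(new Token("metachar", line.substring(start, i)));
            } else {                                     // plain word or keyword
                while (i < line.length() && !Character.isWhitespace(line.charAt(i))
                        && (METACHARS + "\"'$").indexOf(line.charAt(i)) < 0) i++;
                String word = line.substring(start, i);
                tokens.add(new Token(KEYWORDS.contains(word) ? "keyword" : "word", word));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        tokenize("if [ \"$x\" = 1 ]; then echo $HOME # done").forEach(System.out::println);
    }
}
```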
Download (0.017MB)
Added: 2007-04-06 License: Perl Artistic License
934 downloads
Test-Parser 1.2
Test::Parser is a collection of parsers for different test output file formats.
These parse the data into a general-purpose data structure that can then be used to create reports, do post-processing analysis, etc.
Test-Parser can also export tests in SpikeSource's TRPI test description XML language.
Download (0.053MB)
Added: 2006-05-04 License: GPL (GNU General Public License)
1268 downloads
CQL::Parser 1.0
CQL::Parser is a Perl module that compiles CQL strings into parse trees of Node subtypes.
SYNOPSIS
use CQL::Parser;
my $cql = 'dc.creator="George Clinton" and funk';  # any CQL statement
my $parser = CQL::Parser->new();
my $root = $parser->parse( $cql );
CQL::Parser provides a mechanism to parse Common Query Language (CQL) statements. The best description of CQL comes from the CQL homepage at the Library of Congress: http://www.loc.gov/z3950/agency/zing/cql/
CQL is a formal language for representing queries to information retrieval systems such as web indexes, bibliographic catalogs and museum collection information. The CQL design objective is that queries be human-readable and human-writable, and that the language be intuitive while maintaining the expressiveness of more complex languages.
A CQL statement can be as simple as a single keyword, or as complicated as a set of components indicating search indexes, relations, relational modifiers, proximity clauses and boolean logic. CQL::Parser parses CQL statements and returns the root node of a tree of nodes that describes the statement. A client application can then use this data structure to analyze the statement, and possibly turn it into a query for a local repository.
Each CQL component in the tree inherits from CQL::Node and can be one of the following: CQL::AndNode, CQL::NotNode, CQL::OrNode, CQL::ProxNode, CQL::TermNode, CQL::PrefixNode. See the documentation for those modules for their respective APIs.
Here are some examples of CQL statements:
george
dc.creator=george
dc.creator="George Clinton"
clinton and funk
clinton and parliament and funk
(clinton or bootsy) and funk
dc.creator="clinton" and dc.date="1976"
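To show how statements like the examples above can become a tree of nodes, here is a small recursive-descent parser in Python for a subset of CQL: bare terms, index/relation/term clauses, and, or, not, and parentheses. It follows the CQL rule that boolean operators share one precedence level and are evaluated left to right. Proximity, relation modifiers and prefixes are not handled, and the classes TermNode and BooleanNode and the function parse_cql are hypothetical stand-ins for CQL::TermNode, CQL::AndNode and friends, not the module's API.

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

# Tokens: parentheses, quoted strings, relation symbols, or bare words such as dc.creator
TOKEN = re.compile(r'\s*(\(|\)|"[^"]*"|<=|>=|<>|[=<>]|[^\s()=<>"]+)')
BOOLEANS = {"and", "or", "not"}


@dataclass
class TermNode:
    term: str
    index: Optional[str] = None
    relation: Optional[str] = None


@dataclass
class BooleanNode:
    op: str  # "and", "or" or "not"
    left: "Node"
    right: "Node"


Node = Union[TermNode, BooleanNode]


def tokenize(query):
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"cannot tokenize query at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def unquote(token):
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]
    return token


class Parser:
    def __init__(self, query):
        self.tokens = tokenize(query)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of query")
        self.pos += 1
        return token

    def parse(self):
        node = self.query()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return node

    def query(self):
        # All boolean operators share one precedence level and bind left to right.
        node = self.clause()
        while self.peek() is not None and self.peek().lower() in BOOLEANS:
            op = self.advance().lower()
            node = BooleanNode(op, node, self.clause())
        return node

    def clause(self):
        token = self.advance()
        if token == "(":
            node = self.query()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return node
        nxt = self.peek()
        if nxt is not None and nxt[0] in "=<>":
            relation = self.advance()
            return TermNode(unquote(self.advance()), index=token, relation=relation)
        return TermNode(unquote(token))


def parse_cql(query):
    return Parser(query).parse()
```

With this sketch, parse_cql("(clinton or bootsy) and funk") returns an and node whose left child is the or node for clinton and bootsy, while dc.creator="George Clinton" becomes a single TermNode with index dc.creator, relation = and term George Clinton.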
Download (0.019MB)
Added: 2007-06-20 License: Perl Artistic License
856 downloads
Test::Parser 1.1
Test::Parser is a collection of parsers for different test output file formats.
These parse the data into a general-purpose data structure that can then be used to create reports, do post-processing analysis, etc.
Test::Parser can also export tests in SpikeSource's TRPI test description XML language.
Installation:
To install the script and man pages in the standard areas, run the following sequence of commands:
$ perl Makefile.PL
$ make
$ make test
$ make install # you probably need to do this step as superuser
If you want to install the script in your own private space, use:
$ perl Makefile.PL PREFIX=/home/joeuser \
      INSTALLMAN1DIR=/home/joeuser/man/man1 \
      INSTALLMAN3DIR=/home/joeuser/man/man3
$ make
$ make test
$ make install # can do this step as joeuser
Note that `make test` does nothing interesting.
Enhancements:
- This release improves the LTP parser and adds a parse_ltp script that prints a tabular summary of the PASS/FAILs of test cases.
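The input formats that Test::Parser accepts are not spelled out here, so the following Python sketch assumes a simplified LTP-like line shape, `<testcase> <number> T<STATUS> : <message>`. It illustrates the two steps described above: parsing raw output into a general-purpose structure (a list of dicts), then producing a tabular PASS/FAIL summary in the spirit of the parse_ltp script. The regular expression and the function names parse_results and summarize are hypothetical, not part of Test::Parser.

```python
import re
from collections import Counter

# Assumed line shape, loosely modelled on classic LTP output such as
#   abort01    1  TPASS  :  Test passed
LINE = re.compile(r"^(\S+)\s+(\d+)\s+T([A-Z]+)\s*:\s*(.*)$")


def parse_results(text):
    """Turn raw test output into a list of plain dicts, skipping unrecognised lines."""
    results = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            name, number, status, message = m.groups()
            results.append({"name": name, "number": int(number),
                            "status": status, "message": message})
    return results


def summarize(results):
    """Return a simple PASS/FAIL table, one row per test case."""
    counts = {}
    for r in results:
        counts.setdefault(r["name"], Counter())[r["status"]] += 1
    rows = [f"{'test':<12}{'PASS':>6}{'FAIL':>6}"]
    for name in sorted(counts):
        c = counts[name]
        rows.append(f"{name:<12}{c['PASS']:>6}{c['FAIL']:>6}")
    return "\n".join(rows)
```

Keeping the parsed form as plain records means the same data can feed other reports or an XML export without re-reading the raw log.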
Download (0.044MB)
Added: 2006-04-07 License: GPL (GNU General Public License)
1295 downloads
C++ WSDL Parser 1.9.3
C++ WSDL Parser is an efficient C++ Web services library that includes a standards-compliant WSDL parser API, a Schema parser and validator, an XML parser and serializer, and an API for dynamically inspecting and invoking WSDL Web services.
Enhancements:
- Many WSDLs can now be dynamically invoked.
- Added documentation (doxygen for the API).
- Better error reporting when types are missing.
Download (0.56MB)
Added: 2005-10-06 License: LGPL (GNU Lesser General Public License)
1483 downloads
Makefile::Parser 0.11
Makefile::Parser is a simple parser for Makefiles.
SYNOPSIS
use strict;
use warnings;
use Makefile::Parser;
my $parser = Makefile::Parser->new;
# Equivalent to ->parse('Makefile');
$parser->parse or
    die Makefile::Parser->error;
# Get last value assigned to the specified variable CC:
print $parser->var('CC');
# Get all the variable names defined in the Makefile:
my @vars = $parser->vars;
print join(' ', sort @vars);
my @roots = $parser->roots; # Get all the "root targets"
print $roots[0]->name;
my @tars = $parser->targets; # Get all the targets
my $tar = join("\n", $tars[0]->commands); # commands of the first target, one per line
# Get the default target, i.e. the first target defined in the Makefile:
$tar = $parser->target;
$tar = $parser->target('install');
# Get the name of the target, i.e. install here:
print $tar->name;
# Get the dependencies for the target install:
my @depends = $tar->depends;
# Access the shell commands used to build the current target:
my @cmds = $tar->commands;
# Parse another file using the same Parser object:
$parser->parse('Makefile.old') or
    die Makefile::Parser->error;
# Get the target whose name is stored in the variable EXE_FILE:
$tar = $parser->target($parser->var('EXE_FILE'));
This is a parser for Makefiles. At this very early stage, the parser supports only a limited set of features, so it may not recognize some advanced features provided by certain make tools such as GNU make. Its initial purpose is to provide basic support for another module, Makefile::GraphViz, which aims to render the build process specified by a Makefile using the amazing GraphViz library. The Make module was not satisfactory for this purpose, so I decided to build one of my own.
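To make the data model in the SYNOPSIS concrete (variables, targets, dependencies, commands, a default target and root targets), here is a minimal Python sketch under strong simplifying assumptions: single-target rules only, no line continuations, no pattern rules and no variable expansion. The names parse_makefile, Makefile and Target are hypothetical and are not Makefile::Parser's API; in particular, a "root" is assumed here to mean a target that no other target depends on.

```python
import re
from dataclasses import dataclass, field

# NAME = value, NAME := value, NAME ?= value, NAME += value
ASSIGN = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*([:?+]?=)\s*(.*)$")


@dataclass
class Target:
    name: str
    depends: list
    commands: list = field(default_factory=list)


@dataclass
class Makefile:
    variables: dict
    targets: dict  # name -> Target, in definition order

    def default_target(self):
        # Roughly like make: the first target whose name does not start with "."
        return next((t for t in self.targets.values() if not t.name.startswith(".")), None)

    def roots(self):
        # Assumption: a root target is one that no other target depends on.
        needed = {d for t in self.targets.values() for d in t.depends}
        return [t for t in self.targets.values() if t.name not in needed]


def parse_makefile(text):
    variables, targets, current = {}, {}, None
    for raw in text.splitlines():
        if raw.startswith("\t"):
            if current is not None:
                current.commands.append(raw.strip())  # recipe line of the current rule
            continue
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line:
            continue
        m = ASSIGN.match(line)
        if m:
            name, op, value = m.groups()
            if op == "+=" and name in variables:
                variables[name] += " " + value
            elif op == "?=" and name in variables:
                pass  # ?= only assigns if the variable is not yet defined
            else:
                variables[name] = value  # the last plain assignment wins
            current = None
        elif ":" in line:
            name, _, deps = line.partition(":")
            current = Target(name.strip(), deps.split())
            targets[current.name] = current
    return Makefile(variables, targets)
```

A real Makefile parser also has to expand $(VAR) references in target names and dependencies before building a graph such as the one Makefile::GraphViz renders; this sketch keeps them as literal text.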
Download (0.018MB)
Added: 2006-10-24 License: Perl Artistic License
1098 downloads