visual regexp 3.1
cb2Bib 1.3.1
The cb2Bib is a tool for rapidly extracting unformatted bibliographic references from email alerts.
The cb2Bib reads the clipboard text contents and processes them against a set of predefined patterns. If this automatic detection is successful, cb2Bib formats the clipboard data according to the structured BibTeX reference standard.
Otherwise, if no predefined format pattern is found or if detection proves to be difficult, cb2Bib greatly simplifies manual data extraction. In most cases, such manual extraction yields a new, personalized pattern that can be added to the predefined pattern set for future automatic extractions.
Once the bibliographic reference is correctly extracted, it is added to a specified BibTeX database file. Optionally, article PDF files, if available, are renamed to their citeID and moved to a desired directory as a personal article library.
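The automatic step described above can be sketched as follows. This is a minimal Python illustration, not cb2Bib's implementation: the pattern, the three-line input layout, the field names, and the helpers extract_reference and to_bibtex are all assumptions made for the example, and the pattern is not written in the regexp.txt format.

```python
import re

# Hypothetical pattern set (not the regexp.txt format): each regex uses named
# groups that map directly onto BibTeX fields.
PATTERNS = [
    re.compile(
        r"^(?P<author>[^\n]+)\n"
        r"(?P<title>[^\n]+)\n"
        r"(?P<journal>.+?)\s+(?P<volume>\d+),\s*(?P<pages>\d+)\s*\((?P<year>\d{4})\)$"
    ),
]


def extract_reference(clipboard_text):
    """Return a dict of fields from the first matching pattern, else None."""
    text = clipboard_text.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return match.groupdict()
    return None  # no pattern matched: manual extraction would be needed


def to_bibtex(cite_id, fields):
    """Format extracted fields as a BibTeX @article entry."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{cite_id},\n{body}\n}}"


# Made-up clipboard contents in the assumed three-line layout.
sample = "J. Doe and A. Smith\nA study of things\nJ. Chem. Phys. 124, 1234 (2006)\n"
fields = extract_reference(sample)
if fields is not None:
    print(to_bibtex("doe2006", fields))
```

When no pattern matches, the function returns None, which corresponds to the case where manual extraction takes over.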
Major Features:
- Select the reference to import from the email or web browser: On Unix machines, cb2Bib automatically detects mouse selections and clipboard changes. On Windows machines, copy or Ctrl-C is necessary to activate cb2Bib automatic processing.
- cb2Bib automatic processing: Once text is selected cb2Bib initiates the automatic reference extraction. It uses the predefined patterns from file regexp.txt to attempt automatic extraction. See Configuring Files section for setting the user predefined pattern matching expression file. After a successful detection bibliographic fields appear on the cb2Bib item line edits. Manual editing is possible at this stage.
- cb2Bib manual processing: If no predefined format pattern is found or if detection proves to be difficult, a manual data extraction must be performed. Select individual reference items from the cb2Bib clipboard area. A popup menu will appear after selection is made. Choose the corresponding bibliographic field. See BiBTeX Entry Types available as cb2Bib fields. Selection is post-processed and added to the cb2Bib item line edit. cb2Bib field tags will show on the cb2Bib clipboard area. Once the manual processing is done, cb2Bib clipboard area will contain the matching pattern. The pattern can be further edited and stored to the regexp.txt file using Insert Regular Expression, Alt+I. See the Extracting Data from the Clipboard and The Regular Expression Editor sections.
- Download reference to cb2Bib: The cb2Bib has the built-in functionality to interact with publishers "Download reference to Citation Manager" service. Choose BibTeX format, or any other format that you can translate using External Clipboard Preparsing Command. See Additional, Keyboard Functionality, Alt C. Click "Download" from your browser. When asked "Open with..." select cb2Bib. The cb2Bib will be launched if no running instance is found. If already running, it will place the downloaded reference to the clipboard, and it will start processing. Make sure your running instance is aware of clipboard changes. See Buttons Functionality. For convenience, the shell script c2bimport, and the desktop config file c2bimport.desktop are also provided.
- Adding documents: PDF and other documents can be added to the BibTeX reference by dragging the file icon and dropping it into the cb2Bib's panel. Optionally, document files are renamed to their citeID and moved to a desired directory as a personal article library (see the Configuring Documents section). Documents linked to a reference correspond to the BibTeX tag file. Most reference manager software can retrieve and display these files. Downloading, copying, and/or moving is scheduled and performed once the reference is accepted, e.g., once it is saved by pressing the Save Reference button.
- Multiple retrieving from PDF files: Multiple PDF files, or other files convertible to text, can be sequentially processed by dragging a set of files into cb2Bib's PDFImport dialog. Once processing is started, files are sequentially converted to text and sent to the cb2Bib clipboard panel for reference extraction. See PDF Reference Import for details.
- Journal-Volume-Page Queries: Takes input Journal, Volume, and first page from the corresponding edit lines and attempts to complete the reference. Additionally, queries consider Title, DOI, and an excerpt, which is a simplified clipboard panel contents. See Configuring Network section, the distribution file netqinf.txt, and Release Note cb2Bib 0.3.5 for customization and details.
- BibTeX Editor: cb2Bib includes a practical text editor suitable for corrections and additions. cb2Bib capabilities are readily available within the editor. E.g., the reference is first sent to cb2Bib by selecting it, and later retrieved from cb2Bib to the editor using 'right click' + 'Paste Current BibTeX'. Interconversions between Unicode and LaTeX and between long and abbreviated journal names, as well as adding/renaming PDF files, are easily available. The BibTeX Editor is also accessible through a shell command line.
- Advanced features, and processing and extraction details are described in the following sections:
- Extracting Data from the Clipboard
- Processing of author's names
- Processing of journal names
- Field Recognition Rules
- The Regular Expression Editor
- Configuration information is described in the following sections:
- Configuration
- Predefined cite ID placeholders
- Utilities and modules are described in the following sections:
- Search BibTeX files for references
- Embedded File Editor
- PDF Reference Import
- The cb2Bib Command Line
- Reading and writing bibliographic metadata
- The cb2Bib Annote
- The cb2Bib Citer
Enhancements:
- Added Check Repeated functionality for current reference
- Fixed parser not processing last field in inverted comma style BibTeX
- Set netqinf.txt to use internal XML parser for PubMed
- Fixed packaging, double copying scripts and initial external tool setting
- Fixed c2bciter script not passing all arguments (Thanks to F. Rusconi)
Requirements:
To compile cb2Bib, the following libraries must be present and accessible:
- Qt 4.4.0 or higher from Trolltech. On a Linux platform with Qt preinstalled, make sure that the devel packages and Qt tools are also present.
- WebKit library (optional) to compile cb2Bib Annote viewer. It is already included in Qt > 4.4.0 library. No special action/flag is needed during compilation.
- X11 header files if compiling on Unix platforms. Concretely, headers X11/Xlib.h and X11/Xatom.h are needed.
- The header files fcntl.h and unistd.h from glibc-devel package are also required. Otherwise compilation will fail with reference list.cpp:227: `close' undeclared.
Kiwi Log Viewer (Lin) 2.0
Free log file viewer for Linux
Kiwi Log Viewer for Linux is a freeware application that displays text based log files in a tabular format. Only a small section of the file is read from disk at a time which saves memory and allows you to view a file that would be too big to fit in memory. The tail option monitors the specified log file for changes and displays any new data that is added in real time. Features colorization based on sub-string or RegExp matches

Visual Paradigm for UML (CE) for Linux 6.1
UML CASE tool - UML diagrams, use case modeling, reverse engineering and more...
Visual Paradigm for UML (VP-UML) is a powerful, easy-to-use UML modeling tool that supports the full software lifecycle - analysis, design, coding, testing and deployment. This CASE tool helps you build quality applications faster, better and cheaper. You can draw UML diagrams, generate code from class diagrams and vice versa, and generate UML documentation. This UML CASE tool also provides plenty of UML resources - UML demos, UML tutorials, and UML sample projects.
VP-UML Features:
+Supporting the latest UML notation (use case diagram, collaboration diagram, sequence diagram, class diagram/object diagram/package diagram, state diagram, activity diagram, component diagram, deployment diagram)
+OO analysis (OOA), OO design (OOD) support
+Textual analysis for identifying candidate use cases, classes, flow of events...
+CRC Card for finding objects
+Use case modeling (use case description...)
+Business Workflow diagram
+Round-trip engineering
+Code Generation - diagram to code, model to code, generate code (UML to code, UML model to Java)
+Reverse engineering - code to diagram, code to model (Java to UML diagram, Java to UML models)
+Instant Reverse for Java, C++, Dot NET dll/exe, XML, CORBA IDL
+Automatic synchronization between diagrams and source code
+Report generator for generating documentation to PDF/HTML
+Automatic diagram layout - rearrange shapes and connectors in UML diagrams in different elegant styles
+Export XMI/Import XMI
+Import Rational Rose mdl file
+MS Visio Integration - drawing UML diagrams with your own shapes by using Visio stencils
+Export diagrams to SVG, PNG, JPG
+Plugin and template
+Multilingual support
+More...
Other UML Plugins/UML Modeling Tools:
Windows Platform:
+SDE for Microsoft Visual Studio .NET
Java Platform (Linux/Mac OS X/Windows):
+SDE for Oracle JDeveloper
+SDE for IBM WebSphere (WSAD)
+SDE for Borland JBuilder
+SDE for IntelliJ IDEA
+SDE for Eclipse
+SDE for NetBeans
+SDE for Sun ONE
+More SDE...
Visual Paradigm for UML 6.1 (Community Edition)
Visual Paradigm for UML is a powerful, easy-to-use UML modelling and CASE tool.
Visual Paradigm for UML is an ALL-IN-ONE Visual Development Platform. VP-UML supports the full development life cycle, the latest UML notation for visual modeling
Regexp::Genex 0.07
Regexp::Genex - get the strings a regex will match, with a regex.
SYNOPSIS
# first try:
$ perl -MRegexp::Genex=:all -le 'print for strings(qr/a(b|c)d{2,3}e*/)'
$ perl -x `pmpath Regexp::Genex`
#!/usr/bin/perl -l
use Regexp::Genex qw(:all);
$regex = shift || "a(b|c)d{2,4}?";
print "Trying: $regex";
print for strings($regex);
# abdd
# abddd
# abdddd
# acdd
# acddd
# acdddd
print "\nThe regex code for that was:\nqr/";
print strings_rx($regex);
print "/x\n";
my $generator = generator($regex);
print "Taking first two using generator";
print $generator->() for 1..2;
my $big_rx = 'b*?c*?d*?'; # * becomes {0,20}
my $big = generator($big_rx, ($max_length = 100) );
print "Taking string 100 of $big_rx";
print $big->(100); # (caveats below)
# ccccdddddddddddddddd NOT 'd'x100 as you may expect
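The idea of listing the strings a regex matches can be illustrated for the example pattern a(b|c)d{2,4} used in the synopsis. The Python sketch below does not parse the regex as Regexp::Genex does; it builds the candidates by hand for this one pattern (the function name example_strings is hypothetical) and only uses the regex to check them.

```python
import itertools
import re

PATTERN = re.compile(r"a(b|c)d{2,4}")


def example_strings():
    """Enumerate every string matched by a(b|c)d{2,4}, built by hand."""
    # One alternative (b or c) combined with 2, 3, or 4 repetitions of d.
    for letter, count in itertools.product("bc", range(2, 5)):
        yield "a" + letter + "d" * count


strings = list(example_strings())
for s in strings:
    # Each candidate is verified against the compiled pattern.
    print(s, bool(PATTERN.fullmatch(s)))
```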
TopAZ 0.1
TopAZ is mainly used for viewing, creating, editing & analyzing graphs.
ShiftJIS::Regexp 1.00
ShiftJIS::Regexp contains regular expressions in Shift-JIS.
SYNOPSIS
use ShiftJIS::Regexp qw(:all);
match($string, '\p{Hiragana}{2}\p{Digit}{2}');
match($string, '\pH{2}\pD{2}');
# these two are equivalent:
This module provides some functions to use regular expressions in Shift-JIS on the byte-oriented perl.
The legal Shift-JIS character in this module must match the following regular expression:
[\x00-\x7F\xA1-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]
To avoid false matches in a multibyte encoding, this module uses an anchoring technique to ensure that each match position falls on a character boundary.
cf. perlfaq6, "How can I match strings with multibyte characters?"
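The false-match problem can be seen with plain byte strings. The following Python sketch is an illustration only (it is not how ShiftJIS::Regexp is implemented, and the helper names sjis_chars and contains_char are hypothetical): it reuses the legal-character expression quoted above to walk a byte string one character at a time, so that a search can only succeed on character boundaries.

```python
import re

# The legal Shift-JIS character expression quoted above, as a bytes pattern.
SJIS_CHAR = re.compile(
    rb"[\x00-\x7F\xA1-\xDF]|[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]"
)


def sjis_chars(data):
    """Split Shift-JIS bytes into a list of single-character byte strings."""
    chars, pos = [], 0
    while pos < len(data):
        match = SJIS_CHAR.match(data, pos)
        if match is None:
            raise ValueError(f"illegal Shift-JIS byte at offset {pos}")
        chars.append(match.group())
        pos = match.end()
    return chars


def contains_char(data, needle):
    """Character-aware search: compare whole characters, never raw bytes."""
    return needle in sjis_chars(data)


katakana_a = "ア".encode("shift_jis")    # two bytes; the trail byte is 0x41 ('A')
print(b"A" in katakana_a)                # naive byte search: false match
print(contains_char(katakana_a, b"A"))   # boundary-aware search: no match
```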
Functions
re(PATTERN)
re(PATTERN, MODIFIER)
Returns a regular expression parsable by the byte-oriented perl.
PATTERN is specified as a string. MODIFIER is specified as a string. Modifiers in the following list are allowed.
i case-insensitive pattern (only for ascii alphabets)
I case-insensitive pattern (greek, cyrillic, fullwidth latin)
j hiragana-katakana-insensitive pattern (but halfwidth katakana
are not considered.)
s treat string as single line
m treat string as multiple lines
x ignore whitespace (i.e. [\x20\n\r\t\f]) unless backslashed
or inside a character class; but comments are not recognized!
o once parsed (not compiled!) and the result is cached internally.
o modifier
while (<>) {
print replace($_, '(perl)', '$1', 'igo');
}
is more efficient than
while (<>) {
print replace($_, '(perl)', '$1', 'ig');
}
because in the latter case the pattern is parsed every time
the function is called.
match(STRING, PATTERN)
match(STRING, PATTERN, MODIFIER)
An emulation of m// operator aware of Shift-JIS. But, to emulate @list = $string =~ m/PATTERN/g, the pattern should be parenthesized (capturing parentheses are not added automatically).
@list = match($string, '\pH', 'g'); # wrong; returns garbage!
@list = match($string, '(\pH)', 'g'); # good
PATTERN is specified as a string. MODIFIER is specified as a string.
i,I,j,s,m,x,o please see re().
g match globally
z tell the function the pattern matches an empty string
(sorry, due to the poor auto-detection)
replace(STRING or SCALAR REF, PATTERN, REPLACEMENT)
replace(STRING or SCALAR REF, PATTERN, REPLACEMENT, MODIFIER)
An emulation of s/// operator but aware of Shift-JIS.
If a reference to a scalar is specified as the first argument, substitutes the referent scalar and returns the number of substitutions made. If a string (not a reference) is specified as the first argument, returns the substituted string and the specified string is unaffected.
MODIFIER is specified as a string.
i,I,j,s,m,x,o please see re().
g,z please see match().
jsplit(PATTERN or ARRAY REF of [PATTERN, MODIFIER], STRING)
jsplit(PATTERN or ARRAY REF of [PATTERN, MODIFIER], STRING, LIMIT)
An emulation of CORE::split but aware of Shift-JIS.
In scalar/void context, it does not split into the @_ array; in scalar context, only returns the number of fields found.
PATTERN is specified as a string. But ' ' as PATTERN has no special meaning; it splits the string on a single space similarly to CORE::split / /.
When you want to split the string on whitespace, pass an undefined value as PATTERN or use the splitspace() function.
jsplit(undef, " \x81\x40 This is \x81\x40 perl.");
splitspace(" \x81\x40 This is \x81\x40 perl.");
# ('This', 'is', 'perl.')
If you want to pass a pattern with modifiers, specify an arrayref of [PATTERN, MODIFIER] as the first argument. You can also use "Embedded Modifiers".
MODIFIER is specified as a string.
i,I,j,s,m,x,o please see re().
splitspace(STRING)
splitspace(STRING, LIMIT)
This function emulates CORE::split(' ', STRING, LIMIT). It returns the list obtained by splitting STRING on whitespace, including "\x81\x40" (IDEOGRAPHIC SPACE). Leading whitespace characters do not produce any field.
Note: splitspace(STRING, LIMIT) is equivalent to jsplit(undef, STRING, LIMIT).
splitchar(STRING)
splitchar(STRING, LIMIT)
This function emulates CORE::split(//, STRING, LIMIT). It returns a list given by split of STRING into characters.
Note: splitchar(STRING, LIMIT) is equivalent to jsplit('', STRING, LIMIT).
Regexp::Common::time 0.01
Regexp::Common::time Perl module contains date and time regexps.
SYNOPSIS
use Regexp::Common qw(time);
# Piecemeal, Time::Format-like patterns
$RE{time}{tf}{-pat => 'pattern'}
# Piecemeal, strftime-like patterns
$RE{time}{strftime}{-pat => 'pattern'}
# Match ISO8601-style date/time strings
$RE{time}{iso}
# Fuzzy date patterns
# YEAR/MONTH/DAY
$RE{time}{ymd} # Most flexible
$RE{time}{YMD} # Strictest (equivalent to y4m2d2)
# Other available patterns: y2md, y4md, y2m2d2, y4m2d2
# MONTH/DAY/YEAR (American style)
$RE{time}{mdy} # Most flexible
$RE{time}{MDY} # Strictest (equivalent to m2d2y4)
# Other available patterns: mdy2, mdy4, m2d2y2, m2d2y4
# DAY/MONTH/YEAR (European style)
$RE{time}{dmy} # Most flexible
$RE{time}{DMY} # Strictest (equivalent to d2m2y4)
# Other available patterns: dmy2, dmy4, d2m2y2, d2m2y4
# Fuzzy time pattern
# HOUR/MINUTE/SECOND
$RE{time}{hms} # H: matches 1 or 2 digits; 12 or 24 hours
# M: matches 2 digits.
# S: matches 2 digits; may be omitted
# May be followed by "a", "am", "p.m.", etc.
This module creates regular expressions that can be used for parsing dates and times. See Regexp::Common for a general description of how to use this interface.
Parsing dates is a dirty business. Dates are generally specified in one of three possible orders: year/month/day, month/day/year, and day/month/year. Years can be specified with four digits or with two digits (with assumptions made about the century). Months can be specified as one digit, two digits, as a spelled-out name, or as a three-letter abbreviation. Day numbers can be one digit or two digits, with limits depending on the month (and, in the case of February, even the year). Also, different people use different punctuation for separating the various elements.
A human can easily recognize that "October 21, 2005" and "21.10.05" refer to the same date, but it's tricky to get a program to come to the same conclusion. This module attempts to make it possible to do so, with a minimum of difficulty.
If you know the exact format of the data to be matched, use one of the specific, piecemeal pattern builders: tf or strftime. If there is some variability, use one of the fuzzy-matching patterns in the dmy, mdy, or ymd families. If the data are wildly variable, such as raw user input, give up and use the Date::Manip or Date::Parse module.
Time values are generally much simpler to parse than date values. Only one fuzzy pattern is provided, and it should suffice for most needs.
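To make the "October 21, 2005" versus "21.10.05" example concrete, here is a small Python sketch of fuzzy date matching. It is not the Regexp::Common::time interface: the two patterns, the helper parse_date, and the rule that two-digit years fall in the 2000s are assumptions chosen for the illustration, and only a month-name/day/year form and a numeric day/month/year form are handled.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Month name (full or abbreviated), day, then a 4- or 2-digit year.
NAMED_MDY = re.compile(r"^([A-Za-z]{3})[A-Za-z]*\.?\s+(\d{1,2}),?\s+(\d{4}|\d{2})$")
# Numeric day/month/year with ".", "/" or "-" as separator.
NUMERIC_DMY = re.compile(r"^(\d{1,2})[./-](\d{1,2})[./-](\d{4}|\d{2})$")


def _year(text):
    # Assumption for this sketch: two-digit years belong to the 2000s.
    value = int(text)
    return value + 2000 if len(text) == 2 else value


def parse_date(text):
    """Return a date for the two supported layouts, or None."""
    text = text.strip()
    try:
        m = NAMED_MDY.match(text)
        if m and m.group(1).lower() in MONTHS:
            return date(_year(m.group(3)), MONTHS[m.group(1).lower()], int(m.group(2)))
        m = NUMERIC_DMY.match(text)
        if m:
            return date(_year(m.group(3)), int(m.group(2)), int(m.group(1)))
    except ValueError:
        pass  # e.g. day 31 in a 30-day month
    return None


print(parse_date("October 21, 2005") == parse_date("21.10.05"))
```

The day-of-month limits mentioned above are delegated to the date constructor here, which rejects impossible combinations such as 31.02.05.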
Business Process Visual ARCHITECT 2.1
Business Process Visual ARCHITECT is a full-featured business process modeler.
Business Process Visual ARCHITECT provides an easy-to-use diagramming environment for you to model your business process, and is a proven solution for bridging the gap between business analysts and IT professionals.
Main features:
- Frictionless business modeling environment
- On-the-fly syntax check and correction according to BPMN specification
- Advanced printing facility for outputting large business process diagram
- Share business process diagram among your company with Teamwork Server
- Incorporate user-defined images to the business process diagram to increase the readability.
Enhancements:
- Branch and tag capabilities were added to the VP Teamwork Server, including Subversion and CVS repository integration.
- This allows different modeling projects to be run in parallel while keeping the release quality project stable in the trunk.
- There were also a number of enhancements for various other features.
Regexp::Log 0.04
Regexp::Log is a Perl base class for log file regexp builders.
SYNOPSIS
my $foo = Regexp::Log::Foo->new(
format => 'custom %a %b %c/%d',
capture => [qw( host code )],
);
# the format() and capture() methods can be used to set or get
$foo->format('custom %g %e %a %w/%s %c');
$foo->capture(qw( host code ));
# this is necessary to know in which order
# we will receive the captured fields from the regexp
my @fields = $foo->capture;
# the all-powerful capturing regexp :-)
my $re = $foo->regexp;
while (<>) {
my %data;
@data{@fields} = /$re/; # no need for /o, it's a compiled regexp
# now munge the fields
...
}
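The workflow in this synopsis (build a regexp from a format string, choose which fields to capture, then pair captured values with field names in order) can be sketched in Python. This is an illustration of the idea rather than the Regexp::Log API: the directive table, the field names, and the function build_regexp are hypothetical.

```python
import re

# Hypothetical directive table: directive -> (field name, sub-pattern).
DIRECTIVES = {
    "%h": ("host", r"\S+"),
    "%c": ("code", r"\d{3}"),
    "%b": ("bytes", r"\d+"),
}


def build_regexp(fmt, capture):
    """Compile fmt into a regex; return it with the captured field names in order."""
    parts, captured = [], []
    for token in re.split(r"(%[a-z])", fmt):
        if token in DIRECTIVES:
            name, sub = DIRECTIVES[token]
            if name in capture:
                parts.append(f"({sub})")      # capturing group
                captured.append(name)
            else:
                parts.append(f"(?:{sub})")    # matched but not captured
        else:
            parts.append(re.escape(token))    # literal text between directives
    return re.compile("^" + "".join(parts) + "$"), captured


regexp, fields = build_regexp("%h %c %b", capture={"host", "bytes"})
match = regexp.match("example.org 200 512")
record = dict(zip(fields, match.groups())) if match else {}
print(record)
```

Returning the field list alongside the regex mirrors the point made in the synopsis: the caller needs to know the order in which captured values arrive.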
Class::Struct::FIELDS 1.1
The Class::Struct::FIELDS module combines Class::Struct, base and fields.
SYNOPSIS
(This page documents Class::Struct::FIELDS v.1.1.)
use Class::Struct::FIELDS;
# declare struct, based on fields, explicit class name:
struct (CLASS_NAME => { ELEMENT_NAME => ELEMENT_TYPE, ... });
use Class::Struct::FIELDS;
# declare struct, based on fields, explicit class name
# with inheritance:
struct (CLASS_NAME => [qw(BASE_CLASSES ...)],
{ ELEMENT_NAME => ELEMENT_TYPE, ... });
package CLASS_NAME;
use Class::Struct::FIELDS;
# declare struct, based on fields, implicit class name:
struct (ELEMENT_NAME => ELEMENT_TYPE, ...);
package CLASS_NAME;
use Class::Struct::FIELDS;
# declare struct, based on fields, implicit class name
# with inheritance:
struct ([qw(BASE_CLASSES ...)], ELEMENT_NAME => ELEMENT_TYPE, ...);
package MyObj;
use Class::Struct::FIELDS;
# declare struct with four types of elements:
struct (s => '$', a => '@', h => '%', x => '&', c => 'My_Other_Class');
$obj = new MyObj; # constructor
# scalar type accessor:
$element_value = $obj->s; # element value
$obj->s ('new value'); # assign to element
# array type accessor:
$ary_ref = $obj->a; # reference to whole array
$ary_element_value = $obj->a->[2]; # array element value
$ary_element_value = $obj->a (2); # same thing
$obj->a->[2] = 'new value'; # assign to array element
$obj->a (2, 'newer value'); # same thing
# hash type accessor:
$hash_ref = $obj->h; # reference to whole hash
$hash_element_value = $obj->h->{x}; # hash element value
$hash_element_value = $obj->h ('x'); # same thing
$obj->h->{x} = 'new value'; # assign to hash element
$obj->h ('x', 'newer value'); # same thing
# code type accessor:
$code_ref = $obj->x; # reference to code
$obj->x->(...); # call code
$obj->x (sub {...}); # assign to element
# regexp type accessor:
$regexp = $obj->r; # reference to regexp
$string =~ m/$obj->r/; # match regexp
$obj->r (qr/ ... /); # assign to element
# class type accessor:
$element_value = $obj->c; # object reference
$obj->c->method (...); # call method of object
$obj->c (My_Other_Class::->new); # assign a new object
Class::Struct::FIELDS exports a single function, struct. Given a list of element names and types, and optionally a class name and/or an array reference of base classes, struct creates a Perl 5 class that implements a "struct-like" data structure with inheritance.
The new class is given a constructor method, new, for creating struct objects.
Each element in the struct data has an accessor method, which is used to assign to the element and to fetch its value. The default accessor can be overridden by declaring a sub of the same name in the package. (See Example 2.)
Each element's type can be scalar, array, hash, code or class.
Text::RewriteRules 0.10
Text::RewriteRules Perl module contains a system to rewrite text using regexp-based rules.
SYNOPSIS
use Text::RewriteRules;
RULES email
\.==> DOT
@==> AT
ENDRULES
email("ambs@cpan.org") # returns ambs AT cpan DOT org
RULES/m inc
(\d+)=e=> $1+1
ENDRULES
inc("I saw 11 cats and 23 dogs") # returns I saw 12 cats and 24 dogs
ABSTRACT
This module uses a simplified syntax for regexp-based rules for rewriting text. You define a set of rules, and the system applies them until no rule can be applied.
Two variants are provided:
traditional rewrite (RULES function):
while it is possible to substitute
| apply first substitution rule
cursor based rewrite (RULES/m function):
add a cursor to the beginning of the string
while not reached end of string
| apply substitute just after cursor and advance cursor
| or advance cursor if no rule can be applied
A lot of computer science problems can be solved using rewriting rules.
Rewriting rules consist of mainly two parts: a regexp (LHS: Left Hand Side) that is matched with the text, and the string to use to substitute the content matched with the regexp (RHS: Right Hand Side).
Now, why not use a simple substitution? Because we want to define a set of rules and match them again and again, until no regexp on the LHS matches.
A point of discussion is the syntax used to define this system. A brief discussion showed that some users would prefer a function that receives a hash with the rules, while others prefer some syntactic sugar.
The approach used is the latter: we use Filter::Simple so that we can add a specific non-Perl syntax inside the Perl script. This improves the legibility of big rewriting rule systems.
This documentation is divided into two parts. First comes the reference of the module: what it does, with a brief explanation. A tutorial follows, which will grow over time and releases.
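The two rewriting strategies described in the abstract can be sketched in Python as follows. This is not the Text::RewriteRules implementation or its rule syntax: rules are represented here as (pattern, replacement) pairs, and the function names rewrite and rewrite_cursor, as well as the max_steps safety limit, are assumptions made for the illustration.

```python
import re


def rewrite(text, rules, max_steps=10_000):
    """Traditional rewrite: apply the first applicable rule until none applies."""
    for _ in range(max_steps):
        for pattern, replacement in rules:
            new_text, count = re.subn(pattern, replacement, text, count=1)
            if count:
                text = new_text
                break
        else:
            return text  # no rule matched: rewriting is finished
    raise RuntimeError("rules did not terminate")


def rewrite_cursor(text, rules):
    """Cursor-based rewrite: substitute at the cursor, otherwise advance it."""
    compiled = [(re.compile(pattern), replacement) for pattern, replacement in rules]
    out, pos = [], 0
    while pos < len(text):
        for regex, replacement in compiled:
            match = regex.match(text, pos)
            if match and match.end() > pos:
                # Replacement is either a callable or a template string.
                out.append(replacement(match) if callable(replacement)
                           else match.expand(replacement))
                pos = match.end()
                break
        else:
            out.append(text[pos])  # no rule applies here: copy and move on
            pos += 1
    return "".join(out)


EMAIL_RULES = [(r"\.", " DOT "), (r"@", " AT ")]
INC_RULES = [(r"\d+", lambda m: str(int(m.group()) + 1))]

print(rewrite("ambs@cpan.org", EMAIL_RULES))
print(rewrite_cursor("I saw 11 cats and 23 dogs", INC_RULES))
```

The increment rule shows why the cursor variant exists: under the traditional strategy its output would match again on every pass and never terminate, whereas the cursor never revisits text it has already rewritten.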
Regexp::Wildcards 0.06
Regexp::Wildcards is a Perl module that converts wildcard expressions to Perl regular expressions.
SYNOPSIS
use Regexp::Wildcards qw/wc2re/;
my $re;
$re = wc2re 'a{b?,c}*' => 'unix'; # Do it Unix style.
$re = wc2re 'a?,b*' => 'win32'; # Do it Windows style.
$re = wc2re '*{x,y}?' => 'jokers'; # Process the jokers & escape the rest.
$re = wc2re '%a_c%' => 'sql'; # Turn SQL wildcards into regexps.
In many situations, users may want to specify patterns to match but don't need the full power of regexps. Wildcards are one such set of simplified rules. This module converts wildcard expressions to Perl regular expressions, so that you can use them for matching. It handles the * and ? shell jokers, as well as Unix bracketed alternatives {,}, but also the % and _ SQL wildcards. Backslash (\) is used as an escape character. Wrappers are provided to mimic the behaviour of Windows and Unix shells.
VARIABLES
These variables control if the wildcards jokers and brackets must capture their match. They can be globally set by writing in your program
$Regexp::Wildcards::CaptureSingle = 1;
# From then, "exactly one" wildcards are capturing
or can be locally specified via local
{
local $Regexp::Wildcards::CaptureSingle = 1;
# In this block, "exactly one" wildcards are capturing.
...
}
# Back to the situation from before the block
This section describes also how those elements are translated by the functions.
$CaptureSingle
When this variable is true, each occurrence of an unescaped "exactly one" wildcard (i.e. ? jokers or _ for SQL wildcards) is made capturing in the resulting regexp (it is replaced by (.)). Otherwise, it is just replaced by a single dot (.). The default is the latter.
For jokers :
a???b\?? is translated to a(.)(.)(.)b\?(.) if $CaptureSingle is true
a...b\?. otherwise (default)
For SQL wildcards :
a___b\__ is translated to a(.)(.)(.)b\_(.) if $CaptureSingle is true
a...b\_. otherwise (default)
$CaptureAny
By default this variable is false, and successions of unescaped "any" wildcards (i.e. * jokers or % for SQL wildcards) are replaced by one single .*. When it evaluates to true, those sequences of "any" wildcards are made into one capture, which is greedy ((.*)) for $CaptureAny > 0 and otherwise non-greedy ((.*?)).
For jokers :
a***b\** is translated to a.*b\*.* if $CaptureAny is false (default)
a(.*)b\*(.*) if $CaptureAny > 0
a(.*?)b\*(.*?) otherwise
For SQL wildcards :
a%%%b\%% is translated to a.*b\%.* if $CaptureAny is false (default)
a(.*)b\%(.*) if $CaptureAny > 0
a(.*?)b\%(.*?) otherwise
$CaptureBrackets
If this variable is set to true, valid bracket constructs are made into ( | ) captures, and otherwise they are replaced by non-capturing alternations ((?: | )), which is the default.
a{b\},\{c} is translated to a(b\}|\{c) if $CaptureBrackets is true
a(?:b\}|\{c) otherwise (default)
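The joker translations listed above can be reproduced with a short Python sketch. It is not the Regexp::Wildcards code and covers only the * and ? jokers with backslash escaping (no brackets, no SQL wildcards); the function jokers_to_regex and its keyword arguments are hypothetical names standing in for the behaviour of $CaptureSingle and $CaptureAny.

```python
import re


def jokers_to_regex(wildcard, capture_single=False, capture_any=0):
    """Translate * and ? jokers to a regex string, escaping everything else."""
    out = []
    i = 0
    while i < len(wildcard):
        ch = wildcard[i]
        if ch == "\\" and i + 1 < len(wildcard):
            # Escaped character: keep it literal.
            out.append(re.escape(wildcard[i + 1]))
            i += 2
        elif ch == "*":
            # A run of unescaped * collapses into a single "any" element.
            while i < len(wildcard) and wildcard[i] == "*":
                i += 1
            if capture_any > 0:
                out.append("(.*)")      # greedy capture
            elif capture_any < 0:
                out.append("(.*?)")     # non-greedy capture
            else:
                out.append(".*")        # no capture (default)
        elif ch == "?":
            out.append("(.)" if capture_single else ".")
            i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return "".join(out)


print(jokers_to_regex(r"a???b\??"))
print(jokers_to_regex(r"a***b\**", capture_any=1))
print(bool(re.fullmatch(jokers_to_regex("a*c?"), "abbbcd")))
```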
Regexp::Ignore 0.03
Regexp::Ignore is a Perl module that lets us ignore unwanted parts while parsing text.
WARNING
This is an alpha code. Really. It was written in the end of 2001. It is not yet checked much. The only reason I submit it to CPAN that early is to get feedback about the idea, and hopefully to get some help in finding the many bugs that must still be in it. In our company we use this code, though, and for our needs it runs well.
SYNOPSIS
use Regexp::IgnoreXXX;
my $rei = new Regexp::IgnoreXXX($text,
"");
# split the wanted text from the unwanted text
$rei->split();
# use substitution function
$rei->s('(var)_(\d+)', '$2$1', 'gi');
$rei->s('(\d+):(\d+)', '$2:$1');
# merge back to get the resulted text
my $changed_text = $rei->merge();
Markup languages, like HTML, are difficult to parse. The reason is that you can have a line like:
<font size=+1>H</font>ello <font size=+1>W</font>orld
How can we find the string "Hello World", in the above line, and replace it by "Hello Universe" (which is a lot deeper)? Or how can we run a speller on the text and replace the mistakes with suggestions for the correct spelling?
This module helps you do exactly that.
The module first lets you split the text into the parts you are interested in and the unwanted parts. For example, all the HTML tags can be taken as unwanted parts.
Then it lets you parse the parts you are interested in (while totally ignoring the unwanted parts).
In the end it lets you merge the unwanted parts back with the possibly changed parts you were interested in.
There is just one catch. It assumes that when you replace the above "Hello World" with "Hello Universe", all the unwanted parts between the start and the end of the match will be pushed after the text that replaces the match. This may not be obvious, so look at the example:
The text:
<font size=+1>H</font>ello <font size=+1>W</font>orld
will first be split, and we will get the "cleaned" text:
Hello World
Then we can parse it using something like:
s/Hello World/Hello Universe/;
This will give us the changed "cleaned" text:
Hello Universe
When we merge it with the unwanted parts we will get:
<font size=+1>Hello Universe</font><font size=+1></font>
So, the unwanted parts in the match were pushed after the replacement.
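The split, substitute, and merge steps of this example can be sketched in Python. This is an illustration of the described behaviour, not the Regexp::Ignore code: the tag pattern and the functions split_unwanted, substitute, and merge are hypothetical, and only a single substitution is handled.

```python
import re

TAG = re.compile(r"<[^>]*>")  # assumption: anything in angle brackets is unwanted


def split_unwanted(text):
    """Return the cleaned text and a list of (offset in cleaned text, tag)."""
    clean, tags, last, clean_len = [], [], 0, 0
    for match in TAG.finditer(text):
        piece = text[last:match.start()]
        clean.append(piece)
        clean_len += len(piece)
        tags.append((clean_len, match.group()))
        last = match.end()
    clean.append(text[last:])
    return "".join(clean), tags


def substitute(clean, tags, pattern, replacement):
    """Replace the first match; tags inside the match move after the replacement."""
    match = re.search(pattern, clean)
    if match is None:
        return clean, tags
    new_text = match.expand(replacement)
    delta = len(new_text) - (match.end() - match.start())
    moved = []
    for offset, tag in tags:
        if offset <= match.start():
            moved.append((offset, tag))                         # before the match
        elif offset <= match.end():
            moved.append((match.start() + len(new_text), tag))  # pushed after it
        else:
            moved.append((offset + delta, tag))                 # shifted by growth
    return clean[:match.start()] + new_text + clean[match.end():], moved


def merge(clean, tags):
    """Re-insert the tags at their (possibly updated) offsets."""
    out, last = [], 0
    for offset, tag in tags:
        out.append(clean[last:offset])
        out.append(tag)
        last = offset
    out.append(clean[last:])
    return "".join(out)


source = "<font size=+1>H</font>ello <font size=+1>W</font>orld"
clean, tags = split_unwanted(source)
clean2, tags2 = substitute(clean, tags, r"Hello World", "Hello Universe")
print(merge(clean2, tags2))
```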
netstiff 20070621
netstiff is a powerful Web page update checker.
Enhancements:
- As a release focus, (beta) FTP support has been added with test methods "diff", "size", and "date".
- Furthermore, stderr is used for errors and warnings in addition to the log file.
- To suppress this behaviour, the -S option has been introduced.
- The configurator experienced some changes, too, e.g. URIs can be titled now.