Unicode::Regex::Set 0.02
Unicode::Regex::Set 0.02 Ranking & Summary
File size:
0.005 MB
Platform:
Any Platform
License:
Perl Artistic License
Price:
Downloads:
835
Date added:
2007-07-11
Publisher:
SADAHIRO Tomoyuki
Unicode::Regex::Set 0.02 description
Unicode::Regex::Set is a Perl module that provides subtraction and intersection of character sets in Unicode regular expressions.
SYNOPSIS
use Unicode::Regex::Set qw(parse);
$regex = parse('[\p{Latin} & \p{L&} - A-Z]');
Perl 5.8.0 lacks subtraction and intersection of character sets, which are described in Unicode Regular Expressions (UTS #18). This module provides a syntax that mimics character classes, including subtraction and intersection, by taking advantage of look-ahead assertions.
The syntax provided by this module is considerably incompatible with standard Perl regex syntax.
Any whitespace character (one that matches /\s/) is allowed between any tokens. Square brackets ([ and ]) are used for grouping. Literal whitespace and literal square brackets must be escaped with a backslash (\). You cannot put a literal ] at the start of a group.
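The look-ahead idea can be sketched in plain Perl without the module. The two patterns below are hand-written illustrations of intersection and subtraction built from look-ahead assertions; they are an assumption about the general technique, not the exact regex that parse() returns, and the helper name matches() is hypothetical.

```perl
use strict;
use warnings;

# Intersection sketch: the positive look-ahead requires the next character
# to be Latin, and \p{L&} then consumes it only if it is a cased letter.
my $latin_cased = qr/(?=\p{Latin})\p{L&}/;

# Subtraction sketch: a negative look-ahead additionally rejects A-Z.
my $latin_cased_minus_upper = qr/(?![A-Z])(?=\p{Latin})\p{L&}/;

# Hypothetical helper: true (1) if the whole one-character string matches.
sub matches {
    my ($re, $ch) = @_;
    return $ch =~ /\A$re\z/ ? 1 : 0;
}

# "a" and "A" are ASCII letters, "\x{101}" is LATIN SMALL LETTER A WITH
# MACRON, "\x{3B1}" is GREEK SMALL LETTER ALPHA.
for my $ch ("a", "A", "\x{101}", "\x{3B1}") {
    printf "U+%04X intersection=%d subtraction=%d\n",
        ord($ch),
        matches($latin_cased, $ch),
        matches($latin_cased_minus_upper, $ch);
}
```

Each look-ahead tests the character without consuming it, so several set conditions can be stacked in front of the one class that finally consumes the character.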
A POSIX-style character class like [:alpha:] is allowed since its [ is not a literal.
Separators (& for intersection, | for union, and - for subtraction) must be surrounded by one or more whitespace characters. E.g. [A&Z] is a list of A, &, and Z. [A-Z] is a character range from A to Z. [A-Z - Z] is the set obtained by removing [Z] from [A-Z].
Union operator | may be omitted. E.g. [A-Z | a-z] is equivalent to [A-Z a-z], and also to [A-Za-z].
The intersection operator & has high precedence, so [\p{A} \p{B} & \p{C} \p{D}] is equivalent to [\p{A} | [\p{B} & \p{C}] | \p{D}].
The subtraction operator - has low precedence, so [\p{A} \p{B} - \p{C} \p{D}] is equivalent to [[\p{A} | \p{B}] - [\p{C} | \p{D}]].
[\p{A} - \p{B} - \p{C}] is the set obtained by removing \p{B} and \p{C} from \p{A}. It is equivalent to [\p{A} - [\p{B} \p{C}]] and to [\p{A} - \p{B} \p{C}].
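The precedence rules above can be checked with small ASCII ranges standing in for \p{A} to \p{D}. This is a sketch in plain Perl using hand-written look-ahead patterns; the ranges, variable names, and the members() helper are all made up for illustration and do not come from the module.

```perl
use strict;
use warnings;

# Stand-ins (assumptions): A = [a-f], B = [d-k], C = [g-p], D = [n-z].

# [A B & C D] read with & binding tightest: A | (B & C) | D
my $and_binds_tight = qr/[a-f]|(?=[d-k])[g-p]|[n-z]/;
# The reading the module does NOT use: (A | B) & (C | D)
my $and_binds_loose = qr/(?=[a-fd-k])[g-pn-z]/;
# [A B - C D] read with - binding loosest: (A | B) - (C | D)
my $minus_binds_loose = qr/(?![g-pn-z])[a-fd-k]/;
# Chained subtraction: [a-k] - [c-d] - [h-i] versus [a-k] - [[c-d] [h-i]]
my $chained = qr/(?![c-d])(?![h-i])[a-k]/;
my $grouped = qr/(?![c-dh-i])[a-k]/;

# Hypothetical helper: the lowercase ASCII letters a pattern accepts.
sub members {
    my ($re) = @_;
    return join '', grep { $_ =~ /\A$re\z/ } 'a' .. 'z';
}

print "A | (B & C) | D : ", members($and_binds_tight),   "\n";
print "(A | B) & (C | D): ", members($and_binds_loose),   "\n";
print "(A | B) - (C | D): ", members($minus_binds_loose), "\n";
print "chained == grouped: ",
    (members($chained) eq members($grouped) ? "yes" : "no"), "\n";
```

The first two lines of output differ, which is why the grouping implied by the precedence of & matters.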
Negation. When ^ comes just after a group-opening [, i.e. when they are combined as [^, all the tokens that follow are negated. E.g. [^A-Z a-z] matches anything that is in neither [A-Z] nor [a-z]. You can say this more clearly with grouping, as [^ [A-Z a-z]].
If a ^ that is not next to [ is prefixed to a sequence of literal characters, character ranges, and/or metacharacters, that ^ negates only that sequence; e.g. [A-Z ^\p{Latin}] matches A-Z or a non-Latin character. But [A-Z [^\p{Latin}]] (or [A-Z \P{Latin}], since this is a simple case) is recommended for clarity.
To remove anything other than PERL from [A-Z], you can use [A-Z & PERL] as well as [A-Z - [^PERL]]. Similarly, to intersect [A-Z] with anything that is not JUNK, you can use [A-Z - JUNK] as well as [A-Z & [^JUNK]].
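The two pairs of equivalent forms in the last paragraph can be verified with hand-written look-ahead patterns in plain Perl. This is only a sketch of the set identities; the patterns and the upper_members() helper are assumptions for illustration, not output of the module.

```perl
use strict;
use warnings;

# [A-Z & PERL] and [A-Z - [^PERL]], written by hand with look-aheads.
my $and_perl      = qr/(?=[A-Z])[PERL]/;
my $minus_notperl = qr/(?![^PERL])[A-Z]/;

# [A-Z - JUNK] and [A-Z & [^JUNK]], written by hand with look-aheads.
my $minus_junk  = qr/(?![JUNK])[A-Z]/;
my $and_notjunk = qr/(?=[^JUNK])[A-Z]/;

# Hypothetical helper: the uppercase ASCII letters a pattern accepts.
sub upper_members {
    my ($re) = @_;
    return join '', grep { $_ =~ /\A$re\z/ } 'A' .. 'Z';
}

print "[A-Z & PERL]     : ", upper_members($and_perl),      "\n";
print "[A-Z - [^PERL]]  : ", upper_members($minus_notperl), "\n";
print "[A-Z - JUNK]     : ", upper_members($minus_junk),    "\n";
print "[A-Z & [^JUNK]]  : ", upper_members($and_notjunk),   "\n";
```

Intersecting with a set and subtracting its complement select the same characters, so either spelling can be chosen for readability.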
Related Software
Unicode::Escape is a Perl module to escape and unescape Unicode characters other than ASCII.
Sub::Regex is a Perl module to create synonymous subroutines.
bsnmp-regex is a module for bsnmpd which allows creation of counters from log files, program output or other text data.
Unicode::Map8 is a mapping table between 8-bit chars and Unicode.
Unicode::Overload is a Perl source filter to implement Unicode operations.
Regexp::Parser is a Perl module for parsing regexes.
Geo::Coder::Google is a Perl module for the Google Maps Geocoding API.
regexxer is a search/replace tool featuring Perl-style regular expressions.