
Free Unicode software for Linux

Results 1 - 15 of about 286
rxvt-unicode 8.3

rxvt-unicode is an rxvt clone supporting mixed fonts, Xft fonts, and Unicode.
rxvt-unicode is a clone of the well known terminal emulator rxvt, modified to store text in Unicode (either UCS-2 or UCS-4) and to use locale-correct input and output. rxvt-unicode also supports mixing multiple fonts at the same time, including Xft fonts.
Main features:
- Stores text in Unicode (either UCS-2 or UCS-4).
- Uses locale-correct input, output and width: as long as your system supports the locale, rxvt-unicode will display correctly.
- Daemon mode: one daemon can open multiple windows on multiple displays, which improves memory usage and startup time considerably.
- Crash-free. At least I try, but rxvt-unicode certainly crashes much less often than rxvt and its many clones, and reproducible bugs get fixed immediately.
- Completely flicker-free.
- Full combining character support (unlike xterm :).
- Multiple fonts supported at the same time: no need to choose between nice Japanese and ugly Latin, or no Japanese and nice Latin characters.
- Supports Xft and core fonts in any combination.
- Can easily be embedded into other applications.
- All documentation accessible through manpages.
- Locale-independent XIM support.
- Many small improvements, such as improved and correct terminfo, improved secondary screen modes, italic and bold font support, tinting and shading.
Version restrictions:
- Complex script support, such as Arabic or Tibetan - more info is needed. (use mlterm)
- Right-to-left rendering - more info is needed. (use mlterm)
- Tabs (although a supplied perl script implements a tabbed shell). (use mrxvt)
- IIIMF (Intranet/Internet Input Method Framework) support. (use scim)
Enhancements:
- This release optionally takes advantage of libafterimage for much improved image format and transparency support: transparency is now officially supported.
- Some new options are available: "skipScroll" hides fast scrolling text, "urgentOnBell" sets the urgency hint when an ASCII bell is received, and the "iso14755_52" resource controls the keycap insert mode.
- Portability has been enhanced, and many minor bugs have been fixed.
Download (0.86MB)
Added: 2007-08-02 License: GPL (GNU General Public License) Price:
818 downloads
Unicode.php 0.1.1

Unicode.php provides some PHP classes for manipulating Unicode data.
Download (0.083MB)
Added: 2006-04-27 License: LGPL (GNU Library General Public License, version 2.0) Price:
1282 downloads
MP3Unicode 1.1.1

MP3Unicode is a command line utility to convert ID3 tags in mp3 files between different encodings.
For example, mp3unicode --source-encoding cp1251 --id3v1-encoding none --id3v2-encoding unicode file.mp3 reads the id3v2 tag (or the id3v1 tag if there is no id3v2 tag) from the file, converts the text fields in the tag from cp1251 to Unicode, and writes the id3v2 tag back, stripping away the id3v1 tag.
Requirements:
- Qt library
- TagLib library
Usage:
mp3unicode [options] [filename]
Options:
-s, --source-encoding <encoding>
Read current mp3 tags assuming they are encoded with <encoding>. <encoding> is either "unicode" or any valid 8bit encoding, for example "cp1251".
-1, --id3v1-encoding <encoding>
Write id3v1 tag in <encoding>; if <encoding> is "none", then strip id3v1 tag away. <encoding> may be any valid 8bit encoding; note, however, that it is not possible to write id3v1 in Unicode.
-2, --id3v2-encoding <encoding>
Write id3v2 tag in <encoding>; if <encoding> is "none", then strip id3v2 tag away. <encoding> may be either "unicode" or any valid 8bit encoding.
-p, --preserve-unicode
If source encoding is specified to be some specific encoding and not Unicode, but the actual encoding seems to be Unicode, then assume it is Unicode. E.g., if you want to process a lot of files, some of which are in Unicode (or have Unicode characters somewhere), but some are in cp1251, just issue -s cp1251 -p along with other options. This should work as you would like it to work.
-v, --version
Prints version number, compilation date and time.
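The decoding behaviour described by --source-encoding and --preserve-unicode can be sketched in a few lines of Python. This is only an illustration, not mp3unicode's own code: the function name decode_tag_text is hypothetical, and "seems to be Unicode" is approximated here by "decodes cleanly as UTF-8", which is an assumption.

```python
def decode_tag_text(raw: bytes, source_encoding: str, preserve_unicode: bool = False) -> str:
    """Decode raw tag bytes using the declared source encoding.

    With preserve_unicode set, bytes that are valid UTF-8 are treated as
    Unicode even though a different source encoding was declared
    (assumed heuristic, for illustration only).
    """
    if preserve_unicode:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            pass  # not valid UTF-8, fall back to the declared encoding
    return raw.decode(source_encoding)


title_cp1251 = "Привет".encode("cp1251")
title_utf8 = "Привет".encode("utf-8")

# Both tags come out as the same text when -s cp1251 -p style handling is used.
print(decode_tag_text(title_cp1251, "cp1251", preserve_unicode=True))
print(decode_tag_text(title_utf8, "cp1251", preserve_unicode=True))
```

Without the preserve_unicode flag, the second tag would be decoded as cp1251 and come out garbled, which is the situation the -p option is meant to avoid.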
Download (0.011MB)
Added: 2007-04-13 License: GPL (GNU General Public License) Price:
927 downloads
Unicode::Map8 0.12

Unicode::Map8 is a mapping table between 8-bit chars and Unicode.

SYNOPSIS

require Unicode::Map8;
my $no_map = Unicode::Map8->new("ISO646-NO") || die;
my $l1_map = Unicode::Map8->new("latin1") || die;

my $ustr = $no_map->to16("V}re norske tegn b|r {resn");
my $lstr = $l1_map->to8($ustr);
print $lstr;

print $no_map->tou("V}re norske tegn b|r {resn")->utf8;

The Unicode::Map8 class implements efficient mapping tables between 8-bit character sets and 16-bit character sets like Unicode. The tables are efficient both in terms of space allocated and translation speed. The 16-bit strings are assumed to use network byte order.

The following methods are available:

$m = Unicode::Map8->new( [$charset] )

The object constructor creates new instances of the Unicode::Map8 class. It takes an optional argument that specifies the name of an 8-bit character set to initialize mappings from. The argument can also be the name of a mapping file. If the charset/file cannot be located, then the constructor returns undef.

If you omit the argument, then an empty mapping table is constructed. You must then add mapping pairs to it using the addpair() method described below.

$m->addpair( $u8, $u16 );

Adds a new mapping pair to the mapping object. It takes two arguments. The first is the code value in the 8-bit character set and the second is the corresponding code value in the 16-bit character set. The same codes can be used multiple times (but using the same pair has no effect). The first definition for a code is the one that is used.

Consider the following example:

$m->addpair(0x20, 0x0020);
$m->addpair(0x20, 0x00A0);
$m->addpair(0xA0, 0x00A0);

It means that the characters 0x20 and 0xA0 in the 8-bit charset map to themselves in the 16-bit set, but in the 16-bit character set 0x00A0 maps to 0x20.
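The "first definition wins" rule can be mimicked with two dictionaries. The following Python sketch is not the module's implementation; the class name Map8Sketch and its attributes are made up for illustration.

```python
class Map8Sketch:
    """Toy two-way mapping table in which the first definition for a code wins."""

    def __init__(self):
        self.to16 = {}  # 8-bit code  -> 16-bit code
        self.to8 = {}   # 16-bit code -> 8-bit code

    def addpair(self, u8, u16):
        # setdefault keeps an existing entry, so later pairs never override earlier ones
        self.to16.setdefault(u8, u16)
        self.to8.setdefault(u16, u8)


m = Map8Sketch()
m.addpair(0x20, 0x0020)
m.addpair(0x20, 0x00A0)
m.addpair(0xA0, 0x00A0)

print(hex(m.to16[0x20]), hex(m.to16[0xA0]), hex(m.to8[0x00A0]))
```

In the 8-bit to 16-bit direction both codes map to themselves, while 0x00A0 maps back to 0x20 because that pair was registered before the 0xA0 pair.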

$m->default_to8( $u8 )

Set the code of the default character to use when mapping from 16-bit to 8-bit strings. If there is no mapping pair defined for a character then this default is substituted by to8() and recode8().

$m->default_to16( $u16 )

Set the code of the default character to use when mapping from 8-bit to 16-bit strings. If there is no mapping pair defined for a character then this default is used by to16(), tou() and recode8().

$m->nostrict;

All undefined mappings are replaced with the identity mapping. Undefined characters are normally just removed (or replaced with the default if defined) when converting between character sets.

$m->to8( $ustr );

Converts a 16-bit character string to the corresponding string in the 8-bit character set.

$m->to16( $str );

Converts an 8-bit character string to the corresponding string in the 16-bit character set.

$m->tou( $str );

Same as to16() but returns a Unicode::String object instead of a plain UCS-2 string.

$m->recode8($m2, $str);

Map the string $str from one 8-bit character set ($m) to another one ($m2). Since we assume we know the mappings towards the common 16-bit encoding we can use this to convert between any of the 8-bit character sets.

$m->to_char16( $u8 )

Maps a single 8-bit character code to a 16-bit code. If the 8-bit character is unmapped then the constant NOCHAR is returned. The default is not used and the callback method is not invoked.

$m->to_char8( $u16 )

Maps a single 16-bit character code to an 8-bit code. If the 16-bit character is unmapped then the constant NOCHAR is returned. The default is not used and the callback method is not invoked.

The following callback methods are available. You can override these methods by creating a subclass of Unicode::Map8.

$m->unmapped_to8

When mapping to an 8-bit character string and there is no mapping defined (and no default either), this method is called as the last resort. It is called with a single integer argument which is the code of the unmapped 16-bit character. It is expected to return a string that will be incorporated in the 8-bit string. The default version of this method always returns an empty string.

Example:

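package MyMapper;
@ISA=qw(Unicode::Map8);

sub unmapped_to8
{
    my($self, $code) = @_;
    require Unicode::CharName;
    # Substitute the character's Unicode name in angle brackets
    "<" . Unicode::CharName::uname($code) . ">";
}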

$m->unmapped_to16

Likewise, when mapping to a 16-bit character string and no mapping is defined, this method is called. It should return a 16-bit string with the bytes in network byte order. The default version of this method always returns an empty string.

Download (0.10MB)
Added: 2007-08-20 License: Perl Artistic License Price:
800 downloads
Unicode::Unihan 0.03

Unicode::Unihan is the Unihan Data Base 5.0.0.

SYNOPSIS

use Unicode::Unihan;
my $db = new Unicode::Unihan;
print join("," => $db->Mandarin("\x{5c0f}\x{98fc}\x{5f3e}")), "\n";

ABSTRACT

This module provides a user-friendly interface to the Unicode Unihan Database 3.2. With this module, the Unihan database is as easy as shown in the SYNOPSIS above.

The first thing you do is make the database available. Just say

use Unicode::Unihan;
my $db = new Unicode::Unihan;

That's all you have to say. After that, you can access the database via $db->tag($string), where tag is the tag in the Unihan Database, without the k prefix.

$data = $db->tag($string)
@data = $db->tag($string)

The first form (scalar context) returns the Unihan Database entry of the first character in $string. The second form (array context) checks the entry for each character in $string.

@data = $db->Mandarin("\x{5c0f}\x{98fc}\x{5f3e}");
# @data is now (SHAO4 XIAO3,SI4,DAN4)

@data = $db->JapaneseKun("\x{5c0f}\x{98fc}\x{5f3e}");
# @data is now (CHIISAI KO O,KAU YASHINAU,TAMA HAZUMU HIKU)

Download (4.9MB)
Added: 2007-07-17 License: Perl Artistic License Price:
831 downloads
Bundle::Unicode 0.01

Bundle::Unicode is a Perl bundle to install Unicode modules and their dependencies.

SYNOPSIS

perl -MCPAN -e 'install Bundle::Unicode'

CONTENTS

Unicode::Lite
Unicode::String
Unicode::Map
enum
Unicode::EastAsianWidth
ExtUtils::MakeMaker
Module::Build
Unicode::Collate
Unicode::Collate::Standard
File::Spec
Cwd
Exporter
Test
Test::More
Inline
XSLoader
Unicode::CheckUTF8
Unicode::Char
Unicode::Unihan
Fcntl
File::Path
Lingua::Han::Utils
Unicode::IMAPUtf7
Unicode::Map8
Unicode::Map
Unicode::MapUTF8
App::Info::Lib::Iconv
Encode
Module::Install
FindBin
Jcode
MIME::Base64
Unicode::UTF8simple
Unicode::Japanese
ExtUtils::Manifest
ExtUtils::Embed
Unicode::Escape
Filter::Simple
Unicode::Transliterate
AutoLoader
Clone
Unicode::Wrap
File::Copy
Unicode::Normalize
Unicode::Regex::Set
Unicode::Transform
Unicode::Decompose
Unicode::RecursiveDowngrade
String::Multibyte
String::Multibyte::Unicode
String::Multibyte::Grapheme
Acme::MetaSyntactic
Scalar::List::Util
IO::Compress::Base
Text::Unidecode
ShiftJIS::X0213::MapUTF
ShiftJIS::CP932::MapUTF
ShiftJIS::CP932::Correct
Convert::Base32
Convert::RACE
HTML::Fraction
String::Fraction
XML::Simple
YAML
Convert::CharMap
Config
File::Spec::Functions
File::Basename
Scalar::Util
AppConfig
Template
Template::Config
version
Template::Provider::Unicode::Japanese
Bundle::Encode
ShiftJIS::Collate
ShiftJIS::Regexp
encoding::warnings
Locale::Recode
TeX::Encode
Pod::LaTeX
Pod::Find
Pod::ParseUtils
Pod::Select
Locale::Maketext
Locale::Maketext::Lexicon
Locale::Maketext::Simple
i18n
Convert::ASCIInames
Apache::GuessCharset
HTML::Entities
HTML::Parser

Download (0.002MB)
Added: 2007-05-26 License: Perl Artistic License Price:
882 downloads
Unicode::MapUTF8 1.11

Unicode::MapUTF8 is a Perl module with conversions to and from arbitrary character sets and UTF8.

SYNOPSIS

use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset utf8_charset_alias);

# Convert a string in ISO-8859-1 to UTF8
my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

# Convert a string in UTF8 encoding to encoding ISO-8859-1
my $other = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

# List available character set encodings
my @character_sets = utf8_supported_charset;

# Add a character set alias
utf8_charset_alias({ 'ms-japanese' => 'sjis' });

# Convert between two arbitrary (but largely compatible) charset encodings
# (SJIS to EUC-JP)
my $utf8_string = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });

# Verify that a specific character set is supported
if (utf8_supported_charset('ISO-8859-1')) {
    # Yes
}

Provides an adapter layer between core routines for converting to and from UTF8 and other encodings. In essence, it gives multiple existing Unicode modules a single common interface, so you don't have to know the underlying implementations to do simple conversions between UTF8 and other character set encodings. As such, it wraps the Unicode::String, Unicode::Map8, Unicode::Map and Jcode modules in a standardized and simple API.

This also provides general character set conversion operation based on UTF8 - it is possible to convert between any two compatible and supported character sets via a simple two step chaining of conversions.
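The two-step chaining mentioned above, with UTF8 as the pivot, looks roughly like this in Python. It is a sketch of the idea using Python's built-in codecs, not of Unicode::MapUTF8 itself; the helper name recode_via_utf8 is hypothetical.

```python
def recode_via_utf8(data: bytes, source: str, target: str) -> bytes:
    """Convert between two charsets by chaining two conversions through UTF-8."""
    utf8_bytes = data.decode(source).encode("utf-8")   # step 1: source charset -> UTF-8
    return utf8_bytes.decode("utf-8").encode(target)   # step 2: UTF-8 -> target charset


sjis_bytes = "日本語".encode("shift_jis")
euc_jp_bytes = recode_via_utf8(sjis_bytes, "shift_jis", "euc_jp")
print(euc_jp_bytes.decode("euc_jp"))
```

The conversion only succeeds when every character in the input also exists in the target charset, which is what "compatible" means in the paragraph above.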

As with most things Perlish - if you give it a few big chunks of text to chew on instead of lots of small ones it will handle many more characters per second.

By design, it can be easily extended to encompass any new charset encoding conversion modules that arrive on the scene.
This module is intended to provide good Unicode support to versions of Perl prior to 5.8. If you are using Perl 5.8.0 or later, you probably want to be using the Encode module instead. This module does work with Perl 5.8, but Encode is the preferred method in that environment.

Download (0.016MB)
Added: 2007-02-28 License: Perl Artistic License Price:
585 downloads
Unicode::Collate 0.52

Unicode::Collate implements the Unicode Collation Algorithm.

SYNOPSIS

use Unicode::Collate;

#construct
$Collator = Unicode::Collate->new(%tailoring);

#sort
@sorted = $Collator->sort(@not_sorted);

#compare
$result = $Collator->cmp($a, $b); # returns 1, 0, or -1.

# If %tailoring is false (i.e. empty),
# $Collator should do the default collation.

This module is an implementation of Unicode Technical Standard #10 (a.k.a. UTS #10) - Unicode Collation Algorithm (a.k.a. UCA).
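To give a feel for what sorting by collation keys and a three-way cmp look like, here is a heavily simplified Python sketch. It is not UCA: it uses no collation element table and only two ad hoc levels (base letters first, then the decomposed string as a tie-breaker). The names sketch_key and sketch_cmp are hypothetical.

```python
import unicodedata


def sketch_key(s):
    """Two-level sort key: accent- and case-insensitive base letters, then the NFD string."""
    decomposed = unicodedata.normalize("NFD", s)
    primary = "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()
    return (primary, decomposed)


def sketch_cmp(a, b):
    """Return 1, 0, or -1, like the cmp method in the synopsis."""
    ka, kb = sketch_key(a), sketch_key(b)
    return (ka > kb) - (ka < kb)


words = ["zebra", "\u00e9clair", "apple", "Eclair"]
print(sorted(words, key=sketch_key))
print(sketch_cmp("apple", "zebra"))
```

A plain code point sort would place "Eclair" before "apple" and "éclair" after "zebra"; the key-based sort keeps the two spellings of "eclair" next to each other.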

Constructor and Tailoring

The new method returns a collator object.

$Collator = Unicode::Collate->new(
    UCA_Version => $UCA_Version,
    alternate => $alternate, # deprecated: use of variable is recommended.
    backwards => $levelNumber, # or \@levelNumbers
    entry => $element,
    hangul_terminator => $term_primary_weight,
    ignoreName => qr/$ignoreName/,
    ignoreChar => qr/$ignoreChar/,
    katakana_before_hiragana => $bool,
    level => $collationLevel,
    normalization => $normalization_form,
    overrideCJK => \&overrideCJK,
    overrideHangul => \&overrideHangul,
    preprocess => \&preprocess,
    rearrange => \@charList,
    table => $filename,
    undefName => qr/$undefName/,
    undefChar => qr/$undefChar/,
    upper_before_lower => $bool,
    variable => $variable,
);

Download (0.27MB)
Added: 2007-06-29 License: Perl Artistic License Price:
847 downloads
Unicode::Escape 0.0.2

Unicode::Escape is a Perl module that escapes and unescapes Unicode characters other than ASCII.

SYNOPSIS

# Escape Unicode characters like \u3042\u3043\u3044.
# JSON thinks No more Garble!!

# case 1
use Unicode::Escape;
my $escaped1 = Unicode::Escape::escape($str1, 'euc-jp'); # $str1 contains characters that are not ASCII. $str1 is encoded in euc-jp.
my $escaped2 = Unicode::Escape::escape($str2); # default is utf8 # $str2 contains characters that are not ASCII.
my $unescaped1 = Unicode::Escape::unescape($str3, 'shiftjis'); # $str3 contains escaped Unicode characters. The return value is encoded in shiftjis.
my $unescaped2 = Unicode::Escape::unescape($str4); # default is utf8 # $str4 contains escaped Unicode characters.

# case 2
use Unicode::Escape qw(escape unescape);
my $escaped1 = escape($str1, 'euc-jp'); # $str1 contains characters that are not ASCII. $str1 is encoded in euc-jp.
my $escaped2 = escape($str2); # default is utf8 # $str2 contains characters that are not ASCII.
my $unescaped1 = unescape($str3, 'shiftjis'); # $str3 contains escaped Unicode characters. The return value is encoded in shiftjis.
my $unescaped2 = unescape($str4); # default is utf8 # $str4 contains escaped Unicode characters.

# case 3
use Unicode::Escape;
my $escaper = Unicode::Escape->new($str, 'shiftjis'); # $str contains characters that are not ASCII. $str is encoded in shiftjis (default is utf8).
my $escaped = $escaper->escape;

# case 4
use Unicode::Escape;
my $escaper = Unicode::Escape->new($str); # $str contains escaped Unicode characters.
my $unescaped1 = $escaper->unescape('shiftjis');
my $unescaped2 = $escaper->unescape; # default is utf8.

Escapes and unescapes Unicode characters other than ASCII. This is convenient when the server response is JavaScript code.
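The \uXXXX escaping described here can be sketched in Python as follows. This is an illustration of the escape format only, not a port of the module: it works on already-decoded strings rather than taking an encoding argument, and the function names simply mirror the synopsis.

```python
import re


def escape(text: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes (UTF-16 code units)."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        # Characters outside the BMP become a surrogate pair, as in JavaScript.
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


def unescape(text: str) -> str:
    """Turn \\uXXXX escapes back into characters, recombining surrogate pairs."""
    raw = re.sub(r"\\u([0-9a-fA-F]{4})", lambda m: chr(int(m.group(1), 16)), text)
    return raw.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


print(escape("abc \u3042\u3043\u3044"))
print(unescape("\\u3042\\u3043\\u3044") == "\u3042\u3043\u3044")
```

Because the escaped form is pure ASCII, it survives any charset mismatch between server and browser.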

Download (0.005MB)
Added: 2007-01-17 License: Perl Artistic License Price:
1016 downloads
Unicode Utilities 2.25

The Unicode Utilities project is a set of programs for manipulating and analyzing Unicode text. uniname defaults to printing the character offset of each character, its byte offset, its hex code value, its encoding, the glyph itself, and its name. Command line options allow undesired information to be suppressed and the Unicode range to be added.
unidesc reports the character ranges to which different portions of the text belong. unihist generates a histogram of the characters in its input. ExplicateUTF8 is intended for debugging or for learning about Unicode. It determines and explains the validity of a sequence of bytes as a UTF-8 encoding. unirev reverses UTF-8 strings.
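For readers who want to see what this kind of per-character report contains, here is a rough Python approximation of the default uniname columns and of a unihist-style histogram. It is a sketch built on Python's unicodedata module, not the tools' actual output format; the function names describe and histogram are hypothetical.

```python
import unicodedata
from collections import Counter


def describe(text):
    """Return (char offset, byte offset, hex code, UTF-8 bytes, glyph, name) per character."""
    rows = []
    byte_offset = 0
    for char_offset, ch in enumerate(text):
        encoded = ch.encode("utf-8")
        rows.append((char_offset, byte_offset, "%04X" % ord(ch),
                     encoded.hex().upper(), ch, unicodedata.name(ch, "<unnamed>")))
        byte_offset += len(encoded)
    return rows


def histogram(text):
    """Count how often each character occurs."""
    return Counter(text)


for row in describe("a\u00e9"):
    print(*row)
print(histogram("banana").most_common())
```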
Enhancements:
- Adds to unidesc the option -r which causes it to list the ranges detected after reading all input rather than listing them as they are encountered, and adds to uniname the option -B which causes it to ignore characters within the Basic Multilingual Plane.
Download (0.25MB)
Added: 2007-07-04 License: GPL (GNU General Public License) Price:
849 downloads
Unicode::Overload 0.01

Unicode::Overload is a Perl source filter to implement Unicode operations.

SYNOPSIS

use charnames ':full';
use POSIX ();
use Unicode::Overload (
    "\N{UNION}" => infix =>
        sub { my %a = map{$_=>1}@{$_[0]};
              my %b = map{$_=>1}@{$_[1]};
              return keys %{{ %a, %b }}; },
    "\N{SUPERSCRIPT TWO}" => postfix => sub { $_[0] ** 2 },
    "\N{NOT SIGN}" => prefix => sub { !$_[0] },
    [ "\N{LEFT FLOOR}", "\N{RIGHT FLOOR}" ] => outfix =>
        sub { POSIX::floor($_[0]) },
);

@union = (@a \N{UNION} @b); # Parentheses REQUIRED
die "Pythagoras was WRONG!" # Same here
    unless sqrt((3)\N{SUPERSCRIPT TWO} + (4)\N{SUPERSCRIPT TWO}) == 5;
$b = \N{NOT SIGN}($b); # Required here too
die "Fell through floor" # Balanced characters form their own parentheses
    unless \N{LEFT FLOOR}-3.2\N{RIGHT FLOOR} == -4;

Allows you to declare your own Unicode operators and have them behave as prefix (like sigma or integral), postfix (like superscripted 2), infix (like union), or outfix (like the floor operator, with the L-like and J-like brackets).

To keep this document friendly to people without UTF-8 terminals, the \N{} syntax for Unicode characters will be used throughout, but please note that the \N{} characters can be replaced with the actual UTF-8 characters anywhere.
Also, please note that since Perl 5 doesn't support the notion of arbitrary operators, this module cheats and uses source filters to do its job. As such, all "operators" must have their arguments enclosed in parentheses. This limitation will be lifted when a better way to do this is found.

Also, note that since these aren't "real" operators there is no way (at the moment) to specify precedence. All Unicode "operators" have the precedence (such as it is) of function calls, as they all get transformed into function calls inline before interpreting.
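The idea of rewriting operators into function calls before the code is interpreted can be illustrated with a toy rewriter in Python. This is only a sketch of the general technique under strong simplifying assumptions (regular expressions, no nesting), not how Unicode::Overload is implemented; the names rewrite, square and logical_not are hypothetical.

```python
import math
import re


def rewrite(source: str) -> str:
    """Rewrite three Unicode 'operators' into ordinary function calls (non-nested only)."""
    # outfix: LEFT FLOOR expr RIGHT FLOOR -> floor(expr)
    source = re.sub(r"\u230a([^\u230a\u230b]*)\u230b", r"floor(\1)", source)
    # postfix: (expr) SUPERSCRIPT TWO -> square(expr)
    source = re.sub(r"\(([^()]*)\)\u00b2", r"square(\1)", source)
    # prefix: NOT SIGN (expr) -> logical_not(expr)
    source = re.sub(r"\u00ac\s*\(", "logical_not(", source)
    return source


def square(x):
    return x ** 2


def logical_not(x):
    return not x


namespace = {"math": math, "floor": math.floor, "square": square, "logical_not": logical_not}
expr = rewrite("math.sqrt((3)\u00b2 + (4)\u00b2) == 5 and \u230a-3.2\u230b == -4")
print(expr)
print(eval(expr, namespace))
```

Since every operator ends up as a plain function call, all of them share the precedence of function calls, which matches the limitation described above.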

In addition, due to a weird Unicode-related bug, only one character per operator is currently permitted. Despite behaving correctly elsewhere, substr() thinks that one character equals one byte inside Unicode::Overload.

Anyway, this module defines four basic types of operators. Prefix and infix should be familiar to most users of Perl, as prefix operators are basically function calls without the parens. Infix operators are of course the familiar + and so on.

The best analogy for postfix operators is probably the algebraic notation for squares. $a**2 is Perl's notation, ($a)\N{SUPERSCRIPT TWO} is the Unicode::Overload equivalent, looking much closer to a mathematical expression, with the 2 in its proper position.

Outfix is the last operator, and a little odd. Outfix can best be thought of as user-definable brackets. One of the more common uses for this notation again comes from mathematics in the guise of the floor operator. Looking like brackets with the top bar missing, they return effectively POSIX::floor() of their contents.

Since outfix operators define their own brackets, extra parentheses are not needed on this type of operator.

A quick summary follows:

prefix

Operator goes directly before the parentheses containing its operands. Whitespace is allowed between the operator and opening parenthesis. This acts like a function call.

Sample: \N{NOT SIGN}($b)

postfix

Operator goes directly after the parentheses containing its operands. Whitespace is allowed between the closing parenthesis and operator. This doesn't have a good Perl equivalent, but there are many equivalents in algebra, probably the most common being squaring:

Sample: ($a+$b)\N{SUPERSCRIPT TWO}

infix

Operator goes somewhere inside the parentheses. Whitespace is allowed between either parenthesis and the operator.

Sample: ($a \N{ELEMENT OF} @list)

outfix

Operators surround their arguments and are translated into parentheses. As such, whitespace is allowed anywhere inside the operator pairs. There is no requirement that the operators be visually symmetrical, although it helps.

Sample: $c=\N{LEFT FLOOR}$a_+$b\N{RIGHT FLOOR}

The requirements for parentheses will be removed as soon as I can figure out how to make these operators behave closer to Perl builtins. Nesting is perfectly legal, but multiple infix operators can't coexist within one set of parentheses.

Download (0.005MB)
Added: 2007-07-12 License: Perl Artistic License Price:
834 downloads
Unicode::Normalize 1.02

The Unicode::Normalize Perl module implements the Unicode Normalization Forms.

SYNOPSIS

(1) using function names exported by default:
use Unicode::Normalize;

$NFD_string = NFD($string); # Normalization Form D
$NFC_string = NFC($string); # Normalization Form C
$NFKD_string = NFKD($string); # Normalization Form KD
$NFKC_string = NFKC($string); # Normalization Form KC

(2) using function names exported on request:

use Unicode::Normalize 'normalize';

$NFD_string = normalize('D', $string); # Normalization Form D
$NFC_string = normalize('C', $string); # Normalization Form C
$NFKD_string = normalize('KD', $string); # Normalization Form KD
$NFKC_string = normalize('KC', $string); # Normalization Form KC

Parameters:

$string is used as a string under character semantics (see perlunicode).

$code_point should be an unsigned integer representing a Unicode code point.

Note: Between XSUB and pure Perl, there is an incompatibility about the interpretation of $code_point as a decimal number. XSUB converts $code_point to an unsigned integer, but pure Perl does not. Do not use a floating point nor a negative sign in $code_point.

Normalization Forms

$NFD_string = NFD($string)

It returns the Normalization Form D (formed by canonical decomposition).

$NFC_string = NFC($string)

It returns the Normalization Form C (formed by canonical decomposition followed by canonical composition).

$NFKD_string = NFKD($string)

It returns the Normalization Form KD (formed by compatibility decomposition).

$NFKC_string = NFKC($string)

It returns the Normalization Form KC (formed by compatibility decomposition followed by canonical composition).

$FCD_string = FCD($string)

If the given string is in FCD ("Fast C or D" form; cf. UTN #5), it returns the string without modification; otherwise it returns an FCD string.

Note: FCD is not always unique, so multiple forms may be equivalent to each other. FCD() will return one of these equivalent forms.

$FCC_string = FCC($string)

It returns the FCC form ("Fast C Contiguous"; cf. UTN #5).
Note: FCC is unique, as are the four normalization forms (NF*).

$normalized_string = normalize($form_name, $string)

It returns the normalization form of $form_name.

As $form_name, one of the following names must be given.

C or NFC for Normalization Form C (UAX #15)
D or NFD for Normalization Form D (UAX #15)
KC or NFKC for Normalization Form KC (UAX #15)
KD or NFKD for Normalization Form KD (UAX #15)

FCD for "Fast C or D" Form (UTN #5)
FCC for "Fast C Contiguous" (UTN #5)
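The difference between the four UAX #15 forms is easiest to see on a small input. The sketch below uses Python's standard unicodedata module rather than Unicode::Normalize; FCD and FCC are not covered because the Python standard library does not provide them.

```python
import unicodedata

# A precomposed e-acute followed by the "fi" ligature (a compatibility character)
sample = "\u00e9\ufb01"

for form in ("NFD", "NFC", "NFKD", "NFKC"):
    normalized = unicodedata.normalize(form, sample)
    print(form, ["U+%04X" % ord(c) for c in normalized])
```

The canonical forms (NFD, NFC) only split or recombine the accent and leave the ligature alone, while the compatibility forms (NFKD, NFKC) also replace the ligature with the two letters f and i.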

Download (0.024MB)
Added: 2007-08-20 License: Perl Artistic License Price:
796 downloads
Unicode Data Browser 1.5

UnicodeDataBrowser 1.5 is a very useful browser designed for the UnicodeData.txt file, which contains much useful information but is not easily read by humans. The browser creates a scrollable table in which columns represent properties.

The table may be sorted on any column. Abbreviations are expanded and characters cross-referenced in decomposition and casing fields are named. Regular expression search restricted to a selected column is available. The set of characters for which information is displayed may be restricted to those characters matching a regular expression on a specified property.

Each such filtering operation applies to the output of the previous filtering operation unless the table is reset to the original full set of characters, so filtering on multiple properties is possible.
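The table, sort, and chained filter operations described above can be sketched in Python on a few sample lines in the UnicodeData.txt layout (15 semicolon-separated fields). This is not the browser's code; the column names in FIELDS and the helper names parse and filter_rows are chosen here for illustration.

```python
import re

# Descriptive labels for the 15 semicolon-separated fields (labels are this sketch's own)
FIELDS = ["code", "name", "general_category", "combining_class", "bidi_class",
          "decomposition", "decimal", "digit", "numeric", "mirrored",
          "unicode_1_name", "iso_comment", "uppercase", "lowercase", "titlecase"]

SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;
"""


def parse(text):
    """Turn each line into a dict keyed by column name."""
    return [dict(zip(FIELDS, line.split(";"))) for line in text.splitlines() if line]


def filter_rows(rows, column, pattern):
    """Keep rows whose value in the given column matches the regular expression."""
    rx = re.compile(pattern)
    return [row for row in rows if rx.search(row[column])]


rows = parse(SAMPLE)
capitals = filter_rows(rows, "general_category", r"^Lu$")   # first filter
with_grave = filter_rows(capitals, "name", r"GRAVE")         # applied to the previous result
print([row["code"] for row in with_grave])
print([row["code"] for row in sorted(rows, key=lambda r: r["name"])])
```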

Enhancements: Adds canonical decomposition info for Hangul syllables.

Added: 2009-07-25 License: GPL v3 Price: FREE
1 downloads
Unicode::Regex::Set 0.02

Unicode::Regex::Set provides subtraction and intersection of character sets in Unicode regular expressions.

SYNOPSIS

use Unicode::Regex::Set qw(parse);

$regex = parse('[\p{Latin} & \p{L&} - A-Z]');

Perl 5.8.0 lacks subtraction and intersection of character classes, which are described in Unicode Regular Expressions (UTS #18). This module provides a syntax that mimics character classes with subtraction and intersection, taking advantage of look-ahead assertions.
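The look-ahead trick can be shown with Python's re module, which also lacks set subtraction and intersection in character classes. This is a sketch of the general technique with plain ASCII classes, not the regex that parse() actually generates; the helper names subtract and intersect are hypothetical.

```python
import re


def subtract(base: str, removed: str) -> str:
    """Match one character in `base` that is not in `removed` (negative look-ahead)."""
    return "(?:(?!%s)%s)" % (removed, base)


def intersect(left: str, right: str) -> str:
    """Match one character that is in both classes (positive look-ahead)."""
    return "(?:(?=%s)%s)" % (left, right)


upper_not_z = re.compile(subtract("[A-Z]", "[Z]"))      # like [A-Z - Z]
h_to_m = re.compile(intersect("[A-M]", "[H-Z]"))        # like [A-M & H-Z]

print(bool(upper_not_z.fullmatch("Y")), bool(upper_not_z.fullmatch("Z")))
print(bool(h_to_m.fullmatch("H")), bool(h_to_m.fullmatch("N")))
```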

The syntax provided by this module is considerably incompatible with standard Perl's regex syntax.

Any whitespace character (that matches /\s/) is allowed between any tokens. Square brackets ([ and ]) are used for grouping. A literal whitespace and square brackets must be backslashed (escaped with a backslash, \). You cannot put literal ] at the start of a group.

A POSIX-style character class like [:alpha:] is allowed since its [ is not a literal.
SEPARATORS (& for intersection, | for union, and - for subtraction) should be enclosed with one or more whitespaces. E.g. [A&Z] is a list of A, &, Z. [A-Z] is a character range from A to Z. [A-Z - Z] is a set by removal of [Z] from [A-Z].
Union operator | may be omitted. E.g. [A-Z | a-z] is equivalent to [A-Z a-z], and also to [A-Za-z].

Intersection operator & has high precedence, so [\p{A} \p{B} & \p{C} \p{D}] is equivalent to [\p{A} | [\p{B} & \p{C}] | \p{D}].

Subtraction operator - has low precedence, so [\p{A} \p{B} - \p{C} \p{D}] is equivalent to [[\p{A} | \p{B}] - [\p{C} | \p{D}] ].

[\p{A} - \p{B} - \p{C}] is a set by removal of \p{B} and \p{C} from \p{A}. It is equivalent to [\p{A} - [\p{B} \p{C}]] and [\p{A} - \p{B} \p{C}].

Negation: when ^ comes just after a group-opening [, i.e. when they are combined as [^, all the tokens following are negated. E.g. [^A-Z a-z] matches anything that is neither [A-Z] nor [a-z]. More clearly you can say this with grouping as [^ [A-Z a-z]].

If a ^ that is not next to [ is prefixed to a sequence of literal characters, character ranges, and/or metacharacters, such a ^ only negates that sequence; e.g. [A-Z ^\p{Latin}] matches A-Z or a non-Latin character. But [A-Z [^\p{Latin}]] (or [A-Z \P{Latin}], for this is a simple case) is recommended for clarity.

If you want to remove anything other than PERL from [A-Z], use [A-Z & PERL] as well as [A-Z - [^PERL]]. Similarly, if you want to intersect [A-Z] and a thing not JUNK, use [A-Z - JUNK] as well as [A-Z & [^JUNK]].

Download (0.005MB)
Added: 2007-07-11 License: Perl Artistic License Price:
835 downloads
Unicode Error Detector 1.0

Unicode Error Detector is a product for Plone used to pinpoint errors in your application leading to UnicodeDecodeErrors.

Do not use this product unless you are actively debugging a Unicode Error. Never use this product in production sites.

UnicodeDecodeErrors typically occur when you try to add a Unicode string to a non-ASCII byte string. This product patches the StringIO used by page templates to check whether the appended string is a Unicode string, and if it is, it replaces the string with an error marker.
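The patching idea, an output buffer that checks the type of every appended piece and substitutes a marker, can be sketched as follows. This is a Python 3 analogue written for illustration, not the product's code: the product targets Zope page templates, where the unexpected type is a Unicode string, whereas in this sketch the expected type is str and the unexpected one is bytes. The class name MarkingBuffer is hypothetical; the marker text is the one quoted in the product's usage notes.

```python
import io

MARKER = "THIS IS WHERE THE ERROR IS"


class MarkingBuffer(io.StringIO):
    """Output buffer that writes MARKER in place of any piece of an unexpected type."""

    def __init__(self, expected_type=str):
        super().__init__()
        self.expected_type = expected_type

    def write(self, piece):
        # The type check on every write is the overhead that makes this a debugging-only tool.
        if not isinstance(piece, self.expected_type):
            piece = MARKER
        return super().write(piece)


buf = MarkingBuffer()
buf.write("<p>fine</p>")
buf.write(b"\xc3\xa9")  # wrong type: replaced by the marker instead of raising
print(buf.getvalue())
```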

As there is some overhead associated with inspecting the strings instead of just appending to the output, this product is meant for debugging purposes only.

Usage

Put the product in your Products directory and restart Zope. Load the template causing the UnicodeDecodeError, and this tool will indicate the location by printing THIS IS WHERE THE ERROR IS in the rendered template.

You can then inspect the template and/or code more closely to figure out where the decode error happens.

Download (0.001MB)
Added: 2007-03-28 License: GPL (GNU General Public License) Price:
942 downloads