unicode character

Unicode Data Browser 1.5
UnicodeDataBrowser 1.5 is a browser for the UnicodeData.txt file, which contains much useful information but is not easily read by humans. The browser creates a scrollable table in which columns represent properties.
The table may be sorted on any column. Abbreviations are expanded and characters cross-referenced in decomposition and casing fields are named. Regular expression search restricted to a selected column is available. The set of characters for which information is displayed may be restricted to those characters matching a regular expression on a specified property.
Each such filtering operation applies to the output of the previous filtering operation unless the table is reset to the original full set of characters, so filtering on multiple properties is possible.
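The chained filtering described above can be sketched in a few lines of Python. This is only an illustration of the idea, not UnicodeDataBrowser's own code: the column names in FIELDS and the helper names parse and filter_rows are assumptions made for the example, and the sample rows follow the semicolon-separated layout of UnicodeData.txt.

```python
import re

# Column names are illustrative labels for the 15 semicolon-separated fields.
FIELDS = ["code", "name", "category", "combining", "bidi", "decomposition",
          "decimal", "digit", "numeric", "mirrored", "old_name", "comment",
          "upper", "lower", "title"]

# A few rows in UnicodeData.txt layout, used instead of the full file.
SAMPLE = """\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;
"""

def parse(text):
    """Turn each non-empty line into a dict keyed by column name."""
    return [dict(zip(FIELDS, line.split(";")))
            for line in text.splitlines() if line]

def filter_rows(rows, column, pattern):
    """Keep only the rows whose value in `column` matches the regex."""
    rx = re.compile(pattern)
    return [row for row in rows if rx.search(row[column])]

rows = parse(SAMPLE)
# Each filter applies to the output of the previous one.
uppercase = filter_rows(rows, "category", r"^Lu$")
decomposable = filter_rows(uppercase, "decomposition", r"\S")
# Sorting on any column is an ordinary keyed sort.
by_name = sorted(rows, key=lambda row: row["name"])

print([row["code"] for row in decomposable])
```

Resetting the table corresponds to starting again from rows rather than from a filtered list.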
Enhancements: Adds canonical decomposition info for Hangul syllables.
Unicode::Collate 0.52
Unicode::Collate is a Perl implementation of the Unicode Collation Algorithm.
SYNOPSIS
use Unicode::Collate;
#construct
$Collator = Unicode::Collate->new(%tailoring);
#sort
@sorted = $Collator->sort(@not_sorted);
#compare
$result = $Collator->cmp($a, $b); # returns 1, 0, or -1.
# If %tailoring is false (i.e. empty),
# $Collator should do the default collation.
This module is an implementation of Unicode Technical Standard #10 (a.k.a. UTS #10) - Unicode Collation Algorithm (a.k.a. UCA).
Constructor and Tailoring
The new method returns a collator object.
$Collator = Unicode::Collate->new(
UCA_Version => $UCA_Version,
alternate => $alternate, # deprecated: use of variable is recommended.
backwards => $levelNumber, # or \@levelNumbers
entry => $element,
hangul_terminator => $term_primary_weight,
ignoreName => qr/$ignoreName/,
ignoreChar => qr/$ignoreChar/,
katakana_before_hiragana => $bool,
level => $collationLevel,
normalization => $normalization_form,
overrideCJK => \&overrideCJK,
overrideHangul => \&overrideHangul,
preprocess => \&preprocess,
rearrange => \@charList,
table => $filename,
undefName => qr/$undefName/,
undefChar => qr/$undefChar/,
upper_before_lower => $bool,
variable => $variable,
);
Unicode Utilities 2.25
The Unicode Utilities project is a set of programs for manipulating and analyzing Unicode text.
unidesc reports the character ranges to which different portions of the text belong. unihist generates a histogram of the characters in its input. ExplicateUTF8 is intended for debugging or for learning about Unicode. It determines and explains the validity of a sequence of bytes as a UTF-8 encoding. unirev reverses UTF-8 strings.
Enhancements:
- Adds to unidesc the option -r which causes it to list the ranges detected after reading all input rather than listing them as they are encountered, and adds to uniname the option -B which causes it to ignore characters within the Basic Multilingual Plane.
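To make the tools described above more concrete, here is a rough Python sketch of what a character histogram, a UTF-8 validity check, and a UTF-8 string reversal might look like. It does not reproduce the behaviour or output format of unihist, ExplicateUTF8, or unirev; the function names are hypothetical.

```python
from collections import Counter

def char_histogram(text):
    """Count how often each character occurs (the idea behind unihist)."""
    return Counter(text)

def explain_utf8(data):
    """Report whether a byte sequence is valid UTF-8 and, if so, what it encodes."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"invalid UTF-8 at byte {err.start}: {err.reason}"
    return "valid UTF-8: " + ", ".join(f"U+{ord(ch):04X}" for ch in text)

def reverse_utf8(data):
    """Reverse a UTF-8 string code point by code point, not byte by byte."""
    return data.decode("utf-8")[::-1].encode("utf-8")

print(char_histogram("banana").most_common(1))
print(explain_utf8(b"\xc3\xa9"))
print(explain_utf8(b"\xc3("))
```

Reversing by code point keeps multi-byte characters intact, but it would still separate a combining mark from its base character.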
libunicode 0.7
libunicode is a library of Unicode string functions and charset converters, which can be divided into three categories:
- Character handling
- String handling
- Charset handling
Libunicode uses the ISO/IEC 10646-defined UTF-16 encoding for storing and manipulating all character entities. It will support other encoding standards (e.g., UTF-8, ISO 8859-x, etc.) for input and output only.
Where applicable, libunicode is based on the "Single Unix Specification, Version 2(R)" (susv2) as its API and semantics reference. susv2 is the unification and superset of the de jure POSIX and ANSI C (run-time library part) standards and the de facto BSD standards. This means that if you know the standard character and string handling functions, you can readily use libunicode, and if you have an application that uses the standard character/string processing facilities, you can make it Unicode-aware with minimal trouble.
Also, don't let the word "Unix" in the standard's name confuse you. Susv2, like POSIX, is a standard for *Open* operating systems, a category into which MS Windows, MacOS, etc. fit. The name was chosen by the OpenGroup, the maintainer of susv2, to unite and defend market sectors actively attacked by Microsoft with its "decommodizing" tactics. Libunicode is a bright example of the opposite approach, offering cross-platform portability and compatibility for Unix and Win32 systems. (*)
(*) The opinions presented in the paragraph above are solely those of the documentation author and should not be considered as reflecting the real state of things.
Libunicode defines a new type, Uchar, which can handle any non-surrogate UTF-16 character without space overhead.
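The "non-surrogate" qualification matters because UTF-16 stores characters outside the Basic Multilingual Plane as a pair of 16-bit surrogate units. The short Python sketch below only illustrates that encoding fact; it says nothing about how Uchar is actually defined, and utf16_units is a hypothetical helper.

```python
def utf16_units(ch):
    """Return how many 16-bit code units UTF-16 needs for one character."""
    return len(ch.encode("utf-16-le")) // 2

for ch in ("A", "\u5c0f", "\U0001F600"):
    print(f"U+{ord(ch):04X} needs {utf16_units(ch)} UTF-16 unit(s)")
```

Characters that need a single unit are the ones a fixed 16-bit type can hold directly.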
The library offers two APIs: one is a precise remapping of susv2 functions, and the other is a slightly higher-level API with automatic memory management fully controlled by the user.
Functions of the first API (fully standard-compliant, and the one you will probably use) use the u_ prefix, e.g. the standard
char *strchr(const char *s, char c);
becomes
Uchar *u_strchr(const Uchar *s, Uchar c);
Functions of the second API use the uni_ prefix. They are intended for special environments, for example Apache web server modules. Most functions have completely identical u_ and uni_ implementations, but the following have an argument structure and semantics that differ from the standard:
uni_strcat
uni_strncat
uni_strdup
uni_strndup
uni_strcpy
uni_strncpy
Consult the library reference for their full descriptions.
CoC Character Generator Alpha 7
CoC is a character generator for Call of Cthulhu.
A main goal for the coming 2.0 release is an implementation of the Byakhee save file format, so that users of CoC CharGen and Byakhee can exchange character files.
Unicode::Normalize 1.02
The Unicode::Normalize Perl module implements the Unicode Normalization Forms.
SYNOPSIS
(1) using function names exported by default:
use Unicode::Normalize;
$NFD_string = NFD($string); # Normalization Form D
$NFC_string = NFC($string); # Normalization Form C
$NFKD_string = NFKD($string); # Normalization Form KD
$NFKC_string = NFKC($string); # Normalization Form KC
(2) using function names exported on request:
use Unicode::Normalize 'normalize';
$NFD_string = normalize('D', $string); # Normalization Form D
$NFC_string = normalize('C', $string); # Normalization Form C
$NFKD_string = normalize('KD', $string); # Normalization Form KD
$NFKC_string = normalize('KC', $string); # Normalization Form KC
Parameters:
$string is used as a string under character semantics (see perlunicode).
$code_point should be an unsigned integer representing a Unicode code point.
Note: XSUB and pure Perl are incompatible in how they interpret $code_point as a decimal number. XSUB converts $code_point to an unsigned integer, but pure Perl does not. Do not use a floating-point value or a negative sign in $code_point.
Normalization Forms
$NFD_string = NFD($string)
It returns the Normalization Form D (formed by canonical decomposition).
$NFC_string = NFC($string)
It returns the Normalization Form C (formed by canonical decomposition followed by canonical composition).
$NFKD_string = NFKD($string)
It returns the Normalization Form KD (formed by compatibility decomposition).
$NFKC_string = NFKC($string)
It returns the Normalization Form KC (formed by compatibility decomposition followed by canonical composition).
$FCD_string = FCD($string)
If the given string is in FCD ("Fast C or D" form; cf. UTN #5), it returns the string without modification; otherwise it returns an FCD string.
Note: FCD is not always unique, so several different forms may be equivalent to each other. FCD() returns one of these equivalent forms.
$FCC_string = FCC($string)
It returns the FCC form ("Fast C Contiguous"; cf. UTN #5).
Note: FCC is unique, as are the four normalization forms (NF*).
$normalized_string = normalize($form_name, $string)
It returns the normalization form named by $form_name.
$form_name must be one of the following names:
C or NFC for Normalization Form C (UAX #15)
D or NFD for Normalization Form D (UAX #15)
KC or NFKC for Normalization Form KC (UAX #15)
KD or NFKD for Normalization Form KD (UAX #15)
FCD for "Fast C or D" Form (UTN #5)
FCC for "Fast C Contiguous" (UTN #5)
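For readers who want to see what the four UAX #15 forms do to actual strings, the following sketch uses Python's standard unicodedata module as a stand-in. It is not the Unicode::Normalize API, and Python's standard library does not provide the FCD or FCC forms.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI

# Canonical decomposition splits the letter from its combining accent.
nfd = unicodedata.normalize("NFD", precomposed)
# Canonical composition puts them back together.
nfc = unicodedata.normalize("NFC", nfd)
# Compatibility decomposition also replaces the ligature with plain letters.
nfkd = unicodedata.normalize("NFKD", ligature)
nfkc = unicodedata.normalize("NFKC", ligature)

print([f"U+{ord(ch):04X}" for ch in nfd])
print(nfc == precomposed, nfkd, nfkc)
```

The canonical forms (NFD, NFC) leave the ligature untouched; only the compatibility forms (NFKD, NFKC) rewrite it.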
SWF::Builder::Character::Sound 0.15
SWF::Builder::Character::Sound is a SWF Sound character.
SYNOPSIS
my $sound = $mc->new_sound( 'ring.mp3' );
$sound->play;
This module creates SWF sound characters from MP3 or raw Microsoft WAV files.
$sound = $mc->new_sound( $filename )
loads a sound file and returns a new sound character. It supports only MP3 now.
$sound->play( [ %options ] )
plays the sound.
Options:
MovieClip => $mc, Frame => $frame
MovieClip(MC) is a parent movie clip on which the sound is played. If MC is not set, the sound is played on the movie clip in which it is defined. Frame is the frame number on which the sound is played.
Multiple => 0/1
avoids/allows multiple playing. If 0, the sound is not started if it is already playing.
Loop => $count
sets the loop count.
In => $in_msec, Out => $out_msec
In sets the beginning point of the sound and Out sets the end point, both in milliseconds.
Envelope => [ $msec1, $volumelevel1, $msec2, $volumelevel2, ... ]
sets the sound envelope. Volume level is set to $volumelevel1 at $msec1, and $volumelevel2 at $msec2, ... Volume level can take a number from 0 to 32768, or a reference to the array of volume levels of left and right channels.
$sound->stop( [ MovieClip => $mc, Frame => $frame ] )
stops playing the sound. It takes the same MovieClip and Frame options as the play method.
$sound->start_streaming( [ MovieClip => $mc, Frame => $frame ] )
starts the streaming sound, which synchronizes with the movie timeline. It takes the same MovieClip and Frame options as the play method.
$sound->Latency( $msec )
sets the sound latency in milliseconds.
Unicode::Unihan 0.03
Unicode::Unihan is a Perl interface to the Unihan Database 5.0.0.
SYNOPSIS
use Unicode::Unihan;
my $db = new Unicode::Unihan;
print join("," => $db->Mandarin("\x{5c0f}\x{98fc}\x{5f3e}")), "\n";
ABSTRACT
This module provides a user-friendly interface to the Unicode Unihan Database 3.2. With this module, using the Unihan database is as easy as shown in the SYNOPSIS above.
The first thing you do is make the database available. Just say
use Unicode::Unihan;
my $db = new Unicode::Unihan;
That's all you have to say. After that, you can access the database via $db->tag($string), where tag is a tag in the Unihan Database without the k prefix.
$data = $db->tag($string)
@data = $db->tag($string)
The first form (scalar context) returns the Unihan Database entry of the first character in $string. The second form (array context) checks the entry for each character in $string.
@data = $db->Mandarin("\x{5c0f}\x{98fc}\x{5f3e}");
# @data is now ('SHAO4 XIAO3','SI4','DAN4')
@data = $db->JapaneseKun("\x{5c0f}\x{98fc}\x{5f3e}");
# @data is now ('CHIISAI KO O','KAU YASHINAU','TAMA HAZUMU HIKU')
Unidecode 0.04.1
US-ASCII transliterations of Unicode text
Unidecode 0.04.1 is a Python module that provides ASCII transliterations of Unicode text. It often happens that you have non-Roman text data in Unicode, but you can't display it -- usually because you're trying to show it to a user via an application that doesn't support Unicode, or because the fonts you need aren't accessible.
You could represent the Unicode characters as "???????" or "\15BA\15A0\1610...", but that's nearly useless to the user who actually wants to read what the text says.
Major Features:
- Provides a function, unidecode(...) that takes Unicode data and tries to represent it in ASCII characters (i.e., the universally displayable characters between 0x00 and 0x7F).
- The representation is almost always an attempt at transliteration -- i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system.
Requirements: Python
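To show why a transliteration table is needed at all, here is a standard-library-only sketch of the naive alternative: decompose with NFKD and drop whatever is not ASCII. This is not how Unidecode works and it does not call unidecode(...); the helper name strip_to_ascii is hypothetical.

```python
import unicodedata

def strip_to_ascii(text):
    """Crude fallback: decompose, then discard every non-ASCII code point."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(repr(strip_to_ascii("Caf\u00e9 d\u00e9j\u00e0 vu")))
print(repr(strip_to_ascii("\u041c\u043e\u0441\u043a\u0432\u0430")))
```

Accented Latin letters survive as their base letters, but the Cyrillic word disappears entirely, which is exactly the gap that real transliteration is meant to fill.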
rxvt-unicode 8.3
rxvt-unicode is an rxvt clone supporting mixed fonts, Xft fonts, and Unicode.
Main features:
- Stores text in Unicode (either UCS-2 or UCS-4).
- Uses locale-correct input, output and width: as long as your system supports the locale, rxvt-unicode will display correctly.
- Daemon mode: one daemon can open multiple windows on multiple displays, which improves memory usage and startup time considerably.
- Crash-free. At least I try, but rxvt-unicode certainly crashes much less often than rxvt and its many clones, and reproducible bugs get fixed immediately.
- Completely flicker-free.
- Full combining character support (unlike xterm :).
- Multiple fonts supported at the same time: No need to choose between nice Japanese and ugly Latin, or no Japanese and nice Latin characters.
- Supports Xft and core fonts in any combination.
- Can easily be embedded into other applications.
- All documentation accessible through manpages.
- Locale-independent XIM support.
- Many small improvements, such as improved and correct terminfo, improved secondary screen modes, italic and bold font support, tinting and shading.
Version restrictions:
- Complex script support, such as Arabic or Tibetan - more info is needed. (use mlterm)
- Right-to-left rendering - more info is needed. (use mlterm)
- Tabs (although a supplied perl script implements a tabbed shell). (use mrxvt)
- IIIMF (Intranet/Internet Input Method Framework) support. (use scim)
Enhancements:
- This release optionally takes advantage of libafterimage for much improved image format and transparency support: transparency is now officially supported.
- Some new options are available: "skipScroll" hides fast scrolling text, "urgentOnBell" sets the urgency hint when an ASCII bell is received, and the "iso14755_52" resource controls the keycap insert mode.
- Portability has been enhanced, and many minor bugs have been fixed.
MP3Unicode 1.1.1
MP3Unicode is a command line utility to convert ID3 tags in mp3 files between different encodings.
For example, mp3unicode --source-encoding cp1251 --id3v1-encoding none --id3v2-encoding unicode file.mp3 will read the id3v2 tag (or the id3v1 tag if there is no id3v2 tag) from the file, convert the text fields in the tag from cp1251 to Unicode, and write the id3v2 tag back, stripping away the id3v1 tag.
Requirements:
- Qt library
- TagLib library
Usage:
mp3unicode [options] [filename]
Options:
-s, --source-encoding <encoding>
Read the current mp3 tags assuming they are encoded with <encoding>. <encoding> is either "unicode" or any valid 8-bit encoding, for example "cp1251".
-1, --id3v1-encoding <encoding>
Write the id3v1 tag in <encoding>. If <encoding> is "none", strip the id3v1 tag away. <encoding> may be any valid 8-bit encoding; note, however, that it is not possible to write id3v1 in Unicode.
-2, --id3v2-encoding <encoding>
Write the id3v2 tag in <encoding>. If <encoding> is "none", strip the id3v2 tag away. <encoding> may be either "unicode" or any valid 8-bit encoding.
-p, --preserve-unicode
If source encoding is specified to be some specific encoding and not Unicode, but the actual encoding seems to be Unicode, then assume it is Unicode. E.g., if you want to process a lot of files, some of which are in Unicode (or have Unicode characters somewhere), but some are in cp1251, just issue -s cp1251 -p along with other options. This should work as you would like it to work.
-v, --version
Prints version number, compilation date and time.
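The reason a source encoding has to be given at all can be shown with plain byte strings. The Python sketch below illustrates only the underlying re-encoding idea, not mp3unicode's implementation: it touches no ID3 tags, and reencode is a hypothetical helper.

```python
def reencode(raw, source_encoding, target_encoding):
    """Decode bytes with the encoding they were written in, then re-encode."""
    return raw.decode(source_encoding).encode(target_encoding)

# Text fields as an old tagger might have stored them: raw cp1251 bytes.
raw = "\u041f\u0440\u0438\u0432\u0435\u0442".encode("cp1251")

# A reader that assumes Latin-1 shows mojibake instead of the Cyrillic text.
misread = raw.decode("latin-1")

# Decoding with the right source encoding recovers the text for Unicode output.
as_utf16 = reencode(raw, "cp1251", "utf-16-le")

print(misread)
print(as_utf16.decode("utf-16-le"))
```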
Unicode::MapUTF8 1.11
Unicode::MapUTF8 is a Perl module for conversions between arbitrary character sets and UTF8.
SYNOPSIS
use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset utf8_charset_alias);
# Convert a string in 'ISO-8859-1' to 'UTF8'
my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });
# Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
my $other = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });
# List available character set encodings
my @character_sets = utf8_supported_charset;
# Add a character set alias
utf8_charset_alias({ 'ms-japanese' => 'sjis' });
# Convert between two arbitrary (but largely compatible) charset encodings
# (SJIS to EUC-JP)
my $utf8_string = to_utf8({ -string => $sjis_string, -charset => 'sjis' });
my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' });
# Verify that a specific character set is supported
if (utf8_supported_charset('ISO-8859-1')) {
# Yes
}
Provides an adapter layer between core routines for converting to and from UTF8 and other encodings. In essence, it gives multiple existing Unicode modules a single common interface, so you don't have to know the underlying implementations to do simple conversions between UTF8 and other character set encodings. As such, it wraps the Unicode::String, Unicode::Map8, Unicode::Map and Jcode modules in a standardized and simple API.
This also provides general character set conversion operation based on UTF8 - it is possible to convert between any two compatible and supported character sets via a simple two step chaining of conversions.
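The two-step chaining through UTF8 can be sketched with Python's built-in codecs. The function names below mirror to_utf8 and from_utf8 for readability only; they are illustrative Python helpers, not the Perl module's API, and the codec names are Python's own.

```python
def to_utf8(data, charset):
    """Step 1: decode bytes in `charset` and re-encode them as UTF-8."""
    return data.decode(charset).encode("utf-8")

def from_utf8(data, charset):
    """Step 2: decode UTF-8 bytes and re-encode them in `charset`."""
    return data.decode("utf-8").encode(charset)

sjis_bytes = "\u65e5\u672c\u8a9e".encode("shift_jis")

# SJIS to EUC-JP, using UTF-8 as the common intermediate form.
utf8_bytes = to_utf8(sjis_bytes, "shift_jis")
euc_jp_bytes = from_utf8(utf8_bytes, "euc_jp")

print(euc_jp_bytes.decode("euc_jp"))
```

The chain only works when every character in the source exists in the target charset; otherwise the second step raises an encoding error.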
As with most things Perlish - if you give it a few big chunks of text to chew on instead of lots of small ones it will handle many more characters per second.
By design, it can be easily extended to encompass any new charset encoding conversion modules that arrive on the scene.
This module is intended to provide good Unicode support to versions of Perl prior to 5.8. If you are using Perl 5.8.0 or later, you probably want to be using the Encode module instead. This module does work with Perl 5.8, but Encode is the preferred method in that environment.
SWF::Builder::Character::EditText 0.16
SWF::Builder::Character::EditText is a SWF dynamic editable text object.
SYNOPSIS
my $text = $mc->new_dynamic_text( $font )
->size(10)
->color('000000')
->text('This is a text.');
my $text_i = $text->place;
my $field = $mc->new_input_field;
$field->place;
This module creates dynamic editable text objects, which can be changed at playback time.
Basic dynamic editable text object
$etext = $mc->new_edit_text( [$font, $text] )
returns a new basic dynamic editable text object. It exposes the interface of the raw DefineEditText tag. $font is an SWF::Builder::Font object.
$etext->font( $font )
applies the font to the text. $font is an SWF::Builder::Font object. Unlike static text, the font is applied to the whole text. If the text will change during playback, either add glyph data for every character that will be used to the font with $font->add_glyph, or turn off the font's embed flag.
$etext->size( $size )
sets the font size to $size in pixels. Unlike static text, the font size of the whole text is changed.
$etext->color( $color )
sets the color of the text. The color can be a six- or eight-digit hexadecimal string, an array reference of R, G, B, and an optional alpha value, an array reference of named parameters such as [Red => 255], or an SWF::Element::RGB/RGBA object. Unlike static text, the color is applied to the whole text.
$etext->text( $string )
writes the $string.
$etext->leading( $leading )
sets the vertical distance between the lines in pixels.
$etext->box_size( $width, $height )
sets the bounding box of the text and stops auto-sizing the box. When either $width or $height is undef, it is unchanged. Fixing the bounding box may cause unexpected text clipping. You should set the DefineEditText flags Multiline and/or WordWrap. See SWF::Element.
$etext->draw_border
draws the border.
$etext->align( 'left' / 'right' / 'center' / 'justify' )
sets the text alignment.
$etext->methods for SWF::Element::Tag::DefineEditText
You can control the details of the text by calling methods of the DefineEditText tag. See SWF::Element.
Preset dynamic text object
The following objects are derived from the basic dynamic editable text. They have suitable DefineEditText flags preset.
$dtext = $mc->new_dynamic_text( [$font, $text] )
returns a new dynamic text. It is read-only and multiline, and its bounding box is auto-sized.
$htmltext = $mc->new_html_text( [$html] )
returns a new HTML text. It is read-only and multiline, and its bounding box is auto-sized. The text is treated as a subset of HTML. Supported tags are <a>, <b>, <br>, <font>, <i>, <img>, <li>, <p>, <span>, <u>, and two special tags, <tab> and <textformat>. See the Macromedia Flash File Format Specification and ActionScript Reference Guide for further information.
$htmltext->use_font( $font, ... )
tells $htmltext which fonts are used in the HTML. In general, upright, italic, bold, and bold italic fonts are in different TrueType font files. You should prepare 2-4 fonts if you use <b> and <i> tags, like this:
my $fp = $ENV{SYSTEMROOT}.'/fonts'; # for Windows.
my $font = $m->new_font("$fp/arial.ttf");
$font->add_glyph('a', 'z');
my $fonti = $m->new_font("$fp/ariali.ttf");
$fonti->add_glyph('a', 'z');
my $ht = $m->new_html_text;
$ht->text('test <i>string</i>');
$ht->use_font($font, $fonti);
$mc->new_text_area( $width, $height )
returns a new editable text area. It takes the area width and height in pixels.
$mc->new_input_field( [$length] )
returns a new one-line input field. $length is the maximum length of the input string.
$mc->new_password_field( [$length] )
returns a new one-line password field. $length is the maximum length of the input string.
Unicode.php 0.1.1
Unicode.php provides some PHP classes for manipulating Unicode data.
Unicode Error Detector 1.0
Unicode Error Detector is a product for Plone used to pinpoint errors in your application leading to UnicodeDecodeErrors.
Do not use this product unless you are actively debugging a Unicode Error. Never use this product in production sites.
UnicodeDecodeErrors typically occur when you try to add a Unicode string to a non-ASCII byte string. This product patches the StringIO used by page templates to check whether the appended string is a Unicode string, and if it is, it replaces the string with an error marker.
As there is some overhead associated with inspecting the strings instead of just appending to the output, this product is meant for debugging purposes only.
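The failure mode can be reproduced outside Zope. In Python 2, adding a Unicode string to a byte string implicitly decoded the bytes as ASCII. The sketch below spells that implicit step out so it also runs on Python 3; it is an illustration of the error, not code from the product, and py2_style_concat is a hypothetical helper.

```python
def py2_style_concat(unicode_text, byte_string):
    """Mimic Python 2: the byte string is decoded as ASCII before joining."""
    return unicode_text + byte_string.decode("ascii")

utf8_bytes = "caf\u00e9".encode("utf-8")   # contains non-ASCII bytes

try:
    py2_style_concat("menu: ", utf8_bytes)
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print("decode failed:", err.reason)

# Decoding explicitly with the correct encoding avoids the error.
fixed = "menu: " + utf8_bytes.decode("utf-8")
print(fixed)
```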
Usage
Put the product in your Products directory and restart Zope. Load the template causing the UnicodeDecodeError, and this tool will indicate the location by printing THIS IS WHERE THE ERROR IS in the rendered template.
You can then inspect the template and/or code more closely to figure out where the decode error happens.