Translation and Dictionary
Universal Character Set characters : English Wikipedia

The Unicode Consortium (UC) and the International Organization for Standardization (ISO) collaborate on the Universal Character Set (UCS). The UCS is an international standard that maps characters used in natural language, mathematics, music, and other domains to machine-readable values. This mapping lets software from different vendors interoperate and exchange UCS-encoded text strings. Because it is a universal map, it can represent multiple languages at the same time. This avoids the confusion of juggling multiple legacy character encodings, in which the same sequence of codes can have several meanings and is decoded incorrectly if the wrong encoding is chosen.
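The ambiguity of legacy encodings can be illustrated with a small Python sketch. The choice of the byte 0xE9 and of the two legacy encodings (ISO 8859-1 and Windows-1251) is an assumption made for illustration; the text above does not name any particular encoding.

```python
# One byte, two legacy encodings, two different characters.
raw = bytes([0xE9])

as_latin1 = raw.decode("latin-1")  # ISO 8859-1 (Western European)
as_cp1251 = raw.decode("cp1251")   # Windows-1251 (Cyrillic)

# In the UCS each character has its own code point, so a single
# string can hold both of them without ambiguity.
mixed = as_latin1 + as_cp1251
code_points = [f"U+{ord(ch):04X}" for ch in mixed]
print(code_points)  # ['U+00E9', 'U+0439']
```

The byte is identical in both cases; only the assumed encoding differs, which is exactly the situation a universal mapping is meant to avoid.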
The UCS has the potential capacity to encode over 1 million characters. Each UCS character is abstractly represented by a code point, an integer between 0 and 1,114,111 that represents the character within the internal logic of text-processing software (1,114,112 = 2^20 + 2^16, or 17 × 2^16, or hexadecimal 110000 code points). As of Unicode 8.0, released in June 2015, 264,256 (24%) of these code points are allocated, including 120,737 (11%) assigned characters, 137,468 (12%) reserved for private use, 2,048 for surrogates, and 66 designated non-characters, leaving 849,856 (76%) unassigned. The number of encoded characters is made up as follows:
* 120,520 graphical characters (some of which do not have a visible glyph, but are still counted as graphical)
* 217 special purpose characters for control and formatting.
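The size of the code space and the breakdown of assigned characters can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and the counts are the Unicode 8.0 figures quoted above.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16

# Three equivalent ways of writing the size of the code space.
total = PLANES * CODE_POINTS_PER_PLANE
assert total == 2 ** 20 + 2 ** 16 == 0x110000
print(total)      # 1114112
print(total - 1)  # 1114111, the largest code point

# Assigned characters as of Unicode 8.0 (figures from the text).
graphical = 120_520
special_purpose = 217
assigned = graphical + special_purpose
print(assigned)                       # 120737
print(round(100 * assigned / total))  # 11 (percent of the code space)
```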
ISO maintains the basic mapping of characters from character name to code point. The terms "character" and "code point" are often used interchangeably. When a distinction is made, however, a code point refers to the integer assigned to the character: what one might think of as its address. While a character in the UCS (ISO/IEC 10646) consists of the combination of the code point and its name, Unicode adds many other useful properties to the character set, such as block, category, script, and directionality.
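As a rough illustration of the difference between a code point, a name, and the additional Unicode properties, the sketch below uses Python's standard `unicodedata` module. The helper `describe` is hypothetical, and the module reflects whichever Unicode version ships with the interpreter, not necessarily 8.0. It exposes category and directionality but not block or script.

```python
import unicodedata

def describe(ch):
    """Return the code point, name, and two Unicode properties of ch."""
    return {
        "code_point": f"U+{ord(ch):04X}",                # the integer "address"
        "name": unicodedata.name(ch),                    # the character name
        "category": unicodedata.category(ch),            # general category
        "bidi_class": unicodedata.bidirectional(ch),     # directionality
    }

latin_a = describe("A")
hebrew_alef = describe("\u05d0")
print(latin_a)
print(hebrew_alef)
```

Here `latin_a` reports category `Lu` (uppercase letter) with left-to-right directionality `L`, while `hebrew_alef` reports category `Lo` with right-to-left directionality `R`.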
In addition to the UCS, Unicode also provides other implementation details such as:
# transcoding mappings between UCS and other character sets
# different collations of characters and character strings for different languages
# an algorithm for laying out bidirectional text, where text on the same line may shift between left-to-right and right-to-left
# a case folding algorithm
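Two of the items above, transcoding and case folding, can be sketched with Python built-ins. The helper `transcode` is hypothetical, and Python's codecs and `str.casefold` are one implementation of these ideas, not the Unicode reference data itself.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes by passing through code points (a Python str)."""
    return data.decode(source_encoding).encode(target_encoding)

# Transcoding: the same character (U+00E9) in two encodings.
latin1_bytes = b"\xe9"
utf8_bytes = transcode(latin1_bytes, "latin-1", "utf-8")
print(utf8_bytes)  # b'\xc3\xa9'

# Case folding: caseless comparison that lower() alone does not give.
a = "Stra\u00dfe"  # German "Strasse" spelled with sharp s
b = "STRASSE"
print(a.lower() == b.lower())        # False
print(a.casefold() == b.casefold())  # True
```

Case folding maps the sharp s to "ss", so the two spellings compare equal, whereas plain lowercasing leaves them different.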
End users enter these characters into programs through various input methods, such as a keyboard or a graphical character palette.
The UCS can be divided in various ways, such as by plane, block, character category, or character property. (Cited source: The Unicode Standard)
==Planes==
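The UCS code space is divided into 17 planes of 65,536 code points each. The first plane, the Basic Multilingual Plane, can be addressed with just two octets. The characters outside the first plane usually have very specialized or rare use.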
# Plane 0, Basic Multilingual Plane (BMP). This plane contains most of the characters needed for scripts and languages in routine use in the world today. The plane is nearly filled, with only 144 of the 65,534 code points remaining to be allocated.
# Plane 1, Supplementary Multilingual Plane (SMP). Currently used for many ancient scripts and characters as well as musical and mathematical notation.
# Plane 2, Supplementary Ideographic Plane (SIP). Contains ideographic characters used in many languages in China, Japan, Korea, Taiwan, Vietnam and Singapore.
# Plane 14, Supplementary Special-purpose Plane (SSP). For special-purpose characters such as compatibility control characters.
# Plane 15, Private Use Plane A. Together, the two Private Use planes provide 131,068 code points (in addition to the 6,400 private use code points provided in the BMP) for definition by organizations outside Unicode and ISO 10646. Such private use definers might be operating system vendors, font vendors, or other independent standards organizations.
# Plane 16, Private Use Plane B.
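The private use figures can be reproduced with simple arithmetic. The sketch below relies on two details that the text does not state and that are supplied here as assumptions: the last two code points of every plane are noncharacters, and the BMP private use area spans U+E000 to U+F8FF.

```python
PLANE_SIZE = 0x10000          # 65,536 code points per plane
NONCHARACTERS_PER_PLANE = 2   # assumed: xxFFFE and xxFFFF in each plane

# Two whole planes (A and B) minus their noncharacters.
usable_per_plane = PLANE_SIZE - NONCHARACTERS_PER_PLANE
private_use_planes = 2 * usable_per_plane
print(private_use_planes)  # 131068

# Assumed BMP private use range U+E000..U+F8FF.
bmp_private_use = 0xF8FF - 0xE000 + 1
print(bmp_private_use)  # 6400

print(private_use_planes + bmp_private_use)  # 137468
```

The final sum matches the 137,468 code points reported as reserved for private use in Unicode 8.0.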
Each plane corresponds to the value of the one or two hexadecimal digits (0–9, A–F) that precede the final four: hence U+24321 is in Plane 2, U+4321 is in Plane 0 (implicitly read as U+04321), and U+10A200 would be in Plane 16 (hex 10 = decimal 16). Within one plane, the range of code points is hexadecimal 0000–FFFF, yielding a maximum of 65,536 code points. Some planes restrict code points to a subset of that range.
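The rule for reading the plane off a code point amounts to discarding the low 16 bits. A minimal sketch follows; the function names `plane_of` and `offset_in_plane` are illustrative, not part of any standard API.

```python
MAX_CODE_POINT = 0x10FFFF

def plane_of(code_point):
    """Return the plane number (0-16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("code point outside the UCS code space")
    return code_point >> 16  # drop the final four hexadecimal digits

def offset_in_plane(code_point):
    """Return the position (0x0000-0xFFFF) of code_point within its plane."""
    return code_point & 0xFFFF

for cp in (0x24321, 0x4321, 0x10A200):
    print(f"U+{cp:04X}: plane {plane_of(cp)}, offset {offset_in_plane(cp):04X}")
```

For the three examples in the text this reports planes 2, 0, and 16 respectively.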

Excerpt source: Wikipedia, the free encyclopedia

Copyright(C) kotoba.ne.jp 1997-2016. All Rights Reserved.