About UTF-32

UTF-32: English Wikipedia edition

UTF-32

UTF-32 (or UCS-4) stands for Unicode Transformation Format, 32 bits. It is an encoding of Unicode code points that uses exactly 32 bits per code point. This makes UTF-32 a fixed-length encoding, in contrast to all other ''Unicode transformation formats'', which are variable-length encodings. The UTF-32 form of a code point is a direct representation of that code point's numerical value.〔SIL, (Mapping code points to Unicode encoding forms), §1: UTF-32〕
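As a minimal sketch of what "direct representation" means, the following Python snippet writes each code point's numerical value as a 4-byte integer. The function name `encode_utf32_be` is hypothetical, and big-endian byte order without a byte order mark is assumed for illustration.

```python
import struct


def encode_utf32_be(text: str) -> bytes:
    """Encode text as UTF-32 (big-endian, no BOM), one 4-byte integer per code point.

    Illustrative helper, not a library API; assumes the text holds no lone surrogates.
    """
    # ord() yields the code point's numerical value; ">I" packs it as a
    # big-endian unsigned 32-bit integer.
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)


sample = "A\u20ac\U0001F600"  # U+0041, U+20AC, U+1F600
encoded = encode_utf32_be(sample)

# The hand-rolled encoder should agree with Python's built-in codec.
assert encoded == sample.encode("utf-32-be")
```

Each code point occupies exactly four bytes regardless of its value, so the encoded length is always four times the number of code points.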
The main advantage of UTF-32 over variable-length encodings is that Unicode code points are directly indexable: examining the nth code point is a constant-time operation.〔http://www.ibm.com/developerworks/xml/library/x-utf8/〕 In contrast, a variable-length encoding requires sequential access to find the nth code point. This makes UTF-32 a simple replacement in code that uses integers to index characters out of strings, as was commonly done for ASCII.
The main disadvantage of UTF-32 is that it is space-inefficient, using four bytes per code point. Because non-BMP characters are so rare in most texts, they can be ignored when estimating size, which makes UTF-32 up to twice the size of UTF-16 and up to four times the size of UTF-8.
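The size ratios can be checked with a short Python sketch. The helper name `encoded_sizes` is hypothetical, and the little-endian codecs are chosen only so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the encoded size in bytes of text under UTF-8, UTF-16 and UTF-32."""
    codecs = {"utf-8": "utf-8", "utf-16": "utf-16-le", "utf-32": "utf-32-le"}
    return {name: len(text.encode(codec)) for name, codec in codecs.items()}


# ASCII text: UTF-32 is four times UTF-8 and twice UTF-16.
print(encoded_sizes("hello"))       # {'utf-8': 5, 'utf-16': 10, 'utf-32': 20}

# BMP text outside ASCII: the gap to UTF-8 narrows.
print(encoded_sizes("\u65e5\u672c\u8a9e"))  # {'utf-8': 9, 'utf-16': 6, 'utf-32': 12}

# A non-BMP character takes four bytes in all three encodings.
print(encoded_sizes("\U0001F600"))  # {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```

The factors of two and four are upper bounds reached by ASCII-only text; they shrink as the share of higher code points grows.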
==History==
The original ISO 10646 standard defines a 31-bit ''encoding form'' called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a ''code value'' that fits in 32 bits, drawn from the ''code space'' of integers between 0 and hexadecimal 7FFFFFFF.
Because only 17 planes are actually in use, all current code points lie between 0 and 0x10FFFF. UTF-32 is a subset of UCS-4 that uses only this range. Since the Principles and Procedures document of JTC1/SC2/WG2 states that all future character assignments will be confined to the BMP or the first 14 supplementary planes, UTF-32 will be able to represent all Unicode characters. Accordingly, UCS-4 and UTF-32 are now identical, except that the UTF-32 standard carries additional Unicode semantics.
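The two ranges described above can be expressed as simple checks. The following Python sketch uses hypothetical names; the exclusion of the surrogate range D800 to DFFF is not stated in the text above and is added here on the assumption that UTF-32 code units must be Unicode scalar values.

```python
MAX_UCS4 = 0x7FFFFFFF     # upper bound of the original 31-bit UCS-4 code space
MAX_UNICODE = 0x10FFFF    # last code point of the 17 planes in use


def in_ucs4_code_space(value: int) -> bool:
    """True if value lies in the original UCS-4 code space (0 to 0x7FFFFFFF)."""
    return 0 <= value <= MAX_UCS4


def is_valid_utf32_unit(value: int) -> bool:
    """True if value lies in the UTF-32 range (0 to 0x10FFFF).

    Assumption: surrogate values 0xD800-0xDFFF are rejected as well.
    """
    return 0 <= value <= MAX_UNICODE and not (0xD800 <= value <= 0xDFFF)


# 0x110000 fits the original UCS-4 code space but lies outside UTF-32.
assert in_ucs4_code_space(0x110000)
assert not is_valid_utf32_unit(0x110000)
```

Any value accepted by `is_valid_utf32_unit` is also accepted by `in_ucs4_code_space`, which reflects UTF-32 being a subset of UCS-4.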

Excerpt source: Wikipedia, the free encyclopedia