|
UTF-1 is a way of transforming ISO 10646/Unicode into a stream of bytes. Due to the design, it is not possible to resynchronise if decoding starts in the middle of a character (this makes truncation hard, among other things) and simple byte-oriented search routines cannot be reliably used with it. UTF-1 is also fairly slow due to its use of division by a number which is not a power of 2. Due to these issues, UTF-1 never gained wide acceptance and has been replaced by UTF-8. == Design == UTF-1 is a multi-byte encoding like UTF-8; a single Unicode code point can be encoded in one, two, three, or five octets. While the ASCII range is encoded as one octet, as in UTF-8, the ASCII octets 0x21 - 0x7E (decimal 33 - 126) are also used in UTF-1 multi-byte encodings; therefore UTF-1 is unsuited for many Internet protocols, including MIME. UTF-1 does not use the C0 and C1 control codes in other encodings – any 0x00–0x20 or 0x7F–0x9F octet stands for the corresponding code points in ISO-8859-1 (U+0000–0020 and U+007F–009F, respectively). This design with 66 ''protected'' octets tried to be ISO 2022 compatible. The UTF-1 encoding scheme uses "modulo 190" arithmetic (256-66=190); it was designed to encode the complete 31 bits of the original Universal Character Set (UCS-4). For comparison, UTF-8 ''protects'' all 128 ASCII octets, and needs two bits in trailing bytes of multi-byte encodings for this purpose, resulting in "modulo 64" arithmetic (8-2=6, 26=64). BOCU-1 ''protects'' only the minimal set required for MIME-compatibility (0x00, 0x07–0x0F, 0x1A–0x1B, and 0x20), resulting in "modulo 243" arithmetic (256-13=243). 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「UTF-1」の詳細全文を読む スポンサード リンク
|