About UTF-8

UTF-8: English Wikipedia

UTF-8
UTF-8 is a character encoding capable of encoding all possible characters, or ''code points'', in Unicode.
The encoding is variable-length and uses 8-bit ''code units''. It was designed for backward compatibility with ASCII, and to avoid the complications of endianness and byte order marks in the alternative UTF-16 and UTF-32 encodings. The name is derived from ''U''niversal Coded Character Set + ''T''ransformation ''F''ormat, ''8''-bit.
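The endianness point can be illustrated with a short Python sketch. It relies only on the standard `str.encode` method and the codec names `utf-8`, `utf-16-le` and `utf-16-be`; the sample character and variable names are illustrative choices, not taken from the article.

```python
# One non-ASCII character, U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
text = "\u00e9"

# UTF-16 uses 16-bit code units, so the byte order must be chosen
# (or signalled with a byte order mark). The two orders differ.
utf16_le = text.encode("utf-16-le")
utf16_be = text.encode("utf-16-be")

# UTF-8 uses 8-bit code units, so there is only one possible byte order.
utf8 = text.encode("utf-8")

print(utf16_le.hex(), utf16_be.hex(), utf8.hex())
```

Because a UTF-8 code unit is a single byte, a receiver never has to guess or be told which byte order the sender used.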
UTF-8 is the dominant character encoding for the World Wide Web, accounting for 85.1% of all Web pages in September 2015 (with the most popular East Asian encoding, GB 2312, at 1.0%) (sources: "Usage Statistics of Character Encodings for Websites", updated daily; "UTF-8 Usage Statistics"). The Internet Mail Consortium (IMC) recommends that all e-mail programs be able to display and create mail using UTF-8, and the W3C recommends UTF-8 as the ''default encoding'' in XML and HTML.
UTF-8 encodes each of the 1,112,064 valid code points in the Unicode code space (1,114,112 code points minus 2,048 surrogate code points) using one to four 8-bit bytes (a group of 8 bits is known as an octet in the Unicode Standard). Code points with lower numerical values (i.e., earlier code positions in the Unicode character set, which tend to occur more frequently) are encoded using fewer bytes. The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, so valid ASCII text is also valid UTF-8-encoded Unicode. Conversely, ASCII byte values never occur in the encoding of non-ASCII code points, which makes UTF-8 safe to use within most programming and document languages that interpret certain ASCII characters in a special way, e.g. as an end-of-string marker.
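The properties in this paragraph can be checked with a minimal Python sketch. The range boundaries used in `utf8_length` (0x80, 0x800, 0x10000) come from the standard definition of UTF-8 rather than from the text above, and the function and constant names are hypothetical, chosen only for illustration.

```python
TOTAL_CODE_POINTS = 0x110000        # U+0000 through U+10FFFF
SURROGATES = 0xE000 - 0xD800        # U+D800 through U+DFFF
VALID_CODE_POINTS = TOTAL_CODE_POINTS - SURROGATES


def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encoded")
    if code_point < 0x80:       # the ASCII range
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


# Lower code points need fewer bytes: U+0041, U+00E9, U+20AC, U+1F600.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    # The built-in encoder agrees with the length table above.
    assert len(encoded) == utf8_length(ord(ch))
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# ASCII text is byte-for-byte identical in ASCII and UTF-8.
assert "plain text".encode("ascii") == "plain text".encode("utf-8")

# Every byte of a non-ASCII code point has its high bit set,
# so it can never be mistaken for an ASCII character such as NUL or "/".
non_ascii_bytes = "\u00e9\u20ac\U0001F600".encode("utf-8")
assert all(byte >= 0x80 for byte in non_ascii_bytes)
```

The last assertion is what makes byte-oriented tools that search for ASCII delimiters safe to run on UTF-8 data.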
The official IANA code for the UTF-8 character encoding is UTF-8.
==History==
By early 1992, the search was on for a good byte-stream encoding of multi-byte character sets. The draft ISO 10646 standard contained a non-required annex called UTF-1 that provided a byte-stream encoding of its 32-bit code points. This encoding was not satisfactory on performance grounds, but did introduce the notion that bytes in the range of 0–127 continue representing the ASCII characters in UTF, thereby providing backward compatibility with ASCII.
In July 1992, the X/Open committee XoJIG was looking for a better encoding. Dave Prosser of Unix System Laboratories submitted a proposal for one that had faster implementation characteristics and introduced the improvement that 7-bit ASCII characters would only represent themselves; all multibyte sequences would include only bytes where the high bit was set. This original proposal, the File System Safe UCS Transformation Format (FSS-UTF), was similar in concept to UTF-8, but lacked the crucial property of self-synchronization.
In August 1992, this proposal was circulated by an IBM X/Open representative to interested parties. Ken Thompson of the Plan 9 operating system group at Bell Labs made a small but crucial modification to the encoding, making it slightly less bit-efficient than the previous proposal but allowing it to be self-synchronizing, meaning that it was no longer necessary to read from the beginning of the string to find code point boundaries. Thompson's design was outlined on September 2, 1992, on a placemat in a New Jersey diner with Rob Pike. In the following days, Pike and Thompson implemented it and updated Plan 9 to use it throughout, and then communicated their success back to X/Open.
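Self-synchronization can be sketched in a few lines of Python. The sketch assumes valid UTF-8 input and relies on a detail of the final encoding that the paragraph above does not spell out: every byte after the first in a multi-byte sequence has the bit pattern 10xxxxxx. The helper names are hypothetical.

```python
def is_continuation(byte: int) -> bool:
    """True for bytes of the form 10xxxxxx (non-leading bytes)."""
    return (byte & 0xC0) == 0x80


def code_point_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the code point covering index.

    Assumes data is valid UTF-8, so at most three steps back are needed.
    """
    while index > 0 and is_continuation(data[index]):
        index -= 1
    return index


# "a", U+20AC (three bytes in UTF-8), "b": five bytes in total.
data = "a\u20acb".encode("utf-8")

# Starting from any byte offset, the boundary is found locally,
# without scanning from the beginning of the string.
starts = [code_point_start(data, i) for i in range(len(data))]
print(starts)
```

Offsets 1, 2 and 3 all resolve to offset 1, the lead byte of the three-byte sequence, which is the property the earlier FSS-UTF proposal lacked.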
UTF-8 was first officially presented at the USENIX conference in San Diego, from January 25 to 29, 1993.
Google reported that in 2008 UTF-8 (misleadingly labelled "Unicode") became the most common encoding for HTML files.

Excerpt source: Wikipedia, the free encyclopedia