|
Han unification is an effort by the authors of Unicode and the Universal Character Set to map multiple character sets of the so-called CJK languages into a single set of unified characters. Han characters are a common feature of written Chinese (hanzi), Japanese (kanji), and Korean (hanja).

Modern Chinese, Japanese and Korean typefaces typically use regional or historical variants of a given Han character. In the formulation of Unicode, an attempt was made to unify these variants by considering them different glyphs representing the same "grapheme", or orthographic unit; hence "Han unification", with the resulting character repertoire sometimes contracted to Unihan.

Unihan can also refer to the Unihan Database maintained by the Unicode Consortium, which provides information about all of the unified Han characters encoded in the Unicode standard, including mappings to various national and industry standards, indices into standard dictionaries, encoded variants, pronunciations in various languages, and an English definition. The database is available to the public as text files and via an interactive Web site. The latter also includes representative glyphs and definitions for compound words drawn from the free Japanese EDICT and Chinese CEDICT dictionary projects (which are provided for convenience and are not a formal part of the Unicode standard).

==Rationale and controversy==

Rules for Han unification are given in the East Asian Scripts chapter of the various versions of the Unicode Standard (Chapter 12 in Unicode 6.0).〔The Unicode Standard, Version 6.0, Chapter 12 East Asian Scripts, 12.1 Han〕 The Ideographic Rapporteur Group (IRG),〔http://www.ogcio.gov.hk/ccli/eng/structure/irg.html (this page is no longer active, but its original content is still available via the Internet Archive)〕 made up of experts from the Chinese-speaking countries, North and South Korea, Japan, Vietnam, and other countries, is responsible for the process.
One possible rationale is the desire to limit the size of the full Unicode character set, where CJK characters as represented by discrete ideograms may approach or exceed 100,000 (while those required for ordinary literacy in any language are probably under 3,000). Version 1 of Unicode was designed to fit into 16 bits, and only 20,940 characters (32%) out of the possible 65,536 were reserved for these CJK Unified Ideographs. Unicode was later extended to 21 bits, allowing many more CJK characters (75,960 are assigned, with room for more).

The article ''The secret life of Unicode'', located on IBM DeveloperWorks, attempts to illustrate part of the motivation for Han unification. In fact, the three ideographs for "one" are encoded separately in Unicode, as they are not considered national variants. The first and second are used on financial instruments to prevent tampering (they may be considered variants), while the third is the common form in all three countries.

However, Han unification has also caused considerable controversy, particularly among the Japanese public, who, with the nation's literati, have a history of protesting the culling of historically and culturally significant variants. (See Kanji#Orthographic reform and lists of kanji. Today, the list of characters officially recognized for use in proper names continues to expand at a modest pace.)

Excerpt source: Wikipedia, the free encyclopedia.
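
The figures in the rationale paragraph above can be checked with a short Python sketch. It is only an illustration, not part of the Unicode Standard: the choice of 壹 (U+58F9), 壱 (U+58F1) and 一 (U+4E00) as the three ideographs for "one" is an assumption about which characters the text refers to, and the helper names `describe` and `share_of_code_space` are hypothetical.

```python
import sys
import unicodedata

# Assumption: these are the three "one" ideographs the text refers to,
# listed as two financial forms followed by the common form.
ONE_FORMS = ["\u58f9", "\u58f1", "\u4e00"]  # 壹, 壱, 一


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def share_of_code_space(reserved: int, bits: int = 16) -> float:
    """Fraction of a 2**bits code space occupied by `reserved` code points."""
    return reserved / (2 ** bits)


if __name__ == "__main__":
    # Each form has its own code point, i.e. the three are not unified.
    for ch in ONE_FORMS:
        print(describe(ch))

    # 20,940 of the 65,536 code points available in a 16-bit design.
    print(f"{share_of_code_space(20_940):.0%} of {2 ** 16:,}")

    # The highest code point Python supports, and the bits needed to hold it.
    print(f"U+{sys.maxunicode:X} needs {sys.maxunicode.bit_length()} bits")
```

Because the three characters have distinct code points, plain-text software can tell them apart without font or language information, which is not the case for variants that were unified.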
|