Mojibake (文字化け; lit. "character transformation"), from the Japanese 文字 (moji) "character" + 化け (bake, pronounced "bah-kay") "transform", is the garbled text that results when text is decoded using an unintended character encoding.〔"Will Unicode soon be the universal code?" ''IEEE Spectrum'', vol. 49, issue 7, p. 60 (July 2012). ''The advantage of Unicode is that if everyone adopted it, it would eradicate the problem of mojibake, Japanese for “character transformation.” Mojibake is the jumble that results when characters are encoded in one system but decoded in another.''〕 The result is a systematic replacement of symbols with completely unrelated ones, often from a different writing system. The display may include the generic replacement character � wherever the binary representation is invalid in the assumed encoding. A single symbol may also be replaced by several consecutive symbols when a binary code that constitutes one symbol in one encoding is read as multiple symbols in the other. This happens either because the encodings have different constant lengths (as with Asian 16-bit encodings versus European 8-bit encodings) or because one of them is a variable-length encoding (notably UTF-8 and UTF-16).

Failed rendering of glyphs, due to missing fonts or missing glyphs in a font, is a different issue that should not be confused with mojibake. Its symptoms include blocks showing the code point in hexadecimal, or the generic replacement character �. Importantly, these replacements are ''valid'' and result from correct error handling by the software.

==Causes==
To correctly reproduce the original text, the correspondence between two things must be preserved: the encoded data and the notion of its encoding. Because mojibake is a mismatch between the two, it can be produced either by manipulating the data itself or by merely relabeling it.
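
The relabeling case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample string "café" and all variable names are hypothetical, and `cp1252` is Python's codec name for Windows-1252. The bytes never change; only the encoding assumed when decoding them does. The sketch also shows one symbol turning into two, and an invalid byte sequence being shown as the replacement character.

```python
# Hypothetical sample text; any string with a non-ASCII character would do.
original = "café"

# Encode once. These bytes are the "data" and are never modified below.
data = original.encode("utf-8")  # 'é' occupies two bytes in UTF-8

# Decoding with the matching label reproduces the original text.
round_trip = data.decode("utf-8")

# Relabeling the same bytes as Windows-1252 yields mojibake:
# the two bytes of 'é' are read as two separate symbols.
relabeled = data.decode("cp1252")

# The opposite mismatch: Windows-1252 bytes read as UTF-8.
# The lone byte for 'é' is invalid UTF-8, so a decoder that handles
# errors by substitution emits U+FFFD (the replacement character).
legacy = original.encode("cp1252")
replaced = legacy.decode("utf-8", errors="replace")

print(round_trip)
print(relabeled)
print(replaced)
```

Strict decoding (Python's default, `errors="strict"`) would raise `UnicodeDecodeError` in the last case instead of substituting a character; which behavior is preferable depends on whether silent damage or a hard failure is easier to handle.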
Mojibake is often seen with text data that has been tagged with the wrong encoding, or that was not tagged at all and was then moved between computers with different default encodings. A major source of trouble is communication protocols that rely on settings on each computer rather than sending or storing metadata together with the data.

The differing default settings between computers are partly due to differing deployments of Unicode among operating system families, and partly due to the legacy encodings' specializations for different writing systems. Whereas Linux distributions mostly switched to UTF-8 (around 2004) for all uses of text, Microsoft Windows still uses code pages for text files, and these differ between languages. For some languages, Japanese being one example, several encodings have historically been in use, so users see mojibake relatively often. For instance, the word ''mojibake'' "文字化け", when encoded in UTF-8, is incorrectly displayed as "æ–‡å­—åŒ–ã‘" in software that assumes the text is in the Windows-1252 or ISO-8859-1 encoding, usually labelled Western.

Excerpt source: Wikipedia, the free encyclopedia