RFC 3629 - UTF-8
UTF-8, a transformation format of ISO 10646. Request for Comments: 3629. Network Working Group. F. Yergeau. November 2003.

ISO/IEC 10646-1 defines a large character set called the Universal Character Set (UCS) which encompasses most of the world's writing systems. The same set of characters is defined by the Unicode standard. The originally proposed encodings of the UCS, however, were not compatible with many current applications and protocols, and this has led to the development of UTF-8, the object of this memo. UTF-8 has the characteristic of preserving the full US-ASCII range, providing compatibility with file systems, parsers and other software that rely on US-ASCII values but are transparent to other values.

ISO/IEC 10646 and Unicode define several encoding forms of their common repertoire: UTF-8, UCS-2, UTF-16, UCS-4 and UTF-32. In an encoding form, each character is represented as one or more encoding units. All standard UCS encoding forms except UTF-8 have an encoding unit larger than one octet, making them hard to use in many current applications and protocols that assume 8-bit or even 7-bit characters. UTF-8 has a one-octet encoding unit.

This memo obsoletes and replaces RFC 2279.
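The difference in encoding-unit size, and the preservation of the US-ASCII range, can be illustrated with a short sketch. This is not part of the memo: it assumes Python's built-in codecs, and the helper names `encoded_forms` and `ascii_octets_preserved` are hypothetical, chosen only for this example. Big-endian variants are used so that no byte-order mark is prepended.

```python
def encoded_forms(text):
    """Encode text in three encoding forms (big-endian, no byte-order mark).

    Hypothetical helper for illustration; relies on Python's built-in codecs.
    """
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16": text.encode("utf-16-be"),
        "utf-32": text.encode("utf-32-be"),
    }


def ascii_octets_preserved(text):
    """Return True if text is pure US-ASCII and its UTF-8 octets are
    identical to its US-ASCII octets."""
    return text.isascii() and text.encode("utf-8") == text.encode("ascii")


if __name__ == "__main__":
    # One ASCII character, one Latin-1 range character, one character
    # that needs three octets in UTF-8.
    for sample in ("A", "é", "€"):
        for name, octets in encoded_forms(sample).items():
            print(f"{sample!r} {name:7} {octets.hex(' ')}")
    print(ascii_octets_preserved("plain ASCII text"))
```

For the character "A", the UTF-8 form is the single octet 41, the same as in US-ASCII, while the UTF-16 and UTF-32 forms are 00 41 and 00 00 00 41. Software that assumes one octet per US-ASCII character therefore reads the UTF-8 form unchanged, but would see embedded zero octets in the other two.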