Forms of Unicode

By Dr. Mark Davis, IBM developer and President of the Unicode Consortium. September 1999.

In the beginning, Unicode was a simple, fixed-width 16-bit encoding. Under its initial design principles, there was enough room in 16 bits for all modern writing systems. But over the course of Unicode's growth and development, those principles had to give way. When characters were added to ensure compatibility with legacy character sets, the available space dwindled rapidly. Many of these compatibility characters are superfluous; they were required only because the platform technologies of the time couldn't handle the representation of those characters as originally designed.

Sixteen bits were no longer enough, so Unicode needed an extension mechanism to reach a larger number of characters. The standard mechanism uses pairs of Unicode values called surrogates to address over 1,000,000 possible values. Additionally, some systems couldn't easily extend their interfaces to process 16-bit units; they needed a form of Unicode that could be handled in 8-bit bytes. Other systems found it easier to use larger, 32-bit units to represent Unicode. As a result of these different requirements, there are now three different forms of Unicode: UTF-8, UTF-16, and UTF-32.
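To make the surrogate mechanism concrete, the following is a minimal sketch in Python (not part of the original article) of the arithmetic behind surrogate pairs. It splits a supplementary code point into the high/low surrogate pair used by UTF-16, then shows the same character encoded in each of the three forms. The function name and the example character, U+1D11E (MUSICAL SYMBOL G CLEF), are illustrative choices, not drawn from the article.

    def to_surrogate_pair(code_point: int) -> tuple[int, int]:
        """Split a supplementary code point (U+10000..U+10FFFF) into the
        high/low surrogate pair that UTF-16 uses to represent it."""
        assert 0x10000 <= code_point <= 0x10FFFF
        offset = code_point - 0x10000       # 20 bits of payload
        high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
        low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
        return high, low

    high, low = to_surrogate_pair(0x1D11E)
    print(f"U+1D11E -> {high:04X} {low:04X}")   # D834 DD1E

    # The same character in each of the three encoding forms:
    ch = "\U0001D11E"
    print(len(ch.encode("utf-8")))      # 4 bytes
    print(len(ch.encode("utf-16-be")))  # 4 bytes (two 16-bit units)
    print(len(ch.encode("utf-32-be")))  # 4 bytes (one 32-bit unit)

Since each surrogate carries 10 bits of payload, a pair addresses 1,024 x 1,024 = 1,048,576 supplementary code points, which is the "over 1,000,000 possible values" mentioned above.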