> I would be perfectly fine with a proposal that said "we use 4-byte characters, just like Linux wchar_t."<br>> I would also be OK with a proposal that said "we use 2-byte characters, just like Windows, and only support the 65535 character subset."<br>
> Significantly better performance, slightly worse coverage of 10646.<br><br>The distinction between code points and code units (in UTF-16) is not a big deal. Its importance is greatly exaggerated. Give me an example where "O(1) INDEXING OF UNICODE CODE POINTS" is actually useful.<br>
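<br>To make this concrete, here is a small Python sketch (my own illustration, not taken from ICU; the helper name encoded_sizes is hypothetical). It compares the storage cost of the three UTFs, and shows that even a fixed-width code point index does not index user-visible characters, because one visible character may consist of several code points.<br>

```python
# Illustrative sketch only: encoded_sizes is a hypothetical helper.

def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

ascii_text = "hello"
# "e" followed by U+0301 COMBINING ACUTE ACCENT:
# one visible character, but two code points.
combining = "e\u0301"
# U+1F600 lies outside the BMP: one code point, two UTF-16 code units.
astral = "\U0001F600"

print(encoded_sizes(ascii_text))             # {'utf-8': 5, 'utf-16': 10, 'utf-32': 20}
print(len(combining))                        # 2 code points for one visible character
print(len(astral.encode("utf-16-le")) // 2)  # 2 UTF-16 code units for one code point
```

<br>So UTF-32 costs four times the memory of UTF-8 for ASCII text, and indexing by code point still splits "é" written with a combining accent in the middle.<br>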
Here is the opinion of the ICU developers:<br><br><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
Using UTF-8 strings with ICU<br>As mentioned in the overview of this chapter, ICU and most other Unicode-supporting software uses 16-bit Unicode for internal processing. However, there are circumstances where UTF-8 is used instead. This is usually the case for software that does little or no processing of non-ASCII characters, and/or for APIs that predate Unicode, use byte-based strings, and cannot be changed or replaced for various reasons.<br>
A common perception is that UTF-8 has an advantage because it was designed for compatibility with byte-based, ASCII-based systems, although it was designed for string storage (of Unicode characters in Unix file names) rather than for processing performance.<br>
While ICU mostly does not natively use UTF-8 strings, there are many ways to work with UTF-8 strings and ICU. For more information see the newer UTF-8 subpage.<br></blockquote><div> </div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
Using UTF-32 strings with ICU<br>It is even rarer to use UTF-32 for string processing than UTF-8. While 32-bit Unicode is convenient because it is the only fixed-width UTF, there are few or no legacy systems with 32-bit string processing that would benefit from a compatible format, and the memory bandwidth requirements of UTF-32 diminish the performance and handling advantage of the fixed-width format.<br>
Over time, the wchar_t type of some C/C++ compilers became a 32-bit integer, and some C libraries do use it for Unicode processing. However, application software with good Unicode support tends to have little use for the rudimentary Unicode and Internationalization support of the standard C/C++ libraries and often uses custom types (like ICU's) and UTF-16 or UTF-8.</blockquote>
<br>From <a href="http://userguide.icu-project.org/strings" target="_blank">http://userguide.icu-project.org/strings</a><br><br>-- <br>Best regards,<br>Uvarov Michael<br><br>