[erlang-questions] correct terminology for referring to strings

Masklinn
Tue Jul 31 17:13:01 CEST 2012


On 2012-07-31, at 16:37 , Richard Carlsson wrote:

> On 07/31/2012 04:19 PM, Michael Turner wrote:
>>> At runtime, Erlang's strings are just plain sequences of Unicode code points
>>> (you can think of it as UTF-32 if you like).
>> 
>> Can you go further and say that it actually *is* UTF-32? A footnote
>> like "[*] Basically, UTF-32; see ref XYZ for details" might be
>> helpful.
> 
> I'm loath to say that it *is* UTF-32, because that term carries a bunch of connotations, such as word width and endianness, which don't apply to the representation as Erlang integers. I'd like to just refer to it as Unicode, but apparently that makes most people think it's either UTF-8 or UTF-16.

Why not just say it's a sequence of code points (reified as integers)?
That's exactly what it is. If people don't know what a code point is,
they can look it up. In any case, that wording shouldn't bring along any
undue semantic baggage or misconceptions.
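
To make the distinction concrete, here is a minimal sketch. The module
name `codepoints` and the function `demo/0` are made up for
illustration; only the `unicode` and `lists` calls are standard. It
builds a string from explicit code points, checks that it is nothing
more than a list of integers, and shows that UTF-8/16/32 (and
endianness) only enter the picture once the list is encoded into a
binary:

```erlang
-module(codepoints).
-export([demo/0]).

%% Hypothetical demo: a string is a list of code points; encodings
%% appear only when the list is converted to a binary.
demo() ->
    %% U+0061 (a), U+00E9 (e with acute), U+1F600 (outside the BMP),
    %% written as integers so nothing depends on source-file encoding.
    Str = [16#61, 16#E9, 16#1F600],

    %% The string is just a list of integers, one per code point.
    true = is_list(Str),
    true = lists:all(fun is_integer/1, Str),
    3 = length(Str),

    %% Width and byte order are properties of an encoding, chosen here.
    Utf8  = unicode:characters_to_binary(Str, unicode, utf8),
    Utf16 = unicode:characters_to_binary(Str, unicode, {utf16, big}),
    Utf32 = unicode:characters_to_binary(Str, unicode, {utf32, little}),

    %% Same three code points, three different byte lengths.
    {byte_size(Utf8), byte_size(Utf16), byte_size(Utf32)}.
```

Calling `codepoints:demo()` should return `{7, 8, 12}`: the list always
has three elements, while the byte count depends entirely on which
encoding you ask for.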

