[erlang-questions] Fwd: String encoding and character set

Thu Jan 18 01:53:16 CET 2007

Yes they do, provided the list is flat and contains only byte values
(0 to 255).

If the list is flat and contains only integer values (code-points) 
between 0 and 255, then a special external representation is used, which 
is efficient. But if the list is not flat, or contains values < 0 or > 
255, then the normal external representation for lists and integers is 
used, which is quite inefficient in that case.
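
The difference can be seen by inspecting the encoded terms. This is a
minimal sketch that assumes the "external representation" above is what
term_to_binary/1 produces; the module and function names are
hypothetical, chosen only for illustration.

```erlang
-module(ext_size_demo).
-export([tag/1, encoded_size/1]).

%% Return the type tag of the encoded term. The first byte (131) is the
%% version number of the external format; the second byte is the tag.
tag(Term) ->
    <<131, Tag, _/binary>> = term_to_binary(Term),
    Tag.

%% Return the size in bytes of the encoded term.
encoded_size(Term) ->
    byte_size(term_to_binary(Term)).
```

With this module, tag("abc") gives 107 (the compact string form) and
encoded_size("abc") gives 7, while tag([1024,1025,1026]) gives 108 (the
generic list form) and encoded_size([1024,1025,1026]) gives 22. In the
generic form every element above 255 costs a tag byte plus four bytes.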

Therefore, if you want to transfer strings that contain code points
above 255 efficiently, you should encode them into binaries yourself
(as UTF-8, for example) rather than use list_to_binary/1, which only
accepts byte values.
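
One possible way to do the encoding is sketched below. It assumes an
Erlang/OTP release that ships the unicode module, which is newer than
this thread, and it picks UTF-8 as the encoding; neither choice comes
from the discussion above. The module and function names are
hypothetical.

```erlang
-module(str_bin_demo).
-export([encode/1, decode/1]).

%% Encode a list of Unicode code points as a UTF-8 binary.
%% unicode:characters_to_binary/1 returns an error or incomplete tuple
%% for invalid input, which is turned into a badarg error here.
encode(CodePoints) when is_list(CodePoints) ->
    case unicode:characters_to_binary(CodePoints) of
        Bin when is_binary(Bin) -> Bin;
        _ -> erlang:error(badarg)
    end.

%% Decode a UTF-8 binary back into a flat list of code points.
decode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin) of
        List when is_list(List) -> List;
        _ -> erlang:error(badarg)
    end.
```

For a string such as [1055,1088,1080], list_to_binary/1 fails with
badarg, while decode(encode([1055,1088,1080])) returns the original
list and the intermediate binary is 6 bytes long. For a flat list of
byte values such as [104,233,108,108,111], the plain
binary_to_list(list_to_binary(L)) round trip returns L unchanged.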

Alex Arnon wrote:
> No they do not - the list is expected to contain byte values.
> 
> On 1/17/07, Dmitrii 'Mamut' Dimandt <dmitriid@REDACTED> wrote:
> 
>     Do list_to_binary/binary_to_list preserve codepoints? That is, does
>     L1 = binary_to_list(list_to_binary(L2)) imply that L1 = L2? If not,
>     then we lose an effective way of sending strings as binary.
> 
> 
>     Romain Lenglet wrote:
>      > As Robert explained, the current convention for representing
>      > strings in Erlang is a flat list of Unicode code-points as
>      > integers. Every element in such a list is a character,
>      > represented by its Unicode code-point integer value. The 11th
>      > character of a string is the 11th element in the list. If you
>      > want to encode such a string, you are free to do so, and that
>      > is relatively easy. But the current convention is to represent
>      > strings *unencoded*, as such lists of Unicode code points.
>      >