[erlang-questions] Fwd: String encoding and character set
Romain Lenglet
rlenglet@REDACTED
Thu Jan 18 01:53:16 CET 2007
Yes they do.
If the list is flat and contains only integer values (code-points)
between 0 and 255, then a special external representation is used, which
is efficient. But if the list is not flat, or contains values < 0 or >
255, then the normal external representation for lists and integers is
used, which is quite inefficient in that case.
Therefore, if you want to transfer non-ASCII strings efficiently, you
should encode them into binaries yourself rather than rely on
list_to_binary/1.
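
A minimal sketch of the behaviour discussed in this thread. It assumes that the "external representation" above means the output of term_to_binary/1, and that the code runs on a recent OTP release where the unicode module is available. The module name, function names, and sample strings are hypothetical, chosen only for illustration.

```erlang
-module(string_encoding_sketch).
-export([demo/0, demo/2]).

%% Hypothetical sample data: one string whose code points all fit in
%% 0..255, and one that also contains the euro sign (8364), which does not.
demo() ->
    demo("caf" ++ [233], "caf" ++ [233, 8364]).

demo(ByteString, WideString) ->
    %% For a flat list of integers in 0..255 the round trip is lossless.
    ByteString = binary_to_list(list_to_binary(ByteString)),

    %% list_to_binary/1 expects byte values; an integer above 255
    %% makes it fail with badarg.
    {'EXIT', {badarg, _}} = (catch list_to_binary(WideString)),

    %% term_to_binary/1 accepts both lists, but only the byte-valued one
    %% can use the compact string form of the external format.
    CompactSize = byte_size(term_to_binary(ByteString)),
    GenericSize = byte_size(term_to_binary(WideString)),

    %% Encoding by hand, here as UTF-8, keeps every code point and
    %% decodes back to the original list.
    Utf8 = unicode:characters_to_binary(WideString),
    WideString = unicode:characters_to_list(Utf8),

    {CompactSize, GenericSize, byte_size(Utf8)}.
```

Calling string_encoding_sketch:demo() returns the three sizes in bytes, so the generic list encoding can be compared with the hand-encoded UTF-8 binary. UTF-8 is only one possible choice of encoding here.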
Alex Arnon wrote:
> No they do not - the list is expected to contain byte values.
>
> On 1/17/07, Dmitrii 'Mamut' Dimandt <dmitriid@REDACTED> wrote:
>
> Do list_to_binary/binary_to_list preserve codepoints? That is, does
> L1 = binary_to_list(list_to_binary(L2)) imply that L1 = L2? If not,
> then we lose an effective way of sending strings as binary.
>
>
> Romain Lenglet wrote:
> > As Robert explained, the current convention for representing strings in
> > Erlang is a flat list of Unicode code-points as integers. Every element
> > in such a list is a character, represented by its Unicode code-point
> > integer value. The 11th character of a string is the 11th element in the
> > list. If you want to encode such a string, you are free to do so, and
> > that is relatively easy. But the current convention is to represent
> > strings *unencoded*, as such lists of Unicode code points.
> >
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
>