[erlang-questions] Fwd: String encoding and character set
Thu Jan 18 01:53:16 CET 2007
Yes they do.
If the list is flat and contains only integer values (code-points)
between 0 and 255, then a special external representation is used, which
is efficient. But if the list is not flat, or contains values < 0 or >
255, then the normal external representation for lists and integers is
used, which is quite inefficient in that case.
Therefore, if you want to transfer strings with code points above 255
efficiently, you should encode them into binaries yourself, not using
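
To make the size difference concrete, here is a minimal sketch. The module name ext_size_demo and its function names are made up for illustration. It looks at the tag byte that term_to_binary/1 writes after the version byte, and compares encoded sizes for a byte string and a string whose code points are all above 255. As one possible way to "encode them yourself", it assumes the unicode module from stdlib, which was added to OTP after this thread was written; any other encoder would serve the same purpose.

```erlang
-module(ext_size_demo).
-export([tag/1, sizes/0, utf8_roundtrip/1]).

%% Return the tag byte that follows the version byte (131) in the
%% external term format. 107 is STRING_EXT, 108 is LIST_EXT.
tag(Term) ->
    <<131, Tag, _/binary>> = term_to_binary(Term),
    Tag.

%% Encoded sizes, in bytes, of:
%%   1. a flat list of values in 0..255 via term_to_binary/1
%%   2. a flat list of values above 255 via term_to_binary/1
%%   3. the same list as in 2, encoded as UTF-8 (assumes the unicode module)
sizes() ->
    Latin1 = "abc",
    Greek  = [945, 946, 947],   % alpha, beta, gamma
    {byte_size(term_to_binary(Latin1)),
     byte_size(term_to_binary(Greek)),
     byte_size(unicode:characters_to_binary(Greek))}.

%% true if encoding to UTF-8 and decoding again gives the same list back.
utf8_roundtrip(Str) ->
    unicode:characters_to_list(unicode:characters_to_binary(Str)) =:= Str.
```

In the shell, ext_size_demo:tag("abc") and ext_size_demo:tag([945]) show which representation was chosen, and ext_size_demo:sizes() shows that the general list representation costs several bytes per character where the UTF-8 binary costs two for these Greek letters.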
Alex Arnon wrote:
> No they do not - the list is expected to contain byte values.
> On 1/17/07, Dmitrii 'Mamut' Dimandt wrote:
> Do list_to_binary/binary_to_list preserve codepoints? That is, does
> L1 = binary_to_list(list_to_binary(L2)) imply that L1 = L2? If not,
> then we lose an effective way of sending strings as binary
> Romain Lenglet wrote:
> > As Robert explained, the current convention for representing strings in
> > Erlang is a flat list of Unicode code-points as integers. Every element
> > in such a list is a character, represented by its Unicode code-point
> > integer value. The 11th character of a string is the 11th element in the
> > list. If you want to encode such a string, you are free to do so, and
> > that is relatively easy. But the current convention is to represent
> > strings *unencoded*, as such lists of Unicode code points.
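
The two points quoted above (list_to_binary/1 expects byte values, and a string is a flat list with one code point per element) can be checked with a small sketch. The module name codepoint_demo and both function names are hypothetical, chosen only for this example.

```erlang
-module(codepoint_demo).
-export([roundtrips/1, nth_char/2]).

%% true if binary_to_list(list_to_binary(L)) gives L back unchanged.
%% false if list_to_binary/1 rejects L (an element outside 0..255
%% raises badarg) or if L was nested and comes back flattened.
roundtrips(L) ->
    try binary_to_list(list_to_binary(L)) =:= L
    catch error:badarg -> false
    end.

%% The Nth character of an unencoded string is simply the Nth list
%% element (1-based), whatever its code point value.
nth_char(N, Str) ->
    lists:nth(N, Str).
```

For example, codepoint_demo:roundtrips("abc") holds, while codepoint_demo:roundtrips([945]) does not, because 945 is not a byte value. codepoint_demo:nth_char(2, [945, 946, 947]) picks out the second code point directly, with no decoding step.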