[erlang-questions] Fwd: String encoding and character set

Romain Lenglet rlenglet@REDACTED
Thu Jan 18 02:30:49 CET 2007


Matthias Lang wrote:
> Romain Lenglet writes:
>  > Yes they do.
>  > 
>  > If the list is flat and contains only integer values (code-points)
>  > between 0 and 255, then a special external representation is used,
>  > which is efficient. But if the list is not flat, or contains values
>  > < 0 or > 255, then the normal external representation for lists and
>  > integers is used, with every element encoded as a separate integer
>  > term, which is quite inefficient in that case.
> 
> Are you mixing up term_to_binary and list_to_binary?

Yes. (^_^)

> list_to_binary gives you a badarg if you have values > 255.
> 
> What are you talking about?

I was talking about term_to_binary/1. Sorry. I misread Dmitrii's 
original message.
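For reference, here is a minimal sketch of the difference between the two functions. The module name ext_size is hypothetical. The byte values follow the documented external term format, where tag 107 is STRING_EXT (one byte per element) and tag 108 is LIST_EXT (one full integer term per element). term_to_binary/1 accepts any list, while list_to_binary/1 rejects code points above 255 with badarg.

```erlang
%% ext_size.erl (hypothetical module name): compares how term_to_binary/1
%% and list_to_binary/1 treat a list of character code points.
-module(ext_size).
-export([encoded/1, encoded_size/1, to_bytes/1]).

%% A flat list of integers in 0..255 is written as STRING_EXT (tag 107),
%% one byte per element. Any other list falls back to LIST_EXT (tag 108),
%% where every element is a separate integer term.
encoded(List) when is_list(List) ->
    term_to_binary(List).

encoded_size(List) when is_list(List) ->
    byte_size(term_to_binary(List)).

%% list_to_binary/1 only accepts bytes (plus binaries and nested iolists);
%% a code point above 255 raises badarg, reported here as a tuple.
to_bytes(List) ->
    try list_to_binary(List) of
        Bin -> {ok, Bin}
    catch
        error:badarg -> {error, badarg}
    end.
```

For example, encoded("abc") gives <<131,107,0,3,97,98,99>> (7 bytes), while encoded([256]) gives <<131,108,0,0,0,1,98,0,0,1,0,106>>, 12 bytes for a single character.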

> Matthias
> 
> 
>  > 
>  > Therefore, if you want to transfer non-ASCII strings efficiently, you
>  > should instead encode them into binaries yourself, rather than rely on
>  > list_to_binary/1.
>  > 
>  > Alex Arnon wrote:
>  > > No, they do not: the list is expected to contain byte values.
>  > > 
>  > > On 1/17/07, Dmitrii 'Mamut' Dimandt <dmitriid@REDACTED> wrote:
>  > > 
>  > >     Do list_to_binary/binary_to_list preserve code points? That is, does
>  > >     L1 = binary_to_list(list_to_binary(L2)) imply that L1 = L2? If not,
>  > >     then we lose an effective way of sending strings as binaries.
>  > > 
>  > > 
>  > >     Romain Lenglet wrote:
>  > >      > As Robert explained, the current convention for representing
>  > >      > strings in Erlang is a flat list of Unicode code-points as
>  > >      > integers. Every element in such a list is a character,
>  > >      > represented by its Unicode code-point integer value. The 11th
>  > >      > character of a string is the 11th element in the list. If you
>  > >      > want to encode such a string, you are free to do so, and that
>  > >      > is relatively easy. But the current convention is to represent
>  > >      > strings *unencoded*, as such lists of Unicode code points.
>  > >      >
>  > > 
>  > >     _______________________________________________
>  > >     erlang-questions mailing list
>  > >     erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
>  > >     http://www.erlang.org/mailman/listinfo/erlang-questions
>  > > 
>  > > 
>  > 
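A sketch of encoding such strings yourself, assuming a later OTP release than the one current in this thread: the unicode module (unicode:characters_to_binary/1, unicode:characters_to_list/1) and the utf8 bit-syntax type were both added after 2007. The module name utf8_codec is hypothetical. Unlike list_to_binary/binary_to_list, this pair round-trips any list of Unicode code points, so L1 = L2 does hold for it.

```erlang
%% utf8_codec.erl (hypothetical module name): turns a list of Unicode
%% code points into a UTF-8 binary and back. Requires an OTP release that
%% ships the unicode module and the utf8 bit-syntax type (post-2007).
-module(utf8_codec).
-export([encode/1, decode/1, encode_bits/1, roundtrip_ok/1]).

%% unicode:characters_to_binary/1 returns a UTF-8 binary, or an
%% {error, ...} / {incomplete, ...} tuple when it cannot convert the data.
encode(CodePoints) ->
    case unicode:characters_to_binary(CodePoints) of
        Bin when is_binary(Bin) -> {ok, Bin};
        Other -> {error, Other}
    end.

%% Decodes a UTF-8 binary back into a list of code points.
decode(Bin) when is_binary(Bin) ->
    case unicode:characters_to_list(Bin) of
        List when is_list(List) -> {ok, List};
        Other -> {error, Other}
    end.

%% The same encoding using only the bit syntax, one code point at a time.
encode_bits(CodePoints) ->
    << <<C/utf8>> || C <- CodePoints >>.

%% True when decode(encode(L)) gives back exactly L; crashes with
%% badmatch if L cannot be encoded at all.
roundtrip_ok(CodePoints) ->
    {ok, Bin} = encode(CodePoints),
    {ok, CodePoints} =:= decode(Bin).
```

For the three Cyrillic code points [1087,1088,1080], encode/1 yields the 6-byte binary <<208,191,209,128,208,184>>, whereas term_to_binary/1 of the raw list takes 22 bytes.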



