[erlang-questions] unicode: what about printing terms?

Mon Oct 22 15:28:37 CEST 2012

Hi Patrik,

On Mon, Oct 22, 2012 at 3:00 PM, Patrik Nyblom <pan@REDACTED> wrote:
> On 10/19/2012 07:27 PM, Vlad Dumitrescu wrote:
>> How is printing going to work when atoms/variables can be unicode?
>
> Well, the ~ts is because the same data can be interpreted in different ways;
> the interpretation of a binary cannot be determined from its content
> alone (except if you resort to guessing, which of course "mostly" works).
> Hence the difference between ~ts and ~s. In the atom case there is no need
> for interpretation. Atoms are Unicode in R18, period. So there's no need for
> a t modifier for atoms. Binaries will however be subject to interpretation
> regardless of the default.

Maybe this unicode-talk got me confused...

When printing an atom name or a string or a binary, it is serialized
to bytes, right? So an encoding must be used, and it might be up to
the application which one to use. Today ~ts is used for binaries and
strings.
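
To make the concern concrete, here is a minimal sketch (the module and
function names are made up for illustration). It shows one code point
serialized to two different byte sequences, and one byte sequence read
back in two different ways depending on ~s versus ~ts:

```erlang
-module(enc_demo).
-export([bytes/0, interpret/0]).

%% One code point, two possible byte serializations.
bytes() ->
    Str = [229],  % "å" as a list of code points
    Latin1 = unicode:characters_to_binary(Str, unicode, latin1),
    Utf8   = unicode:characters_to_binary(Str, unicode, utf8),
    {Latin1, Utf8}.

%% The same two bytes, two interpretations.
interpret() ->
    Bin = <<195,165>>,
    %% ~s treats each byte of the binary as one latin-1 character.
    AsLatin1 = lists:flatten(io_lib:format("~s", [Bin])),
    %% ~ts decodes the binary as UTF-8.
    AsUtf8 = lists:flatten(io_lib:format("~ts", [Bin])),
    {AsLatin1, AsUtf8}.
```

Here bytes/0 returns {<<229>>, <<195,165>>} and interpret/0 returns
{[195,165], [229]}, so whoever does the printing has to pick an encoding
somewhere.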

A similar question applies to sending atoms on the wire as messages: I
have no control over the recipients, and they might expect latin-1 or
utf-8. Are you going to add new tags to the external term format?
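
For reference, a rough sketch of how one could check which tag a node
emits for an atom today (module and function names are hypothetical;
131 is the version byte of the external term format, and 100 and 115
are the ATOM_EXT and SMALL_ATOM_EXT tags, whose names are latin-1):

```erlang
-module(atom_tag_demo).
-export([atom_tag/1]).

%% Report which external-format tag term_to_binary/1 uses for an atom.
atom_tag(Atom) when is_atom(Atom) ->
    %% First byte is the format version (131), second is the term tag.
    <<131, Tag, _Rest/binary>> = term_to_binary(Atom),
    case Tag of
        100 -> atom_ext;        % 2-byte length, latin-1 name
        115 -> small_atom_ext;  % 1-byte length, latin-1 name
        _   -> {other, Tag}     % any other tag this sketch does not name
    end.
```

A recipient that only understands the latin-1 tags would have no way to
decode anything reported as {other, Tag}, which is what prompts the
question.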

regards,
Vlad