[erlang-questions] What is the correct representation of text in erlang programs?

Steve Davis steven.charles.davis@REDACTED
Mon Apr 13 22:16:55 CEST 2009


Working with HTTP means a lot of work with text. I have been
frequently facing issues in trying to figure out whether a list of
integers represents a text string or just a list of integers (or
something else), and found myself wishing for a string type in Erlang
(please, please read further before wasting time responding to this
particular thought!).

I read quite a few past threads on this issue, and found a lot of
ideas and discussion about the "rightness" or otherwise of a string
type. I found certain arguments pretty convincing, and I have ditched
the whole idea of a "string type" as a maguffin.

So what to do? Although the issue is old, I don't see any readily
available guidance. Here is my thinking on it:

I use JSON as it's familiar and simple, but the same issue applies to
most imported data.

Take a simple incoming JSON object such as:
{
   "hello": "world"
}

You would of course actually receive something like: "{\"hello\":
\"world\"}", or <<"{\"hello\": \"world\"}">>
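A minimal sketch of the two incoming forms, assuming the payload is plain
ASCII. The module and function names are made up for illustration; only
list_to_binary/1 is a real BIF. It shows that the list form and the binary
form carry the same bytes, so the choice between them is purely about the
in-memory representation:

```erlang
-module(wire_forms).
-export([same_bytes/0]).

%% Hypothetical helper: compares the two forms the payload may arrive in.
same_bytes() ->
    AsList = "{\"hello\": \"world\"}",        % a list of integers
    AsBinary = <<"{\"hello\": \"world\"}">>,  % a binary
    %% Converting the list to a binary yields exactly the binary form.
    list_to_binary(AsList) =:= AsBinary.
```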

The question then is: what is the correct term equivalent of this JSON
object? Given that a JSON object is a list of pairs, "K" : V,  then
the most obvious solution would be:

[ { hello, "world"} ]

... where the pair list that represents the object would be a list of
two-tuples with an atom key and the "string" represented as a list of
integers.

But suppose we add in an array (which of course is also a list):
{
   "hello": [ "world", "again" ]
}

Including an array "of strings" for the value makes life considerably
more indeterminate. When processing the data, you need to decide
whether something is a list of things or a representation of a text
thing, and that's not always (in fact, rarely) easy.
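The ambiguity can be made concrete with a small sketch. The module name is
hypothetical; io_lib:printable_list/1 is a standard library function, but it
can only report that a list could be text, not that it was meant as text:

```erlang
-module(text_ambiguity).
-export([demo/0]).

%% Hypothetical demo: a string literal is just a list of integers.
demo() ->
    Text = "world",
    Ints = [119, 111, 114, 108, 100],
    %% The two terms are identical, so no guard can tell them apart.
    Same = (Text =:= Ints),
    %% Says the list *could* be text; the intent is not recoverable.
    LooksLikeText = io_lib:printable_list(Ints),
    %% An array of strings is a list of lists of integers.
    Nested = (["world", "again"] =:=
                  [[119, 111, 114, 108, 100], [97, 103, 97, 105, 110]]),
    %% The empty string and the empty array are the same term as well.
    EmptySame = ("" =:= []),
    {Same, LooksLikeText, Nested, EmptySame}.
```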

Looking at it this way, I've come to the conclusion that the "one true
way" to represent text values inside Erlang terms is actually as
binaries... so for our example the correct solution would be:

[ {hello, <<"world">>} ]

and

[ {hello, [ <<"world">>, <<"again">>] } ]
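As a rough sketch of what this buys you: once text is always a binary, a
plain guard separates text from arrays and objects. The module and the
classify/1 function are hypothetical, and the sketch assumes that objects
are lists of {Key, Value} tuples with atom keys, as in the terms above:

```erlang
-module(text_as_binary).
-export([classify/1]).

%% Hypothetical classifier for terms built under the
%% "text as binaries only" rule.
classify(V) when is_binary(V) -> text;
%% Assumption: an object is a list whose first element is {AtomKey, Value}.
classify([{K, _} | _]) when is_atom(K) -> object;
%% Any other list is treated as an array.
classify(V) when is_list(V) -> array;
classify(_) -> other.
```

With this, classify(<<"world">>) gives text and
classify([<<"world">>, <<"again">>]) gives array. Note that in this sketch
[] is reported as an array, since an empty object and an empty array are
still the same term.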

Apart from "is this correct?", I have two questions:

1) If you use the rule of "text as binaries only", will this lead to
the correct solution in all cases of imported data types?

2) Are there boundary conditions I should be aware of?

BR
Steve


