Strings (was: Re: are Mnesia tables immutable?)
Wed Jul 5 15:24:28 CEST 2006
I have been reading this discussion off-line so I have not been able to
reply quickly. As I see it we are discussing (at least) 3 different
types of strings:
1) internal mutable strings, strings we want to modify and work on (yes
I KNOW Erlang doesn't really have mutable datatypes :-)
2) internal immutable strings, strings we don't want to modify
3) external representation of strings, term_to_binary
Unfortunately we are using the same name for these different things.
Before I go on I would like to point out that the convention of strings
being lists of integers >= 0, =< 255 originated in the code for fwrite
~p because I needed an easy way to decide when a list should be printed
as a string. Just to be helpful. Nothing more. I know because I wrote
the code and "invented" the convention. I am guessing that it became
part of binary encoding because such lists could be encoded efficiently
and were useful for strings. But that doesn't mean that strings must only
contain small integers.
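To make the convention concrete, here is a minimal sketch of the kind of check described above. The module and function names are hypothetical, and this is not the actual fwrite code: it tests only the stated range (integers >= 0, =< 255). The check in the real library (io_lib:printable_list/1) is stricter, since it also rejects non-printable control characters.

```erlang
-module(printable_demo).
-export([looks_like_string/1]).

%% Hypothetical helper, for illustration only.
%% Returns true when the argument is a proper list whose elements are
%% all integers in the range 0..255, false for anything else.
looks_like_string([C | T]) when is_integer(C), C >= 0, C =< 255 ->
    looks_like_string(T);
looks_like_string([]) ->
    true;
looks_like_string(_) ->
    false.
```

With this, looks_like_string("abc") is true, while looks_like_string([8364]) is false even though 8364 is a perfectly good character (the Euro sign). The convention is a printing heuristic, not a definition of what a string is.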
I think some people are putting WAY too much significance into a trivial
convention.
That being said some comments on representation:
1. Internal mutable strings. Sorry, I can't for the life of me
understand why they should be represented as anything else other than
one Unicode character per list element. Easy to work with, backwards
compatible (which I NEVER worried about before, ask Joe) and relatively
efficient. Anything else at this level would be a serious pain in the arse.
2. Internal immutable strings. I am wondering if we really need to fix
this. These strings are VERY application dependent and the application
definitely knows what it needs in the way of encoding. Store them as
binaries and provide some libraries for converting between list strings
which can handle everything, and various encodings in the binaries.
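A sketch of what such a conversion library could look like, assuming the unicode module found in current OTP releases (it postdates this message, so it is not what was available at the time). The module name string_conv_demo and the encode/decode wrappers are hypothetical. The list side is one code point per element, and the binary side is whatever encoding the application picks.

```erlang
-module(string_conv_demo).
-export([encode/2, decode/2]).

%% List of Unicode code points -> binary in the chosen encoding.
%% Enc is an encoding accepted by the unicode module,
%% e.g. utf8, latin1, {utf16, little}.
%% On bad input the unicode module returns an {error, ...} or
%% {incomplete, ...} tuple instead of a binary; this sketch passes
%% that through unchanged.
encode(Chars, Enc) when is_list(Chars) ->
    unicode:characters_to_binary(Chars, unicode, Enc).

%% Binary in the given encoding -> list of Unicode code points.
decode(Bin, Enc) when is_binary(Bin) ->
    unicode:characters_to_list(Bin, Enc).
```

For example, string_conv_demo:encode([8364], utf8) gives <<226,130,172>>, and decoding that binary with utf8 gives back [8364]. The application decides which encoding it stores; the list form stays the same.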
3. External representation. In one respect I don't really see the
problem here, if you solve 1&2 then this problem goes away. What I want
from an external representation is that I get back out of it what I put
into it! Nothing more, nothing less! I have chosen the representation so
I don't want "help" in converting it. If I have a list of integers in
then I want the SAME list of integers out, if I have a binary string in
I want the SAME binary string out. If term_to_binary detects that my
list consists of only 8/16/24/32 bit integers and smart-codes that then
fine, as long as I get it back the same way.
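The requirement is easy to state as a check. A minimal sketch (the module and function names are hypothetical) using the standard term_to_binary/1 and binary_to_term/1:

```erlang
-module(roundtrip_demo).
-export([roundtrips/1]).

%% True when decoding the external representation yields exactly
%% the term that was encoded (same type, same contents).
roundtrips(Term) ->
    Term =:= binary_to_term(term_to_binary(Term)).
```

roundtrip_demo:roundtrips([104,105]), roundtrip_demo:roundtrips([8364,104]) and roundtrip_demo:roundtrips(<<"hi">>) should all return true: a list comes back as a list and a binary as a binary, however the encoder chose to pack them in between.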
I am definitely not an expert on Unicode so I may have missed something
important. But keep it simple so the programmer knows what is happening
and can work with that. KISS principle.
My main worry is that if you start baking hard-wired solutions into the
system then you a) will get it wrong, b) make a lot of people unhappy
because you made the wrong choice and c) make the system bigger and
harder to maintain. Provide libraries and credit the programmer with
some intelligence in making their own choices as to what they need.
As you may understand I am definitely for having different
representations depending on what you are doing. One size does
definitely NOT fit all.