Strings (was: Re: are Mnesia tables immutable?)

Andrew Lentvorski bsder@REDACTED
Thu Jun 29 08:27:24 CEST 2006

Richard A. O'Keefe wrote:

> So we have two possible approaches here:

We have more than that, but how about choice 0:

0) We leave strings alone and simply declare them by fiat to be lists of 
integers, encoded as UTF-8.
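To make choice 0 concrete: under this scheme a string is just the list of 
its UTF-8 bytes, each held as an integer.  The sketch below is in Python 
for illustration only, with a list of ints standing in for an Erlang list; 
the helper names are hypothetical.  Note that ASCII-only text comes out as 
exactly the integers it uses today.

```python
def to_internal(text: str) -> list[int]:
    """Hypothetical helper: represent text as one integer per UTF-8 byte."""
    return list(text.encode("utf-8"))


def from_internal(codes: list[int]) -> str:
    """Hypothetical helper: recover the text from a list of UTF-8 byte values."""
    return bytes(codes).decode("utf-8")


# ASCII text maps to the same integers as its character codes.
print(to_internal("hello"))  # [104, 101, 108, 108, 111]

# A non-ASCII character becomes several integers: "é" is 0xC3 0xA9.
print(to_internal("héllo"))  # [104, 195, 169, 108, 108, 111]
print(from_internal([104, 195, 169, 108, 108, 111]))  # héllo
```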

This has the advantage that strings survive very nicely inside BEAM 
files without any code changes to the Erlang system.  It also means 
that the current term-to-binary stuff works just fine, if a bit 
verbose.  UTF-8 is documented everywhere on the planet and survives 
old systems because it never uses the byte 0 (ASCII NUL) except to 
encode NUL itself.  It is also very easy to identify: it either looks 
like ASCII or looks like nothing else.  Therefore, dropped bytes and 
characters are usually easy to detect, and decoding can resume at the 
next intact character, so not all of the information is lost.

This requires *0* lines of code, and those who stay within the ASCII 
character set don't need to understand any of it.

There should, however, be a module which converts between the internal 
format and the multitude of other encodings.  The end result should 
probably be a binary object.  That way, if you want to put a string on 
the wire in a particular encoding, you can.  If you don't want to, you 
don't have to.  And there will always be someone who doesn't want to.
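A minimal sketch of such a conversion module, assuming the internal form 
is the list of UTF-8 byte values described above.  It is written in Python 
for illustration; `encode_string` and `decode_string` are hypothetical 
names, not an existing Erlang API.  Output to the wire is a binary (here 
`bytes`) in the requested encoding.

```python
def encode_string(codes: list[int], encoding: str) -> bytes:
    """Turn an internal string (UTF-8 byte values) into a binary in `encoding`."""
    text = bytes(codes).decode("utf-8")
    return text.encode(encoding)


def decode_string(data: bytes, encoding: str) -> list[int]:
    """Turn a binary in `encoding` back into the internal UTF-8 list form."""
    return list(data.decode(encoding).encode("utf-8"))


internal = list("café".encode("utf-8"))  # [99, 97, 102, 195, 169]

wire = encode_string(internal, "latin-1")
print(wire)  # b'caf\xe9'
print(decode_string(wire, "latin-1") == internal)  # True
print(encode_string(internal, "utf-16-le"))  # b'c\x00a\x00f\x00\xe9\x00'
```

Anyone who doesn't care about other encodings simply never calls these 
functions and keeps working with the list as-is.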

