[erlang-questions] Strings as Lists

Wed Feb 13 08:11:43 CET 2008

On 13 Feb 2008, at 02:32 , Lev Walkin wrote:

> Robert Virding wrote:
>> I think it all boils down to what you are going to *do* with these
>> strings. If you are just going to store them somewhere for later
>> then converting them to a binary definitely saves space. If,
>> however, you are going to *work* with them then having them as
>> lists is definitely much better. It is so much easier than having
>> a fixed sequence of octets. Also most, if not all, declarative
>> languages, functional and logic, have very optimised list handling
>> because lists are so practical to work with.
>> As mentioned in the next mail you can also keep them as iolists
>> while processing to make it efficient to send the strings into the
>> big wide world. This is sort of the best of both worlds.
>> Also having them as lists means you get UTF-16 and 32 for free, and
>> most of your libraries still work straight out of the bag. This,
>> UTF-16/32, I think will become much more important in the future
>> when the number of internet users who don't have a Latin charset as
>> their base increases. Think of the influence of a few hundred
>> million Indians and Chinese who want 32 bit charsets. :-)
>
> Small correction: UTF-16 and UTF-32 are practically dead, you
> certainly need to think in terms of UTF-8 nowadays.
>
I need to think in terms of none of these. They're all transformation
formats, in other words exchange formats. If the inner representation
of a character is 32 bits wide, we should store raw Unicode codepoints
and be done with it. The codepoints are the universal theoretical
"values" for each character, and there is *no reason* to use a UTF or
a UCS format as the internal representation of characters.
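
To make the distinction concrete, here is a minimal sketch, assuming an
OTP release that ships the `unicode` module (it postdates this thread).
The module name `codepoints` and the function `demo/0` are made up for
illustration. The string is held internally as a plain list of
codepoints, and a transformation format is only chosen at the point
where bytes are produced:

```erlang
-module(codepoints).
-export([demo/0]).

%% Internal representation: a plain list of Unicode codepoints,
%% one integer per character, independent of any UTF.
demo() ->
    Str = [16#48, 16#E9, 16#4E2D, 16#1F600],  % "H", e-acute, a CJK ideograph, an emoji
    4 = length(Str),                          % one list element per codepoint

    %% Transformation formats only matter at the boundary, when the
    %% codepoints are turned into bytes for storage or exchange.
    Utf8  = unicode:characters_to_binary(Str, unicode, utf8),
    Utf16 = unicode:characters_to_binary(Str, unicode, {utf16, big}),
    Utf32 = unicode:characters_to_binary(Str, unicode, {utf32, big}),

    %% Decoding any of them yields the same codepoint list again.
    Str = unicode:characters_to_list(Utf8, utf8),

    {byte_size(Utf8), byte_size(Utf16), byte_size(Utf32)}.
```

Calling `codepoints:demo()` should return `{10, 10, 16}`: the same four
codepoints take a different number of bytes in each exchange format,
while the list being worked on never changes.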