[erlang-questions] Strings as Lists

Lev Walkin vlm@REDACTED
Wed Feb 13 02:32:44 CET 2008


Robert Virding wrote:
> I think it all boils down to what you are going to *do* with these 
> strings. If you are just going to store them somewhere for later then 
> converting them to a binary definitely saves space. If, however, you are 
> going to *work* with them then having them as lists is definitely much 
> better. It is so much easier than having a fixed sequence of octets. Also 
> most, if not all, declarative languages, functional and logic, have very 
> optimised list handling because lists are so practical to work with.
> 
> As mentioned in the next mail you can also keep them as iolists while 
> processing to make it efficient to send the strings into the big wide 
> world. This is sort of the best of both worlds.
> 
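
To make the trade-off above concrete, here is a minimal sketch of the
three forms mentioned: list, binary and iolist. The module name
string_forms and the function demo/0 are made up for illustration; only
the BIFs and lists:reverse/1 are standard Erlang/OTP.

```erlang
-module(string_forms).
-export([demo/0]).

%% Hypothetical demo function; every match below is an assertion.
demo() ->
    Str = "hello",                    % a list of integers: [104,101,108,108,111]
    Bin = list_to_binary(Str),        % compact form, one byte per character here
    5 = byte_size(Bin),
    Str = binary_to_list(Bin),        % converting back gives the same list

    %% Working on the list form uses the ordinary list functions.
    "olleh" = lists:reverse(Str),

    %% An iolist can nest lists and binaries freely; nothing is
    %% flattened or copied until the data is actually written out.
    IoList = [Bin, ", ", [<<"wor">>, "ld"]],
    12 = iolist_size(IoList),
    <<"hello, world">> = iolist_to_binary(IoList),
    ok.
```

Calling string_forms:demo() should return ok. A list costs roughly two
machine words per character, so the binary is the cheaper form for
storage, while the iolist avoids building one big intermediate string
before output.
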
> Also having them as lists means you get UTF-16 and 32 for free, and most 
> of your libraries still work straight out of the box. This, UTF-16/32, I 
> think will become much more important in the future when the number of 
> internet users who don't have a Latin charset as their base increases. 
> Think of the influence of a few hundred million Indians and Chinese who 
> want 32-bit charsets. :-)

Small correction: UTF-16 and UTF-32 are practically dead; you certainly
need to think in terms of UTF-8 nowadays.
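
A rough sketch of the distinction: a list of integers holds code
points, and the encoding only matters once the list is turned into
bytes. This assumes an OTP release that ships the standard unicode
module, which as far as I know is newer than the releases discussed in
this thread. The module name utf_demo and the function demo/0 are
invented for the example.

```erlang
-module(utf_demo).
-export([demo/0]).

%% Hypothetical demo function; every match below is an assertion.
demo() ->
    %% Three code points: U+0061 (a), U+00E4 (a with diaeresis),
    %% U+20AC (euro sign). The list itself has no encoding.
    Chars = [16#61, 16#E4, 16#20AC],

    Utf8  = unicode:characters_to_binary(Chars, unicode, utf8),
    Utf16 = unicode:characters_to_binary(Chars, unicode, utf16),
    Utf32 = unicode:characters_to_binary(Chars, unicode, utf32),

    6  = byte_size(Utf8),     % 1 + 2 + 3 bytes
    6  = byte_size(Utf16),    % 2 bytes each, all three are in the BMP
    12 = byte_size(Utf32),    % always 4 bytes per code point

    %% Decoding the UTF-8 bytes gives back the same code point list.
    Chars = unicode:characters_to_list(Utf8, utf8),
    ok.
```

So the list form is encoding-neutral, and UTF-8 is just one choice made
at the boundary when the bytes go out.
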

> Robert
> 
> On 12/02/2008, Masklinn <masklinn@REDACTED> wrote:
> 
> 
>     On 12 Feb 2008, at 17:19 , tsuraan wrote:
> 
>      > Why does Erlang internally represent strings as lists?  In every
>      > language I've used other than Java, a string is a sequence of
>      > octets, just like Erlang's binary type.  I know that you can
>      > represent a string efficiently by using <<"string">> rather than
>      > just "string", but why doesn't Erlang do this by default?  Is it
>      > just because pre-12B binary handling wasn't as efficient as list
>      > handling, or is Erlang intended to support UTF-32?
>      >
> 
>     A lot of functional languages represent strings as lists rather than
>     arrays (Haskell also does that, and only recently got bytestrings)
>     because lists are their basic collection datatype (due to being a
>     recursive structure and everything), and this allows the use of all
>     the list-related functions on strings.
> 
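
As an illustration of the point about list functions, here is a small
sketch of string handling done purely with generic list tools. The
module list_strings and its functions are hypothetical names; the only
library calls are lists:map/2, lists:member/2 and lists:prefix/2.

```erlang
-module(list_strings).
-export([upcase/1, vowels/1, demo/0]).

%% Upper-case ASCII letters with a plain map over the character list.
upcase(Str) ->
    lists:map(fun(C) when C >= $a, C =< $z -> C - 32;
                 (C) -> C
              end, Str).

%% Count vowels with a list comprehension.
vowels(Str) ->
    length([C || C <- Str, lists:member(C, "aeiou")]).

%% Hypothetical demo function; every match below is an assertion.
demo() ->
    "ERLANG" = upcase("erlang"),
    2 = vowels("erlang"),
    true = lists:prefix("er", "erlang"),
    %% Ordinary head/tail pattern matching works on strings too.
    [$e | Rest] = "erlang",
    "rlang" = Rest,
    ok.
```

None of this needs a dedicated string type, which is the code-reuse
argument made above.
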
>     Representing strings as arrays of bytes or characters (which is pretty
>     much also what Java does, by the way) is an attribute of imperative
>     languages whose basic collection datatype is the array.
> 
>     My guess is that's the reason why: a lot of string operations were
>     already implemented on lists (reduces code duplication) and string
>     efficiency wasn't really of importance in the Erlang world until
>     fairly recently, so strings being represented as lists of integers
>     wasn't much of a problem.
>     _______________________________________________
>     erlang-questions mailing list
>     erlang-questions@REDACTED <mailto:erlang-questions@REDACTED>
>     http://www.erlang.org/mailman/listinfo/erlang-questions



