[erlang-questions] Erlang basic doubts about String, message passing and context switching overhead

Joe Armstrong
Fri Jan 13 15:35:31 CET 2017


When I teach programming I always talk about the distinction between
'strings' and string literals. Thinking that the two are the same is a
big mistake.

In Erlang ' "aaa" ' is a 'string literal' and NOT a string. String
literals in most programming
languages are sequences of characters interposed between ' " ' characters.

If I say (in Erlang)

     X = "aaaa"

Then on the right hand side of the equals sign we have a string literal.
The parser turns this into some internal form which is stored in the
variable X. Thereafter we can speak of the value of X as being a
"string". It is actually defined to be a list of integers, where each
integer represents a single character.
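
To make that concrete, here is a minimal sketch (the module and function
names are made up for illustration) showing that the value produced by the
literal is an ordinary list of integers:

```erlang
-module(string_literal_demo).
-export([same_value/0, first_char/0]).

%% The literal "aaaa" and the explicit list [97,97,97,97]
%% denote exactly the same term once parsed.
same_value() ->
    X = "aaaa",
    X =:= [97, 97, 97, 97].

%% $a is the character literal for the integer 97, so the head
%% of the list is just that integer.
first_char() ->
    [First | _Rest] = "aaaa",
    First =:= $a.
```

Both functions return true, which is all there is to an Erlang "string".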

In C

    const char *x = "aaaa";

the internal representation is *entirely different* - this gets turned
into a zero terminated sequence of chars. Most (but not all) C
compilers store chars in 8 bit bytes, so if your natural language has
more than 256 characters problems occur.

Enter UTF8 and friends - these are just smart ways of encoding large
integers (i.e. character codes that do not fit in a single 8 bit char;
for UTF8, anything above 127) into multi byte sequences. Then we have
Unicode, which has mappings from natural language character sets onto
integers and complex rules for overlaying and flowing characters.
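
As a small sketch of the "large integer into multi byte sequence" idea,
the standard unicode module can do the encoding. The module name and the
choice of the euro sign (code point 16#20AC) are just for illustration:

```erlang
-module(utf8_demo).
-export([encode/1, euro/0]).

%% Encode a list of Unicode code points as a UTF-8 binary.
encode(CodePoints) ->
    unicode:characters_to_binary(CodePoints, unicode, utf8).

%% One integer (16#20AC, the euro sign) becomes three bytes.
euro() ->
    encode([16#20AC]).   %% <<226,130,172>>
```

Plain ASCII characters still come out as one byte each, so
utf8_demo:encode("a") is the one byte binary <<97>>.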

Erlang takes the view that the internal representation of a string
literal should be a list of integers - and since Erlang integers are
bignums, any character in any language can be represented as a single
integer in such a list - so far so good.
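
A rough illustration of "one integer per character", with made-up
function names: the list length counts code points, whatever their size,
while the UTF-8 encoding of the same text may need several bytes per
character. Note that this counts code points, not what a reader would
see as characters once combining marks are involved.

```erlang
-module(codepoint_demo).
-export([char_count/1, byte_count/1]).

%% Number of integers (code points) in the string.
char_count(String) ->
    length(String).

%% Number of bytes the same string needs when encoded as UTF-8.
byte_count(String) ->
    byte_size(unicode:characters_to_binary(String)).
```

For the single code point 16#1F600, char_count([16#1F600]) is 1 and
byte_count([16#1F600]) is 4.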

The problem now comes when we want to convert natural language text
into an Erlang string, and when we want to manipulate strings. For
both of these a number of libraries are available.
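
One possible shape for the conversion step, assuming the external text
arrives as a UTF-8 encoded binary (say, read from a file or a socket).
The module and function names are hypothetical; the library calls are
unicode:characters_to_list/2 and lists:reverse/1:

```erlang
-module(text_convert_demo).
-export([to_string/1, reversed/1]).

%% Decode a UTF-8 binary into an Erlang string (list of code points).
%% For invalid or truncated input the library returns an
%% {error, ...} or {incomplete, ...} tuple instead of a list;
%% this sketch does not handle that case.
to_string(Utf8Binary) when is_binary(Utf8Binary) ->
    unicode:characters_to_list(Utf8Binary, utf8).

%% Once decoded, ordinary list functions work on the string.
reversed(Utf8Binary) ->
    lists:reverse(to_string(Utf8Binary)).
```

For example, the two bytes <<195,165>> decode to the one element list
[229], and text_convert_demo:reversed(<<"abc">>) gives "cba".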

People complain about string handling in Erlang, but these complaints
usually seem to
reflect the fact that the libraries that manipulate strings and the
escape conventions used
in string literals differ from those in other programming languages.

String handling in all languages is one unholy mess, where the stupid
conventions in one language are copied by later languages.

For example, English has a start-quote and end-quote symbol and they *differ*.

"This editor,"  I think, "gets quotes wrong" [1] - amazing - how many
programmers has
Google got? - and they can't get quotes right.

Text manipulation on a computer should be far easier than it is.

Cheers

/Joe

[1] Google's Chrome browser







On Fri, Jan 13, 2017 at 2:10 PM, Steve Davis wrote:
>
> On Jan 13, 2017, at 6:20 AM, Jesper Louis Andersen wrote:
>
> depending on what your definition of "String" is
>
>
> ..and that’s the problem with strings.

