[erlang-questions] Which is best? string:concat or ++?

Richard O'Keefe ok@REDACTED
Tue May 8 00:35:38 CEST 2012


Lists can represent arbitrary Unicode codepoints as single elements.
Concatenation A++B copies A but shares B.
Suffixes of a list can be shared; taking a prefix requires copying.
Lists require at least two full words of memory per codepoint.
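
A minimal sketch of the list behaviour described above. The module
and function names are made up for illustration, and the comments
about copying and sharing restate the points above rather than
anything the code itself measures.

```erlang
-module(list_strings).
-export([demo/0]).

%% Hypothetical demo: strings as lists of codepoints.
demo() ->
    A = [16#3B1, 16#3B2],               % Greek alpha, beta: one element each
    B = "cd",
    AB = A ++ B,                        % copies the cells of A, reuses B as the tail
    4 = length(AB),
    B = lists:nthtail(length(A), AB),   % a suffix is just a pointer into AB
    A = lists:sublist(AB, length(A)),   % a prefix has to be built afresh
    AB.
```

Calling list_strings:demo() returns the four-element list; each
match inside it would crash if one of the claims about the values
were wrong.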

Binaries can represent bit-level data, or Latin 1 strings, or UTF-8
encoded Unicode.  When used to represent Unicode, there is no
one-to-one correspondence between characters and bytes, which you
often don't need anyway.
Concatenating A and B *may* have to copy *both* A and B,
but Erlang can be astonishingly clever about binaries.
Any slice of a binary can be shared (in practice a slice is likely
to be copied only if it is *small*).
Binaries require one byte of memory per byte of data (which in UTF-8
means up to 3 bytes for a BMP character) plus some fixed overhead.
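
A small sketch of the byte counts, assuming UTF-8 as the encoding.
The module name and the sample characters are illustrative only.
Whether a given slice is shared or copied is up to the runtime;
binary:copy/1 is the standard way to ask for an independent copy.

```erlang
-module(bin_strings).
-export([demo/0]).

%% Hypothetical demo: a UTF-8 binary holding "a", e-acute, euro sign.
demo() ->
    Bin = unicode:characters_to_binary([$a, 16#E9, 16#20AC]),
    6 = byte_size(Bin),                            % 1 + 2 + 3 bytes
    3 = length(unicode:characters_to_list(Bin)),   % but only 3 characters
    Euro = binary:part(Bin, 3, 3),                 % slice: bytes 3..5 of Bin
    <<16#E2, 16#82, 16#AC>> = Euro,                % the 3-byte BMP character
    Owned = binary:copy(Euro),                     % explicit, independent copy
    {byte_size(Bin), Euro =:= Owned}.
```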

Binaries are a closer analogue to, say, Java strings than lists are.

Concatenation is a thing best avoided for all three (lists, binaries,
Java strings).  For example, instead of concatenating A++B++C, you
might just form a list [A,B,C] --- this is the "iolist" approach
that's been mentioned --- or indeed some other kind of tree, and
turn it into a single sequence only when you really really need to.
In practice, this is only when you cross an interface that demands
a string of some other sort.
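
As a rough sketch of the iolist idea (module and variable names are
invented for the example): the pieces are put in a nested list as
they are, and flattened once, at the point where something really
wants a single binary.

```erlang
-module(iolist_demo).
-export([demo/0]).

%% Hypothetical demo: build a tree of pieces, flatten only at the edge.
demo() ->
    A = "Hello",
    B = <<", ">>,
    C = "world",
    Tree = [A, B, [C, $!]],        % lists, binaries and bytes, nested freely;
                                   % A, B and C are referenced, not copied
    13 = iolist_size(Tree),        % size is known without flattening
    iolist_to_binary(Tree).        % one pass, only when a flat value is needed
```

Functions that write to files, sockets, and ports generally accept
such a tree directly, so often the flattening step is never needed
at all. Note that an iolist holds bytes; for codepoints above 255
the analogous nested structure goes through
unicode:characters_to_binary/1 instead.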

I have done benchmarks in C, Lisp, Java, Smalltalk, and Prolog
over the years, and the cost of using strings instead of trees
is such as to drive all blood from the face.  It is *scary* how
bad strings of *any* kind can be, compared with using trees.

It is also scary how *dangerous* strings can be, compared with
using trees.



