[erlang-questions] strings vs binaries

Richard A. O'Keefe ok@REDACTED
Thu Aug 20 06:42:34 CEST 2015


On 19/08/2015, at 8:23 am, Rick Pettit <rpettit@REDACTED> wrote:

> Generally speaking, you probably want to use binaries these days as they consume far less memory (at least for “large” strings):

Don't *have* large strings.  Seriously.

If you are receiving external data in big chunks,
and passing it on unchanged, fine, use whatever fits,
but even then, the biggest chunk size you can use may
not be the _best_ chunk size you can use.

If you're dealing with structured (or semistructured)
data, a string of any kind is generally a poor choice.

I had a nasty experience a couple of days ago, where
a program in language X (not Erlang) using the standard
regular expression library for language X overflowed the C
stack trying to extract information from a measly 1 MiB
source, which it read as a conventional packed-array-of-
character string.  (Why a 64-bit machine with 8 GiB of
memory has a hard ulimit of 64 MiB on stack size, let
alone a default soft limit of 8 MiB, is beyond me, but
that's another story.)  Aside from the obvious lessons
(don't use X, use a different regexp library, ...)
there's a lesson for all of us:

 +-------------------------------------+
 | the memory needed to HOLD your data |
 | is only a lower bound on the memory |
 | needed to USE your data.            |
 +-------------------------------------+
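
One rough way to see the gap between holding and using, sketched in
Python for illustration only. The helper name peak_extra_bytes is
hypothetical, and the figures it reports depend on the interpreter
(this assumes CPython and its standard tracemalloc module). The input
string is built before tracing starts, so only the memory allocated
while working on it is counted.

```python
import tracemalloc


def peak_extra_bytes(work, data):
    """Return the peak number of bytes allocated while running work(data).

    data already exists when tracing starts, so the result reflects the
    memory needed to USE the data, on top of the memory needed to HOLD it.
    """
    tracemalloc.start()
    try:
        work(data)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    data = "field " * 200_000  # one string of 1,200,000 characters
    extra = peak_extra_bytes(str.split, data)
    print("held:", len(data), "characters; extra peak while splitting:", extra, "bytes")
```

Even something as tame as splitting on whitespace has to allocate a
list and one new string object per field, so the extra peak can easily
exceed the size of the input.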





