[erlang-questions] matching binary against a file

Thomas Lindgren thomasl_erlang@REDACTED
Sat Jun 23 13:21:27 CEST 2007

--- Mike McNally <m5@REDACTED> wrote:

> > However, note that building ever-larger binaries
> in a
> > loop may be very expensive. An operation like
> > <<S/binary,C>> may mean copying S and C into a new
> > binary in each iteration (cost = size(S)+size(C)),
> > which turns the algorithm quadratic or worse as
> the
> > accumulated S grows with each iteration.
> Is it the case that the runtime creates new binaries
> (i.e., makes
> copies) in a loop that's iterating on fragments of
> an existing binary?
> In other words, if an iteration matches a portion of
> a binary
>    loop_fun(<<_:X, Remainder/binary>>) ->
>      %% ...
>      loop_fun(Remainder);
> Will that have to create a new binary?  I had
> assumed not, but I know
> very little about how the runtime actually works.

Basically, a binary is implemented as a couple of
pointers into a byte array. So extracting Remainder
will allocate a new box of pointers, but will not copy
the underlying byte array. (Since the binary is
immutable, the array can be safely shared.)

On the other hand, building a new binary <<A,B>> is
implemented by copying the underlying byte arrays. In
principle, it could be done with a different trade off
by adding a "tree binary", but in that case, indexing
into the binary gets messy ... Also, a "tree binary"
is easily emulated by hand. I assume the current
approach was the one that's worked out best.

I should note that the runtime system does a number of
heuristic optimizations with binaries under the hood,
which have changed with new releases, so the exact
behaviour is not specified. And shouldn't be. But this
gives you a rough idea at least.


Sucker-punch spam with award-winning protection. 
Try the free Yahoo! Mail Beta.

More information about the erlang-questions mailing list