[erlang-questions] map over bitstring

Morten Krogh mk@REDACTED
Fri Oct 22 13:15:58 CEST 2010


Hi Jesper

Thanks for the link!

It seems to me that the requirement that binaries must be contiguous in
memory is quite strict.
If that requirement were abandoned, binaries could instead be lists of
ProcBins, or whatever they are called.

That would make it possible to avoid copying a binary both when there
is no more space behind it and when more than one append operation is
to be performed on it.

Of course, the runtime system would have to be clever about when to
extend the list and when to expand or copy the bytes.
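
Something close to a "list of binaries" can already be had at the
language level with iolists, where appending is just consing and no
bytes are copied until the data is flattened. A minimal sketch of that
idea follows; the module and function names (iolist_sketch, append_all,
demo) are made up for illustration, and this says nothing about how the
runtime represents binaries internally.

```erlang
-module(iolist_sketch).
-export([append_all/2, demo/0]).

%% Append each chunk by wrapping the accumulator in a new list cell
%% instead of building <<Acc/binary, Chunk/binary>>. The existing
%% binaries are only referenced, not copied.
append_all(Base, Chunks) ->
    lists:foldl(fun(Chunk, Acc) -> [Acc, Chunk] end, Base, Chunks).

demo() ->
    A = binary:copy(<<"a">>, 1000),
    IoList = append_all(A, [<<97>>, <<"bc">>]),
    %% The total size is known without flattening.
    1003 = iolist_size(IoList),
    %% Flattening is the single point where bytes are copied.
    Flat = iolist_to_binary(IoList),
    <<"abc">> = binary:part(Flat, byte_size(Flat), -3),
    ok.
```

Ports, sockets and the file module accept iolists directly, so in many
cases the final iolist_to_binary/1 call is not needed at all.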

Anyway, I cannot get this whole explanation to fit with what I see using 
memory(binary).

Here is a simple example from my module zip2:

append() ->
    A = binary:copy(<<"a">>, 100000000),
    B = <<A/binary, 97>>,
    receive
        {From, get} ->
            From ! {A, B};
        kill ->
            done
    end.

Eshell V5.8.1  (abort with ^G)
1> memory(binary).
374360
2> Pid = spawn(zip2, append, []).
<0.34.0>
3> memory(binary).
300086176
4> Pid ! kill.
kill
5> memory(binary).
86320

Why is the memory usage 300 MB? It should have been 100 MB, or in the
worst case, with naive copying, 200 MB.
But it is 300 MB.

When I leave B out, the memory consumption is 100 MB, as it should be,
so I don't think it is because I am using the shell incorrectly. Also,
I was careful not to keep anything in the shell; everything lives in a
spawned process.

garbage_collect(Pid) doesn't change the 300MB either.
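
To make the measurement repeatable outside the shell, something along
these lines could be used. It is only a sketch: the module and function
names (binmem_sketch, measure) are invented here, erlang:memory(binary)
is a node-wide figure that other processes can disturb, and the
documentation warns that the result of process_info(Pid, binary) may
change without notice.

```erlang
-module(binmem_sketch).
-export([measure/1]).

%% Run Fun in a fresh process and report how much the node-wide binary
%% memory grew while that process was still alive and holding the
%% result, together with the binaries the process refers to.
measure(Fun) ->
    Parent = self(),
    Before = erlang:memory(binary),
    Pid = spawn(fun() ->
                        Result = Fun(),
                        Parent ! {self(), ready},
                        %% Result is returned after the receive, so it
                        %% stays live while the parent measures.
                        receive stop -> Result end
                end),
    receive {Pid, ready} -> ok end,
    Delta = erlang:memory(binary) - Before,
    %% Per-process view of the reference-counted binaries in use.
    Refs = erlang:process_info(Pid, binary),
    Pid ! stop,
    {Delta, Refs}.
```

Calling it as

    binmem_sketch:measure(fun() ->
        A = binary:copy(<<"a">>, 100000000),
        B = <<A/binary, 97>>,
        {A, B}
    end).

returns the growth in bytes plus a per-binary list, which may help to
attribute the total to A and B separately.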


Cheers,

Morten.


On 10/22/10 12:11 PM, Jesper Louis Andersen wrote:
> On Fri, Oct 22, 2010 at 11:25 AM, Morten Krogh<mk@REDACTED>  wrote:
>> Hi,
>>
>> I would like to understand what happens under the hood with this zip
>> function, especially
>>
>> <<R/binary,X,Y>>
>>
>> Does this function work by copying R all the time, and then having the
>> garbage collector
>> collect the old R immediately, or can BEAM share R in some way.
> Binaries are put into their own Arena on the heap where they are
> ref-counted for their use. Since a binary can never hold a pointer to
> a binary, this idea is safe w.r.t cycles. Destructuring a binary can
> then be done with a triple, (P, O, L) where P is a pointer to the
> binary, O is the offset integer and L is the length integer. In
> effect, deconstructing the binary is fast. Constructing the binary is
> done in a way which gradually allocates more and more data for the
> binary (somewhat like a growing array) so that is also reasonably
> fast.
>
> This is the quick overview. The details, which you should peruse, are at
>
> http://www.erlang.org/doc/efficiency_guide/binaryhandling.html
>
>
>


