[erlang-questions] Code vs. binary heap storage

Mon Oct 8 13:23:39 CEST 2007

    These questions are very interesting, and I have been waiting for
the answers with great interest, but...

On 08 Oct 2007 07:43:50 +0200, Bjorn Gustavsson <bjorn@REDACTED> wrote:
> Jay Nelson <jay@REDACTED> writes:
>
> > Given the following code:
> >
> > return_bin(pinocchio) ->  <<"Once upon a time... ">>;
> > return_bin(cinderella) -> <<"In a castle long ago...">>;
> > return_bin(star_wars) -> <<"In a galaxy far, far away...">>.
> >
> > Assume each of the binaries is of some significant size > 32 bytes.
> >
> > If the module containing this function is code-loaded into a node, I
> > want to verify whether the following are true:
> >
> > 1) The binary only exists once regardless of how many processes or
> > modules access the functions.
> > 2) The binary for each branch of the function is stored in the binary
> > heap.
> > 3) The function return is a pointer to the existing binary heap element.
>
> No to all three questions. A new binary is constructed every time the
> code is called, regardless of which process calls it.

    Doesn't this mean that a lot of memory will be allocated for many
calls, even if the values could have been cached and not created at
each call?

> > 4) The memory footprint of a process which can call these functions
> > does not include the size of the binaries, even if the functions are
> > called and a process variable is bound to the value.
>
> Yes, the binary itself will be stored outside of the process (provided
> the size of the binary is > 64 bytes).

    Why would a binary actually be copied if it is smaller than
64 bytes? What is the reason for that?
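
    As far as I understand it, binaries of up to 64 bytes live on the
process heap, while larger ones are reference-counted and stored
outside it. A rough sketch for probing that threshold follows. The
module and function names are made up for this example, and it assumes
that process_info(Pid, binary) lists the off-heap binaries a process
references (the documentation warns that this item may change without
notice), so treat the counts it reports as indicative only.

```erlang
-module(bin_threshold).
-export([offheap_count/1]).

%% Spawn a fresh process, build one binary of Size bytes in it, and
%% report how many off-heap binaries that process references.
%% Assumption: process_info(Pid, binary) returns {binary, List} with one
%% entry per referenced off-heap binary.
offheap_count(Size) ->
    Parent = self(),
    Pid = spawn(fun() ->
        Bin = binary:copy(<<0>>, Size),
        {binary, Info} = erlang:process_info(self(), binary),
        %% Bin is used after the probe, so it is still live when
        %% process_info/2 runs.
        Parent ! {self(), length(Info), byte_size(Bin)}
    end),
    receive
        {Pid, Count, Size} -> Count
    end.
```

Comparing bin_threshold:offheap_count(64) with
bin_threshold:offheap_count(65) should show whether the second binary
ends up outside the process heap on a given emulator version.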

> > 5) These binaries can never be garbage collected unless the module
> > containing the functions is unloaded (and all other references
> > obtained by calling the functions are released).
>
> No. None of the binaries reference the loaded code.
>
> > 6) Having a process per binary and a process to route requests would
> > not be more or less memory efficient than having a single process
> > with the above function in place of the routing process.
>
> One or more processes doesn't matter.
>
>
> To share the binaries, you should call return_bin/1 only once for each
> binary and store the result in some sort of dictionary (dict, gb_trees,
> or process dictionary) or ets table, then retrieve the binary from there
> when you need it.

    Wouldn't the lookup be a time-consuming operation in the long
run? (I know that the lookup complexity is O(1)... but there is still
some code involved...)
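
    To make the question concrete, here is a minimal sketch of the
caching suggested above, using an ets table. The module name bin_cache,
the table name, and the functions init/0 and fetch/1 are hypothetical
and chosen only for this example. return_bin/1 is a stand-in for the
function from the original question, and its strings are far shorter
than the "significant size" assumed there.

```erlang
-module(bin_cache).
-export([init/0, fetch/1]).

%% Hypothetical table name, used only in this sketch.
-define(TABLE, bin_cache_table).

%% Stand-in for return_bin/1 from the original question.
return_bin(pinocchio)  -> <<"Once upon a time... ">>;
return_bin(cinderella) -> <<"In a castle long ago...">>;
return_bin(star_wars)  -> <<"In a galaxy far, far away...">>.

%% Call return_bin/1 once per key and keep the results in a named table.
%% The table belongs to the calling process and disappears when that
%% process exits.
init() ->
    ?TABLE = ets:new(?TABLE, [named_table, set, protected]),
    Keys = [pinocchio, cinderella, star_wars],
    true = ets:insert(?TABLE, [{Key, return_bin(Key)} || Key <- Keys]),
    ok.

%% Retrieve a cached binary. Exits with badarg if the key is missing.
fetch(Key) ->
    ets:lookup_element(?TABLE, Key, 2).
```

After bin_cache:init() has run once, any process on the node can call
bin_cache:fetch(star_wars). Each call costs one hash lookup in the
table, which is the overhead I am asking about.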

    Ciprian.