[erlang-questions] Code vs. binary heap storage

Mon Oct 8 07:43:50 CEST 2007

Jay Nelson <jay@REDACTED> writes:

> Given the following code:
> 
> return_bin(pinocchio) ->  <<"Once upon a time... ">>;
> return_bin(cinderella) -> <<"In a castle long ago...">>;
> return_bin(star_wars) -> <<"In a galaxy far, far away...">>.
> 
> Assume each of the binaries is of some significant size > 32 bytes.
>
> If the module containing this function is code-loaded into a node, I  
> want to verify whether the following are true:
> 
> 1) The binary only exists once regardless of how many processes or  
> modules access the functions.
> 2) The binary for each branch of the function is stored in the binary  
> heap.
> 3) The function return is a pointer to the existing binary heap element.

No to all three questions. A new binary is constructed every time the
code is called, regardless of which process calls it.

> 4) The memory footprint of a process which can call these functions  
> does not include the size of the binaries, even if the functions are  
> called and a process variable is bound to the value.

Yes, the binary itself will be stored outside of the process (provided
the size of the binary is > 64 bytes; smaller binaries are stored on
the process heap).
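One way to observe this is sketched below. The module name `offheap_demo` is hypothetical. The check relies on the `binary` item of erlang:process_info/2, which the documentation says can change or be removed without notice. The 64-byte limit is likewise a runtime implementation detail rather than a language guarantee.

```erlang
-module(offheap_demo).
-export([check/0]).

%% Returns {LargeListed, SmallListed}: whether each binary shows up among
%% the off-heap (reference-counted) binaries held by the calling process.
check() ->
    Small = list_to_binary(lists:duplicate(10, $a)),   % 10 bytes: expected on the process heap
    Large = list_to_binary(lists:duplicate(100, $a)),  % 100 bytes (> 64): expected off-heap
    {binary, Info} = erlang:process_info(self(), binary),
    %% In current implementations each entry is {BinaryId, Size, RefCount}.
    Sizes = [Size || {_Id, Size, _RefCount} <- Info],
    {lists:member(byte_size(Large), Sizes),
     lists:member(byte_size(Small), Sizes)}.
```

On the runtimes this sketch assumes, offheap_demo:check() should return {true, false}. Only the 100-byte binary appears in the list.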

> 5) These binaries can never be garbage collected unless the module  
> containing the functions is unloaded (and all other references  
> obtained by calling the functions are released).

No. None of the binaries reference the loaded code, so each one can be
garbage collected once no process refers to it.

> 6) Having a process per binary and a process to route requests would  
> not be more or less memory efficient than having a single process  
> with the above function in place of the routing process.

Whether you use one process or several doesn't matter.

To share the binaries, you should call return_bin/1 only once for each
binary and store the result in some sort of dictionary (dict, gb_trees,
or the process dictionary) or an ets table, then retrieve the binary
from there when you need it.
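Below is a minimal sketch of the ets variant. The module `bin_cache`, its functions, and the stand-in return_bin/1 are hypothetical. The stand-in builds each binary with binary:copy/2 so that it exceeds 64 bytes. The binaries are constructed once in init/0. Because they are off-heap, ets:lookup/2 should give each caller a reference to the same binary rather than a fresh copy of its bytes.

```erlang
-module(bin_cache).
-export([init/0, lookup/1]).

-define(TAB, bin_cache).
-define(KEYS, [pinocchio, cinderella, star_wars]).

%% Hypothetical stand-in for return_bin/1 from the question. Each binary is
%% built with binary:copy/2 so that it is larger than 64 bytes.
return_bin(pinocchio)  -> binary:copy(<<"Once upon a time... ">>, 4);
return_bin(cinderella) -> binary:copy(<<"In a castle long ago...">>, 4);
return_bin(star_wars)  -> binary:copy(<<"In a galaxy far, far away...">>, 4).

%% Call once, from a long-lived process: the table is owned by the
%% creating process and is deleted when that process exits.
init() ->
    ?TAB = ets:new(?TAB, [set, named_table, protected, {read_concurrency, true}]),
    lists:foreach(fun(Key) -> true = ets:insert(?TAB, {Key, return_bin(Key)}) end,
                  ?KEYS),
    ok.

%% Any process may read a protected table; the binary stored in it is
%% built only once, in init/0.
lookup(Key) ->
    [{Key, Bin}] = ets:lookup(?TAB, Key),
    Bin.
```

A dict or gb_trees value kept in a server process's state would serve the same purpose. Ets is used here only because it lets any process read the cache directly without a message round trip.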

/Bjorn
-- 
Björn Gustavsson, Erlang/OTP, Ericsson AB