[erlang-questions] Update on sharing constant data locally by reference rather than by copy?

Fri Mar 27 08:44:09 CET 2009

Olivier Boudeville wrote:
> Richard Carlsson wrote:
>> If the entire tables are constant over a long time, you could generate
>> them as modules. Nowadays, compile-time constants (even complex ones)
>> are placed in a constant pool associated with the module. So, you could
>> generate something like this:
>>
>>   -module(autogen_table).
>>   -export([find/1]).
>>   find(0) -> {some_struct, "hello", ...};
>>   ...
>>   find(99) -> {some_other_struct, <<"hi!">>};
>>   find(X) -> throw({not_found, X}).
>>
>> As far as I know, these constants will not be copied to the private
>> heaps of the processes. The generated code will also give you the
>> fastest possible lookup.
>>
> Looks tempting, even if generating the modules sounds a bit complicated;
> I may give it a try to reduce the memory footprint. I was also thinking
> of using a public ETS table, as it seems to make sense in my particular
> use case.
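
For comparison, the public ETS table route could be sketched roughly as
below. The module name ets_table_sketch, the table name shared_table and
the sample rows are made up for illustration only.

```erlang
-module(ets_table_sketch).
-export([init/0, find/1]).

%% Create a public, named table and fill it with some sample rows.
%% The table is owned by the calling process and disappears when that
%% process exits.
init() ->
    Tab = ets:new(shared_table, [set, public, named_table]),
    true = ets:insert(Tab, [{0, {some_struct, "hello"}},
                            {99, {some_other_struct, <<"hi!">>}}]),
    Tab.

%% Look up a key from any process; same interface as the generated
%% find/1 in the quoted example.
find(Key) ->
    case ets:lookup(shared_table, Key) of
        [{Key, Value}] -> Value;
        [] -> throw({not_found, Key})
    end.
```

Keep in mind that ets:lookup copies the stored term onto the heap of
the calling process on every call, so the table itself is stored once
but each lookup still pays for a copy.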

Generating the module as a text file, and then compiling and loading
it directly from within Erlang, is the simplest approach. How easy it is
depends on how complicated the generated code needs to be, but simple
tables should be trivial to generate with io:format.
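
A minimal sketch of the text-file approach might look like this. The
module name table_gen and its two functions are hypothetical, and the
sketch assumes that the keys are integers or atoms and that the values
are plain constant terms (no pids, refs or funs), so that ~w prints
them in a form the compiler can read back.

```erlang
-module(table_gen).
-export([generate/2, compile_and_load/1]).

%% Write <Module>.erl in the current directory, with one find/1 clause
%% per {Key, Value} pair and a final catch-all clause that throws.
generate(Module, Pairs) ->
    File = atom_to_list(Module) ++ ".erl",
    {ok, Fd} = file:open(File, [write]),
    io:format(Fd, "-module(~w).~n-export([find/1]).~n", [Module]),
    lists:foreach(
      fun({Key, Value}) ->
              io:format(Fd, "find(~w) -> ~w;~n", [Key, Value])
      end, Pairs),
    io:format(Fd, "find(X) -> throw({not_found, X}).~n", []),
    ok = file:close(Fd),
    File.

%% Compile the source file to a binary in memory and load it.
compile_and_load(File) ->
    {ok, Module, Binary} = compile:file(File, [binary]),
    %% Remove any old version left over from a previous load.
    code:purge(Module),
    {module, Module} = code:load_binary(Module, File, Binary),
    Module.
```

For example, table_gen:generate(autogen_table, [{0, {some_struct, "hello"}}])
followed by table_gen:compile_and_load("autogen_table.erl") should make
autogen_table:find(0) available to every process on the node.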

If you want to avoid going through a file, there are several ways
of doing that, but I suggest you start by generating plain source code.
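
One of the file-free ways, as a rough sketch: build the source as
strings, turn each form into abstract format with erl_scan and
erl_parse, and hand the result to compile:forms. The module name
table_gen_mem and the function load/2 are hypothetical, and the same
assumptions apply (integer or atom keys, plain constant values).

```erlang
-module(table_gen_mem).
-export([load/2]).

%% Generate, compile and load Module entirely in memory.
load(Module, Pairs) ->
    Clauses = [io_lib:format("find(~w) -> ~w;", [K, V]) || {K, V} <- Pairs],
    %% One string per form: two attributes and the find/1 function.
    Source = [io_lib:format("-module(~w).", [Module]),
              "-export([find/1]).",
              [Clauses, "find(X) -> throw({not_found, X})."]],
    Forms = [parse_form(lists:flatten(S)) || S <- Source],
    {ok, Module, Binary} = compile:forms(Forms),
    %% Remove any old version left over from a previous load.
    code:purge(Module),
    %% The file name is only recorded by the code server for reference.
    {module, Module} = code:load_binary(Module,
                                        atom_to_list(Module) ++ ".erl",
                                        Binary),
    Module.

%% Scan and parse a single form given as a string ending in a full stop.
parse_form(String) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(String),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

Building the abstract forms directly, without going through strings at
all, is also possible but is more work to get right.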

> More generally I would have imagined that the reference counting used
> for large blobs could be used for most if not all data? This must be the
> hybrid system Christian mentioned. Hope that some time it will be
> integrated.

Large binaries (more than 64 bytes) are always reference counted and
stored outside the process heaps; this has nothing to do with the
hybrid memory architecture. But that applies only to binaries.

What the hybrid system does is keep a separate, shared heap for all
data that has been passed as messages. On the first send, the data is
copied from the sender to the shared area, and from then on, it will
be shared by all processes. This allows sharing, but also preserves
the property that each process allocates its non-shared data on its
own private heap, which requires no locking and can be garbage
collected without stopping any other process.

    /Richard