[erlang-questions] term_to_binary and large data structures

Thu Jun 28 07:25:12 CEST 2018

On 2018-06-28 02:45, Michael Truog wrote:
> On 06/27/2018 07:19 AM, Aaron Seigo wrote:
>> I have a distributed (in the Erlang sense) application which often 
>> produces moderately-sized maps (10k+ entries with lots of tuples in 
>> the mix) which in the past have given inter-node message passing 
>> serious problems: the VM would lock for a long while, use several GB 
>> of RAM, and usually eventually give up. When it didn't outright crash, 
>> it would produce message sizes too big to send between nodes, and/or 
>> the heartbeat messages between nodes would time out resulting in 
>> breakage. Running the same terms through `term_to_binary` produces 
>> similar results.
>> 
>> The good news is that in OTP 21.0 things are quite a bit better: 
>> serialization of the maps goes a lot quicker, and memory usage is now 
>> only ~500MB per encoding for terms which would previously balloon 
>> into multiple GBs, ... so there is progress and that is really fantastic.
>> 
> 
> Part of what you may be seeing is the amount of memory allocated for
> the receiver of a distributed Erlang message that contains a large
> Erlang map because of the need to over-estimate the total size of the
> map at

Those numbers were all from the sending side; the maps don't seem to be 
an issue on deserialization.
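
For anyone who wants to compare the two directions themselves, here is a
minimal sketch for the Erlang shell. The map shape is a made-up stand-in
(10,000 integer keys with small tuple values), not our real data, so the
absolute numbers will differ; it only shows how encode and decode can be
timed separately.

```erlang
%% Hypothetical test data: 10k entries with tuples in the mix.
Map = maps:from_list([{N, {N, <<"value">>, [N, N + 1]}}
                      || N <- lists:seq(1, 10000)]),

%% Time the sending side (encode) and the receiving side (decode).
{EncodeMicros, Bin} = timer:tc(erlang, term_to_binary, [Map]),
{DecodeMicros, Map2} = timer:tc(erlang, binary_to_term, [Bin]),

%% Report timings in microseconds and the encoded size in bytes.
io:format("encode: ~p us, decode: ~p us, size: ~p bytes~n",
          [EncodeMicros, DecodeMicros, byte_size(Bin)]).
```

This measures only wall-clock time and output size; memory use during
encoding would need to be watched separately, for example with
`erlang:memory/0` before and after.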

> messages sound big enough that you may want to consider switching
> to a less-dynamic binary format,

Everything is possible ;) but not everything is palatable. The maps are 
generated in part by NIFs, so stepping outside the standard set of data 
structures becomes more difficult, and for our use cases maps are the 
"right" data structure, not just to represent the data but, more 
importantly, to work with it.

I'm not a fan of working around a problem when its cause and location 
are easily identified. It would be much nicer to improve the 
serialization of data for messages, not only for our needs, but because 
it would benefit everyone who uses the BEAM for distribution.

Thanks for the pointer to blookup, though; neat approach. Our problem is 
not so much with sharing data between processes as with sending it 
between nodes, so reference counting can't really help us :)

--
Aaron