[erlang-questions] term_to_binary and large data structures
Aaron Seigo
aseigo@REDACTED
Thu Jun 28 07:25:12 CEST 2018
On 2018-06-28 02:45, Michael Truog wrote:
> On 06/27/2018 07:19 AM, Aaron Seigo wrote:
>> I have a distributed (in the Erlang sense) application which often
>> produces moderately-sized maps (10k+ entries with lots of tuples in
>> the mix) which in the past have given inter-node message passing
>> serious problems: the VM would lock for a long while, use several GB
>> of RAM, and usually eventually give up. When it didn't outright crash,
>> it would produce message sizes too big to send between nodes, and/or
>> the heartbeat messages between nodes would time out resulting in
>> breakage. Running the same terms through `term_to_binary` produces
>> similar results.
>>
>> The good news is that in OTP 21.0 things are quite a bit better:
>> serialization of the maps goes a lot quicker, memory usage is now only
>> ~500MB per encoding for terms which would quickly balloon into
>> multiple GBs, ... so there is progress and that is really fantastic.
>>
>
> Part of what you may be seeing is the amount of memory allocated for
> the receiver of a distributed Erlang message that contains a large
> Erlang map because of the need to over-estimate the total size of the
> map at

Those numbers were all from the sending side; the maps don't seem to be
an issue on deserialization.
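
For anyone wanting to reproduce a sender-side measurement, here is a minimal sketch. It is not from the original thread: the module name `t2b_probe`, the function `run/1`, and the shape of the generated map are all assumptions made for illustration, and a synthetic map like this one will not necessarily show the memory behaviour described above. It times `term_to_binary/1` and compares the encoded size with the estimate from `erlang:external_size/1`.

```erlang
-module(t2b_probe).
-export([run/1]).

%% Hypothetical probe: build a map with N entries whose values are small
%% tuples (an assumed shape, not the real application data), then time
%% term_to_binary/1 on the sending side and report the sizes involved.
run(N) ->
    Map = maps:from_list([{K, {K, <<"value">>, [K, K + 1]}}
                          || K <- lists:seq(1, N)]),
    %% timer:tc/1 returns {ElapsedMicroseconds, ResultOfFun}.
    {Micros, Bin} = timer:tc(fun() -> term_to_binary(Map) end),
    #{entries => maps:size(Map),
      encode_micros => Micros,
      encoded_bytes => byte_size(Bin),
      %% Size estimate for the external format, computed without encoding.
      estimated_bytes => erlang:external_size(Map),
      %% Sanity check that decoding yields the original term.
      roundtrip_ok => binary_to_term(Bin) =:= Map}.
```

Calling `t2b_probe:run(10000).` in a shell returns a map of these figures; watching the VM's memory while it runs would still need an external tool such as `observer`.
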
> messages sound big enough that you may want to consider switching
> to a less-dynamic binary format,

Everything is possible ;) but not everything is palatable. The maps are
generated in part by NIFs so stepping outside the standard set of data
structures becomes more difficult, and for our use cases maps are the
"right" data structure not just to represent the data but more
importantly to work with it.

I'm not a fan of working around a problem when its cause and location
are easily identified. It would be much nicer to improve the
serialization of data for messages, not only for our needs, but because
it would benefit everyone who uses the BEAM for distribution.

Thanks for the pointer to blookup, though; neat approach. Our problem is
not so much sharing data between processes as sending it between nodes,
so reference counting can't really help us :)
--
Aaron