[erlang-questions] term_to_binary and large data structures

Thu Jun 28 02:45:51 CEST 2018

On 06/27/2018 07:19 AM, Aaron Seigo wrote:
> I have a distributed (in the Erlang sense) application which often produces moderately-sized maps (10k+ entries with lots of tuples in the mix) which in the past have given inter-node message passing serious problems: the vm would lock for a long while, use several GB of RAM, and usually eventually give up. When it didn't outright crash, it would produce message sizes too big to send between nodes, and/or the heartbeat messages between nodes would time out resulting in breakage. Running the same terms through `term_to_binary` produces similar results.
>
> The good news is that in OTP 21.0 things are quite a bit better: serialization of the maps goes a lot quicker, memory usage is now only ~500MB per encoding for terms which would quickly balloon in the multiple GB's, ... so there is progress and that is really fantastic.
>

Part of what you may be seeing is the memory allocated on the receiving side of a distributed Erlang message that contains a large Erlang map. The receiver needs to over-estimate the total size of the map; see https://github.com/erlang/otp/blob/f3790140d0e73f257c78d67de894b606ef53a8e5/erts/emulator/beam/erl_map.h#L196-L197. I am not sure whether other related logic changed recently; others may want to comment if that is the case.
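
To get a feel for the sizes involved before a map is sent, a minimal sketch along these lines may help. It compares the upper bound reported by `erlang:external_size/1`, the actual size of the `term_to_binary/1` output, and the size of the term on the local heap. The module name `map_size_check`, the helper `sample_map/1`, and the shape of the sample data are assumptions made for illustration. It does not reproduce the receiver-side estimate in erl_map.h; it only measures the local term and its encoding.

```erlang
-module(map_size_check).
-export([report/1, sample_map/1]).

%% Build a sample map of N entries with tuple values.
%% The data shape is hypothetical; substitute a real term.
sample_map(N) ->
    maps:from_list([{I, {I, I * 2, <<"value">>}} || I <- lists:seq(1, N)]).

%% Report three sizes, all in bytes:
%%   external_size - upper bound on the encoded size, computed without encoding
%%   encoded_size  - actual size of the term_to_binary/1 output
%%   heap_size     - size of the term on the process heap (off-heap binary
%%                   data is not included in this figure)
report(Term) ->
    ExternalBytes = erlang:external_size(Term),
    EncodedBytes = byte_size(term_to_binary(Term)),
    HeapBytes = erts_debug:flat_size(Term) * erlang:system_info(wordsize),
    #{external_size => ExternalBytes,
      encoded_size => EncodedBytes,
      heap_size => HeapBytes}.
```

Calling `map_size_check:report(map_size_check:sample_map(10000))` in a shell gives a rough idea of how the encoded size relates to the in-memory size for a given term. `erts_debug:flat_size/1` is a debugging helper rather than a documented API, so treat its figure as approximate.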

Your messages sound big enough that you may want to consider switching to a less dynamic binary format, if your usage of the data allows it, to minimize the potential memory consumption. Get/Put operations on a binary are very slow (e.g., at https://github.com/okeuday/blookup), but the approach may help you deal with the memory usage in a simpler way.
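
As a rough illustration of what a less dynamic binary format can look like, here is a minimal sketch that packs key/value pairs into one binary of fixed-width records and looks them up with a binary search. It is not the blookup API. The module name `fixed_lookup`, the functions `from_list/1` and `lookup/2`, and the assumption that keys and values are unsigned 32-bit integers are all hypothetical choices made for the example.

```erlang
-module(fixed_lookup).
-export([from_list/1, lookup/2]).

%% Pack {Key, Value} pairs into one binary of 8-byte records sorted by key.
%% Assumes every key and value fits in an unsigned 32-bit integer;
%% for duplicate keys only the first pair is kept.
from_list(Pairs) ->
    Sorted = lists:ukeysort(1, Pairs),
    << <<K:32, V:32>> || {K, V} <- Sorted >>.

%% Binary search over the fixed-width records.
%% Returns {ok, Value} or error.
lookup(Key, Bin) ->
    search(Key, Bin, 0, byte_size(Bin) div 8).

search(_Key, _Bin, Low, High) when Low >= High ->
    error;
search(Key, Bin, Low, High) ->
    Mid = (Low + High) div 2,
    Offset = Mid * 8,
    <<_:Offset/binary, K:32, V:32, _/binary>> = Bin,
    if
        K =:= Key -> {ok, V};
        K < Key -> search(Key, Bin, Mid + 1, High);
        true -> search(Key, Bin, Low, Mid)
    end.
```

The resulting binary has a known size (8 bytes per entry here), so the memory needed to send or receive it is predictable. The trade-off is that any update means rebuilding the binary, and each lookup costs a search rather than a hash lookup.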

Best Regards,
Michael