[erlang-questions] term_to_binary and large data structures

Aaron Seigo aseigo@REDACTED
Thu Jun 28 07:36:36 CEST 2018


On 2018-06-28 07:03, Dmitry Belyaev wrote:
> So it's not so bad as it's stated in the file, only 33 times worse than
> the advertised format.

My bad; this came from a run of performance tests where the size of the 
map is increased incrementally. It is missing a zero in one of the 
lines; will fix.

> However after applying [compressed: 9] suggested by Fred Hebert, I see:
> 
> iex(6)> :erlang.term_to_binary(tuple_map, compressed: 9) |> byte_size() 
> |> (fn x -> x / 1024 / 1024 end).()
> 0.38570117950439453

There are three problems with this:

a) that data still gets decompressed at some point on the receiving end; 
it doesn't magically go away. It does, however, help with network usage.
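
For illustration, a quick round-trip sketch one could run in iex (the 
map here is just a stand-in, not our actual data):

  map = for i <- 1..100_000, into: %{}, do: {i, {i, i * 2}}
  bin = :erlang.term_to_binary(map, compressed: 9)
  ^map = :erlang.binary_to_term(bin)  # the receiver still materialises the full term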

b) we trade time (to compress) for space (fewer bytes to transmit), when 
that space isn't needed in the first place. The time required to 
compress, especially relative to the serialization time, is decidedly 
non-trivial: compressing with zlib takes several times as long as the 
serialization itself. This could be improved by moving to a more modern 
compression algorithm, though the cost is always non-zero, of course. In 
our tests it definitely paid to compress the actual data in the map, 
but there was very little need to compress the structural metadata once 
it was encoded efficiently.
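
A rough way to see the trade-off, as a sketch (the map shape is purely 
illustrative; absolute timings depend on the machine and the data):

  map = for i <- 1..1_000_000, into: %{}, do: {i, {i, i * 2}}

  # :timer.tc/1 returns {microseconds, result}
  {t_plain, bin_plain} = :timer.tc(fn -> :erlang.term_to_binary(map) end)
  {t_zlib,  bin_zlib}  = :timer.tc(fn -> :erlang.term_to_binary(map, compressed: 9) end)

  IO.puts("plain:      #{t_plain} us, #{byte_size(bin_plain)} bytes")
  IO.puts("compressed: #{t_zlib} us, #{byte_size(bin_zlib)} bytes")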

c) we don't always control the call to term_to_binary, or the equivalent 
external term generators, and so don't have access to compression, 
e.g. on distribution messages.

I suppose we could propose using compression on (larger) distribution 
messages, which would help with the network saturation and would be a 
better stop-gap than nothing, but it still leaves us with (a) and (b) 
above, and with (c) everywhere else.

--
Aaron


