[erlang-questions] term_to_binary and large data structures
Aaron Seigo
aseigo@REDACTED
Thu Jun 28 07:36:36 CEST 2018
On 2018-06-28 07:03, Dmitry Belyaev wrote:
> So it's not so bad as it's stated in the file, only 33 times worse than
> the advertised
> format.
My bad; this came from a run of performance tests where the size of the
map is increased incrementally. It is missing a zero in one of the
lines; will fix.
> However after applying [compressed: 9] suggested by Fred Hebert, I see:
>
> iex(6)> :erlang.term_to_binary(tuple_map, compressed: 9) |> byte_size()
> |> (fn x -> x / 1024 / 1024 end).()
> 0.38570117950439453
There are three problems with this:
a) that data does get decompressed at some point. It doesn't magically
go away. It does, however, help with network usage.
b) we trade time (to compress) for space (fewer bytes to transmit), when
that space isn't needed in the first place. The time required to
compress, especially relative to the serialization time, is decidedly
non-trivial: it takes several times as long to compress with zlib as it
does to serialize (a rough measurement sketch follows this list). This
could be improved by moving to a more modern compression algorithm,
though the cost is always non-zero of course. In our tests, it
definitely paid to compress the actual data in the map, but there was
very little need to compress the structural metadata when encoded
efficiently.
c) we don't always control the call to term_to_binary, or the equivalent
external term generators, and so don't have access to compression, e.g.
on distribution messages
I suppose we could propose using compression on (larger) distribution
messages, which would help with the network saturation, and would be a
better stop-gap than nothing, but it still leaves us with (a) and (b)
above, and (c) everywhere else.
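For what a stop-gap could look like at the application level (rather
than in distribution itself), here is a hand-rolled sketch; the module
name CompressedSend and the 64 KiB threshold are made up for
illustration:

  defmodule CompressedSend do
    # Only compress payloads above an arbitrary size threshold.
    @threshold 64 * 1024

    def send_term(dest, term) do
      bin = :erlang.term_to_binary(term)

      if byte_size(bin) > @threshold do
        send(dest, {:compressed_term, :erlang.term_to_binary(term, compressed: 9)})
      else
        send(dest, term)
      end
    end

    # The receiver unwraps compressed payloads and passes others through.
    def decode({:compressed_term, bin}), do: :erlang.binary_to_term(bin)
    def decode(term), do: term
  end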
--
Aaron