[erlang-questions] term_to_binary and large data structures

Michał Muskała michal@REDACTED
Wed Jul 4 13:23:15 CEST 2018


I also believe the current format for maps, which is key1, value1, key2, value2, ..., is not that great for compression. Often you'd have many maps with exactly the same keys (especially in Elixir with structs), and there a layout of key1, key2, ..., value1, value2, ... should compress much better, since the entire key sequence would repeat verbatim between similar maps and could be compressed as a unit.
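
A rough way to try the idea out, as a sketch only: the external term format has no grouped map layout, so the code below approximates it by encoding each map as a {keys, values} pair of tuples. The module name MapLayoutSketch, its functions, and the sample data are all made up for illustration, and the actual sizes will depend on the data.

```elixir
defmodule MapLayoutSketch do
  # Hypothetical helper: approximates a "key1, key2, ..., value1, value2, ..."
  # layout by turning each map into {keys_tuple, values_tuple}.
  # This is NOT how term_to_binary encodes maps; it only imitates the layout.
  def grouped(maps) do
    Enum.map(maps, fn map ->
      {keys, values} = map |> Map.to_list() |> Enum.unzip()
      {List.to_tuple(keys), List.to_tuple(values)}
    end)
  end

  # Inverse of grouped/1, to show that no information is lost.
  def ungrouped(pairs) do
    Enum.map(pairs, fn {keys, values} ->
      Enum.zip(Tuple.to_list(keys), Tuple.to_list(values)) |> Map.new()
    end)
  end

  # Size in bytes of the zlib-compressed external term format of a term.
  def compressed_size(term) do
    term |> :erlang.term_to_binary(compressed: 9) |> byte_size()
  end
end

# Sample data (an assumption): many maps sharing exactly the same keys.
maps =
  for i <- 1..10_000 do
    %{id: i, name: "user#{i}", email: "user#{i}@example.com", active: rem(i, 2) == 0}
  end

interleaved = MapLayoutSketch.compressed_size(maps)
grouped = MapLayoutSketch.compressed_size(MapLayoutSketch.grouped(maps))
IO.puts("interleaved: #{interleaved} bytes, grouped: #{grouped} bytes")
```

Comparing the two printed sizes on representative data would show whether grouping the keys actually helps zlib for a given workload; the tuple wrapping adds a little overhead of its own, so this is only an approximation of what a native grouped encoding could achieve.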

Michał.
On 28 Jun 2018, 07:36 +0200, Aaron Seigo <aseigo@REDACTED>, wrote:
> On 2018-06-28 07:03, Dmitry Belyaev wrote:
> > So it's not as bad as stated in the file, only 33 times worse than
> > the advertised
> > format.
>
> My bad; this came from a run of performance tests where the size of the
> map is increased incrementally. It is missing a zero in one of the
> lines; will fix.
>
> > However after applying [compressed: 9] suggested by Fred Hebert, I see:
> >
> > iex(6)> :erlang.term_to_binary(tuple_map, compressed: 9) |> byte_size()
> > |> (fn x -> x / 1024 / 1024 end).()
> > 0.38570117950439453
>
> There are three problems with this:
>
> a) that data does get decompressed at some point. It doesn't magically
> go away. It does, however, help with network usage.
>
> b) we trade time (to compress) for space (fewer bytes to transmit), when
> that space isn't needed in the first place. The time required to
> compress, esp. relative to the serialization time, is decidedly
> non-trivial. It takes several times as long to compress with zlib as
> it does to serialize. This could be improved by moving to a more modern
> compression algorithm, though the cost is always non-zero of course. In
> our tests, it definitely paid to compress the actual data in the map,
> but there was very little need to compress the structural metadata when
> encoded efficiently.
>
> c) we don't always control the call to term_to_binary, or the equivalent
> external term generators, and so don't have access to compression, e.g. on
> distribution messages.
>
> I suppose we could propose using compression on (larger) distribution
> messages, which would help with the network saturation, and would be a
> better stop-gap than nothing, but it still leaves us with (a) and (b)
> above, and (c) everywhere else.
>
> --
> Aaron