[erlang-questions] term_to_binary and large data structures
Aaron Seigo
aseigo@REDACTED
Wed Jul 4 14:56:08 CEST 2018
On 2018-07-04 13:23, Michał Muskała wrote:
> I also believe the current format for maps, which is key1, value1,
> key2, value2, ..., is not that great for compression. Often, you'd
> have maps with exactly the same keys (especially in Elixir with
> structs), and there, a pattern of key1, key2, ..., value1, value2,
> ..., should be much better (since the entire keys structure could be
> compressed between similar maps).
I can confirm that this is an accurate observation. Packer does not do
this yet, but its code contains notes on the idea that came out of some
experiments along these lines. For maps, and *especially* structs in
Elixir, this can indeed be a huge win for some messages.
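To make the observation concrete, here is a rough sketch in Python. It
uses a toy text encoding, not the real External Term Format, and plain
zlib as the compressor; the field names and the functions
`encode_interleaved` / `encode_keys_first` are hypothetical. Every map
shares the same key set, as an Elixir struct would, and only the layout
differs between the two encoders.

```python
import random
import zlib

# Hypothetical struct-like field names; every map shares this key set.
FIELDS = ["id", "name", "email", "age", "active",
          "score", "role", "team", "created_at", "updated_at"]

def make_maps(count, seed=0):
    """Build `count` maps with identical keys and random integer values."""
    rng = random.Random(seed)
    return [{k: rng.randint(0, 999) for k in FIELDS} for _ in range(count)]

def encode_interleaved(maps):
    """key1=value1;key2=value2;... per map (mirrors the pairwise layout)."""
    out = []
    for m in maps:
        out.append(";".join(f"{k}={m[k]}" for k in sorted(m)))
        out.append("|")  # map separator
    return "".join(out).encode("ascii")

def encode_keys_first(maps):
    """key1,key2,...:value1,value2,... per map (the proposed layout)."""
    out = []
    for m in maps:
        keys = sorted(m)
        out.append(",".join(keys) + ":")  # identical run for every map
        out.append(",".join(str(m[k]) for k in keys))
        out.append("|")  # map separator
    return "".join(out).encode("ascii")

maps = make_maps(200)
a = zlib.compress(encode_interleaved(maps))
b = zlib.compress(encode_keys_first(maps))
print(f"interleaved: {len(a)} bytes, keys-first: {len(b)} bytes")
```

With keys first, the whole key list of each map becomes one long
back-reference to the previous map, whereas the interleaved layout
forces the compressor to emit a separate match for every key because
the differing values break up the repeated bytes.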
Even further afield: what would be a real win, but much harder to
accomplish, would be streaming compression. There are protocols (e.g.
IMAP) which can offload compression of patterns common to many messages
to entries in the compressor's look-up tables. The compression is
applied to the entire network stream for the life of the connection,
and all data that goes through it is compressed as a single stream. So
when a message contains the same byte sequence as a previous message,
the compressor turns that into a reference to an already existing entry
in a look-up table.
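The difference between per-message and per-connection compression can
be sketched with Python's zlib, assuming a reliable, ordered byte stream
between the two ends. `StreamCompressor` and `StreamDecompressor` are
hypothetical names, and the two messages are made-up Elixir-looking
byte strings; the point is only that one long-lived deflate context,
flushed with Z_SYNC_FLUSH after each message, lets later messages refer
back to bytes sent earlier.

```python
import zlib

class StreamCompressor:
    """One deflate context kept for the life of a (hypothetical) connection."""
    def __init__(self):
        self._c = zlib.compressobj()

    def send(self, message: bytes) -> bytes:
        # Z_SYNC_FLUSH emits all pending output without ending the stream,
        # so the peer can decode this message now while the compression
        # history stays available for later messages.
        return self._c.compress(message) + self._c.flush(zlib.Z_SYNC_FLUSH)

class StreamDecompressor:
    """Receiving side: also a single context for the whole connection."""
    def __init__(self):
        self._d = zlib.decompressobj()

    def receive(self, chunk: bytes) -> bytes:
        return self._d.decompress(chunk)

msg1 = b'%{name: "alice", role: :admin, status: :online, team: "core"}'
msg2 = b'%{name: "alice", role: :admin, status: :away, team: "core"}'

tx, rx = StreamCompressor(), StreamDecompressor()
c1, c2 = tx.send(msg1), tx.send(msg2)
print("standalone msg2:", len(zlib.compress(msg2)), "bytes")
print("streamed msg2:  ", len(c2), "bytes")
print(rx.receive(c1) == msg1, rx.receive(c2) == msg2)
```

Note that deflate only looks back 32 KB, which is one crude way of
bounding the memory such a stream uses; it also means that repeats
further apart than that are not found at all.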
The hard(er) part for BEAM distribution and this sort of thing would be
managing the size of the look-up table, as these connections are meant
to be both long-lived and not consume infinite resources ;) So unlike
(relatively) short-lived and highly repetitive IMAP connections, this
would probably require something custom-made for the task that keeps a
cache of the most-used terms (with all that comes with that, including
cache invalidation).
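One possible shape for such a bounded term cache is sketched below in
Python. It is an illustration under stated assumptions, not an existing
BEAM API: `TermCacheSender` and `TermCacheReceiver` are hypothetical,
terms are assumed to be hashable (e.g. already-serialized bytes), and
the connection is assumed to deliver tokens reliably and in order. The
sender owns the LRU eviction decision and names the slot explicitly, so
the receiver only has to overwrite slots and never has to guess what
was invalidated.

```python
from collections import OrderedDict

class TermCacheSender:
    """Sender-side LRU of at most `capacity` terms, each pinned to a slot."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._slots = OrderedDict()  # term -> slot, least recently used first

    def encode(self, term):
        if term in self._slots:
            self._slots.move_to_end(term)  # mark as most recently used
            return ("ref", self._slots[term])
        if len(self._slots) < self.capacity:
            slot = len(self._slots)  # cache not full yet: next free slot
        else:
            # Evict the least recently used term and reuse its slot.
            _, slot = self._slots.popitem(last=False)
        self._slots[term] = slot
        return ("new", slot, term)

class TermCacheReceiver:
    """Receiver side: a plain array of slots, overwritten on "new" tokens."""
    def __init__(self, capacity):
        self._slots = [None] * capacity

    def decode(self, token):
        if token[0] == "new":
            _, slot, term = token
            self._slots[slot] = term
            return term
        return self._slots[token[1]]

tx, rx = TermCacheSender(2), TermCacheReceiver(2)
stream = [b"user", b"status", b"user", b"device", b"status"]
tokens = [tx.encode(t) for t in stream]
print(tokens)
print([rx.decode(tok) for tok in tokens] == stream)
```

Only a "ref" token carries no term at all; how much that saves in
practice depends entirely on how often the same terms recur within the
cache's capacity.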
Compared to just working on the term serialization, that feels a bit
like rocket science at the moment. But getting maps in the same message
more efficiently packed is definitely doable :)
--
Aaron