[erlang-questions] term_to_binary and large data structures

Thu Jun 28 07:16:48 CEST 2018

On 2018-06-28 01:14, Fred Hebert wrote:
> On 06/27, Aaron Seigo wrote:
>> We have maps with 10k keys that strain this system and easily saturate 
>> our network. This is not "big" by any modern definition. As a 
>> demonstration of this to ourselves, I wrote an Elixir library that 
>> serializes terms to a more space efficient format. Where 
>> `term_to_binary` creates 500MB monsters, this library conveniently 
>> creates a 1.5MB binary out of the exact same data.
>> 
> 
> Have you tried comparing when `term_to_binary(Term, [{compressed,
> 9}])`?  If you can pack 500MB of data down to 1.5 MB, chances are that
> compression could do some good things on your end.

Yes, and it certainly helps, but the result is still larger than one would 
hope for (and larger than what that POC produces). More importantly, this 
is only meaningful when we control the call to `term_to_binary`. When the 
call is hidden behind code in OTP or a library, or when an equivalent 
function generates the external term format binary, we don't get to use 
this trick.
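
For reference, the comparison suggested above can be sketched roughly as 
follows. This is only an illustration: the module name `etf_size` and the 
10k-key map are made-up stand-ins, not the data or code discussed in this 
thread, and the sizes it reports will not match the numbers quoted above.

```erlang
-module(etf_size).
-export([compare/0]).

%% Returns {PlainBytes, CompressedBytes} for a sample term.
compare() ->
    %% Hypothetical stand-in data: a map with 10k integer keys.
    Term = maps:from_list([{N, {sample, N, <<"value">>}}
                           || N <- lists:seq(1, 10000)]),
    %% Default external term format, no compression.
    Plain = term_to_binary(Term),
    %% Same term, zlib-compressed at the highest level (9).
    Packed = term_to_binary(Term, [{compressed, 9}]),
    %% Both binaries must decode back to the original term.
    Term = binary_to_term(Plain),
    Term = binary_to_term(Packed),
    {byte_size(Plain), byte_size(Packed)}.
```

Calling `etf_size:compare()` in a shell gives the two sizes side by side. 
Note that the option has to be passed at the call site, which is exactly 
the limitation when the call lives inside OTP or a third-party library.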

This also brings us to the fact that the compression being used is 
still zlib, while there are much better options out there. That POC 
implementation uses zstd, which is both faster than zlib and produces 
smaller binaries.

--
Aaron