<div dir="ltr"><div>Hello!</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jun 8, 2021 at 2:57 PM Richard O'Keefe <<a href="mailto:raoknz@gmail.com">raoknz@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:monospace,monospace">Why would decoding a term create *any* garbage in typical cases?</div><div style="font-family:monospace,monospace">One source of garbage in my Smalltalk library is that floats are</div><div style="font-family:monospace,monospace">represented as an integer power of two scale modifying an integer</div><div style="font-family:monospace,monospace">(which might be a bignum), so the second integer (if large) is</div><div style="font-family:monospace,monospace">garbage. But Erlang doesn't do that. It represents a float as</div><div style="font-family:monospace,monospace">8 binary bytes. The reason is that my Smalltalk had to deal with</div><div style="font-family:monospace,monospace">double extended, which could be 64, 80, 96, or 128 bits, so the</div><div style="font-family:monospace,monospace">external representation had to deal with it, but Erlang supports</div><div style="font-family:monospace,monospace">64-bit IEEE doubles only.</div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace">Erlang's external format follows the ASN Type-Length-Value</div><div style="font-family:monospace,monospace">principle (more or less), so that when binary_to_term/1 reads</div><div style="font-family:monospace,monospace">something, it knows exactly what to allocate and how big.</div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace">What am I missing here?</div></div></blockquote><div><br></div><div>The garbage I was referring to is the term itself. The term "garbage" may not have been the best choice to describe that data.</div><div><br></div><div>In the initial question the benchmark was done on `{a,<<1,2,3>>, b, [1,2,3], c, {1,2,3}, d, #{a=>1, b=>2, c=>3}}`, which would create 35 words heap data when decoded.</div><div><br></div><div>However, when encoded it is represented by:</div><div>`<<131,104,8,100,0,1,97,109,0,0,0,3,1,2,3,100,0,1,98,107,0,<br></div> 3,1,2,3,100,0,1,99,104,3,97,1,97,2,97,3,100,0,1,100,116,<br> 0,0,0,3,100,0,1,97,97,1,100,0,1,98,97,2,100,0,1,99,97,3>>`<div>which is only 10 words of heapdata.</div><div><br></div><div>So each loop in the decode benchmark would generate 3.5 times as much garbage for the garbage collector to deal with.</div><div><br></div><div>Lukas</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Lukas