Performance of term_to_binary vs binary_to_term

Valentin Micic v@REDACTED
Tue Jun 8 10:34:23 CEST 2021


That makes perfect sense.
Going forward, I will make a point of preventing this kind of optimisation in my benchmark code. 

Indeed, using fun() may be a good way to prevent optimisation. However, since the intended purpose of this specific benchmark is to aid with some form of dimensioning, and since in our code we avoid the fun() construct (it makes code maintenance somewhat difficult), I really like the approach that Steve outlined, and I am sure I will be able to put it to good use in the future.

Really appreciate your interest and assistance.  Thank you.

V/

> On 08 Jun 2021, at 10:12, Steve Strong <steve@REDACTED> wrote:
> 
> I’m guessing term_to_binary, if the result is not used, can always be optimised out since it would do nothing.  binary_to_term, on the other hand, can throw a badarg exception so removing it could change behaviour even if the result is not used
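
[Editorial note: Steve's point can be illustrated with a small, self-contained sketch (the module name is made up for this illustration). An unused term_to_binary/1 call has no observable effect and may be removed, while removing a binary_to_term/1 call on malformed input would suppress a badarg exception.]

```erlang
-module(badarg_demo).
-export([demo/0]).

%% term_to_binary/1 succeeds for every Erlang term, so a call whose
%% result is ignored can safely be optimised away. binary_to_term/1
%% raises badarg for input that is not a valid external term
%% representation, so eliminating an "unused" call would change
%% observable behaviour.
demo() ->
    <<131,100,0,1,97>> = term_to_binary(a),   %% always succeeds
    try binary_to_term(<<1,2,3>>) of          %% malformed input
        _ -> unexpected
    catch
        error:badarg -> ok                    %% exception is observable
    end.
```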
> 
>> On 8 Jun 2021, at 09:10, Björn Gustavsson <bjorn@REDACTED> wrote:
>> 
>> It turns out that the compiler optimizes away the call to
>> term_to_binary/1 but not binary_to_term/1.
>> 
>> If I rewrite the benchmark like this:
>> 
>> run(Term, LoopCnt) ->
>>   T_Start_1 = ?TIMESTAMP,
>>   Result = t(fun erlang:term_to_binary/1, Term, LoopCnt-1),
>>   do_present_results("term_to_binary/1", LoopCnt, ?TIMESTAMP - T_Start_1, Result),
>>   T_Start_2 = ?TIMESTAMP,
>>   Result_2 = t(fun erlang:binary_to_term/1, Result, LoopCnt-1),
>>   do_present_results("binary_to_term/1", LoopCnt, ?TIMESTAMP - T_Start_2, Result_2).
>> 
>> t(F, Input, 0) ->
>>   F(Input);
>> t(F, Input, N) ->
>>   F(Input),
>>   t(F, Input, N-1).
>> 
>> I get different results:
>> 
>> 1> tconvert:run(a, 10000000).
>> 
>> term_to_binary/1 RETURN VALUE:<<131,100,0,1,97>>
>> REQUEST COUNT:10000000
>> ELAPSED TIME (usec):480119
>> TIME PER REQUEST (usec): 0.0480119
>> PROJECTED RATE (req/sec): 20828169.68293277
>> 
>> binary_to_term/1 RETURN VALUE:a
>> REQUEST COUNT:10000000
>> ELAPSED TIME (usec):1086325
>> TIME PER REQUEST (usec): 0.1086325
>> PROJECTED RATE (req/sec): 9205348.30736658
>> ok
>> 2> 1086325 / 480119.
>> 2.2626161430811944
>> 
>> That is, term_to_binary/1 is roughly twice as fast as binary_to_term/1
>> for this example.
>> 
>> /Björn
>> 
>> On Tue, Jun 8, 2021 at 9:29 AM Valentin Micic <v@REDACTED> wrote:
>>> 
>>> As I was surprised by the measurement myself, I am sure that the compiler did some significant optimisation. I am attaching the file with the source code, so you can review it yourself.
>>> Also, it would be interesting to see how this performs on R22 (I haven’t installed it yet).
>>> 
>>> In my view, it doesn’t really matter how fast the testing code is. What matters here is that there’s an order of magnitude difference in performance between the two BIFs.
>>> 
>>> The calling syntax for the tconvert:run/2 is: tconvert:run( a, 10000000 ).
>>> 
>>> The first argument is a term to be converted, and the second represents the number of iterations; the higher this number, the more accurate the measurement will be (at least in my opinion).
>>> 
>>> After reading your email I’ve looked at my code again, and noticed a potential slow-down for binary_to_term/1 portion of the test.
>>> 
>>> do_bin_to_term( <<Bin/binary>> , 0 ) -> binary_to_term( Bin );
>>> do_bin_to_term( <<Bin/binary>> , N )
>>> ->
>>>   binary_to_term( <<Bin/binary>> ),
>>>   do_bin_to_term( Bin , N-1 )
>>> .
>>> 
>>> 
>>> When written as
>>> 
>>> do_bin_to_term( <<Bin/binary>> , 0 ) -> binary_to_term( Bin );
>>> do_bin_to_term( <<Bin/binary>> , N )
>>> ->
>>>   binary_to_term( Bin ),
>>>   do_bin_to_term( Bin , N-1 )
>>> .
>>> 
>>> It speeds up the code by a factor of 2 (well, duh! a cynic would say; so much for compiler optimisation ;-))
>>> 
>>> After this “fix”, the binary_to_term/1 portion of the test runs “only” 14 times slower.
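
[Editorial note: the likely reason for the slowdown is that `<<Bin/binary>>` in a call's argument position is a binary construction expression, so each iteration of the first version builds a fresh binary before decoding it, rather than passing the existing one. A minimal sketch contrasting the two shapes (the module name is hypothetical and absolute timings will vary):]

```erlang
-module(copy_demo).
-export([run/1]).

%% slow/2 constructs a new binary via <<Bin/binary>> on every
%% iteration before decoding; fast/2 passes the existing binary.
slow(Bin, 0) -> binary_to_term(Bin);
slow(Bin, N) -> binary_to_term(<<Bin/binary>>), slow(Bin, N-1).

fast(Bin, 0) -> binary_to_term(Bin);
fast(Bin, N) -> binary_to_term(Bin), fast(Bin, N-1).

run(N) ->
    Bin = term_to_binary(a),
    {T_Slow, a} = timer:tc(fun() -> slow(Bin, N) end),
    {T_Fast, a} = timer:tc(fun() -> fast(Bin, N) end),
    {T_Slow, T_Fast}.   %% elapsed microseconds for each variant
```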
>>> 
>>> (cig@REDACTED)322> tconvert:run( a, 10000000 ).
>>> 
>>> term_to_binary/1 RETURN VALUE:<<131,100,0,1,97>>
>>> REQUEST COUNT:10000000
>>> ELAPSED TIME (usec):94664
>>> TIME PER REQUEST (usec): 0.0094664
>>> PROJECTED RATE (req/sec): 105636778.50080284
>>> 
>>> binary_to_term/1 RETURN VALUE:a
>>> REQUEST COUNT:10000000
>>> ELAPSED TIME (usec):1385235
>>> TIME PER REQUEST (usec): 0.1385235
>>> PROJECTED RATE (req/sec): 7218991.723425989
>>> ok
>>> 
>>> 
>>> Kind regards
>>> 
>>> V/
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 08 Jun 2021, at 07:45, Jacob <jacob01@REDACTED> wrote:
>>> 
>>> Hi,
>>> 
>>> I've tried to reproduce the measurement, but according to my
>>> measurements, there is just a factor of 2 on Erlang/OTP 22.
>>> 
>>> 1> timer:tc(fun () -> bench:t2b(a, 1000000) end)
>>> {109357,<<131,100,0,1,97>>}
>>> 2> timer:tc(fun () -> bench:b2t(<<131,100,0,1,97>>, 1000000) end).
>>> {199488,a}
>>> 
>>> 
>>> If I do not use the result of each term_to_binary call, the factor (~14)
>>> is much closer to your measurements:
>>> 
>>> 3> timer:tc(fun () -> bench:broken_t2b(a, 1000000) end).
>>> {14404,<<>>}
>>> 
>>> Are you indeed sure that the compiler did not optimise away the entire
>>> call?
>>> 
>>> /Jacob
>>> 
>>> ======================== bench.erl ==============================
>>> -module(bench).
>>> 
>>> -export([t2b/2, b2t/2, broken_t2b/2]).
>>> 
>>> 
>>> t2b(T, N) -> t2b(T, N, undefined).
>>> 
>>> t2b(_, 0, R) -> R;
>>> t2b(T, N, _) -> R = term_to_binary(T), t2b(T, N-1, R).
>>> 
>>> b2t(T, N) -> b2t(T, N, undefined).
>>> 
>>> b2t(_, 0, R) -> R;
>>> b2t(T, N, _) -> R = binary_to_term(T), b2t(T, N-1, R).
>>> 
>>> broken_t2b(T, N) -> broken_t2b(T, N, undefined).
>>> 
>>> broken_t2b(_, 0, R) -> R;
>>> broken_t2b(T, N, R) -> _ = term_to_binary(T), broken_t2b(T, N-1, R).
>>> =================================================================
>>> 
>>> 
>>> On 06.06.21 02:07, Valentin Micic wrote:
>>> 
>>> Hi all,
>>> 
>>> I did some performance measurement recently that included conversion of
>>> an arbitrary erlang term to its external binary representation via
>>> term_to_binary/1, as well as reversing the result using binary_to_term/1.
>>> 
>>> I’ve noticed that term_to_binary/1 is significantly faster than
>>> binary_to_term/1.
>>> 
>>> Also, I’ve observed that binary_to_term/1 performance gets considerably
>>> worse as the complexity of the specified term increases, whilst
>>> term_to_binary/1 maintains (more or less) steady performance.
>>> 
>>> (cig@REDACTED)40> tconvert:run( a, 10000000 ).
>>> 
>>> term_to_binary/1 RETURN VALUE:<<131,100,0,1,97>>
>>> REQUEST COUNT:10000000
>>> ELAPSED TIME (usec):97070
>>> TIME PER REQUEST (usec): 0.009707
>>> PROJECTED RATE (req/sec): 103018440.30081384
>>> 
>>> binary_to_term/1 RETURN VALUE:a
>>> REQUEST COUNT:10000000
>>> ELAPSED TIME (usec):3383483
>>> TIME PER REQUEST (usec): 0.3383483
>>> PROJECTED RATE (req/sec): 2955534.2822765773
>>> ok
>>> 
>>> (cig@REDACTED)41> tconvert:run( {a,<<1,2,3>>, b, [1,2,3], c, {1,2,3},
>>> d, #{a=>1, b=>2, c=>3}}, 10000000 ).
>>> 
>>> term_to_binary/1 RETURN VALUE:<<131,104,8,100,0,1,97,109,0,0,0,3,1,2,3,100,0,1,
>>>                                 98,107,0,3,1,2,3,100,0,1,99,104,3,97,1,97,2,97,
>>>                                 3,100,0,1,100,116,0,0,0,3,100,0,1,97,97,1,100,
>>>                                 0,1,98,97,2,100,0,1,99,97,3>>
>>> REQUEST COUNT:10000000
>>> ELAPSED TIME (usec):97307
>>> TIME PER REQUEST (usec): 0.0097307
>>> PROJECTED RATE (req/sec): 102767529.57135664
>>> 
>>> binary_to_term/1 RETURN VALUE:{a,<<1,2,3>>,
>>>                                b,
>>>                                [1,2,3],
>>>                                c,
>>>                                {1,2,3},
>>>                                d,
>>>                                #{a => 1,b => 2,c => 3}}
>>> REQUEST COUNT:10000000
>>> ELAPSED TIME (usec):8747426
>>> TIME PER REQUEST (usec): 0.8747426
>>> PROJECTED RATE (req/sec): 1143193.4377038456
>>> ok
>>> 
>>> 
>>> 
>>> I’ve performed testing on R21.1.
>>> Any thoughts?
>>> 
>>> V/
>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> Björn Gustavsson, Erlang/OTP, Ericsson AB
> 
