[erlang-questions] NIF vs Erlang Binary

Jesper Louis Andersen jesper.louis.andersen@REDACTED
Fri Jul 22 13:03:57 CEST 2011

```On Fri, Jul 22, 2011 at 11:45, Andy W. Song <wsongcn@REDACTED> wrote:

> I did some unit test on my code and felt that it's slow (it can process
> about  24M byte/s) on a virtual machine. HiPE can double the performance but
> still not quite enough. So I wrote an NIF to handle this. The speed is about
> 10~15x faster. Not only that, I feel that the C code is easier to write.

Blindly unrolling the Key a bit gives a factor of 3 speedup:

K = binary:copy(<<Key:32>>, 512 div 32),
<<LongKey:512>> = K,

case Data of
<<A:512, Rest/binary>> ->
C = binary:encode_unsigned(A bxor LongKey),
<<A:32,Rest/binary>> ->
C = binary:encode_unsigned(A bxor Key),
<<A:24>> ->
<<B:24, _:8>> = binary:encode_unsigned(Key),
C = binary:encode_unsigned(A bxor B),
<<Accu/binary,C/binary>>;
<<A:16>> ->
<<B:16, _:16>> = binary:encode_unsigned(Key),
C = binary:encode_unsigned(A bxor B),
<<Accu/binary,C/binary>>;
<<A:8>> ->
<<B:8, _:24>> = binary:encode_unsigned(Key),
C = binary:encode_unsigned(A bxor B),
<<Accu/binary,C/binary>>;
<<>> ->
Accu
end.

Why the call to binary:encode_unsigned? Let's alter that pattern:

case Data of
<<A:512, Rest/binary>> ->
C = A bxor LongKey,

Now it is 5 times faster, same result. The NIF-advantage is now a
factor of 2-3. That is in the ballpark I would expect it to be. You
are doing many more reallocations with the above solution than the C
NIF version does. What happens if we tune it some more? Let's do runs
of 8192 bits at a time...

9 times faster compared to the original here! I expect our speed will
converge to that of C if we turn it up even more and get the amount of
allocation/realloc/concatenation down.

--
J.

```
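For reference, the chunked approach described above could look roughly like the sketch below once it is put into a complete module. This is not the code from the thread: the module name `xor_mask`, the function `mask/2`, and the `CHUNK_BITS` macro are made up for illustration, and it assumes the key is a 32-bit integer that is XORed repeatedly over the data. It builds the result with sized integer segments (`C:512`, `C:32`) instead of going through binary:encode_unsigned.

```erlang
-module(xor_mask).
-export([mask/2]).

%% Hypothetical chunk size; the post tries 512 and later 8192 bits.
%% Must be a multiple of 32.
-define(CHUNK_BITS, 512).

%% XOR Data with a repeating 32-bit Key (Key is assumed to fit in 32 bits).
mask(Data, Key) when is_binary(Data), is_integer(Key) ->
    %% Repeat the key to fill one chunk and view it as a single integer.
    <<LongKey:?CHUNK_BITS>> = binary:copy(<<Key:32>>, ?CHUNK_BITS div 32),
    mask(Data, Key, LongKey, <<>>).

%% Whole chunks first: one bxor and one append per CHUNK_BITS bits.
mask(<<A:?CHUNK_BITS, Rest/binary>>, Key, LongKey, Acc) ->
    C = A bxor LongKey,
    mask(Rest, Key, LongKey, <<Acc/binary, C:?CHUNK_BITS>>);
%% Then whole 32-bit words.
mask(<<A:32, Rest/binary>>, Key, LongKey, Acc) ->
    C = A bxor Key,
    mask(Rest, Key, LongKey, <<Acc/binary, C:32>>);
%% Finally 0 to 3 trailing bytes, XORed with the leading bytes of the key.
mask(Tail, Key, _LongKey, Acc) ->
    Bits = bit_size(Tail),
    <<A:Bits>> = Tail,
    <<B:Bits, _/bits>> = <<Key:32>>,
    C = A bxor B,
    <<Acc/binary, C:Bits>>.
```

With this sketch, `xor_mask:mask(<<1,2,3,4,5>>, 16#01020304)` returns `<<0,0,0,0,4>>`, and applying `mask/2` twice with the same key gives back the input. Using a sized segment when building the result also keeps leading zero bytes, which binary:encode_unsigned would drop.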