[erlang-questions] clarify: why bit syntax so slow + benchmark code

Per Gustafsson per.gustafsson@REDACTED
Sat Nov 17 17:19:14 CET 2007


Mateusz Berezecki wrote:
> On Nov 17, 2007, at 11:22 AM, Per Gustafsson wrote:
>
>> The way to write that would be:
>>
>>
>> parse_stream(<<1,some parsing pattern,Rest/binary>>) ->
>> ...
>> parse_stream(Rest);
>> parse_stream(<<2,some other parsing pattern,Rest/binary>>) ->
>> ...
>> parse_stream(Rest);
>> ...
>> parse_stream(<<>>) -> ok.
>>
>> This will minimize the construction of unnecessary sub-binaries, and in 
>> R12B you should be able to get good performance for this kind of 
>> approach (it still wouldn't be as fast as C, but a lot closer).
>>
>> If you want more information about the implementation of the bit 
>> syntax, you should read the paper I presented at this year's EUC:
>>
>> http://www.erlang.se/euc/07/papers/1700Gustafsson.pdf
>
> Per thanks for the URL. I will read it this weekend.
>
> Thomas, I've read the widefinder discussion, and it is not
> applicable at all to the kind of problem I am having.
>
> Widefinder is about disk I/O on log files. I'm talking about
> variable-length control structures extracted on the fly
> from a high-volume network stream. I can't "parallelize"
> a stream I know nothing about, except that
> the first byte describes a small excerpt of it.
>
> Specifically, I'm talking about this kind of thing:
>
> <<LenLen:2/unsigned-integer, NameLen:3/unsigned-integer, _CB:1, 0:1,_:1,
> Rest/binary>> = Bin,
>
> LenLen1 = LenLen * 8,
> NameLen1 = NameLen + 1,
>
> << LengthPay:LenLen1/unsigned-integer,
> NameBin:NameLen1/binary,
> Payload:LengthPay/binary,
> Rest2/binary >> = Rest,
>
>
> Is there any way to put this in one line, preferably
> in the function head, so as to avoid allocating stuff?
> Why are arithmetic expressions not allowed in the bit syntax?
>
> The function returns
>
> {NameBin, Payload, Rest}
>
> but it is recursive, doing 
> extract(Stream)->extract(Rest)->extract(RestOfRest)
> until it parses out a complete excerpt, which is usually less than 300 
> bytes.
>
> After parsing it, it proceeds to extract another fragment 
> of data from the stream.
>
> Is this kind of problem suitable for Erlang, or should I go
> with a linked-in C driver?
>
>
> regards,
> Mateusz Berezecki

I think it will be suitable. A function that looks like this:

run(<<LenLen:2, NameLen:3, _CB:1, 0:1,
      _:1,Rest/binary>>, Acc) ->
  NL = NameLen+1,
  <<LengthPay:LenLen/unsigned-integer-unit:8,
    NameBin:NL/binary,Payload:LengthPay/binary,
    Rest2/binary >> = Rest,
  run(Rest2, [{NameBin,Payload}|Acc]);
run(<<>>, Acc) -> Acc.

will generate reasonable code, which does not allocate sub-binaries 
unnecessarily.
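To try run/2 without a live network stream, it can be wrapped in a module together with a small encoder that produces test input. This is a minimal sketch, assuming the header layout implied by Mateusz's pattern: a 2-bit count of bytes in the payload-length field, a 3-bit name length minus one, a flag bit, a bit that must be 0, and one ignored bit. The module name stream_parser and the helpers encode/2 and len_len/1 are hypothetical, not from the thread.

```erlang
-module(stream_parser).
-export([run/2, encode/2]).

%% Per's run/2, unchanged. Records are accumulated newest first,
%% because the final clause returns Acc without reversing it.
run(<<LenLen:2, NameLen:3, _CB:1, 0:1, _:1, Rest/binary>>, Acc) ->
    NL = NameLen + 1,                      % name is NameLen + 1 bytes
    %% unit:8 makes LenLen a byte count, so no LenLen * 8 is needed
    <<LengthPay:LenLen/unsigned-integer-unit:8,
      NameBin:NL/binary,
      Payload:LengthPay/binary,
      Rest2/binary>> = Rest,
    run(Rest2, [{NameBin, Payload} | Acc]);
run(<<>>, Acc) ->
    Acc.

%% Hypothetical helper: builds one record in the assumed layout.
%% The name must be 1..8 bytes (3-bit NameLen stores length - 1).
%% The flag bit, the must-be-zero bit and the ignored bit are all 0.
encode(Name, Payload) when byte_size(Name) >= 1, byte_size(Name) =< 8 ->
    Size = byte_size(Payload),
    LenLen = len_len(Size),
    NameLen = byte_size(Name) - 1,
    <<LenLen:2, NameLen:3, 0:1, 0:1, 0:1,
      Size:LenLen/unsigned-integer-unit:8,
      Name/binary, Payload/binary>>.

%% Smallest number of bytes (0..3, since LenLen is 2 bits) that can hold
%% the payload size. Payloads of 16 MB or more have no clause and fail.
len_len(0) -> 0;
len_len(S) when S < 16#100 -> 1;
len_len(S) when S < 16#10000 -> 2;
len_len(S) when S < 16#1000000 -> 3.
```

For example, `stream_parser:run(iolist_to_binary([stream_parser:encode(<<"a">>, <<"hello">>), stream_parser:encode(<<"bb">>, <<>>)]), [])` returns `[{<<"bb">>, <<>>}, {<<"a">>, <<"hello">>}]`. Call `lists:reverse/1` on the result if stream order matters.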

Expressions are not allowed as sizes in binary patterns because such an 
expression would have to be evaluated in the middle of pattern matching, 
and the compiler is not really ready to allow that. Hopefully it will be 
in the future; failing that, it should at least be possible to specify an 
offset in the same way as the unit is specified.
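Until such an offset exists, the "+1" on the name length can sometimes be expressed as an extra fixed-size segment, which lets the whole record be matched in the function head as Mateusz asked. The sketch below is a workaround not proposed in the thread: `unit:8` absorbs the `* 8`, and the (NameLen + 1)-byte name is matched as a 1-byte segment followed by a NameLen-byte segment. The module name one_head and the iolist return shape are assumptions, and whether this is faster than Per's two-step version would have to be measured.

```erlang
-module(one_head).
-export([extract/1]).

%% Matches one complete record entirely in the function head.
%% Sizes may refer to variables bound by earlier segments of the
%% same pattern (LenLen, NameLen, LengthPay), but not to expressions.
extract(<<LenLen:2, NameLen:3, _CB:1, 0:1, _:1,
          LengthPay:LenLen/unsigned-integer-unit:8,
          NameFirst:1/binary,             % the "+1" byte of the name
          NameTail:NameLen/binary,        % the remaining NameLen bytes
          Payload:LengthPay/binary,
          Rest/binary>>) ->
    %% The name comes back as an iolist of two sub-binaries; call
    %% iolist_to_binary/1 only if a flat binary is really needed,
    %% since that copies the bytes.
    {[NameFirst, NameTail], Payload, Rest};
extract(<<>>) ->
    done.
```

Note that a truncated record (for example, the tail of a network read) matches neither clause and raises `function_clause`, so a streaming caller would need its own handling for partial data.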

I ran some tests on this function. My old 2.4 GHz P4 handles about 
800,000 entries per second in R11B (1,600,000 with the native flag); in 
R12B it handles about 1,600,000 (3,000,000 with the native flag). The 
results might get a little better before R12B is released, but if this 
is far too slow for your application, you will need to write a linked-in 
driver.
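A rough way to get a comparable entries-per-second figure on one's own machine is to time run/2 over a binary of many identical records. This is a minimal sketch, not Per's actual benchmark: the module name bench, the record shape (1-byte length field, 4-byte name, 100-byte payload) and the record count are all assumptions, and it uses `binary:copy/2` and `timer:tc/3` from current OTP releases.

```erlang
-module(bench).
-export([entries_per_second/1, run/2]).

%% Per's run/2 from the message, unchanged (exported for timer:tc/3).
run(<<LenLen:2, NameLen:3, _CB:1, 0:1, _:1, Rest/binary>>, Acc) ->
    NL = NameLen + 1,
    <<LengthPay:LenLen/unsigned-integer-unit:8,
      NameBin:NL/binary, Payload:LengthPay/binary,
      Rest2/binary>> = Rest,
    run(Rest2, [{NameBin, Payload} | Acc]);
run(<<>>, Acc) -> Acc.

%% Assumed test record: LenLen = 1, NameLen = 3 (so a 4-byte name),
%% flag bits all 0, a 1-byte length of 100, then a 100-byte payload.
record() ->
    Payload = binary:copy(<<$x>>, 100),
    <<1:2, 3:3, 0:3, 100:8, "name", Payload/binary>>.

%% Parses N concatenated records and returns records per second.
entries_per_second(N) ->
    Bin = binary:copy(record(), N),
    {Micros, Result} = timer:tc(?MODULE, run, [Bin, []]),
    N = length(Result),                    % sanity check: all parsed
    N * 1000000 / max(Micros, 1).
```

Running `bench:entries_per_second(1000000)` a few times and discarding the first result gives a steadier figure. Per's "native" numbers refer to HiPE-compiled code; HiPE is no longer part of current OTP releases, so only the BEAM figures can be reproduced there, and absolute numbers on modern hardware will not be comparable to the 2007 ones.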

Per