[erlang-questions] clarify: why bit syntax so slow + benchmark code
Per Gustafsson
per.gustafsson@REDACTED
Sat Nov 17 17:19:14 CET 2007
Mateusz Berezecki wrote:
> On Nov 17, 2007, at 11:22 AM, Per Gustafsson wrote:
>
>> The way to write that would be:
>>
>>
>> parse_stream(<<1,some parsing pattern,Rest/binary>>) ->
>> ...
>> parse_stream(Rest);
>> parse_stream(<<2,some other parsing pattern,Rest/binary>>) ->
>> ...
>> parse_stream(Rest);
>> ...
>> parse_stream(<<>>) -> ok.
>>
>> This will minimize the construction of unnecessary sub-binaries and in
>> R12B you should be able to get good performance for this kind of
>> approach (It still wouldn't be as fast as C, but a lot closer).
>>
>> If you want some more information about the implementation of the bit
>> syntax you should read the paper I presented at this year's EUC:
>>
>> http://www.erlang.se/euc/07/papers/1700Gustafsson.pdf
>
> Per, thanks for the URL. I will read it this weekend.
>
> Thomas, I've read the widefinder discussion and it is not
> applicable to the kind of problem I am having.
>
> widefinder is for disk IO and log-related files. I'm talking about
> variable-length control structures extracted on the fly
> from a high-volume network stream. I can't "parallelize"
> a stream about which I know nothing except that
> the first byte describes some small excerpt of it.
>
> To be explicit, I'm talking about this kind of thing:
>
> <<LenLen:2/unsigned-integer, NameLen:3/unsigned-integer, _CB:1, 0:1,_:1,
> Rest/binary>> = Bin,
>
> LenLen1 = LenLen * 8,
> NameLen1 = NameLen + 1,
>
> << LengthPay:LenLen1/unsigned-integer,
> NameBin:NameLen1/binary,
> Payload:LengthPay/binary,
> Rest2/binary >> = Rest,
>
>
> Is there any way to put this in one line and preferably
> in the function header so as to avoid allocating stuff?
> Why are arithmetic expressions not allowed in bit syntax?
>
> The function returns
>
> {NameBin, Payload, Rest}
>
> but it is recursive, doing
> extract(Stream)->extract(Rest)->extract(RestOfRest)
> until it parses out a complete excerpt which is usually less than 300
> bytes.
>
> After parsing it, it then proceeds to extract another fragment
> of data from the stream.
>
> Is this kind of problem suitable for doing in erlang or should I go
> with linked-in C driver?
>
>
> regards,
> Mateusz Berezecki
I think it will be suitable. A function that looks like this:

run(<<LenLen:2, NameLen:3, _CB:1, 0:1, _:1, Rest/binary>>, Acc) ->
    NL = NameLen + 1,
    <<LengthPay:LenLen/unsigned-integer-unit:8,
      NameBin:NL/binary, Payload:LengthPay/binary,
      Rest2/binary>> = Rest,
    run(Rest2, [{NameBin, Payload} | Acc]);
run(<<>>, Acc) -> Acc.

will generate reasonable code that does not allocate sub-binaries
unnecessarily.
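
For anyone who wants to try the function, here is a minimal, self-contained sketch. The module name entry_demo, the run/1 wrapper and the encode/2 helper are assumptions made up for illustration; only run/2 comes from this mail. The encoder simply builds input in the layout the pattern expects (it always uses at least one length byte and sets the _CB bit and the last header bit to 0).

```erlang
-module(entry_demo).
-export([run/1, encode/2]).

%% Convenience wrapper (hypothetical): run/2 accumulates entries in
%% reverse, so reverse once at the end to get stream order.
run(Bin) ->
    lists:reverse(run(Bin, [])).

%% The function from the mail, unchanged apart from layout.
run(<<LenLen:2, NameLen:3, _CB:1, 0:1, _:1, Rest/binary>>, Acc) ->
    NL = NameLen + 1,
    <<LengthPay:LenLen/unsigned-integer-unit:8,
      NameBin:NL/binary, Payload:LengthPay/binary,
      Rest2/binary>> = Rest,
    run(Rest2, [{NameBin, Payload} | Acc]);
run(<<>>, Acc) ->
    Acc.

%% Hypothetical encoder, used only to build test input.
%% Name must be 1..8 bytes because NameLen is a 3-bit field storing
%% (length - 1).
encode(Name, Payload) when byte_size(Name) >= 1, byte_size(Name) =< 8 ->
    PayLen = byte_size(Payload),
    LenLen = len_bytes(PayLen),
    NameLen = byte_size(Name) - 1,
    <<LenLen:2, NameLen:3, 0:1, 0:1, 0:1,
      PayLen:LenLen/unsigned-integer-unit:8,
      Name/binary, Payload/binary>>.

%% Number of bytes used for the payload length (1..3 fits the 2-bit field).
len_bytes(N) when N < 256 -> 1;
len_bytes(N) when N < 65536 -> 2;
len_bytes(N) when N < 16777216 -> 3.
```

Concatenating two encoded entries and passing the result to entry_demo:run/1 should give back the two {Name, Payload} pairs in the order they were written.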
Expressions are not allowed as sizes in binary patterns because the
expression would have to be evaluated in the middle of pattern matching,
and the compiler is not really ready to allow that. Hopefully it will be
in the future, or it should at least become possible to specify an
offset in the same way as the unit is specified.
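
To make the restriction concrete, here is a small sketch of the two cases. The module name size_demo and both functions are hypothetical and use a simplified layout (a one-byte length prefix), not the format from this thread. Multiplying a size by a constant can be expressed inside a single pattern through the unit, whereas adding an offset requires binding a new variable and doing a second match.

```erlang
-module(size_demo).
-export([words/1, name/1]).

%% Multiplication by a constant fits in one pattern: Len counts
%% 16-bit words, so the segment is Len * 16 bits long.
words(<<Len:8, Data:Len/binary-unit:16, Rest/binary>>) ->
    {Data, Rest}.

%% An offset does not: Len + 1 cannot be written as the size in the
%% pattern, so compute it first and match the remainder separately.
name(<<Len:8, Rest/binary>>) ->
    Size = Len + 1,
    <<Name:Size/binary, Rest2/binary>> = Rest,
    {Name, Rest2}.
```

The second form is the same shape as the NameLen + 1 step in the run function: one match for the header, one bound variable, then a second match on the rest.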
I ran some tests on this function, and my old P4 2.4 GHz handles about
800 000 entries per second in R11B (1 600 000 with the native flag). In
R12B it handles 1 600 000 (3 000 000 with the native flag). The
results might get a little better before the release of R12B, but if this
is far too slow for your application you need to write a linked-in driver.
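
A rough way to reproduce this kind of measurement is sketched below. This is not the benchmark I ran; the module name entry_bench, the sample entry and the use of timer:tc/1 and binary:copy/2 are assumptions chosen for illustration, and the figures will depend on the machine and release.

```erlang
-module(entry_bench).
-export([entries_per_second/1]).

%% Build a stream of N identical entries, parse it once, and report
%% the throughput in entries per second.
entries_per_second(N) when N > 0 ->
    %% Header 2#01001000: LenLen = 1, NameLen = 1 (a 2-byte name),
    %% remaining bits 0. Then the payload length 3, name "ab" and
    %% payload "xyz".
    Entry = <<2#01001000, 3, "ab", "xyz">>,
    Stream = binary:copy(Entry, N),
    {Micros, Parsed} = timer:tc(fun() -> run(Stream, []) end),
    N = length(Parsed),                  % sanity check: every entry parsed
    N * 1000000 / max(Micros, 1).        % guard against a 0 microsecond reading

%% Same parsing function as in the mail.
run(<<LenLen:2, NameLen:3, _CB:1, 0:1, _:1, Rest/binary>>, Acc) ->
    NL = NameLen + 1,
    <<LengthPay:LenLen/unsigned-integer-unit:8,
      NameBin:NL/binary, Payload:LengthPay/binary,
      Rest2/binary>> = Rest,
    run(Rest2, [{NameBin, Payload} | Acc]);
run(<<>>, Acc) ->
    Acc.
```

Calling entry_bench:entries_per_second(1000000) a few times and taking the best result gives a number that can be compared loosely with the ones above. A single short run is dominated by noise, so use a large N.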
Per