[erlang-questions] Speeding up string matching
Gaspar Chilingarov
nm@REDACTED
Thu Sep 21 22:07:08 CEST 2006
Hi all!
I wish to share an experience which I had today when writing code.
I had a following function, which splits binary on enter character. I
used it to split binaries to separate lines.
Original code looked like this -- which is obvious dumb solution for
that task.
break_on_nl(B) -> break_on_nl(B, []).
break_on_nl(<<"\n", Tail/binary>>, Res) -> {lists:reverse(Res), Tail};
break_on_nl(<<>>, Res) -> {lists:reverse(Res), <<>>};
break_on_nl(<<C, Tail/binary>>, Res) -> break_on_nl(Tail, [C|Res]).
I.e. just cutting one character a time, checking that we reached \n and
returning collected string and remaining tail if we reached it.
After some occasional loon on this code I realized, that I can greatly
improve it. Instead rebuilding binary to list and then reversing list I
could just check every byte of binary, check if it is an Enter char, and
then return binary splitted to 2 parts.
Code like that:
break_on_nl1(B) -> break_on_nl1(0, B).
break_on_nl1(Len, Bin) when Len == size(Bin) ->
{Bin, <<>>};
break_on_nl1(Len, Bin) ->
<<Msg:Len/binary, Symb, Tail/binary>> = Bin,
case Symb of
$\n -> {Msg, Tail};
_ -> break_on_nl1(Len+1, Bin)
end.
Main advantage is that I do not modify original binary and it passed
from one call to next one without copying.
This little code rewrite gave me 2 times speed increase, which was quite
essential, because I have strong timing constrains for that code.
I think this trick will be useful for erlang starters. If not, gurus,
please correct me :)
--
Gaspar Chilingarov
System Administrator,
Network security consulting
t +37493 419763 (mob)
i 63174784
e nm@REDACTED
More information about the erlang-questions
mailing list