[erlang-questions] Speeding up string matching

Thu Sep 21 22:07:08 CEST 2006

Hi all!

I wish to share an experience which I had today when writing code.

I had a following function, which splits binary on enter character. I
used it to split binaries to separate lines.

Original code looked like this -- which is obvious dumb solution for
that task.

break_on_nl(B) -> break_on_nl(B, []).
break_on_nl(<<"\n", Tail/binary>>, Res) -> {lists:reverse(Res), Tail};
break_on_nl(<<>>, Res) ->                  {lists:reverse(Res), <<>>};
break_on_nl(<<C, Tail/binary>>, Res) ->    break_on_nl(Tail, [C|Res]).

I.e. just cutting one character a time, checking that we reached \n and
returning collected string and remaining tail if we reached it.

After some occasional loon on this code I realized, that I can greatly
improve it. Instead rebuilding binary to list and then reversing list I
could just check every byte of binary, check if it is an Enter char, and
then return binary splitted to 2 parts.

Code like that:

break_on_nl1(B) -> break_on_nl1(0, B).
break_on_nl1(Len, Bin) when Len == size(Bin) ->
	{Bin, <<>>};
break_on_nl1(Len, Bin) ->
	<<Msg:Len/binary, Symb, Tail/binary>> = Bin,
	case Symb of
		$\n -> {Msg, Tail};
		_ -> break_on_nl1(Len+1, Bin)
	end.

Main advantage is that I do not modify original binary and it passed
from one call to next one without copying.

This little code rewrite gave me 2 times speed increase, which was quite
essential, because I have strong timing constrains for that code.

I think this trick will be useful for erlang starters. If not, gurus,
please correct me :)

-- 
Gaspar Chilingarov

System Administrator,
Network security consulting

t +37493 419763 (mob)
i 63174784
e nm@REDACTED