[erlang-questions] Searching binaries with matching?

Fredrik Svahn fredrik.svahn@REDACTED
Sat Oct 13 18:41:41 CEST 2007


Hi,

I would find it extremely useful to have a binary match that searches
for the first occurrence of a supplied pattern in a binary. For example, to
extract the body from an HTML file I could write something like:

<< _/binary, "<body>", Body/binary, "</body>", _/binary >> = MyHTMLFile.

or to split e.g. a SIP message into lines:

get_sipheaders(<<>>, Acc) -> Acc;
get_sipheaders(<<Line/binary, "\r\n", Rest/binary>>, Acc) ->
  get_sipheaders(Rest, [Line | Acc]).

get_sipheaders(<<"line1\r\nline2\r\nline3\r\n">>,[]) would result in
[<<"line3">>, <<"line2">>, <<"line1">>]
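For comparison, here is a rough sketch of how the same split might be written with the matching that exists today, where every segment except the last needs an explicit size. The module name siplines and the Pos counter are made up for this illustration, and a trailing line without "\r\n" is simply dropped.

```erlang
-module(siplines).
-export([get_sipheaders/1]).

%% Split a binary into CRLF-terminated lines, newest line first.
get_sipheaders(Bin) when is_binary(Bin) ->
    get_sipheaders(Bin, 0, []).

%% Pos is the number of bytes of the current line scanned so far.
get_sipheaders(Bin, Pos, Acc) ->
    case Bin of
        <<Line:Pos/binary, "\r\n", Rest/binary>> ->
            %% Found a terminator right after Pos bytes: emit the line.
            get_sipheaders(Rest, 0, [Line | Acc]);
        <<_:Pos/binary, _, _/binary>> ->
            %% At least one more byte to look at: advance by one.
            get_sipheaders(Bin, Pos + 1, Acc);
        _ ->
            %% No further CRLF; any unterminated tail is dropped.
            Acc
    end.
```

Each failed attempt re-matches from the start of the current line, which is exactly the step-by-step cost a searching match would hide.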

or a simple grep:

grep(SearchPattern, File) ->
  {ok, FileBin} = file:read_file(File),
  try <<_/binary, SearchPattern/binary, _/binary>> = FileBin of
     _Match -> match
  catch
    error:{badmatch, _} -> nomatch
  end.

I know that it would be possible to rewrite it and recursively search
through a binary step by step by specifying the size of the first binary,
but IIRC that is not very efficient compared to most modern search
algorithms (for comparison, have a look at e.g.
http://ridiculousfish.com/blog/archives/2006/05/30/old-age-and-treachery/).
I even believe someone stated that it was usually more efficient to convert
the binary to a list before searching it.
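
To make the step-by-step approach concrete, a minimal sketch could look like the following. The module name binsearch and the function find/2 are hypothetical, and this is a naive scan that retries the match at every offset, not one of the faster search algorithms mentioned above.

```erlang
-module(binsearch).
-export([find/2]).

%% find(Pattern, Bin) -> {ok, Offset} | nomatch
%% Returns the byte offset of the first occurrence of Pattern in Bin.
find(Pattern, Bin) when is_binary(Pattern), is_binary(Bin) ->
    find(Pattern, Bin, 0).

%% Only try offsets where the pattern can still fit in the binary.
find(Pattern, Bin, Offset)
  when Offset + byte_size(Pattern) =< byte_size(Bin) ->
    PatSize = byte_size(Pattern),
    case Bin of
        %% Skip Offset bytes, then require the bound Pattern to follow.
        <<_:Offset/binary, Pattern:PatSize/binary, _/binary>> ->
            {ok, Offset};
        _ ->
            find(Pattern, Bin, Offset + 1)
    end;
find(_Pattern, _Bin, _Offset) ->
    nomatch.
```

With this, binsearch:find(<<"lo">>, <<"hello">>) gives {ok, 3}, and the grep above reduces to checking whether the result is nomatch.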

My questions to the mailing list would be:
1. Is it possible to do something like this today (or in the near future)
efficiently, e.g. through the regexp library?
2. Is there anything in the syntax above which would make it improper or
even impossible to add a "searching match" to binary matches (given enough
time and resources)?
3. If so, would it be possible to achieve the same thing with a slightly
improved syntax?

BR /Fredrik