Hi,<br><br>I would find it extremely useful to have a binary match that searches for the first occurrence of a supplied pattern in a binary. For example, to extract the body from an HTML file I could write something like:<br><br>
<< _/binary, "<body>", Body/binary, "</body>", _/binary >> = MyHTMLFile.<br>
<br>or to split e.g. a SIP message into lines:<br><br>get_sipheaders(<<>>, Acc) -> Acc;<br>get_sipheaders(<<Line/binary, "\r\n", Rest/binary>>, Acc) -><br> get_sipheaders(Rest, [Line | Acc]).
<br><br>get_sipheaders(<<"line1\r\nline2\r\nline3\r\n">>,[]) would result in [<<"line3">>, <<"line2">>, <<"line1">>]<br><br>or a simple grep:
<br><br>grep(SearchPattern, File) -><br> {ok, FileBin} = file:read_file(File),<br> try <<_/binary, SearchPattern/binary, _/binary>> = FileBin of<br> _Match -> match<br> catch <br> error:{badmatch, _} -> nomatch
<br> end.<br> <br>I know that it would be possible to rewrite it and recursively search through a binary step by step by specifying the size of the first binary, but IIRC that is not very efficient compared to most modern search algorithms (for comparison, have a look at
e.g. <a href="http://ridiculousfish.com/blog/archives/2006/05/30/old-age-and-treachery/">http://ridiculousfish.com/blog/archives/2006/05/30/old-age-and-treachery/</a>). I even believe someone stated that it was usually more efficient to convert the binary to a list before searching it.
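<br><br>For reference, the step-by-step rewrite I have in mind would look roughly like the sketch below. The module and function names (binsearch, find/2, split_lines/1) are made up for illustration, and this is the naive approach that tries every offset in turn, not an efficient search algorithm:

```erlang
-module(binsearch).
-export([find/2, split_lines/1]).

%% find(Bin, Pattern) -> {ok, Offset} | nomatch
%% Naive search: try offset 0, 1, 2, ... by fixing the size of the
%% leading segment, as the current bit syntax requires.
find(Bin, Pattern) when is_binary(Bin), is_binary(Pattern) ->
    find(Bin, Pattern, byte_size(Pattern), 0).

%% Stop once the pattern can no longer fit in the remaining bytes.
find(Bin, _Pattern, PatSize, Offset) when Offset + PatSize > byte_size(Bin) ->
    nomatch;
find(Bin, Pattern, PatSize, Offset) ->
    case Bin of
        %% Pattern is already bound, so this segment is compared, not bound.
        <<_:Offset/binary, Pattern:PatSize/binary, _/binary>> ->
            {ok, Offset};
        _ ->
            find(Bin, Pattern, PatSize, Offset + 1)
    end.

%% split_lines(Bin) -> lines in reverse order, split on "\r\n"
%% (same result order as the get_sipheaders/2 example above).
split_lines(Bin) ->
    split_lines(Bin, []).

split_lines(<<>>, Acc) ->
    Acc;
split_lines(Bin, Acc) ->
    case find(Bin, <<"\r\n">>) of
        {ok, Offset} ->
            <<Line:Offset/binary, "\r\n", Rest/binary>> = Bin,
            split_lines(Rest, [Line | Acc]);
        nomatch ->
            %% No terminator left: keep the remainder as the last line.
            [Bin | Acc]
    end.
```

With this, binsearch:split_lines(<<"line1\r\nline2\r\nline3\r\n">>) gives [<<"line3">>, <<"line2">>, <<"line1">>], but every candidate offset costs a separate match attempt, which is exactly the inefficiency I would like to avoid.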
<br><br>My questions to the mailing list are:<br>1. Is it possible to do something like this efficiently today (or in the near future), e.g. through the regexp library?<br>2. Is there anything in the syntax above that would make it improper or even impossible to add a "searching match" to binary matches (given enough time and resources)?
<br>3. If so, would it be possible to achieve the same thing with a slightly improved syntax?<br><br>BR /Fredrik<br>