[erlang-questions] Searching binaries with matching?
Fredrik Svahn
fredrik.svahn@REDACTED
Sat Oct 13 18:41:41 CEST 2007
Hi,
I would find it extremely useful to have a form of binary matching that
searches for the first occurrence of a supplied pattern in a binary. For
example, to extract the body from an HTML file I could write something like:
<< _/binary, "<body>", Body/binary, "</body>", _/binary >> = MyHTMLFile.
or to split e.g. a SIP message into lines:
get_sipheaders(<<>>, Acc) -> Acc;
get_sipheaders(<<Line/binary, "\r\n", Rest/binary>>, Acc) ->
    get_sipheaders(Rest, [Line | Acc]).
get_sipheaders(<<"line1\r\nline2\r\nline3\r\n">>,[]) would result in
[<<"line3">>, <<"line2">>, <<"line1">>]
or a simple grep:
grep(SearchPattern, File) ->
    {ok, FileBin} = file:read_file(File),
    try <<_/binary, SearchPattern/binary, _/binary>> = FileBin of
        _Match -> match
    catch
        error:{badmatch, _} -> nomatch
    end.
I know that it would be possible to rewrite it and recursively search
through a binary step by step by specifying the size of the first binary,
but IIRC that is not very efficient compared to most modern search
algorithms (for comparison, have a look at e.g.
http://ridiculousfish.com/blog/archives/2006/05/30/old-age-and-treachery/).
I even believe someone stated that it was usually more efficient to convert
the binary to a list before searching it.
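For reference, the step-by-step rewrite mentioned above might look roughly
like the sketch below. It assumes a release where byte_size/1 is available in
guards. The module name binsearch and the functions find/2 and split_lines/1
are made-up names for illustration, not existing library functions. It tries
every offset in turn, which is exactly the naive behaviour I would like to
avoid.

```erlang
-module(binsearch).
-export([find/2, split_lines/1]).

%% find(Bin, Pattern) -> {Before, After} | nomatch
%% Hypothetical helper: returns the parts of Bin before and after the
%% first occurrence of Pattern. The size of the skipped prefix is given
%% explicitly, so every match is legal in today's bit syntax.
find(Bin, Pattern) when is_binary(Bin), is_binary(Pattern) ->
    find(Bin, Pattern, 0).

find(Bin, Pattern, Offset)
  when Offset + byte_size(Pattern) =< byte_size(Bin) ->
    PatSize = byte_size(Pattern),
    case Bin of
        <<Before:Offset/binary, Candidate:PatSize/binary, After/binary>>
          when Candidate =:= Pattern ->
            {Before, After};
        _ ->
            %% No match at this offset, so step one byte forward.
            find(Bin, Pattern, Offset + 1)
    end;
find(_Bin, _Pattern, _Offset) ->
    %% The pattern no longer fits in the remaining bytes.
    nomatch.

%% split_lines(Bin) -> [Line]
%% Splits on "\r\n" and returns the lines in reverse order, like the
%% get_sipheaders example. A trailing piece without "\r\n" is kept.
split_lines(Bin) ->
    split_lines(Bin, []).

split_lines(<<>>, Acc) ->
    Acc;
split_lines(Bin, Acc) ->
    case find(Bin, <<"\r\n">>) of
        {Line, Rest} -> split_lines(Rest, [Line | Acc]);
        nomatch -> [Bin | Acc]
    end.
```

With this, binsearch:split_lines(<<"line1\r\nline2\r\nline3\r\n">>) gives
the same reversed list as the get_sipheaders example, but each search
restarts the comparison at every byte offset.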
My questions to the mailing list would be:
1. Is it possible to do something like this today (or in the near future)
efficiently, e.g. through the regexp library?
2. Is there anything in the syntax above which would make it improper or
even impossible to add a "searching match" to binary matches (given enough
time and resources)?
3. If so, would it be possible to achieve the same thing with a slightly
improved syntax?
BR /Fredrik